diff --git a/ee/dtr/admin/manage-jobs/auto-delete-job-logs.md b/ee/dtr/admin/manage-jobs/auto-delete-job-logs.md
new file mode 100644
index 0000000000..3fd7f231da
--- /dev/null
+++ b/ee/dtr/admin/manage-jobs/auto-delete-job-logs.md
@@ -0,0 +1,48 @@
+---
+title: Enable Auto-Deletion of Job Logs
+description: Enable auto-deletion of old or unnecessary job logs for maintenance.
+keywords: registry, events, log, activity stream
+---
+
+> BETA DISCLAIMER
+>
+> This is beta content. It is not yet complete and should be considered a work in progress. This content is subject to change without notice.
+
+## Overview
+
+Docker Trusted Registry has a global setting for auto-deletion of job logs, which allows them to be removed as part of [garbage collection](../configure/garbage-collection.md). In DTR 2.6, admins can enable auto-deletion of job logs based on specified conditions, which are covered below.
+
+## Steps
+
+1. In your browser, navigate to `https://` and log in with your UCP credentials.
+
+2. Select **System** on the left navigation pane, which will display the **Settings** page by default.
+
+3. Scroll down to **Job Logs** and turn on ***Auto-Deletion***.
+
+![](../../images/auto-delete-job-logs-0.png){: .with-border}
+
+4. Specify the conditions that will trigger job log auto-deletion.
+
+![](../../images/auto-delete-job-logs-1.png){: .img-fluid .with-border}
+
+
+DTR allows you to set your auto-deletion conditions based on the following optional job log attributes:
+
+| Name | Description | Example |
+|:----------------|:---------------------------------------------------| :----------------|
+| Age | Lets you remove job logs which are older than your specified number of hours, days, weeks, or months. | `2 months` |
+| Max number of job logs | Lets you specify the maximum number of job logs allowed within DTR. | `100` |
+
+If you check and specify both, job logs will be removed from DTR during garbage collection if either condition is met.
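To make the either-condition semantics concrete, here is a minimal sketch of the deletion rule (illustrative only — `max_age`, `max_count`, and the log fields are assumptions for this example, not DTR configuration keys):

```python
from datetime import datetime, timedelta, timezone

def logs_to_delete(job_logs, max_age=None, max_count=None, now=None):
    """Return IDs of job logs eligible for auto-deletion.

    A log is removed when EITHER condition is met:
    - it is older than `max_age` (a timedelta), or
    - it falls outside the newest `max_count` entries.
    """
    now = now or datetime.now(timezone.utc)
    # Newest first, so entries beyond max_count are the oldest ones.
    ordered = sorted(job_logs, key=lambda log: log["lastUpdated"], reverse=True)
    doomed = []
    for index, log in enumerate(ordered):
        too_old = max_age is not None and now - log["lastUpdated"] > max_age
        over_limit = max_count is not None and index >= max_count
        if too_old or over_limit:
            doomed.append(log["id"])
    return doomed
```

Note the `or`: checking both boxes makes the limits independent triggers, not a combined filter.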
You should see a confirmation message right away.
+
+5. Click **Start GC** if you're ready. Read more about [garbage collection](../configure/garbage-collection/#under-the-hood) if you're unsure about this operation.
+
+6. Navigate to **System > Job Logs** to confirm that an `onlinegc` job has run. For a detailed breakdown of individual job logs, see [View Job Logs](view-job-logs-on-interface.md).
+
+![](../../images/auto-delete-repo-events-2.png){: .img-fluid .with-border}
+
+## Where to go next
+
+- [View Job Logs](view-job-logs-on-interface.md)
+
diff --git a/ee/dtr/admin/manage-jobs/job-queue.md b/ee/dtr/admin/manage-jobs/job-queue.md
new file mode 100644
index 0000000000..c784a5627b
--- /dev/null
+++ b/ee/dtr/admin/manage-jobs/job-queue.md
@@ -0,0 +1,80 @@
+---
+title: Job Queue
+description: Learn how Docker Trusted Registry runs batch jobs for troubleshooting job-related issues.
+keywords: dtr, job queue, job management
+---
+
+Docker Trusted Registry (DTR) uses a job queue to schedule batch jobs. Jobs are added to a cluster-wide queue, and then consumed and executed by a job runner within DTR.
+
+![batch jobs diagram](../../images/troubleshoot-batch-jobs-1.svg)
+
+All DTR replicas have access to the job queue, and have a job runner component
+that can claim and execute work.
+
+## How it works
+
+When a job is created, it is added to a cluster-wide job queue and enters the `waiting` state.
+When one of the DTR replicas is ready to claim the job, it waits a random time of up
+to `3` seconds to give every replica the opportunity to claim the job.
+
+A replica claims a job by adding its replica ID to the job. That way, other
+replicas will know the job has been claimed. Once a replica claims a job, it adds
+that job to an internal queue, which in turn sorts the jobs by their `scheduledAt` time.
+Once that happens, the replica updates the job status to `running`, and
+starts executing it.
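The claiming sequence above can be sketched roughly as follows (an illustrative model, not DTR's implementation — the function names and the way the random delay is simulated are assumptions; only the `waiting`/`running` states, the replica-ID claim, and the `scheduledAt` ordering come from the text):

```python
import random

def claim_round(replica_ids, job):
    """One claiming round: each replica draws a random delay of up to
    3 seconds, and the quickest replica writes its ID onto the job."""
    if job["status"] != "waiting":
        return None  # already claimed by someone else
    delays = {rid: random.uniform(0, 3) for rid in replica_ids}
    winner = min(delays, key=delays.get)
    job["workerID"] = winner  # claiming = adding the replica ID to the job
    job["status"] = "running"
    return winner

def enqueue(internal_queue, job):
    """A replica's internal queue stays sorted by scheduledAt time."""
    internal_queue.append(job)
    internal_queue.sort(key=lambda j: j["scheduledAt"])
```

Because the claim is a write of the replica ID onto the shared job record, a second claim attempt sees a non-`waiting` status and backs off.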
+
+The job runner component of each DTR replica keeps a `heartbeatExpiration`
+entry in the database that is shared by all replicas. If a replica becomes
+unhealthy, other replicas notice the change and update the status of the failing worker to `dead`.
+All the jobs that were claimed by the unhealthy replica then enter the `worker_dead` state,
+so that other replicas can claim them.
+
+## Job Types
+
+DTR runs periodic and long-running jobs. The following is a complete list of jobs you can filter for via [the user interface](view-job-logs-on-interface.md) or [the API](troubleshoot-jobs-via-api.md).
+
+| Job               | Description |
+|:------------------|:--------------------------------------------------------------------------------------------------------------------------|
+| gc | A garbage collection job that deletes layers associated with deleted images. |
+| onlinegc | A garbage collection job that deletes layers associated with deleted images without putting the registry in read-only mode. |
+| onlinegc_metadata | A garbage collection job that deletes metadata associated with deleted images. |
+| onlinegc_joblogs | A garbage collection job that deletes job logs based on a configured job history setting. |
+| metadatastoremigration | A migration required to enable the online garbage collection feature. |
+| sleep | Used to test the correctness of the job runner; sleeps for 60 seconds. |
+| false | Used to test the correctness of the job runner; runs the `false` command and immediately fails. |
+| tagmigration | Synchronizes tag and manifest information between the DTR database and the storage backend. |
+| bloblinkmigration | A 2.1 to 2.2 upgrade process that adds references for blobs to repositories in the database. |
+| license_update | Checks for license expiration extensions if online license updates are enabled. |
+| scan_check | An image security scanning job. This job does not perform the actual scanning; rather, it spawns `scan_check_single` jobs (one for each layer in the image). Once all of the `scan_check_single` jobs are complete, this job terminates. |
+| scan_check_single | A security scanning job for a particular layer given by the `SHA256SUM` parameter. This job breaks up the layer into components and checks each component for vulnerabilities. |
+| scan_check_all | A security scanning job that updates all of the currently scanned images to display the latest vulnerabilities. |
+| update_vuln_db | A job that updates DTR's vulnerability database. It uses an Internet connection to check for database updates through `https://dss-cve-updates.docker.com/` and updates the `dtr-scanningstore` container if a new update is available. |
+| scannedlayermigration | A 2.4 to 2.5 upgrade process that restructures scanned image data. |
+| push_mirror_tag | A job that pushes a tag to another registry after a push mirror policy has been evaluated. |
+| poll_mirror | A global cron that evaluates poll mirroring policies. |
+| webhook | A job that dispatches a webhook payload to a single endpoint. |
+| nautilus_update_db | The old name for the `update_vuln_db` job; it may still appear in logs from older DTR versions. |
+| ro_registry | A user-initiated job for manually switching DTR into read-only mode. |
+| tag_pruning | A configurable job that cleans up unnecessary or unwanted repository tags. For configuration options, see [Tag Pruning](../../user/tag-pruning). |
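The dead-worker handling described under "How it works" can be sketched like this (an illustrative model, not DTR's implementation; only `heartbeatExpiration`, `workerID`, and the `dead`/`worker_dead` states come from the text above — clearing `workerID` on release is an assumption):

```python
from datetime import datetime, timedelta, timezone

def reap_dead_workers(workers, jobs, now=None):
    """Mark workers with expired heartbeats as `dead`, and move their
    claimed jobs to `worker_dead` so other replicas can claim them."""
    now = now or datetime.now(timezone.utc)
    dead_ids = []
    for worker in workers:
        if worker["status"] != "dead" and worker["heartbeatExpiration"] < now:
            worker["status"] = "dead"
            dead_ids.append(worker["id"])
    for job in jobs:
        if job["workerID"] in dead_ids and job["status"] == "running":
            job["status"] = "worker_dead"
            job["workerID"] = ""  # release the claim for other replicas
    return dead_ids
```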
| + +## Job Status + +Jobs can have one of the following status values: + +| Status | Description | +|:----------------|:------------------------------------------------------------------------------------------------------------------------------------------| +| waiting | Unclaimed job waiting to be picked up by a worker. | +| running | The job is currently being run by the specified `workerID`. | +| done | The job has successfully completed. | +| error | The job has completed with errors. | +| cancel_request | The status of a job is monitored by the worker in the database. If the job status changes to `cancel_request`, the job is canceled by the worker. | +| cancel | The job has been canceled and was not fully executed. | +| deleted | The job and its logs have been removed. | +| worker_dead | The worker for this job has been declared `dead` and the job will not continue. | +| worker_shutdown | The worker that was running this job has been gracefully stopped. | +| worker_resurrection | The worker for this job has reconnected to the database and will cancel this job. | + +## Where to go next + +- [View Job Logs](view-job-logs-on-interface.md) +- [Troubleshoot Jobs via the API](troubleshoot-jobs-via-api.md) diff --git a/ee/dtr/admin/manage-jobs/troubleshoot-jobs-via-api.md b/ee/dtr/admin/manage-jobs/troubleshoot-jobs-via-api.md new file mode 100644 index 0000000000..869f0e60e6 --- /dev/null +++ b/ee/dtr/admin/manage-jobs/troubleshoot-jobs-via-api.md @@ -0,0 +1,174 @@ +--- +title: Troubleshoot Jobs via the API +description: Learn how Docker Trusted Registry runs batch jobs for job-related troubleshooting. +keywords: dtr, troubleshoot +redirect_from: /ee/dtr/admin/monitor-and-troubleshoot/troubleshoot-batch-jobs/ +--- + +## Overview + +This covers troubleshooting batch jobs via the API and was introduced in DTR 2.2. Starting in DTR 2.6, admins have the ability to [manage job logs](view-job-logs-on-interface.md) using the web interface. 
This requires familiarity with the [DTR Job Queue](job-queue.md).
+
+### Job capacity
+
+Each job runner has a limited capacity and won't claim jobs that require a
+higher capacity. You can see the capacity of a job runner using the
+`GET /api/v0/workers` endpoint:
+
+```json
+{
+  "workers": [
+    {
+      "id": "000000000000",
+      "status": "running",
+      "capacityMap": {
+        "scan": 1,
+        "scanCheck": 1
+      },
+      "heartbeatExpiration": "2017-02-18T00:51:02Z"
+    }
+  ]
+}
+```
+
+This means that the worker with replica ID `000000000000` has a capacity of 1
+`scan` and 1 `scanCheck`. If this worker notices that the following jobs
+are available:
+
+```json
+{
+  "jobs": [
+    {
+      "id": "0",
+      "workerID": "",
+      "status": "waiting",
+      "capacityMap": {
+        "scan": 1
+      }
+    },
+    {
+      "id": "1",
+      "workerID": "",
+      "status": "waiting",
+      "capacityMap": {
+        "scan": 1
+      }
+    },
+    {
+      "id": "2",
+      "workerID": "",
+      "status": "waiting",
+      "capacityMap": {
+        "scanCheck": 1
+      }
+    }
+  ]
+}
+```
+
+Our worker will be able to pick up jobs `0` and `2`, since it has the capacity
+for both, while job `1` will have to wait until the previous scan job is complete:
+
+```json
+{
+  "jobs": [
+    {
+      "id": "0",
+      "workerID": "000000000000",
+      "status": "running",
+      "capacityMap": {
+        "scan": 1
+      }
+    },
+    {
+      "id": "1",
+      "workerID": "",
+      "status": "waiting",
+      "capacityMap": {
+        "scan": 1
+      }
+    },
+    {
+      "id": "2",
+      "workerID": "000000000000",
+      "status": "running",
+      "capacityMap": {
+        "scanCheck": 1
+      }
+    }
+  ]
+}
+```
+
+You can get the list of jobs using the `GET /api/v0/jobs/` endpoint.
Each job
+looks like this:
+
+```json
+{
+  "id": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
+  "retryFromID": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
+  "workerID": "000000000000",
+  "status": "done",
+  "scheduledAt": "2017-02-17T01:09:47.771Z",
+  "lastUpdated": "2017-02-17T01:10:14.117Z",
+  "action": "scan_check_single",
+  "retriesLeft": 0,
+  "retriesTotal": 0,
+  "capacityMap": {
+    "scan": 1
+  },
+  "parameters": {
+    "SHA256SUM": "1bacd3c8ccb1f15609a10bd4a403831d0ec0b354438ddbf644c95c5d54f8eb13"
+  },
+  "deadline": "",
+  "stopTimeout": ""
+}
+```
+
+The fields of interest here are:
+
+* `id`: The ID of the job
+* `workerID`: The ID of the worker in a DTR replica that is running this job
+* `status`: The current state of the job
+* `action`: The type of job the worker will actually perform
+* `capacityMap`: The capacity a worker must have available in order to run this job
+
+
+### Cron jobs
+
+Several of the jobs performed by DTR run on a recurring schedule. You can
+see those jobs using the `GET /api/v0/crons` endpoint:
+
+
+```json
+{
+  "crons": [
+    {
+      "id": "48875b1b-5006-48f5-9f3c-af9fbdd82255",
+      "action": "license_update",
+      "schedule": "57 54 3 * * *",
+      "retries": 2,
+      "capacityMap": null,
+      "parameters": null,
+      "deadline": "",
+      "stopTimeout": "",
+      "nextRun": "2017-02-22T03:54:57Z"
+    },
+    {
+      "id": "b1c1e61e-1e74-4677-8e4a-2a7dacefffdc",
+      "action": "update_db",
+      "schedule": "0 0 3 * * *",
+      "retries": 0,
+      "capacityMap": null,
+      "parameters": null,
+      "deadline": "",
+      "stopTimeout": "",
+      "nextRun": "2017-02-22T03:00:00Z"
+    }
+  ]
+}
+```
+
+The `schedule` field uses a cron expression with a leading seconds field (`seconds minutes hours day-of-month month day-of-week`), rather than the standard five-field Unix crontab syntax. For example, the `license_update` schedule `57 54 3 * * *` fires daily at 03:54:57, matching its `nextRun` value above.
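A minimal sketch of how a six-field expression of this shape maps onto the `nextRun` values shown above (illustrative only — this is not DTR's scheduler, and it only handles the fixed time-of-day case with `* * *` in the last three fields):

```python
from datetime import datetime, timedelta

def next_run(schedule, after):
    """Next fire time for a six-field cron expression of the form
    'SS MM HH * * *' (a fixed time of day, every day). Wildcards in
    the first three fields are deliberately not handled."""
    sec, minute, hour = (int(field) for field in schedule.split()[:3])
    candidate = after.replace(hour=hour, minute=minute, second=sec, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)  # today's slot has already passed
    return candidate
```

Applied to the sample output, `57 54 3 * * *` evaluated just after noon on 2017-02-21 yields the `2017-02-22T03:54:57Z` shown in `nextRun`.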
+
+## Where to go next
+
+- [Enable auto-deletion of job logs](./auto-delete-job-logs.md)
diff --git a/ee/dtr/admin/manage-jobs/view-job-logs-on-interface.md b/ee/dtr/admin/manage-jobs/view-job-logs-on-interface.md
new file mode 100644
index 0000000000..306284070f
--- /dev/null
+++ b/ee/dtr/admin/manage-jobs/view-job-logs-on-interface.md
@@ -0,0 +1,64 @@
+---
+title: View Job Logs
+description: View a list of jobs happening within DTR and review the detailed logs for each job.
+keywords: registry, jobs, log, system management, job queue
+---
+
+> BETA DISCLAIMER
+>
+> This is beta content. It is not yet complete and should be considered a work in progress. This content is subject to change without notice.
+
+Since DTR 2.2, admins have been able to [view and troubleshoot jobs within DTR](./troubleshoot-jobs-via-api) using the API. DTR 2.6 enhances those capabilities by adding a **Job Logs** tab under **System** settings on the user interface. The tab displays a sortable and paginated list of jobs along with links to associated job logs.
+
+## View Jobs List
+
+To view the list of jobs within DTR, do the following:
+
+1. Navigate to `https://` and log in with your UCP credentials.
+
+2. Select **System** from the left navigation pane, and then click **Job Logs**. You should see a paginated list of past, running, and queued jobs. By default, **Job Logs** shows the latest `10` jobs on the first page.
+
+![](../../images/view-job-logs-1.png){: .img-fluid .with-border}
+
+3. Specify a filtering option. **Job Logs** lets you filter by:
+
+   * Action: See [Job Types](./job-queue/#job-types) for an explanation of the different actions or job types.
+
+   * Worker ID: The ID of the worker in a DTR replica that is responsible for running the job.
+
+![](../../images/view-job-logs-2.png){: .img-fluid .with-border}
+
+### Job Details
+
+The following is an explanation of the job-related fields displayed in **Job Logs**, using the filtered `onlinegc` action from above.
+
+| Job Detail | Description | Example |
+|:----------------|:-------------------------------------------------|:--------|
+| Action | The type of action or job being performed. See [Job Types](./job-queue/#job-types) for a full list of job types. | `onlinegc` |
+| ID | The ID of the job. | `ccc05646-569a-4ac4-b8e1-113111f63fb9` |
+| Worker | The ID of the worker node responsible for running the job. | `8f553c8b697c` |
+| Status | Current status of the action or job. See [Job Status](./job-queue/#job-status) for more details. | `done` |
+| Start Time | Time when the job started. | `9/23/2018 7:04 PM` |
+| Last Updated | Time when the job was last updated. | `9/23/2018 7:04 PM` |
+| View Logs | Links to the full logs for the job. | `[View Logs]` |
+
+4. Optional: Click ***Edit Settings*** on the right of the filtering options to update your **Job Logs** settings. See [Enable auto-deletion of job logs](./auto-delete-job-logs.md) for more details.
+
+## View Job-specific Logs
+
+To view the log details for a specific job, do the following:
+
+1. Click ***View Logs*** next to the job's **Last Updated** value. You will be redirected to the log detail page of your selected job.
+
+![](../../images/view-job-logs-3.png){: .img-fluid .with-border}
+
+Notice how the job `ID` is reflected in the URL, while the `Action` and the abbreviated form of the job `ID` are reflected in the heading. Also, the JSON lines displayed are job-specific [DTR container logs](https://success.docker.com/article/how-to-check-the-docker-trusted-registry-dtr-logs). See [DTR Internal Components](../architecture/#dtr-internal-components) for more details.
+
+2.
Enter or select a different line count to limit the number of lines displayed. Lines are truncated from the end of the logs.
+
+![](../../images/view-job-logs-4.png){: .img-fluid .with-border}
+
+
+## Where to go next
+
+- [Enable auto-deletion of job logs](./auto-delete-job-logs.md)
diff --git a/ee/dtr/images/auto-delete-job-logs-0.png b/ee/dtr/images/auto-delete-job-logs-0.png
new file mode 100644
index 0000000000..4cdce3bbb2
Binary files /dev/null and b/ee/dtr/images/auto-delete-job-logs-0.png differ
diff --git a/ee/dtr/images/auto-delete-job-logs-1.png b/ee/dtr/images/auto-delete-job-logs-1.png
new file mode 100644
index 0000000000..d34af6ee7f
Binary files /dev/null and b/ee/dtr/images/auto-delete-job-logs-1.png differ
diff --git a/ee/dtr/images/view-job-logs-1.png b/ee/dtr/images/view-job-logs-1.png
new file mode 100644
index 0000000000..ade7fb1fd5
Binary files /dev/null and b/ee/dtr/images/view-job-logs-1.png differ
diff --git a/ee/dtr/images/view-job-logs-2.png b/ee/dtr/images/view-job-logs-2.png
new file mode 100644
index 0000000000..b3f34e83cf
Binary files /dev/null and b/ee/dtr/images/view-job-logs-2.png differ
diff --git a/ee/dtr/images/view-job-logs-3.png b/ee/dtr/images/view-job-logs-3.png
new file mode 100644
index 0000000000..d09319132f
Binary files /dev/null and b/ee/dtr/images/view-job-logs-3.png differ
diff --git a/ee/dtr/images/view-job-logs-4.png b/ee/dtr/images/view-job-logs-4.png
new file mode 100644
index 0000000000..e10f088e7d
Binary files /dev/null and b/ee/dtr/images/view-job-logs-4.png differ