mirror of https://github.com/docker/docs.git
Rewrite and break out Troubleshoot Batch Jobs into two pages
Restructure job logs content: Job Queue, View Job Logs (on interface), Troubleshoot Jobs via the API, and Enable Auto-deletion of Job Logs
This commit is contained in:
parent e26559e270
commit b8ea668fbf
@ -0,0 +1,48 @@
---
title: Enable Auto-Deletion of Job Logs
description: Enable auto-deletion of old or unnecessary job logs for maintenance.
keywords: registry, events, log, activity stream
---

> BETA DISCLAIMER
>
> This is beta content. It is not yet complete and should be considered a work in progress. This content is subject to change without notice.

## Overview

Docker Trusted Registry has a global setting for auto-deletion of job logs, which allows them to be removed as part of [garbage collection](../configure/garbage-collection.md). In DTR 2.6, DTR admins can enable auto-deletion of job logs based on specified conditions, which are covered below.

## Steps

1. In your browser, navigate to `https://<dtr-url>` and log in with your UCP credentials.

2. Select **System** on the left navigation pane, which displays the **Settings** page by default.

3. Scroll down to **Job Logs** and turn on **Auto-Deletion**.

4. Specify the conditions that will trigger job log auto-deletion.

DTR allows you to set your auto-deletion conditions based on the following optional job log attributes:

| Name | Description | Example |
|:---------------------|:------------------------------------------------------------------------------------------------------|:-----------|
| Age | Lets you remove job logs which are older than your specified number of hours, days, weeks, or months. | `2 months` |
| Max number of events | Lets you specify the maximum number of job logs allowed within DTR. | `100` |

If you check and specify both conditions, job logs will be removed from DTR during garbage collection if either condition is met. You should see a confirmation message right away.

5. Click **Start GC** if you're ready. Read more about [garbage collection](../configure/garbage-collection/#under-the-hood) if you're unsure about this operation.

6. Navigate to **System > Job Logs** to confirm that `onlinegc` jobs have run. For a detailed breakdown of individual job logs, see [View Job Logs](view-job-logs-on-interface.md).
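
If you prefer to verify this from the command line, you can also list recent jobs through the jobs API described in [Troubleshoot Jobs via the API](troubleshoot-jobs-via-api.md). The following is a minimal sketch using the Python `requests` library; the URL and credentials are placeholders, basic authentication with UCP admin credentials is assumed, and the endpoint may limit or paginate results depending on your DTR version.

```python
# Minimal sketch: list recent garbage-collection jobs via the DTR jobs API.
# Placeholder URL and credentials; verify=False is for quick experiments only,
# prefer pointing verify at your DTR CA bundle.
import requests

DTR_URL = "https://<dtr-url>"      # replace with your DTR URL
AUTH = ("admin", "password")       # replace with your UCP credentials

resp = requests.get(f"{DTR_URL}/api/v0/jobs/", auth=AUTH, verify=False)
resp.raise_for_status()

for job in resp.json().get("jobs", []):
    # Job log auto-deletion runs as part of garbage collection, so look for
    # gc/onlinegc-style actions such as onlinegc_joblogs.
    if job["action"].startswith(("gc", "onlinegc")):
        print(job["id"], job["action"], job["status"], job["lastUpdated"])
```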

## Where to go next

- [View Job Logs](view-job-logs-on-interface.md)

@ -0,0 +1,80 @@

---
title: Job Queue
description: Learn how Docker Trusted Registry runs batch jobs for troubleshooting job-related issues.
keywords: dtr, job queue, job management
---

Docker Trusted Registry (DTR) uses a job queue to schedule batch jobs. Jobs are added to a cluster-wide job queue, and then consumed and executed by a job runner within DTR.



All DTR replicas have access to the job queue, and have a job runner component
that can get and execute work.

## How it works

When a job is created, it is added to a cluster-wide job queue and enters the `waiting` state.
When one of the DTR replicas is ready to claim the job, it waits a random time of up
to `3` seconds to give every replica the opportunity to claim the task.

A replica claims a job by adding its replica ID to the job. That way, other
replicas will know the job has been claimed. Once a replica claims a job, it adds
that job to an internal queue, which in turn sorts the jobs by their `scheduledAt` time.
Once that happens, the replica updates the job status to `running`, and
starts executing it.

The job runner component of each DTR replica keeps a `heartbeatExpiration`
entry on the database that is shared by all replicas. If a replica becomes
unhealthy, other replicas notice the change and update the status of the failing worker to `dead`.
Also, all the jobs that were claimed by the unhealthy replica enter the `worker_dead` state,
so that other replicas can claim the job.
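
To make the claim flow concrete, the sketch below illustrates the behavior described above with hypothetical names; it is not DTR's actual implementation. A replica waits a random time of up to 3 seconds, claims unclaimed jobs by writing its replica ID to them, orders its internal queue by `scheduledAt`, and then marks the jobs as `running`.

```python
# Illustrative sketch of the claim flow (hypothetical names, not DTR code).
import random
import time
from operator import itemgetter

jobs = [
    {"id": "0", "workerID": "", "status": "waiting", "scheduledAt": "2019-01-01T00:00:05Z"},
    {"id": "1", "workerID": "", "status": "waiting", "scheduledAt": "2019-01-01T00:00:01Z"},
]

def claim_jobs(replica_id, queue):
    time.sleep(random.uniform(0, 3))            # give every replica a chance to claim
    claimed = []
    for job in queue:
        if job["status"] == "waiting" and not job["workerID"]:
            job["workerID"] = replica_id        # claiming: other replicas see the ID
            claimed.append(job)
    claimed.sort(key=itemgetter("scheduledAt"))  # internal queue ordered by scheduledAt
    for job in claimed:
        job["status"] = "running"               # the replica starts executing the job
    return claimed

print(claim_jobs("000000000000", jobs))
```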

## Job Types

DTR runs periodic and long-running jobs. The following is a complete list of jobs you can filter for via [the user interface](view-job-logs-on-interface.md) or [the API](troubleshoot-jobs-via-api.md).

| Job | Description |
|:-----------------------|:------------------------------------------------------------------------------------------------------------------------------|
| gc | A garbage collection job that deletes layers associated with deleted images. |
| onlinegc | A garbage collection job that deletes layers associated with deleted images without putting the registry in read-only mode. |
| onlinegc_metadata | A garbage collection job that deletes metadata associated with deleted images. |
| onlinegc_joblogs | A garbage collection job that deletes job logs based on a set job history setting. |
| metadatastoremigration | A necessary migration that enables the online garbage collection feature. |
| sleep | Used to test the correctness of the jobrunner. It sleeps for 60 seconds. |
| false | Used to test the correctness of the jobrunner. It runs the `false` command and immediately fails. |
| tagmigration | Used to synchronize tag and manifest information between the DTR database and the storage backend. |
| bloblinkmigration | A 2.1 to 2.1 upgrade process that adds references for blobs to repositories in the database. |
| license_update | Checks for license expiration extensions if online license updates are enabled. |
| scan_check | An image security scanning job. This job does not perform the actual scanning; rather, it spawns `scan_check_single` jobs (one for each layer in the image). Once all of the `scan_check_single` jobs are complete, this job terminates. |
| scan_check_single | A security scanning job for a particular layer given by the parameter `SHA256SUM`. This job breaks up the layer into components and checks each component for vulnerabilities. |
| scan_check_all | A security scanning job that updates all of the currently scanned images to display the latest vulnerabilities. |
| update_vuln_db | A job that is created to update DTR's vulnerability database. It uses an Internet connection to check for database updates through `https://dss-cve-updates.docker.com/` and updates the `dtr-scanningstore` container if there is a new update available. |
| scannedlayermigration | A 2.4 to 2.5 upgrade process that restructures scanned image data. |
| push_mirror_tag | A job that pushes a tag to another registry after a push mirror policy has been evaluated. |
| poll_mirror | A global cron that evaluates poll mirroring policies. |
| webhook | A job that is used to dispatch a webhook payload to a single endpoint. |
| nautilus_update_db | Is this different from `update_vuln_db`? |
| ro_registry | What is this? |
| tag_pruning | A configurable job which cleans up unnecessary or unwanted repository tags. For configuration options, see [Tag Pruning](../user/tag-pruning). |

## Job Status

Jobs can have one of the following status values:

| Status | Description |
|:--------------------|:---------------------------------------------------------------------------------------------------------------------------------------------|
| waiting | Unclaimed job waiting to be picked up by a worker. |
| running | The job is currently being run by the specified `workerID`. |
| done | The job has successfully completed. |
| error | The job has completed with errors. |
| cancel_request | The status of a job is monitored by the worker in the database. If the job status changes to `cancel_request`, the job is canceled by the worker. |
| cancel | The job has been canceled and was not fully executed. |
| deleted | The job and its logs have been removed. |
| worker_dead | The worker for this job has been declared `dead` and the job will not continue. |
| worker_shutdown | The worker that was running this job has been gracefully stopped. |
| worker_resurrection | The worker for this job has reconnected to the database and will cancel this job. |
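
When troubleshooting, the `error`, `worker_dead`, and `worker_shutdown` statuses are usually the ones worth investigating first. The sketch below summarizes job statuses using the `GET /api/v0/jobs/` endpoint described in [Troubleshoot Jobs via the API](troubleshoot-jobs-via-api.md); it assumes basic authentication with UCP admin credentials (placeholder URL and credentials), and the endpoint may limit or paginate results depending on your DTR version.

```python
# Minimal sketch: summarize job statuses and surface jobs needing attention.
# Placeholder URL and credentials; prefer a CA bundle over verify=False.
import collections
import requests

DTR_URL = "https://<dtr-url>"     # replace with your DTR URL
AUTH = ("admin", "password")      # replace with your UCP credentials

resp = requests.get(f"{DTR_URL}/api/v0/jobs/", auth=AUTH, verify=False)
resp.raise_for_status()
jobs = resp.json().get("jobs", [])

print(collections.Counter(job["status"] for job in jobs))

# Jobs in these states usually warrant a closer look.
for job in jobs:
    if job["status"] in {"error", "worker_dead", "worker_shutdown"}:
        print(job["id"], job["action"], job["status"], job["workerID"])
```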

## Where to go next

- [View Job Logs](view-job-logs-on-interface.md)
- [Troubleshoot Jobs via the API](troubleshoot-jobs-via-api.md)

@ -0,0 +1,174 @@

---
title: Troubleshoot Jobs via the API
description: Learn how Docker Trusted Registry runs batch jobs for job-related troubleshooting.
keywords: dtr, troubleshoot
redirect_from: /ee/dtr/admin/monitor-and-troubleshoot/troubleshoot-batch-jobs/
---

## Overview

This page covers troubleshooting batch jobs via the API, which was introduced in DTR 2.2. Starting in DTR 2.6, admins also have the ability to [manage job logs](view-job-logs-on-interface.md) from the web interface. Troubleshooting jobs via the API requires familiarity with the [DTR Job Queue](job-queue.md).
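
The sections below show the JSON returned by endpoints such as `GET /api/v0/workers` and `GET /api/v0/jobs/`. As a minimal sketch of how to call them with the Python `requests` library, assuming basic authentication with UCP admin credentials (the URL, credentials, and CA bundle path are placeholders):

```python
# Minimal sketch: query the DTR workers and jobs endpoints used below.
# DTR_URL, credentials, and the CA bundle path are placeholders.
import requests

DTR_URL = "https://<dtr-url>"
AUTH = ("admin", "password")          # UCP admin credentials
VERIFY = "/path/to/dtr-ca.pem"        # DTR CA bundle; use False only for quick tests

workers = requests.get(f"{DTR_URL}/api/v0/workers", auth=AUTH, verify=VERIFY)
jobs = requests.get(f"{DTR_URL}/api/v0/jobs/", auth=AUTH, verify=VERIFY)

print(workers.json())   # worker capacity, as shown in the next section
print(jobs.json())      # queued, running, and completed jobs
```

The equivalent calls can also be made with `curl --user <username>:<password>` against the same endpoints.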

### Job capacity

Each job runner has a limited capacity and won't claim jobs that require a
higher capacity. You can see the capacity of a job runner using the
`GET /api/v0/workers` endpoint:

```json
{
  "workers": [
    {
      "id": "000000000000",
      "status": "running",
      "capacityMap": {
        "scan": 1,
        "scanCheck": 1
      },
      "heartbeatExpiration": "2017-02-18T00:51:02Z"
    }
  ]
}
```

This means that the worker with replica ID `000000000000` has a capacity of 1
`scan` and 1 `scanCheck`. Suppose this worker notices that the following jobs
are available:

```json
{
  "jobs": [
    {
      "id": "0",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "1",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "2",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scanCheck": 1
      }
    }
  ]
}
```

Our worker will be able to pick up jobs `0` and `2`, since it has the capacity
for both, while job `1` will have to wait until the previous scan job is complete:

```json
{
  "jobs": [
    {
      "id": "0",
      "workerID": "000000000000",
      "status": "running",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "1",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "2",
      "workerID": "000000000000",
      "status": "running",
      "capacityMap": {
        "scanCheck": 1
      }
    }
  ]
}
```

You can get the list of jobs using the `GET /api/v0/jobs/` endpoint. Each job
looks like this:

```json
{
  "id": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
  "retryFromID": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
  "workerID": "000000000000",
  "status": "done",
  "scheduledAt": "2017-02-17T01:09:47.771Z",
  "lastUpdated": "2017-02-17T01:10:14.117Z",
  "action": "scan_check_single",
  "retriesLeft": 0,
  "retriesTotal": 0,
  "capacityMap": {
    "scan": 1
  },
  "parameters": {
    "SHA256SUM": "1bacd3c8ccb1f15609a10bd4a403831d0ec0b354438ddbf644c95c5d54f8eb13"
  },
  "deadline": "",
  "stopTimeout": ""
}
```

The fields of interest here are:

* `id`: The ID of the job.
* `workerID`: The ID of the worker in a DTR replica that is running this job.
* `status`: The current state of the job.
* `action`: The type of job the worker will actually perform.
* `capacityMap`: The available capacity a worker needs for this job to run.

### Cron jobs

Several of the jobs performed by DTR run on a recurring schedule. You can
see those jobs using the `GET /api/v0/crons` endpoint:

```json
{
  "crons": [
    {
      "id": "48875b1b-5006-48f5-9f3c-af9fbdd82255",
      "action": "license_update",
      "schedule": "57 54 3 * * *",
      "retries": 2,
      "capacityMap": null,
      "parameters": null,
      "deadline": "",
      "stopTimeout": "",
      "nextRun": "2017-02-22T03:54:57Z"
    },
    {
      "id": "b1c1e61e-1e74-4677-8e4a-2a7dacefffdc",
      "action": "update_db",
      "schedule": "0 0 3 * * *",
      "retries": 0,
      "capacityMap": null,
      "parameters": null,
      "deadline": "",
      "stopTimeout": "",
      "nextRun": "2017-02-22T03:00:00Z"
    }
  ]
}
```

The `schedule` field uses a cron expression with a leading seconds field (seconds, minutes, hours, day of month, month, day of week). For example, `57 54 3 * * *` runs daily at 03:54:57 UTC, which matches the `nextRun` value above.
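
To see when each recurring job will fire next, you can read the same endpoint programmatically. A minimal sketch, assuming basic authentication with UCP admin credentials (placeholder URL and credentials):

```python
# Minimal sketch: list recurring jobs and their next scheduled run.
import requests

DTR_URL = "https://<dtr-url>"
AUTH = ("admin", "password")      # UCP admin credentials

resp = requests.get(f"{DTR_URL}/api/v0/crons", auth=AUTH, verify=False)
resp.raise_for_status()

for cron in resp.json().get("crons", []):
    print(f'{cron["action"]:20} {cron["schedule"]:15} next run {cron["nextRun"]}')
```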

## Where to go next

- [Enable auto-deletion of job logs](./auto-delete-job-logs.md)

@ -0,0 +1,64 @@

---
title: View Job Logs
description: View a list of jobs happening within DTR and review the detailed logs for each job.
keywords: registry, jobs, log, system management, job queue
---

> BETA DISCLAIMER
>
> This is beta content. It is not yet complete and should be considered a work in progress. This content is subject to change without notice.

Since DTR 2.2, admins have been able to [view and troubleshoot jobs within DTR](troubleshoot-jobs-via-api.md) using the API. DTR 2.6 enhances those capabilities by adding a **Job Logs** tab under **System** settings on the user interface. The tab displays a sortable and paginated list of jobs along with links to associated job logs.

## View Jobs List

To view the list of jobs within DTR, do the following:

1. Navigate to `https://<dtr-url>` and log in with your UCP credentials.

2. Select **System** from the left navigation pane, and then click **Job Logs**. You should see a paginated list of past, running, and queued jobs. By default, **Job Logs** shows the latest `10` jobs on the first page.

3. Specify a filtering option. **Job Logs** lets you filter by:

* Action: See [Job Queue: Job Types](job-queue.md#job-types) for an explanation of the different actions or job types.

* Worker ID: The ID of the worker in a DTR replica that is responsible for running the job.

### Job Details

The following is an explanation of the job-related fields displayed in **Job Logs**, using the filtered `onlinegc` action from above.

| Job Detail | Description | Example |
|:----------------|:-------------------------------------------------|:--------|
| Action | The type of action or job being performed. See [Job Queue: Job Types](job-queue.md#job-types) for a full list of job types. | `onlinegc` |
| ID | The ID of the job. | `ccc05646-569a-4ac4-b8e1-113111f63fb9` |
| Worker | The ID of the worker node responsible for running the job. | `8f553c8b697c` |
| Status | Current status of the action or job. See [Job Queue: Job Status](job-queue.md#job-status) for more details. | `done` |
| Start Time | Time when the job started. | `9/23/2018 7:04 PM` |
| Last Updated | Time when the job was last updated. | `9/23/2018 7:04 PM` |
| View Logs | Links to the full logs for the job. | `[View Logs]` |

4. Optional: Click **Edit Settings** on the right of the filtering options to update your **Job Logs** settings. See [Enable auto-deletion of job logs](./auto-delete-job-logs.md) for more details.
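
The same list is also available programmatically through the jobs API described in [Troubleshoot Jobs via the API](troubleshoot-jobs-via-api.md), which can be handy when the web interface is unavailable. The following is a minimal sketch using the Python `requests` library, assuming basic authentication with UCP admin credentials (placeholder URL and credentials; the endpoint may limit or paginate results depending on your DTR version).

```python
# Minimal sketch: print a table similar to the Job Logs tab.
# Placeholder URL and credentials; prefer a CA bundle over verify=False.
import requests

DTR_URL = "https://<dtr-url>"
AUTH = ("admin", "password")      # UCP admin credentials

resp = requests.get(f"{DTR_URL}/api/v0/jobs/", auth=AUTH, verify=False)
resp.raise_for_status()

print(f'{"ACTION":25} {"ID":40} {"WORKER":15} {"STATUS":10} LAST UPDATED')
for job in resp.json().get("jobs", []):
    print(f'{job["action"]:25} {job["id"]:40} {job["workerID"]:15} '
          f'{job["status"]:10} {job["lastUpdated"]}')
```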

## View Job-specific Logs

To view the log details for a specific job, do the following:

1. Click **View Logs** next to the job's **Last Updated** value. You will be redirected to the log detail page of your selected job.

Notice how the job `ID` is reflected in the URL, while the `Action` and the abbreviated form of the job `ID` are reflected in the heading. Also, the JSON lines displayed are job-specific [DTR container logs](https://success.docker.com/article/how-to-check-the-docker-trusted-registry-dtr-logs). See [DTR Internal Components](../architecture/#dtr-internal-components) for more details.

2. Enter or select a different line count to truncate the number of lines displayed. Lines are cut off from the end of the logs.

## Where to go next

- [Enable auto-deletion of job logs](./auto-delete-job-logs.md)