Merge pull request #751 from docker/job-logs-681

Job logs 681
2018-10-15 14:08:54 -07:00 · 2018-10-15 14:08:54 -07:00 · 0d9b5fb9a3
parent c7eb98c696 6a2eea7dc4
commit 0d9b5fb9a3
13 changed files with 397 additions and 2 deletions
--- a/_data/toc.yaml
+++ b/_data/toc.yaml
@@ -2345,6 +2345,16 @@ manuals:
          title: Create and manage organizations
        - path: /ee/dtr/admin/manage-users/permission-levels/
          title: Permission levels
+      - sectiontitle: Manage jobs
+        section:
+        - path: /ee/dtr/admin/manage-jobs/job-queue/
+          title: Job Queue
+        - path: /ee/dtr/admin/manage-jobs/audit-jobs-via-ui/
+          title: Audit Jobs via the Web Interface
+        - path: /ee/dtr/admin/manage-jobs/audit-jobs-via-api/
+          title: Audit Jobs via the API
+        - path: /ee/dtr/admin/manage-jobs/auto-delete-job-logs/
+          title: Enable Auto-Deletion of Job Logs
      - sectiontitle: Monitor and troubleshoot
        section:
        - path: /ee/dtr/admin/monitor-and-troubleshoot/
@@ -2353,8 +2363,6 @@ manuals:
          title: Check Notary audit logs
        - path: /ee/dtr/admin/monitor-and-troubleshoot/troubleshoot-with-logs/
          title: Troubleshoot with logs
-        - path: /ee/dtr/admin/monitor-and-troubleshoot/troubleshoot-batch-jobs/
-          title: Troubleshoot batch jobs
      - sectiontitle: Disaster recovery
        section:
        - title: Overview
--- a/ee/dtr/admin/manage-jobs/audit-jobs-via-api.md
+++ b/ee/dtr/admin/manage-jobs/audit-jobs-via-api.md
@@ -0,0 +1,182 @@
+---
+title: Audit Jobs via the API
+description: Learn how Docker Trusted Registry runs batch jobs so that you can troubleshoot job-related issues.
+keywords: dtr, troubleshoot, audit, job logs, jobs, api
+redirect_from: /ee/dtr/admin/monitor-and-troubleshoot/troubleshoot-batch-jobs/
+---
+
+
+> BETA DISCLAIMER
+>
+> This is beta content. It is not yet complete and should be considered a work in progress. This content is subject to change without notice.
+
+## Overview
+
+This topic covers troubleshooting batch jobs via the API, a capability introduced in DTR 2.2. Starting in DTR 2.6, admins can also [audit jobs](audit-jobs-via-ui.md) using the web interface.
+
+## Prerequisite
+   * [Job Queue](job-queue.md)
+
+### Job capacity
+
+Each job runner has a limited capacity and will not claim jobs that require a
+higher capacity. You can see the capacity of a job runner via the
+`GET /api/v0/workers` endpoint:
+
+```json
+{
+  "workers": [
+    {
+      "id": "000000000000",
+      "status": "running",
+      "capacityMap": {
+        "scan": 1,
+        "scanCheck": 1
+      },
+      "heartbeatExpiration": "2017-02-18T00:51:02Z"
+    }
+  ]
+}
+```
+
+This means that the worker with replica ID `000000000000` has a capacity of 1
+`scan` and 1 `scanCheck`. Next, review the list of available jobs:
+
+```json
+{
+  "jobs": [
+    {
+      "id": "0",
+      "workerID": "",
+      "status": "waiting",
+      "capacityMap": {
+        "scan": 1
+      }
+    },
+    {
+      "id": "1",
+      "workerID": "",
+      "status": "waiting",
+      "capacityMap": {
+        "scan": 1
+      }
+    },
+    {
+      "id": "2",
+      "workerID": "",
+      "status": "waiting",
+      "capacityMap": {
+        "scanCheck": 1
+      }
+    }
+  ]
+}
+```
+
+If worker `000000000000` notices the jobs in the `waiting` state above, it can
+pick up jobs `0` and `2` because it has the capacity for both. Job `1` has to
+wait until the previous scan job, `0`, is completed. The job queue then looks like this:
+
+```json
+{
+  "jobs": [
+    {
+      "id": "0",
+      "workerID": "000000000000",
+      "status": "running",
+      "capacityMap": {
+        "scan": 1
+      }
+    },
+    {
+      "id": "1",
+      "workerID": "",
+      "status": "waiting",
+      "capacityMap": {
+        "scan": 1
+      }
+    },
+    {
+      "id": "2",
+      "workerID": "000000000000",
+      "status": "running",
+      "capacityMap": {
+        "scanCheck": 1
+      }
+    }
+  ]
+}
+```
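The capacity example above can be replayed with a few lines of Python. This is only a minimal sketch of the matching rule as described in the text, not DTR's actual job runner code; the function name `claim_jobs` is hypothetical, and the sketch assumes a worker considers jobs in the order they are listed.

```python
def claim_jobs(worker_capacity, jobs):
    """Return the IDs of waiting jobs a worker could claim with its capacity.

    Hypothetical helper: jobs are considered in list order, and each claimed
    job uses up the capacity listed in its capacityMap.
    """
    remaining = dict(worker_capacity)
    claimed = []
    for job in jobs:
        if job["status"] != "waiting":
            continue
        needs = job.get("capacityMap") or {}
        # Claim the job only if every capacity it needs is still available.
        if all(remaining.get(kind, 0) >= amount for kind, amount in needs.items()):
            for kind, amount in needs.items():
                remaining[kind] = remaining.get(kind, 0) - amount
            claimed.append(job["id"])
    return claimed


# Values copied from the example responses above.
worker = {"scan": 1, "scanCheck": 1}
jobs = [
    {"id": "0", "workerID": "", "status": "waiting", "capacityMap": {"scan": 1}},
    {"id": "1", "workerID": "", "status": "waiting", "capacityMap": {"scan": 1}},
    {"id": "2", "workerID": "", "status": "waiting", "capacityMap": {"scanCheck": 1}},
]

print(claim_jobs(worker, jobs))  # ['0', '2']
```

Job `1` is skipped because job `0` has already used the worker's single `scan` slot.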
+You can get a list of jobs via the `GET /api/v0/jobs/` endpoint. Each job
+looks like:
+
+```json
+{
+  "id": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
+  "retryFromID": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
+  "workerID": "000000000000",
+  "status": "done",
+  "scheduledAt": "2017-02-17T01:09:47.771Z",
+  "lastUpdated": "2017-02-17T01:10:14.117Z",
+  "action": "scan_check_single",
+  "retriesLeft": 0,
+  "retriesTotal": 0,
+  "capacityMap": {
+    "scan": 1
+  },
+  "parameters": {
+    "SHA256SUM": "1bacd3c8ccb1f15609a10bd4a403831d0ec0b354438ddbf644c95c5d54f8eb13"
+  },
+  "deadline": "",
+  "stopTimeout": ""
+}
+```
+The JSON fields of interest here are:
+
+* `id`: The ID of the job
+* `workerID`: The ID of the worker in a DTR replica that is running this job
+* `status`: The current state of the job
+* `action`: The type of job the worker will actually perform
+* `capacityMap`: The available capacity a worker needs for this job to run
+
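When auditing, it is often enough to tally jobs by `status` and pull out the ones that ended in `error`. The sketch below works on a saved response body rather than calling DTR. It assumes the list endpoint wraps jobs in a top-level `jobs` array, as in the earlier examples; the sample IDs and the helper names are made up for illustration.

```python
import json
from collections import Counter

# Made-up sample data shaped like the job objects described above.
SAMPLE_RESPONSE = """
{"jobs": [
  {"id": "a", "workerID": "000000000000", "status": "done",
   "action": "scan_check_single", "capacityMap": {"scan": 1}},
  {"id": "b", "workerID": "", "status": "waiting",
   "action": "scan_check_single", "capacityMap": {"scan": 1}},
  {"id": "c", "workerID": "000000000000", "status": "error",
   "action": "license_update", "capacityMap": null}
]}
"""


def count_by_status(response_text):
    """Count jobs per status value (hypothetical helper)."""
    jobs = json.loads(response_text)["jobs"]
    return Counter(job["status"] for job in jobs)


def failed_job_ids(response_text):
    """Return the IDs of jobs whose status is `error` (hypothetical helper)."""
    jobs = json.loads(response_text)["jobs"]
    return [job["id"] for job in jobs if job["status"] == "error"]


print(dict(count_by_status(SAMPLE_RESPONSE)))
print(failed_job_ids(SAMPLE_RESPONSE))
```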
+
+### Cron jobs
+
+Several of the jobs performed by DTR run on a recurring schedule. You can
+see those jobs using the `GET /api/v0/crons` endpoint:
+
+
+```json
+{
+  "crons": [
+    {
+      "id": "48875b1b-5006-48f5-9f3c-af9fbdd82255",
+      "action": "license_update",
+      "schedule": "57 54 3 * * *",
+      "retries": 2,
+      "capacityMap": null,
+      "parameters": null,
+      "deadline": "",
+      "stopTimeout": "",
+      "nextRun": "2017-02-22T03:54:57Z"
+    },
+    {
+      "id": "b1c1e61e-1e74-4677-8e4a-2a7dacefffdc",
+      "action": "update_db",
+      "schedule": "0 0 3 * * *",
+      "retries": 0,
+      "capacityMap": null,
+      "parameters": null,
+      "deadline": "",
+      "stopTimeout": "",
+      "nextRun": "2017-02-22T03:00:00Z"
+    }
+  ]
+}
+```
+
+The `schedule` field uses a cron expression following the `(seconds) (minutes) (hours) (day of month) (month) (day of week)` format. For example, the schedule `57 54 3 * * *` for cron ID `48875b1b-5006-48f5-9f3c-af9fbdd82255` runs at `03:54:57` every day, regardless of the day of the week or month. This matches the `nextRun` value of `2017-02-22T03:54:57Z` in the example JSON response above.
+
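To make the six-field layout concrete, the following sketch splits a `schedule` string into named fields and formats the time of day for simple daily schedules. It is an illustration only: the helper names are hypothetical, and it handles just plain numbers and `*`, not ranges, lists, or step values.

```python
FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")


def parse_schedule(expr):
    """Split a six-field cron expression into a dict keyed by field name."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def daily_time(expr):
    """Return HH:MM:SS for schedules that fire once every day, else None."""
    fields = parse_schedule(expr)
    time_fields = (fields["hours"], fields["minutes"], fields["seconds"])
    date_fields = (fields["day_of_month"], fields["month"], fields["day_of_week"])
    if all(p.isdigit() for p in time_fields) and all(p == "*" for p in date_fields):
        return "{:02d}:{:02d}:{:02d}".format(*(int(p) for p in time_fields))
    return None


print(daily_time("57 54 3 * * *"))  # 03:54:57
print(daily_time("0 0 3 * * *"))    # 03:00:00
```

Both results agree with the time portion of the `nextRun` values in the example response.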
+## Where to go next
+
+- [Enable auto-deletion of job logs](./auto-delete-job-logs.md)
--- a/ee/dtr/admin/manage-jobs/audit-jobs-via-ui.md
+++ b/ee/dtr/admin/manage-jobs/audit-jobs-via-ui.md
@@ -0,0 +1,71 @@
+---
+title: Audit Jobs via the Web Interface
+description: View a list of jobs happening within DTR and review the detailed logs for each job.
+keywords: dtr, troubleshoot, audit, job logs, jobs, ui
+---
+
+
+> BETA DISCLAIMER
+>
+> This is beta content. It is not yet complete and should be considered a work in progress. This content is subject to change without notice.
+
+Since DTR 2.2, admins have been able to [view and audit jobs within DTR](audit-jobs-via-api) using the API. DTR 2.6 enhances those capabilities by adding a **Job Logs** tab under **System** settings in the web interface. The tab displays a sortable and paginated list of jobs along with links to the associated job logs.
+
+## Prerequisite
+   * [Job Queue](job-queue.md)
+
+## View Jobs List
+
+To view the list of jobs within DTR, do the following:
+
+1. Navigate to `https://<dtr-url>` and log in with your UCP credentials.
+
+2. Select **System** from the left navigation pane, and then click **Job Logs**. You should see a paginated list of past, running, and queued jobs. By default, **Job Logs** shows the latest `10` jobs on the first page.
+
+    ![](../../images/view-job-logs-1.png){: .img-fluid .with-border}
+
+ 	
+3. Specify a filtering option. **Job Logs** lets you filter by:
+
+	* Action: See [Job Queue: Job Types](job-queue/#job-types) for an explanation of the different actions or job types.
+
+	* Worker ID: The ID of the worker in a DTR replica that is responsible for running the job.
+
+    ![](../../images/view-job-logs-2.png){: .img-fluid .with-border}
+
+
+4. Optional: Click **Edit Settings** on the right of the filtering options to update your **Job Logs** settings. See [Enable auto-deletion of job logs](auto-delete-job-logs) for more details.
+
+### Job Details 
+ 
+The following table explains the job-related fields displayed in **Job Logs**, using the filtered `onlinegc` action from above as the example.
+
+| Job Detail | Description | Example |
+|:-----------|:------------|:--------|
+| Action | The type of action or job being performed. See [Job Types](./job-queue/#job-types) for a full list of job types. | `onlinegc` |
+| ID | The ID of the job. | `ccc05646-569a-4ac4-b8e1-113111f63fb9` |
+| Worker | The ID of the worker node responsible for running the job. | `8f553c8b697c` |
+| Status | Current status of the action or job. See [Job Status](./job-queue/#job-status) for more details. | `done` |
+| Start Time | Time when the job started. | `9/23/2018 7:04 PM` |
+| Last Updated | Time when the job was last updated. | `9/23/2018 7:04 PM` |
+| View Logs | Links to the full logs for the job. | `[View Logs]` |
+
+## View Job-specific Logs
+
+To view the log details for a specific job, do the following:
+
+1. Click **View Logs** next to the job's **Last Updated** value. You will be redirected to the log detail page of your selected job.
+
+    ![](../../images/view-job-logs-3.png){: .img-fluid .with-border}
+
+    
+    Notice how the job `ID` is reflected in the URL while the `Action` and the abbreviated form of the job `ID` are reflected in the heading. Also, the JSON lines displayed are job-specific [DTR container logs](https://success.docker.com/article/how-to-check-the-docker-trusted-registry-dtr-logs). See [DTR Internal Components](../../architecture/#dtr-internal-components) for more details.
+
+2. Enter or select a different line count to truncate the number of lines displayed. Lines are cut off from the end of the logs.
+
+    ![](../../images/view-job-logs-4.png){: .img-fluid .with-border}
+
+
+## Where to go next
+
+- [Enable auto-deletion of job logs](./auto-delete-job-logs.md)
--- a/ee/dtr/admin/manage-jobs/auto-delete-job-logs.md
+++ b/ee/dtr/admin/manage-jobs/auto-delete-job-logs.md
@@ -0,0 +1,54 @@
+---
+title: Enable Auto-Deletion of Job Logs
+description: Enable auto-deletion of old or unnecessary job logs for maintenance.
+keywords: dtr, jobs, log, job logs, system
+---
+
+> BETA DISCLAIMER
+>
+> This is beta content. It is not yet complete and should be considered a work in progress. This content is subject to change without notice.
+
+## Overview 
+
+Docker Trusted Registry has a global setting for auto-deletion of job logs, which allows them to be removed as part of [garbage collection](../configure/garbage-collection.md). In DTR 2.6, admins can enable auto-deletion of job logs based on the conditions covered below.
+
+## Steps
+
+1. In your browser, navigate to `https://<dtr-url>` and log in with your UCP credentials. 
+
+2. Select **System** from the left navigation pane. The **Settings** page displays by default.
+
+3. Scroll down to **Job Logs** and turn on **Auto-Deletion**.
+
+    ![](../../images/auto-delete-job-logs-1.png){: .img-fluid .with-border}
+
+4. Specify the conditions that trigger job log auto-deletion.
+
+    DTR allows you to set your auto-deletion conditions based on the following optional job log attributes:
+
+    | Name            | Description                                        | Example           |
+    |:----------------|:---------------------------------------------------| :----------------|
+    | Age        | Lets you remove job logs which are older than your specified number of hours, days, weeks, or months. | `2 months` |
+    | Max number of events  | Lets you specify the maximum number of job logs allowed within DTR.  | `100` |
+
+    ![](../../images/auto-delete-job-logs-2.png){: .img-fluid .with-border}
+
+
+    If you check and specify both, job logs will be removed from DTR during garbage collection if either condition is met. You should see a confirmation message right away.
+
+5. Click **Start Deletion** if you're ready. Read more about [garbage collection](../configure/garbage-collection/#under-the-hood) if you're unsure about this operation.
+
+6.  Navigate to **System > Job Logs** to confirm that [**onlinegc_joblogs**](job-queue/#job-types) has started. For a detailed breakdown of individual job logs, see [View Job-specific Logs](audit-jobs-via-ui/#view-job-specific-logs) in "Audit Jobs via the Web Interface."
+
+
+![](../../images/auto-delete-job-logs-3.png){: .img-fluid .with-border}
+
+
+> Job Log Deletion
+>
+> When you enable auto-deletion of job logs, the logs will be permanently deleted during garbage collection. See [Configure logging drivers](../../../../config/containers/logging/configure/) for a list of supported logging drivers and plugins.
+
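The "either condition" rule from step 4 can be illustrated with a short sketch. This is not DTR's implementation: the function `logs_to_delete` is hypothetical, and the sketch assumes that when the maximum count is exceeded, the oldest logs are the ones removed, which the text above does not state.

```python
from datetime import datetime, timedelta, timezone


def logs_to_delete(logs, now, max_age=None, max_count=None):
    """Return IDs of job logs that meet either auto-deletion condition.

    Assumption: logs beyond `max_count` are dropped oldest first.
    """
    newest_first = sorted(logs, key=lambda log: log["lastUpdated"], reverse=True)
    doomed = []
    for position, log in enumerate(newest_first):
        too_old = max_age is not None and now - log["lastUpdated"] > max_age
        over_limit = max_count is not None and position >= max_count
        if too_old or over_limit:
            doomed.append(log["id"])
    return doomed


# Made-up sample data: three job logs of different ages.
now = datetime(2018, 10, 15, tzinfo=timezone.utc)
logs = [
    {"id": "a", "lastUpdated": now - timedelta(days=1)},
    {"id": "b", "lastUpdated": now - timedelta(days=10)},
    {"id": "c", "lastUpdated": now - timedelta(days=90)},
]

print(logs_to_delete(logs, now, max_age=timedelta(days=60)))               # ['c']
print(logs_to_delete(logs, now, max_age=timedelta(days=60), max_count=1))  # ['b', 'c']
```

With both conditions set, log `b` is removed because of the count limit even though it is newer than the age limit.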
+## Where to go next
+
+- [Monitor Docker Trusted Registry](../monitor-and-troubleshoot/)
+ 
--- a/ee/dtr/admin/manage-jobs/job-queue.md
+++ b/ee/dtr/admin/manage-jobs/job-queue.md
@@ -0,0 +1,80 @@
+---
+title: Job Queue
+description: Learn how Docker Trusted Registry runs batch jobs so that you can troubleshoot job-related issues.
+keywords: dtr, job queue, job management
+---
+
+Docker Trusted Registry (DTR) uses a job queue to schedule batch jobs. Jobs are added to a cluster-wide job queue, and then consumed and executed by a job runner within DTR.
+
+![batch jobs diagram](../../images/troubleshoot-batch-jobs-1.svg)
+
+All DTR replicas have access to the job queue, and have a job runner component
+that can get and execute work.
+
+## How it works
+
+When a job is created, it is added to a cluster-wide job queue and enters the `waiting` state.
+When one of the DTR replicas is ready to claim the job, it waits a random time of up
+to `3` seconds to give every replica the opportunity to claim the task.
+
+A replica claims a job by adding its replica ID to the job. That way, other
+replicas will know the job has been claimed. Once a replica claims a job, it adds
+that job to an internal queue, which in turn sorts the jobs by their `scheduledAt` time.
+Once that happens, the replica updates the job status to `running`, and
+starts executing it.
+
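The two mechanisms described above, the random wait before claiming and the internal queue ordered by `scheduledAt`, can be sketched as follows. The names `claim_delay` and `InternalQueue` are hypothetical and the code is a simplified model, not DTR's job runner. It relies on the timestamps sharing one ISO 8601 format, so that string order matches time order.

```python
import heapq
import random


def claim_delay(max_seconds=3.0):
    """Pick a random wait of up to `max_seconds` before trying to claim a job."""
    return random.uniform(0, max_seconds)


class InternalQueue:
    """Per-replica queue that hands out claimed jobs in `scheduledAt` order."""

    def __init__(self):
        self._heap = []

    def add(self, job):
        # The job ID breaks ties so that two job dicts are never compared.
        heapq.heappush(self._heap, (job["scheduledAt"], job["id"], job))

    def pop_next(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Made-up sample jobs, added out of order.
queue = InternalQueue()
queue.add({"id": "late", "scheduledAt": "2017-02-17T01:10:00.000Z"})
queue.add({"id": "early", "scheduledAt": "2017-02-17T01:09:47.771Z"})

next_job = queue.pop_next()
next_job["status"] = "running"
print(next_job["id"])  # early
```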
+The job runner component of each DTR replica keeps a `heartbeatExpiration`
+entry in the database that is shared by all replicas. If a replica becomes
+unhealthy, other replicas notice the change and update the status of the failing worker to `dead`.
+All the jobs that were claimed by the unhealthy replica then enter the `worker_dead` state,
+so that other replicas can claim them.
+
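The heartbeat check can be modeled with a small sketch. It is an assumption-laden illustration rather than DTR's code: the helper names are hypothetical, a worker is treated as unhealthy once its `heartbeatExpiration` lies in the past, and only `waiting` or `running` jobs owned by that worker are moved to `worker_dead`.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a timestamp such as 2017-02-18T00:51:02Z into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def mark_dead_workers(workers, jobs, now):
    """Flag expired workers as `dead` and their unfinished jobs as `worker_dead`."""
    dead = set()
    for worker in workers:
        expired = parse_ts(worker["heartbeatExpiration"]) < now
        if worker["status"] == "running" and expired:
            worker["status"] = "dead"
            dead.add(worker["id"])
    for job in jobs:
        if job["workerID"] in dead and job["status"] in ("waiting", "running"):
            job["status"] = "worker_dead"
    return dead


# Made-up sample data; the second worker ID is invented for the example.
workers = [
    {"id": "000000000000", "status": "running", "heartbeatExpiration": "2017-02-18T00:51:02Z"},
    {"id": "111111111111", "status": "running", "heartbeatExpiration": "2017-02-18T00:59:00Z"},
]
jobs = [
    {"id": "0", "workerID": "000000000000", "status": "running"},
    {"id": "1", "workerID": "111111111111", "status": "running"},
    {"id": "2", "workerID": "000000000000", "status": "done"},
]

dead = mark_dead_workers(workers, jobs, parse_ts("2017-02-18T00:52:00Z"))
print(dead)                        # {'000000000000'}
print([j["status"] for j in jobs])  # ['worker_dead', 'running', 'done']
```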
+## Job Types
+
+DTR runs periodic and long-running jobs. The following is a complete list of jobs you can filter for via [the user interface](audit-jobs-via-ui.md) or [the API](audit-jobs-via-api.md).
+
+| Job               | Description                                                                                                                                                                                                                                               |
+|:------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| gc                | A garbage collection job that deletes layers associated with deleted images.                                                                                                                                                                                 |
+| onlinegc                | A garbage collection job that deletes layers associated with deleted images without putting the registry in read-only mode.  |
+| onlinegc_metadata                | A garbage collection job that deletes metadata associated with deleted images. |
+| onlinegc_joblogs                | A garbage collection job that deletes job logs based on a configured job history setting. |
+| metadatastoremigration   | A necessary migration that enables the `onlinegc` feature. |
+| sleep             | Used for testing the correctness of the job runner. It sleeps for 60 seconds.                                                                                                                                                                           |
+| false             | Used for testing the correctness of the job runner. It runs the `false` command and immediately fails.                                                                                                                                                 |
+| tagmigration      | Used for synchronizing tag and manifest information between the DTR database and the storage backend.                                                                                                                                       |
+| bloblinkmigration | A DTR 2.1 to 2.2 upgrade process that adds references for blobs to repositories in the database.                                                                                                                                          |
+| license_update    | Checks for license expiration extensions if online license updates are enabled.                                                                                                                                                             |
+| scan_check        | An image security scanning job. This job does not perform the actual scanning, rather it spawns `scan_check_single` jobs (one for each layer in the image). Once all of the `scan_check_single` jobs are complete, this job will terminate.                |
+| scan_check_single | A security scanning job for a particular layer given by the `parameter: SHA256SUM`. This job breaks up the layer into components and checks each component for vulnerabilities.                                                                            |
+| scan_check_all    | A security scanning job that updates all of the currently scanned images to display the latest vulnerabilities.                                                                                                                                            |
+| update_vuln_db    | A job that is created to update DTR's vulnerability database. It uses an Internet connection to check for database updates through `https://dss-cve-updates.docker.com/` and updates the `dtr-scanningstore` container if there is a new update available. |
+| scannedlayermigration  | A DTR 2.4 to 2.5 upgrade process that restructures scanned image data. |
+| push_mirror_tag  | A job that pushes a tag to another registry after a push mirror policy has been evaluated. |
+| poll_mirror  | A global cron that evaluates poll mirroring policies. |
+| webhook           | A job that is used to dispatch a webhook payload to a single endpoint.                                                                                                                                                                                     |
+| nautilus_update_db           | The old name for the `update_vuln_db` job. This may be visible on old log files.                                                                                                                                                                                   |
+| ro_registry           | A user-initiated job for manually switching DTR into read-only mode.     |
+| tag_pruning           | A job for cleaning up unnecessary or unwanted repository tags which can be configured by repository admins. For configuration options, see [Tag Pruning](../../user/tag-pruning).                                                                                                                                                                      |
+
+## Job Status
+
+Jobs can have one of the following status values:
+
+| Status          | Description                                                                                                                               |
+|:----------------|:------------------------------------------------------------------------------------------------------------------------------------------|
+| waiting         | Unclaimed job waiting to be picked up by a worker.                                                                              |
+| running         | The job is currently being run by the specified `workerID`.                                                                             |
+| done            | The job has successfully completed.                                                                                                        |
+| error           | The job has completed with errors.                                                                                                         |
+| cancel_request  | The worker monitors the status of the job in the database. If the status changes to `cancel_request`, the worker cancels the job. |
+| cancel          | The job has been canceled and was not fully executed.                                                                                          |
+| deleted         | The job and its logs have been removed.                                                                                                        |
+| worker_dead     | The worker for this job has been declared `dead` and the job will not continue.                                                            |
+| worker_shutdown | The worker that was running this job has been gracefully stopped.                                                                          |
+| worker_resurrection | The worker for this job has reconnected to the database and will cancel this job.                                          |
+
+## Where to go next
+
+- [Audit Jobs via Web Interface](audit-jobs-via-ui)
+- [Audit Jobs via API](audit-jobs-via-api)
--- a/ee/dtr/images/auto-delete-job-logs-0.png
+++ b/ee/dtr/images/auto-delete-job-logs-0.png
--- a/ee/dtr/images/auto-delete-job-logs-1.png
+++ b/ee/dtr/images/auto-delete-job-logs-1.png
--- a/ee/dtr/images/auto-delete-job-logs-2.png
+++ b/ee/dtr/images/auto-delete-job-logs-2.png
--- a/ee/dtr/images/auto-delete-job-logs-3.png
+++ b/ee/dtr/images/auto-delete-job-logs-3.png
--- a/ee/dtr/images/view-job-logs-1.png
+++ b/ee/dtr/images/view-job-logs-1.png
--- a/ee/dtr/images/view-job-logs-2.png
+++ b/ee/dtr/images/view-job-logs-2.png
--- a/ee/dtr/images/view-job-logs-3.png
+++ b/ee/dtr/images/view-job-logs-3.png
--- a/ee/dtr/images/view-job-logs-4.png
+++ b/ee/dtr/images/view-job-logs-4.png