Add information about new Domain data point (#15798)

Co-authored-by: Diogo Ferreira <diogoff94@gmail.com>
Co-authored-by: jerae-duffin <83294991+jerae-duffin@users.noreply.github.com>
David Karlsson 2022-10-17 17:08:13 +02:00 committed by GitHub
parent 208641e7ea
commit 4b5671431f
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 102 additions and 76 deletions

@ -6,4 +6,4 @@ Vocab = Docker, Industry, Technology
[*.md] [*.md]
BasedOnStyles = Vale, Docker BasedOnStyles = Vale, Docker
TokenIgnores = ({%.*%}), \ TokenIgnores = ({%.*%}), \
({:.*?}) ({:(?:.|\n)*?})

@ -1628,7 +1628,7 @@ manuals:
title: Convert an account into an organization title: Convert an account into an organization
- path: /docker-hub/deactivate-account/ - path: /docker-hub/deactivate-account/
title: Deactivate an account or an organization title: Deactivate an account or an organization
- sectiontitle: Docker Verified Publisher Program - sectiontitle: Docker Verified Publisher
section: section:
- path: /docker-hub/publish/ - path: /docker-hub/publish/
title: Overview title: Overview

@ -4,11 +4,11 @@ description: Provides usage statistics of your images on Docker Hub.
keywords: docker hub, hub, insights, analytics, api, verified publisher keywords: docker hub, hub, insights, analytics, api, verified publisher
--- ---
Insights and analytics provides usage analytics for your organization's images Insights and analytics provides usage analytics for your Docker Verified
on Docker Hub. With this tool, you have self-serve access to metrics as both raw Publisher (DVP) images on Docker Hub. With this tool, you have self-serve access
data and summary data for a desired time span. You can view how many times your to metrics as both raw data and summary data for a desired time span. You can
images have been pulled by tag or by digest, and get breakdowns by geolocation, view number of image pulls by tag or by digest, and get breakdowns by
cloud provider, and client (user agent). geolocation, cloud provider, client, and more.
## Exporting analytics data ## Exporting analytics data
@ -27,18 +27,19 @@ manually as a spreadsheet.
Here's how to export usage data for your organization's images using the Docker Here's how to export usage data for your organization's images using the Docker
Hub website. Hub website.
1. Log in to [Docker Hub](https://hub.docker.com/){: target="_blank" 1. Sign in to [Docker Hub](https://hub.docker.com/){: target="_blank"
rel="noopener" class="_"} and select **Organizations**. rel="noopener" class="_"} and select **Organizations**.
2. Choose your organization and click **Insights and analytics**. 2. Choose your organization and select **Insights and analytics**.
![Organization overview page, with the Insights and Analytics tab](./images/organization-tabs.png) ![Organization overview page, with the Insights and Analytics tab](./images/organization-tabs.png)
3. Set the time span for which you want to export analytics data. The 3. Set the time span for which you want to export analytics data.
downloadable CSV files for summary and raw data appear on the right-hand
side.
![Filtering options and download links for analytics data](./images/download-analytics-data.png) The downloadable CSV files for summary and raw data appear on the right-hand
side.
![Filtering options and download links for analytics data](./images/download-analytics-data.png)
### Export data using the API ### Export data using the API
@ -47,84 +48,73 @@ The HTTP API endpoints are available at:
using the API in the [DVP Data API documentation](/docker-hub/api/dvp/){: using the API in the [DVP Data API documentation](/docker-hub/api/dvp/){:
target="_blank" rel="noopener" class="_"}. target="_blank" rel="noopener" class="_"}.
## Data formats ## Data points
The data can be exported in either raw or summary format. Each format contains Export data in either raw or summary format. Each format contains different data
different data points and are formatted differently. points and with different structure.
Review the [Data definitions](#data-definitions) section for more information The following sections describe the available data points for each format. The
about the data points and how to read them. **Available from** column shows when the field was first added.
### Raw data ### Raw data
The raw data format contains the following data points for the selected time The raw data format contains the following data points. Each row in the CSV file
span. Each action is represented as a single row in the CSV file. represents an image pull.
- Timestamp | Data point | Description | Available from |
- Namespace | ----------------------------- | ------------------------------------------------------------------------------------------------------------ | ---------------- |
- Repository | Action | Request type, see [Action classification rules][1]. One of `pull_by_tag`, `pull_by_digest`, `version_check`. | January 1, 2022 |
- Reference | Action day | The date part of the timestamp: `YYYY-MM-DD` | January 1, 2022 |
- Digest | Country | Request origin country. | January 1, 2022 |
- Tag (included when available) | Digest | Image digest. | January 1, 2022 |
- Action day | HTTP method | HTTP method used in the request, see [registry API documentation][2] for details. | January 1, 2022 |
- HTTP method | Host | The cloud service provider used in an event. | January 1, 2022 |
- Action, one of the following: | Namespace | Docker [organization][3] (image namespace). | January 1, 2022 |
- Pull by tag | Reference | Image digest or tag used in the request. | January 1, 2022 |
- Pull by digest | Repository | Docker [repository][4] (image name). | January 1, 2022 |
- Version check | Tag (included when available) | Tag name that's only available if the request referred to a tag. | January 1, 2022 |
- Type | Timestamp | Date and time of the request: `YYYY-MM-DD 00:00:00` | January 1, 2022 |
- Host | Type | The industry from which the event originates. One of `business`, `isp`, `hosting`, `education`, `null` | January 1, 2022 |
- Country | User agent tool | The application a user used to pull an image (for example, `docker` or `containerd`). | January 1, 2022 |
- User agent tool | User agent version | The version of the application used to pull an image. | January 1, 2022 |
- User agent version | Domain | Request origin domain, see [Privacy][5]. | October 11, 2022 |
[1]: #action-classification-rules
[2]: /registry/spec/api/
[3]: /docker-hub/orgs/
[4]: /docker-hub/repos/
[5]: #privacy
### Summary data ### Summary data
The summary data format contains the following data points for each namespace, The summary data format contains the following data points for each namespace,
repository, and reference (tag or digest), for the selected time span. repository, and reference (tag or digest), for the selected time span.
- Unique IP addresses | Data point | Value | Description | Available from |
- Pulls by tag | ----------------- | ------- | ------------------------------------------------- | --------------- |
- Pulls by digest | Unique IP address | String | Number of unique IP addresses, see [Privacy][3]. | January 1, 2022 |
- Version checks | Pull by tag | Integer | GET request, by digest or by tag. | January 1, 2022 |
| Pull by digest | Integer | GET or HEAD request by digest, or HEAD by digest. | January 1, 2022 |
| Version check | Integer | HEAD by tag, not followed by a GET | January 1, 2022 |
### Data definitions [3]: #privacy
| Data point | Definition | ### Action classification rules
| :----------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Action | An action represents the multiple request events associated with a `docker pull`. We have applied rules to these events so that the data is more meaningful in analyzing user behavior and intent. An action can be filtered into three distinct categories: version check, pull by tag, and pull by digest. Each action is represented as a single row in the raw data. For more information, see [Action classification rules](#action-classification-rules). |
| Version check | This is a filter on the action data point. It is a speculation of user intent. Includes: HEAD by tag not followed by a GET (from the same IP address within a 5-second window). Excludes: HEAD by digest |
| Pull by tag | This is a filter on the action data point. It is a speculation of user intent. Includes: GET (by digest or by tag). If the GET is immediately preceded by a HEAD by tag (from the same IP address within a 5-second window), then the GET and HEAD together are counted as a single Pull by Tag. If the GET by tag is immediately followed by another GET (from the same IP address within a 5-second window, but a different digest), then the two GETs are counted as a single Pull by Tag. |
| Pull by digest | This is a filter on the action data point. It is a speculation of user intent. Includes: GET by digest. If the GET is immediately preceded by a HEAD by digest (from the same IP address within a 5-second window), then the GET and HEAD together are counted as a single pull by digest. If the GET is immediately followed by another GET (from the same IP address within a 5-second window, but a different digest), then the two GETs together are counted as a single pull by digest. Includes: HEAD by digest, not followed by a GET. |
| Type | The industry from which the event originates. Industry types include `business`, `isp` (internet service provider), `hosting`, `education`, and `null` in cases where the industry could not be identified. |
| Host | The cloud service provider used in an event. |
| Reference | The digest or tag that was referenced in the action. |
| Digest | The image version digest. |
| Tag | The tag name. Only available if the pull referred to a tag, not available if the pull referred to a digest. |
| Country | The country from which the request originated. |
| Timestamp | Date and time of an event in the following schema: YYYY-MM-DD 00:00:00 |
| Action day | The date portion of the timestamp: YYYY-MM-DD |
| Namespace | The Docker organization that a repository is a part of. |
| Repository | The repository that an image belongs to. |
| Reference | The tag or digest of any given image. |
| HTTP method | The HTTP method used in a request by the client. More information on Docker Registry HTTP API protocols can be found [here](/registry/spec/api/){: target="_blank" rel="noopener" class="_"}. |
| User agent tool | The application a user used to pull an image (for example, `docker` or `containerd`). Extracted from the UA string. |
| User agent version | The version of the application used to pull an image. |
| Unique IP address | As part of our privacy-preserving policy, Docker only shares the count of distinct unique IP addresses that request an image. |
## Action classification rules An action represents the multiple request events associated with a
`docker pull`. Pulls are grouped by category to make the data more meaningful
for understanding user behavior and intent. The categories are:
Automated systems frequently check for new versions of your images. The insights - Version check
and analytics metrics show the number of pulls that were triggered by users, and - Pull by tag
pulls by automated systems such as CI/CD tools, respectively. Automated "version - Pull by digest
checks" and real image downloads are differentiated by inspecting the order and
timing of image pulls coming from the same IP address. Being able to distinguish
between different types of image pulls grants you more insight into your users'
behavior. You can inspect the rules for determining intent behind pulls in the
[Action classification rules](#action-classification-rules) section on this
page.
To provide feedback or ask questions about these rules, Automated systems frequently check for new versions of your images. Being able
to distinguish between "version checks" in CI versus actual image pulls by a
user grants you more insight into your users' behavior.
The following table describes the rules applied for determining intent behind
pulls. To provide feedback or ask questions about these rules,
[fill out the Google Form](https://forms.gle/nb7beTUQz9wzXy1b6){: [fill out the Google Form](https://forms.gle/nb7beTUQz9wzXy1b6){:
target="_blank" rel="noopener" class="_"}. target="_blank" rel="noopener" class="_"}.
@ -141,3 +131,39 @@ target="_blank" rel="noopener" class="_"}.
| GET | digest | GET by different digest | Pull by digest | Image is multi-arch | The second GET by digest must be different from the first | | GET | digest | GET by different digest | Pull by digest | Image is multi-arch | The second GET by digest must be different from the first |
| HEAD | digest | GET by same digest | Pull by digest | Image is single arch and/or image is multi-arch but some part of the image already exists on the local machine | | HEAD | digest | GET by same digest | Pull by digest | Image is single arch and/or image is multi-arch but some part of the image already exists on the local machine |
| HEAD | digest | GET by same digest, then a second GET by different digest | Pull by Digest | Image is multi-arch | | HEAD | digest | GET by same digest, then a second GET by different digest | Pull by Digest | Image is multi-arch |
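The classification rules can be approximated in code. The following is a minimal sketch, not Docker's actual implementation: it assumes the events come from a single IP address, are already sorted by time, and that the 5-second window from the earlier data definitions applies. The `Event` type, its field names, and the `classify` function are hypothetical, and the sketch ignores the "different digest" condition on follow-up GETs.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5  # window taken from the earlier data definitions


@dataclass
class Event:
    ts: float      # request time in seconds (hypothetical field)
    method: str    # "HEAD" or "GET"
    ref_type: str  # "tag" or "digest"


def classify(events):
    """Classify time-ordered events from one IP address into actions."""
    actions = []
    i = 0
    while i < len(events):
        first = events[i]
        j = i + 1
        followed_by_get = False
        # Fold GETs that arrive within the window into the same action.
        while (j < len(events) and events[j].method == "GET"
               and events[j].ts - first.ts <= WINDOW_SECONDS):
            followed_by_get = True
            j += 1
        if first.ref_type == "digest":
            # HEAD or GET by digest, with or without a follow-up GET.
            actions.append("pull_by_digest")
        elif first.method == "GET" or followed_by_get:
            # GET by tag, or HEAD by tag followed by a GET.
            actions.append("pull_by_tag")
        else:
            # HEAD by tag that no GET follows within the window.
            actions.append("version_check")
        i = j
    return actions
```

The first request decides the category, and any GET that follows inside the window is treated as part of the same action rather than a new one. Two unrelated pulls from the same IP address within the window would be merged by this sketch, which is one of its simplifications.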
## Changes in data over time
The insights and analytics service is continuously improved to increase the
value it brings to publishers. Some changes might include adding new data
points, or improving existing data to make it more useful.
When there is a change in the dataset provided by the service, such a change
doesn't get retroactively applied. As new data points get added, they're
available from the point of introduction and going forward.
Refer to the tables in the [Data points](#data-points) section to see from which
date a given data point is available.
## Privacy
This section contains information about privacy-protecting measures that ensure
consumers of content on Docker Hub remain completely anonymous.
> **Important**
>
> Docker never shares any Personally Identifiable Information (PII) as part of
> analytics data.
{: .important }
The summary dataset includes a unique IP address count. This data point only
includes the number of distinct IP addresses that request an image.
Individual IP addresses are never shared.
The raw dataset includes user IP domains as a data point. That's the domain name
associated with the IP address used to pull an image. If the IP type is
`business`, the domain represents the company or organization associated with
that IP address (for example, `docker.com`). For any other IP type that's not
`business`, the domain represents the internet service provider or hosting
provider used to make the request. On average, only about 30% of all pulls
are classified as the `business` IP type (this varies between publishers and images).
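
A raw data export can be summarized with a short script. The following is a minimal sketch, assuming the CSV headers are named `action`, `type`, and `domain`. These header names, the sample rows, and the `summarize_raw_export` function are hypothetical, so check them against the header row of an actual export. It also assumes that rows recorded before the Domain data point was introduced have an empty domain value.

```python
import csv
import io
from collections import Counter

# Assumed column headers; the real export may name them differently.
ACTION_COL = "action"
TYPE_COL = "type"
DOMAIN_COL = "domain"


def summarize_raw_export(csv_text):
    """Count rows per action and per domain, and compute the business share."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    actions = Counter(row[ACTION_COL] for row in rows)
    # Skip rows with no domain value (assumed for older rows).
    domains = Counter(row[DOMAIN_COL] for row in rows if row.get(DOMAIN_COL))
    business = sum(1 for row in rows if row[TYPE_COL] == "business")
    share = business / len(rows) if rows else 0.0
    return actions, domains, share


# Made-up sample data for illustration only.
SAMPLE = """action,type,domain
pull_by_tag,business,docker.com
pull_by_digest,isp,example-isp.net
version_check,hosting,
pull_by_tag,business,docker.com
"""

if __name__ == "__main__":
    actions, domains, share = summarize_raw_export(SAMPLE)
    print(actions.most_common())
    print(domains.most_common(1))
    print(f"business share: {share:.0%}")
```

Because each row in the raw format represents one action, counting rows per `action` value gives totals comparable to the summary format, and the share of `business` rows shows how much of the Domain data identifies a company rather than a network provider.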