Move current schemas from opentelemetry-java. (#7)

Bogdan Drutu 2019-05-13 13:11:42 -07:00 committed by GitHub
parent e6452f2df0
commit aa109fb659
8 changed files with 1076 additions and 0 deletions

@ -0,0 +1,36 @@
# Databases client spans
Calls to databases should be tracked as client spans.
Span `name` should be set to a low-cardinality value representing the statement
executed on the database. It may be a stored procedure name (without arguments), an SQL
statement without variable arguments, etc. When it's impossible to get any
meaningful representation of the span `name`, it can be populated using the same
value as `db.instance`.
**TODO: we might need to have separate specs for other database types like Redis
and such.**
Note, Redis, Cassandra, HBase and other storage systems may reuse the same
attribute names.
**TODO: Agree to use `type` instead of `component`?**
| Attribute name | Notes and examples |
|:---------------|:-------------------|
| `type` | Database driver name or database name (when known), e.g. `JDBI`, `jdbc`, `odbc`, `postgreSQL`. |
| `db.type` | Database type. For any SQL database, `"sql"`. For others, the lower-case database category, e.g. `"cassandra"`, `"hbase"`, or `"redis"`. |
| `db.instance` | Database instance name. E.g., in Java, if `jdbc.url` is `"jdbc:mysql://127.0.0.1:3306/customers"`, the instance name is `"customers"`. |
| `db.statement` | A database statement for the given database type. E.g., for `db.type="sql"`, `"SELECT * FROM wuser_table"`; for `db.type="redis"`, `"SET mykey 'WuValue'"`. |
| `db.user` | Username for accessing database. E.g., `"readonly_user"` or `"reporting_user"` |
For database client calls, peer information SHOULD be collected.
| Attribute name | Notes and examples |
|:----------------|:-------------------|
| `peer.address` | JDBC substring like `"mysql://prod-db:3306"` |
| `peer.hostname` | Remote hostname. E.g., `localhost` |
| `peer.ipv4` | Remote IPv4 address as a `.`-separated tuple. E.g., `"127.0.0.1"` |
| `peer.ipv6` | Remote IPv6 address as a string of colon-separated 4-char hex tuples. E.g., `"2001:0db8:85a3:0000:0000:8a2e:0370:7334"` |
| `peer.port` | Remote port. E.g., `80` (integer) |
| `peer.service` | Remote service name. Can be database friendly name or `db.instance` |
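As a rough illustration of how the attributes in the two tables above relate to a connection string, the following sketch derives them from a JDBC-style URL. The function name `dbClientAttributes`, the returned map type, and the hard-coded `db.type` value are assumptions made for this example; they are not part of the specification.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
	"strings"
)

// dbClientAttributes derives database and peer span attributes from a
// JDBC-style URL such as "jdbc:mysql://127.0.0.1:3306/customers".
// The name and the map-based return type are illustrative only.
func dbClientAttributes(jdbcURL string) (map[string]interface{}, error) {
	// Drop the "jdbc:" prefix so the remainder parses as a regular URL.
	u, err := url.Parse(strings.TrimPrefix(jdbcURL, "jdbc:"))
	if err != nil {
		return nil, fmt.Errorf("parsing %q: %w", jdbcURL, err)
	}
	attrs := map[string]interface{}{
		"db.type":      "sql", // assumption: the URL points at an SQL database
		"db.instance":  strings.TrimPrefix(u.Path, "/"),
		"peer.address": u.Scheme + "://" + u.Host,
	}
	// Report the host as an IP address when it parses as one,
	// otherwise as a hostname.
	host := u.Hostname()
	if ip := net.ParseIP(host); ip == nil {
		attrs["peer.hostname"] = host
	} else if ip.To4() != nil {
		attrs["peer.ipv4"] = host
	} else {
		attrs["peer.ipv6"] = host
	}
	if p := u.Port(); p != "" {
		port, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("parsing port %q: %w", p, err)
		}
		attrs["peer.port"] = port // integer, as the table requires
	}
	return attrs, nil
}
```

Calling `dbClientAttributes("jdbc:mysql://127.0.0.1:3306/customers")` yields `db.instance` set to `customers`, `peer.ipv4` set to `127.0.0.1`, and `peer.port` set to the integer `3306`.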

@ -0,0 +1,85 @@
# gRPC
This document explains tracing of gRPC requests with OpenCensus.
## Spans
Implementations MUST create a span, when the gRPC call starts, for the client
and a span for the server.
Span `name` MUST be full gRPC method name formatted as:
```
$package.$service/$method
```
Examples of span names:
- `grpc.test.EchoService/Echo`
Outgoing requests should be a span `kind` of `CLIENT` and incoming requests
should be a span `kind` of `SERVER`.
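The naming rule above can be sketched as follows. The sketch assumes the full method string has the form `/package.service/method`, which is the form grpc-go passes to interceptors; the `spanKind` type and both function names are hypothetical stand-ins for whatever the tracing library provides.

```go
package main

import "strings"

// spanKind is a hypothetical stand-in for a tracing library's span kind type.
type spanKind string

const (
	kindClient spanKind = "CLIENT"
	kindServer spanKind = "SERVER"
)

// grpcSpanName converts a full method string of the form
// "/package.service/method" into the span name "package.service/method".
func grpcSpanName(fullMethod string) string {
	return strings.TrimPrefix(fullMethod, "/")
}

// grpcSpanKind returns CLIENT for outgoing requests and SERVER for incoming ones.
func grpcSpanKind(outgoing bool) spanKind {
	if outgoing {
		return kindClient
	}
	return kindServer
}
```

For example, `grpcSpanName("/grpc.test.EchoService/Echo")` returns `grpc.test.EchoService/Echo`.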
## Propagation
Propagation is how `SpanContext` is transmitted on the wire in a gRPC request.
**TODO: review and close on binary protocol and metadata name**
The propagation MUST inject a `SpanContext` as the gRPC metadata entry `grpc-trace-bin`
and MUST extract a `SpanContext` from the same gRPC metadata entry. The serialization
format is configurable. The default serialization format should be [W3C binary
trace context](https://w3c.github.io/trace-context-binary/).
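A minimal sketch of inject and extract is shown below. It assumes the 29-byte layout used by the OpenCensus binary format (a version byte, then field-tagged trace ID, span ID and trace options); since the binary protocol is still marked as a TODO above, treat the layout as an assumption. A plain `map[string][]string` stands in for gRPC metadata, and all type and function names are hypothetical.

```go
package main

import "errors"

// traceBinKey is the metadata key named in the text above.
const traceBinKey = "grpc-trace-bin"

// spanContext is a simplified, hypothetical stand-in for SpanContext.
type spanContext struct {
	TraceID      [16]byte
	SpanID       [8]byte
	TraceOptions byte
}

// encodeBinary serializes sc using the assumed 29-byte layout:
// version(1) | 0 | traceID(16) | 1 | spanID(8) | 2 | options(1).
func encodeBinary(sc spanContext) []byte {
	b := make([]byte, 0, 29)
	b = append(b, 0, 0) // version 0, then field 0 (trace ID)
	b = append(b, sc.TraceID[:]...)
	b = append(b, 1) // field 1 (span ID)
	b = append(b, sc.SpanID[:]...)
	b = append(b, 2, sc.TraceOptions) // field 2 (trace options)
	return b
}

// decodeBinary is the inverse of encodeBinary and rejects malformed input.
func decodeBinary(b []byte) (spanContext, error) {
	var sc spanContext
	if len(b) != 29 || b[0] != 0 || b[1] != 0 || b[18] != 1 || b[27] != 2 {
		return sc, errors.New("malformed binary span context")
	}
	copy(sc.TraceID[:], b[2:18])
	copy(sc.SpanID[:], b[19:27])
	sc.TraceOptions = b[28]
	return sc, nil
}

// inject stores the serialized span context under grpc-trace-bin.
func inject(md map[string][]string, sc spanContext) {
	md[traceBinKey] = []string{string(encodeBinary(sc))}
}

// extract reads the span context back; ok is false if it is absent or invalid.
func extract(md map[string][]string) (sc spanContext, ok bool) {
	vals := md[traceBinKey]
	if len(vals) == 0 {
		return spanContext{}, false
	}
	sc, err := decodeBinary([]byte(vals[0]))
	return sc, err == nil
}
```

A client would call `inject` before sending the request and a server would call `extract` on the incoming metadata, starting a new root span when `ok` is false.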
## Attributes
**TODO: `type` is not implemented today in existing integrations. Need to track it**
**TODO: should we include `host`, `uri` or those should be reported as `peer`?**
**TODO: agree that `component` from OpenTracing is being replaced with `type` as
a better name.**
| Attribute name | Description | Type |Example value |
|---------------------------|--------------------------------|--------|---------------------------|
| "type" | Type of the client/server span | string | `grpc` |
## Status
Implementations MUST set status which should be the same as the gRPC
client/server status. The mapping between gRPC canonical codes and OpenCensus
status codes can be found
[here](https://github.com/grpc/grpc-go/blob/master/codes/codes.go).
## Events
In the lifetime of a gRPC stream, the following events SHOULD be created:
- An event for each message sent/received on client and server spans.
[Message
event](../../contrib/src/main/java/opentelemetry/contrib/trace/MessageEvent.java)
should be used as a name of event.
```
-> [time],
"name" = "message",
"message.type" = "SENT",
"message.id" = id,
"message.compressed_size" = <compressed size in bytes>,
"message.uncompressed_size" = <uncompressed size in bytes>
```
```
-> [time],
"name" = "message",
"message.type" = "RECEIVED",
"message.id" = id,
"message.compressed_size" = <compressed size in bytes>,
"message.uncompressed_size" = <uncompressed size in bytes>
```
The `message.id` MUST be calculated using two different counters starting from `1`,
one for sent messages and one for received messages. This way we guarantee that
the values will be consistent between different implementations. In the case of
unary calls, only one sent and one received message will be recorded for both
client and server spans.
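The two-counter rule can be sketched like this. The `messageEvent` struct and the `messageCounters` type are hypothetical; a real implementation would attach the event to the span through its tracing library. Atomic increments are used here on the assumption that a stream may send and receive from different goroutines.

```go
package main

import "sync/atomic"

// messageEvent mirrors the fields of the events listed above (hypothetical type).
type messageEvent struct {
	Type             string // "SENT" or "RECEIVED"
	ID               int64
	CompressedSize   int64
	UncompressedSize int64
}

// messageCounters holds the two independent per-span counters.
// One instance is expected per span.
type messageCounters struct {
	sent     int64
	received int64
}

// nextSent returns the event for the next sent message; IDs start at 1.
func (c *messageCounters) nextSent(compressed, uncompressed int64) messageEvent {
	return messageEvent{
		Type:             "SENT",
		ID:               atomic.AddInt64(&c.sent, 1),
		CompressedSize:   compressed,
		UncompressedSize: uncompressed,
	}
}

// nextReceived returns the event for the next received message; IDs start at 1.
func (c *messageCounters) nextReceived(compressed, uncompressed int64) messageEvent {
	return messageEvent{
		Type:             "RECEIVED",
		ID:               atomic.AddInt64(&c.received, 1),
		CompressedSize:   compressed,
		UncompressedSize: uncompressed,
	}
}
```

For a unary call, each span records exactly one `SENT` event with ID 1 and one `RECEIVED` event with ID 1.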

@ -0,0 +1,118 @@
# HTTP Stats
Any particular library might provide only a subset of these measures/views/tags.
Check the language-specific documentation for the list of supported values.
There is no special support for multi-part HTTP requests and responses. These are just treated as a single request.
## Units
As always, units are encoded according to the case-sensitive abbreviations from the [Unified Code for Units of Measure](http://unitsofmeasure.org/ucum.html):
* Latencies are measured in float64 milliseconds, denoted "ms"
* Sizes are measured in bytes, denoted "By"
* Dimensionless values have unit "1"
Buckets for distributions in default views are as follows:
* Size in bytes: 0, 1024, 2048, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296
* Latency in ms: 0, 1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80, 100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000, 2000, 5000, 10000, 20000, 50000, 100000
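To illustrate how the bucket boundaries above partition recorded values, here is a small sketch. It assumes that bucket `i` covers the half-open range from `bounds[i-1]` (inclusive) to `bounds[i]` (exclusive), with an underflow bucket below the first bound and an overflow bucket above the last; that interval convention and the names used are assumptions of this example.

```go
package main

import "sort"

// Default distribution bounds copied from the lists above.
var (
	sizeBounds = []float64{0, 1024, 2048, 4096, 16384, 65536, 262144, 1048576,
		4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296}
	latencyBounds = []float64{0, 1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40,
		50, 65, 80, 100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000, 2000,
		5000, 10000, 20000, 50000, 100000}
)

// bucketIndex returns the index of the bucket that v falls into.
// Bucket 0 holds values below bounds[0]; bucket i holds values in
// [bounds[i-1], bounds[i]); the last bucket holds values >= the final bound.
func bucketIndex(bounds []float64, v float64) int {
	// Find the first bound strictly greater than v.
	return sort.Search(len(bounds), func(i int) bool { return bounds[i] > v })
}
```

Under this convention a 500-byte body lands in bucket 1 of `sizeBounds` (0 to 1024 bytes) and a 7 ms latency lands in bucket 7 of `latencyBounds` (6 to 8 ms).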
## Client
### Measures
Client stats are recorded for each individual HTTP request, including for each individual redirect (followed or not). All stats are recorded after request processing (usually after the response body has been fully read).
| Measure name | Unit | Description |
|---------------------------------------------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| opencensus.io/http/client/sent_bytes | By | Total bytes sent in request body (not including headers). This is uncompressed bytes. |
| opencensus.io/http/client/received_bytes | By | Total bytes received in response bodies (not including headers but including error responses with bodies). Should be measured from actual bytes received and read, not the value of the Content-Length header. This is uncompressed bytes. Responses with no body should record 0 for this value. |
| opencensus.io/http/client/roundtrip_latency | ms | Time between first byte of request headers sent to last byte of response received, or terminal error |
### Tags
All client measures should be tagged with the following.
| Tag name | Description |
|--------------------|----------------------------------------------------------------------------------------------------------|
| http_client_method | HTTP method, capitalized (e.g. GET, POST, PUT, DELETE, etc.) |
| http_client_path | URL path (not including query string) |
| http_client_status | HTTP status code as an integer (e.g. 200, 404, 500), or "error" if no response status line was received |
| http_client_host | Value of the request Host header |
`http_client_method`, `http_client_path`, `http_client_host` are set when an outgoing request
starts and are available in the context for the entire outgoing request processing.
`http_client_status` is set when an outgoing request finishes and is only available around the
stats recorded at the end of request processing.
`http_client_path` and `http_client_host` might have high cardinality and you should be careful about using these
in views if your metrics backend cannot tolerate high-cardinality labels.
### Default views
The following set of views are considered minimum required to monitor client side performance:
| View name | Measure | Aggregation | Tags |
|---------------------------------------------|---------------------------------------------|--------------|----------------------------------------|
| opencensus.io/http/client/sent_bytes | opencensus.io/http/client/sent_bytes | distribution | http_client_method, http_client_status |
| opencensus.io/http/client/received_bytes | opencensus.io/http/client/received_bytes | distribution | http_client_method, http_client_status |
| opencensus.io/http/client/roundtrip_latency | opencensus.io/http/client/roundtrip_latency | distribution | http_client_method, http_client_status |
| opencensus.io/http/client/completed_count | opencensus.io/http/client/roundtrip_latency | count | http_client_method, http_client_status |
## Server
Server measures are recorded at the end of request processing.
### Measures
Server stats are recorded for each individual HTTP request handled. They are recorded at the end of request handling.
| Measure name | Unit | Description |
|------------------------------------------|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| opencensus.io/http/server/received_bytes | By | Total bytes received in request body (not including headers). This is uncompressed bytes. |
| opencensus.io/http/server/sent_bytes | By | Total bytes sent in response bodies (not including headers but including error responses with bodies). Should be measured from actual bytes sent, not the value of the Content-Length header. This is uncompressed bytes. Responses with no body should record 0 for this value. |
| opencensus.io/http/server/server_latency | ms | Time between first byte of request headers read to last byte of response sent, or terminal error |
### Tags
All server metrics should be tagged with the following.
| Tag name | Description |
|--------------------|---------------------------------------------------------------------|
| http_server_method | HTTP method, capitalized (e.g. GET, POST, PUT, DELETE, etc.) |
| http_server_path | URL path (not including query string) |
| http_server_status | HTTP server status code returned, as an integer e.g. 200, 404, 500. |
| http_server_host | Value of the request Host header |
| http_server_route | Logical route of the handler of this request |
`http_server_method`, `http_server_path`, `http_server_host` are set when an incoming request
starts and are available in the context for the entire incoming request processing.
`http_server_status` is set when an incoming request finishes and is only available around the stats
recorded at the end of request processing.
`http_server_path` and `http_server_host` are set by the client: you should be careful about using these
in views if your metrics backend cannot tolerate high-cardinality labels.
`http_server_route` should always be a low cardinality string representing the logical route or handler of the
request. A reasonable interpretation of this would be the URL path pattern matched to handle the request,
or an explicitly specified function name. Defaults to the empty string if no other suitable value is
available.
### Default views
The following set of views are considered minimum required to monitor server side performance:
| View name | Measure | Aggregation | Tags |
|-------------------------------------------|------------------------------------------|--------------|-----------------------------------------------------------|
| opencensus.io/http/server/received_bytes | opencensus.io/http/server/received_bytes | distribution | http_server_method, http_server_route, http_server_status |
| opencensus.io/http/server/sent_bytes | opencensus.io/http/server/sent_bytes | distribution | http_server_method, http_server_route, http_server_status |
| opencensus.io/http/server/server_latency | opencensus.io/http/server/server_latency | distribution | http_server_method, http_server_route, http_server_status |
| opencensus.io/http/server/completed_count | opencensus.io/http/server/server_latency | count | http_server_method, http_server_route, http_server_status |
## FAQ
### Why was the path removed from the default views?
Path can have unbounded cardinality, which causes problems for time-series databases like Prometheus.
This is especially true of public-facing HTTP servers, where this becomes a DoS vector.

@ -0,0 +1,225 @@
# HTTP Trace
This document explains tracing of HTTP requests with OpenCensus.
## Spans
Implementations MUST create a span for outgoing requests at the client and a span for incoming
requests at the server.
Span name is formatted as:
* /$path for outgoing requests.
* /($path|$route) for incoming requests.
If route cannot be determined, path is used to name the span for incoming requests.
Port MUST be omitted if it is 80 or 443.
Examples of span names:
* /users
* /messages/[:id]
* /users/25f4c31d
Outgoing requests should be a span kind of CLIENT and
incoming requests should be a span kind of SERVER.
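A minimal sketch of the naming rules above follows. It assumes that, for incoming requests, a known route takes precedence over the raw path; the function names are hypothetical.

```go
package main

import "strings"

// withLeadingSlash makes sure a span name starts with "/".
func withLeadingSlash(s string) string {
	if strings.HasPrefix(s, "/") {
		return s
	}
	return "/" + s
}

// clientSpanName names the span for an outgoing request after its path.
func clientSpanName(path string) string {
	return withLeadingSlash(path)
}

// serverSpanName names the span for an incoming request after the matched
// route when one is known, and after the path otherwise.
func serverSpanName(path, route string) string {
	if route != "" {
		return withLeadingSlash(route)
	}
	return withLeadingSlash(path)
}
```

With these helpers, `serverSpanName("/messages/42", "/messages/[:id]")` returns `/messages/[:id]`, while `serverSpanName("/users/25f4c31d", "")` falls back to `/users/25f4c31d`.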
## Propagation
Propagation is how SpanContext is transmitted on the wire in an HTTP request.
Implementations MUST allow users to set their own propagation format and MUST provide an
implementation for [B3](https://github.com/openzipkin/b3-propagation/blob/master/README.md#http-encodings)
and [TraceContext](https://w3c.github.io/trace-context/) at least.
If user doesn't set any propagation methods explicitly, TraceContext is used.
> In a previous version of this spec, we recommended that B3 be the default. For backwards compatibility,
implementations may provide a way for users to "opt in" to the new default explicitly, to
avoid a silent change to defaults that could break existing deployments.
The propagation method SHOULD modify a request object to insert a SpanContext or SHOULD be able
to extract a SpanContext from a request object.
## Status
Implementations MUST set status if HTTP request or response is not successful (e.g. not 2xx). In
redirection case, if the client doesn't have autoredirection support, request should be
considered successful.
Set status code to UNKNOWN (2) if the reason cannot be inferred at the callsite or from the HTTP
status code.
Don't set the status message if the reason can be inferred at the callsite or from the HTTP
status code.
### Mapping from HTTP status codes to Trace status codes
| HTTP code | Trace status code |
|-----------------------|------------------------|
| 0...199 | 2 (UNKNOWN) |
| 200...399 | 0 (OK) |
| 400 Bad Request | 3 (INVALID_ARGUMENT) |
| 504 Gateway Timeout | 4 (DEADLINE_EXCEEDED) |
| 404 Not Found | 5 (NOT_FOUND) |
| 403 Forbidden | 7 (PERMISSION_DENIED) |
| 401 Unauthorized* | 16 (UNAUTHENTICATED) |
| 429 Too Many Requests | 8 (RESOURCE_EXHAUSTED) |
| 501 Not Implemented | 12 (UNIMPLEMENTED) |
| 503 Unavailable | 14 (UNAVAILABLE) |
Notes: 401 Unauthorized actually means unauthenticated according to RFC 7235, 3.1.
The Status message should be the Reason-Phrase (RFC 2616 6.1.1) from the
response status line (if available).
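The mapping table above translates directly into a lookup function. In this sketch, any HTTP code not listed in the table falls back to UNKNOWN, following the earlier rule for reasons that cannot be inferred; the constant and function names are illustrative.

```go
package main

// Trace status codes used by the mapping table above.
const (
	statusOK                int32 = 0
	statusUnknown           int32 = 2
	statusInvalidArgument   int32 = 3
	statusDeadlineExceeded  int32 = 4
	statusNotFound          int32 = 5
	statusPermissionDenied  int32 = 7
	statusResourceExhausted int32 = 8
	statusUnimplemented     int32 = 12
	statusUnavailable       int32 = 14
	statusUnauthenticated   int32 = 16
)

// traceStatusFromHTTP maps an HTTP status code to a trace status code.
func traceStatusFromHTTP(httpStatus int) int32 {
	switch {
	case httpStatus < 200:
		return statusUnknown
	case httpStatus < 400:
		return statusOK
	}
	switch httpStatus {
	case 400:
		return statusInvalidArgument
	case 401:
		return statusUnauthenticated
	case 403:
		return statusPermissionDenied
	case 404:
		return statusNotFound
	case 429:
		return statusResourceExhausted
	case 501:
		return statusUnimplemented
	case 503:
		return statusUnavailable
	case 504:
		return statusDeadlineExceeded
	}
	// Codes without an explicit row are reported as UNKNOWN.
	return statusUnknown
}
```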
### Client errors for client HTTP calls
There are a number of client errors that can occur when trying to access an HTTP endpoint. Here
are examples of mapping those to the OpenCensus status codes.
| Client error | Trace status code |
|------------------------------|-----------------------|
| DNS resolution failed | 2 (UNKNOWN) |
| Request cancelled by caller | 1 (CANCELLED) |
| URL cannot be parsed | 3 (INVALID_ARGUMENT) |
| Request timed out | 4 (DEADLINE_EXCEEDED) |
## Message events
In the lifetime of an incoming and outgoing request, the following message events SHOULD be created:
* A message event for the request body size if/when determined.
* A message event for the response size if/when determined.
Implementations SHOULD create message event when body size is determined.
```
-> [time], MessageEventTypeSent, UncompressedByteSize, CompressedByteSize
```
Implementations SHOULD create message event when response size is determined.
```
-> [time], MessageEventTypeRecv, UncompressedByteSize, CompressedByteSize
```
## Attributes
Implementations SHOULD set the following attributes on the client and server spans. For a server,
request represents the incoming request. For a client, request represents the outgoing request.
All attributes are optional, but collector should make the best effort to
collect those.
| Attribute name | Description | Type |Example value |
|---------------------------|-----------------------------|--------|---------------------------|
| "http.host" | Request URL host | string | `example.com:779` |
| "http.method" | Request URL method | string | `GET` |
| "http.path" | Request URL path. If empty - set to `/` | string | `/users/25f4c31d` |
| "http.route" | Matched request URL route | string | `/users/:userID` |
| "http.user_agent" | Request user-agent. Do not inject attribute if user-agent is empty. | string | `HTTPClient/1.2` |
| "http.status_code" | Response status code | int64 | `200` |
| "http.url" | Absolute request URL | string | `https://example.com:779/path/12314/?q=ddds#123` |
Exporters should always export the collected attributes. Exporters should map the collected
attributes to backend's known attributes/labels.
The following table summarizes how OpenCensus attributes map to the
known attributes/labels on supported tracing backends.
| OpenCensus attribute | Zipkin | Jaeger | Stackdriver Trace label |
|---------------------------|--------------------|--------------------|---------------------------|
| "http.host" | "http.host" | "http.host" | "/http/host" |
| "http.method" | "http.method" | "http.method" | "/http/method" |
| "http.path" | "http.path" | "http.path" | "/http/path" |
| "http.route" | "http.route" | "http.route" | "/http/route" |
| "http.user_agent" | "http.user_agent" | "http.user_agent" | "/http/user_agent" |
| "http.status_code" | "http.status_code" | "http.status_code" | "/http/status_code" |
| "http.url" | "http.url" | "http.url" | "/http/url" |
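Every Stackdriver Trace label in the table above follows the same pattern: a leading slash, with the dots of the attribute name replaced by slashes. The helper below merely encodes that observed pattern; it is not taken from any exporter's source, and its name is hypothetical.

```go
package main

import "strings"

// stackdriverLabel converts an attribute name such as "http.status_code"
// into the label form shown in the table, "/http/status_code".
func stackdriverLabel(attr string) string {
	return "/" + strings.ReplaceAll(attr, ".", "/")
}
```

Zipkin and Jaeger use the attribute names unchanged, so no conversion is needed for those backends.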
References:
- [Stackdriver Trace
label](https://cloud.google.com/trace/docs/reference/v1/rest/v1/projects.traces)
- [Jaeger/Open Tracing](https://github.com/opentracing/specification/blob/master/semantic_conventions.md)
- [Zipkin](https://github.com/openzipkin/zipkin-api/blob/master/thrift/zipkinCore.thrift)
## Test Cases
Test cases for outgoing HTTP calls are in the file
[http-out-test-cases.json](http-out-test-cases.json).
The file consists of a set of test cases. Each test case represents an outgoing HTTP
call, the response it receives, and the resulting span properties. It looks like
this:
``` json
{
"name": "Name is populated as a path",
"method": "GET",
"url": "http://{host}:{port}/path/to/resource/",
"headers": {
"User-Agent": "test-user-agent"
},
"responseCode": 200,
"spanName": "/path/to/resource/",
"spanStatus": "OK",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/path/to/resource/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "200",
"http.user_agent": "test-user-agent"
}
}
```
Where `name` is the name of the test case. The properties `method`, `url` and
the `headers` collection represent the outgoing call. The field `responseCode`
describes the response status code.
The rest of the properties describe the resulting span:
its name, kind, status and attributes.
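A test harness could load a single case into a struct like the one below. The struct and function names are hypothetical, the top-level layout of the JSON file is not shown here and so is not modeled, and treating `{host}` and `{port}` as placeholders to be substituted at run time is an assumption based on the sample.

```go
package main

import (
	"encoding/json"
	"strings"
)

// httpOutTestCase mirrors the fields of the sample test case above.
type httpOutTestCase struct {
	Name           string            `json:"name"`
	Method         string            `json:"method"`
	URL            string            `json:"url"`
	Headers        map[string]string `json:"headers"`
	ResponseCode   int               `json:"responseCode"`
	SpanName       string            `json:"spanName"`
	SpanStatus     string            `json:"spanStatus"`
	SpanKind       string            `json:"spanKind"`
	SpanAttributes map[string]string `json:"spanAttributes"`
}

// parseTestCase decodes one JSON test case.
func parseTestCase(data []byte) (httpOutTestCase, error) {
	var tc httpOutTestCase
	err := json.Unmarshal(data, &tc)
	return tc, err
}

// expandPlaceholders replaces {host} and {port} with the address of the
// local test server.
func expandPlaceholders(s, host, port string) string {
	return strings.NewReplacer("{host}", host, "{port}", port).Replace(s)
}
```

A runner would expand the placeholders in `url` and in the expected attribute values, issue the request against a stub server that answers with `responseCode`, and compare the recorded span with the `span*` fields.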
## Sampling
There are two ways to control the `Sampler` used:
* Controlling the global default `Sampler` via [TraceConfig](https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/TraceConfig.md).
* Pass a specific `Sampler` as an option to the HTTP plugin. Plugins should support setting
a sampler per HTTP request.
Example cases where per-request sampling is useful:
- Having different sampling policy per route
- Having different sampling policy per method
- Filtering out certain paths (e.g. health endpoints) to disable tracing
- Always sampling critical paths
- Sampling based on the custom request header or query parameter
In the following Go example, incoming and outgoing request objects can be
dynamically inspected to set a sampler.
For outgoing requests:
```go
type Transport struct {
// GetStartOptions allows to set start options per request.
GetStartOptions func(*http.Request) trace.StartOptions
// ...
}
```
For incoming requests:
```go
type Handler struct {
// GetStartOptions allows to set start options per request.
GetStartOptions func(*http.Request) trace.StartOptions
// ...
}
```

@ -0,0 +1,86 @@
# Standard Resources
This page lists the standard resource types in OpenCensus. For more details on how resources can
be combined see [this](Resource.md).
OpenCensus defines these fields.
* [Compute Unit](#compute-unit)
* [Container](#container)
* [Deployment Service](#deployment-service)
* [Kubernetes](#kubernetes)
* [Compute Instance](#compute-instance)
* [Host](#host)
* [Environment](#environment)
* [Cloud](#cloud)
* [Cluster](#cluster)
## TODOs
* Add logical compute units: Service, Task - instance running in a service.
* Add more compute units: Process, Lambda Function, AppEngine unit, etc.
* Add Device (mobile) and Web Browser.
* Decide if lower case strings only.
* Consider adding optional/required for each label and combination of labels (e.g. when supplying a
k8s resource all k8s labels may be required).
## Compute Unit
Resources defining a compute unit (e.g. Container, Process, Lambda Function).
### Container
**type:** `container`
**Description:** A container instance. This resource can be [merged](Resource.md#Merging) with a
deployment service resource, a compute instance resource, and an environment resource.
| Label | Description | Example |
|---|---|---|
| container.name | Container name. | `opencensus-autoconf` |
| container.image.name | Name of the image the container was built on. | `gcr.io/opencensus/operator` |
| container.image.tag | Container image tag. | `0.1` |
## Deployment Service
Resources defining a deployment service (e.g. Kubernetes).
### Kubernetes
**type:** `k8s`
**Description:** A Kubernetes resource. This resource can be [merged](Resource.md#Merging) with
a compute instance resource, and/or an environment resource.
| Label | Description | Example |
|---|---|---|
| k8s.cluster.name | The name of the cluster that the pod is running in. | `opencensus-cluster` |
| k8s.namespace.name | The name of the namespace that the pod is running in. | `default` |
| k8s.pod.name | The name of the pod. | `opencensus-pod-autoconf` |
## Compute Instance
Resources defining a computing instance (e.g. host).
### Host
**type:** `host`
**Description:** A host is defined as a general computing instance. This resource should be
[merged](Resource.md#Merging) with an environment resource.
| Label | Description | Example |
|---|---|---|
| host.hostname | Hostname of the host.<br/> It contains what the `hostname` command returns on the host machine. | `opencensus-test` |
| host.id | Unique host id.<br/> For Cloud this must be the instance_id assigned by the cloud provider | `opencensus-test` |
| host.name | Name of the host.<br/> It may contain what `hostname` returns on Unix systems, the fully qualified hostname, or a name specified by the user. | `opencensus-test` |
| host.type | Type of host.<br/> For Cloud this must be the machine type.| `n1-standard-1` |
## Environment
Resources defining a running environment (e.g. Cloud, Data Center).
### Cloud
**type:** `cloud`
**Description:** A cloud infrastructure (e.g. GCP, Azure, AWS).
| Label | Description | Example |
|---|---|---|
| cloud.provider | Name of the cloud provider.<br/> Example values are aws, azure, gcp. | `gcp` |
| cloud.account.id | The cloud account id used to identify different entities. | `opencensus` |
| cloud.region | A specific geographical location where different entities can run | `us-central1` |
| cloud.zone | Zones are a subset of the region connected through low-latency links.<br/> In AWS it is called availability-zone. | `us-central1-a` |

@ -0,0 +1,173 @@
# gRPC Stats
Any particular library might provide only a subset of these measures/views/tags.
Check the language-specific documentation for the list of supported values.
## Units
As always, units are encoded according to the case-sensitive abbreviations from the [Unified Code for Units of Measure](http://unitsofmeasure.org/ucum.html):
* Latencies are measured in float64 milliseconds, denoted "ms"
* Sizes are measured in bytes, denoted "By"
* Counts of messages per RPC have unit "1"
Buckets for distributions in default views are as follows:
* Size in bytes: 0, 1024, 2048, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864, 268435456, 1073741824, 4294967296
* Latency in ms: 0, 0.01, 0.05, 0.1, 0.3, 0.6, 0.8, 1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 30, 40, 50, 65, 80, 100, 130, 160, 200, 250, 300, 400, 500, 650, 800, 1000, 2000, 5000, 10000, 20000, 50000, 100000
* Counts (no unit): 0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536
## Terminology
* **RPC** single call against a gRPC service, either streaming or unary.
* **message** individual message in an RPC. Streaming RPCs can have multiple messages per RPC. Unary RPCs always have only a single message per RPC.
* **status** string (all caps), e.g. CANCELLED, DEADLINE_EXCEEDED. See: https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
## Client
### Measures
Client stats are recorded at the end of each outbound RPC.
| Measure name | Unit | Description |
|-------------------------------------------|------|-----------------------------------------------------------------------------------------------|
| grpc.io/client/sent_messages_per_rpc | 1 | Number of messages sent in the RPC (always 1 for non-streaming RPCs). |
| grpc.io/client/sent_bytes_per_rpc | By | Total bytes sent across all request messages per RPC. |
| grpc.io/client/received_messages_per_rpc | 1 | Number of response messages received per RPC (always 1 for non-streaming RPCs). |
| grpc.io/client/received_bytes_per_rpc | By | Total bytes received across all response messages per RPC. |
| grpc.io/client/roundtrip_latency | ms | Time between first byte of request sent to last byte of response received, or terminal error. |
| grpc.io/client/server_latency | ms | Propagated from the server and should have the same value as "grpc.io/server/server_latency". |
| grpc.io/client/started_rpcs | 1 | The total number of client RPCs ever opened, including those that have not completed. |
| grpc.io/client/sent_messages_per_method | 1 | Total messages sent per method. |
| grpc.io/client/received_messages_per_method | 1 | Total messages received per method. |
| grpc.io/client/sent_bytes_per_method | By | Total bytes sent per method, recorded real-time as bytes are sent. |
| grpc.io/client/received_bytes_per_method | By | Total bytes received per method, recorded real-time as bytes are received. |
### Tags
All client metrics should be tagged with the following.
| Tag name | Description |
|--------------------|------------------------------------------------------------------------------------------------------------------|
| grpc_client_method | Full gRPC method name, including package, service and method, e.g. google.bigtable.v2.Bigtable/CheckAndMutateRow |
| grpc_client_status | gRPC server status code received, e.g. OK, CANCELLED, DEADLINE_EXCEEDED |
`grpc_client_method` is set when an outgoing request starts and is available in all the recorded
metrics.
`grpc_client_status` is set when an outgoing request finishes and is only available around metrics
recorded at the end of the outgoing request.
Status codes should be stringified according to:
https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
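Stringification can be sketched as a table lookup over the seventeen gRPC status codes, using the all-caps names from the linked document. The variable and function names are illustrative, and returning `false` for out-of-range codes is a choice made for this sketch.

```go
package main

// grpcStatusNames lists the gRPC status names indexed by numeric code.
var grpcStatusNames = []string{
	"OK",                  // 0
	"CANCELLED",           // 1
	"UNKNOWN",             // 2
	"INVALID_ARGUMENT",    // 3
	"DEADLINE_EXCEEDED",   // 4
	"NOT_FOUND",           // 5
	"ALREADY_EXISTS",      // 6
	"PERMISSION_DENIED",   // 7
	"RESOURCE_EXHAUSTED",  // 8
	"FAILED_PRECONDITION", // 9
	"ABORTED",             // 10
	"OUT_OF_RANGE",        // 11
	"UNIMPLEMENTED",       // 12
	"INTERNAL",            // 13
	"UNAVAILABLE",         // 14
	"DATA_LOSS",           // 15
	"UNAUTHENTICATED",     // 16
}

// grpcStatusName returns the tag value for a numeric status code.
// ok is false when the code is outside the defined range.
func grpcStatusName(code int) (name string, ok bool) {
	if code < 0 || code >= len(grpcStatusNames) {
		return "", false
	}
	return grpcStatusNames[code], true
}
```

The returned string is what would be stored in `grpc_client_status` (and, on the server side, `grpc_server_status`).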
### Default views
The following set of views are considered minimum required to monitor client-side performance:
| View name | Measure | Aggregation | Tags |
|-------------------------------------------|-------------------------------------------|--------------|----------------------------------------|
| grpc.io/client/sent_bytes_per_rpc | grpc.io/client/sent_bytes_per_rpc | distribution | grpc_client_method |
| grpc.io/client/received_bytes_per_rpc | grpc.io/client/received_bytes_per_rpc | distribution | grpc_client_method |
| grpc.io/client/roundtrip_latency | grpc.io/client/roundtrip_latency | distribution | grpc_client_method |
| grpc.io/client/completed_rpcs | grpc.io/client/roundtrip_latency | count | grpc_client_method, grpc_client_status |
| grpc.io/client/started_rpcs | grpc.io/client/started_rpcs | count | grpc_client_method |
### Extra views
The following set of views are considered useful but not mandatory to monitor client side performance:
| View name | Measure | Aggregation | Tags |
|------------------------------------------|------------------------------------------|--------------|--------------------|
| grpc.io/client/sent_messages_per_rpc | grpc.io/client/sent_messages_per_rpc | distribution | grpc_client_method |
| grpc.io/client/received_messages_per_rpc | grpc.io/client/received_messages_per_rpc | distribution | grpc_client_method |
| grpc.io/client/server_latency | grpc.io/client/server_latency | distribution | grpc_client_method |
| grpc.io/client/sent_messages_per_method | grpc.io/client/sent_messages_per_method | count | grpc_client_method |
| grpc.io/client/received_messages_per_method | grpc.io/client/received_messages_per_method | count | grpc_client_method |
| grpc.io/client/sent_bytes_per_method | grpc.io/client/sent_bytes_per_method | sum | grpc_client_method |
| grpc.io/client/received_bytes_per_method | grpc.io/client/received_bytes_per_method | sum | grpc_client_method |
## Server
Server stats are recorded at the end of processing each RPC.
| Measure name | Unit | Description |
|-------------------------------------------|------|-----------------------------------------------------------------------------------------------|
| grpc.io/server/received_messages_per_rpc | 1 | Number of messages received in each RPC. Has value 1 for non-streaming RPCs. |
| grpc.io/server/received_bytes_per_rpc | By | Total bytes received across all messages per RPC. |
| grpc.io/server/sent_messages_per_rpc | 1 | Number of messages sent in each RPC. Has value 1 for non-streaming RPCs. |
| grpc.io/server/sent_bytes_per_rpc | By | Total bytes sent across all response messages per RPC. |
| grpc.io/server/server_latency | ms | Time between first byte of request received to last byte of response sent, or terminal error. |
| grpc.io/server/started_rpcs | 1 | The total number of server RPCs ever opened, including those that have not completed. |
| grpc.io/server/sent_messages_per_method | 1 | Total messages sent per method. |
| grpc.io/server/received_messages_per_method | 1 | Total messages received per method. |
| grpc.io/server/sent_bytes_per_method | By | Total bytes sent per method, recorded real-time as bytes are sent. |
| grpc.io/server/received_bytes_per_method | By | Total bytes received per method, recorded real-time as bytes are received. |
### Tags
All server metrics should be tagged with the following.
| Tag name | Description |
|--------------------|----------------------------------------------------------------------------------------------------------------|
| grpc_server_method | Full gRPC method name, including package, service and method, e.g. com.exampleapi.v4.BookshelfService/Checkout |
| grpc_server_status | gRPC server status code returned, e.g. OK, CANCELLED, DEADLINE_EXCEEDED |
`grpc_server_method` is set when an incoming request starts and is available in the context for
the entire RPC call handling.
`grpc_server_status` is set when an incoming request finishes and is only available around metrics
recorded at the end of the incoming request.
Status codes should be stringified according to:
https://github.com/grpc/grpc/blob/master/doc/statuscodes.md
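
As a rough illustration of the stringification rule above, the following Python sketch maps numeric gRPC status codes to the names used as `grpc_server_status` tag values. The `GrpcStatus` enum and `stringify_status` helper are hypothetical names introduced here, and falling back to `UNKNOWN` for unrecognized codes is an assumption, not something this document specifies.

```python
from enum import IntEnum


class GrpcStatus(IntEnum):
    """Numeric gRPC status codes and their canonical names."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


def stringify_status(code: int) -> str:
    """Return the tag value for a numeric status code."""
    try:
        return GrpcStatus(code).name
    except ValueError:
        # Assumption: codes outside the known range are reported as UNKNOWN.
        return GrpcStatus.UNKNOWN.name
```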
### Default views
The following views are considered the minimum required to monitor server-side performance:
| View name | Measure | Aggregation | Tags |
|-------------------------------------------|---------------------------------------|--------------|----------------------------------------|
| grpc.io/server/received_bytes_per_rpc | grpc.io/server/received_bytes_per_rpc | distribution | grpc_server_method |
| grpc.io/server/sent_bytes_per_rpc | grpc.io/server/sent_bytes_per_rpc | distribution | grpc_server_method |
| grpc.io/server/server_latency | grpc.io/server/server_latency | distribution | grpc_server_method |
| grpc.io/server/completed_rpcs | grpc.io/server/server_latency | count | grpc_server_method, grpc_server_status |
| grpc.io/server/started_rpcs | grpc.io/server/started_rpcs | count | grpc_server_method |
### Extra views
The following views are considered useful, but not mandatory, for monitoring server-side performance:
| View name | Measure | Aggregation | Tag |
|------------------------------------------|------------------------------------------|--------------|--------------------|
| grpc.io/server/received_messages_per_rpc | grpc.io/server/received_messages_per_rpc | distribution | grpc_server_method |
| grpc.io/server/sent_messages_per_rpc | grpc.io/server/sent_messages_per_rpc | distribution | grpc_server_method |
| grpc.io/server/sent_messages_per_method | grpc.io/server/sent_messages_per_method | count | grpc_server_method |
| grpc.io/server/received_messages_per_method | grpc.io/server/received_messages_per_method | count | grpc_server_method |
| grpc.io/server/sent_bytes_per_method | grpc.io/server/sent_bytes_per_method | sum | grpc_server_method |
| grpc.io/server/received_bytes_per_method | grpc.io/server/received_bytes_per_method | sum | grpc_server_method |
## FAQ
### Why are the tag names for the server and client method different?
This way users can configure views that correlate incoming with outgoing requests. An example view:
| View name | Measure | Aggregation | Tag |
|-----------------------------------------|----------------------------------|--------------|----------------------------------------|
| grpc.io/client/latency_by_server_method | grpc.io/client/roundtrip_latency | distribution | grpc_client_method, grpc_server_method |
### How is the server latency on the client recorded (grpc.io/client/server_latency)?
This is TBD. Eventually, a designated gRPC metadata key will be specified for this purpose.
### Why no error counts?
Error counts can be computed on your metrics backend by totalling the different per-status values.
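
A minimal sketch of that computation, assuming the backend exposes the `completed_rpcs` view as a mapping from status string to count. The `error_count` helper and the sample numbers are hypothetical.

```python
from typing import Mapping


def error_count(completed_rpcs_by_status: Mapping[str, int]) -> int:
    """Total the per-status counts for every status other than OK."""
    return sum(
        count
        for status, count in completed_rpcs_by_status.items()
        if status != "OK"
    )


# Hypothetical per-status values for one method.
sample = {"OK": 90, "CANCELLED": 3, "DEADLINE_EXCEEDED": 7}
errors = error_count(sample)
```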
### Why are ".../completed_rpcs" views defined over latency measures?
They can be defined over any measure recorded once per RPC (since it's just a count aggregation over the measure).
It would be unnecessary to use a separate "count" measure.
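
To see why no separate measure is needed, consider this sketch. It assumes latency is recorded exactly once per finished RPC, as a hypothetical `(method, status, latency_ms)` tuple; a count aggregation then simply ignores the recorded value.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record shape: (grpc_server_method, grpc_server_status, latency_ms).
LatencyRecord = Tuple[str, str, float]


def completed_rpcs(records: Iterable[LatencyRecord]) -> Counter:
    """Count aggregation over the latency measure, grouped by method and status."""
    counts: Counter = Counter()
    for method, status, _latency_ms in records:
        # The latency value itself is not used: one record means one completed RPC.
        counts[(method, status)] += 1
    return counts


records = [
    ("pkg.Service/Get", "OK", 12.5),
    ("pkg.Service/Get", "OK", 8.0),
    ("pkg.Service/Get", "DEADLINE_EXCEEDED", 1000.0),
]
result = completed_rpcs(records)
```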
### Why are "*_per_method" views not default?
These views are useful for real-time reporting for streaming RPCs. However, for unary calls
they are not particularly useful, and data volume for these views could be huge compared to
default views. Only enable these views if you are using streaming RPCs and want real-time
metrics.


@ -0,0 +1,274 @@
[
{
"name": "Successful GET call to https://example.com",
"method": "GET",
"url": "https://example.com/",
"spanName": "/",
"spanStatus": "OK",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "example.com",
"http.status_code": "200",
"http.url": "https://example.com/"
}
},
{
"name": "Successful POST call to https://example.com",
"method": "POST",
"url": "https://example.com/",
"spanName": "/",
"spanStatus": "OK",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "POST",
"http.host": "example.com",
"http.status_code": "200",
"http.url": "https://example.com/"
}
},
{
"name": "Name is populated as a path",
"method": "GET",
"url": "http://{host}:{port}/path/to/resource/",
"responseCode": 200,
"spanName": "/path/to/resource/",
"spanStatus": "OK",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/path/to/resource/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "200",
"http.url": "http://{host}:{port}/path/to/resource/"
}
},
{
"name": "Call that cannot resolve DNS will be reported as error span",
"method": "GET",
"url": "https://sdlfaldfjalkdfjlkajdflkajlsdjf.sdlkjafsdjfalfadslkf.com/",
"spanName": "/",
"spanStatus": "UNKNOWN",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "sdlfaldfjalkdfjlkajdflkajlsdjf.sdlkjafsdjfalfadslkf.com",
"http.url": "https://sdlfaldfjalkdfjlkajdflkajlsdjf.sdlkjafsdjfalfadslkf.com/"
}
},
{
"name": "Response code: 199. This test case cannot be implemented on some platforms because they do not allow returning this status code. It is kept for visibility, but it is effectively a fallback to the 200 test case",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 200,
"spanName": "/",
"spanStatus": "OK",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "200",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 200",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 200,
"spanName": "/",
"spanStatus": "OK",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "200",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 399",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 399,
"spanName": "/",
"spanStatus": "OK",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "399",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 400",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 400,
"spanName": "/",
"spanStatus": "INVALID_ARGUMENT",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "400",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 401",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 401,
"spanName": "/",
"spanStatus": "UNAUTHENTICATED",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "401",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 403",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 403,
"spanName": "/",
"spanStatus": "PERMISSION_DENIED",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "403",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 404",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 404,
"spanName": "/",
"spanStatus": "NOT_FOUND",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "404",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 429",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 429,
"spanName": "/",
"spanStatus": "RESOURCE_EXHAUSTED",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "429",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 501",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 501,
"spanName": "/",
"spanStatus": "UNIMPLEMENTED",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "501",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 503",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 503,
"spanName": "/",
"spanStatus": "UNAVAILABLE",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "503",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 504",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 504,
"spanName": "/",
"spanStatus": "DEADLINE_EXCEEDED",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "504",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "Response code: 600",
"method": "GET",
"url": "http://{host}:{port}/",
"responseCode": 600,
"spanName": "/",
"spanStatus": "UNKNOWN",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "600",
"http.url": "http://{host}:{port}/"
}
},
{
"name": "User agent attribute populated",
"method": "GET",
"url": "http://{host}:{port}/",
"headers": {
"User-Agent": "test-user-agent"
},
"responseCode": 200,
"spanName": "/",
"spanStatus": "OK",
"spanKind": "Client",
"spanAttributes": {
"http.path": "/",
"http.method": "GET",
"http.host": "{host}:{port}",
"http.status_code": "200",
"http.user_agent": "test-user-agent",
"http.url": "http://{host}:{port}/"
}
}
]


@ -0,0 +1,79 @@
# Semantic Conventions
The [OpenTracing Specification](./specification.md) describes the overarching language-neutral data model and API guidelines for OpenTracing. That data model includes the related concepts of **Span Tags** and **(structured) Log Fields**; though these terms are defined in the specification, there is no guidance there about standard Span tags or logging keys.
Those semantic conventions are described by this document. The document is divided into two sections: first, tables listing all standard Span tags and logging keys; then guidance about how to combine these to model certain important semantic concepts.
### Versioning
Changes to this file affect the OpenTracing specification version. Additions should bump the minor version, and backwards-incompatible changes (or perhaps very large additions) should bump the major version.
## Standard Span tags and log fields
### Span tags table
Span tags apply to **the entire Span**; as such, they apply to the entire time range of the Span, not to a particular moment with a particular timestamp. Events of that sort are best modelled as Span log fields (per the table in the next subsection of this document).
| Span tag name | Type | Notes and examples |
|:--------------|:-----|:-------------------|
| `component` | string | The software package, framework, library, or module that generated the associated Span. E.g., `"grpc"`, `"django"`, `"JDBI"`. |
| `http.method` | string | HTTP method of the request for the associated Span. E.g., `"GET"`, `"POST"` |
| `http.status_code` | integer | HTTP response status code for the associated Span. E.g., 200, 503, 404 |
| `http.url` | string | URL of the request being handled in this segment of the trace, in standard URI format. E.g., `"https://domain.net/path/to?resource=here"` |
| `message_bus.destination` | string | An address at which messages can be exchanged. E.g. A Kafka record has an associated `"topic name"` that can be extracted by the instrumented producer or consumer and stored using this tag. |
| `peer.address` | string | Remote "address", suitable for use in a networking client library. This may be a `"ip:port"`, a bare `"hostname"`, a FQDN or various connection strings |
| `peer.hostname` | string | Remote hostname. E.g., `"opentracing.io"`, `"internal.dns.name"` |
| `peer.ipv4` | string | Remote IPv4 address as a `.`-separated tuple. E.g., `"127.0.0.1"` |
| `peer.ipv6` | string | Remote IPv6 address as a string of colon-separated 4-char hex tuples. E.g., `"2001:0db8:85a3:0000:0000:8a2e:0370:7334"` |
| `peer.port` | integer | Remote port. E.g., `80` |
| `peer.service` | string | Remote service name (for some unspecified definition of `"service"`). E.g., `"elasticsearch"`, `"a_custom_microservice"`, `"memcache"` |
| `sampling.priority` | integer | If greater than 0, a hint to the Tracer to do its best to capture the trace. If 0, a hint to the Tracer not to capture the trace. If absent, the Tracer should use its default sampling mechanism. |
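
The `sampling.priority` rule in the table above could be interpreted by a Tracer roughly as follows. This is only a sketch: `should_capture` is a hypothetical helper, the tag is a hint rather than a guarantee, and treating negative values like 0 is an assumption because the table does not cover them.

```python
from typing import Any, Mapping


def should_capture(tags: Mapping[str, Any], default_decision: bool) -> bool:
    """Interpret the sampling.priority hint for a Span's tags."""
    priority = tags.get("sampling.priority")
    if priority is None:
        # Tag absent: fall back to the Tracer's default sampling mechanism.
        return default_decision
    # Greater than 0 asks for capture; 0 asks for no capture.
    # Assumption: negative values are treated the same as 0.
    return priority > 0
```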
### Log fields table
Every Span log has a specific timestamp (which must fall between the start and finish timestamps of the Span, inclusive) and one or more **fields**. What follows are the standard fields.
| Span log field name | Type | Notes and examples |
|:--------------------|:--------|:-------------------|
| `error.kind` | string | The type or "kind" of an error (only for `event="error"` logs). E.g., `"Exception"`, `"OSError"` |
| `error.object` | object | For languages that support such a thing (e.g., Java, Python), the actual Throwable/Exception/Error object instance itself. E.g., A `java.lang.UnsupportedOperationException` instance, a python `exceptions.NameError` instance |
| `event` | string | A stable identifier for some notable moment in the lifetime of a Span. For instance, a mutex lock acquisition or release or the sorts of lifetime events in a browser page load described in the [Performance.timing](https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming) specification. E.g., from [Zipkin](https://zipkin.io/pages/instrumenting.html#core-data-structures), `"cs"`, `"sr"`, `"ss"`, or `"cr"`. Or, more generally, `"initialized"` or `"timed out"`. For errors, `"error"` |
| `message` | string | A concise, human-readable, one-line message explaining the event. E.g., `"Could not connect to backend"`, `"Cache invalidation succeeded"` |
| `stack` | string | A stack trace in platform-conventional format; may or may not pertain to an error. E.g., `"File \"example.py\", line 7, in \<module\>\ncaller()\nFile \"example.py\", line 5, in caller\ncallee()\nFile \"example.py\", line 2, in callee\nraise Exception(\"Yikes\")\n"` |
## Modelling special circumstances
### RPCs
The following Span tags combine to model RPCs:
- `span.kind`: either `"client"` or `"server"`. It is important to provide this tag **at Span start time**, as it may affect internal ID generation.
- `peer.address`, `peer.hostname`, `peer.ipv4`, `peer.ipv6`, `peer.port`, `peer.service`: optional tags that describe the RPC peer (often in ways it cannot assess internally)
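
As an illustration, the tags for one side of an RPC might be assembled like this before the Span is started. The `rpc_span_tags` helper is hypothetical and covers only a few of the optional `peer.*` tags.

```python
from typing import Any, Dict, Optional


def rpc_span_tags(
    kind: str,
    peer_service: Optional[str] = None,
    peer_hostname: Optional[str] = None,
    peer_port: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the tag set for an RPC Span, omitting peer tags that are unknown."""
    if kind not in ("client", "server"):
        raise ValueError("span.kind for RPCs must be 'client' or 'server'")
    tags: Dict[str, Any] = {"span.kind": kind}
    optional = {
        "peer.service": peer_service,
        "peer.hostname": peer_hostname,
        "peer.port": peer_port,
    }
    # Only include the optional peer tags that were actually provided.
    tags.update({key: value for key, value in optional.items() if value is not None})
    return tags


client_tags = rpc_span_tags("client", peer_service="elasticsearch", peer_port=9200)
```

Because `span.kind` should be provided at Span start time, a tag set like this would be passed when the Span is created rather than added afterwards.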
### Message Bus
A message bus is asynchronous, and therefore the relationship type used to link a Consumer Span and a Producer Span would be **Follows From** (see [References between Spans](./specification.md#references-between-spans) for more information on relationship types).
The following Span tags combine to model message bus based communications:
- `message_bus.destination`: as described in the table above
- `span.kind`: either `"producer"` or `"consumer"`. It is important to provide this tag **at Span start time**, as it may affect internal ID generation.
- `peer.address`, `peer.hostname`, `peer.ipv4`, `peer.ipv6`, `peer.port`, `peer.service`: optional tags that describe the message bus broker (often in ways it cannot assess internally)
### Captured errors
Errors may be described by OpenTracing in different ways, largely depending on the language. Some of these descriptive fields are specific to errors; others are not (e.g., the `event` or `message` fields).
For languages where an error object encapsulates a stack trace and type information, log the following fields:
- event=`"error"`
- error.object=`<error object instance>`
For other languages, or when the above is not feasible:
- event=`"error"`
- message=`"..."`
- stack=`"..."` (optional)
- error.kind=`"..."` (optional)
This scheme allows Tracer implementations to extract what information they need from the actual error object when it's available.
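
The two variants can be sketched in Python as below. The `error_log_fields` helper and its `has_error_objects` flag are hypothetical; the flag stands in for the choice between a language whose error objects carry stack and type information and one where they do not.

```python
import traceback
from typing import Any, Dict


def error_log_fields(exc: BaseException, has_error_objects: bool = True) -> Dict[str, Any]:
    """Build the Span log fields that describe a captured error."""
    if has_error_objects:
        # The error object encapsulates the stack trace and type information.
        return {"event": "error", "error.object": exc}
    # Fallback: spell out the details as plain fields.
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return {
        "event": "error",
        "message": str(exc),
        "stack": stack,  # optional
        "error.kind": type(exc).__name__,  # optional
    }
```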