Observability concept page update

This commit is contained in:
Ori Zohar 2021-02-15 15:58:03 -08:00
commit e23648842b
8 changed files with 23 additions and 42 deletions

View File

@ -4,60 +4,41 @@ title: "Observability"
linkTitle: "Observability"
weight: 500
description: >
How to monitor applications through tracing, metrics, logs and health
Monitor applications through tracing, metrics, logs and health
---
Observability is a term from control theory. Observability means you can answer questions about what's happening on the inside of a system by observing the outside of the system, without having to ship new code to answer new questions. Observability is critical in production environments and services to debug, operate and monitor Dapr system services, components and user applications.
When building an applications, understanding how the system is behaving is an important part of operating it - this includes having the ability to observe the internal calls of an application, gauging its performance and becoming aware of problems as soon as they occur. This is challenging for any system but even more so for a distributed system comprised of multiple microservices where a flow, made of several calls, may start in one microservices but continue in another. Observability is critical in production environments but also useful during development to understand bottlenecks, improve performance and perform basic debugging across the span of microservices.
The observability capabilities enable users to monitor the Dapr system services, their interaction with user applications and understand how these monitored services behave. The observability capabilities are divided into the following areas;
While some data points about an application can be gathered from the underlying infrastructure (e.g. memory consumption, CPU usage), other meaningful information must be collected from an "application aware" layer - one that can show how an important series of calls is executed across microservices. This usually means a developer must add some code to instrument an application for this purpose. Often, instrumentation code is simply meant to send collected data such as traces and metrics to an external monitoring tool or service that can help store, visualize and analyze all this information.
## Distributed tracing
Having to maintain this code, which is not part of the core logic of the application, is another burden on the developer, sometimes requiring understanding monitoring tools APIs, using additional SDKs etc. This instrumentation may also add to the portability challenges of an application which may require different instrumentation depending on where the application is deployed. For example, different cloud providers offer different monitoring solutions and an on-prem deployment might require an on-prem solution.
[Distributed tracing]({{<ref "tracing.md">}}) is used to profile and monitor Dapr system services and user apps. Distributed tracing helps pinpoint where failures occur and what causes poor performance. Distributed tracing is particularly well-suited to debugging and monitoring distributed software architectures, such as microservices.
## Observability for your application with Dapr
When building an application which is leveraging Dapr building blocks to perform service-to-service calls and pub/sub messaging, Dapr offers an advantage in respect to [distributed tracing]({{<ref tracing>}}) because this inter-service communication flows through the Dapr sidecar, the sidecar is in a unique position to offload the burden of application level instrumentation.
You can use distributed tracing to help debug and optimize application code. Distributed tracing contains trace spans between the Dapr runtime, Dapr system services, and user apps across process, nodes, network, and security boundaries. It provides a detailed understanding of service invocations (call flows) and service dependencies.
### Distributed tracing
Dapr can be [configured to emit tracing data]({{<ref setup-tracing.md>}}), and because Dapr does so using widely adopted protocols such as the [Zipkin](https://zipkin.io) protocol, it can be easily integrated with multiple [monitoring backends]({{<ref supported-tracing-backends>}}).
Dapr uses [W3C tracing context for distributed tracing]({{<ref w3c-tracing>}})
<img src="/images/observability-tracing.png" width=1000 alt="Distributed tracing with Dapr">
It is generally recommended to run Dapr in production with tracing.
### OpenTelemetry collector
Dapr can also be configured to work with the [OpenTelemetry Collector]({{<ref open-telemetry-collector>}}) which offers even more compatibility with external monitoring tools.
### Open Telemetry
<img src="/images/observability-opentelemetry-collector.png" width=1000 alt="Distributed tracing via OpenTelemetry collector">
Dapr integrates with [OpenTelemetry](https://opentelemetry.io/) for tracing, metrics and logs. With OpenTelemetry, you can configure various exporters for tracing and metrics based on your environment, whether it is running in the cloud or on-premises.
### Tracing context
Dapr uses [W3C tracing]({{<ref w3c-tracing>}}) specification for tracing context and can generate and propagate the context header itself or propagate user provided context headers.
#### Next steps
## Observability for the Dapr sidecar and system services
As for other parts of your system, you will want to be able to observe Dapr itself and collect metrics and logs emitted by the Dapr sidecar that runs along each microservice as well as the Dapr related services in your environment such as the control plane services that are deployed for a Dapr enabled Kubernetes cluster.
- [How-To: Set up Zipkin]({{< ref zipkin.md >}})
- [How-To: Set up Application Insights with Open Telemetry Collector]({{< ref open-telemetry-collector.md >}})
<img src="/images/observability-sidecar.png" width=1000 alt="Dapr sidecar metrics, logs and health checks">
## Metrics
### Logging
Dapr generates [logs]({{<ref "logs.md">}}) to provide visibility into sidecar operation and to help users identify issues and perform debugging. Log events contain warning, error, info, and debug messages produced by Dapr system services. Dapr can also be configured to send logs to collectors such as [Fluentd]({{< ref fluentd.md >}}) and [Azure Monitor]({{< ref azure-monitor.md >}}) so they can be easily searched, analyzed and provide insights.
[Metrics]({{<ref "metrics.md">}}) are the series of measured values and counts that are collected and stored over time. Dapr metrics provide monitoring and understanding of the behavior of Dapr system services and user apps.
### Metrics
Metrics are the series of measured values and counts that are collected and stored over time. [Dapr metrics]({{<ref "metrics">}}) provide monitoring capabilities to understand the behavior of the Dapr sidecar and system services. For example, the metrics between a Dapr sidecar and the user application show call latency, traffic failures, error rates of requests etc. Dapr [system services metrics](https://github.com/dapr/dapr/blob/master/docs/development/dapr-metrics.md) show sidecar injection failures, health of the system services including CPU usage, number of actor placements made etc.
For example, the service metrics between Dapr sidecars and user apps show call latency, traffic failures, error rates of requests etc.
Dapr system services metrics show side car injection failures, health of the system services including CPU usage, number of actor placement made etc.
#### Next steps
- [How-To: Set up Prometheus and Grafana]({{< ref prometheus.md >}})
- [How-To: Set up Azure Monitor]({{< ref azure-monitor.md >}})
## Logs
[Logs]({{<ref "logs.md">}}) are records of events that occur and can be used to determine failures or another status.
Logs events contain warning, error, info, and debug messages produced by Dapr system services. Each log event includes metadata such as message type, hostname, component name, App ID, ip address, etc.
#### Next steps
- [How-To: Set up Fluentd, Elastic search and Kibana in Kubernetes]({{< ref fluentd.md >}})
- [How-To: Set up Azure Monitor]({{< ref azure-monitor.md >}})
## Health
Dapr provides a way for a hosting platform to determine its [Health]({{<ref "sidecar-health.md">}}) using an HTTP endpoint. With this endpoint, the Dapr process, or sidecar, can be probed to determine its readiness and liveness and action taken accordingly.
#### Next steps
- [Health API]({{< ref health_api.md >}})
### Health checks
The Dapr sidecar exposes an HTTP endpoint for [health checks]({{<ref sidecar-health.md>}}). With this API, user code or hosting environments can probe the Dapr sidecar to determine its status and identify issues with sidecar readiness.

Binary file not shown.

After

Width:  |  Height:  |  Size: 76 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 50 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 83 KiB