Application log observability docs (#1026)
* finish
* small fix

Signed-off-by: 楚岳 <wangyike.wyk@alibaba-inc.com>
@@ -0,0 +1,229 @@

---
title: Application log collecting
---

Application logs are essential for quickly finding and troubleshooting problems in production. This doc introduces how to use the `loki` addon to collect application logs and view them.

## Quick start

The loki addon can be enabled in two modes: collecting the logs of all applications by default (full collection mode), or collecting only the applications you specify (targeted collection mode).

### Enable the loki addon

#### Enable the addon in full collection mode (collect all logs)

```shell
vela addon enable loki agent=vector
```

After the addon is enabled, a [loki](https://github.com/grafana/loki) service is deployed in the control plane as the log storage data source, and the log collection agent [vector](http://vector.dev/) is deployed on the nodes of every cluster to collect the stdout of the pods running on those nodes.
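
To confirm the addon finished installing before moving on, you can check its status from the CLI. This is a minimal sketch; the exact output depends on your KubeVela version and cluster topology:

```shell
# Check whether the loki addon (and the agents it ships) is installed and running
vela addon status loki

# List the addons currently enabled in the system
vela addon list
```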

:::tip
When the addon is enabled in this mode, the stdout of every application is collected without configuring any traits, and the logs are transmitted to the loki service in the control plane for storage. The main advantage of this mode is that the configuration is simple.
The disadvantages are:
1. Collecting the logs of all running pods puts a lot of pressure on the loki service when there are many applications. On the one hand, a large volume of logs needs to be persisted, consuming a lot of disk space. On the other hand, the vector agents in each cluster have to transmit all the collected logs to the loki service, which consumes a lot of bandwidth.
2. The full collection mode can only collect logs in a uniform way; it cannot apply special processing to the log content of different applications.
:::
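
If you started with full collection mode and later want per-application control, a reasonable approach is to disable the addon and re-enable it with the other agent parameter. This is a sketch; re-enabling the addon may briefly interrupt log collection:

```shell
# Disable the addon that was enabled in full collection mode
vela addon disable loki

# Re-enable it in targeted collection mode
vela addon enable loki agent=vector-controller
```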

#### Targeted collection mode (collecting only specified applications)

Enable the loki addon with the parameter `agent=vector-controller`:

```shell
vela addon enable loki agent=vector-controller
```

:::tip
After the addon is enabled with the `agent=vector-controller` parameter, application logs are not collected by default; an application must be configured with a dedicated trait to opt in. This trait also supports a vector VRL configuration for applying specific processing to the log content. The following chapters introduce this in detail.
:::

### Enable the grafana addon

```shell
vela addon enable grafana
```

:::tip
Even if you have already enabled the grafana addon as described in the [Automated Observability documentation](./observability), you still need to enable it again to register the loki data source with grafana.
:::
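
Once grafana is enabled, you need a way to reach its UI to browse the dashboards referenced below. One simple option during evaluation is port-forwarding the grafana addon from the CLI. This is a sketch; the service name and the local URL you end up with can differ between addon versions:

```shell
# Forward the grafana addon service to your local machine
vela port-forward addon-grafana
```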

## Kubernetes system event logs

After the loki addon is enabled, a component is installed in each cluster that collects Kubernetes events, converts them into logs, and transmits them to loki. You can view and analyze the events of the system through the Kubernetes events dashboard in the grafana addon.



<details>
KubeVela Events dashboard

---

**Kubernetes Event overview** Displays the number of new Kubernetes events in the system during each time period.

---

**Warning Events** The number of `Warning` type events.

---

**Image Pull Failed/Container Crashed .../Pod Evicted** The number of events that indicate application failures, such as image pull failures and pod evictions, in the last 12 hours.

---

**TOP 10 Kubernetes Events** The ten event types with the highest number of occurrences in the last 12 hours.

---

**Kubernetes Events Source** A pie chart of the controllers that produced these events.

---

**Kubernetes Events Type** A pie chart of the resource object types involved in the events.

---

**Kubernetes Live Events** The most recent event logs.

</details>

## Collecting stdout logs

As mentioned above, if the full collection mode is selected when the addon is enabled, the stdout logs of the containers are collected without any special configuration in the application. This section mainly introduces how to collect stdout logs in the targeted collection mode.

Configure the `stdout-logs-collector` trait in the component, as follows:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: app-collect-stdout
  namespace: default
spec:
  components:
    - name: flog
      type: webservice
      properties:
        cmd:
          - flog
        image: mingrammer/flog
      traits:
        - type: stdout-logs-collector
```
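
To try this example, save the manifest to a file and deploy it with the KubeVela CLI (a minimal sketch; `app-stdout.yaml` is just a placeholder file name):

```shell
# Deploy the application from the manifest above (file name is a placeholder)
vela up -f app-stdout.yaml

# Check that the application has become healthy
vela status app-collect-stdout
```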

After the application is created, you can find the deployment resource created by the application in the application dashboard of grafana. Click the `Detail` button to jump to the deployment resource dashboard, where you can find the log data at the bottom, as follows:



### nginx access log analysis

If your application is an nginx gateway, the `stdout-logs-collector` trait provides the capability to parse nginx logs in the [combined](https://docs.nginx.com/nginx/admin-guide/monitoring/logging/) format into JSON. As follows:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: nginx-app-2
spec:
  components:
    - name: nginx-comp
      type: webservice
      properties:
        image: nginx:1.14.2
        ports:
          - port: 80
            expose: true
      traits:
        - type: stdout-logs-collector
          properties:
            parser: nginx
```

A dedicated nginx access log analysis dashboard is then generated, as follows:



<details>
KubeVela nginx application dashboard

---

**KPI's** Contains the gateway's core metrics, such as the total request traffic in the last twelve hours and the percentage of 5xx requests.

---

**HTTP status statistic** The number of requests for each response code over time.

---

**Top Request Pages** The most frequently visited pages.

</details>

### Customize the log processing configuration

You can also set a custom parsing configuration for your application's logs in this trait. As follows:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: nginx-app-2
spec:
  components:
    - name: nginx-comp
      type: webservice
      properties:
        image: nginx:1.14.2
        ports:
          - port: 80
            expose: true
      traits:
        - type: stdout-logs-collector
          properties:
            parser: customize
            VRL: |
              .message = parse_nginx_log!(.message, "combined")
              .new_field = "new value"
```

In this example, we transform nginx `combined` format logs into JSON and add a `new_field` key with the value `new value` to each log entry. Please refer to the [vector VRL documentation](https://vector.dev/docs/reference/vrl/) for how to write VRL.

If you have a dedicated log analysis dashboard for this processing method, you can refer to the [documentation](./observability) to import it into grafana.
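
If you are unsure which fields the trait accepts (for example, which values `parser` supports besides `nginx` and `customize`), you can inspect the trait definition's documented parameters from the CLI instead of guessing (a sketch; the output format depends on your KubeVela version):

```shell
# Show the documented parameters of the stdout-logs-collector trait definition
vela show stdout-logs-collector
```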

## Collecting file logs

The loki addon also supports collecting log files from containers. As follows:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: app-vector
  namespace: default
spec:
  components:
    - name: my-biz
      type: webservice
      properties:
        cmd:
          - flog
          - -t
          - log
          - -o
          - /data/daily.log
          - -d
          - 10s
          - -w
        image: mingrammer/flog
      traits:
        - properties:
            path: "/data/daily.log"
          type: file-logs-collector
```

In this example, the business log of the `my-biz` component is written to the `/data/daily.log` path inside the container. After the application is created, you can view the corresponding file log results through the `deployment` dashboard.

:::tip
Note that the log files to be collected must not be placed in the root directory of the container; otherwise the container may fail to start. It is best to write them to a dedicated log directory so that they do not conflict with the image's own directories.
:::
@@ -0,0 +1,238 @@

---
title: Application log observability
---

Application logs are essential for discovering and troubleshooting problems in production. KubeVela provides a dedicated log collection addon that helps users quickly build log observability for their applications. This doc introduces how to collect application logs and how to view and analyze them in grafana dashboards.

## Quick start

The log collection addon can be enabled in two modes: collecting the logs of all applications by default, or collecting only specified applications.

### Enable the log collection addon

#### Full collection mode (collect all applications by default)

```shell
vela addon enable loki agent=vector
```

After the addon is enabled, a [loki](https://github.com/grafana/loki) service is deployed in the control plane cluster as the log storage data source, and the log collection agent [vector](https://vector.dev/) is deployed on the nodes of each cluster to collect the stdout logs of the pods running on those nodes.

:::tip
With the addon enabled in this mode, the stdout logs of applications are collected without configuring any traits, and the logs are aggregated into the loki service in the control plane cluster. The advantage is that the configuration is simple.
The disadvantages of this mode are:
1. Collecting the logs of all running containers puts a lot of pressure on the loki service in the control plane cluster when there are many applications. On the one hand, a large volume of logs needs to be persisted, consuming a lot of disk space. On the other hand, the vector agents in each cluster have to transmit all the collected logs to the loki service, which consumes a lot of bandwidth.
2. The full collection mode can only collect logs in a uniform way; it cannot apply special processing to the log content of different applications.
:::

#### Targeted collection mode

Enable the loki addon with the `agent=vector-controller` parameter:

```shell
vela addon enable loki agent=vector-controller
```

:::tip
After the addon is enabled with the `agent=vector-controller` parameter, application logs are not collected by default; an application must be configured with a dedicated trait to opt in. This trait also supports a vector VRL script for applying specific processing to the log content. The following sections introduce this in detail.
:::

### Enable the grafana addon

```shell
vela addon enable grafana
```

:::tip
Even if you have already enabled the grafana addon as described in the [Automated Observability documentation](./observability), you still need to enable it again so that the loki data source is registered with grafana.
:::

## Kubernetes system event logs

After the loki addon is enabled, a dedicated component is installed in each cluster. It collects the Kubernetes events of that cluster, converts them into logs, and stores them in loki. You can also aggregate and analyze the system's events through the dedicated Kubernetes events dashboard in the grafana addon.



<details>
KubeVela Events dashboard: the Kubernetes event logs of every cluster in the system

---

**Kubernetes Event overview** Displays, over time, the number of new Kubernetes events in the system during each time period.

---

**Warning Events** The number of `Warning` type events in the system.

---

**Image Pull Failed/Container Crashed .../Pod Evicted** The number of events that indicate application failures, such as image pull failures and pod evictions, in the last twelve hours.

---

**TOP 10 Kubernetes Events** The ten event types with the highest number of occurrences in the system in the last twelve hours.

---

**Kubernetes Events Source** A pie chart of the controllers that produced these events.

---

**Kubernetes Events Type** A pie chart of the resource object types involved in the events.

---

**Kubernetes Live Events** The most recent event logs.

</details>

## Application stdout logs

As mentioned above, if the full collection mode is selected when the addon is enabled, the stdout logs of containers are collected without any special configuration in the application. This section mainly introduces how to collect stdout logs when the log collection addon is enabled in targeted collection mode.

Configure the `stdout-logs-collector` trait in the component to collect the logs of the component's containers, as follows:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: app-collect-stdout
  namespace: default
spec:
  components:
    - name: flog
      type: webservice
      properties:
        cmd:
          - flog
        image: mingrammer/flog
      traits:
        - type: stdout-logs-collector
```

After the application is created, you can find the deployment resource created by the application in the corresponding grafana application dashboard, click through to the deployment resource dashboard, and find the collected log data at the bottom. As follows:



### Analyzing nginx gateway access logs

If your application is an nginx gateway, the parser capability provided by the `stdout-logs-collector` trait can convert nginx logs in the [combined](https://docs.nginx.com/nginx/admin-guide/monitoring/logging/) format into JSON, and a dedicated analysis dashboard is provided for further analysis of the gateway's access requests. As follows:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: nginx-app-2
spec:
  components:
    - name: nginx-comp
      type: webservice
      properties:
        image: nginx:1.14.2
        ports:
          - port: 80
            expose: true
      traits:
        - type: stdout-logs-collector
          properties:
            parser: nginx
```

You can jump from the application dashboard in grafana to the dedicated nginx log analysis dashboard. As follows:



<details>
KubeVela nginx application dashboard: the access log analysis dashboard for nginx gateway applications

---

**KPI's** Contains the gateway's core metrics, such as the total request traffic in the last twelve hours and the percentage of 5xx requests.

---

**HTTP status statistic** The number of requests for each response code over time.

---

**Top Request Pages** The most frequently visited pages.

</details>

### Customize the log processing script

Besides processing the log content by setting the parameter `parser: nginx` in the trait, you can also apply your own processing to the logs by setting a custom log processing script. As follows:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: nginx-app-2
spec:
  components:
    - name: nginx-comp
      type: webservice
      properties:
        image: nginx:1.14.2
        ports:
          - port: 80
            expose: true
      traits:
        - type: stdout-logs-collector
          properties:
            parser: customize
            VRL: |
              .message = parse_nginx_log!(.message, "combined")
              .new_field = "new value"
```

In this example, the nginx `combined` format logs are converted into JSON, and a `new_field` JSON key with the value `new value` is added to each log entry. Please refer to the [vector VRL documentation](https://vector.dev/docs/reference/vrl/) for how to write VRL.

If you have built a dedicated log analysis dashboard for this processing method, you can refer to the [documentation](./observability) to import it into grafana.

## Application file logs

Besides the stdout logs of containers, the log collection addon can also collect log files that a container writes to a directory. As follows:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: app-vector
  namespace: default
spec:
  components:
    - name: my-biz
      type: webservice
      properties:
        cmd:
          - flog
          - -t
          - log
          - -o
          - /data/daily.log
          - -d
          - 10s
          - -w
        image: mingrammer/flog
      traits:
        - properties:
            path: "/data/daily.log"
          type: file-logs-collector
```

In the example above, the business log of the `my-biz` component is written to the `/data/daily.log` path inside the container. After the application is created, you can view the corresponding file log results through the application's `deployment` dashboard.

:::tip
Note that the log files to be collected are best placed in a dedicated log directory, so that they do not conflict with the image's own directories. In particular, they must not be placed in the root directory of the container; otherwise the container may fail to start.
:::

@@ -175,6 +175,7 @@ module.exports = {
      ],
    },
    'platform-engineers/operations/observability',
    'platform-engineers/operations/logging',
    'end-user/components/more',
  ],
},