zh-translation: /zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus (#9339)

* zh-translation: /zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus

Signed-off-by: nicole-lihui <hui.li-iog@daocloud.io>

* fix Promethus -> Prometheus

* fix MD036

* fix translation

* Update content/zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus/index.md

Co-authored-by: Kebe <mail@kebe7jun.com>
2021-04-01 10:32:16 +08:00 · 2021-04-01 10:32:16 +08:00 · f3d83fc4b6
parent 37f5843f40
commit f3d83fc4b6
4 changed files with 152 additions and 61 deletions
--- a/content/zh/docs/ops/configuration/telemetry/in-proxy-service-telemetry/index.md
+++ b/content/zh/docs/ops/configuration/telemetry/in-proxy-service-telemetry/index.md
@@ -1,61 +0,0 @@
----
-title: Generating Istio Metrics Without Mixer [Alpha]
-description: How to generate service-level metrics in the proxy.
-weight: 20
-aliases:
-  - /zh/docs/ops/telemetry/in-proxy-service-telemetry
----
-
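-Istio 1.4 adds alpha support for generating service-level HTTP metrics directly in the Envoy proxies.
-This feature lets you monitor your service mesh with the tools Istio provides, without requiring Mixer.
-
-The service-level metrics generated in the proxy replace the following HTTP metrics that Mixer currently generates:
-
-- `istio_requests_total`
-- `istio_request_duration_seconds`
-- `istio_request_size`
-
-## Enable service-level metrics generation in Envoy{#enable-service-level-metrics-generation-in-envoy}
-
-To generate service-level metrics directly in the Envoy proxies, set the following options: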
-
-    {{< text bash >}}
-    $ istioctl manifest apply --set values.telemetry.enabled=true,values.telemetry.v1.enabled=false,values.telemetry.v2.enabled=true,values.telemetry.v2.prometheus.enabled=true
-    {{< /text >}}
-
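-Open the **Istio Mesh** Grafana dashboard. Verify that it still shows the same telemetry as before, even though no requests pass through Istio's Mixer.
-
-## Differences with Mixer-based generation{#differences-with-mixer-based-generation}
-
-In Istio 1.3, small differences remain between in-proxy generation and Mixer-based generation of service-level metrics. We won't consider the functionality stable until in-proxy generation has full feature parity with Mixer-based generation.
-
-Until then, note the following differences:
-
-- The `istio_request_duration_seconds` latency metric has a new name: `istio_request_duration_milliseconds`.
-  The new metric uses milliseconds instead of seconds. We updated the Grafana dashboards to account for this change.
-- The `istio_request_duration_milliseconds` metric uses more granular buckets inside the proxy, which improves the accuracy of latency reporting.
-
-## Performance impact{#performance-impact}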
-
-{{< warning >}}
-
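-Because this work is currently experimental, our primary focus has been on establishing the base functionality. Based on our initial experiments, we have identified several basic performance optimizations, and we expect to keep improving the performance and scalability of this feature as it develops.
-
-We won't consider promoting this feature to **Beta** or **Stable** [status](/zh/about/feature-stages/#feature-phase-definitions) until the performance and scalability improvements and assessments are complete.
-
-The performance of your mesh depends on your configuration. To learn more, see our [performance best practices post](/zh/blog/2019/performance-best-practices/).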
-
-{{< /warning >}}
-
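-Here is what we have measured so far:
-
-- All of the filters in the `istio-proxy` container together use 10% less CPU than running the Mixer filter.
-- Compared to Envoy proxies configured with no telemetry filters, the new filters add about 5ms of P90 latency at 1000 rps.
-- If you only use the `istio-telemetry` service to generate service-level metrics, you can switch off the `istio-telemetry` service.
-  This can save about 0.5 vCPU per 1000 rps of mesh traffic, and can halve the CPU consumed by Istio while collecting the [standard metrics](/zh/docs/reference/config/policy-and-telemetry/metrics/).
-
-## Known limitations{#known-limitations}
-
-- We only support exporting these metrics via Prometheus.
-- We do not support generating TCP metrics.
-- We do not provide proxy-side customization or configuration of the generated metrics.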
--- a/content/zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus/external-production-prometheus.svg
+++ b/content/zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus/external-production-prometheus.svg
--- a/content/zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus/in-mesh-production-prometheus.svg
+++ b/content/zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus/in-mesh-production-prometheus.svg
--- a/content/zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus/index.md
+++ b/content/zh/docs/ops/configuration/telemetry/monitoring-multicluster-prometheus/index.md
@@ -0,0 +1,150 @@
+---
+title: Monitoring Multicluster Istio with Prometheus
+description: Configure Prometheus to monitor multicluster Istio.
+weight: 10
+aliases:
+  - /zh/help/ops/telemetry/monitoring-multicluster-prometheus
+  - /zh/docs/ops/telemetry/monitoring-multicluster-prometheus
+owner: istio/wg-policies-and-telemetry-maintainers
+test: no
+---
+
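+## Overview{#overview}
+
+This guide provides operational guidance on how to configure monitoring of an Istio mesh made up of two or more Kubernetes clusters. It is not the only possible approach; it demonstrates one workable way to do multicluster telemetry with Prometheus.
+
+Our recommendation for multicluster monitoring of Istio with Prometheus is built on Prometheus [hierarchical federation](https://prometheus.io/docs/prometheus/latest/federation/#hierarchical-federation).
+
+The Prometheus instances that Istio deploys to each cluster act as initial collectors, and their data is then aggregated into a mesh-wide Prometheus instance. The mesh-wide Prometheus can run either outside of the mesh (external) or in one of the clusters within the mesh.
+
+## Multicluster Istio setup{#multicluster-Istio-setup}
+
+Follow the [multicluster installation](/zh/docs/setup/install/multicluster/) section to set up your Istio clusters in one of the supported [multicluster deployment models](/zh/docs/ops/deployment/deployment-models/#multiple-clusters). For the purposes of this guide, any of those models works, with the following caveat:
+
+**Ensure that a cluster-local Istio Prometheus instance is installed in each cluster.**
+
+A separate Istio deployment of Prometheus in each cluster is the basis of cross-cluster monitoring: each one is federated to a production-ready Prometheus instance that runs either outside the mesh or in one of the clusters.
+
+Verify that an instance of Prometheus is running in each cluster: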
+
+{{< text bash >}}
+$ kubectl -n istio-system get services prometheus
+NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
+prometheus   ClusterIP   10.8.4.109   <none>        9090/TCP   20h
+{{< /text >}}
+
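+## Configure Prometheus federation{#configure-Prometheus-federation}
+
+### External production Prometheus{#external-production-Prometheus}
+
+There are several reasons why you might want a Prometheus instance running outside of your Istio deployment.
+Perhaps you want long-term monitoring that is decoupled from the cluster being monitored.
+Perhaps you want to monitor multiple separate meshes in a single place.
+Or maybe you have other motivations. Whatever your reason is, you need some special configuration to make it all work.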
+
+{{< image width="80%"
+    link="./external-production-prometheus.svg"
+    alt="Architecture of an external production Prometheus monitoring multicluster Istio."
+    caption="External production Prometheus for monitoring multicluster Istio"
+    >}}
+
+{{< warning >}}
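+This guide demonstrates connectivity to the cluster-local Prometheus instances, but does not address security considerations.
+For production use, secure access to each Prometheus endpoint with HTTPS. In addition, take precautions such as using an internal load balancer instead of a public endpoint and configuring appropriate firewall rules.
+{{< /warning >}}
+
+Istio provides a way to expose cluster services externally via [Gateways](/zh/docs/reference/config/networking/gateway/).
+You can configure an ingress gateway for the cluster-local Prometheus, which provides external connectivity to the in-cluster Prometheus endpoint.
+
+For each cluster, follow the appropriate instructions from the [Remotely Accessing Telemetry Addons](/zh/docs/tasks/observability/gateways/#option-1-secure-access-https) task.
+Also note that you **should** establish secure (HTTPS) access.
+
+Next, configure your external Prometheus instance to access the cluster-local Prometheus instances with a configuration like the following (replacing the ingress domain and cluster name):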
+
+{{< text yaml >}}
+scrape_configs:
+- job_name: 'federate-{{CLUSTER_NAME}}'
+  scrape_interval: 15s
+
+  honor_labels: true
+  metrics_path: '/federate'
+
+  params:
+    'match[]':
+      - '{job="pilot"}'
+      - '{job="envoy-stats"}'
+
+  static_configs:
+    - targets:
+      - 'prometheus.{{INGRESS_DOMAIN}}'
+      labels:
+        cluster: '{{CLUSTER_NAME}}'
+{{< /text >}}
+
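+Notes:
+
+* `CLUSTER_NAME` should match the value that you used when creating the cluster (set via `values.global.multiCluster.clusterName`).
+
+* No authentication has been set up for the Prometheus endpoints. This means that anyone can query your cluster-local Prometheus instances, which is not desirable.
+
+* Without proper HTTPS configuration of the gateway, all traffic is transported in plaintext, which is not desirable.
+
+### Production Prometheus on an in-mesh cluster{#production-Prometheus-on-an-in-mesh-cluster}
+
+If you prefer to run the production Prometheus in one of the clusters, you need to establish connectivity from it to the cluster-local Prometheus instances of the other clusters in the mesh.
+
+This is really just a variation of the external federation configuration. In this case, the configuration for the Prometheus in the cluster running the production instance differs from the configuration for the Prometheus instances in the remote clusters.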
+
+{{< image width="80%"
+    link="./in-mesh-production-prometheus.svg"
+    alt="Architecture of an in-mesh production Prometheus monitoring multicluster Istio."
+    caption="In-mesh production Prometheus for monitoring multicluster Istio"
+    >}}
+
+Configure your production Prometheus so that it can access both the *local* and the *remote* Prometheus instances.
+
+First, run the following command:
+
+{{< text bash >}}
+$ kubectl -n istio-system edit cm prometheus -o yaml
+{{< /text >}}
+
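+Then add a configuration for each *remote* cluster (replacing the ingress domain and cluster name for each cluster), and add one configuration for the *local* cluster: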
+
+{{< text yaml >}}
+scrape_configs:
+- job_name: 'federate-{{REMOTE_CLUSTER_NAME}}'
+  scrape_interval: 15s
+
+  honor_labels: true
+  metrics_path: '/federate'
+
+  params:
+    'match[]':
+      - '{job="pilot"}'
+      - '{job="envoy-stats"}'
+
+  static_configs:
+    - targets:
+      - 'prometheus.{{REMOTE_INGRESS_DOMAIN}}'
+      labels:
+        cluster: '{{REMOTE_CLUSTER_NAME}}'
+
+- job_name: 'federate-local'
+
+  honor_labels: true
+  metrics_path: '/federate'
+
+  metric_relabel_configs:
+  - replacement: '{{CLUSTER_NAME}}'
+    target_label: cluster
+
+  kubernetes_sd_configs:
+  - role: pod
+    namespaces:
+      names: ['istio-system']
+  params:
+    'match[]':
+    - '{__name__=~"istio_(.*)"}'
+    - '{__name__=~"pilot(.*)"}'
+{{< /text >}}
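The notes on the external Prometheus configuration point out that the example has no endpoint authentication and does not guarantee HTTPS. As a minimal sketch, assuming the ingress gateway terminates TLS and that something in front of the Prometheus endpoint enforces HTTP basic authentication (this commit sets up neither), the federation job could be tightened with the standard Prometheus `scheme`, `basic_auth`, and `tls_config` scrape settings. The user name `federation` and both file paths are hypothetical placeholders, not values taken from the translated page.

```yaml
scrape_configs:
- job_name: 'federate-{{CLUSTER_NAME}}'
  scrape_interval: 15s

  honor_labels: true
  metrics_path: '/federate'
  scheme: https                 # scrape over TLS instead of the default http

  # Assumption: basic auth is enforced in front of the Prometheus endpoint.
  basic_auth:
    username: 'federation'      # hypothetical user name
    password_file: '/etc/prometheus/secrets/federation-password'  # hypothetical path

  # Assumption: the gateway certificate is signed by a private CA.
  tls_config:
    ca_file: '/etc/prometheus/certs/ingress-ca.crt'               # hypothetical path

  params:
    'match[]':
      - '{job="pilot"}'
      - '{job="envoy-stats"}'

  static_configs:
    - targets:
      - 'prometheus.{{INGRESS_DOMAIN}}'
      labels:
        cluster: '{{CLUSTER_NAME}}'
```

The same three settings would apply to the `federate-{{REMOTE_CLUSTER_NAME}}` job in the in-mesh variant, since it scrapes remote clusters through their ingress gateways in the same way.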