zh-translation: blog series (#5969)

* network policy #1072

* canary #1092

* mixer spof myth #1093

* adapter model #1110

* appswitch #1018

* egress performance #1064

* multicluster version routing #1069

* custom ingress gateway #1086

* fix external link
2BFL 2019-12-04 20:22:13 +08:00 committed by Istio Automation
parent 5514ed88d9
commit f207adb0b9
8 changed files with 317 additions and 406 deletions


@ -1,6 +1,6 @@
---
title: Canary Deployments using Istio
description: Using Istio to create autoscaled canary deployments.
publishdate: 2017-06-14
last_update: 2018-05-16
attribution: Frank Budinsky
@ -10,34 +10,28 @@ aliases:
---
{{< tip >}}
This post was updated on May 16, 2018 to use the latest version of the traffic management model.
{{< /tip >}}
One of the benefits of the [Istio](/) project is that it provides the control needed to deploy canary services. The idea behind
canary deployment (or rollout) is to introduce a new version of a service by first testing it using a small percentage of user
traffic, and then if all goes well, increase, possibly gradually in increments, the percentage while simultaneously phasing out
the old version. If anything goes wrong along the way, we abort and rollback to the previous version. In its simplest form,
the traffic sent to the canary version is a randomly selected percentage of requests, but in more sophisticated schemes it
can be based on the region, user, or other properties of the request.

Depending on your level of expertise in this area, you may wonder why Istio's support for canary deployment is even needed, given that platforms like Kubernetes already provide a way to do [version rollout](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment) and [canary deployment](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#canary-deployments). Problem solved, right? Well, not exactly. Although doing a rollout this way works in simple cases, it's very limited, especially in large scale cloud environments receiving lots of (and especially varying amounts of) traffic, where autoscaling is needed.
## Canary deployment in Kubernetes {#canary-deployment-in-Kubernetes}
As an example, let's say we have a deployed service, **helloworld** version **v1**, for which we would like to test (or simply rollout) a new version, **v2**. Using Kubernetes, you can rollout a new version of the **helloworld** service by simply updating the image in the service's corresponding [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) and letting the [rollout](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment) happen automatically. If we take particular care to ensure that there are enough **v1** replicas running when we start and [pause](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#pausing-and-resuming-a-deployment) the rollout after only one or two **v2** replicas have been started, we can keep the canary's effect on the system very small. We can then observe the effect before deciding to proceed or, if necessary, [rollback](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-back-a-deployment). Best of all, we can even attach a [horizontal pod autoscaler](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#scaling-a-deployment) to the Deployment and it will keep the replica ratios consistent if, during the rollout process, it also needs to scale replicas up or down to handle traffic load.

Although fine for what it does, this approach is only useful when we have a properly tested version that we want to deploy, i.e., more of a blue/green, a.k.a. red/black, kind of upgrade than a "dip your feet in the water" kind of canary deployment. In fact, for the latter (for example, testing a canary version that may not even be ready or intended for wider exposure), the canary deployment in Kubernetes would be done using two Deployments with [common pod labels](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#using-labels-effectively). In this case, we can't use autoscaling anymore because it's now being done by two independent autoscalers, one for each Deployment, so the replica ratios (percentages) may vary from the desired ratio, depending purely on load.

Whether we use one deployment or two, canary management using deployment features of container orchestration platforms like Docker, Mesos/Marathon, or Kubernetes has a fundamental problem: the use of instance scaling to manage the traffic; traffic version distribution and replica deployment are not independent in these systems. All replica pods, regardless of version, are treated the same in the `kube-proxy` round-robin pool, so the only way to manage the amount of traffic that a particular version receives is by controlling the replica ratio. Maintaining canary traffic at small percentages requires many replicas (e.g., 1% would require a minimum of 100 replicas). Even if we ignore this problem, the deployment approach is still very limited in that it only supports the simple (random percentage) canary approach. If, instead, we wanted to limit the visibility of the canary to requests based on some specific criteria, we still need another solution.
## Enter Istio {#enter-Istio}
With Istio, traffic routing and replica deployment are two completely independent functions. The number of pods implementing services are free to scale up and down based on traffic load, completely orthogonal to the control of version traffic routing. This makes managing a canary version in the presence of autoscaling a much simpler problem. Autoscalers may, in fact, respond to load variations resulting from traffic routing changes, but they are nevertheless functioning independently and no differently than when loads change for other reasons.

Istio's [routing rules](/zh/docs/concepts/traffic-management/#routing-rules) also provide other important advantages; you can easily control
fine-grained traffic percentages (e.g., route 1% of traffic without requiring 100 pods) and you can control traffic using other criteria (e.g., route traffic for specific users to the canary version). To illustrate, let's look at deploying the **helloworld** service and see how simple the problem becomes.

We begin by defining the **helloworld** Service, just like any other Kubernetes service, something like this:
{{< text yaml >}}
apiVersion: v1
@ -52,10 +46,9 @@ spec:
...
{{< /text >}}
We then add 2 Deployments, one for each version (**v1** and **v2**), both of which include the service selector's `app: helloworld` label:
{{< text yaml >}}
apiVersion: apps/v1
kind: Deployment
metadata:
name: helloworld-v1
@ -71,7 +64,7 @@ spec:
- image: helloworld-v1
...
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: helloworld-v2
@ -88,11 +81,9 @@ spec:
...
{{< /text >}}
Note that this is exactly the same way we would do a [canary deployment](https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#canary-deployments) using plain Kubernetes, but in that case we would need to adjust the number of replicas of each Deployment to control the distribution of traffic. For example, to send 10% of the traffic to the canary version (**v2**), the replicas for **v1** and **v2** could be set to 9 and 1, respectively.
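For example, a 90/10 split done purely with Kubernetes could be approximated by scaling the two Deployments directly (a minimal sketch reusing the Deployment names from above):

{{< text bash >}}
$ kubectl scale deployment helloworld-v1 --replicas=9
$ kubectl scale deployment helloworld-v2 --replicas=1
{{< /text >}}

The split is only approximate, since `kube-proxy` simply balances across whichever replicas happen to be ready at any given moment.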
However, since we are going to deploy the service in an [Istio enabled](/zh/docs/setup/) cluster, all we need to do is set a routing
rule to control the traffic distribution. For example, if we want to send 10% of the traffic to the canary, we could use `kubectl`
to set a routing rule something like this:
{{< text bash >}}
$ kubectl apply -f - <<EOF
@ -108,11 +99,11 @@ spec:
- destination:
host: helloworld
subset: v1
weight: 90
- destination:
host: helloworld
subset: v2
weight: 10
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
@ -130,11 +121,11 @@ spec:
EOF
{{< /text >}}
After setting this rule, Istio will ensure that only one tenth of the requests will be sent to the canary version, regardless of how many replicas of each version are running.
## Autoscaling the deployments {#autoscaling-the-deployments}
Because we don't need to maintain replica ratios anymore, we can safely add Kubernetes [horizontal pod autoscalers](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) to manage the replicas for both version Deployments:
{{< text bash >}}
$ kubectl autoscale deployment helloworld-v1 --cpu-percent=50 --min=1 --max=10
@ -153,7 +144,7 @@ Helloworld-v1 Deployment/helloworld-v1 50% 47% 1 10 17s
Helloworld-v2 Deployment/helloworld-v2 50% 40% 1 10 15s
{{< /text >}}
If we now generate some load on the **helloworld** service, we would notice that when scaling begins, the **v1** autoscaler will scale up its replicas significantly higher than the **v2** autoscaler will for its replicas because **v1** pods are handling 90% of the load.
{{< text bash >}}
$ kubectl get pods | grep helloworld
@ -169,7 +160,7 @@ helloworld-v1-3523621687-xlt26 0/2 Pending 0 19m
helloworld-v2-4095161145-963wt 2/2 Running 0 50m
{{< /text >}}
If we then change the routing rule to send 50% of the traffic to **v2**, we should, after a short delay, notice that the **v1** autoscaler will scale down the replicas of **v1** while the **v2** autoscaler will perform a corresponding scale up.
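A minimal sketch of that change, assuming the `VirtualService` is named `helloworld` as in the sample (only the weights differ from the rule above):

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: helloworld
spec:
  hosts:
  - helloworld
  http:
  - route:
    - destination:
        host: helloworld
        subset: v1
      weight: 50
    - destination:
        host: helloworld
        subset: v2
      weight: 50
{{< /text >}}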
{{< text bash >}}
$ kubectl get pods | grep helloworld
@ -186,8 +177,9 @@ helloworld-v2-4095161145-t2ccm 0/2 Pending 0 17m
helloworld-v2-4095161145-v3v9n 0/2 Pending 0 13m
{{< /text >}}
The end result is very similar to the simple Kubernetes Deployment rollout, only now the whole process is not being orchestrated and managed in one place. Instead, we're seeing several components doing their jobs independently, albeit in a cause and effect manner.
What's different, however, is that if we now stop generating load, the replicas of both versions will eventually scale down to their minimum (1), regardless of what routing rule we set.
{{< text bash >}}
$ kubectl get pods | grep helloworld
@ -195,9 +187,11 @@ helloworld-v1-3523621687-dt7n7 2/2 Running 0 1h
helloworld-v2-4095161145-963wt 2/2 Running 0 1h
{{< /text >}}
## Focused canary testing {#focused-canary-testing}
As mentioned above, the Istio routing rules can be used to route traffic based on specific criteria, allowing more sophisticated canary deployment scenarios. Say, for example, instead of exposing the canary to an arbitrary percentage of users, we want to try it out on internal users, maybe even just a percentage of them. The following command could be used to send 50% of traffic from users at *some-company-name.com* to the canary version, leaving all other users unaffected:
{{< text bash >}}
$ kubectl apply -f - <<EOF
@ -217,11 +211,11 @@ spec:
- destination:
host: helloworld
subset: v1
weight: 50
- destination:
host: helloworld
subset: v2
weight: 50
- route:
- destination:
host: helloworld
@ -229,11 +223,12 @@ spec:
EOF
{{< /text >}}
As before, the autoscalers bound to the 2 version Deployments will automatically scale the replicas accordingly, but that will have no effect on the traffic distribution.
## Summary {#summary}
In this article we've seen how Istio supports general scalable canary deployments, and how this differs from the basic deployment support in Kubernetes. Istio's service mesh provides the control necessary to manage traffic distribution with complete independence from deployment scaling. This allows for a simpler, yet significantly more functional, way to do canary test and rollout.

Intelligent routing in support of canary deployment is just one of the many features of Istio that will make the production deployment of large-scale microservices-based applications much simpler. Check out [istio.io](/) for more information and to try it out.

The sample code used in this article can be found [here]({{< github_tree >}}/samples/helloworld).


@ -1,6 +1,6 @@
---
title: Using Network Policy with Istio
description: How Kubernetes Network Policy relates to Istio policy.
publishdate: 2017-08-10
subtitle:
attribution: Spike Curtis
@ -9,57 +9,54 @@ aliases:
target_release: 0.1
---
The use of Network Policy to secure applications running on Kubernetes is now a widely accepted industry best practice. Given that Istio also supports policy, we want to spend some time explaining how Istio policy and Kubernetes Network Policy interact and support each other to deliver your application securely.

Let's start with the basics: why might you want to use both Istio and Kubernetes Network Policy? The short answer is that they are good at different things. Consider the main differences between Istio and Network Policy (we will describe "typical" implementations, e.g. Calico, but implementation details can vary with different network providers):
| | Istio Policy | Network Policy |
| --------------------- | ----------------- | ------------------ |
| **Layer** | "Service" --- L7 | "Network" --- L3-4 |
| **Implementation** | User space | Kernel |
| **Enforcement Point** | Pod | Node |
## Layer {#layer}
Istio policy operates at the "service" layer of your network application. This is Layer 7 (Application) from the perspective of the OSI model, but the de facto model of cloud native applications is that Layer 7 actually consists of at least two layers: a service layer and a content layer. The service layer is typically HTTP, which encapsulates the actual application data (the content layer). It is at this service layer of HTTP that Istio's Envoy proxy operates. In contrast, Network Policy operates at Layers 3 (Network) and 4 (Transport) in the OSI model.

Operating at the service layer gives the Envoy proxy a rich set of attributes to base policy decisions on, for protocols it understands, which at present includes HTTP/1.1 & HTTP/2 (gRPC operates over HTTP/2). So, you can apply policy based on virtual host, URL, or other HTTP headers. In the future, Istio will support a wide range of Layer 7 protocols, as well as generic TCP and UDP transport.

In contrast, operating at the network layer has the advantage of being universal, since all network applications use IP. At the network layer you can apply policy regardless of the layer 7 protocol: DNS, SQL databases, real-time streaming, and a plethora of other services that do not use HTTP can be secured. Network Policy isn't limited to a classic firewall's tuple of IP addresses, proto, and ports. Both Istio and Network Policy are aware of rich Kubernetes labels to describe pod endpoints.
## Implementation {#implementation}
The Istio proxy is based on [Envoy](https://envoyproxy.github.io/envoy/), which is implemented as a user space daemon in the data plane that
interacts with the network layer using standard sockets. This gives it a large amount of flexibility in processing, and allows it to be
distributed (and upgraded!) in a container.

The Network Policy data plane is typically implemented in kernel space (e.g. using iptables, eBPF filters, or even custom kernel modules). Being in kernel space
allows them to be extremely fast, but not as flexible as the Envoy proxy.
## Enforcement Point {#enforcement-point}
Policy enforcement using the Envoy proxy is implemented inside the pod, as a sidecar container in the same network namespace. This allows a simple deployment model. Some containers are given permission to reconfigure the networking inside their pod (`CAP_NET_ADMIN`). If such a service instance is compromised, or misbehaves (as in a malicious tenant) the proxy can be bypassed.

While this won't let an attacker access other Istio-enabled pods, so long as they are correctly configured, it opens several attack vectors:
- Attacking unprotected pods
- Attempting to deny service to protected pods by sending lots of traffic
- Exfiltrating data collected in the pod
- Attacking the cluster infrastructure (servers or Kubernetes services)
- Attacking services outside the mesh, like databases, storage arrays, or legacy systems.
Network Policy is typically enforced at the host node, outside the network namespace of the guest pods. This means that compromised or misbehaving pods must break into the root namespace to avoid enforcement. With the addition of egress policy due in Kubernetes 1.8, this difference makes Network Policy a key part of protecting your infrastructure from compromised workloads.
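As a hedged illustration (not part of the original post), a default-deny egress policy for a namespace could look something like this once egress support is available:

{{< text yaml >}}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}   # selects every pod in the namespace
  policyTypes:
  - Egress          # no egress rules are listed, so all outbound traffic is denied
{{< /text >}}

Individual workloads can then be granted just the outbound access they actually need.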
## Examples {#examples}
Let's walk through a few examples of what you might want to do with Kubernetes Network Policy for an Istio-enabled application. Consider the Bookinfo sample application. We're going to cover the following use cases for Network Policy:
- Reduce attack surface of the application ingress
- Enforce fine-grained isolation within the application
### Reduce attack surface of the application ingress {#reduce-attack-surface-of-the-application-ingress}
Our application ingress controller is the main entry-point to our application from the outside world. A quick peek at `istio.yaml` (used to install Istio) defines the Istio ingress like this:
{{< text yaml >}}
apiVersion: v1
@ -79,7 +76,7 @@ spec:
istio: ingress
{{< /text >}}
The `istio-ingress` exposes ports 80 and 443. Let's limit incoming traffic to just these two ports. Envoy has a [built-in administrative interface](https://www.envoyproxy.io/docs/envoy/latest/operations/admin.html#operations-admin-interface), and we don't want a misconfigured `istio-ingress` image to accidentally expose our admin interface to the outside world. This is an example of defense in depth: a properly configured image should not expose the interface, and a properly configured Network Policy will prevent anyone from connecting to it. Either can fail or be misconfigured and we are still protected.
{{< text yaml >}}
apiVersion: networking.k8s.io/v1
@ -99,16 +96,16 @@ spec:
port: 443
{{< /text >}}
### Enforce fine-grained isolation within the application {#enforce-fine-grained-isolation-within-the-application}
Here is the service graph for the Bookinfo application.
{{< image width="80%"
link="/zh/docs/examples/bookinfo/withistio.svg"
caption="Bookinfo Service Graph"
>}}
This graph shows every connection that a correctly functioning application should be allowed to make. All other connections, say from the Istio Ingress directly to the Rating service, are not part of the application. Let's lock out those extraneous connections so they cannot be used by an attacker. Imagine, for example, that the Ingress pod is compromised by an exploit that allows an attacker to run arbitrary code. If we only allow connections to the Product Page pods using Network Policy, the attacker has gained no more access to my application backends _even though they have compromised a member of the service mesh_.
{{< text yaml >}}
apiVersion: networking.k8s.io/v1
@ -130,10 +127,10 @@ spec:
istio: ingress
{{< /text >}}
You can and should write a similar policy for each service to enforce which other pods are allowed to access each.
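For example, a sketch of such a policy for the `reviews` service, allowing ingress only from `productpage` (the labels and port here are taken from the Bookinfo sample and are assumptions, not part of the original post):

{{< text yaml >}}
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: reviews-allow-productpage
spec:
  podSelector:
    matchLabels:
      app: reviews
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: productpage
    ports:
    - protocol: TCP
      port: 9080
{{< /text >}}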
## Summary {#summary}
Our take is that Istio and Network Policy have different strengths in applying policy. Istio is application-protocol aware and highly flexible, making it ideal for applying policy in support of operational goals, like service routing, retries, circuit-breaking, etc., and for security that operates at the application layer, such as token validation. Network Policy is universal, highly efficient, and isolated from the pods, making it ideal for applying policy in support of network security goals. Furthermore, having policy that operates at different layers of the network stack is a really good thing as it gives each layer specific context without commingling of state and allows separation of responsibility.

This post is based on the three part blog series by Spike Curtis, one of the Istio team members at Tigera. The full series can be found here: <https://www.projectcalico.org/using-network-policy-in-concert-with-istio/>


@ -1,8 +1,8 @@
---
title: Mixer Adapter Model
description: Provides an overview of Mixer's plug-in architecture.
publishdate: 2017-11-03
subtitle: Extending Istio to integrate with a world of infrastructure backends
attribution: Martin Taillefer
keywords: [adapters,mixer,policies,telemetry]
aliases:
@ -10,82 +10,69 @@ aliases:
target_release: 0.2
---
Istio 0.2 introduced a new Mixer adapter model which is intended to increase Mixer's flexibility to address a varied set of infrastructure backends. This post intends to put the adapter model in context and explain how it works.

## Why adapters? {#why-adapters}
Infrastructure backends provide support functionality used to build services. They include such things as access control systems, telemetry capturing systems, quota enforcement systems, billing systems, and so forth. Services traditionally directly integrate with these backend systems, creating a hard coupling and baking-in specific semantics and usage options.

Mixer serves as an abstraction layer between Istio and an open-ended set of infrastructure backends. The Istio components and services that run within the mesh can interact with these backends, while not being coupled to the backends' specific interfaces.

In addition to insulating application-level code from the details of infrastructure backends, Mixer provides an intermediation model that allows operators to inject and control policies between application code and backends. Operators can control which data is reported to which backend, which backend to consult for authorization, and much more.

Given that individual infrastructure backends each have different interfaces and operational models, Mixer needs custom
code to deal with each and we call these custom bundles of code [*adapters*](https://github.com/istio/istio/wiki/Mixer-Compiled-In-Adapter-Dev-Guide).

Adapters are Go packages that are directly linked into the Mixer binary. It's fairly simple to create custom Mixer binaries linked with specialized sets of adapters, in case the default set of adapters is not sufficient for specific use cases.
## Philosophy {#philosophy}

Mixer is essentially an attribute processing and routing machine. The proxy sends it [attributes](/zh/docs/reference/config/policy-and-telemetry/mixer-overview/#attributes) as part of doing precondition checks and telemetry reports, which it turns into a series of calls into adapters. The operator supplies configuration which describes how to map incoming attributes to inputs for the adapters.
{{< image width="60%"
link="/zh/docs/reference/config/policy-and-telemetry/mixer-overview/machine.svg"
caption="Attribute Machine"
>}}
Configuration is a complex task. In fact, evidence shows that the overwhelming majority of service outages are caused by configuration errors. To help combat this, Mixer's configuration model enforces a number of constraints designed to avoid errors. For example, the configuration model uses strong typing to ensure that only meaningful attributes or attribute expressions are used in any given context.

## Handlers: configuring adapters {#handlers-configuring-adapters}
Each adapter that Mixer uses requires some configuration to operate. Typically, adapters need things like the URL to their backend, credentials, caching options, and so forth. Each adapter defines the exact configuration data it needs via a [protobuf](https://developers.google.com/protocol-buffers/) message.

You configure each adapter by creating [*handlers*](/zh/docs/reference/config/policy-and-telemetry/mixer-overview/#handlers) for them. A handler is a
configuration resource which represents a fully configured adapter ready for use. There can be any number of handlers for a single adapter, making it possible to reuse an adapter in different scenarios.
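As a rough sketch (using the later `compiledAdapter` form of the `config.istio.io/v1alpha2` resources rather than the exact Istio 0.2 syntax), a handler for the built-in `stdio` adapter might look like this:

{{< text yaml >}}
apiVersion: config.istio.io/v1alpha2
kind: handler
metadata:
  name: newloghandler
  namespace: istio-system
spec:
  compiledAdapter: stdio      # which adapter this handler configures
  params:                     # adapter-specific configuration
    severity_levels:
      warning: 1
    outputAsJson: true
{{< /text >}}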
## Templates: adapter input schema {#templates-adapter-input-schema}
Mixer is typically invoked twice for every incoming request to a mesh service, once for precondition checks and once for telemetry reporting. For every such call, Mixer invokes one or more adapters. Different adapters need different pieces of data as input in order to do their work. A logging adapter needs a log entry, a metric adapter needs a metric, an authorization adapter needs credentials, etc.
Mixer [*templates*](/zh/docs/reference/config/policy-and-telemetry/templates/) are used to describe the exact data that an adapter consumes at request time.

Each template is specified as a [protobuf](https://developers.google.com/protocol-buffers/) message. A single template describes a bundle of data that is delivered to one or more adapters at runtime. Any given adapter can be designed to support any number of templates, the specific templates the adapter supports is determined by the adapter developer.

[`metric`](/zh/docs/reference/config/policy-and-telemetry/templates/metric/) and [`logentry`](/zh/docs/reference/config/policy-and-telemetry/templates/logentry/) are two of the most essential templates used within Istio. They represent respectively the payload to report a single metric and a single log entry to appropriate backends.
## Instances: attribute mapping {#instances-attribute-mapping}
You control which data is delivered to individual adapters by creating
[*instances*](/zh/docs/reference/config/policy-and-telemetry/mixer-overview/#instances).
Instances control how Mixer uses the [attributes](/zh/docs/reference/config/policy-and-telemetry/mixer-overview/#attributes) delivered
by the proxy into individual bundles of data that can be routed to different adapters.

Creating instances generally requires using [attribute expressions](/zh/docs/reference/config/policy-and-telemetry/expression-language/). The point of these expressions is to use any attribute or literal value in order to produce a result that can be assigned to an instance's field.
Every instance field has a type, as defined in the template, every attribute has a
[type](https://github.com/istio/api/blob/{{< source_branch_name >}}/policy/v1beta1/value_type.proto), and every attribute expression has a type.
You can only assign type-compatible expressions to any given instance field. For example, you can't assign an integer expression
to a string field. This kind of strong typing is designed to minimize the risk of creating bogus configurations.
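Continuing that sketch, an instance that fills the `logentry` template from request attributes might look like the following (the attribute names follow the standard Istio attribute vocabulary; the `|` operator supplies a default when an attribute is absent):

{{< text yaml >}}
apiVersion: config.istio.io/v1alpha2
kind: instance
metadata:
  name: newlog
  namespace: istio-system
spec:
  compiledTemplate: logentry  # the template this instance populates
  params:
    severity: '"info"'
    timestamp: request.time
    variables:
      source: source.workload.name | "unknown"
      destination: destination.workload.name | "unknown"
      responseCode: response.code | 0
      latency: response.duration | "0ms"
{{< /text >}}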
## Rules: delivering data to adapters {#rules-delivering-data-to-adapters}
The last piece of the puzzle is telling Mixer which instances to send to which handler and when. This is done by
creating [*rules*](/zh/docs/reference/config/policy-and-telemetry/mixer-overview/#rules). Each rule identifies a specific handler and the set of
instances to send to that handler. Whenever Mixer processes an incoming call, it invokes the indicated handler and gives it the specific set of instances for processing.

Rules contain matching predicates. A predicate is an attribute expression which returns a true/false value. A rule only takes effect if its predicate expression returns true. Otherwise, it's like the rule didn't exist and the indicated handler isn't invoked.
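Tying the sketch together, a rule that routes the `newlog` instance to the `newloghandler` handler whenever its predicate evaluates to true:

{{< text yaml >}}
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: newlogtostdio
  namespace: istio-system
spec:
  match: context.protocol == "http" || context.protocol == "grpc"   # matching predicate
  actions:
  - handler: newloghandler
    instances:
    - newlog
{{< /text >}}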
## Future {#future}

We are working to improve the end to end experience of using and developing adapters. For example, several new features are planned to make templates more expressive. Additionally, the expression language is being substantially enhanced to be more powerful and well-rounded.

Longer term, we are evaluating ways to support adapters which aren't directly linked into the main Mixer binary. This would simplify deployment and composition.
## Conclusion {#conclusion}

The refreshed Mixer adapter model is designed to provide a flexible framework to support an open-ended set of infrastructure backends.

Handlers provide configuration data for individual adapters, templates determine exactly what kind of data different adapters want to consume at runtime, instances let operators prepare this data, and rules direct the data to one or more handlers.

You can learn more about Mixer's overall architecture [here](/zh/docs/reference/config/policy-and-telemetry/mixer-overview/), and learn the specifics of templates, handlers,
and rules [here](/zh/docs/reference/config/policy-and-telemetry). You can find many examples of Mixer configuration resources in the Bookinfo sample
[here]({{< github_tree >}}/samples/bookinfo).


@ -1,8 +1,8 @@
---
title: Mixer and the SPOF Myth
description: Improving availability and reducing latency.
publishdate: 2017-12-07
subtitle: Improving availability and reducing latency
attribution: Martin Taillefer
keywords: [adapters,mixer,policies,telemetry,availability,latency]
aliases:
@ -11,116 +11,98 @@ aliases:
target_release: 0.3
---
As [Mixer](/zh/docs/reference/config/policy-and-telemetry/) is in the request path, it is natural to question how it impacts
overall system availability and latency. A common refrain we hear when people first glance at Istio architecture diagrams is
"Isn't this just introducing a single point of failure?"

In this post, we'll dig deeper and cover the design principles that underpin Mixer and the surprising fact that Mixer actually
increases overall mesh availability and reduces average request latency.

Istio's use of Mixer has two main benefits in terms of overall system availability and latency:

* **Increased SLO**. Mixer insulates proxies and services from infrastructure backend failures, enabling higher effective mesh availability. The mesh as a whole tends to experience a lower rate of failure when interacting with the infrastructure backends than if Mixer were not present.

* **Reduced Latency**. Through aggressive use of shared multi-level caches and sharding, Mixer reduces average observed latencies across the mesh.

We'll explain this in more detail below.
## How we got here {#how-we-got-here}

For many years at Google, we've been using an internal API & service management system to handle the many APIs exposed by Google. This system has been fronting the world's biggest services (Google Maps, YouTube, Gmail, etc.) and sustains a peak rate of hundreds of millions of QPS. Although this system has served us well, it had problems keeping up with Google's rapid growth, and it became clear that a new architecture was needed in order to tamp down ballooning operational costs.

In 2014, we started an initiative to create a replacement architecture that would scale better. The result has proven extremely successful and has been gradually deployed throughout Google, saving in the process millions of dollars a month in ops costs.

The older system was built around a centralized fleet of fairly heavy proxies into which all incoming traffic would flow, before being forwarded to the services where the real work was done. The newer architecture jettisons the shared proxy design and instead consists of a very lean and efficient distributed sidecar proxy sitting next to service instances, along with a shared fleet of sharded control plane intermediaries:
{{< image width="75%"
link="./mixer-spof-myth-1.svg"
title="Google System Topology"
caption="Google's API & Service Management System"
title="Google 系统拓扑"
caption="Google 的 API 和 服务管理系统"
>}}
Look familiar? Of course: it's just like Istio! Istio was conceived as a second generation of this distributed proxy architecture. We took the core lessons from this internal system, generalized many of the concepts by working with our partners, and created Istio.
## Architecture recap {#architecture-recap}

As shown in the diagram below, Mixer sits between the mesh and the infrastructure backends that support it:

{{< image width="75%" link="./mixer-spof-myth-2.svg" caption="Istio Topology" >}}
The Envoy sidecar logically calls Mixer before each request to perform precondition checks, and after each request to report telemetry.
The sidecar has local caching such that a relatively large percentage of precondition checks can be performed from cache. Additionally, the
sidecar buffers outgoing telemetry such that it only actually needs to call Mixer once for every several thousand requests. Whereas precondition
checks are synchronous to request processing, telemetry reports are done asynchronously with a fire-and-forget pattern.

At a high level, Mixer provides:

* **Backend Abstraction**. Mixer insulates the Istio components and services within the mesh from the implementation details of individual infrastructure backends.

* **Intermediation**. Mixer allows operators to have fine-grained control over all interactions between the mesh and the infrastructure backends.

However, even beyond these purely functional aspects, Mixer has other characteristics that provide the system with additional benefits.
## Mixer: SLO booster {#mixer-booster}

Contrary to the claim that Mixer is a SPOF and can therefore lead to mesh outages, we believe it in fact improves the effective availability of a mesh. How can that be? There are three basic characteristics at play:
* **Statelessness**. Mixer is stateless in that it doesn't manage any persistent storage of its own.

* **Hardening**. Mixer proper is designed to be a highly reliable component. The design intent is to achieve > 99.999% uptime for any individual Mixer instance.

* **Caching and Buffering**. Mixer is designed to accumulate a large amount of transient ephemeral state.

The sidecar proxies that sit next to each service instance in the mesh must necessarily be frugal in terms of memory consumption, which constrains the possible amount of local caching and buffering. Mixer, however, lives independently and can use considerably larger caches and output buffers. Mixer thus acts as a highly-scaled and highly-available second-level cache for the sidecars.

Mixer's expected availability is considerably higher than most infrastructure backends (those often have availability of perhaps 99.9%). Its local caches and buffers help mask infrastructure backend failures by being able to continue operating even when a backend has become unresponsive.
## Mixer: Latency slasher {#mixer-latency-slasher}

As we explained above, the Istio sidecars generally have fairly effective first-level caching. They can serve the majority of their traffic from cache. Mixer provides a much greater shared pool of second-level cache, which helps Mixer contribute to a lower average per-request latency.

While it's busy cutting down latency, Mixer is also inherently cutting down the number of calls your mesh makes to infrastructure backends. Depending on how you're paying for these backends, this might end up saving you some cash by cutting down the effective QPS to the backends.
## Work ahead {#work-ahead}

We have opportunities ahead to continue improving the system in many ways.
### Configuration canaries {#configuration-canaries}

Mixer is highly scaled so it is generally resistant to individual instance failures. However, Mixer is still susceptible to cascading
failures in the case when a poison configuration is deployed which causes all Mixer instances to crash basically at the same time
(yeah, that would be a bad day). To prevent this from happening, configuration changes can be canaried to a small set of Mixer instances,
and then more broadly rolled out.

Mixer doesn't yet do canarying of configuration changes, but we expect this to come online as part of Istio's ongoing work on reliable
configuration distribution.
### Cache tuning {#cache-tuning}

We have yet to fine-tune the sizes of the sidecar and Mixer caches. This work will focus on achieving the highest performance possible using the least amount of resources.
### Cache sharing {#cache-sharing}

At the moment, each Mixer instance operates independently of all other instances. A request handled by one Mixer instance will not leverage data cached in a different instance. We will eventually experiment with a distributed cache such as memcached or Redis in order to provide a much larger mesh-wide shared cache, and further reduce the number of calls to infrastructure backends.
### Sharding {#Sharding}

In very large meshes, the load on Mixer can be great. There can be a large number of Mixer instances, each straining to keep caches primed to
satisfy incoming traffic. We expect to eventually introduce intelligent sharding such that Mixer instances become slightly specialized in
handling particular data streams in order to increase the likelihood of cache hits. In other words, sharding helps improve cache
efficiency by routing related traffic to the same Mixer instance over time, rather than randomly dispatching to
any available Mixer instance.
## Conclusion {#conclusion}

Practical experience at Google showed that the model of a slim sidecar proxy and a large shared caching control plane intermediary hits a sweet
spot, delivering excellent perceived availability and latency. We've taken the lessons learned there and applied them to create more sophisticated and
effective caching, prefetching, and buffering strategies in Istio. We've also optimized the communication protocols to reduce overhead when a cache miss does occur.

Mixer is still young. As of Istio 0.3, we haven't really done significant performance work within Mixer itself. This means when a request misses the sidecar
cache, we spend more time in Mixer to respond to requests than we should. We're doing a lot of work to improve this in coming months to reduce the overhead
that Mixer imparts in the synchronous precondition check case.

We hope this post makes you appreciate the inherent benefits that Mixer brings to Istio.
Don't hesitate to post comments or questions to [istio-policies-and-telemetry@](https://groups.google.com/forum/#!forum/istio-policies-and-telemetry).


@ -1,6 +1,6 @@
---
title: Sidestepping Dependency Ordering with AppSwitch
description: Addressing application startup ordering and startup latency using AppSwitch.
publishdate: 2019-01-14
subtitle:
attribution: Dinesh Subhraveti (AppOrbit and Columbia University)
@ -8,74 +8,78 @@ keywords: [appswitch,performance]
target_release: 1.0
---
We are going through an interesting cycle of application decomposition and recomposition. While the microservice paradigm is driving monolithic applications to be broken into separate individual services, the service mesh approach is helping them to be connected back together into well-structured applications. As such, microservices are logically separate but not independent. They are usually closely interdependent and taking them apart introduces many new concerns such as the need for mutual authentication between services. Istio directly addresses most of those issues.

## Dependency ordering problem {#dependency-ordering-problem}
An issue that arises due to application decomposition and one that Istio doesn't address is dependency ordering -- bringing up individual services of an application in an order that guarantees that the application as a whole comes up quickly and correctly. In a monolithic application, with all its components built-in, dependency ordering between the components is enforced by internal locking mechanisms. But with individual services potentially scattered across the cluster in a service mesh, starting a service first requires checking that the services it depends on are up and available.

Dependency ordering is deceptively nuanced, with a host of interrelated problems. Ordering individual services requires having the dependency graph of the services so that they can be brought up starting from leaf nodes back to the root nodes. It is not easy to construct such a graph and keep it updated over time as interdependencies evolve with the behavior of the application. Even if the dependency graph is somehow provided, enforcing the ordering itself is not easy. Simply starting the services in the specified order obviously won't do. A service may have started but not be ready to accept connections yet. This is the problem with docker-compose's `depends_on` tag, for example.

Apart from introducing sufficiently long sleeps between service startups, a common pattern that is often used is to check for readiness of dependencies before starting a service. In Kubernetes, this could be done with a wait script as part of the init container of the pod. However, that means that the entire application would be held up until all its dependencies come alive. Sometimes applications spend several minutes initializing themselves on startup before making their first outbound connection. Not allowing a service to start at all adds substantial overhead to the overall startup time of the application. Also, the strategy of waiting on the init container won't work for the case of multiple interdependent services within the same pod.
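A minimal sketch of that wait-script pattern (the service name, readiness check, and images are illustrative assumptions, not part of the original post):

{{< text yaml >}}
apiVersion: v1
kind: Pod
metadata:
  name: node-instance
spec:
  initContainers:
  - name: wait-for-dmgr
    image: busybox:1.28
    # Block pod startup until the deployment manager's Service name resolves.
    command: ['sh', '-c', 'until nslookup dmgr; do echo waiting for dmgr; sleep 5; done']
  containers:
  - name: websphere-node
    image: example/websphere-nd-node:latest   # hypothetical application image
{{< /text >}}

As the post notes, this holds the whole pod back until the dependency appears, and it cannot express ordering between containers within the same pod.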
### Example scenario: IBM WebSphere ND
### 示例场景IBM WebSphere ND{#example-scenario-IBM-WebSphere}
Let us consider IBM WebSphere ND -- a widely deployed application middleware -- to grok these problems more closely. It is a fairly complex framework in itself and consists of a central component called deployment manager (`dmgr`) that manages a set of node instances. It uses UDP to negotiate cluster membership among the nodes and requires that deployment manager is up and operational before any of the node instances can come up and join the cluster.
IBM WebSphere ND 是一个常见的应用程序中间件,通过对它的观察,能够更好地理解这种问题。它本身就是一个相当复杂的框架,由一个名为 Deployment manager`dmgr`)的中央组件组成,它管理一组节点实例。它使用 UDP 协商节点之间的集群成员资格,并要求部署管理器在任何节点实例出现并加入集群之前已启动并可运行。
Why are we talking about a traditional application in the modern cloud-native context? It turns out that there are significant gains to be had by enabling them to run on the Kubernetes and Istio platforms. Essentially it's a part of the modernization journey that allows running traditional apps alongside green-field apps on the same modern platform to facilitate interoperation between the two. In fact, WebSphere ND is a demanding application. It expects a consistent network environment with specific network interface attributes etc. AppSwitch is equipped to take care of those requirements. For the purpose of this blog however, I'll focus on the dependency ordering requirement and how AppSwitch addresses it.
为什么我们在现代云原生环境中讨论传统应用程序?事实证明,通过使它们能够在 Kubernetes 和 Istio 平台上运行可以获得显著的收益。从本质上讲它是现代化之旅的一部分它允许在同一现代平台上运行传统应用程序和全新的现代应用程序以促进两者之间的互操作。实际上WebSphere ND 是一个要求很高的应用程序。它期望具有特定网络接口属性的一致网络环境等。AppSwitch 可以满足这些要求。本博客的将重点关注依赖顺序需求以及 AppSwitch 在这方面的解决方法。
Simply deploying `dmgr` and node instances as pods on a Kubernetes cluster does not work. `dmgr` and the node instances happen to have a lengthy initialization process that can take several minutes. If they are all co-scheduled, the application typically ends up in a funny state. When a node instance comes up and finds that `dmgr` is missing, it would take an alternate startup path. Instead, if it had exited immediately, Kubernetes crash-loop would have taken over and perhaps the application would have come up. But even in that case, it turns out that a timely startup is not guaranteed.
在 Kubernetes 集群上简单地把 `dmgr` 和节点实例部署为 Pod 是行不通的。`dmgr` 和节点实例都有很长的初始化过程,可能需要几分钟。如果它们被同时调度,应用程序通常会陷入一种尴尬的状态:当节点实例启动后发现 `dmgr` 不存在,它会走上另一条启动路径。如果它当时能立即退出,Kubernetes 的崩溃循环机制就会接管,应用程序或许还能正常启动。但即使在那种情况下,也无法保证应用能够及时启动。
One `dmgr` along with its node instances is a basic deployment configuration for WebSphere ND. Applications like IBM Business Process Manager that are built on top of WebSphere ND running in production environments include several other services. In those configurations, there could be a chain of interdependencies. Depending on the applications hosted by the node instances, there may be an ordering requirement among them as well. With long service initialization times and crash-loop restarts, there is little chance for the application to start in any reasonable length of time.
一个 `dmgr` 加上它的节点实例只是 WebSphere ND 的基本部署配置。在生产环境中,构建在 WebSphere ND 之上的应用程序(例如 IBM Business Process Manager)还包含其他一些服务,这些配置中可能存在一连串的相互依赖。根据节点实例上托管的应用程序,它们之间也可能有排序要求。在服务初始化时间很长、又叠加崩溃循环重启的情况下,应用程序几乎不可能在合理的时间内完成启动。
### Sidecar dependency in Istio
### Istio 中的 Sidecar 依赖{#sidecar-dependency-in-Istio}
Istio itself is affected by a version of the dependency ordering problem. Since connections into and out of a service running under Istio are redirected through its sidecar proxy, an implicit dependency is created between the application service and its sidecar. Unless the sidecar is fully operational, all requests from and to the service get dropped.
Istio 本身也受到依赖排序问题的一种变体的影响。由于在 Istio 下运行的服务,其进出连接都会被重定向到 sidecar 代理,这就在应用服务与其 sidecar 之间形成了一个隐式依赖。除非 sidecar 完全就绪,否则该服务所有进出的请求都会被丢弃。
## Dependency ordering with AppSwitch
## 使用 AppSwitch 进行依赖性排序{#dependency-ordering-with-AppSwitch}
So how do we go about addressing these issues? One way is to defer it to the applications and say that they are supposed to be "well behaved" and implement appropriate logic to make themselves immune to startup order issues. However, many applications (especially traditional ones) either timeout or deadlock if misordered. Even for new applications, implementing one off logic for each service is substantial additional burden that is best avoided. Service mesh needs to provide adequate support around these problems. After all, factoring out common patterns into an underlying framework is really the point of service mesh.
那么我们该如何解决这些问题呢?一种办法是把问题推给应用程序,要求它们“表现良好”,自行实现适当的逻辑,使自己不受启动顺序问题的影响。然而,许多应用程序(尤其是传统应用程序)在启动顺序错乱时要么超时、要么死锁。即便是新应用程序,为每个服务实现一次性的专门逻辑也是一笔不小的额外负担,最好避免。服务网格需要围绕这些问题提供足够的支持。毕竟,把常见模式下沉到底层框架,正是服务网格的意义所在。
[AppSwitch](http://appswitch.io) explicitly addresses dependency ordering. It sits on the control path of the application's network interactions between clients and services in a cluster and knows precisely when a service becomes a client by making the `connect` call and when a particular service becomes ready to accept connections by making the `listen` call. Its _service router_ component disseminates information about these events across the cluster and arbitrates interactions among clients and servers. That is how AppSwitch implements functionality such as load balancing and isolation in a simple and efficient manner. Leveraging the same strategic location of the application's network control path, it is conceivable that the `connect` and `listen` calls made by those services can be lined up at a finer granularity rather than coarsely sequencing entire services as per a dependency graph. That would effectively solve the multilevel dependency problem and speed up application startup.
[AppSwitch](http://appswitch.io) 明确地解决了依赖排序问题。它位于集群中客户端与服务之间网络交互的控制路径上:通过 `connect` 调用,它能准确知道某个服务何时成为客户端;通过 `listen` 调用,它能准确知道某个服务何时准备好接受连接。它的 _service router_ 组件在集群中传播这些事件的信息,并在客户端和服务器之间进行仲裁。AppSwitch 正是以这种简单高效的方式实现了负载均衡和隔离等功能。利用应用程序网络控制路径上这一相同的关键位置,可以设想把这些服务发出的 `connect` 和 `listen` 调用以更细的粒度排列起来,而不是按照依赖图对整个服务进行粗粒度的排序。这将有效地解决多级依赖问题,并加速应用程序启动。
But that still requires a dependency graph. A number of products and tools exist to help with discovering service dependencies. But they are typically based on passive monitoring of network traffic and cannot provide the information beforehand for any arbitrary application. Network level obfuscation due to encryption and tunneling also makes them unreliable. The burden of discovering and specifying the dependencies ultimately falls to the developer or the operator of the application. As it is, even consistency checking a dependency specification is itself quite complex and any way to avoid requiring a dependency graph would be most desirable.
但这仍然需要一个依赖图。市面上有许多产品和工具可以帮助发现服务依赖,但它们通常基于对网络流量的被动监控,无法为任意应用程序事先提供这类信息。加密和隧道带来的网络层混淆也使它们不够可靠。发现和指定依赖关系的负担最终还是落在应用程序的开发人员或运维人员身上。实际上,仅仅对依赖规范做一致性检查本身就已相当复杂,因此任何能够避免依赖图的办法都是非常可取的。
The point of a dependency graph is to know which clients depend on a particular service so that those clients can then be made to wait for the respective service to become live. But does it really matter which specific clients? Ultimately one tautology that always holds is that all clients of a service have an implicit dependency on the service. That's what AppSwitch leverages to get around the requirement. In fact, that sidesteps dependency ordering altogether. All services of the application can be co-scheduled without regard to any startup order. Interdependencies among them automatically work themselves out at the granularity of individual requests and responses, resulting in quick and correct application startups.
依赖图的意义在于知道哪些客户端依赖某个特定服务,以便让这些客户端等待相应服务就绪。但具体是哪些客户端真的重要吗?归根结底,有一条永远成立的同义反复:一个服务的所有客户端都隐式地依赖这个服务。AppSwitch 正是利用这一点绕开了对依赖图的要求。事实上,这完全避开了依赖排序:应用程序的所有服务可以同时调度,无需考虑任何启动顺序。它们之间的相互依赖会在单个请求和响应的粒度上自动理顺,从而实现快速、正确的应用程序启动。
### AppSwitch model and constructs
### AppSwitch 模型和构造{#AppSwitch-model-and-constructs}
Now that we have a conceptual understanding of AppSwitch's high-level approach, let's look at the constructs involved. But first a quick summary of the usage model is in order. Even though it is written for a different context, reviewing my earlier [blog](/zh/blog/2018/delayering-istio/) on this topic would be useful as well. For completeness, let me also note AppSwitch doesn't bother with non-network dependencies. For example it may be possible for two services to interact using IPC mechanisms or through the shared file system. Processes with deep ties like that are typically part of the same service anyway and don't require framework's intervention for ordering.
既然我们对 AppSwitch 的高层方法有了概念上的理解,接下来看看所涉及的构造。但首先需要快速概括一下它的使用模型。尽管是针对不同的上下文撰写的,回顾我之前关于这一主题的 [博客](/zh/blog/2018/delayering-istio/) 也会有所帮助。为了完整起见,还要说明一点:AppSwitch 不处理非网络依赖。例如,两个服务可能通过 IPC 机制或共享文件系统进行交互。有这种深度耦合的进程通常本来就属于同一个服务,不需要框架介入来为它们排序。
At its core, AppSwitch is built on a mechanism that allows instrumenting the BSD socket API and other related calls like `fcntl` and `ioctl` that deal with sockets. As interesting as the details of its implementation are, it's going to distract us from the main topic, so I'd just summarize the key properties that distinguish it from other implementations. (1) It's fast. It uses a combination of `seccomp` filtering and binary instrumentation to aggressively limit intervening with the application's normal execution. AppSwitch is particularly suited for service mesh and application networking use cases given that it implements those features without ever having to actually touch the data. In contrast, network level approaches incur per-packet cost. Take a look at this [blog](/zh/blog/2018/delayering-istio/) for some of the performance measurements. (2) It doesn't require any kernel support, kernel module or a patch and works on standard distro kernels (3) It can run as regular user (no root). In fact, the mechanism can even make it possible to run [Docker daemon without root](https://linuxpiter.com/en/materials/2478) by removing root requirement to network containers (4) It doesn't require any changes to the applications whatsoever and works for any type of application -- from WebSphere ND and SAP to custom C apps to statically linked Go apps. Only requirement at this point is Linux/x86.
AppSwitch 的核心是一种能够对 BSD Socket API,以及 `fcntl`、`ioctl` 等处理 socket 的相关调用进行插桩的机制。它的实现细节固然有趣,但为了不偏离本文主题,这里仅概括使其有别于其他实现的几个关键特性。
1)速度很快。它结合使用 `seccomp` 过滤和二进制插桩,尽量减少对应用程序正常执行的干预。AppSwitch 特别适合服务网格和应用网络的用例,因为它在实现这些功能时完全不需要接触数据本身;相比之下,网络层的方案会带来每个数据包的开销。相关性能测量可以参考这篇[博客](/zh/blog/2018/delayering-istio/)。
2)它不需要任何内核支持、内核模块或补丁,可以在标准的发行版内核上运行;
3)它可以作为普通用户运行(非 root)。事实上,该机制甚至可以通过去掉对网络容器的 root 要求,来运行[非 root 的 Docker 守护进程](https://linuxpiter.com/en/materials/2478);
4)它不需要对应用程序做任何更改,并且适用于任何类型的应用程序,从 WebSphere ND 和 SAP,到自定义 C 应用程序,再到静态链接的 Go 应用程序。目前唯一的运行要求是 Linux/x86。
### Decoupling services from their references
### 将服务与其引用分离{#decoupling-services-from-their-references}
AppSwitch is built on the fundamental premise that applications should be decoupled from their references. The identity of applications is traditionally derived from the identity of the host on which they run. However, applications and hosts are very different objects that need to be referenced independently. Detailed discussion around this topic along with a conceptual foundation of AppSwitch is presented in this [research paper](https://arxiv.org/abs/1711.02294).
AppSwitch 建立在应用程序应与其引用分离的基本前提之上。传统上,应用程序的标识源自其运行所在主机的标识。然而,应用程序和主机是两类截然不同的对象,需要被独立引用。这篇 [研究论文](https://arxiv.org/abs/1711.02294) 详细讨论了这一主题,并给出了 AppSwitch 的概念基础。
The central AppSwitch construct that achieves the decoupling between services objects and their identities is _service reference_ (_reference_, for short). AppSwitch implements service references based on the API instrumentation mechanism outlined above. A service reference consists of an IP:port pair (and optionally a DNS name) and a label-selector that selects the service represented by the reference and the clients to which this reference applies. A reference supports a few key properties. (1) It can be named independently of the name of the object it refers to. That is, a service may be listening on an IP and port but a reference allows that service to be reached on any other IP and port chosen by the user. This is what allows AppSwitch to run traditional applications captured from their source environments with static IP configurations to run on Kubernetes by providing them with necessary IP addresses and ports regardless of the target network environment. (2) It remains unchanged even if the location of the target service changes. A reference automatically redirects itself as its label-selector now resolves to the new instance of the service (3) Most important for this discussion, a reference remains valid even as the target service is coming up.
实现服务对象与其标识之间解耦的核心 AppSwitch 构造是 _service reference_(服务引用,简称 _reference_)。AppSwitch 基于上面介绍的 API 插桩机制实现服务引用。一个服务引用由一个 IP:端口对(以及可选的 DNS 名称)和一个标签选择器组成,标签选择器用来选择该引用所代表的服务,以及该引用适用的客户端。引用具有几个关键属性。1)它的名称可以独立于它所指向的对象的名称。也就是说,服务可能在某个 IP 和端口上监听,但引用允许客户端通过用户选择的任何其他 IP 和端口来访问该服务。正是这一点让 AppSwitch 能够把从源环境中捕获的、带有静态 IP 配置的传统应用程序放到 Kubernetes 上运行:无论目标网络环境如何,都可以为它们提供所需的 IP 地址和端口。2)即使目标服务的位置发生变化,引用也保持不变,因为其标签选择器会解析到新的服务实例,引用会自动重定向过去。3)对本文讨论最重要的一点是:即使目标服务还在启动过程中,引用也依然有效。
To facilitate discovering services that can be accessed through service references, AppSwitch provides an _auto-curated service registry_. The registry is automatically kept up to date as services come and go across the cluster based on the network API that AppSwitch tracks. Each entry in the registry consists of the IP and port where the respective service is bound. Along with that, it includes a set of labels indicating the application to which this service belongs, the IP and port that the application passed through the socket API when creating the service, the IP and port where AppSwitch actually bound the service on the underlying host on behalf of the application etc. In addition, applications created under AppSwitch carry a set of labels passed by the user that describe the application together with a few default system labels indicating the user that created the application and the host where the application is running etc. These labels are all available to be expressed in the label-selector carried by a service reference. A service in the registry can be made accessible to clients by creating a service reference. A client would then be able to reach the service at the references name (IP:port). Now lets look at how AppSwitch guarantees that the reference remains valid even when the target service has not yet come up.
为了便于发现可以通过服务引用访问的服务,AppSwitch 提供了一个 _自动维护的服务注册表_。基于 AppSwitch 跟踪的网络 API,当服务在集群中上线或下线时,注册表会自动保持最新。注册表中的每个条目都包含相应服务绑定的 IP 和端口。除此之外,条目还包含一组标签,用来指明该服务所属的应用程序、应用程序创建服务时通过 Socket API 传入的 IP 和端口,以及 AppSwitch 代表该应用程序在底层主机上实际绑定该服务的 IP 和端口等。此外,在 AppSwitch 下创建的应用程序还带有一组由用户传入、用于描述应用程序的标签,以及一些默认的系统标签,用来指明创建应用程序的用户、运行应用程序的主机等。这些标签都可以在服务引用所携带的标签选择器中使用。通过创建服务引用,就可以让客户端访问注册表中的某个服务,客户端随后便能通过引用的名称(IP:端口)访问该服务。现在让我们看看,在目标服务尚未启动完成时,AppSwitch 是如何保证服务引用依然有效的。
### Non-blocking requests
### 非阻塞请求{#non-blocking-requests}
AppSwitch leverages the semantics of the BSD socket API to ensure that service references appear valid from the perspective of clients as corresponding services come up. When a client makes a blocking connect call to another service that has not yet come up, AppSwitch blocks the call for a certain time waiting for the target service to become live. Since it is known that the target service is a part of the application and is expected to come up shortly, making the client block rather than returning an error such as `ECONNREFUSED` prevents the application from failing to start. If the service doesn't come up within time, an error is returned to the application so that framework-level mechanisms like Kubernetes crash-loop can kick in.
AppSwitch 利用 BSD Socket API 的语义,确保在相应服务启动的过程中,服务引用在客户端看来始终有效。当客户端对一个尚未启动的服务发起阻塞式 connect 调用时,AppSwitch 会把该调用阻塞一段时间,等待目标服务变为可用。由于已知目标服务是应用程序的一部分、并且预计很快就会启动,让客户端阻塞等待,而不是返回 `ECONNREFUSED` 之类的错误,可以避免应用程序启动失败。如果服务未能及时启动,则会向应用程序返回错误,以便 Kubernetes 崩溃循环之类的框架级机制能够介入。
If the client request is marked as non-blocking, AppSwitch handles that by returning `EAGAIN` to inform the application to retry rather than give up. Once again, that is in-line with the semantics of socket API and prevents failures due to startup races. AppSwitch essentially enables the retry logic already built into applications in support of the BSD socket API to be transparently repurposed for dependency ordering.
如果客户端请求被标记为非阻塞,AppSwitch 会返回 `EAGAIN`,告知应用程序应当重试而不是放弃。这同样符合 Socket API 的语义,并能避免由启动竞争导致的失败。实际上,AppSwitch 是把应用程序为支持 BSD Socket API 而本来就内置的重试逻辑,透明地重新用于依赖排序。
### Application timeouts
### 应用程序超时{#application-timeouts}
What if the application times out based on its own internal timer? Truth be told, AppSwitch can also fake the application's perception of time if wanted but that would be overstepping and actually unnecessary. Application decides and knows best how long it should wait and it's not appropriate for AppSwitch to mess with that. Application timeouts are conservatively long and if the target service still hasn't come up in time, it is unlikely to be a dependency ordering issue. There must be something else going on that should not be masked.
如果应用程序根据自己内部的计时器超时了怎么办?说实话,AppSwitch 在需要时甚至可以伪造应用程序对时间的感知,但那样做既越界又没有必要。应用程序自己最清楚应该等待多久,AppSwitch 不适合干预这一点。应用程序的超时时间通常设置得相当宽松,如果目标服务到那时仍未启动,多半就不是依赖排序的问题了,而是出现了其他不应被掩盖的问题。
### Wildcard service references for sidecar dependency
### 为 Sidecar 提供服务引用的通配符支持{#wildcard-service-references-for-sidecar-dependency}
Service references can be used to address the Istio sidecar dependency issue mentioned earlier. AppSwitch allows the IP:port specified as part of a service reference to be a wildcard. That is, the service reference IP address can be a netmask indicating the IP address range to be captured. If the label selector of the service reference points to the sidecar service, then all outgoing connections of any application for which this service reference is applied, will be transparently redirected to the sidecar. And of course, the service reference remains valid while sidecar is still coming up and the race is removed.
服务引用可以用来解决前面提到的 Istio sidecar 依赖问题。AppSwitch 允许把服务引用中指定的 IP:端口写成通配符形式,也就是说,服务引用的 IP 地址可以是一个网络掩码,用来表示需要捕获的 IP 地址范围。如果该服务引用的标签选择器指向 sidecar 服务,那么凡是应用了这个服务引用的应用程序,其所有传出连接都会被透明地重定向到 sidecar。当然,在 sidecar 还在启动的过程中,服务引用依然有效,竞争问题也随之消除。
Using service references for sidecar dependency ordering also implicitly redirects the application's connections to the sidecar without requiring iptables and attendant privilege issues. Essentially it works as if the application is directly making connections to the sidecar rather than the target destination, leaving the sidecar in charge of what to do. AppSwitch would interject metadata about the original destination etc. into the data stream of the connection using the proxy protocol that the sidecar could decode before passing the connection through to the application. Some of these details were discussed [here](/zh/blog/2018/delayering-istio/). That takes care of outbound connections but what about incoming connections? With all services and their sidecars running under AppSwitch, any incoming connections that would have come from remote nodes would be redirected to their respective remote sidecars. So nothing special to do about incoming connections.
使用服务引用来处理 sidecar 的依赖排序,还会隐式地把应用程序的连接重定向到 sidecar,既不需要 iptables,也没有随之而来的权限问题。其效果相当于应用程序直接连接到 sidecar 而不是目标地址,由 sidecar 决定后续如何处理。AppSwitch 会使用代理协议,把原始目标地址等元数据注入连接的数据流中,sidecar 在把连接转交给应用程序之前可以先解码这些信息。其中一些细节已经在 [这里](/zh/blog/2018/delayering-istio/) 讨论过。出站连接解决了,那么入站连接呢?由于所有服务及其 sidecar 都运行在 AppSwitch 之下,来自远程节点的任何入站连接都会被重定向到相应的远程 sidecar,因此入站连接不需要任何特殊处理。
## Summary
## 总结{#summary}
Dependency ordering is a pesky problem. This is mostly due to lack of access to fine-grain application-level events around inter-service interactions. Addressing this problem would have normally required applications to implement their own internal logic. But AppSwitch makes it possible to instrument those internal application events without requiring application changes. AppSwitch then leverages the ubiquitous support for the BSD socket API to sidestep the requirement of ordering dependencies.
依赖排序是一个棘手的问题,这主要是因为无法获取服务间交互相关的细粒度应用级事件。解决这个问题通常需要应用程序自行实现内部逻辑。而 AppSwitch 无需修改应用程序,就能对这些内部应用事件进行插桩,并进一步利用 BSD Socket API 的普遍支持,绕开了对依赖排序的需求。
## Acknowledgements
## 致谢{#acknowledgements}
Thanks to Eric Herness and team for their insights and support with IBM WebSphere and BPM products as we modernized them onto the Kubernetes platform and to Mandar Jog, Martin Taillefer and Shriram Rajagopalan for reviewing early drafts of this blog.
感谢 Eric Herness 及其团队在我们将 IBM WebSphere 和 BPM 产品现代化迁移到 Kubernetes 平台的过程中所提供的洞见和支持,也感谢 Mandar Jog、Martin Taillefer 和 Shriram Rajagopalan 对本文早期草稿的评审。


@ -1,34 +1,32 @@
---
title: Deploy a Custom Ingress Gateway Using Cert-Manager
description: Describes how to deploy a custom ingress gateway using cert-manager manually.
subtitle: Custom ingress gateway
title: 使用 Cert-Manager 部署一个自定义 Ingress 网关
description: 如何使用 cert-manager 手工部署一个自定义 Ingress 网关。
subtitle: 自定义 Ingress 网关
publishdate: 2019-01-10
keywords: [ingress,traffic-management]
attribution: Julien Senon
target_release: 1.0
---
This post provides instructions to manually create a custom ingress [gateway](/zh/docs/reference/config/networking/gateway/) with automatic provisioning of certificates based on cert-manager.
本文介绍了手工创建自定义 Ingress [Gateway](/zh/docs/reference/config/networking/gateway/) 的过程,其中使用 cert-manager 完成了证书的自动管理。
A custom ingress gateway can be used to provision a separate `loadbalancer` in order to isolate traffic.
自定义 Ingress 网关在使用不同负载均衡器来隔离通信的情况下很有帮助。
## Before you begin
## 开始之前{#before-you-begin}
* Setup Istio by following the instructions in the
[Installation guide](/zh/docs/setup/).
* Setup `cert-manager` with helm [chart](https://github.com/helm/charts/tree/master/stable/cert-manager#installing-the-chart)
* We will use `demo.mydemo.com` for our example,
it must be resolved with your DNS
* 根据 [安装指南](/zh/docs/setup/) 完成 Istio 的部署。
* 用 Helm [Chart](https://github.com/helm/charts/tree/master/stable/cert-manager#installing-the-chart) 部署 `cert-manager`
* 我们会使用 `demo.mydemo.com` 进行演示,因此你的 DNS 解析要能够解析这个域名。
## Configuring the custom ingress gateway
## 配置自定义 Ingress 网关{#configuring-the-custom-ingress-gateway}
1. Check if [cert-manager](https://github.com/helm/charts/tree/master/stable/cert-manager) was installed using Helm with the following command:
1. 用下面的 `helm` 命令检查 [cert-manager](https://github.com/helm/charts/tree/master/stable/cert-manager) 是否已经完成部署:
{{< text bash >}}
$ helm ls
{{< /text >}}
The output should be similar to the example below and show cert-manager with a `STATUS` of `DEPLOYED`:
该命令的输出大概如下所示,其中的 `cert-manager``STATUS` 字段应该是 `DEPLOYED`
{{< text plain >}}
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
@ -36,10 +34,10 @@ The creation of custom ingress gateway could be used in order to have different
cert 1 Wed Oct 24 14:08:36 2018 DEPLOYED cert-manager-v0.6.0-dev.2 v0.6.0-dev.2 istio-system
{{< /text >}}
1. To create the cluster's issuer, apply the following configuration:
1. 要创建集群的证书签发者,可以使用如下的配置:
{{< tip >}}
Change the cluster's [issuer](https://cert-manager.readthedocs.io/en/latest/reference/issuers.html) provider with your own configuration values. The example uses the values under `route53`.
用自己的配置修改集群的 [证书签发者](https://cert-manager.readthedocs.io/en/latest/reference/issuers.html)。例子中使用的是 `route53`
{{< /tip >}}
{{< text yaml >}}
@ -50,15 +48,15 @@ The creation of custom ingress gateway could be used in order to have different
namespace: kube-system
spec:
acme:
# The ACME server URL
# ACME 服务器地址
server: https://acme-v02.api.letsencrypt.org/directory
# Email address used for ACME registration
# ACME 注册的 Email 地址
email: <REDACTED>
# Name of a secret used to store the ACME account private key
# Secret 的名字,用于保存 ACME 账号的私钥
privateKeySecretRef:
name: letsencrypt-demo
dns01:
# Here we define a list of DNS-01 providers that can solve DNS challenges
# 这里定义了一个 DNS-01 provider 的列表,用于应对 DNS Challenge。
providers:
- name: your-dns
route53:
@ -69,7 +67,7 @@ The creation of custom ingress gateway could be used in order to have different
key: secret-access-key
{{< /text >}}
1. If you use the `route53` [provider](https://cert-manager.readthedocs.io/en/latest/tasks/acme/configuring-dns01/route53.html), you must provide a secret to perform DNS ACME Validation. To create the secret, apply the following configuration file:
1. 如果使用的是 `route53` [provider](https://cert-manager.readthedocs.io/en/latest/tasks/acme/configuring-dns01/route53.html),必须提供一个 Secret 来进行 DNS 的 ACME 验证。可以使用下面的配置来创建需要的 Secret
{{< text yaml >}}
apiVersion: v1
@ -81,7 +79,7 @@ The creation of custom ingress gateway could be used in order to have different
secret-access-key: <REDACTED BASE64>
{{< /text >}}
1. Create your own certificate:
1. 创建自己的证书:
{{< text yaml >}}
apiVersion: certmanager.k8s.io/v1alpha1
@ -105,9 +103,9 @@ The creation of custom ingress gateway could be used in order to have different
secretName: istio-customingressgateway-certs
{{< /text >}}
Make a note of the value of `secretName` since a future step requires it.
记录一下 `secretName` 的值,后面会使用它。
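For reference, a certificate resource for the cert-manager version used here (v0.6, API `certmanager.k8s.io/v1alpha1`) might look roughly like the sketch below; the resource name and the `your-dns` provider are assumptions carried over from the issuer example, so adjust them to your own setup.

{{< text yaml >}}
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: demo-certificate
  namespace: istio-system
spec:
  secretName: istio-customingressgateway-certs   # referenced later by the gateway
  issuerRef:
    name: letsencrypt-demo
    kind: ClusterIssuer
  commonName: demo.mydemo.com
  dnsNames:
  - demo.mydemo.com
  acme:
    config:
    - dns01:
        provider: your-dns
      domains:
      - demo.mydemo.com
{{< /text >}}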
1. To scale automatically, declare a new horizontal pod autoscaler with the following configuration:
1. 要进行自动扩容,可以新建一个 HPA 对象:
{{< text yaml >}}
apiVersion: autoscaling/v1
@ -129,16 +127,16 @@ The creation of custom ingress gateway could be used in order to have different
desiredReplicas: 1
{{< /text >}}
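The horizontal pod autoscaler for the custom gateway follows the standard `autoscaling/v1` shape; a minimal sketch, where the deployment name `my-ingressgateway` is an assumption, could be:

{{< text yaml >}}
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-ingressgateway
  namespace: istio-system
spec:
  minReplicas: 1
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-ingressgateway     # the custom gateway deployment applied in the next step
  targetCPUUtilizationPercentage: 80
{{< /text >}}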
1. Apply your deployment with declaration provided in the [yaml definition](/zh/blog/2019/custom-ingress-gateway/deployment-custom-ingress.yaml)
1. 使用 [附件 YAML 中的定义](/zh/blog/2019/custom-ingress-gateway/deployment-custom-ingress.yaml)进行部署。
{{< tip >}}
The annotations used, for example `aws-load-balancer-type`, only apply for AWS.
其中类似 `aws-load-balancer-type` 这样的注解,只对 AWS 生效。
{{< /tip >}}
1. Create your service:
1. 创建你的服务:
{{< warning >}}
The `NodePort` used needs to be an available port.
`NodePort` 需要是一个可用端口。
{{< /warning >}}
{{< text yaml >}}
@ -172,7 +170,7 @@ The creation of custom ingress gateway could be used in order to have different
port: 31400
{{< /text >}}
1. Create your Istio custom gateway configuration object:
1. 创建你的自定义 Ingress 网关配置对象:
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
@ -205,7 +203,7 @@ The creation of custom ingress gateway could be used in order to have different
serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
{{< /text >}}
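The gateway object ties the custom gateway pods to the certificate secret mounted at `/etc/istio/ingressgateway-certs`; a sketch of what it might look like follows, where the selector label `istio: my-ingressgateway` is an assumption that must match your deployment's labels.

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istio-custom-gateway
  namespace: default
spec:
  selector:
    istio: my-ingressgateway          # must match the labels of the custom gateway pods
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    hosts:
    - "demo.mydemo.com"
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
{{< /text >}}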
1. Link your `istio-custom-gateway` with your `VirtualService`:
1. 使用 `VirtualService` 连接 `istio-custom-gateway`
{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
@ -223,7 +221,7 @@ The creation of custom ingress gateway could be used in order to have different
host: my-demoapp
{{< /text >}}
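A sketch of the kind of virtual service that binds the application to `istio-custom-gateway` is shown below; the service name `my-demoapp` comes from the fragment above, everything else is illustrative.

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-demoapp
spec:
  hosts:
  - "demo.mydemo.com"
  gateways:
  - istio-custom-gateway            # the gateway created in the previous step
  http:
  - route:
    - destination:
        host: my-demoapp
{{< /text >}}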
1. Correct certificate is returned by the server and it is successfully verified (_SSL certificate verify ok_ is printed):
1. 服务器返回了正确的证书,并成功完成验证(`SSL certificate verify ok`
{{< text bash >}}
$ curl -v `https://demo.mydemo.com`
@ -231,4 +229,4 @@ The creation of custom ingress gateway could be used in order to have different
SSL certificate verify ok.
{{< /text >}}
**Congratulations!** You can now use your custom `istio-custom-gateway` [gateway](/zh/docs/reference/config/networking/gateway/) configuration object.
**恭喜你!** 现在你可以使用自定义的 `istio-custom-gateway` [网关](/zh/docs/reference/config/networking/gateway/) 对象了。


@ -1,122 +1,121 @@
---
title: Egress Gateway Performance Investigation
description: Verifies the performance impact of adding an egress gateway.
title: Egress gateway 性能测试
description: 评估加入 Egress gateway 对性能造成的影响。
publishdate: 2019-01-31
subtitle: An Istio Egress Gateway performance assessment
subtitle: Istio Egress gateway 性能评估
attribution: Jose Nativio, IBM
keywords: [performance,traffic-management,egress,mongo]
target_release: 1.0
---
The main objective of this investigation was to determine the impact on performance and resource utilization when an egress gateway is added in the service mesh to access an external service (MongoDB, in this case). The steps to configure an egress gateway for an external MongoDB are described in the blog [Consuming External MongoDB Services](/zh/blog/2018/egress-mongo/).
本次测试的主要目的,是评估在服务网格中加入 Egress gateway 来访问外部服务(本例中为 MongoDB)时,对性能和资源使用造成的影响。博客 [使用外部 MongoDB 服务](/zh/blog/2018/egress-mongo/) 中介绍了为外部 MongoDB 配置 Egress gateway 的具体步骤。
The application used for this investigation was the Java version of Acmeair, which simulates an airline reservation system. This application is used in the Performance Regression Patrol of Istio daily builds, but on that setup the microservices have been accessing the external MongoDB directly via their sidecars, without an egress gateway.
本次测试使用的应用是 Acmeair 的 Java 版,它模拟一个航空订票系统。Istio 每日构建的性能回归测试中也会用到这个应用,但在那套环境中,各个微服务是直接通过各自的 Sidecar 访问外部 MongoDB 的,并没有使用 Egress gateway。
The diagram below illustrates how regression patrol currently runs with Acmeair and Istio:
下图描述了目前的 Istio 回归测试过程中Acmeair 应用的运行方式:
{{< image width="70%"
link="./acmeair_regpatrol3.png"
caption="Acmeair benchmark in the Istio performance regression patrol environment"
caption="在 Istio 性能回归测试环境中的 Acmeair 基准测试"
>}}
Another difference is that the application communicates with the external DB with plain MongoDB protocol. The first change made for this study was to establish a TLS communication between the MongoDB and its clients running within the application, as this is a more realistic scenario.
还有一个差别就是,这一应用和外部数据库使用的是明文的 MongoDB 协议。本文中的第一个变化就是将应用到外部 MongoDB 之间的连接升级为 TLS 模式,以体现更贴近实际情况的场景。
Several cases for accessing the external database from the mesh were tested and described next.
下面会讲到一些从网格中访问外部数据库的具体案例。
## Egress traffic cases
## Egress 流量案例{#egress-traffic-cases}
### Case 1: Bypassing the sidecar
### 案例 1绕过 Sidecar{#case-1-bypassing-the-sidecar}
In this case, the sidecar does not intercept the communication between the application and the external DB. This is accomplished by setting the init container argument -x with the CIDR of the MongoDB, which makes the sidecar ignore messages to/from this IP address. For example:
在这个案例中Sidecar 对应用和外部数据库之间的通信不做拦截。这一配置是通过初始化容器中的 `-x` 参数来完成的,将其内容设置为 MongoDB 的 CIDR 即可。这种做法导致 Sidecar 忽略流入/流出指定 IP 地址的流量。举例来说:
- -x
- "169.47.232.211/32"
{{< image width="70%"
link="./case1_sidecar_bypass3.png"
caption="Traffic to external MongoDB by-passing the sidecar"
caption="绕过 Sidecar 和外部 MongoDB 进行通信"
>}}
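If editing the init container arguments directly is inconvenient, the same exclusion can also be expressed per pod with the `traffic.sidecar.istio.io/excludeOutboundIPRanges` annotation; a minimal sketch follows, in which the deployment and image names are illustrative.

{{< text yaml >}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: acmeair-booking                  # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: acmeair-booking
  template:
    metadata:
      labels:
        app: acmeair-booking
      annotations:
        # Ask istio-init to skip traffic interception for the external MongoDB,
        # equivalent to the -x CIDR argument shown above.
        traffic.sidecar.istio.io/excludeOutboundIPRanges: "169.47.232.211/32"
    spec:
      containers:
      - name: acmeair-booking
        image: example/acmeair-booking   # illustrative image
{{< /text >}}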
### Case 2: Through the sidecar, with service entry
### 案例 2使用 Service Entry通过 Sidecar 完成访问{#case-2-through-the-sidecar-with-service-entry}
This is the default configuration when the sidecar is injected into the application pod. All messages are intercepted by the sidecar and routed to the destination according to the configured rules, including the communication with external services. The MongoDB was defined as a `ServiceEntry`.
在 Sidecar 已经注入到应用 Pod 之后,这是(访问外部服务的)缺省方式。所有流量都会被 Sidecar 拦截,然后根据配置好的规则路由到目的地,其中也包括与外部服务的通信。MongoDB 在这里被定义为一个 `ServiceEntry`。
{{< image width="70%"
link="./case2_sidecar_passthru3.png"
caption="Sidecar intercepting traffic to external MongoDB"
caption="Sidecar 拦截对外部 MongoDB 的流量"
>}}
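A `ServiceEntry` for an external, TLS-enabled MongoDB might look roughly like the sketch below (the hostname is illustrative; the authoritative configuration is in the [Consuming External MongoDB Services](/zh/blog/2018/egress-mongo/) blog):

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-mongodb
spec:
  hosts:
  - my.mongodb.example.com       # illustrative hostname of the external database
  ports:
  - number: 27017
    name: tls
    protocol: TLS                # the client originates TLS, Istio just passes it through
  location: MESH_EXTERNAL
  resolution: DNS
{{< /text >}}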
### Case 3: Egress gateway
### 案例 3: Egress gateway{#case-3-egress-gateway}
The egress gateway and corresponding destination rule and virtual service resources are defined for accessing MongoDB. All traffic to and from the external DB goes through the egress gateway (envoy).
配置 Egress gateway 以及配套的 Destination rule 和 Virtual service用于访问 MongoDB。所有进出外部数据库的流量都从 Egress gatewayEnvoy通过。
{{< image width="70%"
link="./case3_egressgw3.png"
caption="Introduction of the egress gateway to access MongoDB"
caption="使用 Egress gateway 访问 MongoDB"
>}}
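The full egress setup consists of a `Gateway`, a `DestinationRule` and a `VirtualService` (see the blog linked above); the sketch below shows only the `Gateway` half, again with an illustrative hostname.

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istio-egressgateway-mongodb
spec:
  selector:
    istio: egressgateway          # the default egress gateway pods
  servers:
  - port:
      number: 443
      name: tls-mongodb
      protocol: TLS
    hosts:
    - my.mongodb.example.com
    tls:
      mode: PASSTHROUGH           # TLS is terminated by MongoDB itself, not by the gateway
{{< /text >}}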
### Case 4: Mutual TLS between sidecars and the egress gateway
### 案例 4在 Sidecar 和 Egress gateway 之间的双向 TLS{#case-4-mutual-TLS-between-sidecars-and-the-egress-gateway}
In this case, there is an extra layer of security between the sidecars and the gateway, so some impact in performance is expected.
在这个案例中,Sidecar 和网关之间多了一层安全机制,因此预计会对性能产生一定影响。
{{< image width="70%"
link="./case4_egressgw_mtls3.png"
caption="Enabling mutual TLS between sidecars and the egress gateway"
caption="在 Sidecar 和 Egress gateway 之间启用双向 TLS"
>}}
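Mutual TLS between the sidecars and the egress gateway is typically switched on with a destination rule for the gateway itself; a minimal sketch:

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: egressgateway-for-mongodb
spec:
  host: istio-egressgateway.istio-system.svc.cluster.local
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL          # sidecar -> egress gateway hop is wrapped in Istio mutual TLS
{{< /text >}}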
### Case 5: Egress gateway with SNI proxy
### 案例 5带有 SNI proxy 的 Egress gateway{#case-5-egress-gateway-with-SNI-proxy}
This scenario is used to evaluate the case where another proxy is required to access wildcarded domains. This may be required due to current limitations of Envoy. An nginx proxy was created as sidecar in the egress gateway pod.
这个场景中,因为 Envoy 目前存在的一些限制,需要另一个代理来访问通配符域名。这里创建了一个 Nginx 代理,在 Egress gateway Pod 中作为 Sidecar 来使用。
{{< image width="70%"
link="./case5_egressgw_sni_proxy3.png"
caption="Egress gateway with additional SNI Proxy"
caption="带有 SNI proxy 的 Egress gateway"
>}}
## Environment
## 环境{#environment}
* Istio version: 1.0.2
* `K8s` version: `1.10.5_1517`
* Acmeair App: 4 services (1 replica of each), inter-services transactions, external Mongo DB, avg payload: 620 bytes.
* Istio 版本: 1.0.2
* `K8s` 版本:`1.10.5_1517`
* Acmeair 应用4 个服务(每个服务一个实例),跨服务事务,外部 MongoDB平均载荷620 字节。
## Results
## 结果{#results}
`Jmeter` was used to generate the workload which consisted in a sequence of 5-minute runs, each one using a growing number of clients making http requests. The number of clients used were 1, 5, 10, 20, 30, 40, 50 and 60.
使用 `Jmeter` 生成负载,负载由一系列每轮持续五分钟的测试组成,每一轮都使用更多的客户端来发送 HTTP 请求。客户端数量依次为 1、5、10、20、30、40、50 和 60。
### Throughput
### 吞吐量{#throughput}
The chart below shows the throughput obtained for the different cases:
下图展示了不同案例中的吞吐量:
{{< image width="75%"
link="./throughput3.png"
caption="Throughput obtained for the different cases"
caption="不同案例中的吞吐量"
>}}
As you can see, there is no major impact in having sidecars and the egress gateway between the application and the external MongoDB, but enabling mutual TLS and then adding the SNI proxy caused a degradation in the throughput of about 10% and 24%, respectively.
如图可见,在应用和外部数据库中加入 Sidecar 和 Egress gateway 并没有对性能产生太大影响;但是启用双向 TLS、又加入 SNI 代理之后,吞吐量分别下降了 10% 和 24%。
### Response time
### 响应时间{#response-time}
The average response times for the different requests were collected when traffic was being driven with 20 clients. The chart below shows the average, median, 90%, 95% and 99% average values for each case:
在 20 客户端的情况下我们对不同请求的平均响应时间也进行了记录。下图展示了各个案例中平均、中位数、90%、95% 以及 99% 百分位的响应时间。
{{< image width="75%"
link="./response_times3.png"
caption="Response times obtained for the different configurations"
caption="不同配置中的响应时间"
>}}
Likewise, not much difference in the response times for the 3 first cases, but mutual TLS and the extra proxy adds noticeable latency.
与吞吐量的情况类似,前三个案例的响应时间没有很大区别,但双向 TLS 和额外的代理带来了明显的延迟。
### CPU utilization
### CPU 用量{#CPU-utilization}
The CPU usage was collected for all Istio components as well as for the sidecars during the runs. For a fair comparison, CPU used by Istio was normalized by the throughput obtained for a given run. The results are shown in the following graph:
运行过程中还搜集了所有 Istio 组件以及 Sidecar 的 CPU 使用情况。为了公平起见,用吞吐量对 Istio 的 CPU 用量进行了归一化。下图中展示了这一结果:
{{< image width="75%"
link="./cpu_usage3.png"
caption="CPU usage normalized by TPS"
caption="使用 TPS 进行归一化的 CPU 用量"
>}}
In terms of CPU consumption per transaction, Istio has used significantly more CPU only in the egress gateway + SNI proxy case.
从每个事务的 CPU 消耗来看,只有在 Egress gateway + SNI 代理的情况下,Istio 才消耗了明显更多的 CPU。
## Conclusion
In this investigation, we tried different options to access an external TLS-enabled MongoDB to compare their performance. The introduction of the egress gateway did not have a significant impact on the performance, nor did it add meaningful CPU consumption. Only when enabling mutual TLS between sidecars and the egress gateway, or when using an additional SNI proxy for wildcarded domains, could we observe some degradation.
## 结论{#conclusion}
在这一系列测试中,我们用不同的方式访问一个启用了 TLS 的外部 MongoDB,并对比其性能。引入 Egress gateway 没有对性能产生显著影响,也没有带来明显的额外 CPU 消耗。只有在启用 Sidecar 与 Egress gateway 之间的双向 TLS,或者为通配符域名使用额外的 SNI 代理时,才会观察到一定程度的性能下降。


@ -1,6 +1,6 @@
---
title: Version Routing in a Multicluster Service Mesh
description: Configuring Istio route rules in a multicluster service mesh.
title: 多集群服务网格中的分版本路由
description: 在多集群服务网格环境中配置 Istio 的路由规则。
publishdate: 2019-02-07
subtitle:
attribution: Frank Budinsky (IBM)
@ -8,39 +8,20 @@ keywords: [traffic-management,multicluster]
target_release: 1.0
---
If you've spent any time looking at Istio, you've probably noticed that it includes a lot of features that
can be demonstrated with simple [tasks](/zh/docs/tasks/) and [examples](/zh/docs/examples/)
running on a single Kubernetes cluster.
Because most, if not all, real-world cloud and microservices-based applications are not that simple
and will need to have the services distributed and running in more than one location, you may be
wondering if all these things will be just as simple in your real production environment.
如果你花过一点时间了解 Istio,可能已经注意到,它的大量功能只需在单个 Kubernetes 集群上,通过简单的 [任务](/zh/docs/tasks/) 和 [示例](/zh/docs/examples/) 就能演示出来。但现实世界中,大多数基于云和微服务的应用并没有这么简单,往往需要把服务分布在多个地点运行,你难免会怀疑:在真实的生产环境中,这一切是否还能这么简单?
Fortunately, Istio provides several ways to configure a service mesh so that applications
can, more-or-less transparently, be part of a mesh where the services are running
in more than one cluster, i.e., in a
[multicluster deployment](/zh/docs/ops/prep/deployment-models/#multiple-clusters).
The simplest way to set up a multicluster mesh, because it has no special networking requirements,
is using a replicated
[control plane model](/zh/docs/ops/prep/deployment-models/#control-plane-models).
In this configuration, each Kubernetes cluster contributing to the mesh has its own control plane,
but each control plane is synchronized and running under a single administrative control.
幸运的是Istio 提供了多种服务网格的配置方式,应用能够用近乎透明的方式加入一个跨越多个集群运行的服务网格之中,也就是 [多集群服务网格](/zh/docs/ops/prep/deployment-models/#multiple-clusters) 。最简单的设置多集群网格的方式,就是使用 [多控制平面拓扑](/zh/docs/ops/prep/deployment-models/#control-plane-models) ,这种方式不需要特别的网络依赖。在这种条件下,每个 Kubernetes 集群都有自己的控制平面,但是每个控制平面都是同步的,并接受统一的管理。
In this article we'll look at how one of the features of Istio,
[traffic management](/zh/docs/concepts/traffic-management/), works in a multicluster mesh with
a dedicated control plane topology.
We'll show how to configure Istio route rules to call remote services in a multicluster service mesh
by deploying the [Bookinfo sample]({{< github_tree >}}/samples/bookinfo) with version `v1` of the `reviews` service
running in one cluster, versions `v2` and `v3` running in a second cluster.
本文中,我们会在多控制平面拓扑形式的多集群网格中尝试一下 Istio 的 [流量管理](/zh/docs/concepts/traffic-management/) 功能。我们会展示如何配置 Istio 路由规则,在多集群服务网格中部署 [Bookinfo 示例]({{<github_tree>}}/samples/bookinfo)`reviews` 服务的 `v1` 版本运行在一个集群上,而 `v2``v3` 运行在另一个集群上,并完成远程服务调用。
## Set up clusters
## 集群部署{#setup-clusters}
To start, you'll need two Kubernetes clusters, both running a slightly customized configuration of Istio.
首先需要部署两个 Kubernetes 集群,并各自运行一个做了轻度定制的 Istio。
* Set up a multicluster environment with two Istio clusters by following the
[replicated control planes](/zh/docs/setup/install/multicluster/gateways/) instructions.
* 依照[使用 Gateway 连接多个集群](/zh/docs/setup/install/multicluster/gateways/)中提到的步骤设置一个多集群环境。
* The `kubectl` command is used to access both clusters with the `--context` flag.
Use the following command to list your contexts:
* `kubectl` 命令可以使用 `--context` 参数访问两个集群。
使用下面的命令列出所有 `context`
{{< text bash >}}
$ kubectl config get-contexts
@ -49,16 +30,16 @@ To start, you'll need two Kubernetes clusters, both running a slightly customize
cluster2 cluster2 user@foo.com default
{{< /text >}}
* Export the following environment variables with the context names of your configuration:
* 将配置文件中的 `context` 名称赋值给两个环境变量:
{{< text bash >}}
$ export CTX_CLUSTER1=<cluster1 context name>
$ export CTX_CLUSTER2=<cluster2 context name>
{{< /text >}}
## Deploy version v1 of the `bookinfo` application in `cluster1`
## `cluster1` 中部署 `bookinfo``v1` 版本{#deploy-in-cluster-1}
Run the `productpage` and `details` services and version `v1` of the `reviews` service in `cluster1`:
`cluster1` 中运行 `productpage``details` 服务,以及 `reviews` 服务的 `v1` 版本。
{{< text bash >}}
$ kubectl label --context=$CTX_CLUSTER1 namespace default istio-injection=enabled
@ -161,9 +142,9 @@ spec:
EOF
{{< /text >}}
## Deploy `bookinfo` v2 and v3 services in `cluster2`
## `cluster2` 中部署 `bookinfo``v2``v3`{#deploy-in-cluster-2}
Run the `ratings` service and version `v2` and `v3` of the `reviews` service in `cluster2`:
`cluster2` 中运行 `ratings` 服务以及 `reviews` 服务的 `v2``v3` 版本:
{{< text bash >}}
$ kubectl label --context=$CTX_CLUSTER2 namespace default istio-injection=enabled
@ -253,36 +234,30 @@ spec:
EOF
{{< /text >}}
## Access the `bookinfo` application
## 访问 `bookinfo` 应用{#access-the-application}
Just like any application, we'll use an Istio gateway to access the `bookinfo` application.
和平常一样,我们需要使用一个 Istio gateway 来访问 `bookinfo` 应用。
* Create the `bookinfo` gateway in `cluster1`:
* `cluster1` 中创建 `bookinfo` 的网关:
{{< text bash >}}
$ kubectl apply --context=$CTX_CLUSTER1 -f @samples/bookinfo/networking/bookinfo-gateway.yaml@
{{< /text >}}
* Follow the [Bookinfo sample instructions](/zh/docs/examples/bookinfo/#determine-the-ingress-IP-and-port)
to determine the ingress IP and port and then point your browser to `http://$GATEWAY_URL/productpage`.
* 遵循 [Bookinfo 示例应用](/zh/docs/examples/bookinfo/#determine-the-ingress-IP-and-port)中的步骤,确定 Ingress 的 IP 和端口,用浏览器打开 `http://$GATEWAY_URL/productpage`
You should see the `productpage` with reviews, but without ratings, because only `v1` of the `reviews` service
is running on `cluster1` and we have not yet configured access to `cluster2`.
这里会看到 `productpage`,其中包含了 `reviews` 的内容,但是没有出现 `ratings`,这是因为只有 `reviews` 服务的 `v1` 版本运行在 `cluster1` 上,我们还没有配置到 `cluster2` 的访问。
## Create a service entry and destination rule on `cluster1` for the remote reviews service
## `cluster1` 上为远端的 `reviews` 服务创建 `ServiceEntry` 以及 `DestinationRule`
As described in the [setup instructions](/zh/docs/setup/install/multicluster/gateways/#setup-DNS),
remote services are accessed with a `.global` DNS name. In our case, it's `reviews.default.global`,
so we need to create a service entry and destination rule for that host.
The service entry will use the `cluster2` gateway as the endpoint address to access the service.
You can use the gateway's DNS name, if it has one, or its public IP, like this:
根据 [配置指南](/zh/docs/setup/install/multicluster/gateways/#setup-DNS) 中的介绍,远程服务可以通过 `.global` 形式的 DNS 名称进行访问。在我们的例子中就是 `reviews.default.global`,所以需要为这个主机创建 `ServiceEntry` 和 `DestinationRule`。`ServiceEntry` 会把 `cluster2` 的网关作为访问该服务的端点地址。可以使用网关的 DNS 名称(如果有的话)或者它的公共 IP,例如:
{{< text bash >}}
$ export CLUSTER2_GW_ADDR=$(kubectl get --context=$CTX_CLUSTER2 svc --selector=app=istio-ingressgateway \
-n istio-system -o jsonpath="{.items[0].status.loadBalancer.ingress[0].ip}")
{{< /text >}}
Now create the service entry and destination rule using the following command:
用下面的命令来创建 `ServiceEntry``DestinationRule`
{{< text bash >}}
$ kubectl apply --context=$CTX_CLUSTER1 -f - <<EOF
@ -300,13 +275,13 @@ spec:
protocol: http
resolution: DNS
addresses:
- 240.0.0.3
- 127.255.0.3
endpoints:
- address: ${CLUSTER2_GW_ADDR}
labels:
cluster: cluster2
ports:
http1: 15443 # Do not change this port value
http1: 15443 # 不要修改端口值
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
@ -327,24 +302,13 @@ spec:
EOF
{{< /text >}}
The address `240.0.0.3` of the service entry can be any arbitrary unallocated IP.
Using an IP from the class E addresses range 240.0.0.0/4 is a good choice.
Check out the
[gateway-connected multicluster example](/zh/docs/setup/install/multicluster/gateways/#configure-the-example-services)
for more details.
`ServiceEntry` 的地址 `127.255.0.3` 可以是任意的未分配 IP。在 `127.0.0.0/8` 的范围里面进行选择是个不错的主意。阅读 [通过网关进行连接的多集群](/zh/docs/setup/install/multicluster/gateways/#configure-the-example-services) 一文,能够获得更多相关信息。
Note that the labels of the subsets in the destination rule map to the service entry
endpoint label (`cluster: cluster2`) corresponding to the `cluster2` gateway.
Once the request reaches the destination cluster, a local destination rule will be used
to identify the actual pod labels (`version: v1` or `version: v2`) corresponding to the
requested subset.
注意,`DestinationRule` 中各个 `subset` 的标签,映射到的是与 `cluster2` 网关对应的 `ServiceEntry` 端点标签(`cluster: cluster2`)。一旦请求到达目标集群,就会由本地的 `DestinationRule` 根据所请求的 subset 识别实际的 Pod 标签(`version: v1` 或者 `version: v2`)。
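To make the mapping concrete, the destination rule for the `.global` host boils down to something like the following sketch, where the subset labels select the service entry endpoint rather than pod labels:

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-global
spec:
  host: reviews.default.global
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL          # traffic to the remote cluster goes over mutual TLS
  subsets:
  - name: v2
    labels:
      cluster: cluster2           # matches the service entry endpoint label, not a pod label
  - name: v3
    labels:
      cluster: cluster2
{{< /text >}}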
## Create a destination rule on both clusters for the local reviews service
## 在所有集群上为本地 `reviews` 服务创建 `DestinationRule`
Technically, we only need to define the subsets of the local service that are being used
in each cluster (i.e., `v1` in `cluster1`, `v2` and `v3` in `cluster2`), but for simplicity we'll
just define all three subsets in both clusters, since there's nothing wrong with defining subsets
for versions that are not actually deployed.
技术上来说,我们只需要在每个集群中定义本地用到的 `subset` 即可(即 `cluster1` 中的 `v1`,`cluster2` 中的 `v2` 和 `v3`),但为并未实际部署的版本定义 `subset` 也没什么问题,所以为了简单起见,我们会在两个集群上都定义全部三个 `subset`:
{{< text bash >}}
$ kubectl apply --context=$CTX_CLUSTER1 -f - <<EOF
@ -394,23 +358,15 @@ spec:
EOF
{{< /text >}}
## Create a virtual service to route reviews service traffic
## 创建 `VirtualService` 来路由 `reviews` 服务的流量{#create-a-virtual-service-to-route-reviews-service-traffic}
At this point, all calls to the `reviews` service will go to the local `reviews` pods (`v1`) because
if you look at the source code you will see that the `productpage` implementation is simply making
requests to `http://reviews:9080` (which expands to host `reviews.default.svc.cluster.local`), the
local version of the service.
The corresponding remote service is named `reviews.default.global`, so route rules are needed to
redirect requests to the global host.
目前,所有调用 `reviews` 服务的流量都会进入本地的 `reviews` Pod,也就是 `v1`。如果查看一下源码就会发现,`productpage` 的实现只是简单地向 `http://reviews:9080`(展开后即 `reviews.default.svc.cluster.local`,也就是服务的本地版本)发起请求。对应的远程服务名为 `reviews.default.global`,因此需要路由规则把请求重定向到这个全局主机名。
{{< tip >}}
Note that if all of the versions of the `reviews` service were remote, so there is no local `reviews`
service defined, the DNS would resolve `reviews` directly to `reviews.default.global`. In that case
we could call the remote `reviews` service without any route rules.
注意,如果 `reviews` 服务的所有版本都在远端,也就是说本地没有定义 `reviews` 服务,那么 DNS 会把 `reviews` 直接解析到 `reviews.default.global`。在那种情况下,无需任何路由规则就可以直接调用远端的 `reviews` 服务。
{{< /tip >}}
Apply the following virtual service to direct traffic for user `jason` to `reviews` versions `v2` and `v3` (50/50)
which are running on `cluster2`. Traffic for any other user will go to `reviews` version `v1`.
创建下列的 `VirtualService`,把 `jason` 的流量转发给运行在 `cluster2` 上的 `v2``v3` 版本的 `reviews`,两个版本各负责一半流量。其他用户的流量还是会发给 `v1` 版本的 `reviews`
{{< text bash >}}
$ kubectl apply --context=$CTX_CLUSTER1 -f - <<EOF
@ -443,19 +399,12 @@ EOF
{{< /text >}}
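For reference, the virtual service applied by the command above has roughly the following shape (a sketch, not the verbatim definition): the `end-user` header match selects `jason`, and the two remote subsets are weighted 50/50.

{{< text yaml >}}
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews.default.svc.cluster.local
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews.default.global   # remote subsets on cluster2
        subset: v2
      weight: 50
    - destination:
        host: reviews.default.global
        subset: v3
      weight: 50
  - route:
    - destination:
        host: reviews.default.svc.cluster.local
        subset: v1                     # everyone else stays on the local v1
{{< /text >}}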
{{< tip >}}
This 50/50 rule isn't a particularly realistic example. It's just a convenient way to demonstrate
accessing multiple subsets of a remote service.
这种平均分配的规则并不实际,只是一种用于演示远端服务多版本之间流量分配的方便手段。
{{< /tip >}}
Return to your browser and login as user `jason`. If you refresh the page several times, you should see
the display alternating between black and red ratings stars (`v2` and `v3`). If you logout, you will
only see reviews without ratings (`v1`).
回到浏览器,用 `jason` 的身份登录。多刷新几次页面,会看到评分的星形图标在黑色和红色之间交替(`v2` 和 `v3`)。如果登出,就只会看到不带评分的 `reviews`(`v1`)。
## Summary
## 总结{#summary}
In this article, we've seen how to use Istio route rules to distribute the versions of a service
across clusters in a multicluster service mesh with a replicated control plane model.
In this example, we manually configured the `.global` service entry and destination rules needed to provide
connectivity to one remote service, `reviews`. In general, however, if we wanted to enable any service
to run either locally or remotely, we would need to create `.global` resources for every service.
Fortunately, this process could be automated and likely will be in a future Istio release.
本文展示了在采用多控制平面拓扑的多集群服务网格中,如何使用 Istio 路由规则把一个服务的不同版本分布到多个集群上。
在这个例子中,我们手工配置了访问远端 `reviews` 服务所需的 `.global` `ServiceEntry` 和 `DestinationRule`。但一般来说,如果希望任何一个服务既能在本地运行、也能在远端运行,就需要为每个服务创建对应的 `.global` 资源。幸运的是,这个过程可以自动化,并且很可能会在 Istio 的未来版本中实现。