Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)

Stefan Prodan 3fa333fa49 Publish branch-commit images to Docker Hub		2018-09-24 20:36:52 +03:00
artifacts	Add deployment and rbac manifests	2018-09-24 18:28:17 +03:00
cmd	Run go fmt	2018-09-24 18:27:18 +03:00
docs/screens	Add CLI screen	2018-09-23 11:29:24 +03:00
hack	Add CRD code gen scripts	2018-09-24 13:30:22 +03:00
pkg	Add Travis CI build	2018-09-24 20:28:45 +03:00
vendor	Add console color pkg	2018-09-23 11:28:08 +03:00
.gitignore	Initial commit	2018-09-20 00:43:47 +03:00
.travis.yml	Publish branch-commit images to Docker Hub	2018-09-24 20:36:52 +03:00
Dockerfile	Add build files	2018-09-24 18:28:52 +03:00
Gopkg.lock	Add console color pkg	2018-09-23 11:28:08 +03:00
Gopkg.toml	vendor k8s and istio (knative/pkg)	2018-09-21 19:21:37 +03:00
LICENSE	Initial commit	2018-09-20 00:43:47 +03:00
Makefile	Add build files	2018-09-24 18:28:52 +03:00
README.md	Add Travis CI build	2018-09-24 20:28:45 +03:00

steerer

Steerer is a Kubernetes operator that automates the promotion of canary deployments using Istio routing for traffic shifting and Prometheus metrics for canary analysis.

Gated rollout stages:

scan for deployments marked for rollout
check that Istio virtual service routes are mapped to primary and canary ClusterIP services
check the status of the primary and canary deployments
- halt rollout if a rolling update is underway
- halt rollout if pods are unhealthy
increase canary traffic weight percentage from 0% to 10%
check canary HTTP success rate
- halt rollout if percentage is under the specified threshold
increase canary traffic weight by 10% until it reaches 100%
- halt rollout while canary success rate is under the threshold
- halt rollout if the primary or canary deployment becomes unhealthy
- halt rollout while canary deployment is being scaled up/down by HPA
promote canary to primary
- copy canary deployment spec template over primary
wait for primary rolling update to finish
- halt rollout if pods are unhealthy
route all traffic to primary
scale the canary deployment to zero
mark rollout as finished
wait for the canary deployment to be updated (revision bump) and start over
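
The traffic-shifting part of the stages above can be sketched as a single decision function that runs once per analysis tick. This is a minimal illustration, not steerer's actual code: the names `Action`, `Check` and `Step` are hypothetical, and the sketch leaves out the discovery, virtual service and promotion stages.

```go
package main

import "fmt"

// Action is a hypothetical name for the outcome of one analysis tick.
type Action int

const (
	Halt    Action = iota // keep the current weight and retry on the next tick
	Advance               // shift more traffic to the canary
	Promote               // canary is at 100%, copy its spec over the primary
)

func (a Action) String() string {
	return [...]string{"halt", "advance", "promote"}[a]
}

// Check holds the observations gathered during one tick (hypothetical type).
type Check struct {
	DeploymentsHealthy bool    // pods are ready and no rolling update is underway
	HPAScaling         bool    // the canary is being scaled up/down by the HPA
	SuccessRate        float64 // canary HTTP success rate, in percent
}

// Step returns the canary weight to apply next and the action taken.
func Step(weight, stepWeight int, threshold float64, c Check) (int, Action) {
	if !c.DeploymentsHealthy || c.HPAScaling {
		return weight, Halt
	}
	// At weight 0 the canary receives no traffic, so there is nothing to measure yet.
	if weight > 0 && c.SuccessRate < threshold {
		return weight, Halt
	}
	if weight >= 100 {
		return 100, Promote
	}
	next := weight + stepWeight
	if next > 100 {
		next = 100
	}
	return next, Advance
}

func main() {
	weight := 0
	// Hypothetical success-rate samples, one per tick.
	samples := []float64{100, 100, 88.89, 99.5, 100}
	for _, s := range samples {
		var a Action
		weight, a = Step(weight, 10, 99, Check{DeploymentsHealthy: true, SuccessRate: s})
		fmt.Printf("success rate %.2f%% -> %v, canary weight %d\n", s, a, weight)
	}
}
```

A halt leaves the weight unchanged, so the same check simply repeats on the next tick until the canary recovers.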

A rollout can be defined using steerer's custom resource:

apiVersion: apps.weave.works/v1beta1
kind: Rollout
metadata:
  name: podinfo
  namespace: test
spec:
  targetKind: Deployment
  primary:
    name: podinfo
    host: podinfo
  canary:
    name: podinfo-canary
    host: podinfo-canary
  virtualService:
    name: podinfo
    weight: 10
  metric:
    type: counter
    name: istio_requests_total
    interval: 1m
    threshold: 99
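
For readers who think in Go types, the spec above could map onto structs roughly like the following. The type names and the `Validate` rules are assumptions made for illustration, not steerer's generated API types. In particular, treating `virtualService.weight` as the per-step traffic increment is inferred from the rollout stages, and the JSON document in `main` is a hand-written equivalent of the YAML.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Target names a deployment and the host it is served under.
type Target struct {
	Name string `json:"name"`
	Host string `json:"host"`
}

// VirtualService names the Istio virtual service to update.
type VirtualService struct {
	Name   string `json:"name"`
	Weight int    `json:"weight"` // assumed: traffic increment per step, in percent
}

// Metric describes the Prometheus metric used for canary analysis.
type Metric struct {
	Type      string  `json:"type"`
	Name      string  `json:"name"`
	Interval  string  `json:"interval"`
	Threshold float64 `json:"threshold"` // minimum success rate, in percent
}

// RolloutSpec mirrors the spec section of the Rollout resource.
type RolloutSpec struct {
	TargetKind     string         `json:"targetKind"`
	Primary        Target         `json:"primary"`
	Canary         Target         `json:"canary"`
	VirtualService VirtualService `json:"virtualService"`
	Metric         Metric         `json:"metric"`
}

// Validate applies a few sanity checks (hypothetical rules).
func (s RolloutSpec) Validate() error {
	if s.Primary.Name == "" || s.Canary.Name == "" {
		return fmt.Errorf("primary and canary names are required")
	}
	if s.VirtualService.Weight <= 0 || s.VirtualService.Weight > 100 {
		return fmt.Errorf("virtualService.weight must be in (0, 100], got %d", s.VirtualService.Weight)
	}
	if s.Metric.Threshold < 0 || s.Metric.Threshold > 100 {
		return fmt.Errorf("metric.threshold must be in [0, 100], got %v", s.Metric.Threshold)
	}
	if _, err := time.ParseDuration(s.Metric.Interval); err != nil {
		return fmt.Errorf("metric.interval: %v", err)
	}
	return nil
}

func main() {
	raw := `{"targetKind":"Deployment",
	  "primary":{"name":"podinfo","host":"podinfo"},
	  "canary":{"name":"podinfo-canary","host":"podinfo-canary"},
	  "virtualService":{"name":"podinfo","weight":10},
	  "metric":{"type":"counter","name":"istio_requests_total","interval":"1m","threshold":99}}`
	var spec RolloutSpec
	if err := json.Unmarshal([]byte(raw), &spec); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("validation error:", spec.Validate())
}
```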

Usage

Deploy steerer in the istio-system namespace:

kubectl apply -f ./artifacts/steerer

Create a test namespace:

kubectl apply -f ./artifacts/namespaces/

Create the primary and canary deployments, services, HPAs and the Istio virtual service:

kubectl apply -f ./artifacts/workloads/

Create rollout custom resources:

kubectl apply -f ./artifacts/rollouts/

Rollout output:

kubectl -n test describe rollout/podinfo

Events:
  Type     Reason  Age   From     Message
  ----     ------  ----  ----     -------
  Normal   Synced  3m    steerer  Starting rollout for podinfo.test
  Normal   Synced  3m    steerer  Advance rollout podinfo.test weight 10
  Normal   Synced  3m    steerer  Advance rollout podinfo.test weight 20
  Normal   Synced  2m    steerer  Advance rollout podinfo.test weight 30
  Normal   Synced  2m    steerer  Advance rollout podinfo.test weight 40
  Normal   Synced  2m    steerer  Advance rollout podinfo.test weight 50
  Normal   Synced  2m    steerer  Advance rollout podinfo.test weight 60
  Normal   Synced  2m    steerer  Advance rollout podinfo.test weight 60
  Warning  Synced  2m    steerer  Halt rollout podinfo.test success rate 88.89% < 99%
  Warning  Synced  2m    steerer  Halt rollout podinfo.test success rate 82.86% < 99%
  Warning  Synced  1m    steerer  Halt rollout podinfo.test success rate 80.49% < 99%
  Warning  Synced  1m    steerer  Halt rollout podinfo.test success rate 82.98% < 99%
  Warning  Synced  1m    steerer  Halt rollout podinfo.test success rate 83.33% < 99%
  Warning  Synced  1m    steerer  Halt rollout podinfo.test success rate 82.22% < 99%
  Warning  Synced  1m    steerer  Halt rollout podinfo.test success rate 94.74% < 99%
  Normal   Synced  1m    steerer  Advance rollout podinfo.test weight 70
  Normal   Synced  55s   steerer  Advance rollout podinfo.test weight 80
  Normal   Synced  45s   steerer  Advance rollout podinfo.test weight 90
  Normal   Synced  35s   steerer  Advance rollout podinfo.test weight 100
  Normal   Synced  25s   steerer  Copying podinfo-canary.test template spec to podinfo.test
  Warning  Synced  15s   steerer  Waiting for podinfo.test rollout to finish: 1 of 2 updated replicas are available
  Normal   Synced  5s    steerer  Promotion complete! Scaling down podinfo-canary.test
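
The Halt events above compare a percentage against the configured threshold. A rough sketch of that comparison follows, assuming the success rate arrives as a ratio between 0 and 1 and is scaled to a percentage before the check. `SuccessPercent` and `HaltEvent` are hypothetical helpers, and the message format is copied from the events shown here rather than from steerer's source.

```go
package main

import "fmt"

// SuccessPercent converts a 0..1 success ratio into a percentage.
func SuccessPercent(ratio float64) float64 {
	return ratio * 100
}

// HaltEvent returns a warning message and true when the success rate is
// under the threshold. A NaN rate (for example, no traffic in the interval)
// also halts, because NaN >= threshold is false.
func HaltEvent(name, namespace string, percent, threshold float64) (string, bool) {
	if percent >= threshold {
		return "", false
	}
	msg := fmt.Sprintf("Halt rollout %s.%s success rate %.2f%% < %v%%",
		name, namespace, percent, threshold)
	return msg, true
}

func main() {
	// Hypothetical sample: 8 successful requests out of 9.
	if msg, halt := HaltEvent("podinfo", "test", SuccessPercent(8.0/9.0), 99); halt {
		fmt.Println(msg)
	}
}
```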

HTTP success rate query:

sum(
    rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace=~"$namespace",
          destination_workload=~"$workload",
          response_code!~"5.*"
        }[$interval]
    )
) 
/ 
sum(
    rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace=~"$namespace",
          destination_workload=~"$workload"
        }[$interval]
    )
)
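
The `$namespace`, `$workload` and `$interval` placeholders have to be filled in before the query is sent to Prometheus. A minimal sketch of that substitution follows, using plain string replacement. The helper names are hypothetical and steerer may build the query differently.

```go
package main

import (
	"fmt"
	"strings"
)

// successRateQuery is the query above, collapsed onto one line.
const successRateQuery = `sum(rate(istio_requests_total{reporter="destination",` +
	`destination_workload_namespace=~"$namespace",destination_workload=~"$workload",` +
	`response_code!~"5.*"}[$interval])) / ` +
	`sum(rate(istio_requests_total{reporter="destination",` +
	`destination_workload_namespace=~"$namespace",destination_workload=~"$workload"}[$interval]))`

// renderTemplate replaces the three placeholders in tmpl.
func renderTemplate(tmpl, namespace, workload, interval string) string {
	r := strings.NewReplacer(
		"$namespace", namespace,
		"$workload", workload,
		"$interval", interval,
	)
	return r.Replace(tmpl)
}

// RenderQuery fills in the success rate query for one workload.
func RenderQuery(namespace, workload, interval string) string {
	return renderTemplate(successRateQuery, namespace, workload, interval)
}

func main() {
	// Values taken from the podinfo example: the canary workload in the
	// test namespace, evaluated over the 1m metric interval.
	fmt.Println(RenderQuery("test", "podinfo-canary", "1m"))
}
```

The query divides non-5xx requests by all requests, so its result is a ratio. Comparing it against a threshold such as 99 requires scaling it to a percentage first.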