Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)

ab-testing aws-appmesh canary contour gitops gloo istio kubernetes linkerd nginx progressive-delivery

Stefan Prodan 29548dded3 Document rollout GA switch		2018-09-23 12:47:19 +03:00
artifacts	Add canary HPA	2018-09-23 11:27:35 +03:00
cmd	Format rollout CLI output	2018-09-23 11:29:04 +03:00
docs/screens	Add CLI screen	2018-09-23 11:29:24 +03:00
pkg	Format rollout CLI output	2018-09-23 11:29:04 +03:00
vendor	Add console color pkg	2018-09-23 11:28:08 +03:00
.gitignore	Initial commit	2018-09-20 00:43:47 +03:00
Gopkg.lock	Add console color pkg	2018-09-23 11:28:08 +03:00
Gopkg.toml	vendor k8s and istio (knative/pkg)	2018-09-21 19:21:37 +03:00
LICENSE	Initial commit	2018-09-20 00:43:47 +03:00
README.md	Document rollout GA switch	2018-09-23 12:47:19 +03:00

README.md

steerer

Istio progressive rollout gated by Prometheus HTTP success rate metric

Usage

Create a test namespace:

kubectl apply -f ./artifacts/namespace/

Create the GA and canary deployments, services, HPA and Istio virtual service:

kubectl apply -f ./artifacts/workloads/

Rollout:

Rollout flow:

- scan namespace for deployments marked for rollout
- scan namespace for a corresponding canary deployment (-canary suffix)
- check that the Istio virtual service routes are mapped to the GA and canary ClusterIP services
- check the GA and canary deployments status (halt rollout if a rolling update is underway or if pods are unhealthy)
- increase canary traffic weight by 10%
- check canary HTTP success rate (halt rollout if the percentage is under the specified threshold)
- advance canary traffic weight by 10% until it reaches 100%
    - halt rollout while canary success rate is under the threshold
    - halt rollout if the GA or canary deployment becomes unhealthy
    - halt rollout while canary deployment is being scaled up/down by HPA
- promote canary to GA (copy canary deployment spec template over GA)
- wait for GA rolling update to finish
- route all traffic to GA
- scale the canary deployment to zero
- mark rollout deployment as finished
- wait for the canary deployment to be updated (revision bump) and start over
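
The weight-advancement part of the flow above can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this repository: the names `canaryName`, `nextWeight` and `routeWeights` are hypothetical, and the success rate and threshold are assumed to be expressed in the same unit (for example, both as percentages).

```go
package main

// stepWeight is the traffic increment applied on each successful check
// (10%, as described in the rollout flow).
const stepWeight = 10

// maxWeight is the canary weight at which promotion to GA can start.
const maxWeight = 100

// canaryName derives the canary deployment name from the GA deployment
// name. Hypothetical helper: it only assumes the "-canary" naming
// convention mentioned in the flow.
func canaryName(ga string) string {
	return ga + "-canary"
}

// nextWeight returns the canary weight after one rollout check.
// The weight is left unchanged (rollout halted) when the deployments are
// not healthy or the success rate is under the threshold.
func nextWeight(current int, healthy bool, successRate, threshold float64) int {
	if !healthy || successRate < threshold {
		return current
	}
	next := current + stepWeight
	if next > maxWeight {
		next = maxWeight
	}
	return next
}

// routeWeights splits traffic between GA and canary for a given canary
// weight, so that both weights always add up to maxWeight.
func routeWeights(canary int) (gaWeight, canaryWeight int) {
	return maxWeight - canary, canary
}
```

Under these assumptions, a rollout that starts at weight 0 and never halts reaches 100% after ten checks. A halted check returns the current weight, so the same function can be called again on the next scan without extra state.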

HTTP success rate query:

sum(
    rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace=~"$namespace",
          destination_workload=~"$workload",
          response_code!~"5.*"
        }[$interval]
    )
) 
/ 
sum(
    rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace=~"$namespace",
          destination_workload=~"$workload"
        }[$interval]
    )
)
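
As an illustration of how the `$namespace`, `$workload` and `$interval` placeholders in the query above could be filled in, here is a small Go sketch. It is an assumption about one possible approach, not the code from this repository: `renderQuery`, `selector` and `queryTemplate` are hypothetical names, and the query is written on one line for brevity.

```go
package main

import "strings"

// selector holds the label matchers shared by both sides of the ratio.
const selector = `reporter="destination",` +
	`destination_workload_namespace=~"$namespace",` +
	`destination_workload=~"$workload"`

// queryTemplate is the success rate query: non-5xx request rate divided
// by the total request rate over the same interval.
const queryTemplate = `sum(rate(istio_requests_total{` + selector +
	`,response_code!~"5.*"}[$interval])) / ` +
	`sum(rate(istio_requests_total{` + selector + `}[$interval]))`

// renderQuery substitutes the three placeholders with concrete values.
// The values are inserted as-is; since they end up inside regex matchers
// (=~), callers are assumed to pass names without regex metacharacters.
func renderQuery(namespace, workload, interval string) string {
	r := strings.NewReplacer(
		"$namespace", namespace,
		"$workload", workload,
		"$interval", interval,
	)
	return r.Replace(queryTemplate)
}
```

For example, `renderQuery("test", "podinfo", "1m")` (namespace and workload names chosen only for illustration) produces a query scoped to that workload over a one-minute window. The result is a ratio between 0 and 1, so it would need to be multiplied by 100 before being compared with a threshold expressed as a percentage.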