Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)
Stefan Prodan 29548dded3 Document rollout GA switch 2018-09-23 12:47:19 +03:00
artifacts Add canary HPA 2018-09-23 11:27:35 +03:00
cmd Format rollout CLI output 2018-09-23 11:29:04 +03:00
docs/screens Add CLI screen 2018-09-23 11:29:24 +03:00
pkg Format rollout CLI output 2018-09-23 11:29:04 +03:00
vendor Add console color pkg 2018-09-23 11:28:08 +03:00
.gitignore Initial commit 2018-09-20 00:43:47 +03:00
Gopkg.lock Add console color pkg 2018-09-23 11:28:08 +03:00
Gopkg.toml vendor k8s and istio (knative/pkg) 2018-09-21 19:21:37 +03:00
LICENSE Initial commit 2018-09-20 00:43:47 +03:00
README.md Document rollout GA switch 2018-09-23 12:47:19 +03:00

steerer

Istio progressive rollout gated by a Prometheus HTTP success rate metric

Usage

Create a test namespace:

kubectl apply -f ./artifacts/namespace/

Create the GA and canary deployments, services, HPAs and the Istio virtual service:

kubectl apply -f ./artifacts/workloads/

Rollout:

rollout-cli

Rollout flow:

  • scan namespace for deployments marked for rollout
  • scan namespace for a corresponding canary deployment (-canary suffix)
  • check Istio virtual service routes are mapped to GA and canary ClusterIP services
  • check GA and canary deployments status (halt rollout if a rolling update is underway or if pods are unhealthy)
  • increase the canary traffic weight by 10%
  • check canary HTTP success rate (halt rollout if percentage is under the specified threshold)
  • advance canary traffic weight by 10% until it reaches 100%
    • halt rollout while canary success rate is under the threshold
    • halt rollout if the GA or canary deployment becomes unhealthy
    • halt rollout while canary deployment is being scaled up/down by HPA
  • promote canary to GA (copy canary deployment spec template over GA)
  • wait for GA rolling update to finish
  • route all traffic to GA
  • scale the canary deployment to zero
  • mark rollout deployment as finished
  • wait for the canary deployment to be updated (revision bump) and start over
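
The weight-advance part of the flow above can be sketched as a small pure function. This is an illustration only, not the project's code: the `state` type, the `nextWeight` function and the 0.99 threshold are hypothetical names and values chosen for the example, and the sketch assumes the first 10% step is taken before any success rate is available.

```go
package main

import "fmt"

// step is the traffic weight increment in percentage points (10 in the flow above).
const step = 10

// state is a hypothetical snapshot of what is observed on each pass.
type state struct {
	deploymentsHealthy bool    // GA and canary pods are healthy, no rolling update underway
	hpaScaling         bool    // canary is being scaled up/down by the HPA
	successRate        float64 // canary HTTP success rate as a ratio in [0, 1]
}

// nextWeight returns the canary weight to apply after one pass.
// The weight is returned unchanged (rollout halted) whenever a gate fails.
func nextWeight(current int, threshold float64, s state) int {
	if !s.deploymentsHealthy || s.hpaScaling {
		return current
	}
	// Assumption: the success rate is only checked once the canary receives traffic.
	if current > 0 && s.successRate < threshold {
		return current
	}
	if current+step > 100 {
		return 100
	}
	return current + step
}

func main() {
	healthy := state{deploymentsHealthy: true, successRate: 0.999}
	weight := 0
	for weight < 100 {
		weight = nextWeight(weight, 0.99, healthy)
		fmt.Printf("canary=%d%% ga=%d%%\n", weight, 100-weight)
	}
	fmt.Println("canary at 100%, ready to promote to GA")
}
```

Because a failed gate leaves the weight untouched, a halted rollout resumes from the same weight once the canary recovers, rather than starting over.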

HTTP success rate query:

sum(
    rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace=~"$namespace",
          destination_workload=~"$workload",
          response_code!~"5.*"
        }[$interval]
    )
) 
/ 
sum(
    rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace=~"$namespace",
          destination_workload=~"$workload"
        }[$interval]
    )
)