# Deployment Strategies

Flagger can run automated application analysis, promotion and rollback for the following deployment strategies:

* **Canary Release** \(progressive traffic shifting\)
  * Istio, Linkerd, App Mesh, NGINX, Skipper, Contour, Gloo Edge, Traefik, Open Service Mesh, Kuma, Gateway API, Apache APISIX, Knative
* **A/B Testing** \(HTTP headers and cookies traffic routing\)
  * Istio, App Mesh, NGINX, Contour, Gloo Edge, Gateway API
* **Blue/Green** \(traffic switching\)
  * Kubernetes CNI, Istio, Linkerd, App Mesh, NGINX, Contour, Gloo Edge, Open Service Mesh, Gateway API
* **Blue/Green Mirroring** \(traffic shadowing\)
  * Istio, Gateway API
* **Canary Release with Session Affinity** \(progressive traffic shifting combined with cookie based routing\)
  * Istio, Gateway API

For Canary releases and A/B testing you'll need a Layer 7 traffic management solution like
a service mesh or an ingress controller. For Blue/Green deployments no service mesh or ingress controller is required.

A canary analysis is triggered by changes in any of the following objects:

* Deployment PodSpec \(container image, command, ports, env, resources, etc\)
* ConfigMaps mounted as volumes or mapped to environment variables
* Secrets mounted as volumes or mapped to environment variables

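For example, here is a minimal sketch of a target pod spec in which any of the tracked changes would trigger a new analysis (all resource names are hypothetical):

```yaml
# fragment of the target Deployment's pod template spec
spec:
  containers:
    - name: app
      # changing the image tag triggers a canary analysis
      image: ghcr.io/example/app:1.0.0
      envFrom:
        # tracked: ConfigMap mapped to environment variables
        - configMapRef:
            name: app-config
      volumeMounts:
        - name: app-secret
          mountPath: /etc/app
  volumes:
    # tracked: Secret mounted as a volume
    - name: app-secret
      secret:
        secretName: app-secret
```
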
## Canary Release

Flagger implements a control loop that gradually shifts traffic to the canary while measuring
key performance indicators like HTTP requests success rate, requests average duration and pod health.
Based on analysis of the KPIs a canary is promoted or aborted.

The canary analysis runs periodically until it reaches the maximum traffic weight or the failed checks threshold.

Spec:

```yaml
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 2
    # promotion increment step (default 100)
    # percentage (0-100)
    stepWeightPromotion: 100
  # deploy straight to production without
  # the metrics and webhook checks
  skipAnalysis: false
```

The above analysis, if it succeeds, will run for 25 minutes while validating the HTTP metrics and webhooks every minute.
You can determine the minimum time it takes to validate and promote a canary deployment using this formula:

```text
interval * (maxWeight / stepWeight)
```

And the time it takes for a canary to be rolled back when the metrics or webhook checks are failing:

```text
interval * threshold
```
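
For example, plugging in the values from the spec above:

```text
1m * (50 / 2) = 25 minutes to validate and promote
1m * 10       = 10 minutes to roll back
```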

When `stepWeightPromotion` is specified, the promotion phase happens in stages: the traffic is routed back
to the primary pods in a progressive manner and the primary weight is increased until it reaches 100%.
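
For example, a sketch with hypothetical values: after a successful analysis at `maxWeight: 50`, the primary weight would advance from 50% to 80% to 100%, one `interval` per stage:

```yaml
  analysis:
    interval: 1m
    maxWeight: 50
    stepWeight: 10
    # shift traffic back to the primary in 30% increments
    stepWeightPromotion: 30
```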

In emergency cases, you may want to skip the analysis phase and ship changes directly to production.
At any time you can set `spec.skipAnalysis: true`. When skip analysis is enabled,
Flagger checks if the canary deployment is healthy and promotes it without analysing it.
If an analysis is underway, Flagger cancels it and runs the promotion.

Gated canary promotion stages:

* scan for canary deployments
* check primary and canary deployment status
  * halt advancement if a rolling update is underway
  * halt advancement if pods are unhealthy
* call confirm-rollout webhooks and check results
  * halt advancement if any hook returns a non HTTP 2xx result
* call pre-rollout webhooks and check results
  * halt advancement if any hook returns a non HTTP 2xx result
  * increment the failed checks counter
* increase canary traffic weight percentage from 0% to 2% \(step weight\)
* call rollout webhooks and check results
* check canary HTTP request success rate and latency
  * halt advancement if any metric is under the specified threshold
  * increment the failed checks counter
* check if the number of failed checks reached the threshold
  * route all traffic to primary
  * scale to zero the canary deployment and mark it as failed
  * call post-rollout webhooks
  * post the analysis result to Slack
  * wait for the canary deployment to be updated and start over
* increase canary traffic weight by 2% \(step weight\) till it reaches 50% \(max weight\)
  * halt advancement if any webhook call fails
  * halt advancement while canary request success rate is under the threshold
  * halt advancement while canary request duration P99 is over the threshold
  * halt advancement while any custom metric check fails
  * halt advancement if the primary or canary deployment becomes unhealthy
  * halt advancement while canary deployment is being scaled up/down by HPA
* call confirm-promotion webhooks and check results
  * halt advancement if any hook returns a non HTTP 2xx result
* promote canary to primary
  * copy ConfigMaps and Secrets from canary to primary
  * copy canary deployment spec template over primary
* wait for primary rolling update to finish
  * halt advancement if pods are unhealthy
* route all traffic to primary
* scale to zero the canary deployment
* mark rollout as finished
* call post-rollout webhooks
* send notification with the canary analysis result
* wait for the canary deployment to be updated and start over

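The confirm-rollout and confirm-promotion gates above are driven by webhooks declared in the analysis spec. A minimal sketch, assuming Flagger's load tester is installed in the `test` namespace (it exposes a `/gate/check` endpoint that can be toggled via `/gate/open` and `/gate/close`):

```yaml
  analysis:
    webhooks:
      # the canary advances only while this endpoint returns HTTP 200
      - name: "rollout gate"
        type: confirm-rollout
        url: http://flagger-loadtester.test/gate/check
```
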
### Rollout Weights

By default Flagger uses linear weight values for the promotion, with the start value,
the step and the maximum weight value in the 0 to 100 range.

Example:

```yaml
# canary.yaml
spec:
  analysis:
    maxWeight: 50
    stepWeight: 20
```

This configuration performs the analysis starting from 20, increasing by 20 until the weight goes above 50.
We would have steps (canary weight : primary weight):

* 20 (20 : 80)
* 40 (40 : 60)
* 60 (60 : 40)
* promotion

In order to enable non-linear promotion, a new parameter was introduced:

* `stepWeights` - determines the ordered array of weights, which shall be used during canary promotion.

Example:

```yaml
# canary.yaml
spec:
  analysis:
    stepWeights: [1, 2, 10, 80]
```

This configuration performs the analysis starting from 1, going through the `stepWeights` values until it reaches 80.
We would have steps (canary weight : primary weight):

* 1 (1 : 99)
* 2 (2 : 98)
* 10 (10 : 90)
* 80 (80 : 20)
* promotion

## A/B Testing

For frontend applications that require session affinity, you should use
HTTP header or cookie match conditions to ensure a set of users
stays on the same version for the whole duration of the canary analysis.

You can enable A/B testing by specifying the HTTP match conditions and the number of iterations.
If Flagger finds an HTTP match condition, it will ignore the `maxWeight` and `stepWeight` settings.

Istio example:

```yaml
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # total number of iterations
    iterations: 10
    # max number of failed iterations before rollback
    threshold: 2
    # canary match condition
    match:
      - headers:
          x-canary:
            regex: ".*insider.*"
      - headers:
          cookie:
            regex: "^(.*?;)?(canary=always)(;.*)?$"
```

The above configuration will run an analysis for ten minutes targeting users that have an `insider` value in the `x-canary` header and those that have a `canary` cookie set to `always`.
You can determine the minimum time that it takes to validate and promote a canary deployment using this formula:

```text
interval * iterations
```

And the time it takes for a canary to be rolled back when the metrics or webhook checks are failing:

```text
interval * threshold
```

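With the example above, that comes to at least `1m * 10 = 10` minutes for a promotion and at most `1m * 2 = 2` minutes for a rollback.
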
Istio example:

```yaml
  analysis:
    interval: 1m
    threshold: 10
    iterations: 2
    match:
      - headers:
          x-canary:
            exact: "insider"
      - headers:
          cookie:
            regex: "^(.*?;)?(canary=always)(;.*)?$"
      - sourceLabels:
          app.kubernetes.io/name: "scheduler"
```

The header keys must be lowercase and use hyphen as the separator.
Header values are case-sensitive and formatted as follows:

* `exact: "value"` for exact string match
* `prefix: "value"` for prefix-based match
* `suffix: "value"` for suffix-based match
* `regex: "value"` for [RE2](https://github.com/google/re2/wiki/Syntax) style regex-based match

Note that the `sourceLabels` match conditions are applicable only when
the `mesh` gateway is included in the `canary.service.gateways` list.

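For instance, a sketch of a canary service spec where the `mesh` gateway is included alongside a public gateway (the gateway name is hypothetical):

```yaml
  service:
    port: 9898
    gateways:
      # hypothetical public ingress gateway
      - public-gateway.istio-system.svc.cluster.local
      # required for sourceLabels match conditions
      - mesh
```
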
App Mesh example:

```yaml
  analysis:
    interval: 1m
    threshold: 10
    iterations: 2
    match:
      - headers:
          user-agent:
            regex: ".*Chrome.*"
```

Note that App Mesh supports a single condition.

Contour example:

```yaml
  analysis:
    interval: 1m
    threshold: 10
    iterations: 2
    match:
      - headers:
          user-agent:
            prefix: "Chrome"
```

Note that Contour does not support regex; you can use prefix, suffix or exact matching.

NGINX example:

```yaml
  analysis:
    interval: 1m
    threshold: 10
    iterations: 2
    match:
      - headers:
          x-canary:
            exact: "insider"
      - headers:
          cookie:
            exact: "canary"
```

Note that the NGINX ingress controller supports only exact matching for
cookie names, where the value must be set to `always`.
Starting with NGINX ingress v0.31, regex matching is supported for header values.

The above configurations will route users with the x-canary header
or the canary cookie to the canary instance during analysis:

```bash
curl -H 'X-Canary: insider' http://app.example.com
curl -b 'canary=always' http://app.example.com
```

## Blue/Green Deployments

For applications that are not deployed on a service mesh,
Flagger can orchestrate blue/green style deployments with Kubernetes L4 networking.
When using Istio you have the option to mirror traffic between blue and green.

You can use the blue/green deployment strategy by replacing
`stepWeight/maxWeight` with `iterations` in the `analysis` spec:

```yaml
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # total number of iterations
    iterations: 10
    # max number of failed iterations before rollback
    threshold: 2
```

With the above configuration Flagger will run conformance and load tests on the canary pods for ten minutes.
If the metrics analysis succeeds, live traffic will be switched from
the old version to the new one when the canary is promoted.

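The conformance and load tests are implemented as webhooks. A minimal sketch, assuming Flagger's load tester runs in the `test` namespace and the target is a hypothetical `podinfo` app on port 9898:

```yaml
  analysis:
    webhooks:
      # acceptance test: must pass before traffic is switched to the canary
      - name: "conformance test"
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 5m
        metadata:
          type: "bash"
          cmd: "curl -s http://podinfo-canary.test:9898/api/info | grep version"
      # traffic generator: keeps metrics flowing during the analysis
      - name: "load test"
        type: rollout
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"
```
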
The blue/green deployment strategy is supported for all service mesh providers.

Blue/Green rollout steps for service mesh:

* detect new revision (deployment spec, secrets or configmaps changes)
* scale up the canary (green)
* run conformance tests for the canary pods
* run load tests and metric checks for the canary pods every minute
* abort the canary release if the failure threshold is reached
* route traffic to canary (this doesn't happen when using the Kubernetes provider)
* promote canary spec over primary (blue)
* wait for primary rollout
* route traffic to primary
* scale down canary

After the analysis finishes, the traffic is routed to the canary (green) before
triggering the primary (blue) rolling update. This ensures a smooth transition
to the new version, avoiding dropped in-flight requests during the Kubernetes deployment rollout.

## Blue/Green with Traffic Mirroring

Traffic Mirroring is a pre-stage in a Canary (progressive traffic shifting) or Blue/Green deployment strategy.
Traffic mirroring will copy each incoming request, sending one request to the primary and one to the canary service.
The response from the primary is sent back to the user. The response from the canary is discarded.
Metrics are collected on both requests so that the deployment will only proceed if the canary metrics are healthy.

Mirroring should be used for requests that are **idempotent** or capable of being processed
twice (once by the primary and once by the canary).
Reads are idempotent. Before using mirroring on requests that may be writes,
you should consider what will happen if a write is duplicated and handled by the primary and canary.

To use mirroring, set `spec.analysis.mirror` to `true`.

```yaml
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # total number of iterations
    iterations: 10
    # max number of failed iterations before rollback
    threshold: 2
    # Traffic shadowing
    mirror: true
    # Weight of the traffic mirrored to your canary (defaults to 100%)
    # Only applicable for Istio.
    mirrorWeight: 100
```

Mirroring rollout steps for service mesh:

* detect new revision (deployment spec, secrets or configmaps changes)
* scale from zero the canary deployment
* wait for the HPA to set the canary minimum replicas
* check canary pods health
* run the acceptance tests
* abort the canary release if tests fail
* start the load tests
* mirror 100% of the traffic from primary to canary
* check request success rate and request duration every minute
* abort the canary release if the failure threshold is reached
* stop traffic mirroring after the number of iterations is reached
* route live traffic to the canary pods
* promote the canary \(update the primary secrets, configmaps and deployment spec\)
* wait for the primary deployment rollout to finish
* wait for the HPA to set the primary minimum replicas
* check primary pods health
* switch live traffic back to primary
* scale to zero the canary
* send notification with the canary analysis result

After the analysis finishes, the traffic is routed to the canary (green) before
triggering the primary (blue) rolling update. This ensures a smooth transition
to the new version, avoiding dropped in-flight requests during the Kubernetes deployment rollout.

## Canary Release with Session Affinity

This deployment strategy mixes a Canary Release with A/B testing. A Canary Release is helpful when
we're trying to expose new features to users progressively, but because of the very nature of its
routing (weight based), users can land on the application's old version even after they have been
routed to the new version previously. This can be annoying, or worse, break how other services interact
with our application. To address this issue, we borrow some things from A/B testing.

Since A/B testing is particularly helpful for applications that require session affinity, we integrate
cookie based routing with regular weight based routing. This means that once a user is exposed to the new
version of our application (based on the traffic weights), they're always routed to that version, i.e.
they're never routed back to the old version of our application.

You can enable this by specifying `.spec.analysis.sessionAffinity` in the Canary:

```yaml
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 2
    # session affinity config
    sessionAffinity:
      # name of the cookie used
      cookieName: flagger-cookie
      # max age of the cookie (in seconds)
      # optional; defaults to 86400
      maxAge: 21600
```

`.spec.analysis.sessionAffinity.cookieName` is the name of the cookie that is stored. The value of the
cookie is a randomly generated string of characters that acts as a unique identifier. For the above
config, the response header of a request routed to the canary deployment during a Canary run will look like:
```
Set-Cookie: flagger-cookie=LpsIaLdoNZ; Max-Age=21600
```

After a Canary run is over and all traffic is shifted back to the primary deployment, all responses will
have the following header:
```
Set-Cookie: flagger-cookie=LpsIaLdoNZ; Max-Age=-1
```
This tells the client to delete the cookie, making sure there are no junk cookies lying around in the user's
system.

If a new Canary run is triggered, the response header will set a new cookie for all requests routed to
the Canary deployment:
```
Set-Cookie: flagger-cookie=McxKdLQoIN; Max-Age=21600
```