diff --git a/README.md b/README.md
index 393d3516..1838328d 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@ and is a core component of the [GitOps toolkit](https://fluxcd.io/flux/component
 | [OCIRepository](docs/spec/v1beta2/ocirepositories.md) | `source.toolkit.fluxcd.io/v1beta2` |
 | [HelmRepository](docs/spec/v1/helmrepositories.md) | `source.toolkit.fluxcd.io/v1` |
 | [HelmChart](docs/spec/v1/helmcharts.md) | `source.toolkit.fluxcd.io/v1` |
-| [Bucket](docs/spec/v1beta2/buckets.md) | `source.toolkit.fluxcd.io/v1beta2` |
+| [Bucket](docs/spec/v1/buckets.md) | `source.toolkit.fluxcd.io/v1` |
 
 ## Features
 
diff --git a/docs/spec/v1/README.md b/docs/spec/v1/README.md
index a87051a5..3a382959 100644
--- a/docs/spec/v1/README.md
+++ b/docs/spec/v1/README.md
@@ -8,6 +8,7 @@ This is the v1 API specification for defining the desired state sources of Kuber
 + [GitRepository](gitrepositories.md)
 + [HelmRepository](helmrepositories.md)
 + [HelmChart](helmcharts.md)
++ [Bucket](buckets.md)
 
 ## Implementation
 
diff --git a/docs/spec/v1/buckets.md b/docs/spec/v1/buckets.md
new file mode 100644
index 00000000..980a4b99
--- /dev/null
+++ b/docs/spec/v1/buckets.md
@@ -0,0 +1,1382 @@
# Buckets

The `Bucket` API defines a Source to produce an Artifact for objects from storage
solutions like Amazon S3, Google Cloud Storage buckets, or any other solution
with an S3-compatible API such as Minio, Alibaba Cloud OSS and others.

## Example

The following is an example of a Bucket. It creates a tarball (`.tar.gz`)
Artifact with the fetched objects from an object storage with an S3-compatible
API (e.g. [Minio](https://min.io)):

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: minio-bucket
  namespace: default
spec:
  interval: 5m0s
  endpoint: minio.example.com
  insecure: true
  secretRef:
    name: minio-bucket-secret
  bucketName: example
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-bucket-secret
  namespace: default
type: Opaque
stringData:
  accesskey: <access key>
  secretkey: <secret key>
```

In the above example:

- A Bucket named `minio-bucket` is created, indicated by the
  `.metadata.name` field.
- The source-controller checks the object storage bucket every five minutes,
  indicated by the `.spec.interval` field.
- It authenticates to the `minio.example.com` endpoint with
  the static credentials from the `minio-bucket-secret` Secret data, indicated
  by the `.spec.endpoint` and `.spec.secretRef.name` fields.
- A list of object keys and their [etags](https://en.wikipedia.org/wiki/HTTP_ETag)
  in the `.spec.bucketName` bucket is compiled, while filtering the keys using
  [default ignore rules](#default-exclusions).
- The digest (algorithm defaults to SHA256) of the list is used as Artifact
  revision, reported in-cluster in the `.status.artifact.revision` field.
- When the current Bucket revision differs from the latest calculated revision,
  all objects are fetched and archived.
- The new Artifact is reported in the `.status.artifact` field.

You can run this example by saving the manifest into `bucket.yaml`, and
changing the Bucket and Secret values to target a Minio instance you have
control over.

**Note:** For more advanced examples targeting e.g. Amazon S3 or GCP, see
[Provider](#provider).

1. Apply the resource on the cluster:

   ```sh
   kubectl apply -f bucket.yaml
   ```

2. Run `kubectl get buckets` to see the Bucket:
   ```console
   NAME           ENDPOINT            AGE   READY   STATUS
   minio-bucket   minio.example.com   34s   True    stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
   ```

3. Run `kubectl describe bucket minio-bucket` to see the [Artifact](#artifact)
   and [Conditions](#conditions) in the Bucket's Status:

   ```console
   ...
   Status:
     Artifact:
       Digest:            sha256:72aa638abb455ca5f9ef4825b949fd2de4d4be0a74895bf7ed2338622cd12686
       Last Update Time:  2024-02-01T23:43:38Z
       Path:              bucket/default/minio-bucket/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.tar.gz
       Revision:          sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
       Size:              38099
       URL:               http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.tar.gz
     Conditions:
       Last Transition Time:  2024-02-01T23:43:38Z
       Message:               stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
       Observed Generation:   1
       Reason:                Succeeded
       Status:                True
       Type:                  Ready
       Last Transition Time:  2024-02-01T23:43:38Z
       Message:               stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
       Observed Generation:   1
       Reason:                Succeeded
       Status:                True
       Type:                  ArtifactInStorage
     Observed Generation:     1
     URL:                     http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/latest.tar.gz
   Events:
     Type    Reason       Age   From               Message
     ----    ------       ----  ----               -------
     Normal  NewArtifact  82s   source-controller  stored artifact with 16 fetched files from 'example' bucket
   ```

## Writing a Bucket spec

As with all other Kubernetes config, a Bucket needs `apiVersion`, `kind`, and
`metadata` fields. The name of a Bucket object must be a valid
[DNS subdomain name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).

A Bucket also needs a
[`.spec` section](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).

### Provider

The `.spec.provider` field allows for specifying a Provider to enable
provider-specific configurations, for example to communicate with a
non-S3-compatible API endpoint, or to change the authentication method.

Supported options are:

- [Generic](#generic)
- [AWS](#aws)
- [Azure](#azure)
- [GCP](#gcp)

If you do not specify `.spec.provider`, it defaults to `generic`.

#### Generic

When a Bucket's `.spec.provider` is set to `generic`, the controller will
attempt to communicate with the specified [Endpoint](#endpoint) using the
[Minio Client SDK](https://github.com/minio/minio-go), which can communicate
with any Amazon S3-compatible object storage (including
[GCS](https://cloud.google.com/storage/docs/interoperability),
[Wasabi](https://wasabi-support.zendesk.com/hc/en-us/articles/360002079671-How-do-I-use-Minio-Client-with-Wasabi-),
and many others).

The `generic` Provider _requires_ a [Secret reference](#secret-reference) to a
Secret with `.data.accesskey` and `.data.secretkey` values, used to
authenticate with static credentials.

The Provider allows for specifying the region the bucket is located in using
the [`.spec.region` field](#region), if required by the [Endpoint](#endpoint).
##### Generic example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: generic-insecure
  namespace: default
spec:
  provider: generic
  interval: 5m0s
  bucketName: podinfo
  endpoint: minio.minio.svc.cluster.local:9000
  timeout: 60s
  insecure: true
  secretRef:
    name: minio-credentials
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-credentials
  namespace: default
type: Opaque
data:
  accesskey: <BASE64>
  secretkey: <BASE64>
```

#### AWS

When a Bucket's `.spec.provider` field is set to `aws`, the source-controller
will attempt to communicate with the specified [Endpoint](#endpoint) using the
[Minio Client SDK](https://github.com/minio/minio-go).

Without a [Secret reference](#secret-reference), authorization using
credentials retrieved from the AWS EC2 service is attempted by default. When
a reference is specified, it expects a Secret with `.data.accesskey` and
`.data.secretkey` values, used to authenticate with static credentials.

The Provider allows for specifying the
[Amazon AWS Region](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions)
using the [`.spec.region` field](#region).

##### AWS EC2 example

**Note:** On EKS you have to create an [IAM role](#aws-iam-role-example) for
the source-controller service account that grants access to the bucket.

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: aws
  namespace: default
spec:
  interval: 5m0s
  provider: aws
  bucketName: podinfo
  endpoint: s3.amazonaws.com
  region: us-east-1
  timeout: 30s
```

##### AWS IAM role example

Replace `<bucket-name>` with the specified `.spec.bucketName`.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>"
    }
  ]
}
```

##### AWS static auth example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: aws
  namespace: default
spec:
  interval: 5m0s
  provider: aws
  bucketName: podinfo
  endpoint: s3.amazonaws.com
  region: us-east-1
  secretRef:
    name: aws-credentials
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: default
type: Opaque
data:
  accesskey: <BASE64>
  secretkey: <BASE64>
```

#### Azure

When a Bucket's `.spec.provider` is set to `azure`, the source-controller will
attempt to communicate with the specified [Endpoint](#endpoint) using the
[Azure Blob Storage SDK for Go](https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azblob).

Without a [Secret reference](#secret-reference), authentication using a chain
with:

- [Environment credentials](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#EnvironmentCredential)
- [Workload Identity](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.3.0-beta.4#WorkloadIdentityCredential)
- [Managed Identity](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#ManagedIdentityCredential)
  with the `AZURE_CLIENT_ID` environment variable
- Managed Identity with a system-assigned identity

is attempted by default. If no chain can be established, the bucket
is assumed to be publicly reachable.
When a reference is specified, it expects a Secret with one of the following
sets of `.data` fields:

- `tenantId`, `clientId` and `clientSecret` for authenticating a Service
  Principal with a secret.
- `tenantId`, `clientId` and `clientCertificate` (plus optionally
  `clientCertificatePassword` and/or `clientCertificateSendChain`) for
  authenticating a Service Principal with a certificate.
- `clientId` for authenticating using a Managed Identity.
- `accountKey` for authenticating using a
  [Shared Key](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#SharedKeyCredential).
- `sasKey` for authenticating using a [SAS Token](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview).

For any Managed Identity and/or Azure Active Directory authentication method,
the base URL can be configured using `.data.authorityHost`. If not supplied,
[`AzurePublicCloud` is assumed](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#AuthorityHost).

##### Azure example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-public
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: podinfo
  endpoint: https://podinfoaccount.blob.core.windows.net
  timeout: 30s
```

##### Azure Service Principal Secret example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-service-principal-secret
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-sp-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-sp-auth
  namespace: default
type: Opaque
data:
  tenantId: <BASE64>
  clientId: <BASE64>
  clientSecret: <BASE64>
```

##### Azure Service Principal Certificate example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-service-principal-cert
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-sp-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-sp-auth
  namespace: default
type: Opaque
data:
  tenantId: <BASE64>
  clientId: <BASE64>
  clientCertificate: <BASE64>
  # Plus optionally
  clientCertificatePassword: <BASE64>
  clientCertificateSendChain: <BASE64> # either "1" or "true"
```

##### Azure Managed Identity with Client ID example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-managed-identity
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-smi-auth
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-smi-auth
  namespace: default
type: Opaque
data:
  clientId: <BASE64>
```

##### Azure Blob Shared Key example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-shared-key
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-key
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-key
  namespace: default
type: Opaque
data:
  accountKey: <BASE64>
```

##### Workload Identity

If you have [Workload Identity](https://azure.github.io/azure-workload-identity/docs/installation/managed-clusters.html)
set up on your cluster, you need to create an Azure Identity and give it
access to Azure Blob Storage.
```shell
export IDENTITY_NAME="blob-access"

az role assignment create --role "Storage Blob Data Reader" \
--assignee-object-id "$(az identity show -n $IDENTITY_NAME -o tsv --query principalId -g $RESOURCE_GROUP)" \
--scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<ACCOUNT_NAME>/blobServices/default/containers/<CONTAINER_NAME>"
```

Establish a federated identity between the Identity and the source-controller
ServiceAccount.

```shell
export SERVICE_ACCOUNT_ISSUER="$(az aks show --resource-group <RESOURCE_GROUP> --name <CLUSTER_NAME> --query "oidcIssuerProfile.issuerUrl" -otsv)"

az identity federated-credential create \
  --name "kubernetes-federated-credential" \
  --identity-name "${IDENTITY_NAME}" \
  --resource-group "${RESOURCE_GROUP}" \
  --issuer "${SERVICE_ACCOUNT_ISSUER}" \
  --subject "system:serviceaccount:flux-system:source-controller"
```

Add a patch to label and annotate the source-controller Deployment and
ServiceAccount correctly, so that they can match an identity binding:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |-
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: source-controller
        namespace: flux-system
        annotations:
          azure.workload.identity/client-id: <AZURE_CLIENT_ID>
        labels:
          azure.workload.identity/use: "true"
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: source-controller
        namespace: flux-system
        labels:
          azure.workload.identity/use: "true"
      spec:
        template:
          metadata:
            labels:
              azure.workload.identity/use: "true"
```

If you have set up Workload Identity correctly and labeled the source-controller
Deployment and ServiceAccount, then you don't need to reference a Secret. For
more information, please see the
[Workload Identity documentation](https://azure.github.io/azure-workload-identity/docs/quick-start.html).

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-bucket
  namespace: flux-system
spec:
  interval: 5m0s
  provider: azure
  bucketName: testsas
  endpoint: https://testfluxsas.blob.core.windows.net
```

##### Deprecated: Managed Identity with AAD Pod Identity

If you are using [AAD Pod Identity](https://azure.github.io/aad-pod-identity/docs),
you need to create an Azure Identity and give it access to Azure Blob Storage.
```sh
export IDENTITY_NAME="blob-access"

az role assignment create --role "Storage Blob Data Reader" \
--assignee-object-id "$(az identity show -n $IDENTITY_NAME -o tsv --query principalId -g $RESOURCE_GROUP)" \
--scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/<ACCOUNT_NAME>/blobServices/default/containers/<CONTAINER_NAME>"

export IDENTITY_CLIENT_ID="$(az identity show -n ${IDENTITY_NAME} -g ${RESOURCE_GROUP} -otsv --query clientId)"
export IDENTITY_RESOURCE_ID="$(az identity show -n ${IDENTITY_NAME} -g ${RESOURCE_GROUP} -otsv --query id)"
```

Create an AzureIdentity object that references the identity created above:

```yaml
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: <IDENTITY_NAME> # the source-controller label will match this name
  namespace: flux-system
spec:
  clientID: <IDENTITY_CLIENT_ID>
  resourceID: <IDENTITY_RESOURCE_ID>
  type: 0 # user-managed identity
```

Create an AzureIdentityBinding object that binds Pods with a specific selector
to the AzureIdentity created above:

```yaml
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: ${IDENTITY_NAME}-binding
spec:
  azureIdentity: ${IDENTITY_NAME}
  selector: ${IDENTITY_NAME}
```

Label the source-controller Deployment correctly so that it can match an
identity binding:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: source-controller
  namespace: flux-system
spec:
  template:
    metadata:
      labels:
        aadpodidbinding: ${IDENTITY_NAME} # match the AzureIdentity name
```

If you have set up aad-pod-identity correctly and labeled the source-controller
Deployment, then you don't need to reference a Secret.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-bucket
  namespace: flux-system
spec:
  interval: 5m0s
  provider: azure
  bucketName: testsas
  endpoint: https://testfluxsas.blob.core.windows.net
```

##### Azure Blob SAS Token example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: azure-sas-token
  namespace: default
spec:
  interval: 5m0s
  provider: azure
  bucketName: <bucket-name>
  endpoint: https://<account-name>.blob.core.windows.net
  secretRef:
    name: azure-key
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-key
  namespace: default
type: Opaque
data:
  sasKey: <BASE64>
```

The `sasKey` only contains the SAS token, e.g.
`?sv=2020-08-0&ss=bfqt&srt=co&sp=rwdlacupitfx&se=2022-05-26T21:55:35Z&st=2022-05...`.
The leading question mark (`?`) is optional. The query values from the `sasKey`
data field in the Secret get merged with the ones in the `.spec.endpoint` of
the Bucket. If the same key is present in both of them, the value in the
`sasKey` takes precedence.

**Note:** The SAS token has an expiry date, and it must be updated before it
expires to allow Flux to continue to access Azure Storage. Both account-level
and container-level SAS tokens are allowed.

The minimum permissions for an account-level SAS token are:

- Allowed services: `Blob`
- Allowed resource types: `Container`, `Object`
- Allowed permissions: `Read`, `List`

The minimum permissions for a container-level SAS token are:

- Allowed permissions: `Read`, `List`

Refer to the [Azure documentation](https://learn.microsoft.com/en-us/rest/api/storageservices/create-account-sas#blob-service)
for a full overview of permissions.
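As an illustrative sketch (not part of the Bucket API itself), a
container-level SAS token with the minimum `Read` and `List` permissions could
be generated with the Azure CLI along these lines; the account name, container
name, and expiry below are placeholders you must adjust, and your auth mode may
differ:

```sh
# Hypothetical values; requires the Azure CLI and a signed-in identity with
# permission to create a user delegation SAS on the storage account.
az storage container generate-sas \
  --account-name "<account-name>" \
  --name "<container-name>" \
  --permissions rl \
  --expiry "2025-01-01T00:00Z" \
  --auth-mode login \
  --as-user \
  --output tsv
```

The printed token (with or without the leading `?`) can then be stored in the
`sasKey` field of the Secret shown above.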
#### GCP

When a Bucket's `.spec.provider` is set to `gcp`, the source-controller will
attempt to communicate with the specified [Endpoint](#endpoint) using the
[Google Client SDK](https://github.com/googleapis/google-api-go-client).

Without a [Secret reference](#secret-reference), authorization using a
workload identity is attempted by default. The workload identity is obtained
using the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, falling back
to the Google Application Credential file in the config directory.
When a reference is specified, it expects a Secret with a `.data.serviceaccount`
value with a GCP service account JSON file.

The Provider allows for specifying the
[Bucket location](https://cloud.google.com/storage/docs/locations) using the
[`.spec.region` field](#region).

##### GCP example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: gcp-workload-identity
  namespace: default
spec:
  interval: 5m0s
  provider: gcp
  bucketName: podinfo
  endpoint: storage.googleapis.com
  region: us-east-1
  timeout: 30s
```

##### GCP static auth example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: gcp-secret
  namespace: default
spec:
  interval: 5m0s
  provider: gcp
  bucketName: <bucket-name>
  endpoint: storage.googleapis.com
  region: <region>
  secretRef:
    name: gcp-service-account
---
apiVersion: v1
kind: Secret
metadata:
  name: gcp-service-account
  namespace: default
type: Opaque
data:
  serviceaccount: <BASE64>
```

Where the (base64 decoded) value of `.data.serviceaccount` looks like this:

```json
{
  "type": "service_account",
  "project_id": "example",
  "private_key_id": "28qwgh3gdf5hj3gb5fj3gsu5yfgh34f45324568hy2",
  "private_key": "-----BEGIN PRIVATE KEY-----\nHwethgy123hugghhhbdcu6356dgyjhsvgvGFDHYgcdjbvcdhbsx63c\n76tgycfehuhVGTFYfw6t7ydgyVgydheyhuggycuhejwy6t35fthyuhegvcetf\nTFUHGTygghubhxe65ygt6tgyedgy326hucyvsuhbhcvcsjhcsjhcsvgdtHFCGi\nHcye6tyyg3gfyuhchcsbhygcijdbhyyTF66tuhcevuhdcbhuhhvftcuhbh3uh7t6y\nggvftUHbh6t5rfthhuGVRtfjhbfcrd5r67yuhuvgFTYjgvtfyghbfcdrhyjhbfctfdfyhvfg\ntgvggtfyghvft6tugvTF5r66tujhgvfrtyhhgfct6y7ytfr5ctvghbhhvtghhjvcttfycf\nffxfghjbvgcgyt67ujbgvctfyhVC7uhvgcyjvhhjvyujc\ncgghgvgcfhgg765454tcfthhgftyhhvvyvvffgfryyu77reredswfthhgfcftycfdrttfhf/\n-----END PRIVATE KEY-----\n",
  "client_email": "test@example.iam.gserviceaccount.com",
  "client_id": "32657634678762536746",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%40podinfo.iam.gserviceaccount.com"
}
```

### Interval

`.spec.interval` is a required field that specifies the interval at which the
object storage bucket must be consulted.

After successfully reconciling a Bucket object, the source-controller requeues
the object for inspection after the specified interval. The value must be in a
[Go recognized duration string format](https://pkg.go.dev/time#ParseDuration),
e.g. `10m0s` to look at the object storage bucket every 10 minutes.

If the `.metadata.generation` of a resource changes (due to e.g. applying a
change to the spec), this is handled instantly outside the interval window.
**Note:** The controller can be configured to apply a jitter to the interval in
order to distribute the load more evenly when multiple Bucket objects are set up
with the same interval. For more information, please refer to the
[source-controller configuration options](https://fluxcd.io/flux/components/source/options/).

### Endpoint

`.spec.endpoint` is a required field that specifies the HTTP/S object storage
endpoint to connect to and fetch objects from. Connecting to an (insecure)
HTTP endpoint requires enabling [`.spec.insecure`](#insecure).

Some endpoints require the specification of a [`.spec.region`](#region),
see [Provider](#provider) for more (provider-specific) examples.

### STS

`.spec.sts` is an optional field for specifying the Security Token Service
configuration. A Security Token Service (STS) is a web service that issues
temporary security credentials. By adding this field, one may specify the
STS endpoint from which temporary credentials will be fetched.

This field is only supported for the `aws` and `generic` bucket [providers](#provider).

If using `.spec.sts`, the following fields are required:

- `.spec.sts.provider`, the Security Token Service provider. The only supported
  option for the `generic` bucket provider is `ldap`. The only supported option
  for the `aws` bucket provider is `aws`.
- `.spec.sts.endpoint`, the HTTP/S endpoint of the Security Token Service. In
  the case of `aws` this can be `https://sts.amazonaws.com`, or a Regional STS
  Endpoint, or an Interface Endpoint created inside a VPC. In the case of
  `ldap` this must be the LDAP server endpoint.

When using the `ldap` provider, the following fields may also be specified:

- `.spec.sts.secretRef.name`, the name of the Secret containing the LDAP
  credentials. The Secret must contain the following keys:
  - `username`, the username to authenticate with.
  - `password`, the password to authenticate with.
- `.spec.sts.certSecretRef.name`, the name of the Secret containing the
  TLS configuration for communicating with the STS endpoint. The contents
  of this Secret must follow the same structure as
  [`.spec.certSecretRef.name`](#cert-secret-reference).

If [`.spec.proxySecretRef.name`](#proxy-secret-reference) is specified,
the proxy configuration will be used for communicating with the STS endpoint.

Example for the `ldap` provider:

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: example
  namespace: example
spec:
  interval: 5m
  bucketName: example
  provider: generic
  endpoint: minio.example.com
  sts:
    provider: ldap
    endpoint: https://ldap.example.com
    secretRef:
      name: ldap-credentials
    certSecretRef:
      name: ldap-tls
---
apiVersion: v1
kind: Secret
metadata:
  name: ldap-credentials
  namespace: example
type: Opaque
stringData:
  username: <username>
  password: <password>
---
apiVersion: v1
kind: Secret
metadata:
  name: ldap-tls
  namespace: example
type: kubernetes.io/tls # or Opaque
stringData:
  tls.crt: <PEM-encoded client certificate>
  tls.key: <PEM-encoded client private key>
  ca.crt: <PEM-encoded CA certificate>
```

### Bucket name

`.spec.bucketName` is a required field that specifies which object storage
bucket on the [Endpoint](#endpoint) objects should be fetched from.

See [Provider](#provider) for more (provider-specific) examples.

### Region

`.spec.region` is an optional field to specify the region a
[`.spec.bucketName`](#bucket-name) is located in.

See [Provider](#provider) for more (provider-specific) examples.
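Since the [STS](#sts) section above only shows an `ldap` example, here is a
hedged sketch for the `aws` STS provider, combining the [Endpoint](#endpoint),
[Bucket name](#bucket-name) and [Region](#region) fields just described. The
object name and bucket name are illustrative, and the global STS endpoint may
be swapped for a Regional or Interface Endpoint:

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: aws-sts-example # illustrative name
  namespace: default
spec:
  interval: 5m0s
  provider: aws
  bucketName: podinfo # illustrative bucket name
  endpoint: s3.amazonaws.com
  region: us-east-1
  sts:
    provider: aws
    endpoint: https://sts.amazonaws.com # or a Regional/Interface Endpoint
```

As with the plain `aws` examples above, credentials are resolved from the
environment unless a [Secret reference](#secret-reference) is given.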
### Cert secret reference

`.spec.certSecretRef.name` is an optional field to specify a secret containing
TLS certificate data. The secret can contain the following keys:

* `tls.crt` and `tls.key`, to specify the client certificate and private key used
for TLS client authentication. These must be used in conjunction, i.e.
specifying one without the other will lead to an error.
* `ca.crt`, to specify the CA certificate used to verify the server, which is
required if the server is using a self-signed certificate.

If the server is using a self-signed certificate and has TLS client
authentication enabled, all three values are required.

The Secret should be of type `Opaque` or `kubernetes.io/tls`. All the files in
the Secret are expected to be [PEM-encoded][pem-encoding]. Assuming you have
three files, `client.key`, `client.crt` and `ca.crt`, for the client private
key, client certificate and the CA certificate respectively, you can generate
the required Secret using the `flux create secret tls` command:

```sh
flux create secret tls minio-tls --tls-key-file=client.key --tls-crt-file=client.crt --ca-crt-file=ca.crt
```

If TLS client authentication is not required, you can generate the secret with:

```sh
flux create secret tls minio-tls --ca-crt-file=ca.crt
```

This API is only supported for the `generic` [provider](#provider).

Example usage:

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: example
  namespace: example
spec:
  interval: 5m
  bucketName: example
  provider: generic
  endpoint: minio.example.com
  certSecretRef:
    name: minio-tls
---
apiVersion: v1
kind: Secret
metadata:
  name: minio-tls
  namespace: example
type: kubernetes.io/tls # or Opaque
stringData:
  tls.crt: <PEM-encoded client certificate>
  tls.key: <PEM-encoded client private key>
  ca.crt: <PEM-encoded CA certificate>
```

### Proxy secret reference

`.spec.proxySecretRef.name` is an optional field used to specify the name of a
Secret that contains the proxy settings for the object. These settings are used
for all the remote operations related to the Bucket.
The Secret can contain three keys:

- `address`, to specify the address of the proxy server. This is a required key.
- `username`, to specify the username to use if the proxy server is protected by
  basic authentication. This is an optional key.
- `password`, to specify the password to use if the proxy server is protected by
  basic authentication. This is an optional key.

Example:

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: http-proxy
type: Opaque
stringData:
  address: http://proxy.com
  username: mandalorian
  password: grogu
```

Proxying can also be configured in the source-controller Deployment directly by
using the standard environment variables such as `HTTPS_PROXY`, `ALL_PROXY`, etc.

`.spec.proxySecretRef.name` takes precedence over all environment variables.

### Insecure

`.spec.insecure` is an optional field to allow connecting to an insecure (HTTP)
[endpoint](#endpoint), if set to `true`. The default value is `false`,
denying insecure (HTTP) connections.

### Timeout

`.spec.timeout` is an optional field to specify a timeout for object storage
fetch operations. The value must be in a
[Go recognized duration string format](https://pkg.go.dev/time#ParseDuration),
e.g. `1m30s` for a timeout of one minute and thirty seconds.
The default value is `60s`.
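As a usage sketch (values are illustrative), a Bucket that fetches through the
`http-proxy` Secret from the [proxy example](#proxy-secret-reference) above,
with a custom [timeout](#timeout), could look like this:

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: example # illustrative name
  namespace: default
spec:
  interval: 5m0s
  bucketName: example # illustrative bucket name
  endpoint: minio.example.com
  timeout: 30s
  proxySecretRef:
    name: http-proxy # the Secret from the proxy example above
```

If the proxy requires no authentication, the `username` and `password` keys
can simply be omitted from the Secret.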
### Secret reference

`.spec.secretRef.name` is an optional field to specify a name reference to a
Secret in the same namespace as the Bucket, containing authentication
credentials for the object storage. For some `.spec.provider` implementations
the presence of the field is required, see [Provider](#provider) for more
details and examples.

### Prefix

`.spec.prefix` is an optional field to enable server-side filtering
of files in the Bucket.

**Note:** The server-side filtering works only with the `generic`, `aws`
and `gcp` [providers](#provider) and is preferred over [`.spec.ignore`](#ignore)
as a more efficient way of excluding files.

### Ignore

`.spec.ignore` is an optional field to specify rules in [the `.gitignore`
pattern format](https://git-scm.com/docs/gitignore#_pattern_format). Storage
objects whose keys match the defined rules are excluded while fetching.

When specified, `.spec.ignore` overrides the [default exclusion
list](#default-exclusions), and may overrule the [`.sourceignore` file
exclusions](#sourceignore-file). See [excluding files](#excluding-files)
for more information.

### Suspend

`.spec.suspend` is an optional field to suspend the reconciliation of a Bucket.
When set to `true`, the controller will stop reconciling the Bucket, and changes
to the resource or in the object storage bucket will not result in a new
Artifact. When the field is set to `false` or removed, it will resume.

For practical information, see
[suspending and resuming](#suspending-and-resuming).

## Working with Buckets

### Excluding files

By default, storage bucket objects which match the [default exclusion
rules](#default-exclusions) are excluded while fetching. It is possible to
overwrite and/or overrule the default exclusions using a file in the bucket
and/or an in-spec set of rules.

#### `.sourceignore` file

Excluding files is possible by adding a `.sourceignore` file in the root of the
object storage bucket. The `.sourceignore` file follows [the `.gitignore`
pattern format](https://git-scm.com/docs/gitignore#_pattern_format), and
pattern entries may overrule [default exclusions](#default-exclusions).

#### Ignore spec

Another option is to define the exclusions within the Bucket spec, using the
[`.spec.ignore` field](#ignore). Specified rules override the
[default exclusion list](#default-exclusions), and may overrule `.sourceignore`
file exclusions.

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
spec:
  ignore: |
    # exclude all
    /*
    # include deploy dir
    !/deploy
    # exclude file extensions from deploy dir
    /deploy/**/*.md
    /deploy/**/*.txt
```

### Triggering a reconcile

To manually tell the source-controller to reconcile a Bucket outside the
[specified interval window](#interval), a Bucket can be annotated with
`reconcile.fluxcd.io/requestedAt: <arbitrary value>`. Annotating the resource
queues the Bucket for reconciliation if the `<arbitrary value>` differs from
the last value the controller acted on, as reported in
[`.status.lastHandledReconcileAt`](#last-handled-reconcile-at).
Using `kubectl`:

```sh
kubectl annotate --field-manager=flux-client-side-apply --overwrite bucket/<bucket-name> reconcile.fluxcd.io/requestedAt="$(date +%s)"
```

Using `flux`:

```sh
flux reconcile source bucket <bucket-name>
```

### Waiting for `Ready`

When a change is applied, it is possible to wait for the Bucket to reach a
[ready state](#ready-bucket) using `kubectl`:

```sh
kubectl wait bucket/<bucket-name> --for=condition=ready --timeout=1m
```

### Suspending and resuming

When you find yourself in a situation where you temporarily want to pause the
reconciliation of a Bucket, you can suspend it using the [`.spec.suspend`
field](#suspend).

#### Suspend a Bucket

In your YAML declaration:

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
spec:
  suspend: true
```

Using `kubectl`:

```sh
kubectl patch bucket <bucket-name> --field-manager=flux-client-side-apply -p '{"spec": {"suspend": true}}'
```

Using `flux`:

```sh
flux suspend source bucket <bucket-name>
```

**Note:** When a Bucket has an Artifact and is suspended, and this Artifact
later disappears from the storage due to e.g. the source-controller Pod being
evicted from a Node, this will not be reflected in the Bucket's Status until it
is resumed.

#### Resume a Bucket

In your YAML declaration, comment out (or remove) the field:

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
spec:
  # suspend: true
```

**Note:** Setting the field value to `false` has the same effect as removing
it, but does not allow for "hot patching" using e.g. `kubectl` while practicing
GitOps, as the manually applied patch would be overwritten by the declared
state in Git.

Using `kubectl`:

```sh
kubectl patch bucket <bucket-name> --field-manager=flux-client-side-apply -p '{"spec": {"suspend": false}}'
```

Using `flux`:

```sh
flux resume source bucket <bucket-name>
```

### Debugging a Bucket

There are several ways to gather information about a Bucket for debugging
purposes.

#### Describe the Bucket

Describing a Bucket using `kubectl describe bucket <bucket-name>` displays the
latest recorded information for the resource in the `Status` and `Events`
sections:

```console
...
Status:
...
  Conditions:
    Last Transition Time:  2024-02-02T13:26:55Z
    Message:               processing object: new generation 1 -> 2
    Observed Generation:   2
    Reason:                ProgressingWithRetry
    Status:                True
    Type:                  Reconciling
    Last Transition Time:  2024-02-02T13:26:55Z
    Message:               bucket 'my-new-bucket' does not exist
    Observed Generation:   2
    Reason:                BucketOperationFailed
    Status:                False
    Type:                  Ready
    Last Transition Time:  2024-02-02T13:26:55Z
    Message:               bucket 'my-new-bucket' does not exist
    Observed Generation:   2
    Reason:                BucketOperationFailed
    Status:                True
    Type:                  FetchFailed
  Observed Generation:     1
  URL:                     http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/latest.tar.gz
Events:
  Type     Reason                 Age                 From               Message
  ----     ------                 ----                ----               -------
  Warning  BucketOperationFailed  37s (x11 over 42s)  source-controller  bucket 'my-new-bucket' does not exist
```

#### Trace emitted Events

To view events for specific Bucket(s), `kubectl events` can be used in
combination with `--for` to list the Events for specific objects.
For example, running

```sh
kubectl events --for Bucket/<bucket-name>
```

lists

```console
LAST SEEN   TYPE      REASON                 OBJECT                 MESSAGE
2m30s       Normal    NewArtifact            bucket/<bucket-name>   fetched 16 files with revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855' from 'my-new-bucket'
36s         Normal    ArtifactUpToDate       bucket/<bucket-name>   artifact up-to-date with remote revision: 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
18s         Warning   BucketOperationFailed  bucket/<bucket-name>   bucket 'my-new-bucket' does not exist
```

Besides being reported in Events, the reconciliation errors are also logged by
the controller. The Flux CLI offers commands for filtering the logs for a
specific Bucket, e.g. `flux logs --level=error --kind=Bucket --name=<bucket-name>`.

## Bucket Status

### Artifact

The Bucket reports the latest synchronized state from the object storage
bucket as an Artifact object in the `.status.artifact` of the resource.

The Artifact file is a gzip compressed TAR archive
(`.tar.gz`), and can be retrieved in-cluster from the
`.status.artifact.url` HTTP address.

#### Artifact example

```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: Bucket
metadata:
  name: <bucket-name>
status:
  artifact:
    digest: sha256:cbec34947cc2f36dee8adcdd12ee62ca6a8a36699fc6e56f6220385ad5bd421a
    lastUpdateTime: "2024-01-28T10:30:30Z"
    path: bucket/<namespace>/<bucket-name>/c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2.tar.gz
    revision: sha256:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2
    size: 38099
    url: http://source-controller.<namespace>.svc.cluster.local./bucket/<namespace>/<bucket-name>/c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2.tar.gz
```

#### Default exclusions

The following files and extensions are excluded from the Artifact by
default:

- Git files (`.git/, .gitignore, .gitmodules, .gitattributes`)
- File extensions (`.jpg, .jpeg, .gif, .png, .wmv, .flv, .tar.gz, .zip`)
- CI configs (`.github/, .circleci/, .travis.yml, .gitlab-ci.yml, appveyor.yml, .drone.yml, cloudbuild.yaml, codeship-services.yml, codeship-steps.yml`)
- CLI configs (`.goreleaser.yml, .sops.yaml`)
- Flux v1 config (`.flux.yaml`)

To define your own exclusion rules, see [excluding files](#excluding-files).

### Conditions

A Bucket enters various states during its lifecycle, reflected as
[Kubernetes Conditions][typical-status-properties].
It can be [reconciling](#reconciling-bucket) while fetching storage objects,
it can be [ready](#ready-bucket), or it can [fail during
reconciliation](#failed-bucket).

The Bucket API is compatible with the [kstatus specification][kstatus-spec],
and reports `Reconciling` and `Stalled` conditions where applicable to
provide better (timeout) support to solutions polling the Bucket to become
`Ready`.

#### Reconciling Bucket

The source-controller marks a Bucket as _reconciling_ when one of the following
is true:

- There is no current Artifact for the Bucket, or the reported Artifact is
  determined to have disappeared from the storage.
- The generation of the Bucket is newer than the [Observed Generation](#observed-generation).
- The newly calculated Artifact revision differs from the current Artifact.
When the Bucket is "reconciling", the `Ready` Condition status becomes
`Unknown` when the controller detects drift, and a Condition with the following
attributes is added to the Bucket's `.status.conditions`:

- `type: Reconciling`
- `status: "True"`
- `reason: Progressing` | `reason: ProgressingWithRetry`

If the reconciling state is due to a new revision, an additional Condition is
added with the following attributes:

- `type: ArtifactOutdated`
- `status: "True"`
- `reason: NewRevision`

Both Conditions have a ["negative polarity"][typical-status-properties],
and are only present on the Bucket while their status value is `"True"`.

#### Ready Bucket

The source-controller marks a Bucket as _ready_ when it has the following
characteristics:

- The Bucket reports an [Artifact](#artifact).
- The reported Artifact exists in the controller's Artifact storage.
- The Bucket was able to communicate with the Bucket's object storage endpoint
  using the current spec.
- The revision of the reported Artifact is up-to-date with the latest
  calculated revision of the object storage bucket.

When the Bucket is "ready", the controller sets a Condition with the following
attributes in the Bucket's `.status.conditions`:

- `type: Ready`
- `status: "True"`
- `reason: Succeeded`

This `Ready` Condition will retain a status value of `"True"` until the Bucket
is marked as [reconciling](#reconciling-bucket), or e.g. a
[transient error](#failed-bucket) occurs due to a temporary network issue.

When the Bucket Artifact is archived in the controller's Artifact
storage, the controller sets a Condition with the following attributes in the
Bucket's `.status.conditions`:

- `type: ArtifactInStorage`
- `status: "True"`
- `reason: Succeeded`

This `ArtifactInStorage` Condition will retain a status value of `"True"` until
the Artifact in the storage no longer exists.

#### Failed Bucket

The source-controller may get stuck trying to produce an Artifact for a Bucket
without completing. This can occur due to some of the following factors:

- The object storage [Endpoint](#endpoint) is temporarily unavailable.
- The specified object storage bucket does not exist.
- The [Secret reference](#secret-reference) contains a reference to a
  non-existing Secret.
- The credentials in the referenced Secret are invalid.
- The Bucket spec contains a generic misconfiguration.
- A storage-related failure occurs when storing the Artifact.

When this happens, the controller sets the `Ready` Condition status to `False`,
and adds a Condition with the following attributes to the Bucket's
`.status.conditions`:

- `type: FetchFailed` | `type: StorageOperationFailed`
- `status: "True"`
- `reason: AuthenticationFailed` | `reason: BucketOperationFailed`

This condition has a ["negative polarity"][typical-status-properties],
and is only present on the Bucket while the status value is `"True"`.
The `reason` field may carry other values that describe the failure more
precisely.

While the Bucket has this Condition, the controller will continue to attempt
to produce an Artifact for the resource with an exponential backoff, until
it succeeds and the Bucket is marked as [ready](#ready-bucket).

Note that a Bucket can be [reconciling](#reconciling-bucket) while failing at
the same time, for example due to a newly introduced configuration issue in the
Bucket spec.
When a reconciliation fails, the `Reconciling` Condition reason
would be `ProgressingWithRetry`. When the reconciliation is performed again
after the failure, the reason is updated to `Progressing`.

### Observed Ignore

The source-controller reports an observed ignore in the Bucket's
`.status.observedIgnore`. The observed ignore is the latest `.spec.ignore` value
which resulted in a [ready state](#ready-bucket), or stalled due to an error
it cannot recover from without human intervention. The value is the same as the
[ignore in spec](#ignore). It indicates the ignore rules used in building the
current Artifact in storage.

Example:

```yaml
status:
  ...
  observedIgnore: |
    hpa.yaml
    build
  ...
```

### Observed Generation

The source-controller reports an
[observed generation][typical-status-properties]
in the Bucket's `.status.observedGeneration`. The observed generation is the
latest `.metadata.generation` which resulted in either a [ready state](#ready-bucket),
or stalled due to an error it cannot recover from without human
intervention.

### Last Handled Reconcile At

The source-controller reports the last `reconcile.fluxcd.io/requestedAt`
annotation value it acted on in the `.status.lastHandledReconcileAt` field.

For practical information about this field, see [triggering a
reconcile](#triggering-a-reconcile).

[typical-status-properties]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
[kstatus-spec]: https://github.com/kubernetes-sigs/cli-utils/tree/master/pkg/kstatus
[pem-encoding]: https://en.wikipedia.org/wiki/Privacy-Enhanced_Mail