source-controller/docs/spec/v1beta2/buckets.md

1273 lines
40 KiB
Markdown

# Buckets
<!-- menuweight:30 -->
The `Bucket` API defines a Source to produce an Artifact for objects from storage
solutions like Amazon S3, Google Cloud Storage buckets, or any other solution
with a S3 compatible API such as Minio, Alibaba Cloud OSS and others.
## Example
The following is an example of a Bucket. It creates a tarball (`.tar.gz`)
Artifact with the fetched objects from an object storage with an S3
compatible API (e.g. [Minio](https://min.io)):
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: minio-bucket
namespace: default
spec:
interval: 5m0s
endpoint: minio.example.com
insecure: true
secretRef:
name: minio-bucket-secret
bucketName: example
---
apiVersion: v1
kind: Secret
metadata:
name: minio-bucket-secret
namespace: default
type: Opaque
stringData:
accesskey: <access key>
secretkey: <secret key>
```
In the above example:
- A Bucket named `minio-bucket` is created, indicated by the
`.metadata.name` field.
- The source-controller checks the object storage bucket every five minutes,
indicated by the `.spec.interval` field.
- It authenticates to the `minio.example.com` endpoint with
the static credentials from the `minio-secret` Secret data, indicated by
the `.spec.endpoint` and `.spec.secretRef.name` fields.
- A list of object keys and their [etags](https://en.wikipedia.org/wiki/HTTP_ETag)
in the `.spec.bucketName` bucket is compiled, while filtering the keys using
[default ignore rules](#default-exclusions).
- The digest (algorithm defaults to SHA256) of the list is used as Artifact
revision, reported in-cluster in the `.status.artifact.revision` field.
- When the current Bucket revision differs from the latest calculated revision,
all objects are fetched and archived.
- The new Artifact is reported in the `.status.artifact` field.
You can run this example by saving the manifest into `bucket.yaml`, and
changing the Bucket and Secret values to target a Minio instance you have
control over.
**Note:** For more advanced examples targeting e.g. Amazon S3 or GCP, see
[Provider](#provider).
1. Apply the resource on the cluster:
```sh
kubectl apply -f bucket.yaml
```
2. Run `kubectl get buckets` to see the Bucket:
```console
NAME ENDPOINT AGE READY STATUS
minio-bucket minio.example.com 34s True stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
```
3. Run `kubectl describe bucket minio-bucket` to see the [Artifact](#artifact)
and [Conditions](#conditions) in the Bucket's Status:
```console
...
Status:
Artifact:
Digest: sha256:72aa638abb455ca5f9ef4825b949fd2de4d4be0a74895bf7ed2338622cd12686
Last Update Time: 2022-02-01T23:43:38Z
Path: bucket/default/minio-bucket/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.tar.gz
Revision: sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Size: 38099
URL: http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.tar.gz
Conditions:
Last Transition Time: 2022-02-01T23:43:38Z
Message: stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
Observed Generation: 1
Reason: Succeeded
Status: True
Type: Ready
Last Transition Time: 2022-02-01T23:43:38Z
Message: stored artifact for revision 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
Observed Generation: 1
Reason: Succeeded
Status: True
Type: ArtifactInStorage
Observed Generation: 1
URL: http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/latest.tar.gz
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NewArtifact 82s source-controller stored artifact with 16 fetched files from 'example' bucket
```
## Writing a Bucket spec
As with all other Kubernetes config, a Bucket needs `apiVersion`, `kind`, and
`metadata` fields. The name of a Bucket object must be a valid
[DNS subdomain name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
A Bucket also needs a
[`.spec` section](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).
### Provider
The `.spec.provider` field allows for specifying a Provider to enable provider
specific configurations, for example to communicate with a non-S3 compatible
API endpoint, or to change the authentication method.
Supported options are:
- [Generic](#generic)
- [AWS](#aws)
- [Azure](#azure)
- [GCP](#gcp)
If you do not specify `.spec.provider`, it defaults to `generic`.
#### Generic
When a Bucket's `spec.provider` is set to `generic`, the controller will
attempt to communicate with the specified [Endpoint](#endpoint) using the
[Minio Client SDK](https://github.com/minio/minio-go), which can communicate
with any Amazon S3 compatible object storage (including
[GCS](https://cloud.google.com/storage/docs/interoperability),
[Wasabi](https://wasabi-support.zendesk.com/hc/en-us/articles/360002079671-How-do-I-use-Minio-Client-with-Wasabi-),
and many others).
The `generic` Provider _requires_ a [Secret reference](#secret-reference) to a
Secret with `.data.accesskey` and `.data.secretkey` values, used to
authenticate with static credentials.
The Provider allows for specifying a region the bucket is in using the
[`.spec.region` field](#region), if required by the [Endpoint](#endpoint).
##### Generic example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: generic-insecure
namespace: default
spec:
provider: generic
interval: 5m0s
bucketName: podinfo
endpoint: minio.minio.svc.cluster.local:9000
timeout: 60s
insecure: true
secretRef:
name: minio-credentials
---
apiVersion: v1
kind: Secret
metadata:
name: minio-credentials
namespace: default
type: Opaque
data:
accesskey: <BASE64>
secretkey: <BASE64>
```
#### AWS
When a Bucket's `.spec.provider` field is set to `aws`, the source-controller
will attempt to communicate with the specified [Endpoint](#endpoint) using the
[Minio Client SDK](https://github.com/minio/minio-go).
Without a [Secret reference](#secret-reference), authorization using
credentials retrieved from the AWS EC2 service is attempted by default. When
a reference is specified, it expects a Secret with `.data.accesskey` and
`.data.secretkey` values, used to authenticate with static credentials.
The Provider allows for specifying the
[Amazon AWS Region](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions)
using the [`.spec.region` field](#region).
##### AWS EC2 example
**Note:** On EKS you have to create an [IAM role](#aws-iam-role-example) for
the source-controller service account that grants access to the bucket.
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: aws
namespace: default
spec:
interval: 5m0s
provider: aws
bucketName: podinfo
endpoint: s3.amazonaws.com
region: us-east-1
timeout: 30s
```
##### AWS IAM role example
Replace `<bucket-name>` with the specified `.spec.bucketName`.
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::<bucket-name>/*"
},
{
"Sid": "",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::<bucket-name>"
}
]
}
```
##### AWS static auth example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: aws
namespace: default
spec:
interval: 5m0s
provider: aws
bucketName: podinfo
endpoint: s3.amazonaws.com
region: us-east-1
secretRef:
name: aws-credentials
---
apiVersion: v1
kind: Secret
metadata:
name: aws-credentials
namespace: default
type: Opaque
data:
accesskey: <BASE64>
secretkey: <BASE64>
```
#### Azure
When a Bucket's `.spec.provider` is set to `azure`, the source-controller will
attempt to communicate with the specified [Endpoint](#endpoint) using the
[Azure Blob Storage SDK for Go](https://github.com/Azure/azure-sdk-for-go/tree/main/sdk/storage/azblob).
Without a [Secret reference](#secret-reference), authentication using a chain
with:
- [Environment credentials](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#EnvironmentCredential)
- [Workload Identity](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity@v1.3.0-beta.4#WorkloadIdentityCredential)
- [Managed Identity](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#ManagedIdentityCredential)
with the `AZURE_CLIENT_ID`
- Managed Identity with a system-assigned identity
is attempted by default. If no chain can be established, the bucket
is assumed to be publicly reachable.
When a reference is specified, it expects a Secret with one of the following
sets of `.data` fields:
- `tenantId`, `clientId` and `clientSecret` for authenticating a Service
Principal with a secret.
- `tenantId`, `clientId` and `clientCertificate` (plus optionally
`clientCertificatePassword` and/or `clientCertificateSendChain`) for
authenticating a Service Principal with a certificate.
- `clientId` for authenticating using a Managed Identity.
- `accountKey` for authenticating using a
[Shared Key](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#SharedKeyCredential).
- `sasKey` for authenticating using a [SAS Token](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview)
For any Managed Identity and/or Azure Active Directory authentication method,
the base URL can be configured using `.data.authorityHost`. If not supplied,
[`AzurePublicCloud` is assumed](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#AuthorityHost).
##### Azure example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: azure-public
namespace: default
spec:
interval: 5m0s
provider: azure
bucketName: podinfo
endpoint: https://podinfoaccount.blob.core.windows.net
timeout: 30s
```
##### Azure Service Principal Secret example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: azure-service-principal-secret
namespace: default
spec:
interval: 5m0s
provider: azure
bucketName: <bucket-name>
endpoint: https://<account-name>.blob.core.windows.net
secretRef:
name: azure-sp-auth
---
apiVersion: v1
kind: Secret
metadata:
name: azure-sp-auth
namespace: default
type: Opaque
data:
tenantId: <BASE64>
clientId: <BASE64>
clientSecret: <BASE64>
```
##### Azure Service Principal Certificate example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: azure-service-principal-cert
namespace: default
spec:
interval: 5m0s
provider: azure
bucketName: <bucket-name>
endpoint: https://<account-name>.blob.core.windows.net
secretRef:
name: azure-sp-auth
---
apiVersion: v1
kind: Secret
metadata:
name: azure-sp-auth
namespace: default
type: Opaque
data:
tenantId: <BASE64>
clientId: <BASE64>
clientCertificate: <BASE64>
# Plus optionally
clientCertificatePassword: <BASE64>
clientCertificateSendChain: <BASE64> # either "1" or "true"
```
##### Azure Managed Identity with Client ID example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: azure-managed-identity
namespace: default
spec:
interval: 5m0s
provider: azure
bucketName: <bucket-name>
endpoint: https://<account-name>.blob.core.windows.net
secretRef:
name: azure-smi-auth
---
apiVersion: v1
kind: Secret
metadata:
name: azure-smi-auth
namespace: default
type: Opaque
data:
clientId: <BASE64>
```
##### Azure Blob Shared Key example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: azure-shared-key
namespace: default
spec:
interval: 5m0s
provider: azure
bucketName: <bucket-name>
endpoint: https://<account-name>.blob.core.windows.net
secretRef:
name: azure-key
---
apiVersion: v1
kind: Secret
metadata:
name: azure-key
namespace: default
type: Opaque
data:
accountKey: <BASE64>
```
##### Workload Identity
If you have [Workload Identity](https://azure.github.io/azure-workload-identity/docs/installation/managed-clusters.html)
set up on your cluster, you need to create an Azure Identity and give it
access to Azure Blob Storage.
```shell
export IDENTITY_NAME="blob-access"
az role assignment create --role "Storage Blob Data Reader" \
--assignee-object-id "$(az identity show -n $IDENTITY_NAME -o tsv --query principalId -g $RESOURCE_GROUP)" \
--scope "/subscriptions/<SUBSCRIPTION-ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<account-name>/blobServices/default/containers/<container-name>"
```
Establish a federated identity between the Identity and the source-controller
ServiceAccount.
```shell
export SERVICE_ACCOUNT_ISSUER="$(az aks show --resource-group <RESOURCE_GROUP> --name <CLUSTER-NAME> --query "oidcIssuerProfile.issuerUrl" -otsv)"
az identity federated-credential create \
--name "kubernetes-federated-credential" \
--identity-name "${IDENTITY_NAME}" \
--resource-group "${RESOURCE_GROUP}" \
--issuer "${SERVICE_ACCOUNT_ISSUER}" \
--subject "system:serviceaccount:flux-system:source-controller"
```
Add a patch to label and annotate the source-controller Deployment and ServiceAccount
correctly so that it can match an identity binding:
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-components.yaml
- gotk-sync.yaml
patches:
- patch: |-
apiVersion: v1
kind: ServiceAccount
metadata:
name: source-controller
namespace: flux-system
annotations:
azure.workload.identity/client-id: <AZURE_CLIENT_ID>
labels:
azure.workload.identity/use: "true"
- patch: |-
apiVersion: apps/v1
kind: Deployment
metadata:
name: source-controller
namespace: flux-system
labels:
azure.workload.identity/use: "true"
spec:
template:
metadata:
labels:
azure.workload.identity/use: "true"
```
If you have set up Workload Identity correctly and labeled the source-controller
Deployment and ServiceAccount, then you don't need to reference a Secret. For more information,
please see [documentation](https://azure.github.io/azure-workload-identity/docs/quick-start.html).
```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: azure-bucket
namespace: flux-system
spec:
interval: 5m0s
provider: azure
bucketName: testsas
endpoint: https://testfluxsas.blob.core.windows.net
```
##### Deprecated: Managed Identity with AAD Pod Identity
If you are using [aad pod identity](https://azure.github.io/aad-pod-identity/docs),
You need to create an Azure Identity and give it access to Azure Blob Storage.
```sh
export IDENTITY_NAME="blob-access"
az role assignment create --role "Storage Blob Data Reader" \
--assignee-object-id "$(az identity show -n $IDENTITY_NAME -o tsv --query principalId -g $RESOURCE_GROUP)" \
--scope "/subscriptions/<SUBSCRIPTION-ID>/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Storage/storageAccounts/<account-name>/blobServices/default/containers/<container-name>"
export IDENTITY_CLIENT_ID="$(az identity show -n ${IDENTITY_NAME} -g ${RESOURCE_GROUP} -otsv --query clientId)"
export IDENTITY_RESOURCE_ID="$(az identity show -n ${IDENTITY_NAME} -otsv --query id)"
```
Create an AzureIdentity object that references the identity created above:
```yaml
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
name: # source-controller label will match this name
namespace: flux-system
spec:
clientID: <IDENTITY_CLIENT_ID>
resourceID: <IDENTITY_RESOURCE_ID>
type: 0 # user-managed identity
```
Create an AzureIdentityBinding object that binds Pods with a specific selector
with the AzureIdentity created:
```yaml
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
name: ${IDENTITY_NAME}-binding
spec:
azureIdentity: ${IDENTITY_NAME}
selector: ${IDENTITY_NAME}
```
Label the source-controller Deployment correctly so that it can match an identity binding:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: kustomize-controller
namespace: flux-system
spec:
template:
metadata:
labels:
aadpodidbinding: ${IDENTITY_NAME} # match the AzureIdentity name
```
If you have set up aad-pod-identity correctly and labeled the source-controller
Deployment, then you don't need to reference a Secret.
```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: azure-bucket
namespace: flux-system
spec:
interval: 5m0s
provider: azure
bucketName: testsas
endpoint: https://testfluxsas.blob.core.windows.net
```
##### Azure Blob SAS Token example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: azure-sas-token
namespace: default
spec:
interval: 5m0s
provider: azure
bucketName: <bucket-name>
endpoint: https://<account-name>.blob.core.windows.net
secretRef:
name: azure-key
---
apiVersion: v1
kind: Secret
metadata:
name: azure-key
namespace: default
type: Opaque
data:
sasKey: <base64>
```
The `sasKey` only contains the SAS token e.g
`?sv=2020-08-0&ss=bfqt&srt=co&sp=rwdlacupitfx&se=2022-05-26T21:55:35Z&st=2022-05...`.
The leading question mark (`?`) is optional. The query values from the `sasKey`
data field in the Secrets gets merged with the ones in the `.spec.endpoint` of
the Bucket. If the same key is present in the both of them, the value in the
`sasKey` takes precedence.
**Note:** The SAS token has an expiry date, and it must be updated before it
expires to allow Flux to continue to access Azure Storage. It is allowed to use
an account-level or container-level SAS token.
The minimum permissions for an account-level SAS token are:
- Allowed services: `Blob`
- Allowed resource types: `Container`, `Object`
- Allowed permissions: `Read`, `List`
The minimum permissions for a container-level SAS token are:
- Allowed permissions: `Read`, `List`
Refer to the [Azure documentation](https://learn.microsoft.com/en-us/rest/api/storageservices/create-account-sas#blob-service) for a full overview on permissions.
#### GCP
When a Bucket's `.spec.provider` is set to `gcp`, the source-controller will
attempt to communicate with the specified [Endpoint](#endpoint) using the
[Google Client SDK](https://github.com/googleapis/google-api-go-client).
Without a [Secret reference](#secret-reference), authorization using a
workload identity is attempted by default. The workload identity is obtained
using the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, falling back
to the Google Application Credential file in the config directory.
When a reference is specified, it expects a Secret with a `.data.serviceaccount`
value with a GCP service account JSON file.
The Provider allows for specifying the
[Bucket location](https://cloud.google.com/storage/docs/locations) using the
[`.spec.region` field](#region).
##### GCP example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: gcp-workload-identity
namespace: default
spec:
interval: 5m0s
provider: gcp
bucketName: podinfo
endpoint: storage.googleapis.com
region: us-east-1
timeout: 30s
```
##### GCP static auth example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: gcp-secret
namespace: default
spec:
interval: 5m0s
provider: gcp
bucketName: <bucket-name>
endpoint: storage.googleapis.com
region: <bucket-region>
secretRef:
name: gcp-service-account
---
apiVersion: v1
kind: Secret
metadata:
name: gcp-service-account
namespace: default
type: Opaque
data:
serviceaccount: <BASE64>
```
Where the (base64 decoded) value of `.data.serviceaccount` looks like this:
```json
{
"type": "service_account",
"project_id": "example",
"private_key_id": "28qwgh3gdf5hj3gb5fj3gsu5yfgh34f45324568hy2",
"private_key": "-----BEGIN PRIVATE KEY-----\nHwethgy123hugghhhbdcu6356dgyjhsvgvGFDHYgcdjbvcdhbsx63c\n76tgycfehuhVGTFYfw6t7ydgyVgydheyhuggycuhejwy6t35fthyuhegvcetf\nTFUHGTygghubhxe65ygt6tgyedgy326hucyvsuhbhcvcsjhcsjhcsvgdtHFCGi\nHcye6tyyg3gfyuhchcsbhygcijdbhyyTF66tuhcevuhdcbhuhhvftcuhbh3uh7t6y\nggvftUHbh6t5rfthhuGVRtfjhbfcrd5r67yuhuvgFTYjgvtfyghbfcdrhyjhbfctfdfyhvfg\ntgvggtfyghvft6tugvTF5r66tujhgvfrtyhhgfct6y7ytfr5ctvghbhhvtghhjvcttfycf\nffxfghjbvgcgyt67ujbgvctfyhVC7uhvgcyjvhhjvyujc\ncgghgvgcfhgg765454tcfthhgftyhhvvyvvffgfryyu77reredswfthhgfcftycfdrttfhf/\n-----END PRIVATE KEY-----\n",
"client_email": "test@example.iam.gserviceaccount.com",
"client_id": "32657634678762536746",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%40podinfo.iam.gserviceaccount.com"
}
```
### Interval
`.spec.interval` is a required field that specifies the interval which the
object storage bucket must be consulted at.
After successfully reconciling a Bucket object, the source-controller requeues
the object for inspection after the specified interval. The value must be in a
[Go recognized duration string format](https://pkg.go.dev/time#ParseDuration),
e.g. `10m0s` to look at the object storage bucket every 10 minutes.
If the `.metadata.generation` of a resource changes (due to e.g. the apply of a
change to the spec), this is handled instantly outside the interval window.
**Note:** The controller can be configured to apply a jitter to the interval in
order to distribute the load more evenly when multiple Bucket objects are set up
with the same interval. For more information, please refer to the
[source-controller configuration options](https://fluxcd.io/flux/components/source/options/).
### Endpoint
`.spec.endpoint` is a required field that specifies the HTTP/S object storage
endpoint to connect to and fetch objects from. Connecting to an (insecure)
HTTP endpoint requires enabling [`.spec.insecure`](#insecure).
Some endpoints require the specification of a [`.spec.region`](#region),
see [Provider](#provider) for more (provider specific) examples.
### Bucket name
`.spec.bucketName` is a required field that specifies which object storage
bucket on the [Endpoint](#endpoint) objects should be fetched from.
See [Provider](#provider) for more (provider specific) examples.
### Region
`.spec.region` is an optional field to specify the region a
[`.spec.bucketName`](#bucket-name) is located in.
See [Provider](#provider) for more (provider specific) examples.
### Cert secret reference
`.spec.certSecretRef.name` is an optional field to specify a secret containing
TLS certificate data. The secret can contain the following keys:
* `tls.crt` and `tls.key`, to specify the client certificate and private key used
for TLS client authentication. These must be used in conjunction, i.e.
specifying one without the other will lead to an error.
* `ca.crt`, to specify the CA certificate used to verify the server, which is
required if the server is using a self-signed certificate.
If the server is using a self-signed certificate and has TLS client
authentication enabled, all three values are required.
The Secret should be of type `Opaque` or `kubernetes.io/tls`. All the files in
the Secret are expected to be [PEM-encoded][pem-encoding]. Assuming you have
three files; `client.key`, `client.crt` and `ca.crt` for the client private key,
client certificate and the CA certificate respectively, you can generate the
required Secret using the `flux create secret tls` command:
```sh
flux create secret tls minio-tls --tls-key-file=client.key --tls-crt-file=client.crt --ca-crt-file=ca.crt
```
If TLS client authentication is not required, you can generate the secret with:
```sh
flux create secret tls minio-tls --ca-crt-file=ca.crt
```
This API is only supported for the `generic` [provider](#provider).
Example usage:
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: example
namespace: example
spec:
interval: 5m
bucketName: example
provider: generic
endpoint: minio.example.com
certSecretRef:
name: minio-tls
---
apiVersion: v1
kind: Secret
metadata:
name: minio-tls
namespace: example
type: kubernetes.io/tls # or Opaque
stringData:
tls.crt: <PEM-encoded cert>
tls.key: <PEM-encoded key>
ca.crt: <PEM-encoded cert>
```
### Insecure
`.spec.insecure` is an optional field to allow connecting to an insecure (HTTP)
[endpoint](#endpoint), if set to `true`. The default value is `false`,
denying insecure (HTTP) connections.
### Timeout
`.spec.timeout` is an optional field to specify a timeout for object storage
fetch operations. The value must be in a
[Go recognized duration string format](https://pkg.go.dev/time#ParseDuration),
e.g. `1m30s` for a timeout of one minute and thirty seconds.
The default value is `60s`.
### Secret reference
`.spec.secretRef.name` is an optional field to specify a name reference to a
Secret in the same namespace as the Bucket, containing authentication
credentials for the object storage. For some `.spec.provider` implementations
the presence of the field is required, see [Provider](#provider) for more
details and examples.
### Prefix
`.spec.prefix` is an optional field to enable server-side filtering
of files in the Bucket.
**Note:** The server-side filtering works only with the `generic`, `aws`
and `gcp` [provider](#provider) and is preferred over [`.spec.ignore`](#ignore)
as a more efficient way of excluding files.
### Ignore
`.spec.ignore` is an optional field to specify rules in [the `.gitignore`
pattern format](https://git-scm.com/docs/gitignore#_pattern_format). Storage
objects which keys match the defined rules are excluded while fetching.
When specified, `.spec.ignore` overrides the [default exclusion
list](#default-exclusions), and may overrule the [`.sourceignore` file
exclusions](#sourceignore-file). See [excluding files](#excluding-files)
for more information.
### Suspend
`.spec.suspend` is an optional field to suspend the reconciliation of a Bucket.
When set to `true`, the controller will stop reconciling the Bucket, and changes
to the resource or in the object storage bucket will not result in a new
Artifact. When the field is set to `false` or removed, it will resume.
For practical information, see
[suspending and resuming](#suspending-and-resuming).
## Working with Buckets
### Excluding files
By default, storage bucket objects which match the [default exclusion
rules](#default-exclusions) are excluded while fetching. It is possible to
overwrite and/or overrule the default exclusions using a file in the bucket
and/or an in-spec set of rules.
#### `.sourceignore` file
Excluding files is possible by adding a `.sourceignore` file in the root of the
object storage bucket. The `.sourceignore` file follows [the `.gitignore`
pattern format](https://git-scm.com/docs/gitignore#_pattern_format), and
pattern entries may overrule [default exclusions](#default-exclusions).
#### Ignore spec
Another option is to define the exclusions within the Bucket spec, using the
[`.spec.ignore` field](#ignore). Specified rules override the
[default exclusion list](#default-exclusions), and may overrule `.sourceignore`
file exclusions.
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: <bucket-name>
spec:
ignore: |
# exclude all
/*
# include deploy dir
!/deploy
# exclude file extensions from deploy dir
/deploy/**/*.md
/deploy/**/*.txt
```
### Triggering a reconcile
To manually tell the source-controller to reconcile a Bucket outside the
[specified interval window](#interval), a Bucket can be annotated with
`reconcile.fluxcd.io/requestedAt: <arbitrary value>`. Annotating the resource
queues the Bucket for reconciliation if the `<arbitrary-value>` differs from
the last value the controller acted on, as reported in
[`.status.lastHandledReconcileAt`](#last-handled-reconcile-at).
Using `kubectl`:
```sh
kubectl annotate --field-manager=flux-client-side-apply --overwrite bucket/<bucket-name> reconcile.fluxcd.io/requestedAt="$(date +%s)"
```
Using `flux`:
```sh
flux reconcile source bucket <bucket-name>
```
### Waiting for `Ready`
When a change is applied, it is possible to wait for the Bucket to reach a
[ready state](#ready-bucket) using `kubectl`:
```sh
kubectl wait bucket/<bucket-name> --for=condition=ready --timeout=1m
```
### Suspending and resuming
When you find yourself in a situation where you temporarily want to pause the
reconciliation of a Bucket, you can suspend it using the [`.spec.suspend`
field](#suspend).
#### Suspend a Bucket
In your YAML declaration:
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: <bucket-name>
spec:
suspend: true
```
Using `kubectl`:
```sh
kubectl patch bucket <bucket-name> --field-manager=flux-client-side-apply -p '{\"spec\": {\"suspend\" : true }}'
```
Using `flux`:
```sh
flux suspend source bucket <bucket-name>
```
**Note:** When a Bucket has an Artifact and is suspended, and this Artifact
later disappears from the storage due to e.g. the source-controller Pod being
evicted from a Node, this will not be reflected in the Bucket's Status until it
is resumed.
#### Resume a Bucket
In your YAML declaration, comment out (or remove) the field:
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: <bucket-name>
spec:
# suspend: true
```
**Note:** Setting the field value to `false` has the same effect as removing
it, but does not allow for "hot patching" using e.g. `kubectl` while practicing
GitOps; as the manually applied patch would be overwritten by the declared
state in Git.
Using `kubectl`:
```sh
kubectl patch bucket <bucket-name> --field-manager=flux-client-side-apply -p '{\"spec\" : {\"suspend\" : false }}'
```
Using `flux`:
```sh
flux resume source bucket <bucket-name>
```
### Debugging a Bucket
There are several ways to gather information about a Bucket for debugging
purposes.
#### Describe the Bucket
Describing a Bucket using `kubectl describe bucket <bucket-name>` displays the
latest recorded information for the resource in the `Status` and `Events`
sections:
```console
...
Status:
...
Conditions:
Last Transition Time: 2022-02-02T13:26:55Z
Message: processing object: new generation 1 -> 2
Observed Generation: 2
Reason: ProgressingWithRetry
Status: True
Type: Reconciling
Last Transition Time: 2022-02-02T13:26:55Z
Message: bucket 'my-new-bucket' does not exist
Observed Generation: 2
Reason: BucketOperationFailed
Status: False
Type: Ready
Last Transition Time: 2022-02-02T13:26:55Z
Message: bucket 'my-new-bucket' does not exist
Observed Generation: 2
Reason: BucketOperationFailed
Status: True
Type: FetchFailed
Observed Generation: 1
URL: http://source-controller.source-system.svc.cluster.local./bucket/default/minio-bucket/latest.tar.gz
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BucketOperationFailed 37s (x11 over 42s) source-controller bucket 'my-new-bucket' does not exist
```
#### Trace emitted Events
To view events for specific Bucket(s), `kubectl events` can be used in
combination with `--for` to list the Events for specific objects. For example,
running
```sh
kubectl events --for Bucket/<bucket-name>
```
lists
```console
LAST SEEN TYPE REASON OBJECT MESSAGE
2m30s Normal NewArtifact bucket/<bucket-name> fetched 16 files with revision from 'my-new-bucket'
36s Normal ArtifactUpToDate bucket/<bucket-name> artifact up-to-date with remote revision: 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
18s Warning BucketOperationFailed bucket/<bucket-name> bucket 'my-new-bucket' does not exist
```
Besides being reported in Events, the reconciliation errors are also logged by
the controller. The Flux CLI offer commands for filtering the logs for a
specific Bucket, e.g. `flux logs --level=error --kind=Bucket --name=<bucket-name>`.
## Bucket Status
### Artifact
The Bucket reports the latest synchronized state from the object storage
bucket as an Artifact object in the `.status.artifact` of the resource.
The Artifact file is a gzip compressed TAR archive
(`<calculated revision>.tar.gz`), and can be retrieved in-cluster from the
`.status.artifact.url` HTTP address.
#### Artifact example
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
name: <bucket-name>
status:
artifact:
digest: sha256:cbec34947cc2f36dee8adcdd12ee62ca6a8a36699fc6e56f6220385ad5bd421a
lastUpdateTime: "2022-01-28T10:30:30Z"
path: bucket/<namespace>/<bucket-name>/c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2.tar.gz
revision: sha256:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2
size: 38099
url: http://source-controller.<namespace>.svc.cluster.local./bucket/<namespace>/<bucket-name>/c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714caef0c4f2.tar.gz
```
#### Default exclusions
The following files and extensions are excluded from the Artifact by
default:
- Git files (`.git/, .gitignore, .gitmodules, .gitattributes`)
- File extensions (`.jpg, .jpeg, .gif, .png, .wmv, .flv, .tar.gz, .zip`)
- CI configs (`.github/, .circleci/, .travis.yml, .gitlab-ci.yml, appveyor.yml, .drone.yml, cloudbuild.yaml, codeship-services.yml, codeship-steps.yml`)
- CLI configs (`.goreleaser.yml, .sops.yaml`)
- Flux v1 config (`.flux.yaml`)
To define your own exclusion rules, see [excluding files](#excluding-files).
### Conditions
A Bucket enters various states during its lifecycle, reflected as
[Kubernetes Conditions][typical-status-properties].
It can be [reconciling](#reconciling-bucket) while fetching storage objects,
it can be [ready](#ready-bucket), or it can [fail during
reconciliation](#failed-bucket).
The Bucket API is compatible with the [kstatus specification][kstatus-spec],
and reports `Reconciling` and `Stalled` conditions where applicable to
provide better (timeout) support to solutions polling the Bucket to become
`Ready`.
#### Reconciling Bucket
The source-controller marks a Bucket as _reconciling_ when one of the following
is true:
- There is no current Artifact for the Bucket, or the reported Artifact is
determined to have disappeared from the storage.
- The generation of the Bucket is newer than the [Observed Generation](#observed-generation).
- The newly calculated Artifact revision differs from the current Artifact.
When the Bucket is "reconciling", the `Ready` Condition status becomes
`Unknown` when the controller detects drift, and the controller adds a Condition
with the following attributes to the Bucket's `.status.conditions`:
- `type: Reconciling`
- `status: "True"`
- `reason: Progressing` | `reason: ProgressingWithRetry`
If the reconciling state is due to a new revision, an additional Condition is
added with the following attributes:
- `type: ArtifactOutdated`
- `status: "True"`
- `reason: NewRevision`
Both Conditions have a ["negative polarity"][typical-status-properties],
and are only present on the Bucket while their status value is `"True"`.
#### Ready Bucket
The source-controller marks a Bucket as _ready_ when it has the following
characteristics:
- The Bucket reports an [Artifact](#artifact).
- The reported Artifact exists in the controller's Artifact storage.
- The Bucket was able to communicate with the Bucket's object storage endpoint
using the current spec.
- The revision of the reported Artifact is up-to-date with the latest
calculated revision of the object storage bucket.
When the Bucket is "ready", the controller sets a Condition with the following
attributes in the Bucket's `.status.conditions`:
- `type: Ready`
- `status: "True"`
- `reason: Succeeded`
This `Ready` Condition will retain a status value of `"True"` until the Bucket
is marked as [reconciling](#reconciling-bucket), or e.g. a
[transient error](#failed-bucket) occurs due to a temporary network issue.
When the Bucket Artifact is archived in the controller's Artifact
storage, the controller sets a Condition with the following attributes in the
Bucket's `.status.conditions`:
- `type: ArtifactInStorage`
- `status: "True"`
- `reason: Succeeded`
This `ArtifactInStorage` Condition will retain a status value of `"True"` until
the Artifact in the storage no longer exists.
#### Failed Bucket
The source-controller may get stuck trying to produce an Artifact for a Bucket
without completing. This can occur due to some of the following factors:
- The object storage [Endpoint](#endpoint) is temporarily unavailable.
- The specified object storage bucket does not exist.
- The [Secret reference](#secret-reference) contains a reference to a
non-existing Secret.
- The credentials in the referenced Secret are invalid.
- The Bucket spec contains a generic misconfiguration.
- A storage related failure when storing the artifact.
When this happens, the controller sets the `Ready` Condition status to `False`,
and adds a Condition with the following attributes to the Bucket's
`.status.conditions`:
- `type: FetchFailed` | `type: StorageOperationFailed`
- `status: "True"`
- `reason: AuthenticationFailed` | `reason: BucketOperationFailed`
This condition has a ["negative polarity"][typical-status-properties],
and is only present on the Bucket while the status value is `"True"`.
There may be more arbitrary values for the `reason` field to provide accurate
reason for a condition.
While the Bucket has this Condition, the controller will continue to attempt
to produce an Artifact for the resource with an exponential backoff, until
it succeeds and the Bucket is marked as [ready](#ready-bucket).
Note that a Bucket can be [reconciling](#reconciling-bucket) while failing at
the same time, for example due to a newly introduced configuration issue in the
Bucket spec. When a reconciliation fails, the `Reconciling` Condition reason
would be `ProgressingWithRetry`. When the reconciliation is performed again
after the failure, the reason is updated to `Progressing`.
### Observed Ignore
The source-controller reports an observed ignore in the Bucket's
`.status.observedIgnore`. The observed ignore is the latest `.spec.ignore` value
which resulted in a [ready state](#ready-bucket), or stalled due to error
it can not recover from without human intervention. The value is the same as the
[ignore in spec](#ignore). It indicates the ignore rules used in building the
current artifact in storage.
Example:
```yaml
status:
...
observedIgnore: |
hpa.yaml
build
...
```
### Observed Generation
The source-controller reports an
[observed generation][typical-status-properties]
in the Bucket's `.status.observedGeneration`. The observed generation is the
latest `.metadata.generation` which resulted in either a [ready state](#ready-bucket),
or stalled due to error it can not recover from without human
intervention.
### Last Handled Reconcile At
The source-controller reports the last `reconcile.fluxcd.io/requestedAt`
annotation value it acted on in the `.status.lastHandledReconcileAt` field.
For practical information about this field, see [triggering a
reconcile](#triggering-a-reconcile).
[typical-status-properties]: https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties
[kstatus-spec]: https://github.com/kubernetes-sigs/cli-utils/tree/master/pkg/kstatus