diff --git a/product/personas.md b/product/personas.md new file mode 100644 index 000000000..910c31812 --- /dev/null +++ b/product/personas.md @@ -0,0 +1,96 @@ +# Elafros Personas + +When discussing user actions, it is often helpful to [define specific +user roles](https://en.wikipedia.org/wiki/Persona_(user_experience)) who +might want to do the action. + + +## Elafros Compute + +### Developer Personas + +The developer personas are software engineers looking to build and run +a stateless application without concern about the underlying +infrastructure. + +* Hobbyist +* Backend SWE +* Full stack SWE +* SRE + +User stories: +* Deploy some code +* Update environment +* Roll back the last change +* Debug an error in code +* Monitor my application + +### Operator Personas + +* Hobbyist / Contributor +* Cluster administrator +* Security Engineer / Auditor +* Capacity Planner + +User stories: +* Create an Elafros cluster +* Apply policy / RBAC +* Control or charge back for resource usage +* Choose logging or monitoring plugins + + +## Elafros Build + +We expect the build components of Elafros to be useful on their own, +as well as in conjunction with the compute components. + +### Developer + +User stories: +* Start a build +* Read build logs + +### Language operator / contributor + +User stories: +* Create a build image / build pack + + +## Elafros Events + +Event generation and consumption is a core part of the serverless +(particularly function as a service) computing model. Event generation +and dispatch enables decoupling of event producers from consumers. + +## Event consumer (developer) + +User stories: +* Determine what event sources are available +* Trigger my service when certain events happen (event binding) +* Filter events from a provider + +## Event producer + +User stories: +* Publish events +* Control who can bind events + + +## Contributors + +Contributors are an important part of the Elafros project. 
As such, we +will also consider how various infrastructure encourages and enables +contributors to the project, as well as the impact on end-users. + +* Hobbyist or newcomer +* Motivated user +* Corporate (employed) maintainer +* Consultant + +User stories: +* Check out the code +* Build and run the code +* Run tests +* View test status +* Run performance tests + diff --git a/spec/errors.md b/spec/errors.md new file mode 100644 index 000000000..5f0f3badf --- /dev/null +++ b/spec/errors.md @@ -0,0 +1,322 @@ +# Error Conditions and Reporting + +Elafros uses the standard Kubernetes API pattern for reporting +configuration errors and current state of the system by writing the +report in the `status` section. There are two mechanisms commonly used +in status: + +* conditions represent true/false statements about the current state + of the resource. + +* other fields may provide status on the most recently retrieved state + of the system as it relates to the resource (example: number of + replicas or traffic assignments). + +Both of these mechanisms often include additional data from the +controller such as `observedGeneration` (to determine whether the +controller has seen the latest updates to the spec). Example user and +system error scenarios are included below along with how the status is +presented to CLI and UI tools via the API. 
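As a rough illustration of how a CLI or UI tool might read these two mechanisms, the sketch below looks up a condition by type and checks whether the controller has seen the latest spec. The helper names (`find_condition`, `is_status_current`) and the plain-dictionary resource layout are assumptions made for this example and are not part of the Elafros API; condition `status` values are written as strings here, which is also an assumption.

```python
def find_condition(resource, condition_type):
    """Return the condition dict of the given type, or None if absent."""
    for condition in resource.get("status", {}).get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


def is_status_current(resource):
    """True when status.observedGeneration matches metadata.generation."""
    generation = resource.get("metadata", {}).get("generation")
    observed = resource.get("status", {}).get("observedGeneration")
    return generation is not None and generation == observed


# Hypothetical Configuration, shaped like the examples in this document.
configuration = {
    "metadata": {"generation": 1234},
    "status": {
        "observedGeneration": 1234,
        "conditions": [
            {
                "type": "LatestRevisionReady",
                "status": "False",
                "reason": "ContainerMissing",
                "message": "Unable to start because container is missing and build failed.",
            }
        ],
    },
}

condition = find_condition(configuration, "LatestRevisionReady")
if is_status_current(configuration) and condition and condition["status"] == "False":
    # Only report the failure once the controller has seen the latest spec.
    print(f"{condition['reason']}: {condition['message']}")
```

Checking `observedGeneration` first avoids reporting a stale condition that describes an older version of the spec.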
+ +* [Revision failed to become Ready](#revision-failed-to-become-ready) +* [Build failed](#build-failed) +* [Revision not found by Route](#revision-not-found-by-route) +* [Configuration not found by Route](#configuration-not-found-by-route) +* [Latest Revision of a Configuration deleted](#latest-revision-of-a-configuration-deleted) +* [Resource exhausted while creating a revision](#resource-exhausted-while-creating-a-revision) +* [Deployment progressing slowly/stuck](#deployment-progressing-slowly-stuck) +* [Traffic shift progressing slowly/stuck](#traffic-shift-progressing-slowly-stuck) +* [Container image not present in repository](#container-image-not-present-in-repository) +* [Container image fails at startup on Revision](#container-image-fails-at-startup-on-revision) + + +## Revision failed to become Ready + +If the latest Revision fails to become `Ready` for any reason within some reasonable +timeframe, the Configuration should signal this +with the `LatestRevisionReady` status, copying the reason and the message +from the `Ready` condition on the Revision. + +```yaml +... +status: + latestReadyRevisionName: abc + latestCreatedRevisionName: bcd # Hasn't become "Ready" + conditions: + - type: LatestRevisionReady + status: False + reason: ContainerMissing + message: "Unable to start because container is missing and build failed." +``` + + +## Build failed + +If the Build steps failed while creating a Revision, you can examine +the `Failed` condition on the Build or the `BuildFailed` condition on +the Revision (which copies the value from the build referenced by +`spec.buildName`). In addition, the Build resource (but not the +Revision) should have a status field to link to the log output of the +build. + +```http +GET /apis/build.dev/v1alpha1/namespaces/default/builds/build-1acub3 +``` +```yaml +... +status: + # Link to log stream; could be ELK or Stackdriver, for example + buildLogsLink: "http://logging.infra.mycompany.com/...?filter=..." 
+ conditions: + - type: Failed + status: True + reason: BuildStepFailed # could also be SourceMissing, etc + # reason is a short status, message provides error details + message: "Step XYZ failed with error message: $LASTLOGLINE" +``` + + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc +``` +```yaml +... +status: + conditions: + - type: Ready + status: False + reason: ContainerMissing + message: "Unable to start because container is missing and build failed." + - type: BuildFailed + status: True + reason: BuildStepFailed + # reason is a short status, message provides error details + message: "Step XYZ failed with error message: $LASTLOGLINE" +``` + + +## Revision not found by Route + +If a Revision is referenced in the Route's `spec.rollout.traffic` but +cannot be found, the corresponding entry in the `status.traffic` list +will be set to "Not found", and the `TrafficDropped` condition will be +marked as True, with a reason of `RevisionMissing`. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/abc +``` +```yaml +... +status: + traffic: + - revisionName: abc + name: current + percent: 100 + - revisionName: "Not found" + name: next + percent: 0 + conditions: + - type: RolloutInProgress + status: False + - type: TrafficDropped + status: True + reason: RevisionMissing + # reason is a short status, message provides error details + message: "Revision 'qyzz' referenced in rollout.traffic not found" +``` + + +## Configuration not found by Route + +If a Route references the `latestReadyRevisionName` of a Configuration +and the Configuration cannot be found, the corresponding entry in the +`status.traffic` list will be set to "Not found", and the +`TrafficDropped` condition will be marked as True with a reason of +`ConfigurationMissing`. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/abc +``` +```yaml +... 
+status: + traffic: + - revisionName: "Not found" + percent: 100 + conditions: + - type: RolloutInProgress + status: False + - type: TrafficDropped + status: True + reason: ConfigurationMissing + # reason is a short status, message provides error details + message: "Configuration 'my-service' referenced in rollout.traffic not found" +``` + + +## Latest Revision of a Configuration deleted + +If the most recent (or most recently ready) Revision is deleted, the +Configuration will clear the `latestReadyRevisionName`. If the +Configuration is referenced by a Route, the Route will set the +`TrafficDropped` condition with reason `RevisionMissing`, as above. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service +``` +```yaml +... +metadata: + generation: 1234 # only updated when spec changes + ... +spec: + ... +status: + latestCreatedRevisionName: abc + conditions: + - type: LatestRevisionReady + status: False + reason: RevisionMissing + message: "The latest Revision appears to have been deleted." + observedGeneration: 1234 +``` + + +## Resource exhausted while creating a revision + +Since a Revision is only metadata, the Revision will be created, but +will have a condition indicating the underlying failure, possibly +naming the underlying resource that failed. In a multitenant +environment, the customer might not have access or visibility +into the underlying resources in the hosting environment. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc +``` +```yaml +... +status: + conditions: + - type: Ready + status: False + reason: NoDeployment + message: "The controller could not create a deployment named ela-abc-e13ac." +``` + + +## Deployment progressing slowly/stuck + +See +[the Kubernetes documentation for how this is handled for Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment). 
For +Revisions, we will start by assuming a single timeout for deployment +(rather than configurable), and report that the Revision was not +Ready, with a reason `ProgressDeadlineExceeded`. Note that we will +only report `ProgressDeadlineExceeded` if we could not determine +another reason (such as quota failures, missing build, or container +execution failures). + +Kubernetes controllers will continue attempting to make progress +(possibly at a less-aggressive rate) when they encounter a case where +the desired status cannot match the actual status, so if the +underlying deployment is slow, it might eventually finish after +reporting `ProgressDeadlineExceeded`. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc +``` +```yaml +... +status: + conditions: + - type: Ready + status: False + reason: ProgressDeadlineExceeded + message: "Unable to create pods for more than 120 seconds." +``` + + +## Traffic shift progressing slowly/stuck + +Similar to deployment slowness, if the transfer of traffic (either via +gradual or abrupt rollout) takes longer than a certain timeout to +complete/update, the `RolloutInProgress` condition will remain at +True, but the reason will be set to `ProgressDeadlineExceeded`. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/abc +``` +```yaml +... +status: + traffic: + - revisionName: abc + percent: 75 + - revisionName: def + percent: 25 + conditions: + - type: RolloutInProgress + status: True + reason: ProgressDeadlineExceeded + # reason is a short status, message provides error details + message: "Unable to update traffic split for more than 120 seconds." +``` + + +## Container image not present in repository + +Revisions might be created while a Build is still creating the +container image or uploading it to the repository. If the build is +being performed by a CRD in the cluster, the spec.buildName attribute +will be set (and see the [Build failed](#build-failed) example). 
In +other cases, when no build is supplied, the container image +referenced might not be present in the registry (either because of a +typo or because it was deleted). In this case, the `Ready` condition +will be set to False with a reason of `ContainerMissing`. This condition +could be corrected if the image becomes available at a later time. We +can also make a defensive copy of the container image to avoid this +error when the source container image is deleted. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc +``` +```yaml +... +status: + conditions: + - type: Ready + status: False + reason: ContainerMissing + message: "Unable to fetch image 'gcr.io/...': " + - type: Failed + status: True + reason: ContainerMissing + message: "Unable to fetch image 'gcr.io/...': " +``` + + +## Container image fails at startup on Revision + +Particularly for development cases with interpreted languages like +Node or Python, syntax errors or the like might only be caught at +container startup time. For this reason, implementations may choose to +start a single copy of the container on deployment, before making the +container Ready. If the initial container fails to start, the `Ready` +condition will be set to False and the reason will be set to +`ExitCode:%d` with the exit code of the application, and the last line +of output in the message. Additionally, the Revision will include a +`logUrl` which provides the address of an endpoint which can be used to +fetch the logs for the failed process. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc +``` +```yaml +... +status: + logUrl: "http://logging.infra.mycompany.com/...?filter=revision=abc&..." 
+ conditions: + - type: Ready + status: False + reason: ExitCode:127 + message: "Container failed with: SyntaxError: Unexpected identifier" +``` diff --git a/spec/images/auto_rollout.png b/spec/images/auto_rollout.png new file mode 100644 index 000000000..04dfb62c4 Binary files /dev/null and b/spec/images/auto_rollout.png differ diff --git a/spec/images/build_example.png b/spec/images/build_example.png new file mode 100644 index 000000000..28fb83e2e Binary files /dev/null and b/spec/images/build_example.png differ diff --git a/spec/images/build_function.png b/spec/images/build_function.png new file mode 100644 index 000000000..d37bd8023 Binary files /dev/null and b/spec/images/build_function.png differ diff --git a/spec/images/initial_creation.png b/spec/images/initial_creation.png new file mode 100644 index 000000000..61e160c3d Binary files /dev/null and b/spec/images/initial_creation.png differ diff --git a/spec/images/manual_rollout.png b/spec/images/manual_rollout.png new file mode 100644 index 000000000..1e54b0a27 Binary files /dev/null and b/spec/images/manual_rollout.png differ diff --git a/spec/images/object_model.png b/spec/images/object_model.png new file mode 100644 index 000000000..c48a4df53 Binary files /dev/null and b/spec/images/object_model.png differ diff --git a/spec/motivation.md b/spec/motivation.md new file mode 100644 index 000000000..573e6da9b --- /dev/null +++ b/spec/motivation.md @@ -0,0 +1,22 @@ +The goal of the Elafros project is to provide a common toolkit and API +framework for serverless workloads. + +We define serverless workloads as computing workloads that are: + +* Stateless +* Amenable to the process scale-out model +* Primarily driven by application level (L7 -- HTTP, for example) + request traffic + +While Kubernetes provides basic primitives like Deployment, Service, +and Ingress in support of this model, our experience suggests that a +more compact and richer opinionated model has substantial benefit for +developers. 
In particular, by standardizing on higher-level primitives +which perform substantial amounts of automation of common +infrastructure, it should be possible to build consistent toolkits +that provide a richer experience than updating yaml files with +`kubectl`. + +The Elafros APIs consist of Compute API (these documents), +[Build API](https://github.com/elafros/build) and +[Eventing API](https://github.com/elafros/eventing). diff --git a/spec/normative_examples.md b/spec/normative_examples.md new file mode 100644 index 000000000..8b2aac4bf --- /dev/null +++ b/spec/normative_examples.md @@ -0,0 +1,929 @@ +# Sample API Usage + +Following are several normative sample scenarios utilizing the Elafros +API. These scenarios are arranged to provide a flavor of the API and +building from the smallest, most frequent operations. + +Examples in this section illustrate: + +* [Automatic rollout of a new Revision to an existing Service with a + pre-built container](#1--automatic-rollout-of-a-new-revision-to-existing-service---pre-built-container) +* [Creating a first route to deploy a first revision from a pre-built + container](#2--creating-route-and-deploying-first-revision---pre-built-container) +* [Configuration changes and manual rollout + options](#3--manual-rollout-of-a-new-revision---config-change-only) +* [Creating a revision from source](#4--deploy-a-revision-from-source) +* [Creating a function from source](#5--deploy-a-function) + +Note that these API operations are identical for both app and function +based services. (to see the full resource definitions, see the +[Resource YAML Definitions](spec.md)). + +CLI samples are for illustrative purposes, and not intended to +represent final CLI design. 
+ +## 1) Automatic rollout of a new Revision to existing Service - pre-built container + +**_Scenario_**: User deploys a new revision to an existing service +with a new container image, rolling out automatically to 100% + +``` +$ elafros deploy --service my-service + Deploying app to service [my-service]: +✓ Starting +✓ Promoting + Done. + Deployed to https://my-service.default.mydomain.com +``` + +**Steps**: + +* Update the Configuration with the config change + +**Results:** + +* A new Revision is created, and automatically rolled out to 100% once + ready + +![Automatic Rollout](images/auto_rollout.png) + + +After the initial Route and Configuration have been created (which is +shown in the [second example](TODO)), the typical +interaction is to update the revision configuration, resulting in the +creation of a new revision, which will be automatically rolled out by +the route. Revision configuration updates can be handled as either a +PUT or PATCH operation: + +* Optimistic concurrency controls for PUT operations in a + read/modify/write routine work as expected in kubernetes. + +* PATCH semantics should work as expected in kubernetes, but may have + some limitations imposed by CRDs at the moment. + +In this and following examples PATCH is used. Revisions can be built +from source, which results in a container image, or by directly +supplying a pre-built container, which this first scenario +illustrates. The example demonstrates the PATCH issued by the client, +followed by several GET calls to illustrate each step in the +reconciliation process as the system materializes the new revision, +and begins shifting traffic from the old revision to the new revision. 
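The PUT-based read/modify/write routine mentioned above can be sketched with a toy in-memory store. This is a simplified illustration under stated assumptions, not the Elafros or Kubernetes client API: `FakeStore`, `Conflict`, and `update_image` are hypothetical names, and `resourceVersion` is modelled as an integer for brevity (Kubernetes treats it as an opaque string and answers a stale write with HTTP 409).

```python
import copy


class Conflict(Exception):
    """Raised when a PUT carries a stale resourceVersion."""


class FakeStore:
    """In-memory stand-in for the API server, holding a single resource."""

    def __init__(self, resource):
        self._resource = copy.deepcopy(resource)
        self._resource["metadata"]["resourceVersion"] = 1

    def get(self):
        return copy.deepcopy(self._resource)

    def put(self, resource):
        current = self._resource["metadata"]["resourceVersion"]
        if resource["metadata"].get("resourceVersion") != current:
            raise Conflict("resourceVersion is stale; re-read and retry")
        self._resource = copy.deepcopy(resource)
        self._resource["metadata"]["resourceVersion"] = current + 1


def update_image(store, image, max_attempts=3):
    """Read/modify/write loop that retries when another writer got in first."""
    for _ in range(max_attempts):
        configuration = store.get()
        configuration["spec"]["revisionTemplate"]["spec"]["container"]["image"] = image
        try:
            store.put(configuration)
            return store.get()
        except Conflict:
            continue  # someone else updated the resource; read it again
    raise Conflict(f"gave up after {max_attempts} attempts")


store = FakeStore({
    "metadata": {"name": "my-service"},
    "spec": {"revisionTemplate": {"spec": {"container": {"image": "example/old"}}}},
})
updated = update_image(store, "example/new")
print(updated["spec"]["revisionTemplate"]["spec"]["container"]["image"])
```

A PATCH avoids the explicit retry loop for simple field changes, which is one reason the examples in this document use it.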
+ +The client PATCHes the configuration's template revision with just the +new container image, inheriting previous configuration from the +configuration: + +```http +PATCH /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-service # by convention, same name as the service +spec: + revisionTemplate: # template for building Revision + spec: + container: + image: gcr.io/... # new image +``` + +The update to the Configuration triggers a new revision being created, +and the Configuration is updated to reflect the new Revision: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-service + generation: 1235 + ... + +spec: + ... # same as before, except new container.image +status: + latestReadyRevisionName: abc + latestCreatedRevisionName: def # new revision created, but not ready yet + observedGeneration: 1235 +``` + +The newly created revision has the same config as the previous +revision, but different code. Note the generation label reflects the +new generation of the configuration (1235), indicating the provenance +of the revision: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/def +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Revision +metadata: + name: def + labels: + elafros.dev/configuration: my-service + elafros.dev/configurationGeneration: 1235 + ... +spec: + container: # k8s core.v1.Container + image: gcr.io/... # new container + # same config as previous revision + env: + - name: FOO + value: bar + - name: HELLO + value: blurg + ... +status: + conditions: + - type: Ready + status: True +``` + +When the new revision is Ready, i.e. underlying resources are +materialized and ready to serve, the configuration updates its +`status.latestReadyRevisionName` status to reflect the new +revision. 
The route, which is configured to automatically roll out new +revisions from the configuration, watches the configuration and is +notified of the new `latestReadyRevisionName`, and begins migrating traffic +to it. During reconciliation, traffic may be routed to both existing +revision `abc` and new revision `def`: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service + ... +spec: + rollout: + traffic: + - configurationName: my-service + percent: 100 + +status: + # domain: + # oss: my-service.default.mydomain.com + domain: my-service.default.mydomain.com + # percentages add to 100 + traffic: # in status, all configurationName refs are dereferenced + - revisionName: abc + percent: 75 + - revisionName: def + percent: 25 + conditions: + - type: RolloutComplete + status: False +``` + +Once reconciled, revision `def` serves 100% of the traffic: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service + ... +spec: + rollout: + traffic: + - configurationName: my-service + percent: 100 +status: + domain: my-service.default.mydomain.com + traffic: + - revisionName: def + percent: 100 + conditions: + - type: RolloutComplete + status: True + ... +``` + + +## 2) Creating Route and deploying first Revision - pre-built container + +**Scenario**: User creates a new Route and deploys their first + Revision based on a pre-built container + +``` +$ elafros deploy --service my-service --region us-central1 +✓ Creating service [my-service] in region [us-central1] + Deploying app to service [my-service]: +✓ Uploading [=================] +✓ Starting +✓ Promoting + Done. + Deployed to https://my-service.default.mydomain.com +``` + +**Steps**: + +* Create a new Configuration and a Route that references that + configuration. 
+ +**Results**: + +* A new Configuration is created, and generates a new Revision based + on the configuration + +* A new Route is created, referencing the configuration + +* The route begins serving traffic to the revision that was created by + the configuration + +![Initial Creation](images/initial_creation.png) + + +The previous example assumed an existing Route and Configuration to +illustrate the common scenario of updating the configuration to deploy +a new revision to the service. + +In this getting started example, deploying a first Revision is +accomplished by creating a new Configuration (which will generate a +new Revision) and creating a new Route referring to that +configuration. Note that these two steps can occur in either order, or +in parallel. + +A Route can either refer directly to a Revision, or to the latest +ready revision of a Configuration, as this example illustrates. This +is the most straightforward scenario that many Elafros customers are +expected to use, and is consistent with the experience of deploying +code that is rolled out immediately. + +The example shows the POST calls issued by the client, followed by +several GET calls to illustrate each step in the reconciliation +process as the system materializes and begins routing traffic to the +revision. + +The client creates the route and configuration, which by convention +share the same name: + +```http +POST /apis/elafros.dev/v1alpha1/namespaces/default/routes +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service +spec: + rollout: + traffic: + - configurationName: my-service # named reference to Configuration + percent: 100 # automatically activate new Revisions from the configuration +``` + +```http +POST /apis/elafros.dev/v1alpha1/namespaces/default/configurations +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-service # By convention (not req'd), same name as the service. 
+ # This will also be set as the "elafros.dev/configuration" + # label on the created Revision. +spec: + revisionTemplate: # template for building Revision + metadata: ... + spec: + container: # k8s core.v1.Container + image: gcr.io/... + env: + - name: FOO + value: bar + - name: HELLO + value: world + ... +``` + +Upon the creation of the configuration, the system will create a new +Revision, generating its name, and applying the spec and metadata from +the configuration, as well as new metadata labels: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Revision +metadata: + name: abc # generated name + labels: + # name and generation of the configuration that created the revision + elafros.dev/configuration: my-service + elafros.dev/configurationGeneration: 1234 + ... # uid, resourceVersion, creationTimestamp, generation, selfLink, etc +spec: + ... # spec from the configuration +status: + conditions: + - type: Ready + status: False + message: "Starting Instances" +``` + +Immediately after the revision is created, i.e. before underlying +resources have been fully materialized, the configuration is updated +with latestCreatedRevisionName: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-service + generation: 1234 + ... # uid, resourceVersion, creationTimestamp, selfLink, etc +spec: + ... 
# same as before +status: + # latest created revision, may not have materialized yet + latestCreatedRevisionName: abc + observedGeneration: 1234 +``` + +The configuration watches the revision, and when the revision is +updated as Ready (to serve), the latestReadyRevisionName is updated: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-service + generation: 1234 + ... +spec: + ... # same as before +status: + # the latest created and ready to serve. Watched by service + latestReadyRevisionName: abc + # latest created revision + latestCreatedRevisionName: abc + observedGeneration: 1234 +``` + +The route, which watches the configuration `my-service`, observes the +change to `latestReadyRevisionName` and begins routing traffic to the +new revision `abc`, addressable as +`my-service.default.mydomain.com`. Once reconciled: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service + generation: 2145 + ... +spec: + rollout: + traffic: + - configurationName: my-service + percent: 100 + +status: + domain: my-service.default.mydomain.com + + traffic: # in status, all configurationName refs are dereferenced to latest revision + - revisionName: abc # latestReadyRevisionName from configurationName in spec + percent: 100 + + conditions: + - type: RolloutComplete + status: True + + observedGeneration: 2145 +``` + + +## 3) Manual rollout of a new Revision - config change only + +**_Scenario_**: User updates configuration with new configuration (env + var change) to an existing service, tests the revision, then + proceeds with a manually controlled rollout to 100% + +``` +$ elafros rollout strategy manual + +$ elafros deploy --service my-service --env HELLO="blurg" +[...] 
+ +$ elafros revisions list --service my-service +Name Traffic Id Date Deployer Git SHA +next 0% v3 2018-01-19 12:16 user1 a6f92d1 +current 100% v2 2018-01-18 20:34 user1 a6f92d1 + v1 2018-01-17 10:32 user1 33643fc + +$ elafros rollout next percent 5 +[...] +$ elafros rollout next percent 50 +[...] +$ elafros rollout finish +[...] + +$ elafros revisions list --service my-service +Name Traffic Id Date Deployer Git SHA +current,next 100% v3 2018-01-19 12:16 user1 a6f92d1 + v2 2018-01-18 20:34 user1 a6f92d1 + v1 2018-01-17 10:32 user1 33643fc +``` + +**Steps**: + +* Update the Route to pin the current revision + +* Update the Configuration with the new configuration (env var) + +* Update the Route to address the new Revision + +* After testing the new revision through the named subdomain, proceed + with the rollout, incrementally increasing traffic to 100% + +**Results:** + +* The system creates the new revision from the configuration, + addressable at next.my-service... (by convention), but traffic is + not routed to it until the percentage is manually ramped up. Upon + completing the rollout, the next revision is now the current + revision + +![Manual rollout](images/manual_rollout.png) + + +In the previous examples, the route referenced a Configuration for +automatic rollouts of new Revisions. While this pattern is useful for +many scenarios such as functions-as-a-service and simple development +flows, the Route can also reference Revisions directly to "pin" +traffic to specific revisions, which is suitable for manually +controlling rollouts, i.e. testing a new revision prior to serving +traffic. (Note: see [Appendix B](complex_examples.md) for a +semi-automatic variation of manual rollouts). 
+ +The client updates the route to pin the current revision: + +```http +PATCH /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service +spec: + rollout: + traffic: + - revisionName: def # pin a specific revision, i.e. the current one + percent: 100 +``` + +As in the previous example, the configuration is updated to trigger +the creation of a new revision, in this case updating the config (an +environment variable) but keeping the same container image: + +```http +PATCH /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-service +spec: + revisionTemplate: + spec: + container: + env: # k8s-style strategic merge patch, updating a single list value + - name: HELLO + value: blurg # changed value +``` + +A new revision `ghi` is created that has the same code as the previous +revision `def`, but different config: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/ghi +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Revision +metadata: + name: ghi + ... +spec: + container: + image: gcr.io/... # same container as previous revision def + env: + - name: FOO + value: bar + - name: HELLO + value: blurg # changed value + ... +status: + conditions: + - type: Ready + status: True +``` + +Even when ready, the new revision does not automatically start serving +traffic, as the route was pinned to revision `def`. 
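The difference between pinned and automatic traffic targets can be sketched as follows. This only illustrates the behaviour described in this document and is not the controller's implementation; `resolve_traffic` and the dictionary shapes are hypothetical, and the `"Not found"` placeholder follows the error-reporting examples.

```python
def resolve_traffic(spec_traffic, latest_ready_by_configuration):
    """Turn a Route's spec traffic targets into status-style entries.

    Targets with revisionName are pinned and pass through unchanged.
    Targets with configurationName follow that Configuration's
    latestReadyRevisionName, or "Not found" if it is unknown.
    """
    resolved = []
    for target in spec_traffic:
        if "revisionName" in target:
            revision = target["revisionName"]
        else:
            revision = latest_ready_by_configuration.get(
                target["configurationName"], "Not found"
            )
        entry = {"revisionName": revision, "percent": target["percent"]}
        if "name" in target:
            entry["name"] = target["name"]
        resolved.append(entry)
    return resolved


# ghi is now the latest ready revision of the configuration.
latest_ready = {"my-service": "ghi"}

pinned = [{"revisionName": "def", "percent": 100}]
automatic = [{"configurationName": "my-service", "percent": 100}]

print(resolve_traffic(pinned, latest_ready))     # stays on def
print(resolve_traffic(automatic, latest_ready))  # follows the configuration to ghi
```

With the pinned target, creating `ghi` changes nothing until the route itself is updated.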
+ +Update the route to make the existing revision serving traffic +addressable through subdomain `current`, and referencing the new +revision at 0% traffic but making it addressable through subdomain +`next`: + +```http +PATCH /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service +spec: + rollout: + traffic: + - revisionName: def + name: current # addressable as current.my-service.default.mydomain.com + percent: 100 + - revisionName: ghi + name: next # addressable as next.my-service.default.mydomain.com + percent: 0 # no traffic yet +``` + +In this state, the route makes both revisions addressable with +subdomains `current` and `next` (once the revision `ghi` has a status of +Ready), but traffic has not shifted to next yet. Also note that while +the names current/next have semantic meaning, they are convention +only; blue/green, or any other subdomain names could be configured. + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service + ... +spec: + ... # unchanged +status: + domain: my-service.default.mydomain.com + traffic: + - revisionName: def + name: current # addressable as current.my-service.default.mydomain.com + percent: 100 + - revisionName: ghi + name: next # addressable as next.my-service.default.mydomain.com + percent: 0 + conditions: + - type: RolloutComplete + status: True + ... 
+``` + +After testing the new revision at +`next.my-service.default.mydomain.com`, it can be rolled out to 100% +(either directly, or through several increments, with the split +totaling 100%): + +```http +PATCH /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service +spec: + rollout: + traffic: # percentages must total 100% + - revisionName: def + name: current + percent: 0 + - revisionName: ghi + name: next + percent: 100 # migrate traffic fully to the next revision +``` + +After reconciliation, all traffic has been shifted to the new revision: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service + ... +spec: + ... # unchanged +status: + domain: my-service.default.mydomain.com + traffic: + - revisionName: def + name: current + percent: 0 + - revisionName: ghi + name: next + percent: 100 + conditions: + - type: RolloutComplete + status: True + ... +``` + +By convention, the final step when completing the rollout is to update +`current` to reflect the new revision. `next` can either be removed, or +left addressing the same revision as current so that +`next.my-service.default.mydomain.com` is always addressable.
+ +```http +PATCH /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service +spec: + rollout: + traffic: + - revisionName: ghi # update for the next rollout, current = next + name: current + percent: 100 + - revisionName: ghi # optional: leave next as also referring to ghi + name: next + percent: 0 +``` + + +## 4) Deploy a Revision from source + +**Scenario**: User deploys a revision to an existing service from + source rather than a pre-built container + +``` +$ elafros deploy --service my-service + Deploying app to service [my-service]: +✓ Uploading [=================] +✓ Detected [node-8-9-4] runtime +✓ Building +✓ Starting +✓ Promoting + Done. + Deployed to https://my-service.default.mydomain.com +``` + +**Steps**: + +* Create/Update a Configuration, inlining build details. + +**Results**: + +* The Configuration is created/updated, which generates a container + build and a new revision based on the template, and can be rolled + out per earlier examples + +![Build Example](images/build_example.png) + + +Previous examples demonstrated configurations created with pre-built +containers. Revisions can also be created by providing build +information to the configuration, which results in a container image +built by the system. The build information is supplied by inlining the +BuildSpec of a Build resource in the Configuration. This describes: + +* **What** to build (`build.source`): Source can be provided as an + archive, manifest file, or repository. + +* **How** to build (`build.template`): a + [BuildTemplate](https://github.com/elafros/build) is referenced, + which describes how to build the container via a builder with + arguments to the build process. + +* **Where** to publish (`build.template.arguments`): Image registry + url and other information specific to this build invocation. 
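As a rough illustration of the what/how/where split, the following Python sketch pulls the three parts out of a Configuration represented as a dictionary. The field names (`build.source` with one of `git`, `gcs`, `custom`; `build.template.name`; the `_IMAGE` argument) follow the Configuration YAML used in this document; the helper name `describe_build`, the repository URL, and the image path are hypothetical placeholders.

```python
def describe_build(configuration: dict) -> dict:
    """Summarize the what/how/where of an inlined build spec."""
    build = configuration["spec"]["build"]

    # What: exactly one source kind is expected to be set.
    source_kinds = [k for k in ("git", "gcs", "custom") if k in build["source"]]
    if len(source_kinds) != 1:
        raise ValueError("build.source must set exactly one of git, gcs, custom")

    # Where: template arguments are a list of name/value pairs.
    arguments = {
        arg["name"]: arg["value"]
        for arg in build["template"].get("arguments", [])
    }

    return {
        "what": source_kinds[0],
        "how": build["template"]["name"],
        "where": arguments.get("_IMAGE"),
    }


# Hypothetical example values, for illustration only.
configuration = {
    "spec": {
        "build": {
            "source": {"git": {"url": "https://example.com/repo.git",
                               "commit": "deadbeef"}},
            "template": {
                "name": "nodejs_8_9_4",
                "namespace": "build-templates",
                "arguments": [
                    {"name": "_IMAGE", "value": "gcr.io/example/my-service"},
                ],
            },
        },
    },
}

print(describe_build(configuration))
```

Keeping the destination image as a template argument means a client can read it back without knowing anything else about the build template.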
+ +The client creates the configuration inlining a build spec for a +git-based source build, and referencing a nodejs build template: + +```http +POST /apis/elafros.dev/v1alpha1/namespaces/default/configurations +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-service +spec: + build: # build.dev/v1alpha1.BuildTemplateSpec + source: + # oneof git|gcs|custom: + git: + url: https://... + commit: ... + template: # defines build template + name: nodejs_8_9_4 # builder name + namespace: build-templates + arguments: + - name: _IMAGE + value: gcr.io/... # destination for image + + revisionTemplate: # template for building Revision + metadata: ... + spec: + container: # k8s core.v1.Container + image: gcr.io/... # Promise of a future build. Same as supplied in + # build.template.arguments[_IMAGE] + env: # Updated environment variables to go live with new source. + - name: FOO + value: bar + - name: HELLO + value: world +``` + +Note the `revisionTemplate.spec.container.image` above is supplied +with the destination of the build. This enables one-step changes to +both config and source code. If the build step were responsible for +updating the `revisionTemplate.spec.container.image` at the completion +of the build, an update to both source and config could result in the +creation of two Revisions, one with the config change, and the other +with the new code deployment. It is expected that the Revision will wait +for the `buildName` to be complete and the +`revisionTemplate.spec.container.image` to be live before marking the +Revision as "ready". + +Upon creating/updating the configuration's build field, the system +creates a new revision. The configuration controller will initiate a +build, populating the revision’s buildName with a reference to the +underlying Build resource.
Via status updates which the revision +controller observes through the build reference, the high-level state +of the build is mirrored into conditions in the Revision’s status: + +```http +GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Revision +metadata: + name: abc + labels: + elafros.dev/configuration: my-service + elafros.dev/configurationGeneration: 1234 + ... +spec: + # name of the build.dev/v1alpha1.Build, if built from source. + # Set by Configuration. + buildName: ... + + # spec from the configuration, with container.image containing the + # newly built container + container: # k8s core.v1.Container + image: gcr.io/... + env: + - name: FOO + value: bar + - name: HELLO + value: world +status: + # This is a copy of metadata from the container image or grafeas, indicating + # the provenance of the revision, annotated on the container + imageSource: + archive|manifest|repository: ... + context: ... + conditions: + - type: Ready + status: True + - type: BuildComplete + status: True + # other conditions indicating build failure details, if applicable +``` + +Rollout operations in the route are identical to the pre-built +container examples. + +Also analogous is updating the configuration to create a new +revision - in this case, updated source would be provided to the +configuration's inlined build spec, which would initiate a new +container build, and the creation of a new revision. + + +## 5) Deploy a Function + +**Scenario**: User deploys a new function revision to an existing service + +``` +$ elafros deploy --function index --service my-function + Deploying function to service [my-function]: +✓ Uploading [=================] +✓ Detected [node-8-9-4] runtime +✓ Building +✓ Starting +✓ Promoting + Done. + Deployed to https://my-function.default.mydomain.com +``` + +**Steps**: + +* Create/Update a Configuration, additionally specifying function details. 
+ +**Results**: + +* The Configuration is created/updated, which generates a new revision + based on the template build and spec which can be rolled out per + previous examples + +![Build Function](images/build_function.png) + + +Previous examples illustrated creating and deploying revisions in the +context of apps. Functions are created and deployed in the same +manner (in particular, as containers which respond to HTTP). In the +build phase of the deployment, additional function metadata may be +taken into account in order to wrap the supplied code in a functions +framework. + +Functions are configured with a language-specific entryPoint. The +entryPoint may be provided as an argument to the build template, if +language-native autodetection is insufficient. By convention, a type +metadata label may also be added that designates revisions as a +function, supporting listing revisions by type; there is no change to +the system behavior based on type. + +Note that a function may be connected to one or more event sources via +Bindings in the Eventing API; the binding of events to functions is +not a core function of the compute API. + +Creating the configuration with build and function metadata: + +```http +POST /apis/elafros.dev/v1alpha1/namespaces/default/configurations +``` +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-function +spec: + build: # build.dev/v1alpha1.BuildTemplateSpec + source: + # oneof git|gcs|custom + git: + url: https://... + commit: ... + template: # defines build template + name: go_1_9_fn # function builder + namespace: build-templates + arguments: + - name: _IMAGE + value: gcr.io/... 
# destination for image + - name: _ENTRY_POINT + value: index # language dependent, function-only entrypoint + + revisionTemplate: # template for building Revision + metadata: + labels: + # One-of "function" or "app", convention for CLI/UI clients to list/select + elafros.dev/type: "function" + spec: + container: # k8s core.v1.Container + image: gcr.io/... # Promise of a future build. Same as supplied in + # build.template.arguments[_IMAGE] + env: + - name: FOO + value: bar + - name: HELLO + value: world + + # serializes requests for function. Default value for functions + concurrencyModel: SingleThreaded + # max time allowed to respond to request + timeoutSeconds: 20 +``` + +Upon creating or updating the configuration, a new Revision is created +per the previous examples. Rollout operations are also identical to +the previous examples. diff --git a/spec/overview.md b/spec/overview.md new file mode 100644 index 000000000..f8bd77de8 --- /dev/null +++ b/spec/overview.md @@ -0,0 +1,86 @@ +# Resource Types + +The primary resources in the Elafros API are Routes, Revisions, and Configurations: + +* A **Route** provides a named endpoint and a mechanism for routing traffic to + +* **Revisions**, which are immutable snapshots of code + config, created by a + +* **Configuration**, which acts as a stream of environments for Revisions. + +![Object model](images/object_model.png) + +## Route + +**Route** provides a network endpoint for a user's service (which +consists of a series of software and configuration Revisions over +time). A kubernetes namespace can have multiple routes. The route +provides a long-lived, stable, named, HTTP-addressable endpoint that +is backed by one or more **Revisions**. The default configuration is +for the route to automatically route traffic to the latest revision +created by a **Configuration**. 
For more complex scenarios, the API +supports splitting traffic on a percentage basis, and CI tools could +maintain multiple configurations for a single route (e.g. "golden +path" and "experiments") or reference multiple revisions directly to +pin revisions during an incremental rollout and n-way traffic +split. The route can optionally assign addressable subdomains to any +or all backing revisions. + +## Revision + +**Revision** is an immutable snapshot of code and configuration. A +revision can be created from a pre-built container image or built from +source. While there is a history of previous revisions, only those +currently referenced by a Route are addressable or routable. Older +inactive revisions need not be backed by underlying resources; they +exist only as the revision metadata in storage. Revisions are created +by updates to a **Configuration**. + +## Configuration + +A **Configuration** describes the desired latest Revision state, and +creates and tracks the status of Revisions as the desired state is +updated. A configuration might include instructions on how to transform +a source package (either git repo or archive) into a container by +referencing a [Build](https://github.com/elafros/build), or might +simply reference a container image and associated execution metadata +needed by the Revision. On updates to a Configuration, a new build +and/or deployment (creating a Revision) may be performed; the +Configuration's controller will track the status of created Revisions +and make both the most recently created and most recently *ready* +(i.e. healthy) Revision available in the status section. + + +# Orchestration + +The system will be configured to not allow customer mutations to +Revisions.
Instead, the creation of immutable Revisions through a +Configuration provides: + +* a single referenceable resource for the route to perform automated + rollouts +* a single resource that can be watched to see a history of all the + revisions created +* (but doesn’t mandate) PATCH semantics for new revisions to be done + on the server, minimizing read-modify-write implemented across + multiple clients, which could result in optimistic concurrency + errors +* the ability to roll back to a known good configuration + +In the conventional single live revision scenario, a route has a +single configuration with the same name as the route. Update +operations on the configuration enable scenarios such as: + +* *"Push code, keep config":* Specifying a new revision with updated + source, inheriting configuration such as env vars from the + configuration. +* *"Update config, keep code"*: Specifying a new revision as just a + change to configuration, such as updating an env variable, + inheriting all other configuration and source/image. + +When creating an initial route and performing the first deployment, +the two operations of creating a Route and an associated Configuration +can be done in parallel, which streamlines the use case of deploying +code initially from a button. The +[sample API usage](normative_examples.md) section illustrates +conventional usage of the API. diff --git a/spec/spec.md b/spec/spec.md new file mode 100644 index 000000000..f6d6ab825 --- /dev/null +++ b/spec/spec.md @@ -0,0 +1,263 @@ +## Resource Paths + +Resource paths in the Elafros API have the following standard k8s form: + +``` +/apis/{apiGroup}/{apiVersion}/namespaces/{metadata.namespace}/{kind}/{metadata.name} +``` + +For example: + +``` +/apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service +``` + +It is expected that each Route will provide a name within a +cluster-wide DNS name.
While no particular URL scheme is mandated +(consult the `domain` property of the Route for the authoritative +mapping), a common implementation would be to use the kubernetes +namespace mechanism to produce a URL like the following: + +``` +[$revisionname].$route.$namespace. +``` + +For example: + +``` +prod.my-service.default.mydomain.com +``` + + +# Resource YAML Definitions + +YAMLs for the Elafros API resources are shown below, following the +basic k8s structure: metadata, spec and status, along with comments on +specific fields. + +## Route + +For a high-level description of Routes, +[see the overview](overview.md#route). + +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Route +metadata: + name: my-service + namespace: default + labels: + elafros.dev/type: ... # +optional convention: function|app + + # system generated meta + uid: ... + resourceVersion: ... # used for optimistic concurrency control + creationTimestamp: ... + generation: ... # updated only when spec changes; used by observedGeneration + selfLink: ... + ... +spec: + traffic: + # list of oneof configurationName | revisionName. + # configurationName watches configurations to address the latestReadyRevisionName + # revisionName pins a specific revision + - configurationName: ... + name: ... # +optional. Access as {name}.${status.domain}, + # e.g. oss: current.my-service.default.mydomain.com + percent: 100 # list percentages must add to 100. 0 is a valid list value + - ... + +status: + # domain: The hostname used to access the default (traffic-split) + # route. Typically, this will be composed of the name and namespace + # along with a cluster-specific suffix (here, mydomain.com). + domain: my-service.default.mydomain.com + + traffic: + # current rollout status list. configurationName references + # are dereferenced to latest revision + - revisionName: ... # latestReadyRevisionName from a configurationName in spec + name: ... + percent: ... # percentages add to 100.
0 is a valid list value + - ... + + conditions: # See also the [error conditions documentation](errors.md) + - type: RolloutComplete + status: True + - type: TrafficDropped + status: False + - ... + + observedGeneration: ... # last generation being reconciled +``` + + +## Configuration + +For a high-level description of Configurations, +[see the overview](overview.md#configuration). + + +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Configuration +metadata: + name: my-service + namespace: default + + # system generated meta + uid: ... +  resourceVersion: ... # used for optimistic concurrency control +  creationTimestamp: ... +  generation: ... # updated only when spec changes; used by observedGeneration +  selfLink: ... + ... +spec: + # +optional. composable Build spec, if omitted provide image directly + build: # This is a build.dev/v1alpha1.BuildTemplateSpec + source: + # oneof git|gcs|custom: + + # +optional. + git: + url: https://github.com/jrandom/myrepo + commit: deadbeef # Or branch, tag, ref + + # +optional. A zip archive or a manifest file in Google Cloud + # Storage. A manifest file is a file containing a list of file + # paths, backing URLs, and sha checksums. Manifest may be a more + # efficient mechanism for a client to perform partial upload. + gcs: + location: https://... + type: 'archive' # Or 'manifest' + + # +optional. Custom specifies a container which will be run as + # the first build step to fetch the source. + custom: # is a core.v1.Container + image: gcr.io/cloud-builders/git:latest + args: [ "clone", "https://...", "other-place" ] + + template: # build template reference and arguments. + name: go_1_9_fn # builder name. Functions may have custom builders + namespace: build-templates + arguments: + - name: _IMAGE + value: gcr.io/... # destination for image + - name: _ENTRY_POINT + value: index # if function, language dependent entrypoint + + revisionTemplate: # template for building Revision + metadata: ... 
+ labels: + elafros.dev/type: "function" # One of "function" or "app" + spec: # elafros.RevisionTemplateSpec. Copied to a new revision + + # +optional. if rolling back, the client may set this to the + # previous revision's build to avoid triggering a rebuild + buildName: ... + + # is a core.v1.Container; some fields not allowed, such as resources, ports + container: + # image either provided as pre-built container, or built by Elafros from + # source. When built by elafros, set to the same as build template, e.g. + # build.template.arguments[_IMAGE], as the "promise" of a future build. + # If buildName is provided, it is expected that this image will be + # present when the referenced build is complete. + image: gcr.io/... + command: ['run'] + args: [] + env: + # list of environment vars + - name: FOO + value: bar + - name: HELLO + value: world + - ... + livenessProbe: ... # Optional + readinessProbe: ... # Optional + + # +optional concurrency strategy. SingleThreaded default value for functions + concurrencyModel: SingleThreaded + # +optional. max time the instance is allowed for responding to a request + timeoutSeconds: ... + serviceAccountName: ... # Name of the service account the code should run as. + +status: + # the latest created and ready to serve. Watched by route + latestReadyRevisionName: abc + # latest created revision, may still be in the process of being materialized + latestCreatedRevisionName: def + conditions: # See also the [error conditions documentation](errors.md) + - type: LatestRevisionReady + status: False + reason: ContainerMissing + message: "Unable to start because container is missing and build failed." + observedGeneration: ... # last generation being reconciled +``` + + +## Revision + +For a high-level description of Revisions, +[see the overview](overview.md#revision). 
+ +```yaml +apiVersion: elafros.dev/v1alpha1 +kind: Revision +metadata: + name: myservice-a1e34 # system generated + namespace: default + labels: + elafros.dev/configuration: ... # to list configurations/revisions by service + elafros.dev/configurationGeneration: ... # generation of configuration that created this Revision + elafros.dev/type: "function" # convention, one of "function" or "app" + # system generated meta + uid: ... +  resourceVersion: ... # used for optimistic concurrency control +  creationTimestamp: ... +  generation: ... +  selfLink: ... + ... + +# spec populated by Configuration +spec: + # +optional. name of the build.dev/v1alpha1.Build if built from source + buildName: ... + + container: # core.v1.Container + image: gcr.io/... + command: ['run'] + args: [] + env: # list of environment vars + - name: FOO + value: bar + - name: HELLO + value: world + - ... + livenessProbe: ... # Optional + readinessProbe: ... # Optional + concurrencyModel: ... + timeoutSeconds: ... + serviceAccountName: ... # Name of the service account the code should run as. + ... +status: + # This is a copy of metadata from the container image or grafeas, + # indicating the provenance of the revision. This is based on the + # container image, but may need further clarification. + imageSource: + git|gcs: ... + conditions: # See also the documentation in errors.md + - type: Ready + status: False + message: "Starting Instances" + # if built from source: + - type: BuildComplete + status: True + # other conditions indicating build failure, if applicable + - ... + # URL for accessing the logs generated by this revision. Note that logs + # may still be access controlled separately from access to the API object. + logUrl: "logging.infra.mycompany.com/...?filter=revision=myservice-a1e34&..." +``` + +
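Clients reading the `status.conditions` lists shown in the definitions above usually need to look up one condition by its `type`. The following Python sketch shows one way to do that; `get_condition` and `is_true` are hypothetical helper names, not part of any Elafros client library, and the sample resource simply mirrors the Revision status in the YAML above.

```python
from typing import Optional


def get_condition(resource: dict, condition_type: str) -> Optional[dict]:
    """Return the first condition with the given type, or None if absent."""
    for condition in resource.get("status", {}).get("conditions", []):
        if condition.get("type") == condition_type:
            return condition
    return None


def is_true(resource: dict, condition_type: str) -> bool:
    """Report whether a condition is present and has status True.

    The status may arrive as the boolean True or the string "True",
    depending on how the YAML was parsed, so both are accepted.
    """
    condition = get_condition(resource, condition_type)
    return condition is not None and str(condition.get("status")) == "True"


# Mirrors the Revision status above: built, but instances still starting.
revision = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "False", "message": "Starting Instances"},
            {"type": "BuildComplete", "status": "True"},
        ],
    },
}

print(is_true(revision, "BuildComplete"))
print(is_true(revision, "Ready"))
print(get_condition(revision, "Ready")["message"])
```

A missing condition is treated the same as a false one here, which is a conservative choice for tools that wait on readiness.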