diff --git a/serving/setting-up-a-docker-registry.md b/serving/setting-up-a-docker-registry.md
new file mode 100644
index 000000000..ad4e82973
--- /dev/null
+++ b/serving/setting-up-a-docker-registry.md
@@ -0,0 +1,83 @@
+# Setting Up A Docker Registry
+
+This document explains how to use different Docker registries with Knative Serving. It assumes you have gone through the steps listed in [DEVELOPMENT.md](/DEVELOPMENT.md) to set up your development environment (or that you have at least installed `go`, set `GOPATH`, and put `$GOPATH/bin` on your `PATH`).
+
+It currently only contains instructions for [Google Container Registry (GCR)](https://cloud.google.com/container-registry/), but you should be able to use any Docker registry.
+
+## Google Container Registry (GCR)
+
+### Required Tools
+
+Install the following tools:
+
+1. [`gcloud`](https://cloud.google.com/sdk/downloads)
+1. [`docker-credential-gcr`](https://github.com/GoogleCloudPlatform/docker-credential-gcr)
+
+   If you installed `gcloud` using the archive or installer, you can install `docker-credential-gcr` like this:
+
+   ```shell
+   gcloud components install docker-credential-gcr
+   ```
+
+   If you installed `gcloud` using a package manager, you may need to install it with `go get`:
+
+   ```shell
+   go get github.com/GoogleCloudPlatform/docker-credential-gcr
+   ```
+
+   If you used `go get` to install it and `$GOPATH/bin` isn't already in `PATH`, add it:
+
+   ```shell
+   export PATH=$PATH:$GOPATH/bin
+   ```
+
+### Setup
+
+1. If you haven't already set up a GCP project, create one and export its name for use in later commands.
+
+   ```shell
+   export PROJECT_ID=my-project-name
+   gcloud projects create "${PROJECT_ID}"
+   ```
+
+1. Enable the GCR API.
+
+   ```shell
+   gcloud --project="${PROJECT_ID}" services enable \
+     containerregistry.googleapis.com
+   ```
+
+1. Hook up your GCR credentials. Note that this may complain if you don't have the Docker CLI installed, but the Docker CLI is not necessary and the command should still work.
+
+   ```shell
+   docker-credential-gcr configure-docker
+   ```
+
+1. If you need to, update the `KO_DOCKER_REPO` and/or `DOCKER_REPO_OVERRIDE` variables in your `.bashrc`. They should now be:
+
+   ```shell
+   export KO_DOCKER_REPO='us.gcr.io/<your GCP project ID>'
+   export DOCKER_REPO_OVERRIDE="${KO_DOCKER_REPO}"
+   ```
+
+   (You may need to use a different region than `us` if you didn't pick a `us` Google Cloud region.)
+
+That's it, you're done!
+
+## Local registry
+
+This section has yet to be written. If you'd like to write it, see issue [#23](https://github.com/knative/serving/issues/23).
diff --git a/serving/setting-up-a-logging-plugin.md b/serving/setting-up-a-logging-plugin.md
new file mode 100644
index 000000000..81cda8039
--- /dev/null
+++ b/serving/setting-up-a-logging-plugin.md
@@ -0,0 +1,82 @@
+# Setting Up A Logging Plugin
+
+Knative allows cluster operators to use different backends for their logging needs. This document describes how to change these settings. Knative currently requires changes to Fluentd configuration files; however, we plan to abstract logging configuration in the future ([#906](https://github.com/knative/serving/issues/906)). Once [#906](https://github.com/knative/serving/issues/906) is complete, the methodology described in this document will no longer be valid and a migration to a new process will be required. To minimize the effort of that future migration, we recommend changing only the output configuration of Fluentd and leaving the rest intact.
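+
+Before editing, it can help to print the output section you are about to replace. This is only a minimal sketch, assuming you run it from the root of the Knative Serving repository; `900.output.conf` is the ConfigMap key referenced in the steps below:
+
+```shell
+# Print the current Fluentd output block (plus some trailing context) from the
+# ConfigMap manifest, so you can see what your replacement needs to cover.
+grep -n -A 30 '900.output.conf' config/monitoring/fluentd-configmap.yaml
+```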
+
+## Configuring
+
+### Configure the DaemonSet for stdout/stderr logs
+
+Operators can take the following steps to configure the Fluentd DaemonSet for collecting `stdout`/`stderr` logs from the containers:
+
+1. Replace the `900.output.conf` part in [fluentd-configmap.yaml](/config/monitoring/fluentd-configmap.yaml) with the desired output configuration. Knative provides samples for sending logs to Elasticsearch or Stackdriver. Developers can simply choose one of the `150-*` configurations from [/config/monitoring](/config/monitoring) or override either of them with other configuration.
+1. Replace the `image` field of the `fluentd-ds` container in [fluentd-ds.yaml](/third_party/config/monitoring/common/fluentd/fluentd-ds.yaml) with a Fluentd image that includes the desired Fluentd output plugin. See [here](/image/fluentd/README.md) for the requirements of the Fluentd image on Knative.
+
+### Configure the Sidecar for log files under /var/log
+
+Currently, operators have to configure the Fluentd sidecar separately for collecting log files under `/var/log`. An [effort](https://github.com/knative/serving/issues/818) is in progress to get rid of the sidecar. The steps to configure it are:
+
+1. Replace the `logging.fluentd-sidecar-output-config` flag in [config-observability](/config/config-observability.yaml) with the desired output configuration. **NOTE**: The Fluentd DaemonSet runs in the `monitoring` namespace while the Fluentd sidecar runs in the same namespace as the app. There may be small differences between the configuration for the DaemonSet and for the sidecar even when the desired backends are the same.
+1. Replace the `logging.fluentd-sidecar-image` flag in [config-observability](/config/config-observability.yaml) with a Fluentd image that includes the desired Fluentd output plugin. In theory, this is the same image as the one used for the Fluentd DaemonSet.
+
+## Deploying
+
+After configuring, operators need to redeploy the affected Knative components:
+
+```shell
+# In case there is no change to the controller code
+bazel run config:controller.delete
+# Deploy the configuration for the sidecar
+kubectl apply -f config/config-observability.yaml
+# Redeploy the controller to make the sidecar configuration take effect
+bazel run config:controller.apply
+
+# Deploy the DaemonSet to make the DaemonSet configuration take effect
+kubectl apply -f <the-fluentd-daemonset-config-dir> \
+  -f third_party/config/monitoring/common/kubernetes/fluentd/fluentd-ds.yaml \
+  -f config/monitoring/200-common/100-fluentd.yaml \
+  -f config/monitoring/200-common/100-istio.yaml
+```
+
+In the commands above, replace `<the-fluentd-daemonset-config-dir>` with the Fluentd DaemonSet configuration, e.g. `config/monitoring/150-stackdriver-prod`.
+
+**NOTE**: Operators sometimes need to deploy extra services as the logging backends. For example, if they want Elasticsearch and Kibana, they have to deploy the Elasticsearch and Kibana services. Knative provides this sample:
+
+```shell
+kubectl apply -R -f third_party/config/monitoring/elasticsearch
+```
+
+See [here](/config/monitoring/README.md) for deploying all of the Knative monitoring components.
diff --git a/serving/setting-up-ingress-static-ip.md b/serving/setting-up-ingress-static-ip.md
new file mode 100644
index 000000000..c491e47d1
--- /dev/null
+++ b/serving/setting-up-ingress-static-ip.md
@@ -0,0 +1,53 @@
+# Setting Up Static IP for Knative Gateway
+
+Knative uses a shared Gateway to serve all incoming traffic within the Knative service mesh: the "knative-shared-gateway" Gateway in the "knative-serving" namespace. The IP address used to access the gateway is the external IP address of the "knative-ingressgateway" service in the "istio-system" namespace. So, in order to set a static IP for the Knative shared gateway, you just need to set the external IP address of the "knative-ingressgateway" service to the static IP you reserved.
+
+## Prerequisites
+
+### Prerequisite 1: Reserve a static IP
+
+#### Knative on GKE
+
+If you are running your Knative cluster on GKE, you can follow the [instructions](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address#reserve_new_static) to reserve a REGIONAL IP address. The region of the IP address should be the region your Knative cluster is running in (e.g. us-east1, us-central1, etc.).
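+
+A minimal sketch of reserving a regional static IP with `gcloud`; the address name `knative-ingress-ip` and the region `us-central1` are placeholders chosen for this example, so substitute your own values:
+
+```shell
+# Reserve a regional external IP address (the name is arbitrary).
+gcloud compute addresses create knative-ingress-ip --region=us-central1
+# Print the reserved address so you can use it in the patch step below.
+gcloud compute addresses describe knative-ingress-ip \
+  --region=us-central1 --format='value(address)'
+```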
+
+TODO: add documentation on reserving a static IP on other cloud platforms.
+
+### Prerequisite 2: Deploy Istio And Knative Serving
+
+Follow the [instructions](https://github.com/knative/serving/blob/master/DEVELOPMENT.md) to deploy Istio and Knative Serving into your cluster.
+
+Once you reach this point, you can start to set up a static IP for the Knative gateway.
+
+## Set Up Static IP for Knative Gateway
+
+### Step 1: Update the external IP of the "knative-ingressgateway" service
+
+Run the following command to set the external IP of the "knative-ingressgateway" service to the static IP you reserved:
+
+```shell
+kubectl patch svc knative-ingressgateway -n istio-system --patch '{"spec": { "loadBalancerIP": "<your-reserved-static-ip>" }}'
+```
+
+### Step 2: Verify the static IP address of the "knative-ingressgateway" service
+
+You can check the external IP of the "knative-ingressgateway" service with:
+
+```shell
+kubectl get svc knative-ingressgateway -n istio-system
+```
+
+The result should be something like:
+
+```
+NAME                     TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                      AGE
+knative-ingressgateway   LoadBalancer   10.50.250.120   35.210.48.100   80:32380/TCP,443:32390/TCP,32400:32400/TCP   5h
+```
+
+The external IP will eventually be set to the static IP. This process could take several minutes.
diff --git a/serving/telemetry.md b/serving/telemetry.md
new file mode 100644
index 000000000..f17c86833
--- /dev/null
+++ b/serving/telemetry.md
@@ -0,0 +1,377 @@
+# Logs and metrics
+
+## Monitoring components setup
+
+First, deploy the monitoring components.
+
+### Elasticsearch, Kibana, Prometheus, and Grafana Setup
+
+You can use two different setups:
+
+1. **150-elasticsearch-prod**: This configuration collects logs and metrics from user containers, the build controller, and Istio requests.
+
+   ```shell
+   kubectl apply -R -f config/monitoring/100-common \
+     -f config/monitoring/150-elasticsearch-prod \
+     -f third_party/config/monitoring/common \
+     -f third_party/config/monitoring/elasticsearch \
+     -f config/monitoring/200-common \
+     -f config/monitoring/200-common/100-istio.yaml
+   ```
+
+1. **150-elasticsearch-dev**: This configuration collects everything **150-elasticsearch-prod** does, plus Knative Serving controller logs.
+
+   ```shell
+   kubectl apply -R -f config/monitoring/100-common \
+     -f config/monitoring/150-elasticsearch-dev \
+     -f third_party/config/monitoring/common \
+     -f third_party/config/monitoring/elasticsearch \
+     -f config/monitoring/200-common \
+     -f config/monitoring/200-common/100-istio.yaml
+   ```
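+
+With either setup, a quick way to confirm that the monitoring stack came up (exact pod names vary per cluster):
+
+```shell
+# All monitoring components are deployed into the "monitoring" namespace;
+# wait until every pod reports Running or Completed before moving on.
+kubectl get pods -n monitoring
+```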
+
+### Stackdriver, Prometheus, and Grafana Setup
+
+If your Knative Serving cluster is not built on Google Cloud Platform, or if you want to send logs to another GCP project, you need to build your own Fluentd image and modify the configuration first. See:
+
+1. [Fluentd image on Knative Serving](/image/fluentd/README.md)
+2. [Setting up a logging plugin](setting-up-a-logging-plugin.md)
+
+Then you can use two different setups:
+
+1. **150-stackdriver-prod**: This configuration collects logs and metrics from user containers, the build controller, and Istio requests.
+
+   ```shell
+   kubectl apply -R -f config/monitoring/100-common \
+     -f config/monitoring/150-stackdriver-prod \
+     -f third_party/config/monitoring/common \
+     -f config/monitoring/200-common \
+     -f config/monitoring/200-common/100-istio.yaml
+   ```
+
+2. **150-stackdriver-dev**: This configuration collects everything **150-stackdriver-prod** does, plus Knative Serving controller logs.
+
+   ```shell
+   kubectl apply -R -f config/monitoring/100-common \
+     -f config/monitoring/150-stackdriver-dev \
+     -f third_party/config/monitoring/common \
+     -f config/monitoring/200-common \
+     -f config/monitoring/200-common/100-istio.yaml
+   ```
+
+## Accessing logs
+
+### Kibana and Elasticsearch
+
+To open the Kibana UI (the visualization tool for [Elasticsearch](https://info.elastic.co)), enter the following command:
+
+```shell
+kubectl proxy
+```
+
+This starts a local proxy of Kibana on port 8001. For security reasons, the Kibana UI is exposed only within the cluster.
+
+Navigate to the [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana) (*it might take a couple of minutes for the proxy to work*).
+
+When Kibana is opened for the first time, it will ask you to create an index. Accept the default options:
+
+![Kibana UI Configuring an Index Pattern](images/kibana-landing-page-configure-index.png)
+
+The Discover tab of the Kibana UI looks like this:
+
+![Kibana UI Discover tab](images/kibana-discover-tab-annotated.png)
+
+You can change the time frame of the logs Kibana displays in the upper right corner of the screen. The main search bar is across the top of the Discover page.
+
+As more logs are ingested, new fields will be discovered. To have them indexed, go to Management > Index Patterns > Refresh button (top right) > Refresh fields.
+
+#### Accessing configuration and revision logs
+
+To access the logs for a configuration, enter the following search query in Kibana:
+
+```
+kubernetes.labels.knative_dev\/configuration: "configuration-example"
+```
+
+Replace `configuration-example` with your configuration's name. Enter the following command to get your configuration's name:
+
+```shell
+kubectl get configurations
+```
+
+To access the logs for a revision, enter the following search query in Kibana:
+
+```
+kubernetes.labels.knative_dev\/revision: "configuration-example-00001"
+```
+
+Replace `configuration-example-00001` with your revision's name.
+
+#### Accessing build logs
+
+To access the logs for a build, enter the following search query in Kibana:
+
+```
+kubernetes.labels.build\-name: "test-build"
+```
+
+Replace `test-build` with your build's name. The build name is specified in the `.yaml` file as follows:
+
+```yaml
+apiVersion: build.knative.dev/v1alpha1
+kind: Build
+metadata:
+  name: test-build
+```
+
+### Stackdriver
+
+Go to the [Google Cloud Console logging page](https://console.cloud.google.com/logs/viewer) for the GCP project that stores your logs via Stackdriver.
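+
+If you prefer the command line, a rough equivalent using `gcloud`; the filter below is an assumption, since depending on your cluster's Stackdriver integration the container logs may be stored under the `container` or `k8s_container` resource type, and `${PROJECT_ID}` stands in for the project that receives your logs:
+
+```shell
+# Read the ten most recent container log entries from the GCP project that
+# receives your Stackdriver logs.
+gcloud logging read 'resource.type="container"' \
+  --project="${PROJECT_ID}" --limit=10
+```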
+
+## Accessing metrics
+
+Enter:
+
+```shell
+kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=grafana --output=jsonpath="{.items..metadata.name}") 3000
+```
+
+Then open the Grafana UI at [http://localhost:3000](http://localhost:3000). The following dashboards are pre-installed with Knative Serving:
+
+* **Revision HTTP Requests:** HTTP request count, latency, and size metrics per revision and per configuration
+* **Nodes:** CPU, memory, network, and disk metrics at node level
+* **Pods:** CPU, memory, and network metrics at pod level
+* **Deployment:** CPU, memory, and network metrics aggregated at deployment level
+* **Istio, Mixer and Pilot:** Detailed Istio mesh, Mixer, and Pilot metrics
+* **Kubernetes:** Dashboards giving insights into cluster health, deployments, and capacity usage
+
+### Accessing per request traces
+
+Before you can view per request traces, you need to create a new index pattern that will store the per request traces captured by Zipkin:
+
+1. Start the Kibana UI proxy on local port 8001 by entering the following command:
+
+   ```shell
+   kubectl proxy
+   ```
+
+1. Open the [Kibana UI](http://localhost:8001/api/v1/namespaces/monitoring/services/kibana-logging/proxy/app/kibana).
+
+1. Navigate to Management -> Index Patterns -> Create Index Pattern.
+
+1. Enter `zipkin*` in the "Index pattern" text field.
+
+1. Click **Create**.
+
+After you've created the Zipkin index pattern, open the [Zipkin UI](http://localhost:8001/api/v1/namespaces/istio-system/services/zipkin:9411/proxy/zipkin/). Click "Find Traces" to see the latest traces. You can search for a trace ID or look at the traces of a specific application. Click on a trace to see a detailed view of a specific call.
+
+To see a demo of distributed tracing, deploy the [Telemetry sample](../sample/telemetrysample/README.md), send some traffic to it, then explore the traces it generates from the Zipkin UI.
+
+## Default metrics
+
+The following metrics are collected by default:
+
+* Knative Serving controller metrics
+* Istio metrics (Mixer, Envoy, and Pilot)
+* Node and pod metrics
+
+There are several other collectors that are pre-configured but not enabled. To see the full list, browse the config/monitoring/prometheus-exporter and config/monitoring/prometheus-servicemonitor folders and deploy them using `kubectl apply -f`.
+
+## Default logs
+
+The deployment above enables collection of the following logs:
+
+* stdout & stderr from all user containers
+* stdout & stderr from build-controller
+
+To enable log collection from other containers and destinations, see [setting up a logging plugin](setting-up-a-logging-plugin.md).
+
+## Metrics troubleshooting
+
+You can use the Prometheus web UI to troubleshoot publishing and service discovery issues for metrics. To access the web UI, forward the Prometheus server to your machine:
+
+```shell
+kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=prometheus --output=jsonpath="{.items[0].metadata.name}") 9090
+```
+
+Then browse to http://localhost:9090 to access the UI.
+
+* To see the targets that are being scraped, go to Status -> Targets
+* To see what Prometheus service discovery is picking up vs. dropping, go to Status -> Service Discovery
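+
+While the port-forward above is running, you can also check scrape health from the command line; a small sketch using the standard Prometheus HTTP API:
+
+```shell
+# The built-in "up" metric reports 1 for every target that was scraped
+# successfully and 0 for targets that are failing.
+curl 'http://localhost:9090/api/v1/query?query=up'
+```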
+
+## Generating metrics
+
+If you want to send metrics from your controller, follow the steps below. These steps are already applied to the autoscaler and controller. For those controllers, simply add your new metric definitions to the `view`, create new `tag.Key`s if necessary, and instrument your code as described in step 3.
+
+In the example below, we will set up the service to host the metrics and instrument a sample 'Gauge' type metric using the setup.
+
+1. First, go through the [OpenCensus Go Documentation](https://godoc.org/go.opencensus.io).
+2. Add the following to your application startup:
+
+```go
+package main
+
+import (
+	"net/http"
+	"time"
+
+	"github.com/golang/glog"
+	// Prometheus exporter for OpenCensus (the import path may differ
+	// depending on your OpenCensus version).
+	"go.opencensus.io/exporter/prometheus"
+	"go.opencensus.io/stats"
+	"go.opencensus.io/stats/view"
+	"go.opencensus.io/tag"
+)
+
+var (
+	desiredPodCountM *stats.Int64Measure
+	namespaceTagKey  tag.Key
+	revisionTagKey   tag.Key
+)
+
+func main() {
+	// Create the Prometheus exporter and register it with OpenCensus.
+	exporter, err := prometheus.NewExporter(prometheus.Options{Namespace: "{your metrics namespace (eg: autoscaler)}"})
+	if err != nil {
+		glog.Fatal(err)
+	}
+	view.RegisterExporter(exporter)
+	view.SetReportingPeriod(10 * time.Second)
+
+	// Create a sample gauge
+	desiredPodCountM = stats.Int64(
+		"desired_pod_count",
+		"Number of pods autoscaler wants to allocate",
+		stats.UnitNone)
+
+	// Tag the statistics with namespace and revision labels
+	namespaceTagKey, err = tag.NewKey("namespace")
+	if err != nil {
+		// Error handling
+	}
+	revisionTagKey, err = tag.NewKey("revision")
+	if err != nil {
+		// Error handling
+	}
+
+	// Create a view to see our measurement.
+	err = view.Register(
+		&view.View{
+			Description: "Number of pods autoscaler wants to allocate",
+			Measure:     desiredPodCountM,
+			Aggregation: view.LastValue(),
+			TagKeys:     []tag.Key{namespaceTagKey, revisionTagKey},
+		},
+	)
+	if err != nil {
+		// Error handling
+	}
+
+	// Start the endpoint for Prometheus scraping
+	mux := http.NewServeMux()
+	mux.Handle("/metrics", exporter)
+	http.ListenAndServe(":8080", mux)
+}
+```
+
+3. In the code you want to instrument, record the measurement with the appropriate tag values - for example:
+
+```go
+ctx, err := tag.New(
+	context.TODO(),
+	tag.Insert(namespaceTagKey, namespace),
+	tag.Insert(revisionTagKey, revision))
+if err != nil {
+	// Error handling
+}
+stats.Record(ctx, desiredPodCountM.M({Measurement Value}))
+```
+
+4. Add the following to the scrape config file located at config/monitoring/200-common/300-prometheus/100-scrape-config.yaml:
+
+```yaml
+- job_name: <your job name>
+  kubernetes_sd_configs:
+  - role: endpoints
+  relabel_configs:
+  # Scrape only the targets matching the following metadata
+  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_label_app, __meta_kubernetes_endpoint_port_name]
+    action: keep
+    regex: {SERVICE NAMESPACE};{APP LABEL};{PORT NAME}
+  # Rename metadata labels to be reader-friendly
+  - source_labels: [__meta_kubernetes_namespace]
+    action: replace
+    regex: (.*)
+    target_label: namespace
+    replacement: $1
+  - source_labels: [__meta_kubernetes_pod_name]
+    action: replace
+    regex: (.*)
+    target_label: pod
+    replacement: $1
+  - source_labels: [__meta_kubernetes_service_name]
+    action: replace
+    regex: (.*)
+    target_label: service
+    replacement: $1
+```
+
+5. Redeploy Prometheus and its configuration:
+
+```sh
+kubectl delete -f config/monitoring/200-common/300-prometheus
+kubectl apply -f config/monitoring/200-common/300-prometheus
+```
+
+6. Add a dashboard for your metrics - you can see examples under the config/grafana/dashboard-definition folder. An easy way to generate JSON definitions is to use the Grafana UI (make sure to log in as an admin user) and [export JSON](http://docs.grafana.org/reference/export_import) from it.
+
+7. Validate that the metrics flow through either the Grafana UI or the Prometheus UI (see the Metrics troubleshooting section above for how to access the Prometheus UI).
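+
+A quick way to sanity-check steps 2 and 3 before wiring up Prometheus, assuming your controller binary is running locally and kept the `:8080` address used above:
+
+```shell
+# The /metrics endpoint is served by the Prometheus exporter registered in
+# step 2; desired_pod_count appears (with your namespace prefix) once a
+# value has been recorded.
+curl http://localhost:8080/metrics | grep desired_pod_count
+```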
+
+## Distributed tracing with Zipkin
+
+See the [Telemetry sample](../sample/telemetrysample/README.md) for an example usage of [OpenZipkin](https://zipkin.io/pages/existing_instrumentations)'s Go client library.
+
+## Delete monitoring components
+
+Enter:
+
+```shell
+ko delete --ignore-not-found=true \
+  -f config/monitoring/200-common/100-istio.yaml \
+  -f config/monitoring/200-common/100-zipkin.yaml \
+  -f config/monitoring/100-common
+```