diff --git a/blog/articles/distributed-tracing.md b/blog/articles/distributed-tracing.md
new file mode 100644
index 000000000..4f4f205c7
--- /dev/null
+++ b/blog/articles/distributed-tracing.md
@@ -0,0 +1,488 @@
+---
+title: Distributed tracing with Knative, OpenTelemetry and Jaeger
+linkTitle: Distributed tracing with Knative, OpenTelemetry and Jaeger
+author: "[Ben Moss](https://twitter.com/mossity), Software Engineer @ [VMware](http://vmware.com)."
+date: 2021-08-20
+description: Integrating OpenCensus, OpenTelemetry, and Jaeger with Knative.
+type: "blog"
+---
+
+When trying to understand and diagnose our systems, one of the most basic tools
+we learn to lean on is the stack trace. Stack traces give us a structured view
+of the flow of logic that our program is executing in order to help us wrap our
+heads around how we got into a certain state. Distributed tracing is our
+industry's attempt to take this idea and apply it at the next higher level of
+abstraction and give us a view of the way that messages flow between programs
+themselves.
+
+Knative Eventing is a set of building blocks for wiring up the kind of
+distributed architecture that is favored by many these days. It gives us a
+language for describing and assembling the connections between programs, through
+brokers, triggers, channels and flows, but with this power comes the risk of
+creating a pile of spaghetti where determining how events were triggered can
+become difficult. In this post we’re going to walk through setting up
+distributed tracing with Eventing and see how it can help us better understand
+our programs and a bit about how Eventing works under the hood as well.
+
+
+## The lay of the tracing landscape
+
+One of the first problems that comes with trying to learn about how to do
+tracing is just wrapping your head around the ecosystem: Zipkin, Jaeger,
+OpenTelemetry, OpenCensus, OpenTracing, and countless more, which one should you
+be using?
+The good news is that these last three “Open” libraries are attempts
+to create standards for metrics and tracing so that we don’t need to decide
+right away what storage and visualization tools we’ll use, and that switching
+between them should be (mostly) painless. OpenCensus and OpenTracing both
+started as a way of unifying the fractured landscape around tracing and metrics,
+resulting in a tragic/hilarious set of new divergent and competing standards.
+OpenTelemetry is the latest effort, itself a unification of OpenCensus and
+OpenTracing.
+
+![xkcd comic "How Standards Proliferate"](https://imgs.xkcd.com/comics/standards.png)
+
+Knative’s tracing support today [only works with OpenCensus](https://github.com/knative/pkg/blob/bda81c029160eb91786c7e23a35acdd5ee2196b5/tracing/setup.go), but the OpenTelemetry community has given us tools for bridging just this sort of gap in our systems. In this post we’re going to focus on using Jaeger through a mix of OpenCensus and OpenTelemetry, but the broader lessons should apply no matter what tools you’re using.
+
+
+## Getting started
+
+We’re going to assume that you have a cluster with Knative Serving and Eventing
+installed. If you don’t already have a cluster I recommend giving [the Knative
+Quickstart](https://knative.dev/docs/getting-started/#install-the-knative-quickstart-environment)
+a try, but in theory any setup should work.
+
+Once we have Knative installed, we’re going to add the [OpenTelemetry
+operator](https://github.com/open-telemetry/opentelemetry-operator#getting-started)
+to our cluster, which depends on
+[cert-manager](https://cert-manager.io/docs/installation/). Something to watch
+out for while installing these two is that you’ll need to wait for
+cert-manager’s webhook pod to start before you can install the operator, or else
+you’ll see a bunch of “connection refused” errors creating certificates.
+Running
+`kubectl -n cert-manager wait --for=condition=Ready pods --all` will block until
+cert-manager is ready to roll. `kubectl wait` defaults to a 30 second timeout,
+so it may time out if images are slow to download; rerun it or raise `--timeout`.
+
+
+```
+kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml &&
+kubectl -n cert-manager wait --for=condition=Ready pods --all &&
+kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
+```
+
+
+Next we’ll set up the [Jaeger
+operator](https://github.com/jaegertracing/jaeger-operator#getting-started)
+(yes, another operator, I swear this is the last one).
+
+
+```
+kubectl create namespace observability &&
+kubectl create -n observability \
+    -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml \
+    -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml \
+    -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml \
+    -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml \
+    -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml
+```
+
+
+Once it's up we can create a Jaeger instance by running:
+
+
+```
+kubectl apply -n observability -f - <