Logs and metrics
First, deploy the monitoring components. Two setups are available:
- everything: This configuration collects logs & metrics from user containers, the build controller, and Istio requests.
bazel run config/monitoring:everything.apply
- everything-dev: This configuration collects everything in the everything setup, plus Elafros controller logs.
bazel run config/monitoring:everything-dev.apply
Accessing logs
Run:
kubectl proxy
Then open the Kibana UI, which is served through the proxy (it might take a couple of minutes for the proxy to start serving it). When Kibana is opened for the first time, it will ask you to create an index. Accept the default options as is. As logs are ingested, new fields will be discovered; to have them indexed, go to Management -> Index Patterns -> Refresh button (on top right) -> Refresh fields.
Accessing metrics
Run:
kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=grafana --output=jsonpath="{.items..metadata.name}") 3000
Then open Grafana UI at http://localhost:3000
Accessing per-request traces
First open the Kibana UI as shown above. Browse to Management -> Index Patterns -> +Create Index Pattern, type "zipkin*" (without the quotes) into the "Index pattern" text field, and hit the "Create" button. This creates a new index pattern that stores per-request traces captured by Zipkin. This is a one-time step and is needed only for fresh installations.
Next, start the proxy if it is not already running:
kubectl proxy
Then open the Zipkin UI, which is served through the proxy. Click on "Find Traces" to see the latest traces. You can search for a trace ID or look at the traces of a specific application within this UI. Click on a trace to see a detailed view of a specific call.
To see a demo of distributed tracing, deploy the Telemetry sample, send some traffic to it, and explore the traces it generates from the Zipkin UI.
Default metrics
The following metrics are collected by default:
- Elafros controller metrics
- Istio metrics (mixer, envoy and pilot)
- Node and pod metrics
There are several other collectors that are pre-configured but not enabled. To see the full list, browse the config/monitoring/prometheus-exporter and config/monitoring/prometheus-servicemonitor folders; deploy any of them using kubectl apply -f.
Default logs
The deployment above enables collection of the following logs:
- stdout & stderr from all ela-container
- stdout & stderr from build-controller
To enable log collection from other containers and destinations, edit fluentd-es-configmap.yaml (search for "fluentd-containers.log" for the starting point). Then run the following:
kubectl replace -f config/monitoring/fluentd/fluentd-es-configmap.yaml
kubectl replace -f config/monitoring/fluentd/fluentd-es-ds.yaml
Note: A plugin mechanism for defining additional logs to collect is planned; this step is a workaround until then.
Metrics troubleshooting
You can use the Prometheus web UI to troubleshoot publishing and service discovery issues for metrics. To access the web UI, forward the Prometheus server to your machine:
kubectl port-forward -n monitoring $(kubectl get pods -n monitoring --selector=app=prometheus --output=jsonpath="{.items[0].metadata.name}") 9090
Then browse to http://localhost:9090 to access the UI:
- To see the targets that are being scraped, go to Status -> Targets
- To see what Prometheus service discovery is picking up vs. dropping, go to Status -> Service Discovery
Generating metrics
See the Telemetry sample for deploying a dedicated instance of Prometheus and emitting metrics to it.
If you want to generate metrics within Elafros services and send them to the shared instance of Prometheus, follow the steps below. They walk through creating a counter metric:
- Go through the Prometheus documentation and read the Data model and Metric types sections.
- Create a top-level variable in your Go file and initialize it in init() - example:
import "github.com/prometheus/client_golang/prometheus"
var myCounter = prometheus.NewCounterVec(prometheus.CounterOpts{
    Namespace: "elafros",
    Name:      "mycomponent_something_count",
    Help:      "Counter to keep track of something in my component",
}, []string{"status"})

func init() {
    prometheus.MustRegister(myCounter)
}
- In the code you want to instrument, increment the counter with the appropriate label values - example:
err := doSomething()
if err == nil {
    myCounter.With(prometheus.Labels{"status": "success"}).Inc()
} else {
    myCounter.With(prometheus.Labels{"status": "failure"}).Inc()
}
- Start an HTTP listener to serve as the metrics endpoint for Prometheus scraping. (This step and the ones after it are needed only once per service. ela-controller is already set up for metrics scraping, so you can skip the rest of these steps if you are targeting ela-controller.)
import "github.com/prometheus/client_golang/prometheus/promhttp"
// In your main() func
srv := &http.Server{Addr: ":9090"}
http.Handle("/metrics", promhttp.Handler())
go func() {
if err := srv.ListenAndServe(); err != nil {
glog.Infof("Httpserver: ListenAndServe() finished with error: %s", err)
}
}()
// Wait for the service to shutdown
<-stopCh
// Close the http server gracefully
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
srv.Shutdown(ctx)
- Add a Service for the metrics http endpoint:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: myappname
    prometheus: myappname
  name: myappname
  namespace: mynamespace
spec:
  ports:
  - name: metrics
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: myappname # put the appropriate value here to select your application
- Add a ServiceMonitor to tell Prometheus to discover pods and scrape the service defined above:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myappname
  namespace: monitoring
  labels:
    # Shared Prometheus instance only targets 'k8s', 'istio', 'node',
    # 'prometheus' or 'ela-system' - if you pick something else, you need
    # to deploy your own Prometheus instance or edit the shared instance
    # to target the new category.
    monitor-category: ela-system
spec:
  selector:
    matchLabels:
      app: myappname
      prometheus: myappname
  namespaceSelector:
    matchNames:
    - mynamespace
  endpoints:
  - port: metrics
    interval: 30s
- Add a dashboard for your metrics - you can see examples under the config/grafana/dashboard-definition folder. An easy way to generate JSON definitions is to use the Grafana UI (make sure to log in as an admin user) and export JSON from it.
- Add the YAML files to BUILD files.
- Deploy changes with bazel.
- Validate the metrics flow either with the Grafana UI or the Prometheus UI (see the Troubleshooting section above for enabling the Prometheus UI).
Generating logs
Use glog to write logs in your code. In your container spec, add the following args to redirect the logs to stderr:
args:
- "-logtostderr=true"
- "-stderrthreshold=INFO"
See the helloworld sample's configuration file for an example.
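On the Go side, the sketch below shows what emitting logs with glog can look like. It is illustrative only; doWork, the messages, and the verbosity level are hypothetical placeholders rather than part of any Elafros component:

package main

import (
    "flag"

    "github.com/golang/glog"
)

// doWork is a hypothetical helper used only for illustration.
func doWork() error { return nil }

func main() {
    // glog reads -logtostderr and -stderrthreshold from the command line,
    // so flags must be parsed before the first log statement.
    flag.Parse()
    defer glog.Flush()

    glog.Info("service starting")
    if err := doWork(); err != nil {
        glog.Errorf("doWork failed: %v", err)
    }
    // Emitted only when the -v flag is set to 2 or higher.
    glog.V(2).Info("verbose debugging output")
}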
Distributed tracing with Zipkin
See the Telemetry sample for an example usage of OpenZipkin's Go client library.
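As a rough sketch of what such usage can look like with the openzipkin/zipkin-go client: the service name, host:port, collector URL, and span name below are illustrative assumptions, not values taken from the Telemetry sample.

package main

import (
    "log"
    "time"

    zipkin "github.com/openzipkin/zipkin-go"
    zipkinhttp "github.com/openzipkin/zipkin-go/reporter/http"
)

func main() {
    // Reporter that posts finished spans to the Zipkin collector endpoint.
    // The URL is an assumption - adjust it to wherever Zipkin is exposed in your cluster.
    reporter := zipkinhttp.NewReporter("http://zipkin.monitoring:9411/api/v2/spans")
    defer reporter.Close()

    // Describe this service so it shows up by name in the Zipkin UI.
    endpoint, err := zipkin.NewEndpoint("myservice", "localhost:8080")
    if err != nil {
        log.Fatalf("unable to create local endpoint: %v", err)
    }

    tracer, err := zipkin.NewTracer(reporter, zipkin.WithLocalEndpoint(endpoint))
    if err != nil {
        log.Fatalf("unable to create tracer: %v", err)
    }

    // Wrap a unit of work in a span; spans created for downstream calls
    // are stitched together into a single trace.
    span := tracer.StartSpan("do-something")
    time.Sleep(10 * time.Millisecond) // placeholder for real work
    span.Finish()
}

Once spans are reported, they can be found through "Find Traces" in the Zipkin UI as described above.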