# Troubleshooting

## Observability

The Collector offers multiple ways to measure the health of the Collector
as well as investigate issues.

### Logs

Logs can be helpful in identifying issues. Always start by checking the log
output and looking for potential issues. The verbosity level, which defaults
to `INFO`, can also be adjusted by passing the `--log-level` flag to the
`otelcol` process. See `--help` for more details.

```bash
$ otelcol --log-level DEBUG
```

### Metrics

Prometheus metrics are exposed locally on port `8888` and path `/metrics`. For
containerized environments it may be desirable to expose this port on a public
interface instead of just locally. The metrics address can be configured by
passing the `--metrics-addr` flag to the `otelcol` process. See `--help` for
more details.

```bash
$ otelcol --metrics-addr 0.0.0.0:8888
```

A Grafana dashboard for these metrics can be found
[here](https://grafana.com/grafana/dashboards/11575).

Also note that a Collector can be configured to scrape its own metrics and send
them through configured pipelines. For example:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otelcol'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
          metric_relabel_configs:
            - source_labels: [ __name__ ]
              regex: '.*grpc_io.*'
              action: drop

exporters:
  logging:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: []
      exporters: [logging]
```

### zPages

The
[zpages](https://github.com/open-telemetry/opentelemetry-collector/tree/master/extension/zpagesextension/README.md)
extension, which if enabled is exposed locally on port `55679`, can be used to
check receiver and exporter trace operations via `/debug/tracez`. `zpages` may
contain error logs that the Collector does not emit.

For containerized environments it may be desirable to expose this port on a
public interface instead of just locally. This can be configured via the
extensions configuration section. For example:

```yaml
extensions:
  zpages:
    endpoint: 0.0.0.0:55679
```

### Local exporters

[Local
exporters](https://github.com/open-telemetry/opentelemetry-collector/tree/master/exporter#general-information)
can be configured to inspect the data being processed by the Collector.

For live troubleshooting purposes consider leveraging the `logging` exporter,
which can be used to confirm that data is being received, processed and
exported by the Collector.

```yaml
receivers:
  zipkin:

exporters:
  logging:

service:
  pipelines:
    traces:
      receivers: [zipkin]
      processors: []
      exporters: [logging]
```

Get a Zipkin payload to test. For example, create a file called `trace.json`
that contains:

```json
[
  {
    "traceId": "5982fe77008310cc80f1da5e10147519",
    "parentId": "90394f6bcffb5d13",
    "id": "67fae42571535f60",
    "kind": "SERVER",
    "name": "/m/n/2.6.1",
    "timestamp": 1516781775726000,
    "duration": 26000,
    "localEndpoint": {
      "serviceName": "api"
    },
    "remoteEndpoint": {
      "serviceName": "apip"
    },
    "tags": {
      "data.http_response_code": "201"
    }
  }
]
```
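If the Collector is not already running with the configuration above, start it
first. As a minimal sketch, assuming the configuration was saved to a file
named `config.yaml` (a placeholder name; use whatever path holds your
configuration):

```bash
# Start the Collector with the local-exporter configuration shown above.
# `config.yaml` is a placeholder; point --config at your actual configuration file.
$ otelcol --config config.yaml
```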
With the Collector running, send this payload to the Collector. For example:

```bash
$ curl -X POST localhost:9411/api/v2/spans -H 'Content-Type: application/json' -d @trace.json
```

You should see a log entry like the following from the Collector:

```json
2020-11-11T04:12:33.089Z  INFO  loggingexporter/logging_exporter.go:296  TraceExporter  {"#spans": 1}
```

You can also configure the `logging` exporter so the entire payload is printed:

```yaml
exporters:
  logging:
    loglevel: debug
```

With the modified configuration, if you re-run the test above the log output
should look like:

```json
2020-11-11T04:08:17.344Z  DEBUG  loggingexporter/logging_exporter.go:353  ResourceSpans #0
Resource labels:
     -> service.name: STRING(api)
InstrumentationLibrarySpans #0
Span #0
    Trace ID       : 5982fe77008310cc80f1da5e10147519
    Parent ID      : 90394f6bcffb5d13
    ID             : 67fae42571535f60
    Name           : /m/n/2.6.1
    Kind           : SPAN_KIND_SERVER
    Start time     : 2018-01-24 08:16:15.726 +0000 UTC
    End time       : 2018-01-24 08:16:15.752 +0000 UTC
Attributes:
     -> data.http_response_code: STRING(201)
```

### Health Check

The
[health_check](https://github.com/open-telemetry/opentelemetry-collector/tree/master/extension/healthcheckextension/README.md)
extension, which by default is available on all interfaces on port `13133`, can
be used to ensure the Collector is functioning properly.

```yaml
extensions:
  health_check:

service:
  extensions: [health_check]
```

It returns a response like the following:

```json
{"status":"Server available","upSince":"2020-11-11T04:12:31.6847174Z","uptime":"49.0132518s"}
```

### pprof

The
[pprof](https://github.com/open-telemetry/opentelemetry-collector/tree/master/extension/pprofextension/README.md)
extension, which by default is available locally on port `1777`, allows you to
profile the Collector as it runs. This is an advanced use-case that should not
be needed in most circumstances.

## Common Issues

### Collector exit/restart

The Collector may exit/restart because of:

- Memory pressure due to a missing or misconfigured
  [memory_limiter](https://github.com/open-telemetry/opentelemetry-collector/blob/master/processor/memorylimiter/README.md)
  processor.
- Being [improperly sized](https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/performance.md)
  for the load.
- Improper configuration (for example, a queue size configured higher than
  available memory).
- Infrastructure resource limits (for example Kubernetes).

### Data being dropped

Data may be dropped for a variety of reasons, but most commonly because of
either:

- An [improperly sized Collector](https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/performance.md)
  resulting in the Collector being unable to process and export the data as
  fast as it is received.
- The exporter destination being unavailable or accepting the data too slowly.

To mitigate drops, it is highly recommended to configure the
[batch](https://github.com/open-telemetry/opentelemetry-collector/blob/master/processor/batchprocessor/README.md)
processor. In addition, it may be necessary to configure the
[queued retry options](https://github.com/open-telemetry/opentelemetry-collector/tree/master/exporter/exporterhelper#configuration)
on enabled exporters.

### Receiving data not working

If you are unable to receive data, then this is likely because either:

- There is a network configuration issue
- The receiver configuration is incorrect
- The receiver is defined in the `receivers` section, but not enabled in any
  `pipelines` (see the sketch below)
- The client configuration is incorrect

Check the Collector logs as well as `zpages` for potential issues.
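To illustrate the third point: a receiver that is only defined under
`receivers` does nothing until it is also listed in a pipeline. A minimal
sketch follows; the OTLP receiver and `logging` exporter here are just example
choices, not a required combination:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  logging:

service:
  pipelines:
    traces:
      # Defining `otlp` above is not enough; it must also be listed here,
      # otherwise the receiver is never started.
      receivers: [otlp]
      processors: []
      exporters: [logging]
```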
### Processing data not working

Most processing issues are a result of either a misunderstanding of how the
processor works or a misconfiguration of the processor.

Examples of misunderstanding include:

- The attributes processors only work for "tags" on spans. Span name is
  handled by the span processor.
- Processors for trace data (except tail sampling) work on individual spans.

### Exporting data not working

If you are unable to export to a destination, then this is likely because
either:

- There is a network configuration issue
- The exporter configuration is incorrect
- The destination is unavailable

Check the Collector logs as well as `zpages` for potential issues.

More often than not, exporting data does not work because of a network
configuration issue. This could be due to a firewall, DNS, or proxy issue. Note
that the Collector does have
[proxy support](https://github.com/open-telemetry/opentelemetry-collector/tree/master/exporter#proxy-support).
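If traffic to the destination must go through a proxy, the proxy support linked
above is driven by environment variables read when the Collector starts. As a
sketch, assuming the standard `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY`
variables described in the linked documentation; the proxy address and
`config.yaml` path are placeholders:

```bash
# Proxy settings are read from the environment at Collector start time.
# The proxy address below is a placeholder; replace it with your proxy.
$ export HTTP_PROXY=http://proxy.example.com:3128
$ export HTTPS_PROXY=http://proxy.example.com:3128
$ export NO_PROXY=localhost,127.0.0.1
$ otelcol --config config.yaml
```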