Commit Graph

161 Commits

Author SHA1 Message Date
Risha Mars 46c99febf2
Don't panic on stats that aren't included in StatAllResourceTypes (#1154)
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list 
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files

Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
2018-06-19 17:00:16 -07:00
Andrew Seigner 0b9e7ff7df
Enable get for nodes/proxy for Prometheus RBAC (#1142)
The `kubernetes-nodes-cadvisor` Prometheus queries node-level data via
the Kubernetes API server. In some configurations of Kubernetes, namely
minikube and at least one baremetal kubespray cluster, this API call
requires the `get` verb on the `nodes/proxy` resource.

Enable `get` for `nodes/proxy` for the `conduit-prometheus` service
account.

Fixes #912

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-06-18 17:49:23 +01:00
Risha Mars e2c2f19d2c
Propagate errors in conduit containers to the api (#1117)
- It would be nice to display container errors in the UI. This PR gets the pod's container 
statuses and returns them in the public api

- Also add a terminationMessagePolicy to conduit's inject so that we can capture the 
proxy's error messages if it terminates
2018-06-14 16:22:31 -07:00
Thomas Rampelberg 516807bde6
Add readiness/liveness checks for third party components (#1121)
* Add readiness/liveness checks for third party components

Any possible issues with the third party control plane components can wedge the services.

Take the best practices for prometheus/grafana and add them to our template. See #1116

* Update test fixtures for new output
2018-06-14 13:01:13 -07:00
Kevin Lingerfelt 9f1df963e9
Move controller/util and web/util packages to pkg (#1109)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-13 11:25:56 -07:00
Kevin Lingerfelt b6d429e80d
dst svc: use shared informer instead of custom endpoints informer (#1079)
* Update destination service ot use shared informer instead of custom endpoints informer
* Add additional tests for dst svc endpoints watcher
* Remove service ports when all listeners unsubscribed
* Update go deps

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-13 11:11:57 -07:00
Kevin Lingerfelt 6e66f6d662
Rename Lister to API and expose informers as well as listers (#1072)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-12 10:27:55 -07:00
Risha Mars 7d4c4aa290
CLI: print resources in the same order every time stat all is run (#1088)
Previously, in conduit stat all we would just print the map of stat results, which 
resulted in the order in which stats were displayed varying between prints.

Fix:
Define an array, k8s.StatAllResourceTypes and use the order in this array to print 
the map; ensuring a consistent print order every time the command is run.
2018-06-08 15:02:17 -07:00
Kevin Lingerfelt eebc612d52
Add install flag for sending tls identity info to proxies (#1055)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-04 16:55:06 -07:00
Kevin Lingerfelt ec2433e9bd
Update controller to use 'tls' metric label (#1044)
* Update controller to use 'tls' metric label
* Fix meshed column formatter

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-01 16:44:33 -07:00
Sacha Froment 84781c9c74 conduit inject: Add flag to set proxy bind timeout (#865)
* conduit inject: Add flag to set proxy bind timeout (#863)
* fix test
* fix flag to get it working with #909
* Add time parsing
* Use the variable to set the default value

Signed-off-by: Sacha Froment <sfroment42@gmail.com>
2018-05-29 11:14:29 -07:00
Risha Mars d333a7d861
Add a secured label to the CLI tap responses (#996)
Adds secured=yes/no to the conduit tap responses. This assumes a `meshed` label is
returned by the proxy.
2018-05-25 11:21:38 -07:00
Risha Mars ffabdefc6c
Add queries to prometheus to determine number of fully meshed requests (#983)
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label. 
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh

This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).

The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
2018-05-24 11:05:09 -07:00
Haiwei Liu 8c98cde82b change init image to root options, for install and inject to use (#1001)
Signed-off-by: Haiwei Liu <carllhw@gmail.com>
2018-05-24 10:21:16 -07:00
Andrew Seigner 8a1a3b31d4
Fix non-default proxy-api port (#979)
Running `conduit install --api-port xxx` where xxx != 8086 would yield a
broken install.

Fix the install command to correctly propagate the `api-port` flag,
setting it as the serve address in the proxy-api container.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-22 10:34:25 -07:00
Kevin Lingerfelt 2baeaacbc8
Remove package-scoped vars in cmd package (#975)
* Remove package-scoped vars in cmd package
* Run gofmt on all cmd package files
* Re-add missing Args setting on check command

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-21 18:15:39 -07:00
Kevin Lingerfelt 36ec391dbe
Go: update k8s dependencies to 1.10.2 (#962)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-17 15:46:58 -07:00
Risha Mars b8dc83f9d2
Modify the Stat API to handle requests for resource type "all" (#928)
Allow the Stat endpoint in the public-api to accept requests for resourceType "all".

Currently, this queries Pods, Deployments, RCs and Services, but can be modified 
to query other resources as well.

Both the CLI and web endpoints now work if you set resourceType to all.

e.g. `conduit stat all`
2018-05-11 14:35:37 -07:00
Kevin Lingerfelt 4e8e1eb84d
CLI: Fix validation for service stats (#935)
* CLI: Fix validation for service stats
* Address review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-11 10:28:49 -07:00
Oliver Gould a786089fd6
docker: Cache versionless builds before building versioned go binaries (#921)
The way that git-related version information is linked into go binaries
busts Docker's cache such that every commit causes all binaries to
rebuilt.

In order to ameliorate this, we can build each binary once without
version information first so that its artifacts are cached. When Go
sources are not changed and only the version information changes, builds
are 4.3x faster than before (from 5+ minutes to <90s).

On `master`

Branch off of master and build (mostly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  9.10s user 6.30s system 5% cpu 4:26.47 total
```

Rebuild without changing anything (highly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  9.23s user 6.04s system 47% cpu 32.017 total
```

Update only the git sha and rebuild:

```
:; git ci -am 'bump it' --allow-empty
[ver/eg 2749eb3] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  8.55s user 6.08s system 4% cpu 5:22.25 total
```

On this branch:

Rebuild without changing anything (highly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  8.94s user 5.97s system 46% cpu 32.257 total
```

Update only the git sha and rebuild:

```
:; git ci -am 'bump it' --allow-empty
[ver/go-docker-cache-versionless 77a80b5] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build-cli-bin  2.02s user 1.34s system 9% cpu 34.144 total
```
2018-05-10 10:22:09 -07:00
Thomas Rampelberg 25d8b22b5c
Adding statefulsets to inject. Fixes #907 (#910) 2018-05-10 09:00:36 -05:00
Thomas Rampelberg e96c2e1135
Made inject aware of the List type. (#886) 2018-05-10 08:08:55 -05:00
Andrew Seigner 1275b1ae89
Introduce Grafana, K8s, and Prom dashboards (#904)
Grafana provides default dashboards for Prometheus and Grafana health.
The community also provides Kubernetes-specific dashboards. Conduit was
not taking advantage of these.

Introduce new Grafana dashboards focused on Grafana, Kubernetes, and
Prometheus health. Tag all Conduit dashboards for easier UI navigation.
Also fix layout in Conduit Health dashboard.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-08 23:11:43 +02:00
Risha Mars f94856e489
Modify the Stat endpoint to also return the number of failed conduit pods (#895)
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count

This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
2018-05-08 10:35:21 -07:00
Andrew Seigner 97bf4fcdf2
Release Notes for 0.4.1 release. (#839)
Also update Getting Started and Debugging docs to reflect changes in
`Tap` and `Stat`.

Fixes #838

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-26 13:32:41 -07:00
Brian Smith c5d2dab8bd
Remove special support for ExternalName services (#764)
After this was implemented we found that ExternalName services are
represented in DNS as CNAMEs, which means that the proxy's DNS
fallback logic can be used instead of doing DNS in the control
plane. Besides simplifying the controller, this will also increase
fidelity with the proxied pods' DNS configuration (improve
transparency).

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-25 11:53:33 -10:00
Andrew Seigner dce31b888f
Deprecate Tap, rename TapByResource to Tap (#844)
The `conduit tap` command is now deprecated.

Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.

Fixes #804

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-25 12:24:46 -07:00
Andrew Seigner 640570cd6b
Make TapByResource output destination pod (#837)
The TapByResource command now has access to destination labels from the
proxy, but was not outputting them on the cli.

Modify the TapByResource output to print the destination pod label,
rather than the ip, when available.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-24 18:58:40 -07:00
Andrew Seigner a0a9a42e23
Implement Public API and Tap on top of Lister (#835)
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.

This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-24 18:10:48 -07:00
Andrew Seigner baf4ea1a5a
Implement TapByResource in Tap Service (#827)
The TapByResource endpoint was previously a stub.

Implement end-to-end tapByResource functionality, with support for
specifying any kubernetes resource(s) as target and destination.

Fixes #803, #49

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 16:13:26 -07:00
Andrew Seigner 39eccb09e2
cli: standardize kubernetes resource parsing (#830)
The Tap command leveraged new cli parsing code, enabling Kubernetes
resources specified as `(TYPE [NAME] | TYPE/NAME)`. The Stat command
did not use this.

Modify the Stat command to use the same cli flag parsing code as Tap.
Remove the to/from-resource flags from Stat.

Fixes #792

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 15:17:42 -07:00
Andrew Seigner 1f32f130de
Upgrade Prometheus from 2.1.0 to 2.2.1 (#816)
There have been a number of performance improvements and bug fixes since
v2.1.0.

Bump our Prometheus container to v2.2.1.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 18:00:53 -07:00
Andrew Seigner 79bdc638b3
Service support in stat command (#809)
The `stat` command did not support `service` as a resource type.

This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands

Fixes #805

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 16:51:20 -07:00
Andrew Seigner 293e00bc3e
Introduce tapByResource cli command (#802)
The existing `tap` command is being deprecated.

Introduce a `tapByResource` cli command. It supports tapping a Kubernetes
resource or collection of resources, optionally filtered by outbound resources.
This command will eventually replace `tap`.

Part of #778

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 14:44:23 -07:00
Kevin Lingerfelt 653dc6bfaa
Add replication controller stats in CLI (#794)
* Add replication controller stats in CLI
* Fix pod status in stat summary tests

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 18:12:14 -07:00
Oliver Gould 06dd8d90ee
Introduce the TapByResource API (#778)
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely 
reflecting the proxy's capabilities.

The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.

Now both `Tap` and `TapByResource`'s responses may include destination
labels.

This change avoids breaking backwards compatibility by:

* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
2018-04-18 15:37:07 -07:00
Andrew Seigner 1e4ac8fda8
Destination service provides pod-template-hash (#784)
The Destination service does not provide ReplicaSet information to the
proxy.

The `pod-template-hash` label approximates selecting over all pods in a
ReplicaSet or ReplicationController. Modify the Destination service to
provide this label to the proxy.

Relates to #508 and #741

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-18 14:41:27 -07:00
Kevin Lingerfelt 71a51afb40
Expose pod stats in CLI, web UI, and Grafana (#788)
* Expose pod stats in CLI, web UI, and Grafana
* Fix js api helpers test
* Add outbound traffic stats to pod dashboard

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 11:26:47 -07:00
Andrew Seigner 727521f914
Permit arbitrary time windows in public-api (#774)
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.

Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.

Fixes #686

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-16 17:37:17 -07:00
Kevin Lingerfelt 11a4359e9a
Misc cleanup following the telemetry rewrite (#771)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-16 15:51:07 -07:00
Oliver Gould 800cefdb77
Skip the proxy on the metrics port (#770)
When prometheus queries the proxy for data, these requests are reported
as inbound traffic to the pod. This leads to misleading stats when a pod
otherwise receives little/no traffic.

In order to prevent these requests being proxied, the metrics port is
now added to the default inbound skip-ports list (as is already case for
the tap server).

Fixes #769
2018-04-16 11:54:58 -07:00
Andrew Seigner 77fb6d3709
Add namespace as a resource type in public-api (#760)
* Add namespace as a resource type in public-api

The cli and public-api only supported deployments as a resource type.

This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes

Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* Refactor filter and groupby label formulation

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Rename stat_summary.go to stat.go in cli

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Update rbac privileges for namespace stats

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 16:53:01 -07:00
Oliver Gould cc44db054f
Remove NODE_NAME and POD_NAME env usage (#758)
* proxy: Remove pod_name and node_name

* cli: Do not inject POD_NAME and NODE_NAME env vars
2018-04-13 13:09:51 -07:00
Andrew Seigner 21886760c6
Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762)
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.

Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.

Fixes #761

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-13 12:08:16 -07:00
Kevin Lingerfelt fb15fe7c1a
Remove the telemetry service (#757)
* Remove the telemetry service

The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.

* Fix time window tests

* Remove deprecated controller scrape config

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 11:21:29 -07:00
Kevin Lingerfelt 47caf1ca07
Add --all-namespaces flag to CLI statsummary command (#745)
* Add --all-namespaces flag to CLI statsummary command

* Fix statsummary output formatting

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-11 16:40:25 -07:00
Andrew Seigner 259fdcd134
Add latency stats in new stat summary endpoint (#737)
The new StatSummary endpoint was only providing request volume and
successs rate information.

Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.

Fixes #681

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 11:58:32 -07:00
Kevin Lingerfelt 91c359e612
Switch public API to use cached k8s resources (#724)
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-10 11:39:31 -07:00
Andrew Seigner 716b392231
Move StatSummary logic into grpc server (#717)
The StatSummary logic was implemented as a method on http_server.

Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 16:46:15 -07:00
Kevin Lingerfelt baa4d10c2f
CLI: change conduit namespace shorthand flag to -c (#714)
* CLI: change conduit namespace shorthand flag to -c

All of the conduit CLI subcommands accept a --conduit-namespace flag,
indicating the namespace where conduit is running. Some of the
subcommands also provide a --namespace flag, indicating the kubernetes
namespace where a user's application code is running. To prevent
confusion, I'm changing the shorthand flag for the conduit namespace to
-c, and using the -n shorthand when referring to user namespaces.

As part of this change I've also standardized the capitalization of all
of our command line flags, removed the -r shorthand for the install
--registry flag, and made the global --kubeconfig and --api-addr flags
apply to all subcommands.

* Switch flag descriptions from lowercase to Capital

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-06 14:47:31 -07:00