Commit Graph

418 Commits

Author SHA1 Message Date
Kevin Lingerfelt 1f1968ad4d
Add --registry flag support for inject command (#1188)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-22 12:52:42 -07:00
Kevin Lingerfelt 5cf8ab00df
Switch to multi-value --tls flag, add to inject (#1182)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-21 15:52:14 -07:00
Kevin Lingerfelt af85d1714f
Add probes and log termination policy for distributor (#1178)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-21 14:02:41 -07:00
Kevin Lingerfelt 12f869e7fc
Add CA certificate bundle distributor to conduit install (#675)
* Add CA certificate bundle distributor to conduit install
* Update ca-distributor to use shared informers
* Only install CA distributor when --enable-tls flag is set
* Only copy CA bundle into namespaces where inject pods have the same controller
* Update API config to only watch pods and configmaps
* Address review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-21 13:12:21 -07:00
Kevin Lingerfelt e80356de34
Upgrade prometheus to v2.3.1 (#1174)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-21 11:02:21 -07:00
Kevin Lingerfelt 682b0274b5
Add controller admin servers and readiness probes (#1168)
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-20 17:32:44 -07:00
Risha Mars 0ff1bb4ad8
Don't allow stat requests for named resources in --all-namespaces (#1163)
Don't allow the CLI or Web UI to request named resources if --all-namespaces is used.

This follows kubectl, which also does not allow requesting named resources
over all namespaces.

This PR also updates the Web API's behaviour to be in line with the CLI's. 
Both will now default to the default namespace if no namespace is specified.
2018-06-20 12:59:31 -07:00
Risha Mars 46c99febf2
Don't panic on stats that aren't included in StatAllResourceTypes (#1154)
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list 
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files

Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
2018-06-19 17:00:16 -07:00
Andrew Seigner 0b9e7ff7df
Enable get for nodes/proxy for Prometheus RBAC (#1142)
The `kubernetes-nodes-cadvisor` Prometheus queries node-level data via
the Kubernetes API server. In some configurations of Kubernetes, namely
minikube and at least one baremetal kubespray cluster, this API call
requires the `get` verb on the `nodes/proxy` resource.

Enable `get` for `nodes/proxy` for the `conduit-prometheus` service
account.

Fixes #912

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-06-18 17:49:23 +01:00
Risha Mars e2c2f19d2c
Propagate errors in conduit containers to the api (#1117)
- It would be nice to display container errors in the UI. This PR gets the pod's container 
statuses and returns them in the public api

- Also add a terminationMessagePolicy to conduit's inject so that we can capture the 
proxy's error messages if it terminates
2018-06-14 16:22:31 -07:00
Thomas Rampelberg 516807bde6
Add readiness/liveness checks for third party components (#1121)
* Add readiness/liveness checks for third party components

Any possible issues with the third party control plane components can wedge the services.

Take the best practices for prometheus/grafana and add them to our template. See #1116

* Update test fixtures for new output
2018-06-14 13:01:13 -07:00
Kevin Lingerfelt 9f1df963e9
Move controller/util and web/util packages to pkg (#1109)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-13 11:25:56 -07:00
Kevin Lingerfelt b6d429e80d
dst svc: use shared informer instead of custom endpoints informer (#1079)
* Update destination service ot use shared informer instead of custom endpoints informer
* Add additional tests for dst svc endpoints watcher
* Remove service ports when all listeners unsubscribed
* Update go deps

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-13 11:11:57 -07:00
Kevin Lingerfelt 6e66f6d662
Rename Lister to API and expose informers as well as listers (#1072)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-12 10:27:55 -07:00
Risha Mars 7d4c4aa290
CLI: print resources in the same order every time stat all is run (#1088)
Previously, in conduit stat all we would just print the map of stat results, which 
resulted in the order in which stats were displayed varying between prints.

Fix:
Define an array, k8s.StatAllResourceTypes and use the order in this array to print 
the map; ensuring a consistent print order every time the command is run.
2018-06-08 15:02:17 -07:00
Kevin Lingerfelt eebc612d52
Add install flag for sending tls identity info to proxies (#1055)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-04 16:55:06 -07:00
Kevin Lingerfelt ec2433e9bd
Update controller to use 'tls' metric label (#1044)
* Update controller to use 'tls' metric label
* Fix meshed column formatter

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-01 16:44:33 -07:00
Sacha Froment 84781c9c74 conduit inject: Add flag to set proxy bind timeout (#865)
* conduit inject: Add flag to set proxy bind timeout (#863)
* fix test
* fix flag to get it working with #909
* Add time parsing
* Use the variable to set the default value

Signed-off-by: Sacha Froment <sfroment42@gmail.com>
2018-05-29 11:14:29 -07:00
Risha Mars d333a7d861
Add a secured label to the CLI tap responses (#996)
Adds secured=yes/no to the conduit tap responses. This assumes a `meshed` label is
returned by the proxy.
2018-05-25 11:21:38 -07:00
Risha Mars ffabdefc6c
Add queries to prometheus to determine number of fully meshed requests (#983)
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label. 
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh

This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).

The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
2018-05-24 11:05:09 -07:00
Haiwei Liu 8c98cde82b change init image to root options, for install and inject to use (#1001)
Signed-off-by: Haiwei Liu <carllhw@gmail.com>
2018-05-24 10:21:16 -07:00
Andrew Seigner 8a1a3b31d4
Fix non-default proxy-api port (#979)
Running `conduit install --api-port xxx` where xxx != 8086 would yield a
broken install.

Fix the install command to correctly propagate the `api-port` flag,
setting it as the serve address in the proxy-api container.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-22 10:34:25 -07:00
Kevin Lingerfelt 2baeaacbc8
Remove package-scoped vars in cmd package (#975)
* Remove package-scoped vars in cmd package
* Run gofmt on all cmd package files
* Re-add missing Args setting on check command

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-21 18:15:39 -07:00
Kevin Lingerfelt 36ec391dbe
Go: update k8s dependencies to 1.10.2 (#962)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-17 15:46:58 -07:00
Risha Mars b8dc83f9d2
Modify the Stat API to handle requests for resource type "all" (#928)
Allow the Stat endpoint in the public-api to accept requests for resourceType "all".

Currently, this queries Pods, Deployments, RCs and Services, but can be modified 
to query other resources as well.

Both the CLI and web endpoints now work if you set resourceType to all.

e.g. `conduit stat all`
2018-05-11 14:35:37 -07:00
Kevin Lingerfelt 4e8e1eb84d
CLI: Fix validation for service stats (#935)
* CLI: Fix validation for service stats
* Address review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-11 10:28:49 -07:00
Oliver Gould a786089fd6
docker: Cache versionless builds before building versioned go binaries (#921)
The way that git-related version information is linked into go binaries
busts Docker's cache such that every commit causes all binaries to
rebuilt.

In order to ameliorate this, we can build each binary once without
version information first so that its artifacts are cached. When Go
sources are not changed and only the version information changes, builds
are 4.3x faster than before (from 5+ minutes to <90s).

On `master`

Branch off of master and build (mostly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  9.10s user 6.30s system 5% cpu 4:26.47 total
```

Rebuild without changing anything (highly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  9.23s user 6.04s system 47% cpu 32.017 total
```

Update only the git sha and rebuild:

```
:; git ci -am 'bump it' --allow-empty
[ver/eg 2749eb3] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  8.55s user 6.08s system 4% cpu 5:22.25 total
```

On this branch:

Rebuild without changing anything (highly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  8.94s user 5.97s system 46% cpu 32.257 total
```

Update only the git sha and rebuild:

```
:; git ci -am 'bump it' --allow-empty
[ver/go-docker-cache-versionless 77a80b5] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build-cli-bin  2.02s user 1.34s system 9% cpu 34.144 total
```
2018-05-10 10:22:09 -07:00
Thomas Rampelberg 25d8b22b5c
Adding statefulsets to inject. Fixes #907 (#910) 2018-05-10 09:00:36 -05:00
Thomas Rampelberg e96c2e1135
Made inject aware of the List type. (#886) 2018-05-10 08:08:55 -05:00
Andrew Seigner 1275b1ae89
Introduce Grafana, K8s, and Prom dashboards (#904)
Grafana provides default dashboards for Prometheus and Grafana health.
The community also provides Kubernetes-specific dashboards. Conduit was
not taking advantage of these.

Introduce new Grafana dashboards focused on Grafana, Kubernetes, and
Prometheus health. Tag all Conduit dashboards for easier UI navigation.
Also fix layout in Conduit Health dashboard.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-08 23:11:43 +02:00
Risha Mars f94856e489
Modify the Stat endpoint to also return the number of failed conduit pods (#895)
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count

This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
2018-05-08 10:35:21 -07:00
Andrew Seigner 97bf4fcdf2
Release Notes for 0.4.1 release. (#839)
Also update Getting Started and Debugging docs to reflect changes in
`Tap` and `Stat`.

Fixes #838

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-26 13:32:41 -07:00
Brian Smith c5d2dab8bd
Remove special support for ExternalName services (#764)
After this was implemented we found that ExternalName services are
represented in DNS as CNAMEs, which means that the proxy's DNS
fallback logic can be used instead of doing DNS in the control
plane. Besides simplifying the controller, this will also increase
fidelity with the proxied pods' DNS configuration (improve
transparency).

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-25 11:53:33 -10:00
Andrew Seigner dce31b888f
Deprecate Tap, rename TapByResource to Tap (#844)
The `conduit tap` command is now deprecated.

Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.

Fixes #804

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-25 12:24:46 -07:00
Andrew Seigner 640570cd6b
Make TapByResource output destination pod (#837)
The TapByResource command now has access to destination labels from the
proxy, but was not outputting them on the cli.

Modify the TapByResource output to print the destination pod label,
rather than the ip, when available.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-24 18:58:40 -07:00
Andrew Seigner a0a9a42e23
Implement Public API and Tap on top of Lister (#835)
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.

This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-24 18:10:48 -07:00
Andrew Seigner baf4ea1a5a
Implement TapByResource in Tap Service (#827)
The TapByResource endpoint was previously a stub.

Implement end-to-end tapByResource functionality, with support for
specifying any kubernetes resource(s) as target and destination.

Fixes #803, #49

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 16:13:26 -07:00
Andrew Seigner 39eccb09e2
cli: standardize kubernetes resource parsing (#830)
The Tap command leveraged new cli parsing code, enabling Kubernetes
resources specified as `(TYPE [NAME] | TYPE/NAME)`. The Stat command
did not use this.

Modify the Stat command to use the same cli flag parsing code as Tap.
Remove the to/from-resource flags from Stat.

Fixes #792

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 15:17:42 -07:00
Andrew Seigner 1f32f130de
Upgrade Prometheus from 2.1.0 to 2.2.1 (#816)
There have been a number of performance improvements and bug fixes since
v2.1.0.

Bump our Prometheus container to v2.2.1.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 18:00:53 -07:00
Andrew Seigner 79bdc638b3
Service support in stat command (#809)
The `stat` command did not support `service` as a resource type.

This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands

Fixes #805

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 16:51:20 -07:00
Andrew Seigner 293e00bc3e
Introduce tapByResource cli command (#802)
The existing `tap` command is being deprecated.

Introduce a `tapByResource` cli command. It supports tapping a Kubernetes
resource or collection of resources, optionally filtered by outbound resources.
This command will eventually replace `tap`.

Part of #778

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 14:44:23 -07:00
Kevin Lingerfelt 653dc6bfaa
Add replication controller stats in CLI (#794)
* Add replication controller stats in CLI
* Fix pod status in stat summary tests

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 18:12:14 -07:00
Oliver Gould 06dd8d90ee
Introduce the TapByResource API (#778)
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely 
reflecting the proxy's capabilities.

The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.

Now both `Tap` and `TapByResource`'s responses may include destination
labels.

This change avoids breaking backwards compatibility by:

* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
2018-04-18 15:37:07 -07:00
Andrew Seigner 1e4ac8fda8
Destination service provides pod-template-hash (#784)
The Destination service does not provide ReplicaSet information to the
proxy.

The `pod-template-hash` label approximates selecting over all pods in a
ReplicaSet or ReplicationController. Modify the Destination service to
provide this label to the proxy.

Relates to #508 and #741

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-18 14:41:27 -07:00
Kevin Lingerfelt 71a51afb40
Expose pod stats in CLI, web UI, and Grafana (#788)
* Expose pod stats in CLI, web UI, and Grafana
* Fix js api helpers test
* Add outbound traffic stats to pod dashboard

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 11:26:47 -07:00
Andrew Seigner 727521f914
Permit arbitrary time windows in public-api (#774)
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.

Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.

Fixes #686

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-16 17:37:17 -07:00
Kevin Lingerfelt 11a4359e9a
Misc cleanup following the telemetry rewrite (#771)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-16 15:51:07 -07:00
Oliver Gould 800cefdb77
Skip the proxy on the metrics port (#770)
When prometheus queries the proxy for data, these requests are reported
as inbound traffic to the pod. This leads to misleading stats when a pod
otherwise receives little/no traffic.

In order to prevent these requests being proxied, the metrics port is
now added to the default inbound skip-ports list (as is already case for
the tap server).

Fixes #769
2018-04-16 11:54:58 -07:00
Andrew Seigner 77fb6d3709
Add namespace as a resource type in public-api (#760)
* Add namespace as a resource type in public-api

The cli and public-api only supported deployments as a resource type.

This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes

Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* Refactor filter and groupby label formulation

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Rename stat_summary.go to stat.go in cli

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Update rbac privileges for namespace stats

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 16:53:01 -07:00
Oliver Gould cc44db054f
Remove NODE_NAME and POD_NAME env usage (#758)
* proxy: Remove pod_name and node_name

* cli: Do not inject POD_NAME and NODE_NAME env vars
2018-04-13 13:09:51 -07:00
Andrew Seigner 21886760c6
Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762)
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.

Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.

Fixes #761

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-13 12:08:16 -07:00
Kevin Lingerfelt fb15fe7c1a
Remove the telemetry service (#757)
* Remove the telemetry service

The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.

* Fix time window tests

* Remove deprecated controller scrape config

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 11:21:29 -07:00
Kevin Lingerfelt 47caf1ca07
Add --all-namespaces flag to CLI statsummary command (#745)
* Add --all-namespaces flag to CLI statsummary command

* Fix statsummary output formatting

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-11 16:40:25 -07:00
Andrew Seigner 259fdcd134
Add latency stats in new stat summary endpoint (#737)
The new StatSummary endpoint was only providing request volume and
successs rate information.

Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.

Fixes #681

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 11:58:32 -07:00
Kevin Lingerfelt 91c359e612
Switch public API to use cached k8s resources (#724)
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-10 11:39:31 -07:00
Andrew Seigner 716b392231
Move StatSummary logic into grpc server (#717)
The StatSummary logic was implemented as a method on http_server.

Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 16:46:15 -07:00
Kevin Lingerfelt baa4d10c2f
CLI: change conduit namespace shorthand flag to -c (#714)
* CLI: change conduit namespace shorthand flag to -c

All of the conduit CLI subcommands accept a --conduit-namespace flag,
indicating the namespace where conduit is running. Some of the
subcommands also provide a --namespace flag, indicating the kubernetes
namespace where a user's application code is running. To prevent
confusion, I'm changing the shorthand flag for the conduit namespace to
-c, and using the -n shorthand when referring to user namespaces.

As part of this change I've also standardized the capitalization of all
of our command line flags, removed the -r shorthand for the install
--registry flag, and made the global --kubeconfig and --api-addr flags
apply to all subcommands.

* Switch flag descriptions from lowercase to Capital

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-06 14:47:31 -07:00
Andrew Seigner 50c323c617
Use canonical k8s names, fix prom labels (#702)
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.

Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.

Fixes #695

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 12:34:54 -07:00
Risha Mars 2f5b5ea5f2
Start implementing conduit stat summary endpoint (#671)
Start implementing new conduit stat summary endpoint. 
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.

Current implementation just retrieves requests and mesh/total pod count 
(so latency stats are always 0). 

Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627

This branch includes commits from @klingerf 

* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
2018-04-05 17:05:06 -07:00
Andrew Seigner 28d5007cdf
Harmonize Prometheus label usage (#690)
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.

Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployement`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.

Fixes #655

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 15:09:06 -07:00
Andrew Seigner 9508e11b45
Build conduit-specific Grafana Docker image (#679)
Using a vanilla Grafana Docker image as part of `conduit install`
avoided maintaining a conduit-specific Grafana Docker image, but made
packaging dashboard json files cumbersome.

Roll our own Grafana Docker image, that includes conduit-specific
dashboard json files. This significantly decreases the `conduit install`
output size, and enables dashboard integration in the docker-compose
environment.

Fixes #567
Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 14:20:05 -07:00
Andrew Seigner ee042e1943
Rename grafana viz to top-line (#666)
The primary Grafana dashboard was named 'viz' from a prototype.

Rename 'viz' to 'Top Line'.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-02 18:10:35 -07:00
Brian Smith df9ead9c36
Use Go 1.10.1 to build all Go code. (#650)
Go 1.10.1 is a security release.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-02 14:58:30 -10:00
Andrew Seigner bf721466e3
Filter out conduit controller pods from Grafana (#657)
The Grafana dashboards were displaying all proxy-enabled pods, including
conduit controller pods. In the old telemetry pipeline filtering these
out required knowledge of the controller's namespace, which the
dashboards are agnostic to.

This change leverages the new `conduit_io_control_plane_component`
prometheus label to filter out proxy-enabled controller components.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-02 17:56:12 -07:00
Andrew Seigner 8fe742e2de
Update Grafana dashboards to use new proxy metrics (#637)
The Top-line and Deployment Grafana dashboards relied on the
soon-to-be-removed telemetry pipeline metrics.

Update the Grafana dashboards to query for the new, proxy-based metrics.
Grafana dashboard layouts have not changed.

Depends on #635 to render metrics.

Part of #420.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-29 13:00:01 -07:00
Andrew Seigner 666c83e963
Add pod_name to Prometheus labels (#649)
Previously we were using the instance label to uniquely identify a pod.
This meant that getting stats by pod name would require extra queries to
Kubernetes to map pod name to instance.

This change adds a pod_name label to metrics at collection time. This
should not affect cardinality as pod_name is invariant with respect to
instance.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-29 11:07:35 -07:00
Deshi Xiao 732f3c1565 fix grafana CrashLoopBackoff on image 5.0.3 (#646)
this is a known issue with grafana in k8s. grafana/grafana:5.0.4 was just released today. update the repo from 5.0.3 to 5.0.4
fixed issues #582

Signed-off-by: Deshi Xiao <xiaods@gmail.com>
2018-03-29 09:36:11 -07:00
Oliver Gould 6e435754a1
Improve CLI docker caching (#612)
Currently, the CLI docker image copies the entire `controller`
directory, though the CLI only requires a few of its subdirectories.
This causes the CLI's docker cache to be needlessly invalidated when,
for instance, a service implementation changes.

By restricting the copied directories to `controller/{api,public,util}`,
build caching is improved.
2018-03-29 09:29:06 -07:00
Kevin Lingerfelt 59c75a73a9
Add tests/utils/scripts for running integration tests (#608)
* Add tests/utils/scripts for running integration tests

Add a suite of integration tests in the `test/` directory, as well as
utilities for testing in the `testutil/` directory.

You can use the `bin/test-run` script to run the full suite of tests,
and the `bin/test-cleanup` script to cleanup after the tests.

The test/README.md file has more information about running tests.

@pcalcado, @franziskagoltz, and @rmars also contributed to this change.

* Create TEST.md file at the root of the repo

* Update based on review feedback

* Relax external service IP timeout for GKE

* Update TEST.md with more info about different types of test runs

* More updates to TEST.md based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 15:06:55 -07:00
Andrew Seigner fe35509406
Clean up Prometheus labels scraped from proxy (#633)
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, appending "k8s_".

This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-27 15:01:08 -07:00
Andrew Seigner 291d8e97ab
Move injected data from env var to k8s labels (#605)
The inject code detects the object it is being injected into, and writes
self-identifying information into the CONDUIT_PROMETHEUS_LABELS
environment variable, so that conduit-proxy may read this information
and report it to Prometheus at collection time.

This change puts the self-identifying information directly into
Kubernetes labels, which Prometheus already collects, removing the need
for conduit-proxy to be aware of this information. The resulting label
in Prometheus is recorded in the form `k8s_deployment`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-23 16:11:34 -07:00
Andrew Seigner fb1d6a5c66
Introduce Conduit Health dashboard (#591)
In addition to dashboards display service health, we need a dashboard to
display health of the Conduit service mesh itself.

This change introduces a conduit-health dashboard. It currently only
displays health metrics for the control plane components. Proxy health
will come later.

Fixes #502

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-22 15:16:03 -07:00
Alex Leong d50550515e
Add the proxy pod owner as a Prometheus label (#448)
Update the inject command to set a CONDUIT_PROMETHEUS_LABELS proxy environment variable with the name of the pod spec that the proxy is injected into. This will later be used as a label value when the proxy is exposing metrics.

Fixes: #426

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-03-22 15:10:51 -07:00
Andrew Seigner c03508ba8c
Update Prometheus to scrape data and control plane (#583)
The existing telemetry pipeline relies on Prometheus scraping the
Telemetry service, which will soon be removed.

This change configures Prometheus to scrape the conduit proxies directly
for telemetry data, and the control plane components for control-plane
health information. This affects the output of both conduit install
and conduit inject.

Fixes #428, #501

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-22 13:58:11 -07:00
Andrew Seigner 680bf6211a
Add Grafana support to conduit dashboard command (#590)
The existing `conduit dashboard` command supported opening the conduit
dashboard, or displaying the conduit dashboard URL, via a `url` boolean
flag.

Replace the `url` boolean flag with a `show` string flag, with three
modes:
`conduit dashboard --show conduit`: default, open conduit dashboard
`conduit dashboard --show grafana`: open grafana dashboard
`conduit dashboard --show url`: display dashboard URLs

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-20 18:07:30 -07:00
Andy Hume e6286e1bdf cli: ensure check command has 80-character output (#587)
Successful `conduit check` commands now take into account `[ok]`
and `\n` tokens when constraining line length.

Fixes #554

Signed-off-by: Andy Hume <andyhume@gmail.com>
2018-03-20 13:55:19 -07:00
Andrew Seigner 3ca8e84eec
Add Top Line and Deployment Grafana dashboards (#562)
Existing Grafana configuration contained no dashboards, just a skeleton
for testing.

Introduce two Grafana dashboards:
1) Top Line: Overall health of all Conduit-enabled services
2) Deployment: Health of a specific conduit-enabled deployment

Fixes #500

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-20 10:22:30 -07:00
Brian Smith e7c4a9d4b9
Remove the cli docker image (#579)
This image isn't used. It references its base image using the `latest` tag, which
is wrong; it should have been using the tag that the base image was built with. It
is likely that the last few iterations of this image that we've published have
wrong and useless contents.

With that in mind, just remove the image.

Fixes #578.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-16 14:22:46 -10:00
Alex Leong 9eb084c99d Most controller listeners should only bind on localhost (#494)
* Most controller listeners should only bind on localhost
* Use default listening addresses in controller components
* Review feedback
* Revert test_helper change
* Revert use of absolute domains

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-03-12 11:32:20 -07:00
Brian Smith 649e784d9c
Simplify cluster zone suffix handling in the proxy (#528)
* Temporarily stop trying to support configurable zones in the proxy.

None of the zone configuration is tested and lots of things assume the cluster
zone is `cluster.local`. Further, how exactly the proxy will actually learn the
cluster zone hasn't been decided yet.

Just hard-code the zone as "cluster.local" in the proxy until configurable zones
are fully implemented and tested to be working correctly.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting

The way that Kubernetes configures DNS search suffixes has some negative
consequences as some names like "example.com" are ambiguous: depending on
whether there is a service "example" in the "com" namespace, "example.com"
may refer to an external service or an internal service, and this can
fluctuate over time. In recognition of that we added the
CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would
be part of a solution for users to opt out of the unfortunate behavior
if their applications didn't depend on the DNS search suffix feature.

It turns out similar effects can be acheived using a custom dnsConfig,
starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level.
Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative.
Further, attempting to support it optionally made the code complex and hard
to read.

Therefore, let's just remove it. If/when somebody actually requests this
functionality then we can add it back, if dnsConfig isn't a valid alternative
for them.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Further hard-code "cluster.local" as the zone, temporarily.

Addresses review feedback.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 14:30:13 -10:00
Dennis Adjei-Baah ad42f2f8ab
Retry k8s watch endpoints on error (#510)
Shortly after conduit is installed in k8s environment. The control plane component that establishes a watch endpoint with k8s run in to networking issues during proxy initialization. During failure, each watcher fails to retry its connection to k8s watch endpoint which leads to timeouts and eventually, multiple controller pod restarts.

This PR adds retry logic to each "watch" enabled package.

fixes #478

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-03-07 13:40:43 -08:00
Brian Smith 0d4ab39ce7
Revert "Make absolute names truly absolute. (#525)" (#533)
This reverts commit 517616a166.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 10:57:10 -10:00
Brian Smith 517616a166
Make absolute names truly absolute. (#525)
Kubernetes will do multiple DNS lookups for a name like
`proxy-api.conduit.svc.cluster.local` based on the default search settings
in /etc/resolv.conf for each container:

1. proxy-api.conduit.svc.cluster.local.conduit.svc.cluster.local. IN A
2. proxy-api.conduit.svc.cluster.local.svc.cluster.local. IN A
3. proxy-api.conduit.svc.cluster.local.cluster.local. IN A
4. proxy-api.conduit.svc.cluster.local. IN A

We do not need or want this search to be done, so avoid it by making each
name absolute by appending a period so that the first three DNS queries
are skipped for each name.

The case for `localhost` is even worse because we expect that `localhost` will
always resolve to 127.0.0.1 and/or ::1, but this is not guaranteed if the default
search is done:

1. localhost.conduit.svc.cluster.local. IN A
2. localhost.svc.cluster.local. IN A
3. localhost.cluster.local. IN A
4. localhost. IN A

Avoid these unnecessary DNS queries by making each name absolute, so that the
first three DNS queries are skipped for each name.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 09:46:03 -10:00
Kevin Lingerfelt 47fc2eae20
Set -logtostderr flag on controller components (#524)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-07 10:18:15 -08:00
Andrew Seigner a065174688
Disable Grafana update check (#521)
Grafana by default calls out to grafana.com to check for updates. As
user's of Conduit do not have direct control over updating Grafana
directly, this update check is not needed.

Disable Grafana's update check via grafana.ini.

This is also a workaround for #155, root cause of #519.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-06 16:14:44 -08:00
Kevin Lingerfelt d6bd17425a
Add --expected-version flag for conduit check command (#497)
* Add --expected-version flag for conduit check command

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Update build instructions

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-06 11:32:14 -08:00
Dennis Adjei-Baah 5a4c5aa683
Exclude telemetry generated by the control plane when requesting depl… (#493)
When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh.

This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment.

* exclude telemetry generated by the control plane when requesting deployment metrics

fixes #370

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-03-05 17:58:08 -08:00
Brian Smith 4c9b9c0f68
Install: Don't install buoyantio/kubectl into the prometheus pod. (#509)
In the initial review for this code (preceding the creation of the
runconduit/conduit repository), it was noted that this container is not
actually used, so this is actually dead code.

Further, this container actualy causes a minor problem, as it doesn't
implement any retry logic, thus it will sometimes often cause errors to
be logged. See
https://github.com/runconduit/conduit/issues/496#issuecomment-370105328.

Further, this is a "buoyantio/" branded container. IF we actually need
such a container then it should be a Conduit-branded container.

See https://github.com/runconduit/conduit/issues/478 for additional
context.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-05 08:59:14 -10:00
Igor Zibarev 0f6db6efc0 cli: refactor k8s config to support $KUBECONFIG with multiple paths (#482)
Kubernetes $KUBECONFIG environment variable is a list of paths to
configuration files, but conduit assumes that it is a single path.

Changes in this commit introduce a straightforward way to discover and
load config file(s).

Complete list of changes:

- Use k8s.io/client-go/tools/clientcmd to deal with kubernetes
configuration file
- rename k8s API and k8s proxy constructors to get rid of redundancy
- remove shell package as it is not needed anymore

Signed-off-by: Igor Zibarev <zibarev.i@gmail.com>
2018-02-28 12:13:09 -08:00
Andrew Seigner d50c8b4ac8
Add Grafana to conduit install (#444)
`conduit install` deploys prometheus, but lacks a general-purpose way to
visualize that data.

This change adds a Grafana container to the `conduit install` command. It
includes two sample dashboards, viz and health, in their own respective
source files.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-28 11:36:21 -08:00
Andy Hume 1e611c21c6 cli: add check for latest version of cli and control plane (#460)
As part of `conduit check` command, warn the user if they are
running an outdated version of the cli client or the control
plane components.

Fixes #314

Signed-off-by: Andy Hume <andyhume@gmail.com>
2018-02-27 16:11:38 -08:00
Dennis Adjei-Baah 893bacf8d6
Make prometheus URL in config fully qualified DNS name (#443)
The telemetry service in the controller pod uses a non-fully qualified URL to connect to the prometheus pod in the control plane. This PR changes the URL the telemetry's prometheus URL to be fully qualified to be consistent with other URLs in the control plane. This change was tested in minikube. The logs report no errors and looking at the prometheus dashboard shows that stats are being recorded from all conduit proxies.

fixes #414

Signed-off-by: Dennis Adjei-Baah dennis@buoyant.io
2018-02-26 09:40:31 -08:00
Brian Smith 34cf79a3e6
Add a test of the actual default output of `conduit install`. (#376)
Refactor `conduit install` test into a data-driven test. Then add a test of the actual default output of `conduit install`.

This test is useful to make it clear when we change the default
settings of `conduit install`.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-23 13:27:36 -10:00
Brian Smith 78ebd5e340
Base control plane Docker images on scratch instead of base. (#368)
The control plane is proxied through the Conduit proxy. The Conduit
proxy is based on the base image, and the control plane containers
and the proxy share a networking namespace. This means we don't
need the extra base utilities in the controller images since we can
use the utilties in the proxy image.

This is a step towards building the initial no-networking Conduit CA
pod. Since the Conduit CA will not do any networking of its own, we
networking debugging utilties are not helpful for it. They are
actually an unnecessary risk because they could facilitate the
exfiltration of the private key of the CA. (The Conduit CA pod won't
have the Conduit Proxy injected into it either.)

This also simplifies & slightly speeds up the building of the
controller images. This is a stepping stone towards being able to
build the controller images without `docker build` to improve build
times.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-23 13:03:19 -10:00
Brian Smith 86bb65a148
Remove potentially-conflicting `app` labels in control plane (#373)
The `app` label should be reserved for end-user applications and we
shouldn't use it ourselves. We already have a Conduit-specific label
that is is prefixed with the `conduit.io/` prefix to avoid naming
collisions with users' labels, so just use that one instead.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-23 12:43:55 -10:00
Andrew Seigner 83f9c391bb
Move install template to its own file (#423)
The template used by `conduit install` was hard-coded in install.go.

This change moves the template into its own file, in anticipation of
increasing the template's size and complexity.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-23 14:15:31 -08:00
Dennis Adjei-Baah f66ec6414c
Inject the conduit proxy into controller pod during conduit install (#365)
In order to take advantage of the benefits the conduit proxy gives to deployments, this PR injects the conduit proxy into the control plane pod. This helps us lay the groundwork for future work such as TLS, control plane observability etc.

Fixes #311

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-02-23 13:55:46 -08:00
Brian Smith cf3c8cd7bc
Use Go 1.10.0 to build Go components. (#408)
* Use Go 1.10.0 to build Go components.

Take advantage of the new build cache in Go 1.10. Future work on improving
build performance will utilize the build cache further.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-21 14:31:29 -10:00
Brian Smith e6aad57766
Remove temporary files generated by dep in go-deps image. (#407)
Previously Dockerfile-go-deps was converted from a multi-stage Dockefile
to a single-stage Dockerfile in anticipation of enabling efficient use
of `--cache-from` in CI. However, that resulted in the image ballooning
in size because it contained the Git repo for every package downloaded
by `dep ensure`.

Bring the image back down to the proper size by removing the temporary
files created.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-21 13:06:24 -10:00
Kevin Lingerfelt c579a8fe8d
Improve get/stat/tap help text by way of examples (#401)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-20 19:27:42 -08:00
Kevin Lingerfelt 8db7115420
Update go-run to set version equal to root-tag (#393)
* Update go-run to set version equal to root-tag

* Fix inject tests for undefined version change

* Pass inject version explitictly as arg

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-20 12:25:55 -08:00
Kevin Lingerfelt f48555d3cc
Remove kubectl dependency, validate k8s server version via api (#396)
* Remove kubectl dependency, validate k8s server version via api

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Remove unused MockKubectl

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Remame kubectl.go to version.go

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-20 12:14:11 -08:00
Dennis Adjei-Baah 9af3783555
Print error message only when invalid YAML file is used with inject command (#389)
When the `inject` command is used on a YAML file that is invalid, it prints out an invalid YAML file with the injected proxy. This may give a false indication to the user that the inject was successful even though the inject command prints out an error message further down the terminal window. This PR fixes #303 and contains a test input and output file that indicates what should be shown.

This PR also fixes #390.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-02-20 11:59:41 -08:00
Kevin Lingerfelt d1ae4c5bc7
Run conduit dashboard on ephemeral port by default (#394)
* Run conduit dashboard on ephemeral port by default

* Fix wording on dashboard --port flag

* log.Debug error instead of discarding it

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-20 10:33:47 -08:00
Brian Smith 1489a84316
Refactor `conduit inject` code to eliminate duplicate logic (#383)
* Refactor `conduit inject` code to eliminate duplicate logic

Previously there was a lot of code repeated once for each type of
object that has a pod spec.

Refactor the code to reduce the amount of duplication there, to make future
changes easier.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-19 11:18:44 -10:00
Brian Smith 80aba6c075
CLI: Remove now-unnecessary "enhanced" Kubernetes object types (#382)
* CLI: Remove now-unnecessary "enhanced"  Kubernetes object types

The "enhanced" types aren't necessary because now the Kuberentes API
implementation has the correct JSON annotations for the InitContainers
field.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-19 09:37:25 -10:00
Risha Mars 8bc7c5acde
UI tweaks: sidebar collapse, latency formatting, table row spacing (#361)
- reduce row spacing on tables to make them more compact
- Rename TabbedMetricsTable to MetricsTable since it's not tabbed any more
- Format latencies greater than 1000ms as seconds
- Make sidebar collapsible 
- poll the /pods endpoint from the sidebar in order to refresh the list of deployments in the autocomplete
- display the conduit namespace in the service mesh details table
- Use floats rather than Col for more responsive layout (fixes #224)
2018-02-19 11:21:54 -08:00
Dennis Adjei-Baah 01e694ad71
add check and friendly error if conduit dashboard is not installed (#289)
* add check and friendly error if conduit dashboard is not installed

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-02-19 09:28:56 -08:00
Brian Smith d8f9c33183
Skip pods with hostNetwork=true in `conduit inject` (#380)
The init container injected by conduit inject rewrites the iptables configuration for its network namespace. This causes havoc when the network namespace isn't restricted to the pod, i.e. when hostNetwork=true.

Skip pods with hostNetwork=true to avoid this problem.

Fixes #366.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-18 13:55:42 -10:00
Brian Smith 51873542e5
Refactor `conduit inject` code to make it unit-testable. (#379)
Refactor `conduit inject` code to make it unit-testable.

Refactor the conduit inject code to make it easier to add unit tests. This work was done by @deebo91 in #365. This is the same PR without the conduit install changes, so that it can land ahead of #365. In particular, this will be used for testing the fix for high-priority bug #366.

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-18 12:33:52 -10:00
Brian Smith 64f270b631
Strip `conduit` CLI executables in `docker build`. (#367)
File sizes (in bytes) before and after this change:

        conduit-darwin conduit-linux conduit-windows
Before:     27,056,288    27,282,364      27,359,744
After:      20,023,456    18,080,576      18,262,528
----------------------------------------------------
Diff         7,032,832     9,201,788       9,097,216

Fixes #352.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-16 08:20:18 -10:00
Brian Smith b18fe459d4
Precompile large Go libraries in go-deps Docker image. (#332)
On my system (i9-7960x running Docker natively in Linux) this regularly saves
over 11 seconds of build time when a file under pkg/ changes and over 1.5
seconds of build time when a file under controller/ changes. Since most
contributors are running Docker in a VM on less powerful computers, the
savings for most contributors should be significantly greater.

I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I
did not measure them.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-13 11:35:10 -10:00
Andrew Seigner 797bba6bc6
Upgrade to Prometheus 2.1.0 (#344)
Conduit has been on Prometheus 1.8.1. Prometheus 2.x promises better
performance.

Upgrade Conduit to Prometheus 2.1.0

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-13 13:22:53 -08:00
Brian Smith ec5a02fd64
Upgrade to Go 1.9.4. (#326)
Go 1.9.4 is a security release.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:47:40 -10:00
Brian Smith 86ea1c06bf
Improve the caching behavior of Dockerfile-go-deps. (#325)
Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the
source tree changed. Also, because it was a multi-stage Dockerfile it did not
work well with Docker's `--cache-from` feature.

Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock}
and/or bin/dep change. Simplify it to a single stage so that it works better
with Docker's `--cache-from` feature.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:40:20 -10:00
Brian Smith c78df4ba13
Use bin/dep in Dockerfile-go-deps. (#324)
bin/dep verifies the digest of the `dep` downloaded `dep` executable,
whereas previously Dockerfile-go-deps wasn't.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:32:08 -10:00
Eliza Weisman 458e9d2ac5
Remove per-path metrics from telemetry pipeline (#317)
Follow-up from #315.

Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.

I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.

Closes #265 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 14:20:28 -08:00
Eliza Weisman 6c2ac6125f
Remove per-path metrics from UIs (#315)
I've removed per-path metrics from the web dashboard and from the `conduit stat` command. 

Manually validated that these metrics are no longer displayed.

Closes  #263

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 12:35:49 -08:00
Andrew Seigner 33e3c3ace9
Optimize Prometheus queries (#298)
Prometheus queries from the Telemetry service were taking seconds or 10s
of seconds.

Optimize these queries:
- Move all summary queries requiring a single point data off of Prometheus'
  QueryRange() endpoint, onto Query()
- Set `defaultVectorRange` to 30s, and also use it regardless of time
  window
Also add tests for grpc_server and telemetry server

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

Fixes #260
2018-02-09 10:55:07 -08:00
Eliza Weisman 2015d992cc
Remove pod-level metrics from web and CLI (#304)
This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message.

I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes.

Closes #261 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-08 19:07:10 -08:00
Jeff Haynie f721a0f800 Fixed mispelling in conduit inject args (#300) 2018-02-08 12:48:40 -05:00
Phil Calçado 8041d07e4d
Add --url option to skip browser when opening dashboard (#281)
Browser opening is a UX nicety, but it shouldn't be mandatory.
2018-02-06 14:07:55 -05:00
Phil Calçado 5628d3c8f4
Add --short and --client to CLI version command (#274)
These are very useful when writing scripts, e.g.

conduit_version=`conduit version --short --client`

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-02-05 17:02:45 -05:00
Alex Leong b691c2e25b
Rename --version flag in conduit install to --conduit-version (#255)
This makes the `conduit install` flag match the `conduit inject` flag.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-02-05 10:45:44 -08:00
Andrew Seigner 9a40d984ff
Replace shelling out with kubernetes proxy (#249)
The conduit dashboard command asychronously shells out and runs "kubectl
proxy".

This change replaces the shelling out with calls to kubernetes proxy
APIs. It also allows us to enable race detection in our go tests, as the
shell out code tests did not pass race detection.

Fixes #173

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-02 10:31:59 -08:00
Alex Leong fa2f5a0140
Add dep wrapper script to ensure consistent version of dep is used (#253)
* Add `bin/dep` which fetches a fixed version of `dep` to be used. 
* Upgrade from dep 0.3.1 to 0.4.1
* Fix inconsistent Gopkg.lock by checking in the result of `bin/dep ensure`

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-02-01 16:09:05 -08:00
Andrew Seigner 277c06cf1e
Simplify and refactor k8s labels and annnotations (#227)
The conduit.io/* k8s labels and annotations we're redundant in some
cases, and not flexible enough in others.

This change modifies the labels in the following ways:
`conduit.io/plane: control` => `conduit.io/controller-component: web`
`conduit.io/controller: conduit` => `conduit.io/controller-ns: conduit`
`conduit.io/plane: data` => (remove, redundant with `conduit.io/controller-ns`)
It also centralizes all k8s labels and annotations into
pkg/k8s/labels.go, and adds tests for the install command.

Part of #201

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-01 14:12:06 -08:00
Kevin Lingerfelt 9ff439ef44
Add -log-level flag for install and inject commands (#239)
* Add -log-level flag for install and inject commands

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Turn off all CLI logging by default, rename inject and install flags

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Re-enable color logging

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-01 12:38:07 -08:00
Kevin Lingerfelt 4a76c6448b
Update cli subcommands to print errors when encountered (#221)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-01-29 11:28:19 -08:00
Kevin Lingerfelt 7399df83f1
Set conduit version to match conduit docker tags (#208)
* Set conduit version to match conduit docker tags

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Remove --skip-inbound-ports for emojivoto

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Rename git_sha => git_sha_head

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Switch to using the go linker for setting the version

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Log conduit version when go servers start

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Cleanup conduit script

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Add --short flag to head sha command

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Set CONDUIT_VERSION in docker-compose env

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-01-26 11:43:45 -08:00
Phil Calçado 9410da471a
Better error handling for Tap (#177)
Previously, running `$conduit tap` would return a `Unexpected EOF` error when the server wasn't available. This was due to a few problems with the way we were handling errors all the way down the tap server. This change fixes that and cleans some of the protobuf-over-HTTP code.

- first step towards #49
- closes #106
2018-01-25 11:49:38 -05:00
Andrew Seigner d0a0bb22bd
Move EosCtx to common for Tap and Telemetery (#204)
* Make Eos optional in TapEvent

grpc_status not being set in protobuf is the same as being set to zero,
which is also status OK

Modify TapEvent to include an optional EOS struct

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

Part of #198

* Add Eos to proto & proxy tap end-of-stream events

The proxy now outputs `Eos` instead of `grpc_status` in all end-of-stream tap events. The EOS value is set to `grpc_status_code` when the response ended with a `grpc_status` trailer, `http_reset_code` when the response ended with a reset, and no `Eos` when the response ended gracefully without a `grpc_status` trailer.

This PR updates the proxy. The proto and controller changes are in PR #204.
Part of #198. Closes #202

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-24 15:48:00 -08:00
Andrew Seigner 47ec2fb190
Remove DOCKER_FORCE_BUILD, disable symbolic tags (#168)
DOCKER_FORCE_BUILD, combined with symbolic tags, added complexity and
risk of running unintended versions of the code.

This change removes DOCKER_FORCE_BUILD, and sets all Docker tags
programmatically. The decision to pull or build has been moved up the
stack from _docker.sh to the docker-build-* scripts. Workflows that
want to favor docker pulls (like ci), can do so explicitly via
docker-pull.

fixes #141

Signed-off-by: Andrew Seigner <andrew@sig.gy>
2018-01-23 12:02:28 -08:00
Andrew Seigner 2413086335
cli polish for 0.1.2 release (#176)
rename conduit status -> conduit check
remove 6h and 24 window options from conduit stat
remove watch and watch-only from conduit stat

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-01-18 16:33:25 -08:00
Dennis Adjei-Baah f7af375e73
Remove scheme requirement for api-addr flag in conduit CLI (#126)
* Allow external controller public api clients that don't rely on a kubeconfig to interact with Conduit CLI

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-01-17 17:12:44 -08:00
Oliver Gould 008f53865b
Make proxy-deps multi-stage to remove the original source files (#161)
Previously, proxy-deps and go-deps included the source tree for local
projects. This can cause build conflicts when files are renamed.

By adopting a multi-stage build for the proxy-deps image, we can be sure
that we only preserve essential dependencies & manifests in the
proxy-deps and go-deps images.

Furthermore, `bin/update-go-deps-shas` and `bin/update-proxy-deps-shas` have
been added to ease maintenance when files are changed.

Fixes #159

Signed-off-by: Oliver Gould <ver@buoyant.io>
2018-01-17 12:26:22 -08:00
Kevin Lingerfelt fd3cfcb5d9
Move healthcheck proto to separate file, use throughout (#150)
* Move healthcheck proto to separate file, use throughout

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Remove Check message from healthcheck.proto

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Standardize healthcheck protobuf import name

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-01-17 11:15:38 -08:00
Phil Calçado 612bd0f7a0
Add --verbose option to CLI (#154)
* Use stdout as writer for tap command

fixes #136

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add --log-level to command line

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-01-17 12:06:43 -05:00
Phil Calçado 4daa007256
Use stdout as writer for tap command (#152)
fixes #136

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-01-17 11:19:22 -05:00
Zhu Guihua 1ba375b71c fix typo error (#153)
Signed-off-by: Guihua Zhu <z.zhuguihua@gmail.com>
2018-01-16 11:20:21 -05:00
Phil Calçado e328db7e87
Adds conduit-api check for status command (#140)
* Abstract Conduit API client from protobuf interface to add new features

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Consolidate mock api clients

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add simple implementation of healthcheck for conduit api

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Change NextSteps to FriendlyMessageToUser

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add grpc check for status on the client

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add simple server-side check for Conduit API

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Fix feedback from PR

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-01-12 15:35:22 -05:00
Kevin Lingerfelt 1dc1c00a2a
Upgrade k8s.io/client-go to v6.0.0 (#122)
* Sort imports

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Upgrade k8s.io/client-go to v6.0.0

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Make k8s store initialization blocking with timeout

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-01-11 10:22:37 -08:00
Andrew Seigner caeb83a526
Fix Go and Proxy dependency image SHAs (#117)
The image tags for gcr.io/runconduit/go-deps and
gcr.io/runconduit/proxy-deps were not updating to account for all
changes in those images.

Modify SHA generation to include all files that affect the base
dependency images. Also add instructions to README.md for updating
hard-coded SHAs in Dockerfile's.

Fixes #115

Signed-off-by: Andrew Seigner <andrew@sig.gy>
2018-01-08 11:19:49 -08:00
Phil Calçado 120dbce49d
second iteration of status subcommand: check Kubernetes API #92 (#108)
Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-01-05 13:32:41 -08:00
Risha Mars 38f44fcdb7
Apply fixes from 110 to Dockerfile-bin (#111)
pkg needs to be copied over in Dockerfile-bin as well

Signed-off-by: Risha Mars <mars@buoyant.io>
2018-01-05 10:57:08 -08:00
Phil Calçado 709de5a7b0
Moves k8s and conduit client code to /pkg (#103)
* Rename constructor functions from MakeXyz to NewXyz

As it is more commonly used in the codebase

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Make Conduit client depend on KubernetesAPI

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Move Conduit client and k8s logic to standard go package dir for internal libs

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Move dependencies to /pkg

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Make conduit client more testable

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Remove unused config object

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add more test cases for marhsalling

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Move client back to controller

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Sort imports

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-01-04 10:10:10 -08:00
Andrew Seigner 7edea726e6
Add shell completion command to cli (#97)
Cobra supports bash and zsh completion code generation, but our cli
project was not leveraging it.

Add a 'completion' command to the conduit cli, which outputs bash and
zsh completion code.

Signed-off-by: Andrew Seigner <andrew@sig.gy>
2017-12-28 14:24:06 -08:00
Phil Calçado c76b705fce
first iteration of status subcommand: check Kubectl #92 (#96)
* Add framework for healthcheck in CLI

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add self-checked for kubectl

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Clear formatting code

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Removed ununsed objects from status

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Removed ununsed parameter

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Ignore errored self checkers

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Make the check error by default

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Log error, format changes

Signed-off-by: Phil Calcado <phil@buoyant.io>
2017-12-28 14:03:18 -05:00
Phil Calçado c0ccdfabfc
Skip tests known to be flaky when they fail (#95)
We need to address the flaky checks, but for now it is better to have these skipped than a build with false negatives.

Signed-off-by: Phil Calcado <phil@buoyant.io>
2017-12-27 14:42:36 -05:00
Phil Calçado 31e9846f62
Make several CLI commands testable (#86)
* Add func to rsolve kubectl-like names to canonical names

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Refactor API instantiation

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Make version command testable

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Make get command testable

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Add tests for api utils

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Make stat command testable

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Make tap command testablë

Signed-off-by: Phil Calcado <phil@buoyant.io>
2017-12-27 14:10:41 -05:00
Christopher Schmidt bcf61e1bfb adding RBAC stuff for Prometheus (#83)
* adding RBAC stuff for Prometheus, fixes runconduit/conduit#82
Signed-off-by: Christopher Schmidt fakod666@gmail.com

* fixed ident
Signed-off-by: Christopher Schmidt fakod666@gmail.com
2017-12-22 13:45:56 -08:00
Phil Calçado 78d7b22e1c
Add more information to async test failures (#73) 2017-12-22 10:01:34 +11:00
Kevin Lingerfelt a8e75115ab
Prepare the repo for the v0.1.1 release (#75)
* Prepare the repo for the v0.1.1 release

* Add changelog

* Changelog updates, wrap at 100 characters
2017-12-20 10:51:53 -08:00
Kevin Lingerfelt 9fca7e8b36
Modify inject to stop passing malformed arguments (#74) 2017-12-19 22:03:56 -08:00
Dennis Adjei-Baah 63e2fbb97b Refactor each CLI command to silently exit on uncaught errors (#54)
* add silent exits on all conduit commands

* Revert "add silent exits on all conduit commands"

This reverts commit 07488ca

* adds back a change that was accidentally removed on revert

* Stop printing errors to stderr
2017-12-19 17:24:38 -08:00
Phil Calçado 0a6a9edaee
Respect $KUBECONFIG env var (#68)
* Move kubectl logis to k8s package

* Made kubectl return *url.URL, just like API

* Make k8s API code respect /Users/pcalcado/.kube/config (closes #17)

* Fix style mistakes and typos
2017-12-20 11:50:25 +11:00
Kevin Lingerfelt 42e9a94e45
Make tap output line-oriented (#66)
* Make tap output line-oriented

* Use grpc response code constants, add tests
2017-12-19 15:58:46 -08:00
Phil Calçado 683b5c0dd6 Fix unit test runner and failing tests (#67)
* Remove integration test clause from travis file

* Use correct channel for asyn process error reporting

* Fix test for sorting stat keys

* Make integration tests only run if CONDUIT_INTEGRATION_TESTS_ENABLED is set

* Fix timeouts in tests

* Fix handler test check for version

* Removes smoke test that required kubectl as not present on travis

* Replace tail with sleep to avoid leaking subprocesses during tests

* Fix typo & extract constant
2017-12-19 14:21:56 -08:00
Brian Smith 208723b44d
conduit inject: Enable auto name completion in proxy (#60)
Previously `conduit inject` did not enable automatic name completion in
the proxy. As a result services couldn't connect to services outside
the "default" namespace without qualifying the service name with (at
least) the namespace. This is arguably safer but it isn't compatible
with the way things work in Kubernetes when the proxy isn't used.

Enable name auto-completion in the proxy so that the proxy will add
the current pod's namespace to any unqualified service name. This
depends on the feature being added to the proxy (PR #59).

Due to some issues with how zones are dealt with in the project, the
zone component isn't provided; it turns out that it doesn't matter
whether we provide the zone in the current implementation. Dealing
with the zone better will be added later.

Validated by deploying the emojivoto service with its configuration
updated to use unqualified names (`sed "s/\\.emojivoto//g"`). Before
this change this modified configuration would fail; now it succeeds.

Fixes #9.
2017-12-19 12:07:54 -10:00
Phil Calçado 6f9e8be2ee
Check kubectl versions before attempting dashboard (#53)
* Introduce objects to manage kubectl and shell

* Make dashboard command use new shell & kubectl objects

* Make compatible with version numbers like v1.9.0-beta.1

* Add version check for kubectl

* Refactor error to use proper method from fmt pa ckage

* Make channel and error handling more idiomatic and safe

* Make version require 1.8.0
2017-12-19 11:41:15 +11:00
Alex Leong 772b43fefa Add inject flag for skipping outbound ports (#38)
* Add inject flag for skipping outbound ports

* Fix usage of proxy-init ignore flags (closes #541)
2017-12-19 11:17:11 +11:00
Brian Smith d025bf4c0f
conduit inject: Configure proxy to log at "info" level (#58)
Previously `conduit inject` was configuring the proxy to log a lot of
detail, most of which is probably shouldn't be relevant to Conduit
users.

Configure the proxy to log at the "info" level instead for the proxy
itself, and the "warn" level for internal components of the proxy.

Validated by manually doing a `conduit inject`, triggering some
traffic, and inspecting the logs.

Fixes #57
2017-12-18 08:37:27 -10:00
Alex Leong 8a7579ef4a
Add RBAC support (#40)
Adds a ServiceAccount, ClusterRole, and ClusterRoleBinding to the conduit install output that allows the Conduit controller read access to the k8s API.

I have tested this on RBAC-enabled minikube and minikube without RBAC.
2017-12-14 16:57:52 -08:00
Dennis Adjei-Baah 922b41c0fa
Remove namespace property from the Conduit web app (#10)
* Remove namespace property from ServiceMesh.jsx and Deployments.jsx and related Go code
2017-12-08 15:38:04 -08:00
Kevin Lingerfelt 2f114e69fa
Add support for path stats in cli and web api (#13)
* Add support for path stats in cli and web api

The cli stat command supports grouping by pod and deployment. With this
change, it will also support grouping by path, in order to facilitate a
summary stats per individual endpoint.

* Right-align numeric columns in stat output
2017-12-08 12:24:39 -08:00
Oliver Gould bff3efea3f
Prepare for v0.1.0 (#1)
Update versions in code.

Use default docker tag of v0.1.0
2017-12-04 19:55:56 -08:00
Oliver Gould a1fbafaae3 update go-deps image 2017-12-05 01:17:38 +00:00
Oliver Gould b104bd0676 Introducing Conduit, the ultralight service mesh
We’ve built Conduit from the ground up to be the fastest, lightest,
simplest, and most secure service mesh in the world. It features an
incredibly fast and safe data plane written in Rust, a simple yet
powerful control plane written in Go, and a design that’s focused on
performance, security, and usability. Most importantly, Conduit
incorporates the many lessons we’ve learned from over 18 months of
production service mesh experience with Linkerd.

This repository contains a few tightly-related components:
- `proxy` -- an HTTP/2 proxy written in Rust;
- `controller` -- a control plane written in Go with gRPC;
- `web` -- a UI written in React, served by Go.
2017-12-05 00:24:55 +00:00