Commit Graph

227 Commits

Author SHA1 Message Date
Kevin Lingerfelt e97be1f5da
Move all healthcheck-related code to pkg/healthcheck (#1492)
* Move all healthcheck-related code to pkg/healthcheck
* Fix failed check formatting
* Better version check wording

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-20 16:50:22 -07:00
Eliza Weisman b8434d60d4
Add resource metadata to Tap CLI output (#1437)
Closes #1170.

This branch adds a `-o wide` (or `--output wide`) flag to the Tap CLI.
Passing this flag adds `src_res` and `dst_res` elements to the Tap
output, as described in #1170. These use the metadata labels in the tap
event to describe what Kubernetes resource the source and destination
peers belong to, based on what resource type is being tapped, and fall
back to pods if either peer is not a member of the specified resource
type.

In addition, when the resource type is not `namespace`, `src_ns` and
`dst_ns` elements are added, which show what namespaces the the source
and destination peers are in. For peers which are not in the Kubernetes
cluster, none of these labels are displayed.

The source metadata added in #1434 is used to populate the `src_res` and
`src_ns` fields.

Also, this branch includes some refactoring to how tap output is
formatted.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-20 14:25:26 -07:00
Kevin Lingerfelt 7c07ba0d53
Upgrade to dep 0.5.0, go 1.10.3 (#1479)
* Upgrade to dep 0.5.0, go 1.10.3
* Remove existing dep binary if it's the wrong version
* Add version in filename of dep binary to prevent version conflicts

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-17 16:04:50 -07:00
Alex Leong 094a375015
[RFC] linkerd top (#1435)
This an initial implementation of the `linkerd top` command.  This command launches an ncurses style tabular view of current requests (using data from tap).  Most of the command line arguments are the same as tap and allow selecting the resource to inspect and filtering which requests to view.  

Fixes #1283 

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-08-15 18:10:23 -07:00
Eliza Weisman cda05aa34c
Add inbound destination label hydration to Tap server (#1442)
Based on @adleong's suggestion in
https://github.com/linkerd/linkerd2/pull/1434#pullrequestreview-145428857,
this branch adds label hydration from destination IPs to the Tap server.
This works the same as the label hydration for destination IPs added in
#1434. However, it is only applied to the destination fields of events
recorded by proxies in the inbound direction, since outbound
destinations are already labeled with metadata provided by the
Destination service.

This means that when a user taps inbound traffic, the CLI will show k8s
metadata labels for the destination peer (if it's available). This can
be useful especially when tapping several pods at once, as it makes it
easier to distinguish what pod received a request.

This branch also refactors how the label hydration is performed,
primarily to make adding it to the destination field less repetitive.
Also, the `hydrateIPLabels` function now mutates the label map in the
`TapEvent`, rather than returning a new map of labels, so that the case
where no pod was found doesn't require an additional allocation of an
empty map.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-13 13:46:33 -07:00
Eliza Weisman bf7fc12f5c
Add source metadata to Tap server tap events (#1434)
The `TapEvent` protobuf contains two maps, `DestinationMeta` and
`SourceMeta`. The `DestinationMeta` contains all the metadata provided
by the proxy that originated the event (ultimately originating from the
Destination service), while the `SourceMeta` currently only contains the
source connection's TLS status.

This branch modifies the Tap server to hydrate the same set of metadata
from the source IP address, when the source was within the cluster. It
does this by adding an indexer of pod IPs to pods to its k8s API client,
and looking up IPs against this index. If a pod was found, the extra
metadata is added to the tap event sent to the client.

This branch also changes the client so that if a source pod name was
provided in the metadata, it prints the pod name rather than the IP
address for the `src` field in its output. This mimics what is currently
done for the `dst` field in tap output. Furthermore, the added source
metadata will be necessary for adding src resource types to tap output
(see issue #1170).

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-13 13:25:14 -07:00
Kevin Lingerfelt 00a0572098
Better CLI error messages when control plane is unavailable (#1428)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-09 15:40:41 -07:00
Eliza Weisman 56681015ae
Fix Destination returning no endpoints for single unnamed container port (#1420)
Fixes #1405.

According to the Kubernetes Endpoints API documentation, the `name`
field in the `EndpointPort` response object is "Optional if only one
port is defined". (see
https://v1-9.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#endpointport-v1-core)
However, when the Destination service an endpoints response for a
service with a named target port, it expects the ports in the endpoints
response to have the same name as the target port in the service. 

When a user creates a `NodePort` service with an unnamed port that
targets a named container port, this behaviour results in Linkerd
failing to route to that service by hostname. Without Linkerd injected,
the hostname is still reachable. 

This branch fixes this issue by changing the `endpointsToAddresses`
function in `endpoints_watcher.go` to handle the case when an endpoints
response contains only a single unnamed port.

I've manually verified that this fixes the issue described in #1405.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-08 13:01:53 -07:00
Kevin Lingerfelt bd19e8aaff
Update prometheus to only scrape proxies in the same mesh (#1402)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-06 12:05:55 -07:00
Kevin Lingerfelt f70ad7de11
Use stable version for linkerd2-proxy-api dep (#1400)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-03 11:59:42 -07:00
Sean McArthur c035193313
add H2 protocol to destination addrs if managed by linkerd (#1380)
Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-08-03 10:14:30 -07:00
Alex Leong 3e1f35913b
Read all bytes of message length header (#1394)
The `reader.Read` method only reads as many bytes as are currently available from reader.  When reading the 4 byte message length header, if not all 4 of those bytes are available, `Read` will only read the available bytes and return.  This causes alignment issues when the message body is read and there are still unread header bytes in the reader.  These bytes will appear at the beginning of the message body and cause a crash when the message is unmarshalled.

Use `io.ReadFull` to ensure that we read all 4 of the message length header bytes.

Fixes #1287 

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-08-02 10:45:49 -07:00
Risha Mars fef896011f Add more filters to the web UI tap form (#1371)
* Update ant to 3.7.2
* Add autocomplete of namespaces/resources to Tap in web ui
  * Add form fields for authority/path/method/rps/scheme
  * Add the ability to clear error messages to the error banner
* Add error listener to ws object
2018-07-31 15:48:53 -07:00
Kevin Lingerfelt c362d5e114
Update k8s.io dependencies to 1.11.1 (#1369)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-27 15:23:03 -07:00
Brian Smith 2098beb123
Don't relink web & controller executables when version changes. (#1338)
Speed up incremental rebuilds by avoiding relinking the controller
and/or web executables when changes are made to unrelated files.
Before this change, any time the git tag changed, the executables
would have to be rebuilt (relinked at least), even if no Go files
changed.

Validated by running the integration tests.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-26 16:20:36 -10:00
Kevin Lingerfelt 51848230a0
Send glog logs to stderr by default (#1367)
* Send glog logs to stderr by default
* Factor out more shared flag parsing code

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-25 12:59:24 -07:00
Risha Mars ec3c861743
Enable Tap from the Web UI (#1356)
Adds a tap endpoint in the web api that communicates with the dashboard 
via websockets.
I've moved a bunch of code from the cli tap.go into utils so that the code 
can be shared between web and CLI. I think we should consider making the 
display more suited to web, but in the short term, reusing the CLI's 
rendering of tap events works.

Adds a Tap page in the Web UI that you can use to make tap requests. 
The form currently only allows you to enter a resource and namespace, 
other filters coming in a follow-up branch.
2018-07-24 14:23:42 -04:00
Kevin Lingerfelt 4b9700933a
Update prometheus labels to match k8s resource names (#1355)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-23 15:45:05 -07:00
Brian Smith a98bfb1ca7
Rename `ca-bundle-distributor` to `ca`. (#1340)
`ca-bundle-distributor` described the original role of the program but
`ca` ("Certificate Authority") better describes its current role.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-17 14:10:40 -10:00
Brian Smith 0fcfd2bffb
Stop using `installsuffix` when building Go code. (#1327)
* Stop using `installsuffix` when building Go code.

See https://plus.google.com/117192131596509381660/posts/eNnNePihYnK.
`-installsuffix cgo` isn't necessary as of Go 1.10 (where build caching
changed substantially) and it probably wasn't necessary earlier.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-16 13:48:50 -10:00
Kevin Lingerfelt e5cce1abaf
Rename CLI from conduit to linkerd (#1312)
* Rename CLI binary
* Update integration tests for new binary name
* Rename --conduit-namespace flag, change default ns
* Rename occurrences of conduit in rest of CLI
* Rename inject and install components
* Remove conduit occurrences in docker files
* Additional miscellaneous cleanup
* Move protobuf definitions to linkerd2 package
* Rename conduit.io labels to use linkerd.io
* Rename conduit-managed segment to linkerd-managed
* Fix conduit references in web project

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-12 17:14:07 -07:00
Kevin Lingerfelt 1624a4ba0f
Ensure destination service always sends pod metadata (#1291)
* Ensure destination service always sends pod metadata
* Fix test that relied on hash ordering
* Stop using protobuf structs as map keys, fix logging

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-10 15:08:59 -07:00
Oliver Gould 941cad4a9c
Migrate build infrastructure to linkerd2 (#1298)
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
  github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
  binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
  github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
2018-07-09 15:38:38 -07:00
Kevin Lingerfelt 6f804d600c
Remove docker-compose / simulate-proxy environment (#1294)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-06 17:44:35 -07:00
Risha Mars 9050b2d312
Fix authority stat queries when a --from flag is used (#1289)
* Fix bug where we were using dst_authorities as a group by instead of authorities
* Add test to make sure we don't dst_authorities

Previously, we were only checking to make sure we didn't add 
dst_authorities in the query labels in promDstQueryLabels but we 
weren't checking the groupBy labels in promDstGroupByLabelNames - 
this caused us to try to query for dst_authorities when a --from 
query was sent. There are no dst_authorities, so there would be no 
named results.
2018-07-06 17:29:08 -07:00
Kevin Lingerfelt 693acdbf26
Update ListPods endpoint to return all pod owner types (#1275)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-05 15:14:16 -07:00
Risha Mars ba2e13c731
Small tweaks to error modal, add Reason to api error response (#1246)
- Add Reason to the error data passed from the api
- Rewrite error logic in the UI to try to make it clearer
- Show 0/0 pods meshed instead of 0/0 pods meshed (N/A) if 0 pods are meshed
2018-07-03 17:14:27 -07:00
Risha Mars 2002a8ba50
Add more tests for the stat summary endpoint --from flags (#1237)
Also add dst_ labels in the metrics we mock, so we can do --from queries with results.
2018-07-03 14:30:15 -07:00
Kevin Lingerfelt f0ba8f3ee8
Fix owner types in TLS identity strings (#1257)
* Fix owner types in TLS identity strings
* Update documentation on TLSIdentity struct

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-03 14:20:24 -07:00
Brian Smith 252a8d39d3
Generate an ephemeral CA at startup that distributes TLS credentials (#1245)
Create a ephemeral, in-memory TLS certificate authority and integrate it into the certificate distributor.

Remove the re-creation of deleted ConfigMaps; this will be added back later in #1248.

Signed-off-by: Brian Smith brian@briansmith.org
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-02 18:09:31 -10:00
Oliver Gould 20276b106e
tap: Support `tls` labeling (#1244)
The proxy's metrics are instrumented with a `tls` label that describes
the state of TLS for each connection and associated messges.

This same level of detail is useful to get in `tap` output as well.

This change updates Tap in the following ways:
* `TapEvent` protobuf updated:
  * Added `source_meta` field including source labels
  * `proxy_direction` enum indicates which proxy server was used.
* The proxy adds a `tls` label to both source and destination meta indicating the state of each peer's connection
* The CLI uses the `proxy_direction` field to determine which `tls` label should be rendered.
2018-07-02 17:19:20 -07:00
Kevin Lingerfelt a685dba873
Use parent name instead of pod name in identity string (#1236)
* Use parent name instead of pod name in identity string
* Update protobuf comment

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-29 14:28:13 -07:00
Risha Mars 8ebc969d2f
Fix bug where we wouldn't run stat table assertions if we expected 0 results (#1235)
I realized that our stat summary expectation checker would only check the actual
proto responses against the expectations if the expectations were non-empty.

Problem
If we expected empty results and the api returned actual results, we never actually 
check those results against the expectations.

The bug can be reproduced by replacing any nonzero metric we expect in 
expectedResponse with expectedResponse: genEmptyResponse() 
The tests on master will still pass.

Solution
Remove this line and ensure we get the expected number of stat tables.
2018-06-29 14:23:14 -07:00
Risha Mars 5ed7fc563c
Add controller component pod uptimes to the ServiceMesh page (#1205)
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component toolitps on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
2018-06-28 15:42:00 -07:00
Risha Mars 5963b2ac24
Better format empty errors (#1202) 2018-06-28 14:52:04 -07:00
Risha Mars 68586fe697
Add the ability to query stats by authority (#1181)
Adds the ability to query by a new non-kubernetes resource type, "authorities",
in the StatSummary api.

This includes an extensive refactor of stat_summary.go to deal with non-kubernetes 
resource types.

- Add documentation to Resource in the public api so we can use it for authority
- Handle non-k8s resource requests in the StatSummary endpoint
- Rewrite stat summary fetching and parsing to handle non-k8s resources
- keys stat summary metric handling by Resource instead of a generated string
- Adds authority to the CLI
- Adds /authorities to the Web UI
- Adds some more stat integration and unit tests
2018-06-28 14:31:44 -07:00
Brian Smith cca8e7077d
Add TLS support to `conduit inject`. (#1220)
* Add TLS support to `conduit inject`.

Add the settings needed to enable TLs when `--tls=optional` is passed on the
commend line. Later the requirement to add `--tls` will be removed.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-06-27 16:04:07 -10:00
Kevin Lingerfelt f502596577
Update go bindings for destination.proto change (#1223)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-27 18:26:13 -07:00
Kevin Lingerfelt b8ba627ee5
Update dest service with a different tls identity strategy (#1215)
* Update dest service with a different tls identity strategy
* Send controller namespace as separate field

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-27 11:40:02 -07:00
Kevin Lingerfelt af85d1714f
Add probes and log termination policy for distributor (#1178)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-21 14:02:41 -07:00
Kevin Lingerfelt 12f869e7fc
Add CA certificate bundle distributor to conduit install (#675)
* Add CA certificate bundle distributor to conduit install
* Update ca-distributor to use shared informers
* Only install CA distributor when --enable-tls flag is set
* Only copy CA bundle into namespaces where inject pods have the same controller
* Update API config to only watch pods and configmaps
* Address review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-21 13:12:21 -07:00
Kevin Lingerfelt 682b0274b5
Add controller admin servers and readiness probes (#1168)
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-20 17:32:44 -07:00
Risha Mars 0ff1bb4ad8
Don't allow stat requests for named resources in --all-namespaces (#1163)
Don't allow the CLI or Web UI to request named resources if --all-namespaces is used.

This follows kubectl, which also does not allow requesting named resources
over all namespaces.

This PR also updates the Web API's behaviour to be in line with the CLI's. 
Both will now default to the default namespace if no namespace is specified.
2018-06-20 12:59:31 -07:00
Risha Mars 46c99febf2
Don't panic on stats that aren't included in StatAllResourceTypes (#1154)
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list 
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files

Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
2018-06-19 17:00:16 -07:00
Kevin Lingerfelt 9a66641517
dest service: close open streams on shutdown (#1156)
* dest service: close open streams on shutdown
* Log instead of print in pkg packages
* Convert ServerClose to a receive-only channel

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-19 16:00:56 -07:00
Risha Mars e2c2f19d2c
Propagate errors in conduit containers to the api (#1117)
- It would be nice to display container errors in the UI. This PR gets the pod's container 
statuses and returns them in the public api

- Also add a terminationMessagePolicy to conduit's inject so that we can capture the 
proxy's error messages if it terminates
2018-06-14 16:22:31 -07:00
Oliver Gould 2a4f38b9e7
proto: Use explicit `go_package` option (#1120)
protobuf has a `go_package` option that can be used to explicitly name
Go packages such that they can be imported without additional rewrites.

This allows us to store proto files without additional, redundant
directories (which were used for packaging hints, previously).

This change adds an explicit `go_package` to all .proto files and
updates `bin/protoc-go.sh` to ensure these packages are output into
$GOPATH (so that the go_package can be absolute). This removes the need
to manually rewrite imports in bin/protoc-go.sh.
2018-06-14 14:03:00 -07:00
Kevin Lingerfelt 13aaa82c95
Allow k8s API clients to watch a subset of resources (#1118)
* Allow k8s API clients to watch a subset of resources
* Sort resources

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-14 11:09:01 -07:00
Kevin Lingerfelt 9f1df963e9
Move controller/util and web/util packages to pkg (#1109)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-13 11:25:56 -07:00
Kevin Lingerfelt b6d429e80d
dst svc: use shared informer instead of custom endpoints informer (#1079)
* Update destination service ot use shared informer instead of custom endpoints informer
* Add additional tests for dst svc endpoints watcher
* Remove service ports when all listeners unsubscribed
* Update go deps

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-13 11:11:57 -07:00
Kevin Lingerfelt bd1d1af38b
dst svc: use shared informer instead of pod watcher (#1073)
* Update desintation service to use shared informer instead of pod watcher
* Add const for pod IP index name

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-12 18:09:47 -07:00
Kevin Lingerfelt 6e66f6d662
Rename Lister to API and expose informers as well as listers (#1072)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-12 10:27:55 -07:00
Risha Mars 7d4c4aa290
CLI: print resources in the same order every time stat all is run (#1088)
Previously, in conduit stat all we would just print the map of stat results, which 
resulted in the order in which stats were displayed varying between prints.

Fix:
Define an array, k8s.StatAllResourceTypes and use the order in this array to print 
the map; ensuring a consistent print order every time the command is run.
2018-06-08 15:02:17 -07:00
Ivan Sim 11d1d55632 Filter out failed and completed pods from stats summary result (#1010) (#1065)
Both the conduit stat command and web UI are showing failed and completed pods.
This change filters out those pods before returning the result to the client.

Fixes #1010

Signed-off-by: Ivan Sim <ihcsim@gmail.com>
2018-06-05 13:19:48 -07:00
Kevin Lingerfelt eebc612d52
Add install flag for sending tls identity info to proxies (#1055)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-04 16:55:06 -07:00
Kevin Lingerfelt ec2433e9bd
Update controller to use 'tls' metric label (#1044)
* Update controller to use 'tls' metric label
* Fix meshed column formatter

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-01 16:44:33 -07:00
Eliza Weisman 5a42ce357e
proto: Add TLS identity to WeightedAddr message (#1041)
Required for #1008.

This PR adds the `TlsIdentity` message to the Destination service proto,
to describe what strategy the proxy should use for verifying an endpoint's TLS
certificates. It also adds a `TlsIdentity` field to the `WeightedAddr` message.

Currently, there is one possible variant for `TlsIdentity`, `KubernetesPodName`, 
which consists of the Kubernetes pod name of the endpoint, the namespace of
the endpoint, and the namespace of that pod's Conduit control plane. The proxy
should attempt to connect over TLS if the control plane namespace matches its 
own control plane namespace. The pod name and namespace are used to verify 
the endpoint's TLS certificate.

See https://github.com/runconduit/conduit/issues/386#issuecomment-392948046.

This change was initially part of #1008, but I factored it out to make the diff
smaller.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-31 11:48:25 -07:00
Risha Mars ffabdefc6c
Add queries to prometheus to determine number of fully meshed requests (#983)
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label. 
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh

This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).

The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
2018-05-24 11:05:09 -07:00
Andrew Seigner 8a3b1a638a
Introduce meshed label in simulate-proxy (#992)
The proxy does not yet support a `meshed` label.

In anticipation of a `meshed` label in the proxy, introduce this label
in `simulate-proxy`, for testing.

Relates to #306 and #386.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

secured -> meshed

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-23 15:06:11 -07:00
Andrew Seigner 84e6eb5c87
Fix nil pointer dereference in StatSummary (#991)
The StatSummary endpoint was dereferencing
StatSummaryRequest.Selector.Resource, causing a panic when it received
an empty request.

Fix StatSummary to use the nil-friendly
StatSummaryRequest.GetSelector().GetResource() methods, and add a test
to validate.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-23 13:21:49 -07:00
Risha Mars 1e6434f6de
Fix bug in the public-api where conduit stat params were ignored (#971)
* Fix bug where we were dropping parts of the StatSummaryRequest
* Add tests for prometheus query strings and for failed cases

Problem
In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types, 
we would copy the Resource, LabelSelector and TimeWindow of the original request, and then 
go through all the resource types and set Resource.Type for each resource we wanted to get.
The bug is that while we copy over some fields of the original request, we didn't copy over all 
of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would 
ignore any --to or --from flags, and would ignore requests for a specific named resource.

Solution
Copy over all fields from the request.

I've also added tests for this case. In this process I've refactored the stat_summary_test code 
to make it a bit easier to read/use.
2018-05-18 16:06:06 -07:00
Kevin Lingerfelt 36ec391dbe
Go: update k8s dependencies to 1.10.2 (#962)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-17 15:46:58 -07:00
Risha Mars b8dc83f9d2
Modify the Stat API to handle requests for resource type "all" (#928)
Allow the Stat endpoint in the public-api to accept requests for resourceType "all".

Currently, this queries Pods, Deployments, RCs and Services, but can be modified 
to query other resources as well.

Both the CLI and web endpoints now work if you set resourceType to all.

e.g. `conduit stat all`
2018-05-11 14:35:37 -07:00
Kevin Lingerfelt 4e8e1eb84d
CLI: Fix validation for service stats (#935)
* CLI: Fix validation for service stats
* Address review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-11 10:28:49 -07:00
Oliver Gould a786089fd6
docker: Cache versionless builds before building versioned go binaries (#921)
The way that git-related version information is linked into go binaries
busts Docker's cache such that every commit causes all binaries to
rebuilt.

In order to ameliorate this, we can build each binary once without
version information first so that its artifacts are cached. When Go
sources are not changed and only the version information changes, builds
are 4.3x faster than before (from 5+ minutes to <90s).

On `master`

Branch off of master and build (mostly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  9.10s user 6.30s system 5% cpu 4:26.47 total
```

Rebuild without changing anything (highly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  9.23s user 6.04s system 47% cpu 32.017 total
```

Update only the git sha and rebuild:

```
:; git ci -am 'bump it' --allow-empty
[ver/eg 2749eb3] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  8.55s user 6.08s system 4% cpu 5:22.25 total
```

On this branch:

Rebuild without changing anything (highly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  8.94s user 5.97s system 46% cpu 32.257 total
```

Update only the git sha and rebuild:

```
:; git ci -am 'bump it' --allow-empty
[ver/go-docker-cache-versionless 77a80b5] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build-cli-bin  2.02s user 1.34s system 9% cpu 34.144 total
```
2018-05-10 10:22:09 -07:00
Risha Mars 416381cdfd
Fix bug where GetPodsFor(pod) was returning all pods in a namespace (#900)
* Fix bug where GetPodsFor(pod) was returning all pods in a namespace

Problem
In lister.GetPodsFor, when the input object was a pod, we would return all the pods in the namespace. I would expect GetPodsFor(pod) to return only one pod - the pod itself.

Cause
The cause of this is that when the object type was pod we were setting the selector to selector = labels.Everything() which gets all the pods in the namespace.

Fix
Special case GetPodsFor(pod) to return the pod itself, rather than looking up pods via labels.
2018-05-08 13:52:49 -07:00
Risha Mars f94856e489
Modify the Stat endpoint to also return the number of failed conduit pods (#895)
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count

This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
2018-05-08 10:35:21 -07:00
Brian Smith c5d2dab8bd
Remove special support for ExternalName services (#764)
After this was implemented we found that ExternalName services are
represented in DNS as CNAMEs, which means that the proxy's DNS
fallback logic can be used instead of doing DNS in the control
plane. Besides simplifying the controller, this will also increase
fidelity with the proxied pods' DNS configuration (improve
transparency).

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-25 11:53:33 -10:00
Andrew Seigner dce31b888f
Deprecate Tap, rename TapByResource to Tap (#844)
The `conduit tap` command is now deprecated.

Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.

Fixes #804

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-25 12:24:46 -07:00
Andrew Seigner a0a9a42e23
Implement Public API and Tap on top of Lister (#835)
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.

This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-24 18:10:48 -07:00
Andrew Seigner 03d4684d3b
Introduce K8s Lister, integrate simulate-proxy (#829)
The Kubernetes client-go Informer/Lister APIs are implemented in several
parts of the code base.

This change introduces a Lister module, providing Informer/Lister
capability through a simple interface. Once this merges, we can follow
up with moving public-api and tap onto Lister.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 16:44:19 -07:00
Andrew Seigner baf4ea1a5a
Implement TapByResource in Tap Service (#827)
The TapByResource endpoint was previously a stub.

Implement end-to-end tapByResource functionality, with support for
specifying any kubernetes resource(s) as target and destination.

Fixes #803, #49

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 16:13:26 -07:00
Eliza Weisman d9112abc93
proxy: remove unused metrics (#826)
This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md`

Closes #821
2018-04-23 16:05:20 -07:00
Andrew Seigner 39eccb09e2
cli: standardize kubernetes resource parsing (#830)
The Tap command leveraged new cli parsing code, enabling Kubernetes
resources specified as `(TYPE [NAME] | TYPE/NAME)`. The Stat command
did not use this.

Modify the Stat command to use the same cli flag parsing code as Tap.
Remove the to/from-resource flags from Stat.

Fixes #792

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 15:17:42 -07:00
Eliza Weisman 8147a363e9
Make simulate-proxy match proxy output (#822)
This PR makes two changes to the `simulate-proxy` script: 

1.  Removed the `protocol={"http", "tcp"}` label from TCP metrics. The proxy no longer adds this label (see https://github.com/runconduit/conduit/pull/785#discussion_r182563499).

2. Fixed failed responses being labeled with `classification="fail"` rather than `classification="failure"` (the label the proxy sets). I noticed that while I was here and decided to fix it as well.

Note that the first change required some minor changes to the `proxyMetricCollectors` struct in `simulate-proxy`; since the label cardinality for TCP open stats decreased by one due to removing the `protocol` label, it's no longer necessary for that struct to `haveCounterVec`/`GaugeVec` pointers for these stats. It now owns the actual `Counter`/`Gauge` instead. This means that the metric vecs that are created to be labeled for `inbound` and `outbound` are now stored as variables in the `newSimulatedProxy` function rather than going in a `proxyMetricCollectors` struct first. This shouldn't impact behaviour at all.
2018-04-20 12:11:57 -07:00
Andrew Seigner 79bdc638b3
Service support in stat command (#809)
The `stat` command did not support `service` as a resource type.

This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands

Fixes #805

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 16:51:20 -07:00
Eliza Weisman 6eec6256f7
Add transport-level metrics to simulate-proxy (#811)
This PR adds the transport-level metrics described in #742 to the `simulate-proxy` script. This will be useful while adding these metrics to the Grafana dashboard and/or CLI.

Closes #793
2018-04-19 15:18:43 -07:00
Andrew Seigner 293e00bc3e
Introduce tapByResource cli command (#802)
The existing `tap` command is being deprecated.

Introduce a `tapByResource` cli command. It supports tapping a Kubernetes
resource or collection of resources, optionally filtered by outbound resources.
This command will eventually replace `tap`.

Part of #778

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 14:44:23 -07:00
Kevin Lingerfelt 653dc6bfaa
Add replication controller stats in CLI (#794)
* Add replication controller stats in CLI
* Fix pod status in stat summary tests

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 18:12:14 -07:00
Oliver Gould 06dd8d90ee
Introduce the TapByResource API (#778)
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely 
reflecting the proxy's capabilities.

The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.

Now both `Tap` and `TapByResource`'s responses may include destination
labels.

This change avoids breaking backwards compatibility by:

* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
2018-04-18 15:37:07 -07:00
Andrew Seigner 1e4ac8fda8
Destination service provides pod-template-hash (#784)
The Destination service does not provide ReplicaSet information to the
proxy.

The `pod-template-hash` label approximates selecting over all pods in a
ReplicaSet or ReplicationController. Modify the Destination service to
provide this label to the proxy.

Relates to #508 and #741

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-18 14:41:27 -07:00
Kevin Lingerfelt 71a51afb40
Expose pod stats in CLI, web UI, and Grafana (#788)
* Expose pod stats in CLI, web UI, and Grafana
* Fix js api helpers test
* Add outbound traffic stats to pod dashboard

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 11:26:47 -07:00
Andrew Seigner 9e8cce0838
Destination service returns "Running" pod labels (#781)
When the Destination sees an IP address, it looks up Pods by that IP,
and associates Pod label data to it. If the lookup by IP returned more
than one Pod, it simply picked the first one. This is not correct,
specifically in cases where one pod is in a Running state, and others
are not.

Modify the Destination service to only return label data for Pods in the
Running state.

Fixes #773

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-17 14:42:54 -07:00
Andrew Seigner 727521f914
Permit arbitrary time windows in public-api (#774)
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.

Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.

Fixes #686

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-16 17:37:17 -07:00
Kevin Lingerfelt 11a4359e9a
Misc cleanup following the telemetry rewrite (#771)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-16 15:51:07 -07:00
Andrew Seigner 77fb6d3709
Add namespace as a resource type in public-api (#760)
* Add namespace as a resource type in public-api

The cli and public-api only supported deployments as a resource type.

This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes

Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* Refactor filter and groupby label formulation

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Rename stat_summary.go to stat.go in cli

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Update rbac privileges for namespace stats

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 16:53:01 -07:00
Andrew Seigner 21886760c6
Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762)
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.

Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.

Fixes #761

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-13 12:08:16 -07:00
Kevin Lingerfelt fb15fe7c1a
Remove the telemetry service (#757)
* Remove the telemetry service

The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.

* Fix time window tests

* Remove deprecated controller scrape config

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 11:21:29 -07:00
Andrew Seigner e9b209829d
Handle NaN metrics (#750)
The Prometheus client sometimes returns NaN if a calculation is invalid,
such as histogram_quantile when no requests have occurred.

Add IsNaN check in the public-api and set output to zero.

Fixes #747

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-12 15:21:00 -07:00
Andrew Seigner 624b87f743
Implement ListPods in public-api (#743)
The ListPods endpoint's logic resides in the telemetry service, which is
going away.

Move ListPods logic into public-api, use new k8s informer APIs.

Fixes #694

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 17:53:57 -07:00
Kevin Lingerfelt 47caf1ca07
Add --all-namespaces flag to CLI statsummary command (#745)
* Add --all-namespaces flag to CLI statsummary command

* Fix statsummary output formatting

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-11 16:40:25 -07:00
Andrew Seigner 259fdcd134
Add latency stats in new stat summary endpoint (#737)
The new StatSummary endpoint was only providing request volume and
successs rate information.

Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.

Fixes #681

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 11:58:32 -07:00
Kevin Lingerfelt e1e1b6b599
Controller: add more destination labels, fix service label (#731)
* Add more destination labels, fix service label

* Update owner labels to match proxy metrics docs

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-11 10:44:52 -07:00
Kevin Lingerfelt 91c359e612
Switch public API to use cached k8s resources (#724)
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-10 11:39:31 -07:00
Andrew Seigner 3a341abe9a
Fix success rate calculation in public api (#723)
The success rate calculation relies on the `classification` label, but
was incorrectly specifying `fail` rather than `failure`.

Fix public api to specify `failure`. Also re-org public api tests for
easier Kubernetes and Prometheus mocking.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-10 11:04:04 -07:00
Andrew Seigner 716b392231
Move StatSummary logic into grpc server (#717)
The StatSummary logic was implemented as a method on http_server.

Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 16:46:15 -07:00
Andrew Seigner 50c323c617
Use canonical k8s names, fix prom labels (#702)
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.

Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.

Fixes #695

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 12:34:54 -07:00
Risha Mars 2f5b5ea5f2
Start implementing conduit stat summary endpoint (#671)
Start implementing new conduit stat summary endpoint. 
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.

Current implementation just retrieves requests and mesh/total pod count 
(so latency stats are always 0). 

Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627

This branch includes commits from @klingerf 

* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
2018-04-05 17:05:06 -07:00
Andrew Seigner 28d5007cdf
Harmonize Prometheus label usage (#690)
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.

Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployement`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.

Fixes #655

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 15:09:06 -07:00
Risha Mars d1a39ea6bf
Define a new telemetry Stat API (#663)
* Define a new telemetry Stat API

Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627.
StatSummary will replace Stat once implemented and the original Stat deleted.
2018-04-03 14:45:58 -07:00