Commit Graph

399 Commits

Author SHA1 Message Date
Eliza Weisman 846975a190
Remove proxy bind timeout from CLIs (#2017)
This branch removes the `--proxy-bind-timeout` flag from the 
`linkerd inject` and `linkerd install` CLI commands, and the
`LINKERD2_PROXY_BIND_TIMEOUT` environment variable from their output.
This is in preparation for removing that timeout from the proxy (as
described in #2013). 

I thought it was prudent to remove this from the CLIs before removing it
from the proxy, so we can't create a situation where the CLIs produce
output that results in broken proxy containers.

Fixes #2013

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2019-01-24 15:34:09 -08:00
Alex Leong d542571b65
GetProfiles should always respond with a (possibly empty) profile immediately (#2146)
When `GetProfiles` is called for a destination that does not have a service profile, the proxy-api service does not return any messages until a service profile is created for that service.  This can be interpreted as hanging, and can make it difficult to calculate response latency metrics.

Change the behavior of the API to always return a service profile message immediately.  If the service does not have a service profile, the default service profile is returned.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-01-24 15:22:14 -08:00
zak 8c413ca38b Wire up stats commands for daemonsets (#2006) (#2086)
DaemonSet stats are not currently shown in the cli stat command, web ui
or grafana dashboard. This commit adds daemonset support for stat.

Update stat command's help message to reference daemonsets.
Update the public-api to support stats for daemonsets.
Add tests for stat summary and api.

Add daemonset get/list/watch permissions to the linkerd-controller
cluster role that's created using the install command.
Update golden expectation test files for install command
yaml manifest output.

Update web UI with daemonsets
Update navigation, overview and pages to list daemonsets and the pods
associated to them.
Add daemonset paths to server, and ui apps.

Add grafana dashboard for daemonsets; a clone of the deployment
dashboard.

Update dependencies and dockerfile hashes

Add DaemonSet support to tap and top commands

Fixes of #2006

Signed-off-by: Zak Knill <zrjknill@gmail.com>
2019-01-24 14:34:13 -08:00
Alex Leong 32efab41b5
Fix panic when routes is called in single-namespace mode (#2123)
Fixes #2119 

When Linkerd is installed in single-namespace mode, the public-api container panics when it attempts to access watch service profiles.

In single-namespace mode, we no longer watch service profiles and return an informative error when the TopRoutes API is called.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-01-23 16:47:05 -08:00
Alena Varkockova 28f662c9c6 Introduce resource selector and deprecate namespace field for ListPods (#2025)
* Introduce resource selector and deprecate namespace field for ListPods
* Changes from code review
* Properly deprecate the field
* Do not check for nil
* Fix the mockProm usage
* Protoc changes revert
* Changed from code review

Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
2019-01-23 10:35:55 -08:00
Dennis Adjei-Baah f9cd9366d9
Surface logs from control plane pods (#2037)
When debugging control plane issues or issues pertaining to a linkerd proxy, it can be cumbersome to get logs from affected containers quickly. 

This PR adds a new `logs` command to the Linkerd CLI to surface log lines from any container within linkerd's control plane. This feature relies heavily on [stern](https://github.com/wercker/stern), which already provides this behavior. This PR integrates this package into the Linkerd CLI to allow users to quickly retrieve logs whenever they run into issues when using Linkerd. 

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2019-01-16 19:24:42 -08:00
Alex Leong a562f8b9fd
Improve routes command to list all routes (#2066)
Fixes #1875 

This change improves the `linkerd routes` command in a number of important ways:

* The restriction on the type of the `--to` argument is lifted and any resource type can now be used.  Try `--to ns/books`, `--to po/webapp-ABCDEF`, `--to au/linkerd.io`, or even `--to svc`.
* All routes for the target will now be populated in the table, even if there are no Prometheus metrics for that route.
* [UNKNOWN] has been renamed to [DEFAULT]
* The `Service/Authority` column will now list `Service` in all cases except for when an authority target is explicitly requested.

```
$ linkerd routes deploy/traffic --to deploy/webapp
ROUTE                       SERVICE   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
GET /                        webapp   100.00%   0.5rps          50ms         180ms         196ms
GET /authors/{id}            webapp   100.00%   0.5rps         100ms         900ms         980ms
GET /books/{id}              webapp   100.00%   0.9rps          38ms          93ms          99ms
POST /authors                webapp   100.00%   0.5rps          35ms          48ms          50ms
POST /authors/{id}/delete    webapp   100.00%   0.5rps          83ms         180ms         196ms
POST /authors/{id}/edit      webapp     0.00%   0.0rps           0ms           0ms           0ms
POST /books                  webapp    45.16%   2.1rps          75ms         425ms         485ms
POST /books/{id}/delete      webapp   100.00%   0.5rps          30ms          90ms          98ms
POST /books/{id}/edit        webapp    56.00%   0.8rps          92ms         875ms         975ms
[DEFAULT]                    webapp     0.00%   0.0rps           0ms           0ms           0ms
```

This is all made possible by a shift in the way we handle the destination resource.  When we get a request with a `ToResource`, we use the k8s API to find all Services which include at least one pod belonging to that resource.  We then fetch all service profiles for those services and display the routes from those serivce profiles.  

This shift in thinking also precipitates a change in the TopRoutes API where we no longer need special cases for `ToAll` (which can be specified by `--to au`) or `ToAuthority` (which can be specified by `--to au/<authority>`) and instead can use a `ToResource` to handle all cases.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-01-16 17:15:35 -08:00
Andrew Seigner 92f2cd9b63
Update check and inject output (#2087)
The outputs of the `check` and `inject` commands did not vary much
between successful and failed executions, and were a bit verbose and
challenging to parse.

Reorganize output of `check` and `inject` commands, to provide more
output when errors occur, and less output when successful.

Specific changes:

`linkerd check`
- visually group checks by category
- introduce `hintURL`'s, to provide doc links when checks fail
- add spinners when retrying, remove additional retry lines
- colored unicode characters to indicate success/warning/failure

`linkerd inject`
- modify default output to mirror `kubectl apply`
- only output non-successful inject reports
- support `--verbose` flag to output all inject reports

Fixes #1471, #1653, #1656, #1739

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-01-16 15:14:14 -08:00
Alex Leong 771542dde2
Add support for retries (#2038) 2019-01-16 14:13:48 -08:00
Kevin Lingerfelt ed3fbd75f3
Setup port-forwarding for linkerd dashboard command (#2052)
* Setup port-forwarding for linkerd dashboard command
* Output port-forward logs when --verbose flag is set

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2019-01-10 10:16:08 -08:00
Andrew Seigner a91c77d0bf
Followups from lint/comment changes (#2032)
This is a followup branch from #2023:
- delete `proxy/client.go`, move code to `destination-client`
- move `RenderTapEvent` and stat functions from `util` to `cmd`

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-01-02 15:28:09 -08:00
Andrew Seigner 1c302182ef
Enable lint check for comments (#2023)
Commit 1: Enable lint check for comments

Part of #217. Follow up from #1982 and #2018.

A subsequent commit will fix the ci failure.

Commit 2: Address all comment-related linter errors.

This change addresses all comment-related linter errors by doing the
following:
- Add comments to exported symbols
- Make some exported symbols private
- Recommend via TODOs that some exported symbols should should move or
  be removed

This PR does not:
- Modify, move, or remove any code
- Modify existing comments

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-01-02 14:03:59 -08:00
Kevin Lingerfelt f1b0983f72
Add go linting to CI config (#2018)
* Add go linting to CI config
* Fix lint warnings
* Add note about bin/lint script in TEST.md

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-20 15:33:09 -08:00
Radu M 07cbfe2725 Fix most golint issues that are not comment related (#1982)
Signed-off-by: Radu Matei <radu@radu-matei.com>
2018-12-20 10:37:47 -08:00
Kevin Lingerfelt 10d5ebd064
Fix flaky certificate controller test (#2009)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-19 17:10:17 -08:00
Alex Leong cb3fa1245b
Remove TLS column from routes command output (#1956)
Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-14 21:52:49 -08:00
Kevin Lingerfelt 0866bb2a41
Remove runAsGroup field from security context settings (#1986)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-13 15:12:13 -08:00
Kevin Lingerfelt 86e95b7ad3
Disable serivce profiles in single-namespace mode (#1980)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-13 14:37:18 -08:00
Kevin Lingerfelt 00de48bd26
Fix proxy-api handling of named target ports (#1973)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-12 13:42:47 -08:00
Cody Vandermyn 8e4d9d2ef6 add securityContext with runAsUser: {{.ControllerUID}} to the various cont… (#1929)
* add securityContext with runAsUser: {{.ProxyUID}} to the various containers in the install template
* Update golden to reflect new additions
* changed to a different user id than the proxy user id
* Added a controller-uid install option
* change the port that the proxy-injector runs
* The initContainers needs to be run as the root user.
* move security contexts to container level

Signed-off-by: Cody Vandermyn <cody.vandermyn@nordstrom.com>
2018-12-11 11:51:28 -08:00
Alejandro Pedraza 8c67bfbcc6 Add parameter to stats API to skip retrieving Prometheus stats (#1871)
* Add parameter to stats API to skip retrieving Prometheus stats

Used by the dashboard to populate list of resources.

Fixes #1022

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Prometheus queries check results were being ignored
* Refactor verifyPromQueries() to also test when no prometheus queries
should be generated

* Add test for SkipStats=true

Includes adding ability to public.GenStatSummaryResponse to not generate
basicStats

* Fix previous test
2018-12-10 16:48:12 -08:00
Kevin Lingerfelt 0f8bcc9159
Controller: wait for caches to sync before opening listeners (#1958)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-07 11:15:45 -08:00
Alex Leong 04ed200e36
Rename path_regex to pathRegex (#1951)
Rename snake case fields to camel case in service profile spec.  This improves the way they are rendered when the `kubectl describe` command is used.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-06 11:51:33 -08:00
Andrew Seigner bef9479f57
Add input validation for profile command (#1934)
Fixes #1878

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-12-05 15:13:10 -08:00
Alex Leong cbb196066f
Support service profiles for external authorities (#1928)
Add support for service profiles created on external (non-service) authorities.  For example, this allows you to create a service profile named `linkerd.io` which will apply to calls made to `linkerd.io`.

This is done by changing the `LINKERD2_PROXY_DESTINATION_PROFILE_SUFFIXES` to `.` so that the proxy will attempt to lookup a service profile for any authority.  We provide the `--disable-external-profiles` proxy flag to revert this behavior in case it is a problem.

We also refactor the proxy-api implementation of GetProfiles so that it does the profile lookup, regardless of if the authority looks like a Kubernetes service name or not.  To simplify this, support for multiple resolves (which was unused) was removed.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-05 14:32:59 -08:00
Oliver Gould 8f9bb711dd
proxy-api: Expose a flag to control auto-h2-upgrade (#1925)
When debugging issues, it's helpful to disable HTTP/2 upgrading to
simplify diagnostics.

This chagne adds an `enable-h2-ugprade` flag to _proxy-api_. When this
flag is set to false, the proxy-api will not suggest that meshed
endpoints are upgraded to use HTTP/2.

As a follow-up, a flag should be added to `install` to control how the
proxy-api is initialized.
2018-12-05 12:41:20 -08:00
Alex Leong 380ec52a39
Rework routes command to accept any resource (#1921)
We rework the routes command so that it can accept any Kubernetes resource, making it act much more similarly to the stat command.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-05 11:11:34 -08:00
Alex Leong 4f3e55e937
Rename path to path_regex in ServiceProfile CRD (#1923)
We rename path to path_regex in the ServiceProfile CRD to make it clear that this field accepts a regular expression. We also take this opportunity to remove unnecessary line anchors from regular expressions now that these anchors are added in the proxy.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-05 10:42:47 -08:00
Kevin Lingerfelt 37ae423bb3
Add linkerd- prefix to all objects in linkerd install (#1920)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-04 15:41:47 -08:00
Andrew Seigner ad2366f208
Revert proxy readiness initialDelaySeconds change (#1912)
Reverts part of #1899 to workaround readiness failures.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-12-04 14:27:55 -08:00
Andrew Seigner 37a5455445
Add filtering by job in stat, tap, top; fix panic (#1904)
Filtering by Kubernetes job was not supported. Also filtering by any unknown
type caused a panic.

Add filtering support by Kubernetes job, with special case mapping `job` to
`k8s_job`, to not conflict with Prometheus' job label.

Fix panic when unknown type specified as a `--from` or `--to` flag.

Fix `job` label from `linkerd-proxy` overwriting Prometheus `job` label at
collection time. This caused all metrics collected by proxy sidecars in
Kubernetes jobs to be collected into an incorrect Prometheus job, rather than
the expected `linkerd-proxy` Prometheus job.

Fix `unsupported resource type` tap error message incorrectly printing the
target resource rather than the destination.

Set `--controller-log-level debug` in `install_test.go` for easier debugging.

Expose `slow-cooker`'s metrics via a k8s service in the tap integration test, to
validate proxy requests with a job as destination.

Fixes #1872
Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-12-03 15:34:49 -08:00
Oliver Gould 926395f616
tap: Include route labels in tap events (#1902)
This change alters the controller's Tap service to include route labels
when translating tap events, modifies the public API to include route
metadata in responses, and modifies the tap CLI command to include
rt_ labels in tap output (when -o wide is used).
2018-12-03 13:52:47 -08:00
Andrew Seigner d121071f87
Adjust proxy, Prometheus, and Grafana probes (#1899)
* Adjust proxy, Prometheus, and Grafana probes

High `readinessProbe.initialDelaySeconds` values delayed the controller's
readiness by up to 30s, preventing cli commands from succeeding shortly after
control plane deployment.

Decrease `readinessProbe.initialDelaySeconds` in the proxy, Prometheus, and
Grafana to the default 0s. Also change `linkerd check` controller pod ordering
to: controller, prometheus, web, grafana.

Detailed probe changes:
- proxy
  - decrease `readinessProbe.initialDelaySeconds` from 10s to 0s
- prometheus
  - decrease `readinessProbe.initialDelaySeconds` from 30s to 0s
  - decrease `readinessProbe.timeoutSeconds` from 30s to 1s
  - decrease `livenessProbe.timeoutSeconds` from 30s to 1s
- grafana
  - decrease `readinessProbe.initialDelaySeconds` from 30s to 0s
  - decrease `readinessProbe.timeoutSeconds` from 30s to 1s
  - decrease `readinessProbe.failureThreshold` from 10 to 3
  - increase `livenessProbe.initialDelaySeconds` from 0s to 30s

Fixes #1804

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-12-03 10:41:11 -08:00
Alex Leong f9d66cf4de
Add --open-api option to linkerd profiles command (#1867)
The `--open-api` flag is an alternative to the `--template` flag for the `linkerd profile` command.  It reads an OpenAPI specification file (also called a swagger file) and uses it to generate a corresponding service profile.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-11-30 09:25:19 -08:00
Andrew Seigner 34d9eef03e
proxy injector: insert at end of arrays (#1881)
When using `--proxy-auto-inject` with Kuberntes `v1.9.11`, observed auto
injector incorrectly merging list elements rather than inserting new
ones. This issue was not reproducible on `v1.10.3`.

For example, this input:
```
spec:
  template:
    spec:
      containers:
      - name: vote-bot
        command:
        - emojivoto-vote-bot
```

Would yield:
```
spec:
  template:
    spec:
      containers:
      - name: linkerd-proxy
        command:
        - emojivoto-vote-bot
      - name: vote-bot
        command:
        - emojivoto-vote-bot
```

This change replaces json patch specs like
`/spec/template/spec/containers/0` with
`/spec/template/spec/containers/-`. The former is intended to insert at
the beggining of a list, the latter at the end. This also simplifies the
code a bit and more closely aligns with the intent of injecting at the
end of lists.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-11-28 14:21:18 -08:00
Risha Mars f8583df4db
Add ListServices to controller public api (#1876)
Add a barebones ListServices endpoint, in support of autocomplete for services.
As we develop service profiles, this endpoint could probably be used to describe
more aspects of services (like, if there were some way to check whether a
service profile was enabled or not).

Accessible from the web UI via http://localhost:8084/api/services
2018-11-27 11:34:47 -08:00
Alex Leong 73836f05cf
Update proxy version and use canonicalized dst (#1866)
The `linkerd` routes command only supports outbound metrics queries (i.e. ones with the `--from` flag).  Inbound queries (i.e. ones without the `--from` flag) never return any metrics.

We update the proxy version and use the new canonicalized form for dst labels to gain support for inbound metrics as well.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-11-26 17:20:07 -08:00
Oliver Gould ba11698d4b
tap: Use nil-safe protobuf accessors (#1873)
The tap server accesses protobuf fields directly instead of using the
`Get*()` accessors. The accessors are necessary to prevent dereferencing
a nil pointer and crashing the tap service.

Furthermore, these maps are explicitly initialized when `nil` to support
label hydration.
2018-11-26 14:14:28 -08:00
Alex Leong 7a7f6b6ecb
Add TopRoutes method the the public api and route CLI command to consume it (#1860)
Add a routes command which displays per-route stats for services that have service profiles defined.

This change has three parts:
* A new public-api RPC called `TopRoutes` which serves per-route stat data about a service
* An implementation of TopRoutes in the public-api service.  This implementation reads per-route data from Prometheus.  This is very similar to how the StatSummaries RPC and much of the code was able to be refactored and shared.
* A new CLI command called `routes` which displays the per-route data in a tabular or json format.  This is very similar to the `stat` command and much of the code was able to be refactored and shared.

Note that as of the currently targeted proxy version, only outbound route stats are supported so the `--from` flag must be included in order to see data.  This restriction will be lifted in an upcoming change once we add support for inbound route stats as well.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-11-19 12:20:30 -08:00
Kevin Leimkuhler c68693e820
Fix stat filtering for `--from` queries (#1856)
# Problem
When we add a `--from` query to `linkerd stat au` we get more rows than if we would have just run `linkerd stat au`.

Adding a `--from` causes an extra row to be added, and the named authority to be ignored (this is the result we would have expected when running `linkerd stat au -n emojivoto --from deploy/web`).

# Solution
Destination query labels are now appended to `labels` so that those labels can be filtered on.

# Validation
Tests have been updated to reflect the expected expected destination labels now appended in `--from` queries.

Fixes #1766 

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2018-11-14 10:52:27 -08:00
Alejandro Pedraza bbcf5a8c9f Allow stat summary to query for multiple resources (#1841)
* Refactor util.BuildResource so it can deal with multiple resources

First step to address #1487: Allow stat summary to query for multiple
resources

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Update the stat cli help text to explain the new multi resource querying ability

Propsal for #1487: Allow stat summary to query for multiple resources

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Allow stat summary to query for multiple resources

Implement this ability by issuing parallel requests to requestStatsFromAPI()

Proposal for #1487

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Update tests as part of multi-resource support in `linkerd stat` (#1487)

- Refactor stat_test.go to reuse the same logic in multiple tests, and
add cases and files for json output.
- Add a couple of cases to api_utils_test.go to test multiple resources
validation.

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* `linkerd stat` called with multiple resources should keep an ordering (#1487)

Add SortedRes holding the order of resources to be followed when
querying `linkerd stat` with multiple resources

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Extra validations for `linkerd stat` with multiple resources (#1487)

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* `linkerd stat` resource grouping, ordering and name prefixing (#1487)

- Group together stats per resource type.
- When more than one resource, prepend name with type.
- Make sure tables always appear in the same order.

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Allow `linkerd stat` to be called with multiple resources

A few final refactorings as per code review.

Fixes #1487

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
2018-11-14 10:44:04 -08:00
Igor Zibarev 60bcdb15f9 controller: use GetConfig from pkg/k8s package (#1857)
This commit removes duplicate logic that loads Kubernetes config and
replaces it with GetConfig from pkg/k8s. This also allows to load
config from default sources like $KUBECONFIG instead of explicitly
passing -kubeconfig option to controller components.

Signed-off-by: Igor Zibarev <zibarev.i@gmail.com>
2018-11-13 14:41:31 -08:00
Alex Leong 32d556e732
Improve ergonomics of service profile spec (#1828)
We make several changes to the service profile spec to make service profiles more ergonomic and to make them more consistent with the destination profile API.

* Allow multiple fields to be simultaneously set on a RequestMatch or ResponseMatch condition.  Doing so is equivalent to combining the fields with an "all" condition.
* Rename "responses" to "response_classes"
* Change "IsSuccess" to "is_failure"

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-31 12:00:22 -07:00
Alex Leong d8b5ebaa6d
Remove the proxy-api container (#1813)
A container called `proxy-api` runs in the Linkerd2 controller pod.  This container listens on port 8086 and serves the proxy-api but does nothing other than forward gRPC requests to the destination container which listens on port 8089.

We remove the proxy-api container altogether and change the destination container to listen on port 8086 instead of 8089.  The result is that clients still use the proxy-api by connecting to `proxy-api.<ns>.svc.cluster.local:8086` but the controller has one fewer containers.  This results in a simpler system that is easier to reason about.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-29 16:31:43 -07:00
Alex Leong 82ca821e62
Use fqdn for service profile name (#1808)
Service profiles must be named in the form `"<service>.<namespace>"`.  This is inconsistent with the fully normalized domain name that the proxy sends to the controller.  It also does not permit creating service profiles for non-Kubernetes services.

We switch to requiring that service profiles must be named with the FQDN of their service.  For Kubernetes services, this is `"<service>.<namespace>.svc.cluster.local"`.

This change alone is not sufficient for allowing service profile for non-Kubernetes services because the k8s resolver will ignore any DNS names which are not Kubernetes services.  Further refactoring of the resolver will be required to allow looking up non-Kubernetes service profiles in Kuberenetes.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-29 14:35:42 -07:00
Alex Leong 622185a4dd
Send metric labels in profile API (#1800)
* Send metric labels in profile API

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-29 14:28:09 -07:00
Oliver Gould 0e91dbb18d
Implement GetProfile for the proxy-api service (#1801)
The `proxy-api` service included a stub implementation of `GetProfile`
instead of forwarding requests to the `destination` service.

This change fills in the proxy-api service's `GetProfile` implementation
to forward requests to the destination service.
2018-10-24 12:37:29 -07:00
Alex Leong f549868033
Fix integration test and docker build (#1790)
Fix broken docker build by moving Service Profile conversion and validation into `/pkg`.

Fix broken integration test by adding service profile validation output to `check`'s expected output.

Testing done:
* `gotest -v ./...`
* `bin/docker-build`
* `bin/test-run (pwd)/bin/linkerd`

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-19 10:23:34 -07:00
Alex Leong 5210b7b44a
Add check for service profile validation (#1775)
Add a check to `linkerd check` which validates all service profile resources.  In particular it checks:
* does the service profile refer to an existent service
* is the service profile valid

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-18 16:37:39 -07:00
Alex Leong 43c22fe967
Implement getProfiles method in destination service (#1759)
We implement the getProfiles method in the destination service. This method returns a stream of destination profiles for a given authority. It does this by looking up the ServiceProfile resource in the controller namespace named `<svc>.<ns>` where `<svc>` is the name of the service and `<ns>` is the namespace of the service.

This PR includes:
* Adding a ServiceProfile Custom Resource Definition to linkerd install
* A watch based implementation of the getProfiles method in the destination service, similar to the implementation of get.
* An update to the destination client script that allows querying the getProfiles method.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-16 15:39:12 -07:00
Ivan Sim 1100c4fa8c Proxy injector must preserve the original pod template labels and annotations (#1765)
* Ensure that the proxy injector mutating webhook preserves the original labels
and annotations

The deployment's selector must also match the pod template labels in
newer version of Kubernetes.

This resolves issue #1756.

* Add the Linkerd labels to the deployment metadata during auto proxy
injection

* Remove selector match labels JSON patch from proxy injector

This isn't needed to resolve the selector label mismatch errors.

Signed-off-by: ihcsim <ihcsim@gmail.com>
2018-10-16 15:30:45 -07:00
Ivan Sim 2e1a984eb0 Change the proxy-init container ordering during auto proxy injection (#1763)
Appending proxy-init to the end of the list ensures that it won't
interfere with other init containers from accessing the network,
before the proxy container is created.

This resolves bug #1760

Signed-off-by: ihcsim <ihcsim@gmail.com>
2018-10-15 15:33:09 -07:00
Alejandro Pedraza 37bc8a69db Added support for json output in `linkerd stat` (#1749)
Added support for json output in `linkerd stat` through a new (-o|--output)=json option.

Fixes #1417

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
2018-10-15 14:10:48 -07:00
Risha Mars 31a396b631
Fix incorrect test wording (#1767) 2018-10-15 12:07:06 -07:00
Alex Leong 1fe19bf3ce
Add ServiceProfile support to k8s utilities (#1758)
Updates to the Kubernetes utility code in `/controller/k8s` to support interacting with ServiceProfiles.

This makes use of the code generated client added in #1752 

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-12 09:35:11 -07:00
Alex Leong f1f5b49f59
Add generated Kubernetes client for ServiceProfile custom resource (#1752)
To support reading and writing of the ServiceProfile custom resource, we add a codegen'd Kubernetes client for this resource.

* Adding the ServiceProfile type and related boilerplate to /controller/gen/apis/serviceprofile. This boilerplate also contains directives that control how codegen works.
* A script in /hack which invokes codegen that generates Kubernetes client machinery for interacting with ServiceProfile resources. The majority of the generated code lives in /controller/gen/client.
* The above-mentioned generated code.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-11 11:43:35 -07:00
Kevin Lingerfelt 46c887ca00
Add --single-namespace install flag for restricted permissions (#1721)
* Add --single-namespace install flag for restricted permissions
* Better formatting in install template
* Mark --single-namespace and --proxy-auto-inject as experimental
* Fix wording of --single-namespace check flag
* Small healthcheck refactor

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-10-11 10:55:57 -07:00
Andrew Seigner 8f4240125e fix test failure, logrus api consistency (#1755)
`go test` was failing with
`Fatalf call has arguments but no formatting directives`

Fix test failure, make all logrus api calls consistent.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-10-11 10:44:32 -07:00
Ivan Sim 4fba6aca0a Proxy init and sidecar containers auto-injection (#1714)
* Support auto sidecar-injection

1. Add proxy-injector deployment spec to cli/install/template.go
2. Inject the Linkerd CA bundle into the MutatingWebhookConfiguration
during the webhook's start-up process.
3. Add a new handler to the CA controller to create a new secret for the
webhook when a new MutatingWebhookConfiguration is created.
4. Declare a config map to store the proxy and proxy-init container
specs used during the auto-inject process.
5. Ignore namespace and pods that are labeled with
linkerd.io/auto-inject: disabled or linkerd.io/auto-inject: completed
6. Add new flag to `linkerd install` to enable/disable proxy
auto-injection

Proposed implementation for #561.

* Resolve missing packages errors
* Move the auto-inject label to the pod level
* PR review items
* Move proxy-injector to its own deployment
* Ignore pods that already have proxy injected

This ensures the webhook doesn't error out due to proxy that are injected using the  command

* PR review items on creating/updating the MWC on-start
* Replace API calls to ConfigMap with file reads
* Fixed post-rebase broken tests
* Don't mutate the auto-inject label

Since we started using healhcheck.HasExistingSidecars() to ensure pods with
existing proxies aren't mutated, we don't need to use the auto-inject label as
an indicator.

This resolves a bug which happens with the kubectl run command where the deployment
is also assigned the auto-inject label. The mutation causes the pod auto-inject
label to not match the deployment label, causing kubectl run to fail.

* Tidy up unit tests
* Include proxy resource requests in sidecar config map
* Fixes to broken YAML in CLI install config

The ignore inbound and outbound ports are changed to string type to
avoid broken YAML caused by the string conversion in the uint slice.

Also, parameterized the proxy bind timeout option in template.go.

Renamed the sidecar config map to
'linkerd-proxy-injector-webhook-config'.

Signed-off-by: ihcsim <ihcsim@gmail.com>
2018-10-10 12:09:22 -07:00
Ben Lambert 69cebae1a2 Added ability to configure sidecar CPU + Memory requests (#1731)
Horizontal Pod Autoscaling does not work when container definitions in pods do not all have resource requests, so here's the ability to add CPU + Memory requests to install + inject commands by proving proxy options --proxy-cpu + --proxy-memory

Fixes #1480

Signed-off-by: Ben Lambert <ben@blam.sh>
2018-10-08 10:51:29 -07:00
Andrew Seigner dccccebd79
Add LICENSE files to all Docker images (#1727)
To comply with certain environments, include our LICENSE file in all
Docker images.


Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-10-02 16:25:52 -07:00
Alena Varkockova 5a853e8990 Use ListPods always for data plane HC (#1701)
* Use ListPods always for data plane HC
* Missing changes in grpc_server.go
* Address review comments
* Read proxy version from spec

Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
2018-10-02 11:45:01 -07:00
Alena Varkockova 11c9b7425b Fix the debug message in endpoints watcher (#1658)
* Fix the debug message in endpoints watcher
* Use better method for converting

Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
2018-09-20 13:03:45 -07:00
Alex Leong e65a9617bd
Add can-i checks to linkerd check --pre (#1644)
Add checks to `linkerd check --pre` to verify that the user has permission to create:
* namespaces
* serviceaccounts
* clusterroles
* clusterrolebindings
* services
* deployments
* configmaps

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-09-17 11:31:10 -07:00
Dennis Adjei-Baah 00d0a26a9c
Cleanly shutdown tap stream to data plane proxies (#1624)
Sometimes, the tap server causes the controller pod to restart after it receives this error.
This error arises when the Tap server does not close gRPC tap streams to proxies before the tap server terminates its streams to its upstream clients and causes the controller pod to restart.

This PR uses the request context from the initial TapByReource to help shutdown tap streams to the data plane proxies gracefully.

fixes #1504

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-09-12 15:00:19 -07:00
Andrew Seigner c5a719da47
Modify inject to warn when file is un-injectable (#1603)
If an input file is un-injectable, existing inject behavior is to simply
output a copy of the input.

Introduce a report, printed to stderr, that communicates the end state
of the inject command. Currently this includes checking for hostNetwork
and unsupported resources.

Malformed YAML documents will continue to cause no YAML output, and return
error code 1.

This change also modifies integration tests to handle stdout and stderr separately.

example outputs...

some pods injected, none with host networking:

```
hostNetwork: pods do not use host networking...............................[ok]
supported: at least one resource injected..................................[ok]

Summary: 4 of 8 YAML document(s) injected
  deploy/emoji
  deploy/voting
  deploy/web
  deploy/vote-bot
```

some pods injected, one host networking:

```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/vote-bot uses "hostNetwork: true"
supported: at least one resource injected..................................[ok]

Summary: 3 of 8 YAML document(s) injected
  deploy/emoji
  deploy/voting
  deploy/web
```

no pods injected:

```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/emoji, deploy/voting, deploy/web, deploy/vote-bot use "hostNetwork: true"
supported: at least one resource injected..................................[warn] -- no supported objects found

Summary: 0 of 8 YAML document(s) injected
```

TODO: check for UDP and other init containers

Part of #1516

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-09-10 10:34:25 -07:00
Kevin Lingerfelt f884caf56d
Upgrade protobuf to v1.2.0 (#1591)
* Upgrade protobuf to v1.2.0
* Fix Gopkg.lock
* Switch linkerd2-proxy-api dep back to stable

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-09-06 11:36:29 -07:00
Kevin Lingerfelt b5ff29c8aa
Add data plane check to validate proxy version (#1574)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-09-04 15:22:38 -07:00
Risha Mars 249b51f950
Increase MaxRps in Tap server, remove default setting from Web (#1560)
Increase the MaxRps on the tap server to 100 RPS.

The max RPS for tap/top was increased in for the CLI #1531, but we were
still manually setting this to 1 RPS in the Web UI and Web server.

Remove the pervasive setting of MaxRps to 1 in the web frontend and server
2018-08-30 13:37:37 -07:00
Alex Leong 0f7d684ca9
Increase default max-rps for tap and top (#1531)
The default value for the max-rps argument to the tap and top commands is an overly conservative 1rps.  This causes the data to come in very slowly and much data to be discarded.  Furthermore, because tap requests are windowed to 10 seconds, this causes long pauses between updates.

We fix this in two ways.  Firstly we reduce the window size to 1s so that updates will come in at least once per second, even when the actual RPS of the data path is extremely high.  Secondly, we increase the default max-rps parameter from 1 to 100.  This allows tap to paint an accurate picture of the data much more quickly and sidesteps some sampling bias that happens when the max-rps is low.

In general, tap events tend to happen in bursts.  For example, one request in may trigger one or more requests out.  Likewise, a single upstream event may trigger several requests to the tapped pod in quick succession.  Sampling bias will occur when the max-rps is less than the actual rps and when the tap event limit subdivides these event bursts (biasing towards the first few events in the burst).  The greater the max-rps, the less the effects of this bias.

Fixes #1525 

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-08-28 14:16:39 -07:00
Risha Mars fff09c5d06
Only tap pods that are meshed (#1535)
Previously, we would tap any resource's pods, regardless of whether the pods
were meshed or not. We can't actually tap non-meshed pods, so I'm adding a check
that will filter out non-meshed pods from the pods that tap watches.

Previous behaviour:
When attempting to hang a non meshed pod, it would establish
a watch on the pods, but then never return any results. In the CLI you could
just cancel it with Ctrl-C. In the web, clicking Stop would send a
WebSocket.close(1000) but wouldn't actually close the connection... 

Behaviour after change :
If no pods under the specified resource are meshed, it'll
return an error of no pods being found to tap
2018-08-28 09:59:52 -07:00
Eliza Weisman efabd90ff7
Fix missing ns/svc labels in metadata hydrated by Tap server (#1496)
Fixes #1493.

When the tap server hydrates metadata for the source or destination peer
of a Tap event from the peer's IP address, it doesn't currently add a
namespace label. However, destinations labeled by the proxy do have such
a label.

This is because the tap server currently gets the hydrated labels from
the `GetPodLabels` function, which is also used by the Destination
service for labeling the individual endpoints in a `WeightedAddrSet`
response. However, the Destination service also adds some labels to all
the endpoints in the set, including the namespace and service, so
`GetPodLabels` doesn't return these labels. However, when the tap server
uses that function, it does not add the service or namespace labels.

This branch fixes this issue by adding those labels to the Tap event 
after calling `GetPodLabels`. In addition, it fixes a missing space 
between the `src/dst_res` and `src/dst_ns` labels in Tap CLI output
with the `-o wide` flag set. This issue was introduced during the 
review of #1437, but was missed at the time because the namespace label
wasn't being set correctly.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-20 18:09:34 -07:00
Kevin Lingerfelt e97be1f5da
Move all healthcheck-related code to pkg/healthcheck (#1492)
* Move all healthcheck-related code to pkg/healthcheck
* Fix failed check formatting
* Better version check wording

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-20 16:50:22 -07:00
Eliza Weisman b8434d60d4
Add resource metadata to Tap CLI output (#1437)
Closes #1170.

This branch adds a `-o wide` (or `--output wide`) flag to the Tap CLI.
Passing this flag adds `src_res` and `dst_res` elements to the Tap
output, as described in #1170. These use the metadata labels in the tap
event to describe what Kubernetes resource the source and destination
peers belong to, based on what resource type is being tapped, and fall
back to pods if either peer is not a member of the specified resource
type.

In addition, when the resource type is not `namespace`, `src_ns` and
`dst_ns` elements are added, which show what namespaces the the source
and destination peers are in. For peers which are not in the Kubernetes
cluster, none of these labels are displayed.

The source metadata added in #1434 is used to populate the `src_res` and
`src_ns` fields.

Also, this branch includes some refactoring to how tap output is
formatted.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-20 14:25:26 -07:00
Kevin Lingerfelt 7c07ba0d53
Upgrade to dep 0.5.0, go 1.10.3 (#1479)
* Upgrade to dep 0.5.0, go 1.10.3
* Remove existing dep binary if it's the wrong version
* Add version in filename of dep binary to prevent version conflicts

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-17 16:04:50 -07:00
Alex Leong 094a375015
[RFC] linkerd top (#1435)
This an initial implementation of the `linkerd top` command.  This command launches an ncurses style tabular view of current requests (using data from tap).  Most of the command line arguments are the same as tap and allow selecting the resource to inspect and filtering which requests to view.  

Fixes #1283 

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-08-15 18:10:23 -07:00
Eliza Weisman cda05aa34c
Add inbound destination label hydration to Tap server (#1442)
Based on @adleong's suggestion in
https://github.com/linkerd/linkerd2/pull/1434#pullrequestreview-145428857,
this branch adds label hydration from destination IPs to the Tap server.
This works the same as the label hydration for destination IPs added in
#1434. However, it is only applied to the destination fields of events
recorded by proxies in the inbound direction, since outbound
destinations are already labeled with metadata provided by the
Destination service.

This means that when a user taps inbound traffic, the CLI will show k8s
metadata labels for the destination peer (if it's available). This can
be useful especially when tapping several pods at once, as it makes it
easier to distinguish what pod received a request.

This branch also refactors how the label hydration is performed,
primarily to make adding it to the destination field less repetitive.
Also, the `hydrateIPLabels` function now mutates the label map in the
`TapEvent`, rather than returning a new map of labels, so that the case
where no pod was found doesn't require an additional allocation of an
empty map.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-13 13:46:33 -07:00
Eliza Weisman bf7fc12f5c
Add source metadata to Tap server tap events (#1434)
The `TapEvent` protobuf contains two maps, `DestinationMeta` and
`SourceMeta`. The `DestinationMeta` contains all the metadata provided
by the proxy that originated the event (ultimately originating from the
Destination service), while the `SourceMeta` currently only contains the
source connection's TLS status.

This branch modifies the Tap server to hydrate the same set of metadata
from the source IP address, when the source was within the cluster. It
does this by adding an indexer of pod IPs to pods to its k8s API client,
and looking up IPs against this index. If a pod was found, the extra
metadata is added to the tap event sent to the client.

This branch also changes the client so that if a source pod name was
provided in the metadata, it prints the pod name rather than the IP
address for the `src` field in its output. This mimics what is currently
done for the `dst` field in tap output. Furthermore, the added source
metadata will be necessary for adding src resource types to tap output
(see issue #1170).

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-13 13:25:14 -07:00
Kevin Lingerfelt 00a0572098
Better CLI error messages when control plane is unavailable (#1428)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-09 15:40:41 -07:00
Eliza Weisman 56681015ae
Fix Destination returning no endpoints for single unnamed container port (#1420)
Fixes #1405.

According to the Kubernetes Endpoints API documentation, the `name`
field in the `EndpointPort` response object is "Optional if only one
port is defined". (see
https://v1-9.docs.kubernetes.io/docs/reference/generated/kubernetes-api/v1.9/#endpointport-v1-core)
However, when the Destination service an endpoints response for a
service with a named target port, it expects the ports in the endpoints
response to have the same name as the target port in the service. 

When a user creates a `NodePort` service with an unnamed port that
targets a named container port, this behaviour results in Linkerd
failing to route to that service by hostname. Without Linkerd injected,
the hostname is still reachable. 

This branch fixes this issue by changing the `endpointsToAddresses`
function in `endpoints_watcher.go` to handle the case when an endpoints
response contains only a single unnamed port.

I've manually verified that this fixes the issue described in #1405.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-08 13:01:53 -07:00
Kevin Lingerfelt bd19e8aaff
Update prometheus to only scrape proxies in the same mesh (#1402)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-06 12:05:55 -07:00
Kevin Lingerfelt f70ad7de11
Use stable version for linkerd2-proxy-api dep (#1400)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-03 11:59:42 -07:00
Sean McArthur c035193313
add H2 protocol to destination addrs if managed by linkerd (#1380)
Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-08-03 10:14:30 -07:00
Alex Leong 3e1f35913b
Read all bytes of message length header (#1394)
The `reader.Read` method only reads as many bytes as are currently available from reader.  When reading the 4 byte message length header, if not all 4 of those bytes are available, `Read` will only read the available bytes and return.  This causes alignment issues when the message body is read and there are still unread header bytes in the reader.  These bytes will appear at the beginning of the message body and cause a crash when the message is unmarshalled.

Use `io.ReadFull` to ensure that we read all 4 of the message length header bytes.

Fixes #1287 

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-08-02 10:45:49 -07:00
Risha Mars fef896011f Add more filters to the web UI tap form (#1371)
* Update ant to 3.7.2
* Add autocomplete of namespaces/resources to Tap in web ui
  * Add form fields for authority/path/method/rps/scheme
  * Add the ability to clear error messages to the error banner
* Add error listener to ws object
2018-07-31 15:48:53 -07:00
Kevin Lingerfelt c362d5e114
Update k8s.io dependencies to 1.11.1 (#1369)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-27 15:23:03 -07:00
Brian Smith 2098beb123
Don't relink web & controller executables when version changes. (#1338)
Speed up incremental rebuilds by avoiding relinking the controller
and/or web executables when changes are made to unrelated files.
Before this change, any time the git tag changed, the executables
would have to be rebuilt (relinked at least), even if no Go files
changed.

Validated by running the integration tests.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-26 16:20:36 -10:00
Kevin Lingerfelt 51848230a0
Send glog logs to stderr by default (#1367)
* Send glog logs to stderr by default
* Factor out more shared flag parsing code

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-25 12:59:24 -07:00
Risha Mars ec3c861743
Enable Tap from the Web UI (#1356)
Adds a tap endpoint in the web api that communicates with the dashboard 
via websockets.
I've moved a bunch of code from the cli tap.go into utils so that the code 
can be shared between web and CLI. I think we should consider making the 
display more suited to web, but in the short term, reusing the CLI's 
rendering of tap events works.

Adds a Tap page in the Web UI that you can use to make tap requests. 
The form currently only allows you to enter a resource and namespace, 
other filters coming in a follow-up branch.
2018-07-24 14:23:42 -04:00
Kevin Lingerfelt 4b9700933a
Update prometheus labels to match k8s resource names (#1355)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-23 15:45:05 -07:00
Brian Smith a98bfb1ca7
Rename `ca-bundle-distributor` to `ca`. (#1340)
`ca-bundle-distributor` described the original role of the program but
`ca` ("Certificate Authority") better describes its current role.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-17 14:10:40 -10:00
Brian Smith 0fcfd2bffb
Stop using `installsuffix` when building Go code. (#1327)
* Stop using `installsuffix` when building Go code.

See https://plus.google.com/117192131596509381660/posts/eNnNePihYnK.
`-installsuffix cgo` isn't necessary as of Go 1.10 (where build caching
changed substantially) and it probably wasn't necessary earlier.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-16 13:48:50 -10:00
Kevin Lingerfelt e5cce1abaf
Rename CLI from conduit to linkerd (#1312)
* Rename CLI binary
* Update integration tests for new binary name
* Rename --conduit-namespace flag, change default ns
* Rename occurrences of conduit in rest of CLI
* Rename inject and install components
* Remove conduit occurrences in docker files
* Additional miscellaneous cleanup
* Move protobuf definitions to linkerd2 package
* Rename conduit.io labels to use linkerd.io
* Rename conduit-managed segment to linkerd-managed
* Fix conduit references in web project

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-12 17:14:07 -07:00
Kevin Lingerfelt 1624a4ba0f
Ensure destination service always sends pod metadata (#1291)
* Ensure destination service always sends pod metadata
* Fix test that relied on hash ordering
* Stop using protobuf structs as map keys, fix logging

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-10 15:08:59 -07:00
Oliver Gould 941cad4a9c
Migrate build infrastructure to linkerd2 (#1298)
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
  github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
  binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
  github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
2018-07-09 15:38:38 -07:00
Kevin Lingerfelt 6f804d600c
Remove docker-compose / simulate-proxy environment (#1294)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-06 17:44:35 -07:00
Risha Mars 9050b2d312
Fix authority stat queries when a --from flag is used (#1289)
* Fix bug where we were using dst_authorities as a group by instead of authorities
* Add test to make sure we don't dst_authorities

Previously, we were only checking to make sure we didn't add 
dst_authorities in the query labels in promDstQueryLabels but we 
weren't checking the groupBy labels in promDstGroupByLabelNames - 
this caused us to try to query for dst_authorities when a --from 
query was sent. There are no dst_authorities, so there would be no 
named results.
2018-07-06 17:29:08 -07:00
Kevin Lingerfelt 693acdbf26
Update ListPods endpoint to return all pod owner types (#1275)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-05 15:14:16 -07:00
Risha Mars ba2e13c731
Small tweaks to error modal, add Reason to api error response (#1246)
- Add Reason to the error data passed from the api
- Rewrite error logic in the UI to try to make it clearer
- Show 0/0 pods meshed instead of 0/0 pods meshed (N/A) if 0 pods are meshed
2018-07-03 17:14:27 -07:00
Risha Mars 2002a8ba50
Add more tests for the stat summary endpoint --from flags (#1237)
Also add dst_ labels in the metrics we mock, so we can do --from queries with results.
2018-07-03 14:30:15 -07:00
Kevin Lingerfelt f0ba8f3ee8
Fix owner types in TLS identity strings (#1257)
* Fix owner types in TLS identity strings
* Update documentation on TLSIdentity struct

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-03 14:20:24 -07:00
Brian Smith 252a8d39d3
Generate an ephemeral CA at startup that distributes TLS credentials (#1245)
Create a ephemeral, in-memory TLS certificate authority and integrate it into the certificate distributor.

Remove the re-creation of deleted ConfigMaps; this will be added back later in #1248.

Signed-off-by: Brian Smith brian@briansmith.org
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-02 18:09:31 -10:00
Oliver Gould 20276b106e
tap: Support `tls` labeling (#1244)
The proxy's metrics are instrumented with a `tls` label that describes
the state of TLS for each connection and associated messges.

This same level of detail is useful to get in `tap` output as well.

This change updates Tap in the following ways:
* `TapEvent` protobuf updated:
  * Added `source_meta` field including source labels
  * `proxy_direction` enum indicates which proxy server was used.
* The proxy adds a `tls` label to both source and destination meta indicating the state of each peer's connection
* The CLI uses the `proxy_direction` field to determine which `tls` label should be rendered.
2018-07-02 17:19:20 -07:00
Kevin Lingerfelt a685dba873
Use parent name instead of pod name in identity string (#1236)
* Use parent name instead of pod name in identity string
* Update protobuf comment

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-29 14:28:13 -07:00
Risha Mars 8ebc969d2f
Fix bug where we wouldn't run stat table assertions if we expected 0 results (#1235)
I realized that our stat summary expectation checker would only check the actual
proto responses against the expectations if the expectations were non-empty.

Problem
If we expected empty results and the api returned actual results, we never actually 
check those results against the expectations.

The bug can be reproduced by replacing any nonzero metric we expect in 
expectedResponse with expectedResponse: genEmptyResponse() 
The tests on master will still pass.

Solution
Remove this line and ensure we get the expected number of stat tables.
2018-06-29 14:23:14 -07:00
Risha Mars 5ed7fc563c
Add controller component pod uptimes to the ServiceMesh page (#1205)
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component toolitps on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
2018-06-28 15:42:00 -07:00
Risha Mars 5963b2ac24
Better format empty errors (#1202) 2018-06-28 14:52:04 -07:00
Risha Mars 68586fe697
Add the ability to query stats by authority (#1181)
Adds the ability to query by a new non-kubernetes resource type, "authorities",
in the StatSummary api.

This includes an extensive refactor of stat_summary.go to deal with non-kubernetes 
resource types.

- Add documentation to Resource in the public api so we can use it for authority
- Handle non-k8s resource requests in the StatSummary endpoint
- Rewrite stat summary fetching and parsing to handle non-k8s resources
- keys stat summary metric handling by Resource instead of a generated string
- Adds authority to the CLI
- Adds /authorities to the Web UI
- Adds some more stat integration and unit tests
2018-06-28 14:31:44 -07:00
Brian Smith cca8e7077d
Add TLS support to `conduit inject`. (#1220)
* Add TLS support to `conduit inject`.

Add the settings needed to enable TLs when `--tls=optional` is passed on the
commend line. Later the requirement to add `--tls` will be removed.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-06-27 16:04:07 -10:00
Kevin Lingerfelt f502596577
Update go bindings for destination.proto change (#1223)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-27 18:26:13 -07:00
Kevin Lingerfelt b8ba627ee5
Update dest service with a different tls identity strategy (#1215)
* Update dest service with a different tls identity strategy
* Send controller namespace as separate field

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-27 11:40:02 -07:00
Kevin Lingerfelt af85d1714f
Add probes and log termination policy for distributor (#1178)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-21 14:02:41 -07:00
Kevin Lingerfelt 12f869e7fc
Add CA certificate bundle distributor to conduit install (#675)
* Add CA certificate bundle distributor to conduit install
* Update ca-distributor to use shared informers
* Only install CA distributor when --enable-tls flag is set
* Only copy CA bundle into namespaces where inject pods have the same controller
* Update API config to only watch pods and configmaps
* Address review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-21 13:12:21 -07:00
Kevin Lingerfelt 682b0274b5
Add controller admin servers and readiness probes (#1168)
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-20 17:32:44 -07:00
Risha Mars 0ff1bb4ad8
Don't allow stat requests for named resources in --all-namespaces (#1163)
Don't allow the CLI or Web UI to request named resources if --all-namespaces is used.

This follows kubectl, which also does not allow requesting named resources
over all namespaces.

This PR also updates the Web API's behaviour to be in line with the CLI's. 
Both will now default to the default namespace if no namespace is specified.
2018-06-20 12:59:31 -07:00
Risha Mars 46c99febf2
Don't panic on stats that aren't included in StatAllResourceTypes (#1154)
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list 
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files

Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
2018-06-19 17:00:16 -07:00
Kevin Lingerfelt 9a66641517
dest service: close open streams on shutdown (#1156)
* dest service: close open streams on shutdown
* Log instead of print in pkg packages
* Convert ServerClose to a receive-only channel

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-19 16:00:56 -07:00
Risha Mars e2c2f19d2c
Propagate errors in conduit containers to the api (#1117)
- It would be nice to display container errors in the UI. This PR gets the pod's container 
statuses and returns them in the public api

- Also add a terminationMessagePolicy to conduit's inject so that we can capture the 
proxy's error messages if it terminates
2018-06-14 16:22:31 -07:00
Oliver Gould 2a4f38b9e7
proto: Use explicit `go_package` option (#1120)
protobuf has a `go_package` option that can be used to explicitly name
Go packages such that they can be imported without additional rewrites.

This allows us to store proto files without additional, redundant
directories (which were used for packaging hints, previously).

This change adds an explicit `go_package` to all .proto files and
updates `bin/protoc-go.sh` to ensure these packages are output into
$GOPATH (so that the go_package can be absolute). This removes the need
to manually rewrite imports in bin/protoc-go.sh.
2018-06-14 14:03:00 -07:00
Kevin Lingerfelt 13aaa82c95
Allow k8s API clients to watch a subset of resources (#1118)
* Allow k8s API clients to watch a subset of resources
* Sort resources

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-14 11:09:01 -07:00
Kevin Lingerfelt 9f1df963e9
Move controller/util and web/util packages to pkg (#1109)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-13 11:25:56 -07:00
Kevin Lingerfelt b6d429e80d
dst svc: use shared informer instead of custom endpoints informer (#1079)
* Update destination service ot use shared informer instead of custom endpoints informer
* Add additional tests for dst svc endpoints watcher
* Remove service ports when all listeners unsubscribed
* Update go deps

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-13 11:11:57 -07:00
Kevin Lingerfelt bd1d1af38b
dst svc: use shared informer instead of pod watcher (#1073)
* Update desintation service to use shared informer instead of pod watcher
* Add const for pod IP index name

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-12 18:09:47 -07:00
Kevin Lingerfelt 6e66f6d662
Rename Lister to API and expose informers as well as listers (#1072)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-12 10:27:55 -07:00
Risha Mars 7d4c4aa290
CLI: print resources in the same order every time stat all is run (#1088)
Previously, in conduit stat all we would just print the map of stat results, which 
resulted in the order in which stats were displayed varying between prints.

Fix:
Define an array, k8s.StatAllResourceTypes and use the order in this array to print 
the map; ensuring a consistent print order every time the command is run.
2018-06-08 15:02:17 -07:00
Ivan Sim 11d1d55632 Filter out failed and completed pods from stats summary result (#1010) (#1065)
Both the conduit stat command and web UI are showing failed and completed pods.
This change filters out those pods before returning the result to the client.

Fixes #1010

Signed-off-by: Ivan Sim <ihcsim@gmail.com>
2018-06-05 13:19:48 -07:00
Kevin Lingerfelt eebc612d52
Add install flag for sending tls identity info to proxies (#1055)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-04 16:55:06 -07:00
Kevin Lingerfelt ec2433e9bd
Update controller to use 'tls' metric label (#1044)
* Update controller to use 'tls' metric label
* Fix meshed column formatter

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-01 16:44:33 -07:00
Eliza Weisman 5a42ce357e
proto: Add TLS identity to WeightedAddr message (#1041)
Required for #1008.

This PR adds the `TlsIdentity` message to the Destination service proto,
to describe what strategy the proxy should use for verifying an endpoint's TLS
certificates. It also adds a `TlsIdentity` field to the `WeightedAddr` message.

Currently, there is one possible variant for `TlsIdentity`, `KubernetesPodName`, 
which consists of the Kubernetes pod name of the endpoint, the namespace of
the endpoint, and the namespace of that pod's Conduit control plane. The proxy
should attempt to connect over TLS if the control plane namespace matches its 
own control plane namespace. The pod name and namespace are used to verify 
the endpoint's TLS certificate.

See https://github.com/runconduit/conduit/issues/386#issuecomment-392948046.

This change was initially part of #1008, but I factored it out to make the diff
smaller.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-31 11:48:25 -07:00
Risha Mars ffabdefc6c
Add queries to prometheus to determine number of fully meshed requests (#983)
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label. 
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh

This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).

The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
2018-05-24 11:05:09 -07:00
Andrew Seigner 8a3b1a638a
Introduce meshed label in simulate-proxy (#992)
The proxy does not yet support a `meshed` label.

In anticipation of a `meshed` label in the proxy, introduce this label
in `simulate-proxy`, for testing.

Relates to #306 and #386.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

secured -> meshed

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-23 15:06:11 -07:00
Andrew Seigner 84e6eb5c87
Fix nil pointer dereference in StatSummary (#991)
The StatSummary endpoint was dereferencing
StatSummaryRequest.Selector.Resource, causing a panic when it received
an empty request.

Fix StatSummary to use the nil-friendly
StatSummaryRequest.GetSelector().GetResource() methods, and add a test
to validate.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-23 13:21:49 -07:00
Risha Mars 1e6434f6de
Fix bug in the public-api where conduit stat params were ignored (#971)
* Fix bug where we were dropping parts of the StatSummaryRequest
* Add tests for prometheus query strings and for failed cases

Problem
In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types, 
we would copy the Resource, LabelSelector and TimeWindow of the original request, and then 
go through all the resource types and set Resource.Type for each resource we wanted to get.
The bug is that while we copy over some fields of the original request, we didn't copy over all 
of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would 
ignore any --to or --from flags, and would ignore requests for a specific named resource.

Solution
Copy over all fields from the request.

I've also added tests for this case. In this process I've refactored the stat_summary_test code 
to make it a bit easier to read/use.
2018-05-18 16:06:06 -07:00
Kevin Lingerfelt 36ec391dbe
Go: update k8s dependencies to 1.10.2 (#962)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-17 15:46:58 -07:00
Risha Mars b8dc83f9d2
Modify the Stat API to handle requests for resource type "all" (#928)
Allow the Stat endpoint in the public-api to accept requests for resourceType "all".

Currently, this queries Pods, Deployments, RCs and Services, but can be modified 
to query other resources as well.

Both the CLI and web endpoints now work if you set resourceType to all.

e.g. `conduit stat all`
2018-05-11 14:35:37 -07:00
Kevin Lingerfelt 4e8e1eb84d
CLI: Fix validation for service stats (#935)
* CLI: Fix validation for service stats
* Address review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-05-11 10:28:49 -07:00
Oliver Gould a786089fd6
docker: Cache versionless builds before building versioned go binaries (#921)
The way that git-related version information is linked into go binaries
busts Docker's cache such that every commit causes all binaries to
rebuilt.

In order to ameliorate this, we can build each binary once without
version information first so that its artifacts are cached. When Go
sources are not changed and only the version information changes, builds
are 4.3x faster than before (from 5+ minutes to <90s).

On `master`

Branch off of master and build (mostly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  9.10s user 6.30s system 5% cpu 4:26.47 total
```

Rebuild without changing anything (highly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  9.23s user 6.04s system 47% cpu 32.017 total
```

Update only the git sha and rebuild:

```
:; git ci -am 'bump it' --allow-empty
[ver/eg 2749eb3] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  8.55s user 6.08s system 4% cpu 5:22.25 total
```

On this branch:

Rebuild without changing anything (highly cached):

```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build  8.94s user 5.97s system 46% cpu 32.257 total
```

Update only the git sha and rebuild:

```
:; git ci -am 'bump it' --allow-empty
[ver/go-docker-cache-versionless 77a80b5] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build-cli-bin  2.02s user 1.34s system 9% cpu 34.144 total
```
2018-05-10 10:22:09 -07:00
Risha Mars 416381cdfd
Fix bug where GetPodsFor(pod) was returning all pods in a namespace (#900)
* Fix bug where GetPodsFor(pod) was returning all pods in a namespace

Problem
In lister.GetPodsFor, when the input object was a pod, we would return all the pods in the namespace. I would expect GetPodsFor(pod) to return only one pod - the pod itself.

Cause
The cause of this is that when the object type was pod we were setting the selector to selector = labels.Everything() which gets all the pods in the namespace.

Fix
Special case GetPodsFor(pod) to return the pod itself, rather than looking up pods via labels.
2018-05-08 13:52:49 -07:00
Risha Mars f94856e489
Modify the Stat endpoint to also return the number of failed conduit pods (#895)
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count

This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
2018-05-08 10:35:21 -07:00
Brian Smith c5d2dab8bd
Remove special support for ExternalName services (#764)
After this was implemented we found that ExternalName services are
represented in DNS as CNAMEs, which means that the proxy's DNS
fallback logic can be used instead of doing DNS in the control
plane. Besides simplifying the controller, this will also increase
fidelity with the proxied pods' DNS configuration (improve
transparency).

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-25 11:53:33 -10:00
Andrew Seigner dce31b888f
Deprecate Tap, rename TapByResource to Tap (#844)
The `conduit tap` command is now deprecated.

Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.

Fixes #804

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-25 12:24:46 -07:00
Andrew Seigner a0a9a42e23
Implement Public API and Tap on top of Lister (#835)
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.

This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-24 18:10:48 -07:00
Andrew Seigner 03d4684d3b
Introduce K8s Lister, integrate simulate-proxy (#829)
The Kubernetes client-go Informer/Lister APIs are implemented in several
parts of the code base.

This change introduces a Lister module, providing Informer/Lister
capability through a simple interface. Once this merges, we can follow
up with moving public-api and tap onto Lister.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 16:44:19 -07:00
Andrew Seigner baf4ea1a5a
Implement TapByResource in Tap Service (#827)
The TapByResource endpoint was previously a stub.

Implement end-to-end tapByResource functionality, with support for
specifying any kubernetes resource(s) as target and destination.

Fixes #803, #49

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 16:13:26 -07:00
Eliza Weisman d9112abc93
proxy: remove unused metrics (#826)
This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md`

Closes #821
2018-04-23 16:05:20 -07:00
Andrew Seigner 39eccb09e2
cli: standardize kubernetes resource parsing (#830)
The Tap command leveraged new cli parsing code, enabling Kubernetes
resources specified as `(TYPE [NAME] | TYPE/NAME)`. The Stat command
did not use this.

Modify the Stat command to use the same cli flag parsing code as Tap.
Remove the to/from-resource flags from Stat.

Fixes #792

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 15:17:42 -07:00
Eliza Weisman 8147a363e9
Make simulate-proxy match proxy output (#822)
This PR makes two changes to the `simulate-proxy` script: 

1.  Removed the `protocol={"http", "tcp"}` label from TCP metrics. The proxy no longer adds this label (see https://github.com/runconduit/conduit/pull/785#discussion_r182563499).

2. Fixed failed responses being labeled with `classification="fail"` rather than `classification="failure"` (the label the proxy sets). I noticed that while I was here and decided to fix it as well.

Note that the first change required some minor changes to the `proxyMetricCollectors` struct in `simulate-proxy`; since the label cardinality for TCP open stats decreased by one due to removing the `protocol` label, it's no longer necessary for that struct to `haveCounterVec`/`GaugeVec` pointers for these stats. It now owns the actual `Counter`/`Gauge` instead. This means that the metric vecs that are created to be labeled for `inbound` and `outbound` are now stored as variables in the `newSimulatedProxy` function rather than going in a `proxyMetricCollectors` struct first. This shouldn't impact behaviour at all.
2018-04-20 12:11:57 -07:00
Andrew Seigner 79bdc638b3
Service support in stat command (#809)
The `stat` command did not support `service` as a resource type.

This change adds `service` support to the `stat` command. Specifically:
- as a destination resource on `--to` commands
- as a target resource on `--from` commands

Fixes #805

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 16:51:20 -07:00
Eliza Weisman 6eec6256f7
Add transport-level metrics to simulate-proxy (#811)
This PR adds the transport-level metrics described in #742 to the `simulate-proxy` script. This will be useful while adding these metrics to the Grafana dashboard and/or CLI.

Closes #793
2018-04-19 15:18:43 -07:00
Andrew Seigner 293e00bc3e
Introduce tapByResource cli command (#802)
The existing `tap` command is being deprecated.

Introduce a `tapByResource` cli command. It supports tapping a Kubernetes
resource or collection of resources, optionally filtered by outbound resources.
This command will eventually replace `tap`.

Part of #778

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 14:44:23 -07:00
Kevin Lingerfelt 653dc6bfaa
Add replication controller stats in CLI (#794)
* Add replication controller stats in CLI
* Fix pod status in stat summary tests

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 18:12:14 -07:00
Oliver Gould 06dd8d90ee
Introduce the TapByResource API (#778)
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely 
reflecting the proxy's capabilities.

The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.

Now both `Tap` and `TapByResource`'s responses may include destination
labels.

This change avoids breaking backwards compatibility by:

* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
2018-04-18 15:37:07 -07:00
Andrew Seigner 1e4ac8fda8
Destination service provides pod-template-hash (#784)
The Destination service does not provide ReplicaSet information to the
proxy.

The `pod-template-hash` label approximates selecting over all pods in a
ReplicaSet or ReplicationController. Modify the Destination service to
provide this label to the proxy.

Relates to #508 and #741

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-18 14:41:27 -07:00
Kevin Lingerfelt 71a51afb40
Expose pod stats in CLI, web UI, and Grafana (#788)
* Expose pod stats in CLI, web UI, and Grafana
* Fix js api helpers test
* Add outbound traffic stats to pod dashboard

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 11:26:47 -07:00
Andrew Seigner 9e8cce0838
Destination service returns "Running" pod labels (#781)
When the Destination sees an IP address, it looks up Pods by that IP,
and associates Pod label data to it. If the lookup by IP returned more
than one Pod, it simply picked the first one. This is not correct,
specifically in cases where one pod is in a Running state, and others
are not.

Modify the Destination service to only return label data for Pods in the
Running state.

Fixes #773

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-17 14:42:54 -07:00
Andrew Seigner 727521f914
Permit arbitrary time windows in public-api (#774)
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.

Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.

Fixes #686

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-16 17:37:17 -07:00
Kevin Lingerfelt 11a4359e9a
Misc cleanup following the telemetry rewrite (#771)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-16 15:51:07 -07:00
Andrew Seigner 77fb6d3709
Add namespace as a resource type in public-api (#760)
* Add namespace as a resource type in public-api

The cli and public-api only supported deployments as a resource type.

This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes

Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* Refactor filter and groupby label formulation

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Rename stat_summary.go to stat.go in cli

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Update rbac privileges for namespace stats

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 16:53:01 -07:00
Andrew Seigner 21886760c6
Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762)
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.

Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.

Fixes #761

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-13 12:08:16 -07:00
Kevin Lingerfelt fb15fe7c1a
Remove the telemetry service (#757)
* Remove the telemetry service

The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.

* Fix time window tests

* Remove deprecated controller scrape config

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 11:21:29 -07:00
Andrew Seigner e9b209829d
Handle NaN metrics (#750)
The Prometheus client sometimes returns NaN if a calculation is invalid,
such as histogram_quantile when no requests have occurred.

Add IsNaN check in the public-api and set output to zero.

Fixes #747

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-12 15:21:00 -07:00
Andrew Seigner 624b87f743
Implement ListPods in public-api (#743)
The ListPods endpoint's logic resides in the telemetry service, which is
going away.

Move ListPods logic into public-api, use new k8s informer APIs.

Fixes #694

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 17:53:57 -07:00
Kevin Lingerfelt 47caf1ca07
Add --all-namespaces flag to CLI statsummary command (#745)
* Add --all-namespaces flag to CLI statsummary command

* Fix statsummary output formatting

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-11 16:40:25 -07:00
Andrew Seigner 259fdcd134
Add latency stats in new stat summary endpoint (#737)
The new StatSummary endpoint was only providing request volume and
successs rate information.

Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.

Fixes #681

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 11:58:32 -07:00
Kevin Lingerfelt e1e1b6b599
Controller: add more destination labels, fix service label (#731)
* Add more destination labels, fix service label

* Update owner labels to match proxy metrics docs

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-11 10:44:52 -07:00
Kevin Lingerfelt 91c359e612
Switch public API to use cached k8s resources (#724)
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-10 11:39:31 -07:00
Andrew Seigner 3a341abe9a
Fix success rate calculation in public api (#723)
The success rate calculation relies on the `classification` label, but
was incorrectly specifying `fail` rather than `failure`.

Fix public api to specify `failure`. Also re-org public api tests for
easier Kubernetes and Prometheus mocking.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-10 11:04:04 -07:00
Andrew Seigner 716b392231
Move StatSummary logic into grpc server (#717)
The StatSummary logic was implemented as a method on http_server.

Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 16:46:15 -07:00
Andrew Seigner 50c323c617
Use canonical k8s names, fix prom labels (#702)
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.

Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.

Fixes #695

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 12:34:54 -07:00
Risha Mars 2f5b5ea5f2
Start implementing conduit stat summary endpoint (#671)
Start implementing new conduit stat summary endpoint. 
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.

Current implementation just retrieves requests and mesh/total pod count 
(so latency stats are always 0). 

Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627

This branch includes commits from @klingerf 

* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
2018-04-05 17:05:06 -07:00
Andrew Seigner 28d5007cdf
Harmonize Prometheus label usage (#690)
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.

Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployement`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.

Fixes #655

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 15:09:06 -07:00
Risha Mars d1a39ea6bf
Define a new telemetry Stat API (#663)
* Define a new telemetry Stat API

Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627.
StatSummary will replace Stat once implemented and the original Stat deleted.
2018-04-03 14:45:58 -07:00
Phil Calçado 19001f8d38 Add pod-based metric_labels to destinations response (#429) (#654)
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
* Add pod-based metric_labels to destinations response
* Add documentation on running control plane to BUILD.md

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Fix mock controller in proxy tests (#656)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* Address review feedback
* Rename files in the destination package

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-02 18:36:57 -07:00
Brian Smith df9ead9c36
Use Go 1.10.1 to build all Go code. (#650)
Go 1.10.1 is a security release.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-02 14:58:30 -10:00
Andrew Seigner 97546e0646
Modify simulate-proxy to be more pod-centric (#653)
simulate-proxy uses a deployment object from kubernetes to simulate
each proxy metrics endpoint.

Modify simulate-proxy to instead use a pod to simulate each proxy
metrics endpoint. This ensures that each metrics endpoint consistently
represents a pod in kubernetes, including it's namespace, deployment,
and label information.

This change also adds support for:
- a new `metric-ports` flag, default is `10000-10009`.
- `classification`, `pod_name`, and `pod_template_hash` labels

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-30 13:28:45 -07:00
Phil Calçado bbed49c5bd Refactor destination service and add tests in preparation to add information about labels (#645)
* Extracted logic from destination server

* Make tests follow style used elsewhere in the code

* Extract single interface for resolvers

* Add tests for k8s and ipv4 resolvers

* Fix small usability issues

* Update dep

* Act on feedback

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-03-30 11:36:48 -07:00
Andrew Seigner 1ed4a93b5e
Higher velocity metrics from simulate-proxy (#635)
simulate-proxy increments a single set of metrics on each iteration, and
also randomizes http status codes, leaving counters unchanged across
several collections.

Modify simuilate-proxy to increment all metrics on each iteration,
provide a 90% success rate, ensure a pod does not call itself, and
increase proxy count from 3 to 10.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-28 13:30:02 -07:00
Kevin Lingerfelt 59c75a73a9
Add tests/utils/scripts for running integration tests (#608)
* Add tests/utils/scripts for running integration tests

Add a suite of integration tests in the `test/` directory, as well as
utilities for testing in the `testutil/` directory.

You can use the `bin/test-run` script to run the full suite of tests,
and the `bin/test-cleanup` script to cleanup after the tests.

The test/README.md file has more information about running tests.

@pcalcado, @franziskagoltz, and @rmars also contributed to this change.

* Create TEST.md file at the root of the repo

* Update based on review feedback

* Relax external service IP timeout for GKE

* Update TEST.md with more info about different types of test runs

* More updates to TEST.md based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 15:06:55 -07:00
Andrew Seigner fe35509406
Clean up Prometheus labels scraped from proxy (#633)
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, appending "k8s_".

This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-27 15:01:08 -07:00
Brian Smith 7dc21f9588
Add the NoEndpoints message to the Destination API (#564)
Have the controller tell the client whether the service exists, not
just what are available. This way we can implement fallback logic to
alternate service discovery mechanisms for ambigious names.

Signed-off-by: Brian Smith <brian@briansmith.org>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 10:45:41 -10:00
Andrew Seigner 12c6531546
Update docker-compose environment to match prod (#609)
The Prometheus config in the docker-compose environment had fallen
behind the prod setup.

This change updates the docker-compose environment in the following
ways:
- Prometheus config more closely matches prod, based on #583
- simulate-proxy labels matches prod, based on #605
- add Grafana container

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-23 17:00:39 -07:00
Dennis Adjei-Baah b90668a0b5
Modify simulate proxy to expose prometheus metrics (#576)
The simulate-proxy script pushes metrics to the telemetry service. This PR modifies the script to expose metrics to a prometheus endpoint. This functionality creates a server that randomly generates response_total, request_totals, response_duration_ms and response_latency_ms. The server reads pod information from a k8s cluster and picks a random namespace to use for all exposed metrics.

Tested out these changes with a locally running prometheus server. I also ran the docker-compose.yml to make sure metrics were being recorded by the prometheus docker container.

fixes #498

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-03-21 16:40:12 -07:00
Alena Varkockova b82f89f4d9 Reuse code for metrics serving in controller (#585)
Signed-off-by: Alena Varkockova varkockova.a@gmail.com
2018-03-19 10:33:25 -07:00
Alex Leong 9eb084c99d Most controller listeners should only bind on localhost (#494)
* Most controller listeners should only bind on localhost
* Use default listening addresses in controller components
* Review feedback
* Revert test_helper change
* Revert use of absolute domains

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-03-12 11:32:20 -07:00
Dennis Adjei-Baah ad42f2f8ab
Retry k8s watch endpoints on error (#510)
Shortly after conduit is installed in k8s environment. The control plane component that establishes a watch endpoint with k8s run in to networking issues during proxy initialization. During failure, each watcher fails to retry its connection to k8s watch endpoint which leads to timeouts and eventually, multiple controller pod restarts.

This PR adds retry logic to each "watch" enabled package.

fixes #478

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-03-07 13:40:43 -08:00
Dennis Adjei-Baah 5a4c5aa683
Exclude telemetry generated by the control plane when requesting depl… (#493)
When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh.

This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment.

* exclude telemetry generated by the control plane when requesting deployment metrics

fixes #370

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-03-05 17:58:08 -08:00
Andrew Seigner 698e65da8b
Fix flakey dns_test (#516)
The dns_test had assumed DNS changes were deterministically ordered, but
util.DiffAddresses uses a map and therefore does not guarantee ordering.

Fix dns_test to sort TCP Addresses prior to comparison.

Fixes #515

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-05 16:50:33 -08:00
Kevin Lingerfelt 8e2ef9d658
Handle ExternalName-type svcs in destination service (#490)
* Handle ExternalName-type svcs in destination service

* Move refresh interval to a global var

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-02 11:30:53 -08:00
Alex Leong 9b4e847555
Add DNS label validation in destination service (#464)
Add a validation in the destination service that ensures that DNS destinations consist of valid labels.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-03-01 15:49:49 -08:00
Kevin Lingerfelt e57e74056e
Run go fix to fix context package imports (#470)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-28 13:25:33 -08:00
Alex Leong 84ba1f3017
Ensure tap requests at least 1rps from each pod (#459)
When attempting to tap N pods when N is greater than the target rps, a rounding error occurs that requests 0 rps from each pod and no tap data is returned.

Ensure that tap requests at least 1 rps from each target pod.

Tested in Kubernetes on docker-for-desktop with a 15 replica deployment and a maxRps of 10.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-02-27 16:03:47 -08:00
Brian Smith 78ebd5e340
Base control plane Docker images on scratch instead of base. (#368)
The control plane is proxied through the Conduit proxy. The Conduit
proxy is based on the base image, and the control plane containers
and the proxy share a networking namespace. This means we don't
need the extra base utilities in the controller images since we can
use the utilties in the proxy image.

This is a step towards building the initial no-networking Conduit CA
pod. Since the Conduit CA will not do any networking of its own, we
networking debugging utilties are not helpful for it. They are
actually an unnecessary risk because they could facilitate the
exfiltration of the private key of the CA. (The Conduit CA pod won't
have the Conduit Proxy injected into it either.)

This also simplifies & slightly speeds up the building of the
controller images. This is a stepping stone towards being able to
build the controller images without `docker build` to improve build
times.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-23 13:03:19 -10:00
Brian Smith cf3c8cd7bc
Use Go 1.10.0 to build Go components. (#408)
* Use Go 1.10.0 to build Go components.

Take advantage of the new build cache in Go 1.10. Future work on improving
build performance will utilize the build cache further.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-21 14:31:29 -10:00
Brian Smith e6aad57766
Remove temporary files generated by dep in go-deps image. (#407)
Previously Dockerfile-go-deps was converted from a multi-stage Dockefile
to a single-stage Dockerfile in anticipation of enabling efficient use
of `--cache-from` in CI. However, that resulted in the image ballooning
in size because it contained the Git repo for every package downloaded
by `dep ensure`.

Bring the image back down to the proper size by removing the temporary
files created.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-21 13:06:24 -10:00
Alex Leong 552204366c
Use Prometheus to track added data plane pods. (#338)
The instance cache that powers the ListPods API is stored in memory in the telemetry service. This means that when there are multiple replicas of the telemetry service, each replica will have a distinct, incomplete view of the added pods based on which pods report to that telemetry replica. This causes the data plane bubbles on the dashboard to not all be filled in, and to flicker with each data refresh.

We create a Prometheus counter called reports_total which has pod as a label. Whenever a telemetry service instance receives a report from a pod, it increments reports_total for that pod. This allows us to remove the in-memory instance cache and instead query Prometheus to see if each pod has had a report in the last 30 seconds.

Fixes #337

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-02-14 16:09:55 -08:00
Andrew Seigner 1db7d2a2fb
Ensure latency quantile queries match timestamps (#348)
In PR #298 we moved time window parsing (10s => (time.now - 10s,
time.now) down the stack to immediately before the query. This had the
unintended effect of creating parallel latency quantile requests with
slightly different timestamps.

This change parses the time window prior to latency quantile fan out,
ensuring all requests have the same timestamp.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-13 16:26:54 -08:00
Andrew Seigner 50f4aa57e5
Require timestamp on all telemetry requests (#342)
PR #298 moved summary (non-timeseries) requests to Prometheus' Query
endpoint, with no timestamp provided. This Query endpoint returns a
single data point with whatever timestamp was provided in the request.
In the absense of a timestamp, it uses current server time. This causes
the Public API to return discreet data points with slightly different
timestamps, which is unexpected behavior.

Modify the Public API -> Telemetry -> Prometheus request path to always
require a timestamp for single data point requests.

Fixes #340

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-13 13:52:21 -08:00
Brian Smith b18fe459d4
Precompile large Go libraries in go-deps Docker image. (#332)
On my system (i9-7960x running Docker natively in Linux) this regularly saves
over 11 seconds of build time when a file under pkg/ changes and over 1.5
seconds of build time when a file under controller/ changes. Since most
contributors are running Docker in a VM on less powerful computers, the
savings for most contributors should be significantly greater.

I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I
did not measure them.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-13 11:35:10 -10:00
Brian Smith 37008f9626
Improve caching behavior of controller/Dockerfile. (#331)
Precompiling pkg/ in an earlier layer saves ~10 seconds of wall clock
time on an incremental build on my machine (i9-7960x) when I update a
file in controller/ such as controller/destination/server.go. This
makes a significant difference in the edit-build-test loop.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-13 11:21:22 -10:00
Brian Smith ec5a02fd64
Upgrade to Go 1.9.4. (#326)
Go 1.9.4 is a security release.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:47:40 -10:00