Commit Graph

1021 Commits

Author SHA1 Message Date
Alex Leong 380ec52a39
Rework routes command to accept any resource (#1921)
We rework the routes command so that it can accept any Kubernetes resource, making it act much more similarly to the stat command.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-05 11:11:34 -08:00
Alex Leong 4f3e55e937
Rename path to path_regex in ServiceProfile CRD (#1923)
We rename path to path_regex in the ServiceProfile CRD to make it clear that this field accepts a regular expression. We also take this opportunity to remove unnecessary line anchors from regular expressions now that these anchors are added in the proxy.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-12-05 10:42:47 -08:00
Risha Mars 7949a3355c
s/Request Rate/RPS (#1927) 2018-12-04 17:29:31 -08:00
Kevin Lingerfelt 37ae423bb3
Add linkerd- prefix to all objects in linkerd install (#1920)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-04 15:41:47 -08:00
Andrew Seigner ad2366f208
Revert proxy readiness initialDelaySeconds change (#1912)
Reverts part of #1899 to workaround readiness failures.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-12-04 14:27:55 -08:00
Oliver Gould ffa302eb6a
proxy: Update for debug logging (#1922)
commit 68f42c337f2580f3b33ddab2e01540f6849d0d1a (HEAD -> master, origin/master)
Author: Oliver Gould <ver@buoyant.io>
Date:   Tue Dec 4 07:45:20 2018 -0800

    Log discovery updates in the outbound proxy (#153)

    When debugging issues that users believe is related to discovery, it's
    helpful to get a narrow set of logs out to determine whether the proxy
    is observing discovery updates.

    With this change, a user can inject the proxy with
    ```
    LINKERD2_PROXY_LOG='warn,linkerd2_proxy=info,linkerd2_proxy::app::outbound::discovery=debug'
    ```
    and the proxy's logs will include messages like:

    ```
    DBUG voting-svc.emojivoto.svc.cluster.local:8080 linkerd2_proxy::app::outbound::discovery adding 10.233.70.98:8080
    DBUG voting-svc.emojivoto.svc.cluster.local:8080 linkerd2_proxy::app::outbound::discovery removing 10.233.66.36:8080
    ```

    This change also turns-down some overly chatty INFO logging in main.
2018-12-04 12:13:45 -08:00
Risha Mars 92d92d3f9b
Move the determining of display order into the cli query module (#1917)
[web UI] Previously, we were specifying the display order to display the cli flags in the
QueryToCliCmd module. But this order is pretty standard for each command, and
I'd like to avoid hardcoding that list everywhere.

Move the handling of order into the QueryToCliCmd module.
2018-12-04 10:08:59 -08:00
Risha Mars e8a39cd17e
Add ability to download a service profile template from the web UI (#1893)
Adds an endpoint, at /profiles/new that allows you to input a service name and
namespace, and download a service profile yaml template. 

This will enable future work, where we can add more of the yaml customization via 
a form in the dashboard, and use that data to help the user configure routes.
2018-12-03 16:48:43 -08:00
Oliver Gould baa7436cc7
Bump the proxy version to fix integration tests (#1914)
A Tap integration test fails and has been fixed by
linkerd/linkerd2-proxy#152.

This change bumps the proxy version to get this change, as well as an
upgrade to the `h2` library for bugfixes.
2018-12-03 16:30:35 -08:00
Andrew Seigner 37a5455445
Add filtering by job in stat, tap, top; fix panic (#1904)
Filtering by Kubernetes job was not supported. Also filtering by any unknown
type caused a panic.

Add filtering support by Kubernetes job, with special case mapping `job` to
`k8s_job`, to not conflict with Prometheus' job label.

Fix panic when unknown type specified as a `--from` or `--to` flag.

Fix `job` label from `linkerd-proxy` overwriting Prometheus `job` label at
collection time. This caused all metrics collected by proxy sidecars in
Kubernetes jobs to be collected into an incorrect Prometheus job, rather than
the expected `linkerd-proxy` Prometheus job.

Fix `unsupported resource type` tap error message incorrectly printing the
target resource rather than the destination.

Set `--controller-log-level debug` in `install_test.go` for easier debugging.

Expose `slow-cooker`'s metrics via a k8s service in the tap integration test, to
validate proxy requests with a job as destination.

Fixes #1872
Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-12-03 15:34:49 -08:00
Oliver Gould 926395f616
tap: Include route labels in tap events (#1902)
This change alters the controller's Tap service to include route labels
when translating tap events, modifies the public API to include route
metadata in responses, and modifies the tap CLI command to include
rt_ labels in tap output (when -o wide is used).
2018-12-03 13:52:47 -08:00
Risha Mars dd44e10a58
Display the correct command name for the cli equivalent (#1907)
Previously, we were passing in "tap" as the command name for both the tap and
top forms, resulting in the equivalent CLI command always being linkerd tap
regardless of whether you were in the Tap or Top view.

Fix this to correctly pass in tap or top depending on the page.
2018-12-03 12:22:00 -08:00
Risha Mars c108cd260d
Upgrade to material-ui 3.6.1 (#1906) 2018-12-03 11:59:45 -08:00
Andrew Seigner d121071f87
Adjust proxy, Prometheus, and Grafana probes (#1899)
* Adjust proxy, Prometheus, and Grafana probes

High `readinessProbe.initialDelaySeconds` values delayed the controller's
readiness by up to 30s, preventing cli commands from succeeding shortly after
control plane deployment.

Decrease `readinessProbe.initialDelaySeconds` in the proxy, Prometheus, and
Grafana to the default 0s. Also change `linkerd check` controller pod ordering
to: controller, prometheus, web, grafana.

Detailed probe changes:
- proxy
  - decrease `readinessProbe.initialDelaySeconds` from 10s to 0s
- prometheus
  - decrease `readinessProbe.initialDelaySeconds` from 30s to 0s
  - decrease `readinessProbe.timeoutSeconds` from 30s to 1s
  - decrease `livenessProbe.timeoutSeconds` from 30s to 1s
- grafana
  - decrease `readinessProbe.initialDelaySeconds` from 30s to 0s
  - decrease `readinessProbe.timeoutSeconds` from 30s to 1s
  - decrease `readinessProbe.failureThreshold` from 10 to 3
  - increase `livenessProbe.initialDelaySeconds` from 0s to 30s

Fixes #1804

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-12-03 10:41:11 -08:00
Alex Leong f9d66cf4de
Add --open-api option to linkerd profiles command (#1867)
The `--open-api` flag is an alternative to the `--template` flag for the `linkerd profile` command.  It reads an OpenAPI specification file (also called a swagger file) and uses it to generate a corresponding service profile.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-11-30 09:25:19 -08:00
Risha Mars 7a2689ce4a
Improve the top routes request form and code structure. (#1886)
Separates out the querying and table display of route data, so that this module
can be easily placed in other places in the UI. 

Adds usability improvements to the routes query form at /routes:
- displays CLI equivalent 
- adds dropdown with populated options for service / namespace 

As part of this work, made TapQueryCliCmd more generic, so it can work
for other CLI commands besides tap/top.
2018-11-29 10:20:53 -08:00
Andrew Seigner e9aa9114e1
Release notes for edge-18.11.3 (#1885)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-11-28 15:41:27 -08:00
Andrew Seigner 34d9eef03e
proxy injector: insert at end of arrays (#1881)
When using `--proxy-auto-inject` with Kuberntes `v1.9.11`, observed auto
injector incorrectly merging list elements rather than inserting new
ones. This issue was not reproducible on `v1.10.3`.

For example, this input:
```
spec:
  template:
    spec:
      containers:
      - name: vote-bot
        command:
        - emojivoto-vote-bot
```

Would yield:
```
spec:
  template:
    spec:
      containers:
      - name: linkerd-proxy
        command:
        - emojivoto-vote-bot
      - name: vote-bot
        command:
        - emojivoto-vote-bot
```

This change replaces json patch specs like
`/spec/template/spec/containers/0` with
`/spec/template/spec/containers/-`. The former is intended to insert at
the beggining of a list, the latter at the end. This also simplifies the
code a bit and more closely aligns with the intent of injecting at the
end of lists.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-11-28 14:21:18 -08:00
Alex Leong 835e34b500
Left-align the routes column and sort by route name (#1879)
Signed-off-by: Alex Leong <alex@buoyant.io>
2018-11-28 09:32:59 -08:00
Risha Mars d9539bcb37
Add the top routes feature to the dashboard UI (#1868)
Adds a (currently not displayed in sidebar, but available at /routes) page to
mirror the current functionality of `linkerd routes <service>`. So far, this is just a
barebones form and table, but it works.

Adds a /api/routes path and handler to the api to receive TopRoutes requests from the web.
2018-11-27 16:53:10 -08:00
Risha Mars f8583df4db
Add ListServices to controller public api (#1876)
Add a barebones ListServices endpoint, in support of autocomplete for services.
As we develop service profiles, this endpoint could probably be used to describe
more aspects of services (like, if there were some way to check whether a
service profile was enabled or not).

Accessible from the web UI via http://localhost:8084/api/services
2018-11-27 11:34:47 -08:00
Alex Leong 73836f05cf
Update proxy version and use canonicalized dst (#1866)
The `linkerd` routes command only supports outbound metrics queries (i.e. ones with the `--from` flag).  Inbound queries (i.e. ones without the `--from` flag) never return any metrics.

We update the proxy version and use the new canonicalized form for dst labels to gain support for inbound metrics as well.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-11-26 17:20:07 -08:00
Oliver Gould ba11698d4b
tap: Use nil-safe protobuf accessors (#1873)
The tap server accesses protobuf fields directly instead of using the
`Get*()` accessors. The accessors are necessary to prevent dereferencing
a nil pointer and crashing the tap service.

Furthermore, these maps are explicitly initialized when `nil` to support
label hydration.
2018-11-26 14:14:28 -08:00
Ben Lambert 297cb570f2 Added a --ha flag to install CLI (#1852)
This change allows some advised production config to be applied to the install of the control plane.
Currently this runs 3x replicas of the controller and adds some pretty sane requests to each of the components + containers of the control plane.

Fixes #1101

Signed-off-by: Ben Lambert <ben@blam.sh>
2018-11-20 23:03:59 -05:00
Alex Leong 7a7f6b6ecb
Add TopRoutes method the the public api and route CLI command to consume it (#1860)
Add a routes command which displays per-route stats for services that have service profiles defined.

This change has three parts:
* A new public-api RPC called `TopRoutes` which serves per-route stat data about a service
* An implementation of TopRoutes in the public-api service.  This implementation reads per-route data from Prometheus.  This is very similar to how the StatSummaries RPC and much of the code was able to be refactored and shared.
* A new CLI command called `routes` which displays the per-route data in a tabular or json format.  This is very similar to the `stat` command and much of the code was able to be refactored and shared.

Note that as of the currently targeted proxy version, only outbound route stats are supported so the `--from` flag must be included in order to see data.  This restriction will be lifted in an upcoming change once we add support for inbound route stats as well.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-11-19 12:20:30 -08:00
Kevin Lingerfelt 4d4b1ebc89
Update CHANGES.md for edge-18.11.2 release (#1864)
* Update CHANGES.md for edge-18.11.2 release
* Add issue number for same src/dst issue

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-11-15 14:39:25 -08:00
Dennis Adjei-Baah 214540c823
Add new iptable rule to for outbound traffic (#1863)
When requests from a pod send requests to itself, the proxy properly redirects traffic from the originating container in the pod through the outbound listener of the proxy. Once the request ends on the inbound side of the proxy, it skips the proxy and calls the original container that made the request. This can cause problems for containers that serve HTTP as the proxy naively tries to initiate an HTTP/2 connection to the destination of a request.  (See #1585 for a concrete example)

This PR adds a new iptable rule, coupled with a proxy [change](https://github.com/linkerd/linkerd2-proxy/pull/122) ensure that requests from a that occur in the aforementioned scenario, always redirect to the inbound listener of the proxy first.

fixes #1585

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-11-15 13:56:45 -08:00
Kevin Leimkuhler c68693e820
Fix stat filtering for `--from` queries (#1856)
# Problem
When we add a `--from` query to `linkerd stat au` we get more rows than if we would have just run `linkerd stat au`.

Adding a `--from` causes an extra row to be added, and the named authority to be ignored (this is the result we would have expected when running `linkerd stat au -n emojivoto --from deploy/web`).

# Solution
Destination query labels are now appended to `labels` so that those labels can be filtered on.

# Validation
Tests have been updated to reflect the expected expected destination labels now appended in `--from` queries.

Fixes #1766 

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2018-11-14 10:52:27 -08:00
Alejandro Pedraza bbcf5a8c9f Allow stat summary to query for multiple resources (#1841)
* Refactor util.BuildResource so it can deal with multiple resources

First step to address #1487: Allow stat summary to query for multiple
resources

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Update the stat cli help text to explain the new multi resource querying ability

Propsal for #1487: Allow stat summary to query for multiple resources

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Allow stat summary to query for multiple resources

Implement this ability by issuing parallel requests to requestStatsFromAPI()

Proposal for #1487

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Update tests as part of multi-resource support in `linkerd stat` (#1487)

- Refactor stat_test.go to reuse the same logic in multiple tests, and
add cases and files for json output.
- Add a couple of cases to api_utils_test.go to test multiple resources
validation.

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* `linkerd stat` called with multiple resources should keep an ordering (#1487)

Add SortedRes holding the order of resources to be followed when
querying `linkerd stat` with multiple resources

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Extra validations for `linkerd stat` with multiple resources (#1487)

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* `linkerd stat` resource grouping, ordering and name prefixing (#1487)

- Group together stats per resource type.
- When more than one resource, prepend name with type.
- Make sure tables always appear in the same order.

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Allow `linkerd stat` to be called with multiple resources

A few final refactorings as per code review.

Fixes #1487

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
2018-11-14 10:44:04 -08:00
Kevin Lingerfelt 4547ba7f0a
Make permission checks non-fatal, add check for CRDs (#1859)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-11-14 10:29:04 -08:00
Igor Zibarev 60bcdb15f9 controller: use GetConfig from pkg/k8s package (#1857)
This commit removes duplicate logic that loads Kubernetes config and
replaces it with GetConfig from pkg/k8s. This also allows to load
config from default sources like $KUBECONFIG instead of explicitly
passing -kubeconfig option to controller components.

Signed-off-by: Igor Zibarev <zibarev.i@gmail.com>
2018-11-13 14:41:31 -08:00
Alena Varkockova fda834cf64 Allow retrying control plane API check (#1858)
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
2018-11-13 10:52:50 -08:00
Igor Zibarev 9b3f67e3dc Fix comment typos (#1855)
Signed-off-by: Igor Zibarev <zibarev.i@gmail.com>
2018-11-12 15:27:22 -08:00
Alena Varkockova 38dfc5308f Make version checks warning (#1844)
Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>
2018-11-09 09:48:14 -08:00
Dennis Adjei-Baah 15e87bfd8d
Increase retry timeout for retryable tests, refactor RetryFor (#1835)
When running integration tests in a Kubernetes cluster that sometimes takes a little longer to get pods ready, the integration tests fail tests too early because most tests have a retry timeout of 30 seconds. 

This PR bumps up this retry timeout for `TestInstall` to 3 minutes. This gives the test enough time to download any new docker images that it needs to complete succesfully and also reduces the need to have large timeout values for subsequent tests. This PR also refactors `CheckPods` to check that all containers in a pods for a deployment are in a`Ready` state. This helps also helps in ensuring that all docker images have been downloaded and the pods are in a good state.

Tests were run on the community cluster and all were successful.

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-11-06 16:03:58 -08:00
Dennis Adjei-Baah dfaf3b1e1b
bump proxy version to 5e0a15b (#1842)
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-11-06 13:20:52 -08:00
Dennis Adjei-Baah 1c79f0146c
add control plane check before running proxy test assertion (#1836)
This PR adds a check before the TestCheckProxy executes the CLI command to make sure the control plane has installed all its pods. This avoids the situation where the test would fail because the pods are retrying the data plane check. The PR's base is #1835 but will switch to master once that PR merges.

Relates to PR #1835
Fixes #1733

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-11-02 13:30:06 -07:00
Oliver Gould 747fd328e9
grafana: Show TCP closes by errno (#1839)
linkerd/linkerd2-proxy#116 removes the `classification` label for the
`tcp_close_total` metric because TCP sockets that close with an error
do not actually indicate any sort of failure -- many graceful shutdown
situations can still cause a socket error.

This change uses the `errno` label to enumerate tcp_close_total metrics.
2018-11-02 10:20:11 -07:00
Risha Mars a6c5f9820e
Update CHANGES.md for the edge-18.11.1 release (#1840)
* Update CHANGES.md for the edge-18.11.1 release
2018-11-01 14:02:12 -07:00
Risha Mars 6d8911090d
Rewrite octopus arm code to be more parameterized and flexible. (#1834)
As a result, displays better in the material UI version of the dashboard.
Also adds Success rate to data displayed on neighbour nodes.

* Rewrite octopus arm code to be more parameterized and flexible.
As a result, displays better in the material UI version of the dashboard.
* Add Success rate to data displayed on neighbour nodes
* Fix variablilty in grid spacing by fixing the max and min widths of the chart,
and by scrolling the overflow
* Center the octopus graph so it looks better at full width
* Also add padding, so that the drop shadows aren't cut off
2018-11-01 12:30:19 -07:00
Andrew Seigner f777f87924
Fix grid alignment in tap query form (#1833)
Fixes #1789

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-10-31 14:10:16 -07:00
Andrew Seigner 1eb93b670a
Replace some sidebar icons with Font Awesome (#1830)
This replaces a couple of the MaterialUI icons introduced in #1776 with
their original counterparts in Font Awesome, but wrapped in a MaterialUI
`Icon` tag. Also fix Linkerd logo padding in sidebar.

Part of #1781.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-10-31 13:42:55 -07:00
Andrew Seigner 9a49f96a9a
Fix popovers to be reachable and clickable (#1831)
The popover on the src/dst column in the top and tap tables disappeared
before a use could click on it.

Modify the popovers to be reachable, also reimplement them as activated
by mouse clicks rather than mouse over events, allowing the src/dst
column to be both clickable and provide an icon for popover.

Fixes #1784

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-10-31 13:42:34 -07:00
Alex Leong 32d556e732
Improve ergonomics of service profile spec (#1828)
We make several changes to the service profile spec to make service profiles more ergonomic and to make them more consistent with the destination profile API.

* Allow multiple fields to be simultaneously set on a RequestMatch or ResponseMatch condition.  Doing so is equivalent to combining the fields with an "all" condition.
* Rename "responses" to "response_classes"
* Change "IsSuccess" to "is_failure"

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-31 12:00:22 -07:00
Oliver Gould 557dca5a56
Upgrade to linkerd/linkerd2-proxy#f97239ba (#1829)
This change updates the proxy version to fix grpc failure
classification, per #1819.
2018-10-30 15:19:01 -07:00
Andrew Seigner 3cd13f2913
Fix sidebar not rendering to end of page (#1827)
Also make main content independently scrollable.

Part of #1781

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-10-30 11:18:42 -07:00
Risha Mars 1b2b02985b
Small tweaks to the metrics tables to make them denser (#1826)
Try to squish the metrics columns so that the data fits on the page without
having to scroll the table. This was mostly evident in the Tap and Top tables,
where a lot of the table content would be initially out of view.

This branch also includes an unrelated tiny fix for max error length, which had
been changed from 500 to 50 for testing, and had not been changed back
2018-10-29 18:27:02 -07:00
Alex Leong d8b5ebaa6d
Remove the proxy-api container (#1813)
A container called `proxy-api` runs in the Linkerd2 controller pod.  This container listens on port 8086 and serves the proxy-api but does nothing other than forward gRPC requests to the destination container which listens on port 8089.

We remove the proxy-api container altogether and change the destination container to listen on port 8086 instead of 8089.  The result is that clients still use the proxy-api by connecting to `proxy-api.<ns>.svc.cluster.local:8086` but the controller has one fewer containers.  This results in a simpler system that is easier to reason about.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-29 16:31:43 -07:00
Alex Leong 82ca821e62
Use fqdn for service profile name (#1808)
Service profiles must be named in the form `"<service>.<namespace>"`.  This is inconsistent with the fully normalized domain name that the proxy sends to the controller.  It also does not permit creating service profiles for non-Kubernetes services.

We switch to requiring that service profiles must be named with the FQDN of their service.  For Kubernetes services, this is `"<service>.<namespace>.svc.cluster.local"`.

This change alone is not sufficient for allowing service profile for non-Kubernetes services because the k8s resolver will ignore any DNS names which are not Kubernetes services.  Further refactoring of the resolver will be required to allow looking up non-Kubernetes service profiles in Kuberenetes.

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-29 14:35:42 -07:00
Alex Leong 622185a4dd
Send metric labels in profile API (#1800)
* Send metric labels in profile API

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-10-29 14:28:09 -07:00