Commit Graph

103 Commits

Author SHA1 Message Date
Risha Mars f85dab8937
Upgrade antd to 3.4.3 (#855) 2018-04-26 16:35:42 -07:00
Risha Mars 4661aaf30d
Add a namespace column to the metrics tables (#854)
* Add a namespace column to the metrics tables, support long resource names

* Add a test for GrafanaLink

* Change the PodList.jsx component to not use the ListPods api
2018-04-26 16:34:59 -07:00
Risha Mars cd8c53ca2d
Remove the deployment search bar from the sidebar (#853)
We removed individual Deployment pages a while ago, but left the autocomplete search bar in. Clicking on searches goes to a 404 because we don't have /deployment any more.

This will be revisited in the future with direct links to grafana dashboards to all the 
resources we support.
2018-04-25 16:28:54 -07:00
Risha Mars fbacdd8a05
Add a Replication Controllers page in the Web UI (#850)
* Add a Replication Controllers page in the Web UI


@siggy pointed out that we don't need to use the PodsList api any more, since the new stats endpoint (#671) includes meshedPodCount and totalPodCount, which is all we need to determine whether the deployment/rc has been added to the mesh (which is what we were using ListPods to determine).

This PR modifies deployments to not use the pods api any more, and adds a Replication Controllers page. This page is quite similar to the Deployments page in logic, so I've made a PodOwnersList component to share the code.

I haven't added Replication Controllers to the Service Mesh page yet, because that page does require a list of component pods. Also, we don't need the calls to Prometheus for the Service Mesh page, so I don't want to use the existing stat apis for it. I figure that is a large enough change for a separate PR.
2018-04-25 15:01:06 -07:00
Brian Smith c5d2dab8bd
Remove special support for ExternalName services (#764)
After this was implemented we found that ExternalName services are
represented in DNS as CNAMEs, which means that the proxy's DNS
fallback logic can be used instead of doing DNS in the control
plane. Besides simplifying the controller, this will also increase
fidelity with the proxied pods' DNS configuration (improve
transparency).

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-25 11:53:33 -10:00
Andrew Seigner dce31b888f
Deprecate Tap, rename TapByResource to Tap (#844)
The `conduit tap` command is now deprecated.

Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now
returns Unimplemented.

Fixes #804

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-25 12:24:46 -07:00
Andrew Seigner a0a9a42e23
Implement Public API and Tap on top of Lister (#835)
public-api and and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.

This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-24 18:10:48 -07:00
Andrew Seigner baf4ea1a5a
Implement TapByResource in Tap Service (#827)
The TapByResource endpoint was previously a stub.

Implement end-to-end tapByResource functionality, with support for
specifying any kubernetes resource(s) as target and destination.

Fixes #803, #49

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-23 16:13:26 -07:00
Andrew Seigner 1e4ac8fda8
Destination service provides pod-template-hash (#784)
The Destination service does not provide ReplicaSet information to the
proxy.

The `pod-template-hash` label approximates selecting over all pods in a
ReplicaSet or ReplicationController. Modify the Destination service to
provide this label to the proxy.

Relates to #508 and #741

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-18 14:41:27 -07:00
Kevin Lingerfelt 71a51afb40
Expose pod stats in CLI, web UI, and Grafana (#788)
* Expose pod stats in CLI, web UI, and Grafana
* Fix js api helpers test
* Add outbound traffic stats to pod dashboard

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 11:26:47 -07:00
Andrew Seigner 727521f914
Permit arbitrary time windows in public-api (#774)
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.

Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.

Fixes #686

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-16 17:37:17 -07:00
Andrew Seigner 77fb6d3709
Add namespace as a resource type in public-api (#760)
* Add namespace as a resource type in public-api

The cli and public-api only supported deployments as a resource type.

This change adds support for namespace as a resource type in the cli and
public-api. This also change includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analagous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes

Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* Refactor filter and groupby label formulation

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Rename stat_summary.go to stat.go in cli

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Update rbac privileges for namespace stats

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 16:53:01 -07:00
Andrew Seigner 21886760c6
Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762)
Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.

Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.

Fixes #761

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-13 12:08:16 -07:00
Kevin Lingerfelt fb15fe7c1a
Remove the telemetry service (#757)
* Remove the telemetry service

The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.

* Fix time window tests

* Remove deprecated controller scrape config

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 11:21:29 -07:00
Kevin Lingerfelt 37434d048a
Update web component to use new stat api (#753)
* Update web component to use new stat api
* Address review feedback
* Add external link icon

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-12 17:35:03 -07:00
Andrew Seigner 259fdcd134
Add latency stats in new stat summary endpoint (#737)
The new StatSummary endpoint was only providing request volume and
successs rate information.

Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.

Fixes #681

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-11 11:58:32 -07:00
Kevin Lingerfelt 91c359e612
Switch public API to use cached k8s resources (#724)
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-10 11:39:31 -07:00
Andrew Seigner b6bcdcc059
Namespace-aware Grafana dashboards (#716)
The Grafana dashboards key off of deployment, but had no awareness of
namespaces, causing incorrect metrics aggregation and display.

This change makes the Grafana dashboards key off of namespaces, and also
modifies the Grafana links in the Conduit dashboard to link to
namespace+deployment.

Fixes #704
Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 15:37:53 -07:00
Andrew Seigner 50c323c617
Use canonical k8s names, fix prom labels (#702)
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.

Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.

Fixes #695

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 12:34:54 -07:00
Andrew Seigner 836168884e
Link to Grafana from Conduit Dashboard (#678)
* Link to Grafana from Conduit Dashboard

Previously the only way to access the Grafana dashboards was via direct
link, provided by the `conduit dashboard` command.

Add Grafana links throughout the Conduit Dashboard, next to all
Deployment objects. This change also modifies the behavior of the
ConduitLink helper, to enable linking to other deployments proxied by
the `conduit dashboard` command.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* review feedback

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* review feedback, fix console, remove absolute

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 10:56:42 -07:00
Risha Mars 2f5b5ea5f2
Start implementing conduit stat summary endpoint (#671)
Start implementing new conduit stat summary endpoint. 
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.

Current implementation just retrieves requests and mesh/total pod count 
(so latency stats are always 0). 

Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627

This branch includes commits from @klingerf 

* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
2018-04-05 17:05:06 -07:00
Franziska von der Goltz eff848a8cf
fix pod status and count in control plane dashboard (#659)
* fix pod status and count display in control plane dashboard section:
- the control plane would show terminated and stale deployments in the UI, this is confusing and might indicate errors
- this filters out temrinated and failed component deploys from the UI
- it is to note that pending deploys will still be counted and represented with a greyed out status dot
- Fixes: #606

Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>


Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
2018-04-03 10:39:35 -07:00
Brian Smith df9ead9c36
Use Go 1.10.1 to build all Go code. (#650)
Go 1.10.1 is a security release.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-02 14:58:30 -10:00
Franziska von der Goltz 67fac9d240
remove toggle sorting functionality from TableComponent (#630)
remove toggle sorting functionality from TableComponent:
- tables displaying metrics allowed to toggle between being sorted and unsorted when clicking the same button. This was confusing behavior for the user.
- this PR removes the toggle functionality and introduces a BaseTable Component that extends antd's component without the capability to toggle
- Fixes: #566

Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
2018-03-28 18:01:34 -07:00
Kevin Lingerfelt 59c75a73a9
Add tests/utils/scripts for running integration tests (#608)
* Add tests/utils/scripts for running integration tests

Add a suite of integration tests in the `test/` directory, as well as
utilities for testing in the `testutil/` directory.

You can use the `bin/test-run` script to run the full suite of tests,
and the `bin/test-cleanup` script to cleanup after the tests.

The test/README.md file has more information about running tests.

@pcalcado, @franziskagoltz, and @rmars also contributed to this change.

* Create TEST.md file at the root of the repo

* Update based on review feedback

* Relax external service IP timeout for GKE

* Update TEST.md with more info about different types of test runs

* More updates to TEST.md based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 15:06:55 -07:00
Dennis Adjei-Baah ad42f2f8ab
Retry k8s watch endpoints on error (#510)
Shortly after conduit is installed in k8s environment. The control plane component that establishes a watch endpoint with k8s run in to networking issues during proxy initialization. During failure, each watcher fails to retry its connection to k8s watch endpoint which leads to timeouts and eventually, multiple controller pod restarts.

This PR adds retry logic to each "watch" enabled package.

fixes #478

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-03-07 13:40:43 -08:00
Brian Smith cf3c8cd7bc
Use Go 1.10.0 to build Go components. (#408)
* Use Go 1.10.0 to build Go components.

Take advantage of the new build cache in Go 1.10. Future work on improving
build performance will utilize the build cache further.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-21 14:31:29 -10:00
Brian Smith e6aad57766
Remove temporary files generated by dep in go-deps image. (#407)
Previously Dockerfile-go-deps was converted from a multi-stage Dockefile
to a single-stage Dockerfile in anticipation of enabling efficient use
of `--cache-from` in CI. However, that resulted in the image ballooning
in size because it contained the Git repo for every package downloaded
by `dep ensure`.

Bring the image back down to the proper size by removing the temporary
files created.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-21 13:06:24 -10:00
Kevin Lingerfelt b9b16195b8
Remove uses of upstream/downstream from web UI (#400)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-20 17:05:22 -08:00
Risha Mars b26a551d89
Increase padding of main section (#395) 2018-02-20 10:11:32 -08:00
Risha Mars ae0d47d5c9
Add ability to cancel promises via a wrapper (#374)
* Add ability to cancel promises via a wrapper

* Let the ApiHelpers keep track of outstanding requests, provide ApiHelpers.cancel()
2018-02-19 17:28:40 -08:00
Risha Mars 53354cf68f
Small UI tweaks for 0.3 prep (#377)
* Display more decimal points for truncated numbers, add hover info

* Filter completed pods out of web UI

* Decrease the polling interval from 10s to 2s

* Add more detailed pod categorization based on status

* Tweak filtering of pods, tweak explanations in status table
2018-02-19 14:11:03 -08:00
Risha Mars 8bc7c5acde
UI tweaks: sidebar collapse, latency formatting, table row spacing (#361)
- reduce row spacing on tables to make them more compact
- Rename TabbedMetricsTable to MetricsTable since it's not tabbed any more
- Format latencies greater than 1000ms as seconds
- Make sidebar collapsible 
- poll the /pods endpoint from the sidebar in order to refresh the list of deployments in the autocomplete
- display the conduit namespace in the service mesh details table
- Use floats rather than Col for more responsive layout (fixes #224)
2018-02-19 11:21:54 -08:00
Kevin Lingerfelt 300fd3475b
Remove unused web routes and helper (#356)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-14 11:52:39 -08:00
Brian Smith b18fe459d4
Precompile large Go libraries in go-deps Docker image. (#332)
On my system (i9-7960x running Docker natively in Linux) this regularly saves
over 11 seconds of build time when a file under pkg/ changes and over 1.5
seconds of build time when a file under controller/ changes. Since most
contributors are running Docker in a VM on less powerful computers, the
savings for most contributors should be significantly greater.

I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I
did not measure them.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-13 11:35:10 -10:00
Brian Smith ec5a02fd64
Upgrade to Go 1.9.4. (#326)
Go 1.9.4 is a security release.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:47:40 -10:00
Brian Smith 86ea1c06bf
Improve the caching behavior of Dockerfile-go-deps. (#325)
Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the
source tree changed. Also, because it was a multi-stage Dockerfile it did not
work well with Docker's `--cache-from` feature.

Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock}
and/or bin/dep change. Simplify it to a single stage so that it works better
with Docker's `--cache-from` feature.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:40:20 -10:00
Brian Smith c78df4ba13
Use bin/dep in Dockerfile-go-deps. (#324)
bin/dep verifies the digest of the `dep` downloaded `dep` executable,
whereas previously Dockerfile-go-deps wasn't.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-12 13:32:08 -10:00
Risha Mars 1f6aa27922
UI updates, graph removals (#319)
UI cleanups. Remove repetitive labels in the UI, remove unused elements, 
remove graphs until we improve their utility.

- remove “Deployment” from the headers of the Deployment Detail Page
- remove Routes in sidebar
- kill leftmost 100px of sidebear
- remove word controller from service mesh page first table
- add twitter and GitHub and slack links
- kill the graphs, replace with one large header (request rate, success rate, latency top bar)
put upstream/downstream diagram before upstream downstream tables

* Clean up DeploymentList page (#321)

- remove "Most active deployments" graphs from the Deployments List page
- remove the scatterplot sections of the page as I don't think we'll be using them for a while
2018-02-12 12:44:33 -08:00
Eliza Weisman 458e9d2ac5
Remove per-path metrics from telemetry pipeline (#317)
Follow-up from #315.

Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.

I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.

Closes #265 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 14:20:28 -08:00
Eliza Weisman 6c2ac6125f
Remove per-path metrics from UIs (#315)
I've removed per-path metrics from the web dashboard and from the `conduit stat` command. 

Manually validated that these metrics are no longer displayed.

Closes  #263

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 12:35:49 -08:00
Andrew Seigner 33e3c3ace9
Optimize Prometheus queries (#298)
Prometheus queries from the Telemetry service were taking seconds or 10s
of seconds.

Optimize these queries:
- Move all summary queries requiring a single point data off of Prometheus'
  QueryRange() endpoint, onto Query()
- Set `defaultVectorRange` to 30s, and also use it regardless of time
  window
Also add tests for grpc_server and telemetry server

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

Fixes #260
2018-02-09 10:55:07 -08:00
Eliza Weisman 2015d992cc
Remove pod-level metrics from web and CLI (#304)
This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message.

I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes.

Closes #261 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-08 19:07:10 -08:00
Risha Mars 81d4b7b924
Fix bug where table data wasn't being updated (#290) 2018-02-08 10:33:33 -08:00
Risha Mars ff15574a0d
MetricsTable: Consolidate latency, success, request metrics into one tab (#276)
* Consolidate latency, success, request metrics into one tab
on the SortableMetricsTable

- removes sparklines from the table
- makes tables sortable by default
- move pod table in DeploymentDetail to its own row

* remove request distribution column, reorder columns
2018-02-07 09:50:01 -08:00
Risha Mars 185f48b086
DeploymentsList: Replace "least healthy" with "most active" deployments (#277)
* Replace Least Healthy Deployments section with Most Active Deployments (MAD)

* Fix old arguments to ConduitLink
2018-02-06 11:35:57 -08:00
Risha Mars c2da891be7
Minor UI title renames and other tweaks (#256)
* ServiceMesh: plot public-api instead of destination, retitle destination and telemetry graphs

* ResourceHealthOverview: Hide Inbound/Outbound request rate if there are 0 deployments

* ResourceMetricOverview: retitle DeploymentDetail/PodDetail sections
2018-02-05 11:27:31 -08:00
Risha Mars 9887f10749
Add ability to change the time window for metrics fetching throughout the app (#237)
* Control metricsWindow from root of app

- Add buttons [currently hidden] on metrics pages to control window of metrics requests
- Consolidate metricsWindow usage (stop passing it around)
- Add a ConduitLink component so we can stop passing around pathPrefix
- Add tests for ApiHelpers

* Hide the time window buttons; fix bug in absolute links
* Add a note explaining why metricWindow buttons are disabled
* Convert ConduitLink in to a component that wraps another
2018-02-05 10:56:17 -08:00
Andrew Seigner 9a40d984ff
Replace shelling out with kubernetes proxy (#249)
The conduit dashboard command asychronously shells out and runs "kubectl
proxy".

This change replaces the shelling out with calls to kubernetes proxy
APIs. It also allows us to enable race detection in our go tests, as the
shell out code tests did not pass race detection.

Fixes #173

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-02 10:31:59 -08:00
Alex Leong fa2f5a0140
Add dep wrapper script to ensure consistent version of dep is used (#253)
* Add `bin/dep` which fetches a fixed version of `dep` to be used. 
* Upgrade from dep 0.3.1 to 0.4.1
* Fix inconsistent Gopkg.lock by checking in the result of `bin/dep ensure`

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-02-01 16:09:05 -08:00