linkerd2

Commit Graph

Author	SHA1	Message	Date
Andrew Seigner	77fb6d3709	Add namespace as a resource type in public-api (#760 ) * Add namespace as a resource type in public-api The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This change also includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analogous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io> * Refactor filter and groupby label formulation Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename stat_summary.go to stat.go in cli Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Update rbac privileges for namespace stats Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 16:53:01 -07:00
Andrew Seigner	21886760c6	Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762 ) Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs. apps/v1 is not available on Kubernetes 1.8. This prevented the public-api from starting. Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on a test cluster. Fixes #761 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-13 12:08:16 -07:00
Kevin Lingerfelt	fb15fe7c1a	Remove the telemetry service (#757 ) * Remove the telemetry service The telemetry service is no longer needed, now that prometheus scrapes metrics directly from proxies, and the public-api talks directly to prometheus. In this branch I'm removing the service itself as well as all of the telemetry protobuf, and updating the conduit install command to no longer install the service. I'm also removing the old version of the stat command, which required the telemetry service, and renaming the statsummary command to stat. * Fix time window tests * Remove deprecated controller scrape config Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 11:21:29 -07:00
Kevin Lingerfelt	37434d048a	Update web component to use new stat api (#753 ) * Update web component to use new stat api * Address review feedback * Add external link icon Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-12 17:35:03 -07:00
Andrew Seigner	259fdcd134	Add latency stats in new stat summary endpoint (#737 ) The new StatSummary endpoint was only providing request volume and success rate information. Add support for retrieving latency stats via StatSummary. Also make all prometheus calls in parallel, and implement kubernetes test fixtures. Fixes #681 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-11 11:58:32 -07:00
Kevin Lingerfelt	91c359e612	Switch public API to use cached k8s resources (#724 ) * Switch public API to use cached k8s resources * Move shared informer code to separate goroutine * Fix spelling issue Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-10 11:39:31 -07:00
Andrew Seigner	b6bcdcc059	Namespace-aware Grafana dashboards (#716 ) The Grafana dashboards key off of deployment, but had no awareness of namespaces, causing incorrect metrics aggregation and display. This change makes the Grafana dashboards key off of namespaces, and also modifies the Grafana links in the Conduit dashboard to link to namespace+deployment. Fixes #704 Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 15:37:53 -07:00
Andrew Seigner	50c323c617	Use canonical k8s names, fix prom labels (#702 ) The new statsummary command accepted friendly k8s names, which worked for k8s queries, but Prometheus requires a specific key. Modify the statsummary query to map friendly k8s names to canonical k8s names when constructing the query. Then during the query, map the canonical k8s name to a specific Prometheus label. Fixes #695 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 12:34:54 -07:00
Andrew Seigner	836168884e	Link to Grafana from Conduit Dashboard (#678 ) * Link to Grafana from Conduit Dashboard Previously the only way to access the Grafana dashboards was via direct link, provided by the `conduit dashboard` command. Add Grafana links throughout the Conduit Dashboard, next to all Deployment objects. This change also modifies the behavior of the ConduitLink helper, to enable linking to other deployments proxied by the `conduit dashboard` command. Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io> * review feedback Signed-off-by: Andrew Seigner <siggy@buoyant.io> * review feedback, fix console, remove absolute Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 10:56:42 -07:00
Risha Mars	2f5b5ea5f2	Start implementing conduit stat summary endpoint (#671 ) Start implementing new conduit stat summary endpoint. Changes the public-api to call prometheus directly instead of the telemetry service. Wired through to `api/stat` on the web server, as well as `conduit statsummary` on the CLI. Works for deployments only. Current implementation just retrieves requests and mesh/total pod count (so latency stats are always 0). Uses API defined in #663 Example queries the stat endpoint will eventually satisfy in #627 This branch includes commits from @klingerf * run ./bin/dep ensure * run ./bin/update-go-deps-shas	2018-04-05 17:05:06 -07:00
Franziska von der Goltz	eff848a8cf	fix pod status and count in control plane dashboard (#659 ) * fix pod status and count display in control plane dashboard section: - the control plane would show terminated and stale deployments in the UI, this is confusing and might indicate errors - this filters out terminated and failed component deploys from the UI - it is to note that pending deploys will still be counted and represented with a greyed out status dot - Fixes: #606 Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu> Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>	2018-04-03 10:39:35 -07:00
Brian Smith	df9ead9c36	Use Go 1.10.1 to build all Go code. (#650 ) Go 1.10.1 is a security release. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-02 14:58:30 -10:00
Franziska von der Goltz	67fac9d240	remove toggle sorting functionality from TableComponent (#630 ) remove toggle sorting functionality from TableComponent: - tables displaying metrics allowed to toggle between being sorted and unsorted when clicking the same button. This was confusing behavior for the user. - this PR removes the toggle functionality and introduces a BaseTable Component that extends antd's component without the capability to toggle - Fixes: #566 Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>	2018-03-28 18:01:34 -07:00
Kevin Lingerfelt	59c75a73a9	Add tests/utils/scripts for running integration tests (#608 ) * Add tests/utils/scripts for running integration tests Add a suite of integration tests in the `test/` directory, as well as utilities for testing in the `testutil/` directory. You can use the `bin/test-run` script to run the full suite of tests, and the `bin/test-cleanup` script to cleanup after the tests. The test/README.md file has more information about running tests. @pcalcado, @franziskagoltz, and @rmars also contributed to this change. * Create TEST.md file at the root of the repo * Update based on review feedback * Relax external service IP timeout for GKE * Update TEST.md with more info about different types of test runs * More updates to TEST.md based on review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-27 15:06:55 -07:00
Dennis Adjei-Baah	ad42f2f8ab	Retry k8s watch endpoints on error (#510 ) Shortly after conduit is installed in a k8s environment, the control plane components that establish a watch endpoint with k8s run into networking issues during proxy initialization. During failure, each watcher fails to retry its connection to the k8s watch endpoint, which leads to timeouts and, eventually, multiple controller pod restarts. This PR adds retry logic to each "watch" enabled package. fixes #478 Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-03-07 13:40:43 -08:00
Brian Smith	cf3c8cd7bc	Use Go 1.10.0 to build Go components. (#408 ) * Use Go 1.10.0 to build Go components. Take advantage of the new build cache in Go 1.10. Future work on improving build performance will utilize the build cache further. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-21 14:31:29 -10:00
Brian Smith	e6aad57766	Remove temporary files generated by dep in go-deps image. (#407 ) Previously Dockerfile-go-deps was converted from a multi-stage Dockerfile to a single-stage Dockerfile in anticipation of enabling efficient use of `--cache-from` in CI. However, that resulted in the image ballooning in size because it contained the Git repo for every package downloaded by `dep ensure`. Bring the image back down to the proper size by removing the temporary files created. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-21 13:06:24 -10:00
Kevin Lingerfelt	b9b16195b8	Remove uses of upstream/downstream from web UI (#400 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-20 17:05:22 -08:00
Risha Mars	b26a551d89	Increase padding of main section (#395 )	2018-02-20 10:11:32 -08:00
Risha Mars	ae0d47d5c9	Add ability to cancel promises via a wrapper (#374 ) * Add ability to cancel promises via a wrapper * Let the ApiHelpers keep track of outstanding requests, provide ApiHelpers.cancel()	2018-02-19 17:28:40 -08:00
Risha Mars	53354cf68f	Small UI tweaks for 0.3 prep (#377 ) * Display more decimal points for truncated numbers, add hover info * Filter completed pods out of web UI * Decrease the polling interval from 10s to 2s * Add more detailed pod categorization based on status * Tweak filtering of pods, tweak explanations in status table	2018-02-19 14:11:03 -08:00
Risha Mars	8bc7c5acde	UI tweaks: sidebar collapse, latency formatting, table row spacing (#361 ) - reduce row spacing on tables to make them more compact - Rename TabbedMetricsTable to MetricsTable since it's not tabbed any more - Format latencies greater than 1000ms as seconds - Make sidebar collapsible - poll the /pods endpoint from the sidebar in order to refresh the list of deployments in the autocomplete - display the conduit namespace in the service mesh details table - Use floats rather than Col for more responsive layout (fixes #224)	2018-02-19 11:21:54 -08:00
Kevin Lingerfelt	300fd3475b	Remove unused web routes and helper (#356 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-14 11:52:39 -08:00
Brian Smith	b18fe459d4	Precompile large Go libraries in go-deps Docker image. (#332 ) On my system (i9-7960x running Docker natively in Linux) this regularly saves over 11 seconds of build time when a file under pkg/ changes and over 1.5 seconds of build time when a file under controller/ changes. Since most contributors are running Docker in a VM on less powerful computers, the savings for most contributors should be significantly greater. I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I did not measure them. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-13 11:35:10 -10:00
Brian Smith	ec5a02fd64	Upgrade to Go 1.9.4. (#326 ) Go 1.9.4 is a security release. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:47:40 -10:00
Brian Smith	86ea1c06bf	Improve the caching behavior of Dockerfile-go-deps. (#325 ) Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the source tree changed. Also, because it was a multi-stage Dockerfile it did not work well with Docker's `--cache-from` feature. Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock} and/or bin/dep change. Simplify it to a single stage so that it works better with Docker's `--cache-from` feature. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:40:20 -10:00
Brian Smith	c78df4ba13	Use bin/dep in Dockerfile-go-deps. (#324 ) bin/dep verifies the digest of the downloaded `dep` executable, whereas previously Dockerfile-go-deps wasn't. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:32:08 -10:00
Risha Mars	1f6aa27922	UI updates, graph removals (#319 ) UI cleanups. Remove repetitive labels in the UI, remove unused elements, remove graphs until we improve their utility. - remove “Deployment” from the headers of the Deployment Detail Page - remove Routes in sidebar - kill leftmost 100px of sidebar - remove word controller from service mesh page first table - add twitter and GitHub and slack links - kill the graphs, replace with one large header (request rate, success rate, latency top bar) put upstream/downstream diagram before upstream downstream tables * Clean up DeploymentList page (#321) - remove "Most active deployments" graphs from the Deployments List page - remove the scatterplot sections of the page as I don't think we'll be using them for a while	2018-02-12 12:44:33 -08:00
Eliza Weisman	458e9d2ac5	Remove per-path metrics from telemetry pipeline (#317 ) Follow-up from #315. Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API. I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes. Closes #265 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 14:20:28 -08:00
Eliza Weisman	6c2ac6125f	Remove per-path metrics from UIs (#315 ) I've removed per-path metrics from the web dashboard and from the `conduit stat` command. Manually validated that these metrics are no longer displayed. Closes #263 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 12:35:49 -08:00
Andrew Seigner	33e3c3ace9	Optimize Prometheus queries (#298 ) Prometheus queries from the Telemetry service were taking seconds or 10s of seconds. Optimize these queries: - Move all summary queries requiring a single point data off of Prometheus' QueryRange() endpoint, onto Query() - Set `defaultVectorRange` to 30s, and also use it regardless of time window Also add tests for grpc_server and telemetry server Signed-off-by: Andrew Seigner <siggy@buoyant.io> Fixes #260	2018-02-09 10:55:07 -08:00
Eliza Weisman	2015d992cc	Remove pod-level metrics from web and CLI (#304 ) This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message. I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes. Closes #261 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-08 19:07:10 -08:00
Risha Mars	81d4b7b924	Fix bug where table data wasn't being updated (#290 )	2018-02-08 10:33:33 -08:00
Risha Mars	ff15574a0d	MetricsTable: Consolidate latency, success, request metrics into one tab (#276 ) * Consolidate latency, success, request metrics into one tab on the SortableMetricsTable - removes sparklines from the table - makes tables sortable by default - move pod table in DeploymentDetail to its own row * remove request distribution column, reorder columns	2018-02-07 09:50:01 -08:00
Risha Mars	185f48b086	DeploymentsList: Replace "least healthy" with "most active" deployments (#277 ) * Replace Least Healthy Deployments section with Most Active Deployments (MAD) * Fix old arguments to ConduitLink	2018-02-06 11:35:57 -08:00
Risha Mars	c2da891be7	Minor UI title renames and other tweaks (#256 ) * ServiceMesh: plot public-api instead of destination, retitle destination and telemetry graphs * ResourceHealthOverview: Hide Inbound/Outbound request rate if there are 0 deployments * ResourceMetricOverview: retitle DeploymentDetail/PodDetail sections	2018-02-05 11:27:31 -08:00
Risha Mars	9887f10749	Add ability to change the time window for metrics fetching throughout the app (#237 ) * Control metricsWindow from root of app - Add buttons [currently hidden] on metrics pages to control window of metrics requests - Consolidate metricsWindow usage (stop passing it around) - Add a ConduitLink component so we can stop passing around pathPrefix - Add tests for ApiHelpers * Hide the time window buttons; fix bug in absolute links * Add a note explaining why metricWindow buttons are disabled * Convert ConduitLink in to a component that wraps another	2018-02-05 10:56:17 -08:00
Andrew Seigner	9a40d984ff	Replace shelling out with kubernetes proxy (#249 ) The conduit dashboard command asynchronously shells out and runs "kubectl proxy". This change replaces the shelling out with calls to kubernetes proxy APIs. It also allows us to enable race detection in our go tests, as the shell out code tests did not pass race detection. Fixes #173 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-02 10:31:59 -08:00
Alex Leong	fa2f5a0140	Add dep wrapper script to ensure consistent version of dep is used (#253 ) * Add `bin/dep` which fetches a fixed version of `dep` to be used. * Upgrade from dep 0.3.1 to 0.4.1 * Fix inconsistent Gopkg.lock by checking in the result of `bin/dep ensure` Signed-off-by: Alex Leong <alex@buoyant.io>	2018-02-01 16:09:05 -08:00
Andrew Seigner	277c06cf1e	Simplify and refactor k8s labels and annotations (#227 ) The conduit.io/* k8s labels and annotations were redundant in some cases, and not flexible enough in others. This change modifies the labels in the following ways: `conduit.io/plane: control` => `conduit.io/controller-component: web` `conduit.io/controller: conduit` => `conduit.io/controller-ns: conduit` `conduit.io/plane: data` => (remove, redundant with `conduit.io/controller-ns`) It also centralizes all k8s labels and annotations into pkg/k8s/labels.go, and adds tests for the install command. Part of #201 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-01 14:12:06 -08:00
Risha Mars	a9d4a3d74e	Add more prometheus instrumentation (latency, response size) (#174 ) We added basic prometheus instrumentation, but this only encapsulated basic go metrics and request counts. This adds latency and response size metrics exporting as well, to the public-api server, the web server and the telemetry server. Since the util function in grpc.go was basically used to wrap the server creation in a prometheus handler, I added the other prometheus constants in there and renamed the file to prometheus.go. - Add request duration and response size instrumentation to web and public api - Also add latency monitoring to telemetry service requests - Rename util/grpc.go to util/prometheus.go	2018-02-01 09:50:31 -08:00
Risha Mars	f3925a07fb	Various small UI tweaks (#234 ) * Various small UI naming tweaks - align top two tables in the service mesh page - "All Deployments" -> "Deployments" - reorder latency p50, p95, p99 - "Current success" -> "Success rate" * Add margin to incomplete mesh message, reorder latency in TabbedMetricsTable * Right align numbers in service mesh page	2018-01-31 18:09:15 -08:00
Kevin Lingerfelt	7399df83f1	Set conduit version to match conduit docker tags (#208 ) * Set conduit version to match conduit docker tags Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Remove --skip-inbound-ports for emojivoto Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename git_sha => git_sha_head Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Switch to using the go linker for setting the version Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Log conduit version when go servers start Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Cleanup conduit script Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Add --short flag to head sha command Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Set CONDUIT_VERSION in docker-compose env Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-26 11:43:45 -08:00
Risha Mars	8f63c2b7a5	Fix link to deployments from autocomplete not including pathPrefix (#209 ) Signed-off-by: Risha Mars <mars@buoyant.io>	2018-01-25 16:47:39 -08:00
Risha Mars	d0119162e8	Switch to ant sider/content Layout modules (#188 ) * Switch to ant sider/content Layout modules, to help style sidebar This fixes the problem of the sidebar not extending all the way on long pages. * Fix a bug where the autocomplete options weren't being reset when an item was selected	2018-01-24 11:38:54 -08:00
Risha Mars	b9f5ad093f	Rename js components to clarify component relationships (#179 ) * Rename components to clarify component relationships * Rename Deployment to DeploymentDetail to match PodDetail * Rename Deployments to DeploymentsList to clarify which page this is * Rename StatPane to ResourceMetricsOverview to be a less generic name * Rename HealthPane -> ResourceHealthOverview * Rename StatPaneStat -> ResourceOverviewMetric Signed-off-by: Risha Mars <mars@buoyant.io>	2018-01-23 10:05:53 -08:00
Risha Mars	67255bc03a	Remove font colouring on the call to action (#184 ) Signed-off-by: Risha Mars <mars@buoyant.io>	2018-01-19 13:43:32 -08:00
Risha Mars	8a1dc1a2b5	Improve appearance of autocomplete search bar (#180 ) Signed-off-by: Risha Mars <mars@buoyant.io>	2018-01-19 10:40:52 -08:00
Risha Mars	eea711a7f2	Hide scatterplot (#175 ) Signed-off-by: Risha Mars <mars@buoyant.io>	2018-01-18 16:20:12 -08:00
Risha Mars	43e6229363	Consolidate latency colour naming, css tweaks from #147 (#164 ) Signed-off-by: Risha Mars <mars@buoyant.io>	2018-01-17 14:22:48 -08:00

92 Commits