linkerd2

Commit Graph

Author	SHA1	Message	Date
Kevin Lingerfelt	383babfae2	Prepare for the v0.3.0 release (#406 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-21 11:14:11 -08:00
Carl Lerche	287128885e	Proxy: Limit the max number of in-flight requests. (#398 ) Currently, the max number of in-flight requests in the proxy is unbounded. This is due to the `Buffer` middleware being unbounded. This is resolved by adding an instance of `InFlightLimit` around `Buffer`, capping the max number of in-flight requests for a given endpoint. Currently, the limit is hardcoded to 10,000. However, this will eventually become a configuration value. Fixes #287 Signed-off-by: Carl Lerche <me@carllerche.com>	2018-02-20 19:56:21 -08:00
Kevin Lingerfelt	c579a8fe8d	Improve get/stat/tap help text by way of examples (#401 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-20 19:27:42 -08:00
Sean McArthur	236f71fbe0	proxy: use original dst if authority doesn't look like local service (#397 ) The proxy will check that the requested authority looks like a local service, and if it doesn't, it will no longer ask the Destination service about the request, instead just using the SO_ORIGINAL_DST, enabling egress naturally. The rules used to determine if it looks like a local service come from this comment: > If default_zone.is_none() and the name is in the form $a.$b.svc, or if !default_zone.is_none() and the name is in the form $a.$b.svc.$default_zone, for some a and some b, then use the Destination service. Otherwise, use the IP given.	2018-02-20 18:09:21 -08:00
Kevin Lingerfelt	b9b16195b8	Remove uses of upstream/downstream from web UI (#400 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-20 17:05:22 -08:00
Kevin Lingerfelt	8db7115420	Update go-run to set version equal to root-tag (#393 ) * Update go-run to set version equal to root-tag * Fix inject tests for undefined version change * Pass inject version explicitly as arg Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-20 12:25:55 -08:00
Kevin Lingerfelt	f48555d3cc	Remove kubectl dependency, validate k8s server version via api (#396 ) * Remove kubectl dependency, validate k8s server version via api Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Remove unused MockKubectl Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename kubectl.go to version.go Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-20 12:14:11 -08:00
Dennis Adjei-Baah	9af3783555	Print error message only when invalid YAML file is used with inject command (#389 ) When the `inject` command is used on a YAML file that is invalid, it prints out an invalid YAML file with the injected proxy. This may give a false indication to the user that the inject was successful even though the inject command prints out an error message further down the terminal window. This PR fixes #303 and contains a test input and output file that indicates what should be shown. This PR also fixes #390. Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-02-20 11:59:41 -08:00
Kevin Lingerfelt	d1ae4c5bc7	Run conduit dashboard on ephemeral port by default (#394 ) * Run conduit dashboard on ephemeral port by default * Fix wording on dashboard --port flag * log.Debug error instead of discarding it Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-20 10:33:47 -08:00
Risha Mars	b26a551d89	Increase padding of main section (#395 )	2018-02-20 10:11:32 -08:00
Risha Mars	ae0d47d5c9	Add ability to cancel promises via a wrapper (#374 ) * Add ability to cancel promises via a wrapper * Let the ApiHelpers keep track of outstanding requests, provide ApiHelpers.cancel()	2018-02-19 17:28:40 -08:00
William Morgan	6807b491d3	update readme: experimental -> alpha, and minor tweaks (#391 ) update README, add mailing list links, etc.	2018-02-19 15:41:04 -08:00
Risha Mars	53354cf68f	Small UI tweaks for 0.3 prep (#377 ) * Display more decimal points for truncated numbers, add hover info * Filter completed pods out of web UI * Decrease the polling interval from 10s to 2s * Add more detailed pod categorization based on status * Tweak filtering of pods, tweak explanations in status table	2018-02-19 14:11:03 -08:00
Brian Smith	1489a84316	Refactor `conduit inject` code to eliminate duplicate logic (#383 ) * Refactor `conduit inject` code to eliminate duplicate logic Previously there was a lot of code repeated once for each type of object that has a pod spec. Refactor the code to reduce the amount of duplication there, to make future changes easier. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-19 11:18:44 -10:00
Brian Smith	80aba6c075	CLI: Remove now-unnecessary "enhanced" Kubernetes object types (#382 ) * CLI: Remove now-unnecessary "enhanced" Kubernetes object types The "enhanced" types aren't necessary because now the Kubernetes API implementation has the correct JSON annotations for the InitContainers field. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-19 09:37:25 -10:00
Risha Mars	8bc7c5acde	UI tweaks: sidebar collapse, latency formatting, table row spacing (#361 ) - reduce row spacing on tables to make them more compact - Rename TabbedMetricsTable to MetricsTable since it's not tabbed any more - Format latencies greater than 1000ms as seconds - Make sidebar collapsible - poll the /pods endpoint from the sidebar in order to refresh the list of deployments in the autocomplete - display the conduit namespace in the service mesh details table - Use floats rather than Col for more responsive layout (fixes #224)	2018-02-19 11:21:54 -08:00
Dennis Adjei-Baah	01e694ad71	add check and friendly error if conduit dashboard is not installed (#289 ) * add check and friendly error if conduit dashboard is not installed Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-02-19 09:28:56 -08:00
Brian Smith	d8f9c33183	Skip pods with hostNetwork=true in `conduit inject` (#380 ) The init container injected by conduit inject rewrites the iptables configuration for its network namespace. This causes havoc when the network namespace isn't restricted to the pod, i.e. when hostNetwork=true. Skip pods with hostNetwork=true to avoid this problem. Fixes #366. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-18 13:55:42 -10:00
Brian Smith	51873542e5	Refactor `conduit inject` code to make it unit-testable. (#379 ) Refactor `conduit inject` code to make it unit-testable. Refactor the conduit inject code to make it easier to add unit tests. This work was done by @deebo91 in #365. This is the same PR without the conduit install changes, so that it can land ahead of #365. In particular, this will be used for testing the fix for high-priority bug #366. Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io> Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-18 12:33:52 -10:00
Oliver Gould	c454ac413c	Upgrade to Rust 1.24.0 (#363 ) Upgrade to Rust 1.24.0	2018-02-16 14:37:29 -08:00
Brian Smith	64f270b631	Strip `conduit` CLI executables in `docker build`. (#367 ) File sizes (in bytes) before and after this change: conduit-darwin conduit-linux conduit-windows Before: 27,056,288 27,282,364 27,359,744 After: 20,023,456 18,080,576 18,262,528 ---------------------------------------------------- Diff 7,032,832 9,201,788 9,097,216 Fixes #352. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-16 08:20:18 -10:00
Alex Leong	552204366c	Use Prometheus to track added data plane pods. (#338 ) The instance cache that powers the ListPods API is stored in memory in the telemetry service. This means that when there are multiple replicas of the telemetry service, each replica will have a distinct, incomplete view of the added pods based on which pods report to that telemetry replica. This causes the data plane bubbles on the dashboard to not all be filled in, and to flicker with each data refresh. We create a Prometheus counter called reports_total which has pod as a label. Whenever a telemetry service instance receives a report from a pod, it increments reports_total for that pod. This allows us to remove the in-memory instance cache and instead query Prometheus to see if each pod has had a report in the last 30 seconds. Fixes #337 Signed-off-by: Alex Leong <alex@buoyant.io>	2018-02-14 16:09:55 -08:00
Kevin Lingerfelt	300fd3475b	Remove unused web routes and helper (#356 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-14 11:52:39 -08:00
Andrew Seigner	1db7d2a2fb	Ensure latency quantile queries match timestamps (#348 ) In PR #298 we moved time window parsing (10s => (time.now - 10s, time.now)) down the stack to immediately before the query. This had the unintended effect of creating parallel latency quantile requests with slightly different timestamps. This change parses the time window prior to latency quantile fan out, ensuring all requests have the same timestamp. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-13 16:26:54 -08:00
Brian Smith	aa123b8ad5	Test the proxy in release mode in Docker in CI on the master branch. (#327 ) * Test the proxy in release mode in Docker in CI on the master branch. Previously we were not running the proxy tests in the release configuration. Run the proxy tests in the release configuration through Docker. Docker builds with tests in release mode are too slow to run on every pull request so release mode tests will only be run on the master branch. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-13 12:16:50 -10:00
Andrew Seigner	50f4aa57e5	Require timestamp on all telemetry requests (#342 ) PR #298 moved summary (non-timeseries) requests to Prometheus' Query endpoint, with no timestamp provided. This Query endpoint returns a single data point with whatever timestamp was provided in the request. In the absence of a timestamp, it uses current server time. This causes the Public API to return discrete data points with slightly different timestamps, which is unexpected behavior. Modify the Public API -> Telemetry -> Prometheus request path to always require a timestamp for single data point requests. Fixes #340 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-13 13:52:21 -08:00
Oliver Gould	4154db2d4f	Improve wording around Getting Started (#288 ) Some of the phrasing around the getting started section of the README was awkward.	2018-02-13 13:38:37 -08:00
Brian Smith	b18fe459d4	Precompile large Go libraries in go-deps Docker image. (#332 ) On my system (i9-7960x running Docker natively in Linux) this regularly saves over 11 seconds of build time when a file under pkg/ changes and over 1.5 seconds of build time when a file under controller/ changes. Since most contributors are running Docker in a VM on less powerful computers, the savings for most contributors should be significantly greater. I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I did not measure them. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-13 11:35:10 -10:00
Andrew Seigner	797bba6bc6	Upgrade to Prometheus 2.1.0 (#344 ) Conduit has been on Prometheus 1.8.1. Prometheus 2.x promises better performance. Upgrade Conduit to Prometheus 2.1.0 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-13 13:22:53 -08:00
Brian Smith	37008f9626	Improve caching behavior of controller/Dockerfile. (#331 ) Precompiling pkg/ in an earlier layer saves ~10 seconds of wall clock time on an incremental build on my machine (i9-7960x) when I update a file in controller/ such as controller/destination/server.go. This makes a significant difference in the edit-build-test loop. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-13 11:21:22 -10:00
Brian Smith	ec5a02fd64	Upgrade to Go 1.9.4. (#326 ) Go 1.9.4 is a security release. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:47:40 -10:00
Brian Smith	86ea1c06bf	Improve the caching behavior of Dockerfile-go-deps. (#325 ) Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the source tree changed. Also, because it was a multi-stage Dockerfile it did not work well with Docker's `--cache-from` feature. Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock} and/or bin/dep change. Simplify it to a single stage so that it works better with Docker's `--cache-from` feature. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:40:20 -10:00
Brian Smith	c78df4ba13	Use bin/dep in Dockerfile-go-deps. (#324 ) bin/dep verifies the digest of the downloaded `dep` executable, whereas previously Dockerfile-go-deps didn't. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:32:08 -10:00
Risha Mars	1f6aa27922	UI updates, graph removals (#319 ) UI cleanups. Remove repetitive labels in the UI, remove unused elements, remove graphs until we improve their utility. - remove “Deployment” from the headers of the Deployment Detail Page - remove Routes in sidebar - kill leftmost 100px of sidebar - remove word controller from service mesh page first table - add twitter and GitHub and slack links - kill the graphs, replace with one large header (request rate, success rate, latency top bar) put upstream/downstream diagram before upstream downstream tables * Clean up DeploymentList page (#321) - remove "Most active deployments" graphs from the Deployments List page - remove the scatterplot sections of the page as I don't think we'll be using them for a while	2018-02-12 12:44:33 -08:00
Andrew Seigner	261586b862	Fix pointer copying (#330 ) The Public API's stat endpoint copies a slice of values to a slice of pointers prior to gRPC response. Go's range clause re-uses the same pointer for each iteration of the loop, causing a slice of {1,2,3} to become {3,3,3}. Fix the range loop to directly reference pointers in the slice of values, ignoring the range variable. Also add tests to catch this case. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-10 11:04:28 -08:00
Eliza Weisman	8bc497a057	Remove unused metrics (#322 ) Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams. Closes #266 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 17:14:17 -08:00
Andrew Seigner	bffa5ff3e6	Concurrent Telemetry requests (#323 ) All requests from the public API service to the Telemetry service were done serially. In some cases a single request to the public API's Stat endpoint resulted in 5 serial requests to the Telemetry service. Make all requests from the Public API to Telemetry concurrent. Signed-off-by: Andrew Seigner <siggy@buoyant.io> Part of #299	2018-02-09 17:11:20 -08:00
Eliza Weisman	458e9d2ac5	Remove per-path metrics from telemetry pipeline (#317 ) Follow-up from #315. Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API. I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes. Closes #265 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 14:20:28 -08:00
Eliza Weisman	6c2ac6125f	Remove per-path metrics from UIs (#315 ) I've removed per-path metrics from the web dashboard and from the `conduit stat` command. Manually validated that these metrics are no longer displayed. Closes #263 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 12:35:49 -08:00
Andrew Seigner	33e3c3ace9	Optimize Prometheus queries (#298 ) Prometheus queries from the Telemetry service were taking seconds or 10s of seconds. Optimize these queries: - Move all summary queries requiring a single point data off of Prometheus' QueryRange() endpoint, onto Query() - Set `defaultVectorRange` to 30s, and also use it regardless of time window Also add tests for grpc_server and telemetry server Signed-off-by: Andrew Seigner <siggy@buoyant.io> Fixes #260	2018-02-09 10:55:07 -08:00
Eliza Weisman	2015d992cc	Remove pod-level metrics from web and CLI (#304 ) This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message. I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes. Closes #261 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-08 19:07:10 -08:00
Risha Mars	81d4b7b924	Fix bug where table data wasn't being updated (#290 )	2018-02-08 10:33:33 -08:00
Brian Smith	4fadfa2243	Don't manually install Docker in Travis CI. (#297 ) Travis CI now installs Docker 17.09 or later, which is good enough for us, so avoid installing Docker manually. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-08 08:31:00 -10:00
Jeff Haynie	f721a0f800	Fixed misspelling in conduit inject args (#300 )	2018-02-08 12:48:40 -05:00
Eliza Weisman	915f08ac4c	Store proxy latencies in a structure that matches controller histogram (#11 ) The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports. I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy. I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.	2018-02-07 18:02:59 -08:00
Kevin Lingerfelt	fbb4e812f8	Change default version string from "unknown" to "latest" (#284 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-07 10:01:12 -08:00
Risha Mars	ff15574a0d	MetricsTable: Consolidate latency, success, request metrics into one tab (#276 ) * Consolidate latency, success, request metrics into one tab on the SortableMetricsTable - removes sparklines from the table - makes tables sortable by default - move pod table in DeploymentDetail to its own row * remove request distribution column, reorder columns	2018-02-07 09:50:01 -08:00
Oliver Gould	a2d537f5c4	Use a load-aware balancer (#251 ) Currently, the conduit proxy uses a simplistic Round-Robin load balancing algorithm. This strategy degrades severely when individual endpoints exhibit abnormally high latency. This change improves this situation somewhat by making the load balancer aware of the number of outstanding requests to each endpoint. When nodes exhibit high latency, they should tend to have more pending requests than faster nodes; and the Power-of-Two-Choices node selector can be used to distribute requests to lesser-loaded instances. From the finagle guide: The algorithm randomly picks two nodes from the set of ready endpoints and selects the least loaded of the two. By repeatedly using this strategy, we can expect a manageable upper bound on the maximum load of any server. The maximum load variance between any two servers is bound by `ln(ln(n))` where `n` is the number of servers in the cluster. Signed-off-by: Oliver Gould <ver@buoyant.io>	2018-02-07 09:39:31 -08:00
Kevin Lingerfelt	447ee142c0	Stop running "cargo check" in CI (#285 ) * Stop running "cargo check" in CI Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Attempt to clear cargo cache Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Remove cache clearing step Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-06 15:57:22 -08:00
Oliver Gould	95b91c5976	Set PROXY_SKIP_TESTS for CI Docker build (#283 ) The SKIP_TESTS flag is not used. The PROXY_SKIP_TESTS flag should be set so that unoptimized proxy tests are not built.	2018-02-06 13:37:38 -08:00
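
Commit a2d537f5c4 above describes the Power-of-Two-Choices node selector: pick two ready endpoints at random and send the request to the one with fewer outstanding requests. The following is a minimal Go sketch of that selection step only. It is not the proxy's code (the proxy is written in Rust), and `Endpoint`, `Pending`, and `pickP2C` are hypothetical names chosen for illustration. The random source is passed in as a function so the choice can be made deterministic in tests.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Endpoint is a hypothetical record of one backend and the number of
// requests currently outstanding to it.
type Endpoint struct {
	Addr    string
	Pending int
}

// pickP2C chooses two distinct endpoints using intn (which must return a
// value in [0, n)) and returns the one with fewer pending requests.
// It returns nil when eps is empty.
func pickP2C(intn func(n int) int, eps []*Endpoint) *Endpoint {
	switch len(eps) {
	case 0:
		return nil
	case 1:
		return eps[0]
	}
	i := intn(len(eps))
	// Draw the second index from the remaining n-1 slots and shift it
	// past i so the two candidates are always different.
	j := intn(len(eps) - 1)
	if j >= i {
		j++
	}
	if eps[j].Pending < eps[i].Pending {
		return eps[j]
	}
	return eps[i]
}

func main() {
	eps := []*Endpoint{
		{Addr: "10.0.0.1:8080"},
		{Addr: "10.0.0.2:8080"},
		{Addr: "10.0.0.3:8080", Pending: 50}, // simulated slow endpoint
	}
	// Simplification: Pending is only incremented here. A real balancer
	// would also decrement it when a response completes.
	for n := 0; n < 1000; n++ {
		pickP2C(rand.Intn, eps).Pending++
	}
	for _, e := range eps {
		fmt.Println(e.Addr, e.Pending)
	}
}
```

Compared with round-robin, the only extra state this needs is a per-endpoint counter, which is why it can react to a slow endpoint without measuring latency directly.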

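Commit 261586b862 above fixes a slice-of-pointers bug caused by taking the address of a range variable. The sketch below shows the shape of the fix the message describes: index into the slice and ignore the range variable. `Stat` and `toPointers` are hypothetical names, not the Public API's actual types. In the Go versions of that era the range variable was a single variable reused across iterations, so `&v` yielded the same address every time; since Go 1.22 each iteration gets a fresh variable, but `&v` would still point at a copy rather than at the slice element.

```go
package main

import "fmt"

// Stat is a hypothetical stand-in for the value type being copied.
type Stat struct {
	Value int
}

// toPointers returns pointers to the elements of vals. It indexes into
// the slice instead of taking the address of a range variable, so each
// pointer refers to a distinct element.
func toPointers(vals []Stat) []*Stat {
	ptrs := make([]*Stat, 0, len(vals))
	for i := range vals {
		ptrs = append(ptrs, &vals[i])
	}
	return ptrs
}

func main() {
	vals := []Stat{{1}, {2}, {3}}
	for _, p := range toPointers(vals) {
		fmt.Println(p.Value)
	}
}
```
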
255 Commits