linkerd2


Author	SHA1	Message	Date
Andrew Seigner	50f4aa57e5	Require timestamp on all telemetry requests (#342 ) PR #298 moved summary (non-timeseries) requests to Prometheus' Query endpoint, with no timestamp provided. This Query endpoint returns a single data point with whatever timestamp was provided in the request. In the absence of a timestamp, it uses the current server time. This causes the Public API to return discrete data points with slightly different timestamps, which is unexpected behavior. Modify the Public API -> Telemetry -> Prometheus request path to always require a timestamp for single data point requests. Fixes #340 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-13 13:52:21 -08:00
Brian Smith	b18fe459d4	Precompile large Go libraries in go-deps Docker image. (#332 ) On my system (i9-7960x running Docker natively in Linux) this regularly saves over 11 seconds of build time when a file under pkg/ changes and over 1.5 seconds of build time when a file under controller/ changes. Since most contributors are running Docker in a VM on less powerful computers, the savings for most contributors should be significantly greater. I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I did not measure them. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-13 11:35:10 -10:00
Brian Smith	37008f9626	Improve caching behavior of controller/Dockerfile. (#331 ) Precompiling pkg/ in an earlier layer saves ~10 seconds of wall clock time on an incremental build on my machine (i9-7960x) when I update a file in controller/ such as controller/destination/server.go. This makes a significant difference in the edit-build-test loop. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-13 11:21:22 -10:00
Brian Smith	ec5a02fd64	Upgrade to Go 1.9.4. (#326 ) Go 1.9.4 is a security release. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:47:40 -10:00
Brian Smith	86ea1c06bf	Improve the caching behavior of Dockerfile-go-deps. (#325 ) Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the source tree changed. Also, because it was a multi-stage Dockerfile it did not work well with Docker's `--cache-from` feature. Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock} and/or bin/dep change. Simplify it to a single stage so that it works better with Docker's `--cache-from` feature. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:40:20 -10:00
Brian Smith	c78df4ba13	Use bin/dep in Dockerfile-go-deps. (#324 ) bin/dep verifies the digest of the downloaded `dep` executable, whereas Dockerfile-go-deps previously did not. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:32:08 -10:00
Andrew Seigner	261586b862	Fix pointer copying (#330 ) The Public API's stat endpoint copies a slice of values to a slice of pointers prior to the gRPC response. Go's range clause reuses the same variable on every iteration of the loop, so taking its address yields the same pointer each time, turning a slice of {1,2,3} into {3,3,3}. Fix the range loop to take the address of each element in the slice of values directly, ignoring the range variable. Also add tests to catch this case. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-10 11:04:28 -08:00
Eliza Weisman	8bc497a057	Remove unused metrics (#322 ) Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams. Closes #266 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 17:14:17 -08:00
Andrew Seigner	bffa5ff3e6	Concurrent Telemetry requests (#323 ) All requests from the public API service to the Telemetry service were done serially. In some cases a single request to the public API's Stat endpoint resulted in 5 serial requests to the Telemetry service. Make all requests from the Public API to Telemetry concurrent. Signed-off-by: Andrew Seigner <siggy@buoyant.io> Part of #299	2018-02-09 17:11:20 -08:00
Eliza Weisman	458e9d2ac5	Remove per-path metrics from telemetry pipeline (#317 ) Follow-up from #315. Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API. I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes. Closes #265 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 14:20:28 -08:00
Andrew Seigner	33e3c3ace9	Optimize Prometheus queries (#298 ) Prometheus queries from the Telemetry service were taking seconds or tens of seconds. Optimize these queries: - Move all summary queries requiring a single data point off of Prometheus' QueryRange() endpoint, onto Query() - Set `defaultVectorRange` to 30s, and also use it regardless of time window Also add tests for grpc_server and telemetry server Signed-off-by: Andrew Seigner <siggy@buoyant.io> Fixes #260	2018-02-09 10:55:07 -08:00
Eliza Weisman	2015d992cc	Remove pod-level metrics from web and CLI (#304 ) This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message. I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes. Closes #261 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-08 19:07:10 -08:00
Eliza Weisman	915f08ac4c	Store proxy latencies in a structure that matches controller histogram (#11 ) The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports. I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy. I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.	2018-02-07 18:02:59 -08:00
Phil Calçado	9c03764a29	Remove hardcoded port and shared state for http test (#282 ) We now create a new test HTTP server per test case instead of sharing it across them all. This should solve the data races we have experienced on Travis. Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-02-06 13:48:14 -05:00
Andrew Seigner	4156af786d	Enable race detection in ci (#259 ) We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. This change also fixes client_test.go attempting to allocate a 2GB buffer due to bad test input. Fixes #173 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-02 15:04:52 -08:00
Andrew Seigner	9a40d984ff	Replace shelling out with kubernetes proxy (#249 ) The conduit dashboard command asynchronously shells out and runs "kubectl proxy". This change replaces the shelling out with calls to kubernetes proxy APIs. It also allows us to enable race detection in our go tests, as the shell-out code tests did not pass race detection. Fixes #173 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-02 10:31:59 -08:00
Alex Leong	fa2f5a0140	Add dep wrapper script to ensure consistent version of dep is used (#253 ) * Add `bin/dep` which fetches a fixed version of `dep` to be used. * Upgrade from dep 0.3.1 to 0.4.1 * Fix inconsistent Gopkg.lock by checking in the result of `bin/dep ensure` Signed-off-by: Alex Leong <alex@buoyant.io>	2018-02-01 16:09:05 -08:00
Andrew Seigner	277c06cf1e	Simplify and refactor k8s labels and annotations (#227 ) The conduit.io/* k8s labels and annotations were redundant in some cases, and not flexible enough in others. This change modifies the labels in the following ways: `conduit.io/plane: control` => `conduit.io/controller-component: web` `conduit.io/controller: conduit` => `conduit.io/controller-ns: conduit` `conduit.io/plane: data` => (remove, redundant with `conduit.io/controller-ns`) It also centralizes all k8s labels and annotations into pkg/k8s/labels.go, and adds tests for the install command. Part of #201 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-01 14:12:06 -08:00
Kevin Lingerfelt	9ff439ef44	Add -log-level flag for install and inject commands (#239 ) * Add -log-level flag for install and inject commands Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Turn off all CLI logging by default, rename inject and install flags Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Re-enable color logging Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-01 12:38:07 -08:00
Risha Mars	a9d4a3d74e	Add more prometheus instrumentation (latency, response size) (#174 ) We added basic prometheus instrumentation, but this only encapsulated basic go metrics and request counts. This adds latency and response size metrics exporting as well, to the public-api server, the web server and the telemetry server. Since the util function in grpc.go was basically used to wrap the server creation in a prometheus handler, I added the other prometheus constants in there and renamed the file to prometheus.go. - Add request duration and response size instrumentation to web and public api - Also add latency monitoring to telemetry service requests - Rename util/grpc.go to util/prometheus.go	2018-02-01 09:50:31 -08:00
Kevin Lingerfelt	4a76c6448b	Update cli subcommands to print errors when encountered (#221 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-29 11:28:19 -08:00
Kevin Lingerfelt	7399df83f1	Set conduit version to match conduit docker tags (#208 ) * Set conduit version to match conduit docker tags Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Remove --skip-inbound-ports for emojivoto Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename git_sha => git_sha_head Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Switch to using the go linker for setting the version Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Log conduit version when go servers start Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Cleanup conduit script Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Add --short flag to head sha command Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Set CONDUIT_VERSION in docker-compose env Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-26 11:43:45 -08:00
Phil Calçado	9410da471a	Better error handling for Tap (#177 ) Previously, running `conduit tap` would return an `Unexpected EOF` error when the server wasn't available. This was due to a few problems with the way we were handling errors all the way down the tap server. This change fixes that and cleans up some of the protobuf-over-HTTP code. - first step towards #49 - closes #106	2018-01-25 11:49:38 -05:00
Andrew Seigner	d0a0bb22bd	Move EosCtx to common for Tap and Telemetry (#204 ) * Make Eos optional in TapEvent grpc_status not being set in protobuf is the same as being set to zero, which is also status OK Modify TapEvent to include an optional EOS struct Signed-off-by: Andrew Seigner <siggy@buoyant.io> Part of #198 * Add Eos to proto & proxy tap end-of-stream events The proxy now outputs `Eos` instead of `grpc_status` in all end-of-stream tap events. The EOS value is set to `grpc_status_code` when the response ended with a `grpc_status` trailer, `http_reset_code` when the response ended with a reset, and no `Eos` when the response ended gracefully without a `grpc_status` trailer. This PR updates the proxy. The proto and controller changes are in PR #204. Part of #198. Closes #202 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-01-24 15:48:00 -08:00
Eliza Weisman	9e49054963	Classify non-gRPC status codes for HTTP telemetry (#200 ) Currently, all "success"/"failure" classifications in the telemetry API are made based on the `grpc-status` trailer. If the trailer is not present, then a request is assumed to have failed. As we start proxying non-gRPC traffic, the controller needs to also be aware of HTTP status codes, so that non-gRPC requests are not assumed to always fail. I've modified the telemetry API server to classify requests based on their HTTP status codes when the `grpc-status` trailer is not present. I've also modified the `simulate-proxy` script to generate fake HTTP/2 traffic without the `grpc-status` trailer. Closes #196 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-01-24 10:57:23 -08:00
Sean McArthur	db913e3d18	controller: echo ip address if destination service receives ip (#186 ) Signed-off-by: Sean McArthur <sean@seanmonstar.com>	2018-01-22 16:20:13 -08:00
Andrew Seigner	beaea5540d	Update version to v0.1.3 in controller	2018-01-19 14:11:58 -08:00
Andrew Seigner	e6f17faf28	Updates for v0.1.2 release (#171 ) Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-01-19 10:56:20 -08:00
Brian Smith	650dcdde1e	Stop ignoring the most significant labels of Destination names (#63 ) Stop ignoring the most significant labels of Destination names Previously the destinations service was ignoring all the labels in a destination name after the first two labels. Thus, for example, "name.ns.another.domain.example.com" would be considered the same as "name.ns.svc.cluster.local". This was very wrong. Match destination names taking into consideration every label in the destination name. Provisions have been made for the case where the controller and the proxies are configured with the zone name to use. However, currently neither the controller nor the proxies are actually configured with the zone, so the implementation was made to work in the current configuration too, as long as fully-qualified names are not used. A negative consequence of this change is that a name like "name.ns.svc.cluster.local" won't resolve in the current configuration, because the controller doesn't know the zone is "cluster.local". Unit tests are included for the new mapping rules. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-01-18 11:20:54 -10:00
Dennis Adjei-Baah	f7af375e73	Remove scheme requirement for api-addr flag in conduit CLI (#126 ) * Allow external controller public api clients that don't rely on a kubeconfig to interact with Conduit CLI Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-01-17 17:12:44 -08:00
Andrew Seigner	6f26cf21cf	Add TCP data to simulate-proxy script (#165 ) simulate-proxy sends HTTP example data. Modify this test script to also send TCP example data. Part of #132 Signed-off-by: Andrew Seigner <andrew@sig.gy>	2018-01-17 15:38:10 -08:00
Kevin Lingerfelt	e56be9bf0e	Bump k8s watch initialization timeout, cleanup logging (#166 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-17 15:31:01 -08:00
Oliver Gould	008f53865b	Make proxy-deps multi-stage to remove the original source files (#161 ) Previously, proxy-deps and go-deps included the source tree for local projects. This can cause build conflicts when files are renamed. By adopting a multi-stage build for the proxy-deps image, we can be sure that we only preserve essential dependencies & manifests in the proxy-deps and go-deps images. Furthermore, `bin/update-go-deps-shas` and `bin/update-proxy-deps-shas` have been added to ease maintenance when files are changed. Fixes #159 Signed-off-by: Oliver Gould <ver@buoyant.io>	2018-01-17 12:26:22 -08:00
Kevin Lingerfelt	fd3cfcb5d9	Move healthcheck proto to separate file, use throughout (#150 ) * Move healthcheck proto to separate file, use throughout Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Remove Check message from healthcheck.proto Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Standardize healthcheck protobuf import name Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-17 11:15:38 -08:00
Phil Calçado	612bd0f7a0	Add --verbose option to CLI (#154 ) * Use stdout as writer for tap command fixes #136 Signed-off-by: Phil Calcado <phil@buoyant.io> * Add --log-level to command line Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-01-17 12:06:43 -05:00
Phil Calçado	e328db7e87	Adds conduit-api check for status command (#140 ) * Abstract Conduit API client from protobuf interface to add new features Signed-off-by: Phil Calcado <phil@buoyant.io> * Consolidate mock api clients Signed-off-by: Phil Calcado <phil@buoyant.io> * Add simple implementation of healthcheck for conduit api Signed-off-by: Phil Calcado <phil@buoyant.io> * Change NextSteps to FriendlyMessageToUser Signed-off-by: Phil Calcado <phil@buoyant.io> * Add grpc check for status on the client Signed-off-by: Phil Calcado <phil@buoyant.io> * Add simple server-side check for Conduit API Signed-off-by: Phil Calcado <phil@buoyant.io> * Fix feedback from PR Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-01-12 15:35:22 -05:00
Eliza Weisman	63d1a5d70d	Add Protocol field to Transports telemetry (#138 ) See #132. This PR adds a protocol field to the ClientTransport and ServerTransport messages, and modifies the proxy to report a value for this field (currently, it's only ever HTTP). Currently, HTTP/1 and HTTP/2 are collapsed into one Protocol variant, see #132 (comment). I expect that we can treat H1 as a subset of H2 as far as metrics goes. Note that after discussing it with @klingerf, I learned that the control plane telemetry API currently does not do anything with the ClientTransport and ServerTransport messages, so beyond regenerating the protobuf-generated code, no controller changes were actually necessary. As we actually add metrics to TCP transports, we'll want to make some additions to the telemetry API to ingest these metrics. If any metrics are shared between HTTP and raw TCP transports (say, bytes sent), we'll want to differentiate between them in Prometheus. All the metrics that the control plane currently ingests from telemetry reports are likely to be HTTP-specific (requests, responses, response latencies), or at least, do not apply to raw TCP. Actually adding metrics to raw TCP transports will probably have to wait until there are raw TCP transports implemented in the proxy... Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-01-11 16:00:38 -08:00
Kevin Lingerfelt	1dc1c00a2a	Upgrade k8s.io/client-go to v6.0.0 (#122 ) * Sort imports Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Upgrade k8s.io/client-go to v6.0.0 Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Make k8s store initialization blocking with timeout Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-11 10:22:37 -08:00
Andrew Seigner	1ceaf3874a	Fix web and public-api log info messages. (#129 ) The existing startup/shutdown log info messages had spacing issues and used fmt. Update the log messages to use logrus for consistency, and fix spacing issues. Signed-off-by: Andrew Seigner <andrew@sig.gy>	2018-01-09 16:14:56 -08:00
Andrew Seigner	caeb83a526	Fix Go and Proxy dependency image SHAs (#117 ) The image tags for gcr.io/runconduit/go-deps and gcr.io/runconduit/proxy-deps were not updating to account for all changes in those images. Modify SHA generation to include all files that affect the base dependency images. Also add instructions to README.md for updating hard-coded SHAs in Dockerfiles. Fixes #115 Signed-off-by: Andrew Seigner <andrew@sig.gy>	2018-01-08 11:19:49 -08:00
Risha Mars	80ecdc13c2	Copy over /pkg to container (#110 ) Signed-off-by: Risha Mars <mars@buoyant.io>	2018-01-05 10:12:29 -08:00
Phil Calçado	709de5a7b0	Moves k8s and conduit client code to /pkg (#103 ) * Rename constructor functions from MakeXyz to NewXyz As it is more commonly used in the codebase Signed-off-by: Phil Calcado <phil@buoyant.io> * Make Conduit client depend on KubernetesAPI Signed-off-by: Phil Calcado <phil@buoyant.io> * Move Conduit client and k8s logic to standard go package dir for internal libs Signed-off-by: Phil Calcado <phil@buoyant.io> * Move dependencies to /pkg Signed-off-by: Phil Calcado <phil@buoyant.io> * Make conduit client more testable Signed-off-by: Phil Calcado <phil@buoyant.io> * Remove unused config object Signed-off-by: Phil Calcado <phil@buoyant.io> * Add more test cases for marshalling Signed-off-by: Phil Calcado <phil@buoyant.io> * Move client back to controller Signed-off-by: Phil Calcado <phil@buoyant.io> * Sort imports Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-01-04 10:10:10 -08:00
Christopher Schmidt	ce69c2e534	returns rs name in case there's no deployment runconduit/conduit#80 (#81 ) Signed-off-by: Christopher Schmidt <fakod666@googlemail.com>	2018-01-02 11:24:09 -08:00
Phil Calçado	31e9846f62	Make several CLI commands testable (#86 ) * Add func to resolve kubectl-like names to canonical names Signed-off-by: Phil Calcado <phil@buoyant.io> * Refactor API instantiation Signed-off-by: Phil Calcado <phil@buoyant.io> * Make version command testable Signed-off-by: Phil Calcado <phil@buoyant.io> * Make get command testable Signed-off-by: Phil Calcado <phil@buoyant.io> * Add tests for api utils Signed-off-by: Phil Calcado <phil@buoyant.io> * Make stat command testable Signed-off-by: Phil Calcado <phil@buoyant.io> * Make tap command testable Signed-off-by: Phil Calcado <phil@buoyant.io>	2017-12-27 14:10:41 -05:00
Brian Smith	2729fa02bc	Stop using "default" as default service namespace (#61 ) Previously the destinations service would look for services in the "default" namespace if the service name didn't have at least two labels. However, the "default" namespace is almost always the wrong namespace. The only reasonable default namespace is the namespace of the client service, which isn't given to the destinations service. Therefore it shouldn't try to default the namespace. Accordingly, stop defaulting the namespace to "default". Validated by manually testing the emojivoto service before and after the proxy implemented namespace defaulting itself.	2017-12-20 10:44:24 -10:00
Kevin Lingerfelt	a8e75115ab	Prepare the repo for the v0.1.1 release (#75 ) * Prepare the repo for the v0.1.1 release * Add changelog * Changelog updates, wrap at 100 characters	2017-12-20 10:51:53 -08:00
Phil Calçado	0a6a9edaee	Respect $KUBECONFIG env var (#68 ) * Move kubectl logic to k8s package * Made kubectl return url.URL, just like API Make k8s API code respect /Users/pcalcado/.kube/config (closes #17) * Fix style mistakes and typos	2017-12-20 11:50:25 +11:00
Kevin Lingerfelt	2f114e69fa	Add support for path stats in cli and web api (#13 ) * Add support for path stats in cli and web api The cli stat command supports grouping by pod and deployment. With this change, it will also support grouping by path, in order to facilitate summary stats per individual endpoint. * Right-align numeric columns in stat output	2017-12-08 12:24:39 -08:00
Risha Mars	82f50b5536	Fix simulate-proxy script (#14 ) Problem: Simulate proxy would seemingly hang when used. In simulate-proxy we were using rand.Uint32() to generate Count. This is way too big (in telemetry/server.go we call latencyStat.observe() Count times, so this loop was taking forever). Solution: Use a count of 1 (as the surrounding loop will generate count requests) Validation: Script now works without hanging.	2017-12-08 11:15:05 -08:00
Kevin Lingerfelt	906d4e8b69	Fix public-api error marshaling and unmarshaling (#16 )	2017-12-08 11:03:55 -08:00
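The pointer-copying fix described in #330 can be illustrated with a small Go sketch. The `stat` type and `toPointers` function are hypothetical stand-ins, since the commit does not show the real Public API types. The point is to take `&values[i]` rather than the address of the range variable. Before Go 1.22 that variable was shared across iterations, so `&v` produced the same pointer every time.

```go
package main

import "fmt"

// stat is a hypothetical stand-in for a per-resource stats message.
type stat struct {
	Name     string
	Requests int
}

// toPointers returns pointers that alias each element of values.
func toPointers(values []stat) []*stat {
	ptrs := make([]*stat, 0, len(values))
	// Index into the slice instead of taking &v of a range variable: before
	// Go 1.22, v was one variable reused on every iteration, so &v was the
	// same address each time and every pointer saw the last element.
	for i := range values {
		ptrs = append(ptrs, &values[i])
	}
	return ptrs
}

func main() {
	values := []stat{{"a", 1}, {"b", 2}, {"c", 3}}
	for _, p := range toPointers(values) {
		fmt.Println(p.Name, p.Requests)
	}
}
```

Indexing works the same way on every Go version, so it stays correct whatever loop-variable semantics the toolchain uses.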
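Commit #11 replaced per-observation latency reports with counts in fixed histogram buckets. Each count corresponds to one index in a reported array of bucket bounds. The Go sketch below shows that bucketing idea under a few assumptions. The bound values are hypothetical, and so is the extra overflow slot for latencies above the largest bound. The proxy's real data structure and bounds are not given in the commit. An observation lands in the first bucket whose upper bound is greater than or equal to it, which is the `le` convention Prometheus histograms use.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram keeps per-bucket (non-cumulative) counts for sorted upper bounds.
// counts[i] counts observations v with bounds[i-1] < v <= bounds[i];
// counts[len(bounds)] is a hypothetical overflow slot for v > every bound.
type histogram struct {
	bounds []float64
	counts []uint32
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint32, len(bounds)+1)}
}

// observe increments the bucket for v.
func (h *histogram) observe(v float64) {
	// SearchFloat64s returns the smallest i with bounds[i] >= v, or
	// len(bounds) when v exceeds every bound.
	h.counts[sort.SearchFloat64s(h.bounds, v)]++
}

func main() {
	h := newHistogram([]float64{10, 50, 100}) // hypothetical bounds, in ms
	for _, ms := range []float64{3, 10, 42, 99, 250} {
		h.observe(ms)
	}
	fmt.Println(h.counts) // [2 1 1 1]
}
```

A report built this way has a fixed size regardless of traffic volume. Because it also carries its bounds, the receiver can interpret the counts even if its own bucket configuration differs, as the commit message describes.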


53 Commits