linkerd2

Commit Graph

Author	SHA1	Message	Date
Kevin Lingerfelt	bd1d1af38b	dst svc: use shared informer instead of pod watcher (#1073 ) * Update destination service to use shared informer instead of pod watcher * Add const for pod IP index name Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-12 18:09:47 -07:00
Kevin Lingerfelt	6e66f6d662	Rename Lister to API and expose informers as well as listers (#1072 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-12 10:27:55 -07:00
Risha Mars	7d4c4aa290	CLI: print resources in the same order every time stat all is run (#1088 ) Previously, in conduit stat all we would just print the map of stat results, which resulted in the order in which stats were displayed varying between prints. Fix: Define an array, k8s.StatAllResourceTypes and use the order in this array to print the map; ensuring a consistent print order every time the command is run.	2018-06-08 15:02:17 -07:00
Ivan Sim	11d1d55632	Filter out failed and completed pods from stats summary result (#1010 ) (#1065 ) Both the conduit stat command and web UI are showing failed and completed pods. This change filters out those pods before returning the result to the client. Fixes #1010 Signed-off-by: Ivan Sim <ihcsim@gmail.com>	2018-06-05 13:19:48 -07:00
Kevin Lingerfelt	eebc612d52	Add install flag for sending tls identity info to proxies (#1055 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-04 16:55:06 -07:00
Kevin Lingerfelt	ec2433e9bd	Update controller to use 'tls' metric label (#1044 ) * Update controller to use 'tls' metric label * Fix meshed column formatter Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-01 16:44:33 -07:00
Eliza Weisman	5a42ce357e	proto: Add TLS identity to WeightedAddr message (#1041 ) Required for #1008. This PR adds the `TlsIdentity` message to the Destination service proto, to describe what strategy the proxy should use for verifying an endpoint's TLS certificates. It also adds a `TlsIdentity` field to the `WeightedAddr` message. Currently, there is one possible variant for `TlsIdentity`, `KubernetesPodName`, which consists of the Kubernetes pod name of the endpoint, the namespace of the endpoint, and the namespace of that pod's Conduit control plane. The proxy should attempt to connect over TLS if the control plane namespace matches its own control plane namespace. The pod name and namespace are used to verify the endpoint's TLS certificate. See https://github.com/runconduit/conduit/issues/386#issuecomment-392948046. This change was initially part of #1008, but I factored it out to make the diff smaller. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-05-31 11:48:25 -07:00
Risha Mars	ffabdefc6c	Add queries to prometheus to determine number of fully meshed requests (#983 ) - Update the `response_total` prometheus query of the StatSummary endpoint to also break queries out by a `meshed` label. - Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic starting and ending in the mesh This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we add TLS by default for intra mesh requests). The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will show up as 0% secured in the web/CLI.	2018-05-24 11:05:09 -07:00
Andrew Seigner	8a3b1a638a	Introduce meshed label in simulate-proxy (#992 ) The proxy does not yet support a `meshed` label. In anticipation of a `meshed` label in the proxy, introduce this label in `simulate-proxy`, for testing. Relates to #306 and #386. Signed-off-by: Andrew Seigner <siggy@buoyant.io> secured -> meshed Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-05-23 15:06:11 -07:00
Andrew Seigner	84e6eb5c87	Fix nil pointer dereference in StatSummary (#991 ) The StatSummary endpoint was dereferencing StatSummaryRequest.Selector.Resource, causing a panic when it received an empty request. Fix StatSummary to use the nil-friendly StatSummaryRequest.GetSelector().GetResource() methods, and add a test to validate. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-05-23 13:21:49 -07:00
Risha Mars	1e6434f6de	Fix bug in the public-api where conduit stat params were ignored (#971 ) * Fix bug where we were dropping parts of the StatSummaryRequest * Add tests for prometheus query strings and for failed cases Problem In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types, we would copy the Resource, LabelSelector and TimeWindow of the original request, and then go through all the resource types and set Resource.Type for each resource we wanted to get. The bug is that while we copy over some fields of the original request, we didn't copy over all of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would ignore any --to or --from flags, and would ignore requests for a specific named resource. Solution Copy over all fields from the request. I've also added tests for this case. In this process I've refactored the stat_summary_test code to make it a bit easier to read/use.	2018-05-18 16:06:06 -07:00
Kevin Lingerfelt	36ec391dbe	Go: update k8s dependencies to 1.10.2 (#962 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-05-17 15:46:58 -07:00
Risha Mars	b8dc83f9d2	Modify the Stat API to handle requests for resource type "all" (#928 ) Allow the Stat endpoint in the public-api to accept requests for resourceType "all". Currently, this queries Pods, Deployments, RCs and Services, but can be modified to query other resources as well. Both the CLI and web endpoints now work if you set resourceType to all. e.g. `conduit stat all`	2018-05-11 14:35:37 -07:00
Kevin Lingerfelt	4e8e1eb84d	CLI: Fix validation for service stats (#935 ) * CLI: Fix validation for service stats * Address review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-05-11 10:28:49 -07:00
Oliver Gould	a786089fd6	docker: Cache versionless builds before building versioned go binaries (#921 ) The way that git-related version information is linked into go binaries busts Docker's cache such that every commit causes all binaries to be rebuilt. In order to ameliorate this, we can build each binary once without version information first so that its artifacts are cached. When Go sources are not changed and only the version information changes, builds are 4.3x faster than before (from 5+ minutes to <90s). On `master` Branch off of master and build (mostly cached): ``` :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build 9.10s user 6.30s system 5% cpu 4:26.47 total ``` Rebuild without changing anything (highly cached): ``` :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build 9.23s user 6.04s system 47% cpu 32.017 total ``` Update only the git sha and rebuild: ``` :; git ci -am 'bump it' --allow-empty [ver/eg 2749eb3] bump it :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build 8.55s user 6.08s system 4% cpu 5:22.25 total ``` On this branch: Rebuild without changing anything (highly cached): ``` :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build 8.94s user 5.97s system 46% cpu 32.257 total ``` Update only the git sha and rebuild: ``` :; git ci -am 'bump it' --allow-empty [ver/go-docker-cache-versionless 77a80b5] bump it :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build-cli-bin 2.02s user 1.34s system 9% cpu 34.144 total ```	2018-05-10 10:22:09 -07:00
Risha Mars	416381cdfd	Fix bug where GetPodsFor(pod) was returning all pods in a namespace (#900 ) * Fix bug where GetPodsFor(pod) was returning all pods in a namespace Problem In lister.GetPodsFor, when the input object was a pod, we would return all the pods in the namespace. I would expect GetPodsFor(pod) to return only one pod - the pod itself. Cause The cause of this is that when the object type was pod we were setting the selector to selector = labels.Everything() which gets all the pods in the namespace. Fix Special case GetPodsFor(pod) to return the pod itself, rather than looking up pods via labels.	2018-05-08 13:52:49 -07:00
Risha Mars	f94856e489	Modify the Stat endpoint to also return the number of failed conduit pods (#895 ) * Modify the Stat endpoint to also return the count of failed pods * Add comments explaining pod count stats * Rename total pod count to running pod count This is to support the service mesh overview page, as I'd like to include an indicator of failed pods there.	2018-05-08 10:35:21 -07:00
Brian Smith	c5d2dab8bd	Remove special support for ExternalName services (#764 ) After this was implemented we found that ExternalName services are represented in DNS as CNAMEs, which means that the proxy's DNS fallback logic can be used instead of doing DNS in the control plane. Besides simplifying the controller, this will also increase fidelity with the proxied pods' DNS configuration (improve transparency). Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-25 11:53:33 -10:00
Andrew Seigner	dce31b888f	Deprecate Tap, rename TapByResource to Tap (#844 ) The `conduit tap` command is now deprecated. Replace `conduit tap` with `conduit tapByResource`. Rename tapByResource to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now returns Unimplemented. Fixes #804 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-25 12:24:46 -07:00
Andrew Seigner	a0a9a42e23	Implement Public API and Tap on top of Lister (#835 ) public-api and tap were both using their own implementations of the Kubernetes Informer/Lister APIs. This change factors out all Informer/Lister usage into the Lister module. This also introduces a new `Lister.GetObjects` method. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-24 18:10:48 -07:00
Andrew Seigner	03d4684d3b	Introduce K8s Lister, integrate simulate-proxy (#829 ) The Kubernetes client-go Informer/Lister APIs are implemented in several parts of the code base. This change introduces a Lister module, providing Informer/Lister capability through a simple interface. Once this merges, we can follow up with moving public-api and tap onto Lister. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 16:44:19 -07:00
Andrew Seigner	baf4ea1a5a	Implement TapByResource in Tap Service (#827 ) The TapByResource endpoint was previously a stub. Implement end-to-end tapByResource functionality, with support for specifying any kubernetes resource(s) as target and destination. Fixes #803, #49 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 16:13:26 -07:00
Eliza Weisman	d9112abc93	proxy: remove unused metrics (#826 ) This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md` Closes #821	2018-04-23 16:05:20 -07:00
Andrew Seigner	39eccb09e2	cli: standardize kubernetes resource parsing (#830 ) The Tap command leveraged new cli parsing code, enabling Kubernetes resources specified as `(TYPE [NAME] \| TYPE/NAME)`. The Stat command did not use this. Modify the Stat command to use the same cli flag parsing code as Tap. Remove the to/from-resource flags from Stat. Fixes #792 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 15:17:42 -07:00
Eliza Weisman	8147a363e9	Make simulate-proxy match proxy output (#822 ) This PR makes two changes to the `simulate-proxy` script: 1. Removed the `protocol={"http", "tcp"}` label from TCP metrics. The proxy no longer adds this label (see https://github.com/runconduit/conduit/pull/785#discussion_r182563499). 2. Fixed failed responses being labeled with `classification="fail"` rather than `classification="failure"` (the label the proxy sets). I noticed that while I was here and decided to fix it as well. Note that the first change required some minor changes to the `proxyMetricCollectors` struct in `simulate-proxy`; since the label cardinality for TCP open stats decreased by one due to removing the `protocol` label, it's no longer necessary for that struct to have `CounterVec`/`GaugeVec` pointers for these stats. It now owns the actual `Counter`/`Gauge` instead. This means that the metric vecs that are created to be labeled for `inbound` and `outbound` are now stored as variables in the `newSimulatedProxy` function rather than going in a `proxyMetricCollectors` struct first. This shouldn't impact behaviour at all.	2018-04-20 12:11:57 -07:00
Andrew Seigner	79bdc638b3	Service support in stat command (#809 ) The `stat` command did not support `service` as a resource type. This change adds `service` support to the `stat` command. Specifically: - as a destination resource on `--to` commands - as a target resource on `--from` commands Fixes #805 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 16:51:20 -07:00
Eliza Weisman	6eec6256f7	Add transport-level metrics to simulate-proxy (#811 ) This PR adds the transport-level metrics described in #742 to the `simulate-proxy` script. This will be useful while adding these metrics to the Grafana dashboard and/or CLI. Closes #793	2018-04-19 15:18:43 -07:00
Andrew Seigner	293e00bc3e	Introduce tapByResource cli command (#802 ) The existing `tap` command is being deprecated. Introduce a `tapByResource` cli command. It supports tapping a Kubernetes resource or collection of resources, optionally filtered by outbound resources. This command will eventually replace `tap`. Part of #778 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 14:44:23 -07:00
Kevin Lingerfelt	653dc6bfaa	Add replication controller stats in CLI (#794 ) * Add replication controller stats in CLI * Fix pod status in stat summary tests Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 18:12:14 -07:00
Oliver Gould	06dd8d90ee	Introduce the TapByResource API (#778 ) This changes the public api to have a new rpc type, `TapByResource`. This api supersedes the Tap api. `TapByResource` is richer, more closely reflecting the proxy's capabilities. The proxy's Tap api is extended to select over destination labels, corresponding with those returned by the Destination api. Now both `Tap` and `TapByResource`'s responses may include destination labels. This change avoids breaking backwards compatibility by: * introducing the new `TapByResource` rpc type, opting not to change Tap * extending the proxy's Match type with a new, optional, `destination_label` field. * `TapEvent` is extended with a new, optional, `destination_meta`.	2018-04-18 15:37:07 -07:00
Andrew Seigner	1e4ac8fda8	Destination service provides pod-template-hash (#784 ) The Destination service does not provide ReplicaSet information to the proxy. The `pod-template-hash` label approximates selecting over all pods in a ReplicaSet or ReplicationController. Modify the Destination service to provide this label to the proxy. Relates to #508 and #741 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-18 14:41:27 -07:00
Kevin Lingerfelt	71a51afb40	Expose pod stats in CLI, web UI, and Grafana (#788 ) * Expose pod stats in CLI, web UI, and Grafana * Fix js api helpers test * Add outbound traffic stats to pod dashboard Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 11:26:47 -07:00
Andrew Seigner	9e8cce0838	Destination service returns "Running" pod labels (#781 ) When the Destination sees an IP address, it looks up Pods by that IP, and associates Pod label data to it. If the lookup by IP returned more than one Pod, it simply picked the first one. This is not correct, specifically in cases where one pod is in a Running state, and others are not. Modify the Destination service to only return label data for Pods in the Running state. Fixes #773 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-17 14:42:54 -07:00
Andrew Seigner	727521f914	Permit arbitrary time windows in public-api (#774 ) The public-api previously only permitted 4 hard-coded time windows: 10s, 1m, 10m, 1h. This was primarily a relic of the recently removed telemetry system. Modify the public-api to validate the time string, but allow for any window size, which is then passed through to Prometheus. Fixes #686 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-16 17:37:17 -07:00
Kevin Lingerfelt	11a4359e9a	Misc cleanup following the telemetry rewrite (#771 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-16 15:51:07 -07:00
Andrew Seigner	77fb6d3709	Add namespace as a resource type in public-api (#760 ) * Add namespace as a resource type in public-api The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This change also includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analogous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io> * Refactor filter and groupby label formulation Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename stat_summary.go to stat.go in cli Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Update rbac privileges for namespace stats Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 16:53:01 -07:00
Andrew Seigner	21886760c6	Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762 ) Conduit was relying on apps/v1 for the Deployment and ReplicaSet APIs. apps/v1 is not available on Kubernetes 1.8. This prevented the public-api from starting. Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on a test cluster. Fixes #761 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-13 12:08:16 -07:00
Kevin Lingerfelt	fb15fe7c1a	Remove the telemetry service (#757 ) * Remove the telemetry service The telemetry service is no longer needed, now that prometheus scrapes metrics directly from proxies, and the public-api talks directly to prometheus. In this branch I'm removing the service itself as well as all of the telemetry protobuf, and updating the conduit install command to no longer install the service. I'm also removing the old version of the stat command, which required the telemetry service, and renaming the statsummary command to stat. * Fix time window tests * Remove deprecated controller scrape config Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 11:21:29 -07:00
Andrew Seigner	e9b209829d	Handle NaN metrics (#750 ) The Prometheus client sometimes returns NaN if a calculation is invalid, such as histogram_quantile when no requests have occurred. Add IsNaN check in the public-api and set output to zero. Fixes #747 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-12 15:21:00 -07:00
Andrew Seigner	624b87f743	Implement ListPods in public-api (#743 ) The ListPods endpoint's logic resides in the telemetry service, which is going away. Move ListPods logic into public-api, use new k8s informer APIs. Fixes #694 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-11 17:53:57 -07:00
Kevin Lingerfelt	47caf1ca07	Add --all-namespaces flag to CLI statsummary command (#745 ) * Add --all-namespaces flag to CLI statsummary command * Fix statsummary output formatting Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-11 16:40:25 -07:00
Andrew Seigner	259fdcd134	Add latency stats in new stat summary endpoint (#737 ) The new StatSummary endpoint was only providing request volume and success rate information. Add support for retrieving latency stats via StatSummary. Also make all prometheus calls in parallel, and implement kubernetes test fixtures. Fixes #681 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-11 11:58:32 -07:00
Kevin Lingerfelt	e1e1b6b599	Controller: add more destination labels, fix service label (#731 ) * Add more destination labels, fix service label * Update owner labels to match proxy metrics docs Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-11 10:44:52 -07:00
Kevin Lingerfelt	91c359e612	Switch public API to use cached k8s resources (#724 ) * Switch public API to use cached k8s resources * Move shared informer code to separate goroutine * Fix spelling issue Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-10 11:39:31 -07:00
Andrew Seigner	3a341abe9a	Fix success rate calculation in public api (#723 ) The success rate calculation relies on the `classification` label, but was incorrectly specifying `fail` rather than `failure`. Fix public api to specify `failure`. Also re-org public api tests for easier Kubernetes and Prometheus mocking. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-10 11:04:04 -07:00
Andrew Seigner	716b392231	Move StatSummary logic into grpc server (#717 ) The StatSummary logic was implemented as a method on http_server. Move the StatSummary logic into grpc_server, for consistency with the other endpoints. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 16:46:15 -07:00
Andrew Seigner	50c323c617	Use canonical k8s names, fix prom labels (#702 ) The new statsummary command accepted friendly k8s names, which worked for k8s queries, but Prometheus requires a specific key. Modify the statsummary query to map friendly k8s names to canonical k8s names when constructing the query. Then during the query, map the canonical k8s name to a specific Prometheus label. Fixes #695 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 12:34:54 -07:00
Risha Mars	2f5b5ea5f2	Start implementing conduit stat summary endpoint (#671 ) Start implementing new conduit stat summary endpoint. Changes the public-api to call prometheus directly instead of the telemetry service. Wired through to `api/stat` on the web server, as well as `conduit statsummary` on the CLI. Works for deployments only. Current implementation just retrieves requests and mesh/total pod count (so latency stats are always 0). Uses API defined in #663 Example queries the stat endpoint will eventually satisfy in #627 This branch includes commits from @klingerf * run ./bin/dep ensure * run ./bin/update-go-deps-shas	2018-04-05 17:05:06 -07:00
Andrew Seigner	28d5007cdf	Harmonize Prometheus label usage (#690 ) The Destination service used slightly different labels than the telemetry pipeline expected, specifically, prefixed with `k8s_`. Make all Prometheus labels consistent by dropping `k8s_`. Also rename `pod_name` to `pod` for consistency with `deployment`, etc. Also update and reorganize `proxy-metrics.md` to reflect new labelling. Fixes #655 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-05 15:09:06 -07:00
Risha Mars	d1a39ea6bf	Define a new telemetry Stat API (#663 ) * Define a new telemetry Stat API Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627. StatSummary will replace Stat once implemented and the original Stat deleted.	2018-04-03 14:45:58 -07:00
Phil Calçado	19001f8d38	Add pod-based metric_labels to destinations response (#429 ) (#654 ) * Extracted logic from destination server * Make tests follow style used elsewhere in the code * Extract single interface for resolvers * Add tests for k8s and ipv4 resolvers * Fix small usability issues * Update dep * Act on feedback * Add pod-based metric_labels to destinations response * Add documentation on running control plane to BUILD.md Signed-off-by: Phil Calcado <phil@buoyant.io> * Fix mock controller in proxy tests (#656) Signed-off-by: Eliza Weisman <eliza@buoyant.io> * Address review feedback * Rename files in the destination package Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-02 18:36:57 -07:00
Brian Smith	df9ead9c36	Use Go 1.10.1 to build all Go code. (#650 ) Go 1.10.1 is a security release. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-02 14:58:30 -10:00
Andrew Seigner	97546e0646	Modify simulate-proxy to be more pod-centric (#653 ) simulate-proxy uses a deployment object from kubernetes to simulate each proxy metrics endpoint. Modify simulate-proxy to instead use a pod to simulate each proxy metrics endpoint. This ensures that each metrics endpoint consistently represents a pod in kubernetes, including its namespace, deployment, and label information. This change also adds support for: - a new `metric-ports` flag, default is `10000-10009`. - `classification`, `pod_name`, and `pod_template_hash` labels Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-30 13:28:45 -07:00
Phil Calçado	bbed49c5bd	Refactor destination service and add tests in preparation to add information about labels (#645 ) * Extracted logic from destination server * Make tests follow style used elsewhere in the code * Extract single interface for resolvers * Add tests for k8s and ipv4 resolvers * Fix small usability issues * Update dep * Act on feedback Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-03-30 11:36:48 -07:00
Andrew Seigner	1ed4a93b5e	Higher velocity metrics from simulate-proxy (#635 ) simulate-proxy increments a single set of metrics on each iteration, and also randomizes http status codes, leaving counters unchanged across several collections. Modify simulate-proxy to increment all metrics on each iteration, provide a 90% success rate, ensure a pod does not call itself, and increase proxy count from 3 to 10. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-28 13:30:02 -07:00
Kevin Lingerfelt	59c75a73a9	Add tests/utils/scripts for running integration tests (#608 ) * Add tests/utils/scripts for running integration tests Add a suite of integration tests in the `test/` directory, as well as utilities for testing in the `testutil/` directory. You can use the `bin/test-run` script to run the full suite of tests, and the `bin/test-cleanup` script to cleanup after the tests. The test/README.md file has more information about running tests. @pcalcado, @franziskagoltz, and @rmars also contributed to this change. * Create TEST.md file at the root of the repo * Update based on review feedback * Relax external service IP timeout for GKE * Update TEST.md with more info about different types of test runs * More updates to TEST.md based on review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-27 15:06:55 -07:00
Andrew Seigner	fe35509406	Clean up Prometheus labels scraped from proxy (#633 ) The Prometheus scrape config collects from Conduit proxies, and maps Kubernetes labels to Prometheus labels, appending "k8s_". This change keeps the resultant Prometheus labels consistent with their source Kubernetes labels. For example: "deployment" and "pod_template_hash". Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-27 15:01:08 -07:00
Brian Smith	7dc21f9588	Add the NoEndpoints message to the Destination API (#564 ) Have the controller tell the client whether the service exists, not just what are available. This way we can implement fallback logic to alternate service discovery mechanisms for ambiguous names. Signed-off-by: Brian Smith <brian@briansmith.org> Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-27 10:45:41 -10:00
Andrew Seigner	12c6531546	Update docker-compose environment to match prod (#609 ) The Prometheus config in the docker-compose environment had fallen behind the prod setup. This change updates the docker-compose environment in the following ways: - Prometheus config more closely matches prod, based on #583 - simulate-proxy labels matches prod, based on #605 - add Grafana container Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-23 17:00:39 -07:00
Dennis Adjei-Baah	b90668a0b5	Modify simulate proxy to expose prometheus metrics (#576 ) The simulate-proxy script pushes metrics to the telemetry service. This PR modifies the script to expose metrics to a prometheus endpoint. This functionality creates a server that randomly generates response_total, request_totals, response_duration_ms and response_latency_ms. The server reads pod information from a k8s cluster and picks a random namespace to use for all exposed metrics. Tested out these changes with a locally running prometheus server. I also ran the docker-compose.yml to make sure metrics were being recorded by the prometheus docker container. fixes #498 Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-03-21 16:40:12 -07:00
Alena Varkockova	b82f89f4d9	Reuse code for metrics serving in controller (#585 ) Signed-off-by: Alena Varkockova varkockova.a@gmail.com	2018-03-19 10:33:25 -07:00
Alex Leong	9eb084c99d	Most controller listeners should only bind on localhost (#494 ) * Most controller listeners should only bind on localhost * Use default listening addresses in controller components * Review feedback * Revert test_helper change * Revert use of absolute domains Signed-off-by: Alex Leong <alex@buoyant.io>	2018-03-12 11:32:20 -07:00
Dennis Adjei-Baah	ad42f2f8ab	Retry k8s watch endpoints on error (#510 ) Shortly after conduit is installed in a k8s environment, the control plane components that establish a watch endpoint with k8s run into networking issues during proxy initialization. During failure, each watcher fails to retry its connection to the k8s watch endpoint, which leads to timeouts and, eventually, multiple controller pod restarts. This PR adds retry logic to each "watch" enabled package. fixes #478 Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-03-07 13:40:43 -08:00
Dennis Adjei-Baah	5a4c5aa683	Exclude telemetry generated by the control plane when requesting depl… (#493 ) When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh. This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment. * exclude telemetry generated by the control plane when requesting deployment metrics fixes #370 Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-03-05 17:58:08 -08:00
Andrew Seigner	698e65da8b	Fix flakey dns_test (#516 ) The dns_test had assumed DNS changes were deterministically ordered, but util.DiffAddresses uses a map and therefore does not guarantee ordering. Fix dns_test to sort TCP Addresses prior to comparison. Fixes #515 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-05 16:50:33 -08:00
Kevin Lingerfelt	8e2ef9d658	Handle ExternalName-type svcs in destination service (#490 ) * Handle ExternalName-type svcs in destination service * Move refresh interval to a global var Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-02 11:30:53 -08:00
Alex Leong	9b4e847555	Add DNS label validation in destination service (#464 ) Add a validation in the destination service that ensures that DNS destinations consist of valid labels. Signed-off-by: Alex Leong <alex@buoyant.io>	2018-03-01 15:49:49 -08:00
Kevin Lingerfelt	e57e74056e	Run go fix to fix context package imports (#470 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-28 13:25:33 -08:00
Alex Leong	84ba1f3017	Ensure tap requests at least 1rps from each pod (#459 ) When attempting to tap N pods when N is greater than the target rps, a rounding error occurs that requests 0 rps from each pod and no tap data is returned. Ensure that tap requests at least 1 rps from each target pod. Tested in Kubernetes on docker-for-desktop with a 15 replica deployment and a maxRps of 10. Signed-off-by: Alex Leong <alex@buoyant.io>	2018-02-27 16:03:47 -08:00
Brian Smith	78ebd5e340	Base control plane Docker images on scratch instead of base. (#368 ) The control plane is proxied through the Conduit proxy. The Conduit proxy is based on the base image, and the control plane containers and the proxy share a networking namespace. This means we don't need the extra base utilities in the controller images since we can use the utilities in the proxy image. This is a step towards building the initial no-networking Conduit CA pod. Since the Conduit CA will not do any networking of its own, networking debugging utilities are not helpful for it. They are actually an unnecessary risk because they could facilitate the exfiltration of the private key of the CA. (The Conduit CA pod won't have the Conduit Proxy injected into it either.) This also simplifies & slightly speeds up the building of the controller images. This is a stepping stone towards being able to build the controller images without `docker build` to improve build times. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-23 13:03:19 -10:00
Brian Smith	cf3c8cd7bc	Use Go 1.10.0 to build Go components. (#408 ) * Use Go 1.10.0 to build Go components. Take advantage of the new build cache in Go 1.10. Future work on improving build performance will utilize the build cache further. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-21 14:31:29 -10:00
Brian Smith	e6aad57766	Remove temporary files generated by dep in go-deps image. (#407 ) Previously Dockerfile-go-deps was converted from a multi-stage Dockerfile to a single-stage Dockerfile in anticipation of enabling efficient use of `--cache-from` in CI. However, that resulted in the image ballooning in size because it contained the Git repo for every package downloaded by `dep ensure`. Bring the image back down to the proper size by removing the temporary files created. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-21 13:06:24 -10:00
Alex Leong	552204366c	Use Prometheus to track added data plane pods. (#338 ) The instance cache that powers the ListPods API is stored in memory in the telemetry service. This means that when there are multiple replicas of the telemetry service, each replica will have a distinct, incomplete view of the added pods based on which pods report to that telemetry replica. This causes the data plane bubbles on the dashboard to not all be filled in, and to flicker with each data refresh. We create a Prometheus counter called reports_total which has pod as a label. Whenever a telemetry service instance receives a report from a pod, it increments reports_total for that pod. This allows us to remove the in-memory instance cache and instead query Prometheus to see if each pod has had a report in the last 30 seconds. Fixes #337 Signed-off-by: Alex Leong <alex@buoyant.io>	2018-02-14 16:09:55 -08:00
Andrew Seigner	1db7d2a2fb	Ensure latency quantile queries match timestamps (#348 ) In PR #298 we moved time window parsing (10s => (time.now - 10s, time.now)) down the stack to immediately before the query. This had the unintended effect of creating parallel latency quantile requests with slightly different timestamps. This change parses the time window prior to latency quantile fan out, ensuring all requests have the same timestamp. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-13 16:26:54 -08:00
Andrew Seigner	50f4aa57e5	Require timestamp on all telemetry requests (#342 ) PR #298 moved summary (non-timeseries) requests to Prometheus' Query endpoint, with no timestamp provided. This Query endpoint returns a single data point with whatever timestamp was provided in the request. In the absence of a timestamp, it uses current server time. This causes the Public API to return discrete data points with slightly different timestamps, which is unexpected behavior. Modify the Public API -> Telemetry -> Prometheus request path to always require a timestamp for single data point requests. Fixes #340 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-13 13:52:21 -08:00
Brian Smith	b18fe459d4	Precompile large Go libraries in go-deps Docker image. (#332 ) On my system (i9-7960x running Docker natively in Linux) this regularly saves over 11 seconds of build time when a file under pkg/ changes and over 1.5 seconds of build time when a file under controller/ changes. Since most contributors are running Docker in a VM on less powerful computers, the savings for most contributors should be significantly greater. I imagine the savings for web/ and cli/ and proxy-init/ are similar, but I did not measure them. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-13 11:35:10 -10:00
Brian Smith	37008f9626	Improve caching behavior of controller/Dockerfile. (#331 ) Precompiling pkg/ in an earlier layer saves ~10 seconds of wall clock time on an incremental build on my machine (i9-7960x) when I update a file in controller/ such as controller/destination/server.go. This makes a significant difference in the edit-build-test loop. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-13 11:21:22 -10:00
Brian Smith	ec5a02fd64	Upgrade to Go 1.9.4. (#326 ) Go 1.9.4 is a security release. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:47:40 -10:00
Brian Smith	86ea1c06bf	Improve the caching behavior of Dockerfile-go-deps. (#325 ) Previously Dockerfile-go-deps would run `dep ensure` whenever anything in the source tree changed. Also, because it was a multi-stage Dockerfile it did not work well with Docker's `--cache-from` feature. Change Dockerfile-go-deps to only re-run `dep ensure` when Gopkg.{toml,lock} and/or bin/dep change. Simplify it to a single stage so that it works better with Docker's `--cache-from` feature. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:40:20 -10:00
Brian Smith	c78df4ba13	Use bin/dep in Dockerfile-go-deps. (#324 ) bin/dep verifies the digest of the downloaded `dep` executable, whereas previously Dockerfile-go-deps didn't. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-12 13:32:08 -10:00
Andrew Seigner	261586b862	Fix pointer copying (#330 ) The Public API's stat endpoint copies a slice of values to a slice of pointers prior to gRPC response. Go's range clause re-uses the same pointer for each iteration of the loop, causing a slice of {1,2,3} to become {3,3,3}. Fix the range loop to directly reference pointers in the slice of values, ignoring the range variable. Also add tests to catch this case. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-10 11:04:28 -08:00
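The fix described in 261586b862 follows a common Go pattern, sketched below. The helper name `toPointers` and the use of `int` values are assumptions for illustration; the repository's actual types are not shown in this log. Note that Go 1.22 and later give each loop iteration its own range variable, so the original bug only reproduces on older toolchains, while indexing into the slice is correct on all versions.

```go
package main

import "fmt"

// toPointers returns pointers to the elements of values.
// Hypothetical helper: it indexes into the slice instead of taking the
// address of the range variable, which before Go 1.22 was a single
// variable shared by every iteration.
func toPointers(values []int) []*int {
	ptrs := make([]*int, 0, len(values))
	for i := range values {
		ptrs = append(ptrs, &values[i]) // address of the element itself
	}
	return ptrs
}

func main() {
	for _, p := range toPointers([]int{1, 2, 3}) {
		fmt.Println(*p)
	}
}
```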
Eliza Weisman	8bc497a057	Remove unused metrics (#322 ) Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams. Closes #266 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 17:14:17 -08:00
Andrew Seigner	bffa5ff3e6	Concurrent Telemetry requests (#323 ) All requests from the public API service to the Telemetry service were done serially. In some cases a single request to the public API's Stat endpoint resulted in 5 serial requests to the Telemetry service. Make all requests from the Public API to Telemetry concurrent. Signed-off-by: Andrew Seigner <siggy@buoyant.io> Part of #299	2018-02-09 17:11:20 -08:00
Eliza Weisman	458e9d2ac5	Remove per-path metrics from telemetry pipeline (#317 ) Follow-up from #315. Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API. I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes. Closes #265 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-09 14:20:28 -08:00
Andrew Seigner	33e3c3ace9	Optimize Prometheus queries (#298 ) Prometheus queries from the Telemetry service were taking seconds or 10s of seconds. Optimize these queries: - Move all summary queries requiring a single point data off of Prometheus' QueryRange() endpoint, onto Query() - Set `defaultVectorRange` to 30s, and also use it regardless of time window Also add tests for grpc_server and telemetry server Signed-off-by: Andrew Seigner <siggy@buoyant.io> Fixes #260	2018-02-09 10:55:07 -08:00
Eliza Weisman	2015d992cc	Remove pod-level metrics from web and CLI (#304 ) This PR updates the web UI to remove the pod detail page, and to remove the links to that page from pod names in metrics tables. It also removes the `pods` option from `conduit stat`, and the `sourcePod` and `targetPod` fields from the controller API proto's `MetricMetadata` message. I've updated the `conduit stat` tests to reflect these changes, and manually verified the web UI changes. Closes #261 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-08 19:07:10 -08:00
Eliza Weisman	915f08ac4c	Store proxy latencies in a structure that matches controller histogram (#11 ) The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports. I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy. I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.	2018-02-07 18:02:59 -08:00
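The fixed-size histogram described in 915f08ac4c can be illustrated with a small sketch. It is written in Go for consistency with the other examples here, although the proxy itself is written in Rust. The type name, method names, bucket bounds, and the choice to clamp out-of-range values into the last bucket are all assumptions, not the proxy's actual behaviour.

```go
package main

import (
	"fmt"
	"sort"
)

// Histogram counts observations per bucket instead of storing every value.
// Hypothetical type: bounds holds sorted upper bounds in milliseconds and
// counts[i] is the number of observations that fell into bounds[i].
type Histogram struct {
	bounds []uint32
	counts []uint32
}

// NewHistogram copies the bounds so later changes by the caller do not
// affect the histogram.
func NewHistogram(bounds []uint32) *Histogram {
	b := append([]uint32(nil), bounds...)
	return &Histogram{bounds: b, counts: make([]uint32, len(b))}
}

// Observe increments the first bucket whose upper bound is >= latencyMs.
// Values above the largest bound are clamped into the last bucket
// (an assumption made for this sketch).
func (h *Histogram) Observe(latencyMs uint32) {
	if len(h.bounds) == 0 {
		return
	}
	i := sort.Search(len(h.bounds), func(i int) bool {
		return latencyMs <= h.bounds[i]
	})
	if i == len(h.bounds) {
		i = len(h.bounds) - 1
	}
	h.counts[i]++
}

func main() {
	h := NewHistogram([]uint32{1, 10, 100, 1000})
	for _, ms := range []uint32{3, 7, 250, 5000} {
		h.Observe(ms)
	}
	// A report would carry the bounds once plus one count per bucket.
	fmt.Println(h.bounds, h.counts)
}
```

The report size now depends on the number of buckets rather than the number of requests observed, which is the scaling property the commit message is after.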
Phil Calçado	9c03764a29	Remove hardcoded port and shared state for http test (#282 ) We now create a new test HTTP server per test case instead of sharing it across them all. This should solve the data races we have experienced on Travis. Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-02-06 13:48:14 -05:00
Andrew Seigner	4156af786d	Enable race detection in ci (#259 ) We previously did not have race detection enabled because our tests would fail. Following #249, this is no longer the case. Enable race detection in ci and build instructions. This change also fixes client_test.go attempting to allocate a 2GB buffer due to bad test input. Fixes #173 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-02 15:04:52 -08:00
Andrew Seigner	9a40d984ff	Replace shelling out with kubernetes proxy (#249 ) The conduit dashboard command asynchronously shells out and runs "kubectl proxy". This change replaces the shelling out with calls to kubernetes proxy APIs. It also allows us to enable race detection in our go tests, as the shell out code tests did not pass race detection. Fixes #173 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-02 10:31:59 -08:00
Alex Leong	fa2f5a0140	Add dep wrapper script to ensure consistent version of dep is used (#253 ) * Add `bin/dep` which fetches a fixed version of `dep` to be used. * Upgrade from dep 0.3.1 to 0.4.1 * Fix inconsistent Gopkg.lock by checking in the result of `bin/dep ensure` Signed-off-by: Alex Leong <alex@buoyant.io>	2018-02-01 16:09:05 -08:00
Andrew Seigner	277c06cf1e	Simplify and refactor k8s labels and annotations (#227 ) The conduit.io/* k8s labels and annotations were redundant in some cases, and not flexible enough in others. This change modifies the labels in the following ways: `conduit.io/plane: control` => `conduit.io/controller-component: web` `conduit.io/controller: conduit` => `conduit.io/controller-ns: conduit` `conduit.io/plane: data` => (remove, redundant with `conduit.io/controller-ns`) It also centralizes all k8s labels and annotations into pkg/k8s/labels.go, and adds tests for the install command. Part of #201 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-02-01 14:12:06 -08:00
Kevin Lingerfelt	9ff439ef44	Add -log-level flag for install and inject commands (#239 ) * Add -log-level flag for install and inject commands Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Turn off all CLI logging by default, rename inject and install flags Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Re-enable color logging Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-02-01 12:38:07 -08:00
Risha Mars	a9d4a3d74e	Add more prometheus instrumentation (latency, response size) (#174 ) We added basic prometheus instrumentation, but this only encapsulated basic go metrics and request counts. This adds latency and response size metrics exporting as well, to the public-api server, the web server and the telemetry server. Since the util function in grpc.go was basically used to wrap the server creation in a prometheus handler, I added the other prometheus constants in there and renamed the file to prometheus.go. - Add request duration and response size instrumentation to web and public api - Also add latency monitoring to telemetry service requests - Rename util/grpc.go to util/prometheus.go	2018-02-01 09:50:31 -08:00
Kevin Lingerfelt	4a76c6448b	Update cli subcommands to print errors when encountered (#221 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-29 11:28:19 -08:00
Kevin Lingerfelt	7399df83f1	Set conduit version to match conduit docker tags (#208 ) * Set conduit version to match conduit docker tags Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Remove --skip-inbound-ports for emojivoto Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename git_sha => git_sha_head Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Switch to using the go linker for setting the version Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Log conduit version when go servers start Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Cleanup conduit script Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Add --short flag to head sha command Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Set CONDUIT_VERSION in docker-compose env Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-26 11:43:45 -08:00
Phil Calçado	9410da471a	Better error handling for Tap (#177 ) Previously, running `$conduit tap` would return a `Unexpected EOF` error when the server wasn't available. This was due to a few problems with the way we were handling errors all the way down the tap server. This change fixes that and cleans some of the protobuf-over-HTTP code. - first step towards #49 - closes #106	2018-01-25 11:49:38 -05:00
Andrew Seigner	d0a0bb22bd	Move EosCtx to common for Tap and Telemetry (#204 ) * Make Eos optional in TapEvent grpc_status not being set in protobuf is the same as being set to zero, which is also status OK Modify TapEvent to include an optional EOS struct Signed-off-by: Andrew Seigner <siggy@buoyant.io> Part of #198 * Add Eos to proto & proxy tap end-of-stream events The proxy now outputs `Eos` instead of `grpc_status` in all end-of-stream tap events. The EOS value is set to `grpc_status_code` when the response ended with a `grpc_status` trailer, `http_reset_code` when the response ended with a reset, and no `Eos` when the response ended gracefully without a `grpc_status` trailer. This PR updates the proxy. The proto and controller changes are in PR #204. Part of #198. Closes #202 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-01-24 15:48:00 -08:00
Eliza Weisman	9e49054963	Classify non-gRPC status codes for HTTP telemetry (#200 ) Currently, all "success"/"failure" classifications in the telemetry API are made based on the `grpc-status` trailer. If the trailer is not present, then a request is assumed to have failed. As we start proxying non-gRPC traffic, the controller needs to also be aware of HTTP status codes, so that non-gRPC requests are not assumed to always fail. I've modified the telemetry API server to classify requests based on their HTTP status codes when the `grpc-status` trailer is not present. I've also modified the `simulate-proxy` script to generate fake HTTP/2 traffic without the `grpc-status` trailer. Closes #196 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-01-24 10:57:23 -08:00
Sean McArthur	db913e3d18	controller: echo ip address if destination service receives ip (#186 ) Signed-off-by: Sean McArthur <sean@seanmonstar.com>	2018-01-22 16:20:13 -08:00
Andrew Seigner	beaea5540d	Update version to v0.1.3 in controller	2018-01-19 14:11:58 -08:00
Andrew Seigner	e6f17faf28	Updates for v0.1.2 release (#171 ) Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-01-19 10:56:20 -08:00
Brian Smith	650dcdde1e	Stop ignoring the most significant labels of Destination names (#63 ) Stop ignoring the most significant labels of Destination names Previously the destinations service was ignoring all the labels in a destination name after the first two labels. Thus, for example, "name.ns.another.domain.example.com" would be considered the same as "name.ns.svc.cluster.local". This was very wrong. Match destination names taking into consideration every label in the destination name. Provisions have been made for the case where the controller and the proxies are configured with the zone name to use. However, currently neither the controller nor the proxies are actually configured with the zone, so the implementation was made to work in the current configuration too, as long as fully-qualified names are not used. A negative consequence of this change is that a name like "name.ns.svc.cluster.local" won't resolve in the current configuration, because the controller doesn't know the zone is "cluster.local". Unit tests are included for the new mapping rules. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-01-18 11:20:54 -10:00
Dennis Adjei-Baah	f7af375e73	Remove scheme requirement for api-addr flag in conduit CLI (#126 ) * Allow external controller public api clients that don't rely on a kubeconfig to interact with Conduit CLI Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-01-17 17:12:44 -08:00
Andrew Seigner	6f26cf21cf	Add TCP data to simulate-proxy script (#165 ) simulate-proxy sends HTTP example data. Modify this test script to also send TCP example data. Part of #132 Signed-off-by: Andrew Seigner <andrew@sig.gy>	2018-01-17 15:38:10 -08:00
Kevin Lingerfelt	e56be9bf0e	Bump k8s watch initialization timeout, cleanup logging (#166 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-17 15:31:01 -08:00
Oliver Gould	008f53865b	Make proxy-deps multi-stage to remove the original source files (#161 ) Previously, proxy-deps and go-deps included the source tree for local projects. This can cause build conflicts when files are renamed. By adopting a multi-stage build for the proxy-deps image, we can be sure that we only preserve essential dependencies & manifests in the proxy-deps and go-deps images. Furthermore, `bin/update-go-deps-shas` and `bin/update-proxy-deps-shas` have been added to ease maintenance when files are changed. Fixes #159 Signed-off-by: Oliver Gould <ver@buoyant.io>	2018-01-17 12:26:22 -08:00
Kevin Lingerfelt	fd3cfcb5d9	Move healthcheck proto to separate file, use throughout (#150 ) * Move healthcheck proto to separate file, use throughout Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Remove Check message from healthcheck.proto Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Standardize healthcheck protobuf import name Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-17 11:15:38 -08:00
Phil Calçado	612bd0f7a0	Add --verbose option to CLI (#154 ) * Use stdout as writer for tap command fixes #136 Signed-off-by: Phil Calcado <phil@buoyant.io> * Add --log-level to command line Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-01-17 12:06:43 -05:00
Phil Calçado	e328db7e87	Adds conduit-api check for status command (#140 ) * Abstract Conduit API client from protobuf interface to add new features Signed-off-by: Phil Calcado <phil@buoyant.io> * Consolidate mock api clients Signed-off-by: Phil Calcado <phil@buoyant.io> * Add simple implementation of healthcheck for conduit api Signed-off-by: Phil Calcado <phil@buoyant.io> * Change NextSteps to FriendlyMessageToUser Signed-off-by: Phil Calcado <phil@buoyant.io> * Add grpc check for status on the client Signed-off-by: Phil Calcado <phil@buoyant.io> * Add simple server-side check for Conduit API Signed-off-by: Phil Calcado <phil@buoyant.io> * Fix feedback from PR Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-01-12 15:35:22 -05:00
Eliza Weisman	63d1a5d70d	Add Protocol field to Transports telemetry (#138 ) See #132. This PR adds a protocol field to the ClientTransport and ServerTransport messages, and modifies the proxy to report a value for this field (currently, it's only ever HTTP). Currently, HTTP/1 and HTTP/2 are collapsed into one Protocol variant, see #132 (comment). I expect that we can treat H1 as a subset of H2 as far as metrics goes. Note that after discussing it with @klingerf, I learned that the control plane telemetry API currently does not do anything with the ClientTransport and ServerTransport messages, so beyond regenerating the protobuf-generated code, no controller changes were actually necessary. As we actually add metrics to TCP transports, we'll want to make some additions to the telemetry API to ingest these metrics. If any metrics are shared between HTTP and raw TCP transports (say, bytes sent), we'll want to differentiate between them in Prometheus. All the metrics that the control plane currently ingests from telemetry reports are likely to be HTTP-specific (requests, responses, response latencies), or at least, do not apply to raw TCP. Actually adding metrics to raw TCP transports will probably have to wait until there are raw TCP transports implemented in the proxy... Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-01-11 16:00:38 -08:00
Kevin Lingerfelt	1dc1c00a2a	Upgrade k8s.io/client-go to v6.0.0 (#122 ) * Sort imports Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Upgrade k8s.io/client-go to v6.0.0 Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Make k8s store initialization blocking with timeout Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-01-11 10:22:37 -08:00
Andrew Seigner	1ceaf3874a	Fix web and public-api log info messages. (#129 ) The existing startup/shutdown log info messages had spacing issues and used fmt. Update the log messages to use logrus for consistency, and fix spacing issues. Signed-off-by: Andrew Seigner <andrew@sig.gy>	2018-01-09 16:14:56 -08:00
Andrew Seigner	caeb83a526	Fix Go and Proxy dependency image SHAs (#117 ) The image tags for gcr.io/runconduit/go-deps and gcr.io/runconduit/proxy-deps were not updating to account for all changes in those images. Modify SHA generation to include all files that affect the base dependency images. Also add instructions to README.md for updating hard-coded SHAs in Dockerfile's. Fixes #115 Signed-off-by: Andrew Seigner <andrew@sig.gy>	2018-01-08 11:19:49 -08:00
Risha Mars	80ecdc13c2	Copy over /pkg to container (#110 ) Signed-off-by: Risha Mars <mars@buoyant.io>	2018-01-05 10:12:29 -08:00
Phil Calçado	709de5a7b0	Moves k8s and conduit client code to /pkg (#103 ) * Rename constructor functions from MakeXyz to NewXyz As it is more commonly used in the codebase Signed-off-by: Phil Calcado <phil@buoyant.io> * Make Conduit client depend on KubernetesAPI Signed-off-by: Phil Calcado <phil@buoyant.io> * Move Conduit client and k8s logic to standard go package dir for internal libs Signed-off-by: Phil Calcado <phil@buoyant.io> * Move dependencies to /pkg Signed-off-by: Phil Calcado <phil@buoyant.io> * Make conduit client more testable Signed-off-by: Phil Calcado <phil@buoyant.io> * Remove unused config object Signed-off-by: Phil Calcado <phil@buoyant.io> * Add more test cases for marshalling Signed-off-by: Phil Calcado <phil@buoyant.io> * Move client back to controller Signed-off-by: Phil Calcado <phil@buoyant.io> * Sort imports Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-01-04 10:10:10 -08:00
Christopher Schmidt	ce69c2e534	returns rs name in case of there's no deployment runconduit/conduit#80 (#81 ) Signed-off-by: Christopher Schmidt <fakod666@googlemail.com>	2018-01-02 11:24:09 -08:00
Phil Calçado	31e9846f62	Make several CLI commands testable (#86 ) * Add func to resolve kubectl-like names to canonical names Signed-off-by: Phil Calcado <phil@buoyant.io> * Refactor API instantiation Signed-off-by: Phil Calcado <phil@buoyant.io> * Make version command testable Signed-off-by: Phil Calcado <phil@buoyant.io> * Make get command testable Signed-off-by: Phil Calcado <phil@buoyant.io> * Add tests for api utils Signed-off-by: Phil Calcado <phil@buoyant.io> * Make stat command testable Signed-off-by: Phil Calcado <phil@buoyant.io> * Make tap command testable Signed-off-by: Phil Calcado <phil@buoyant.io>	2017-12-27 14:10:41 -05:00
Brian Smith	2729fa02bc	Stop using "default" as default service namespace (#61 ) Previously the destinations service would look for services in the "default" namespace if the service name didn't have at least two labels. However, the "default" namespace is almost always the wrong namespace. The only reasonable default namespace is the namespace of the client service, which isn't given to the destinations service. Therefore it shouldn't try to default the namespace. Accordingly, stop defaulting the namespace to "default". Validated by manually testing the emojivoto service before and after the proxy implemented namespace defaulting itself.	2017-12-20 10:44:24 -10:00
Kevin Lingerfelt	a8e75115ab	Prepare the repo for the v0.1.1 release (#75 ) * Prepare the repo for the v0.1.1 release * Add changelog * Changelog updates, wrap at 100 characters	2017-12-20 10:51:53 -08:00
Phil Calçado	0a6a9edaee	Respect $KUBECONFIG env var (#68 ) * Move kubectl logic to k8s package * Made kubectl return url.URL, just like API Make k8s API code respect /Users/pcalcado/.kube/config (closes #17) * Fix style mistakes and typos	2017-12-20 11:50:25 +11:00
Kevin Lingerfelt	2f114e69fa	Add support for path stats in cli and web api (#13 ) * Add support for path stats in cli and web api The cli stat command supports grouping by pod and deployment. With this change, it will also support grouping by path, in order to facilitate a summary stats per individual endpoint. * Right-align numeric columns in stat output	2017-12-08 12:24:39 -08:00
Risha Mars	82f50b5536	Fix simulate-proxy script (#14 ) Problem: Simulate proxy would seemingly hang when used. In simulate-proxy we were using rand.Uint32() to generate Count. This is way too big (in telemetry/server.go we call latencyStat.observe() Count times, so this loop was taking forever). Solution: Use a count of 1 (as the surrounding loop will generate count requests) Validation: Script now works without hanging.	2017-12-08 11:15:05 -08:00
Kevin Lingerfelt	906d4e8b69	Fix public-api error marshaling and unmarshaling (#16 )	2017-12-08 11:03:55 -08:00
Oliver Gould	bff3efea3f	Prepare for v0.1.0 (#1 ) Update versions in code. Use default docker tag of v0.1.0	2017-12-04 19:55:56 -08:00
Oliver Gould	a1fbafaae3	update go-deps image	2017-12-05 01:17:38 +00:00
Oliver Gould	b104bd0676	Introducing Conduit, the ultralight service mesh We’ve built Conduit from the ground up to be the fastest, lightest, simplest, and most secure service mesh in the world. It features an incredibly fast and safe data plane written in Rust, a simple yet powerful control plane written in Go, and a design that’s focused on performance, security, and usability. Most importantly, Conduit incorporates the many lessons we’ve learned from over 18 months of production service mesh experience with Linkerd. This repository contains a few tightly-related components: - `proxy` -- an HTTP/2 proxy written in Rust; - `controller` -- a control plane written in Go with gRPC; - `web` -- a UI written in React, served by Go.	2017-12-05 00:24:55 +00:00


227 Commits