linkerd2

Commit Graph

Author	SHA1	Message	Date
Alejandro Pedraza	37bc8a69db	Added support for json output in `linkerd stat` (#1749 ) Added support for json output in `linkerd stat` through a new (-o|--output)=json option. Fixes #1417 Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>	2018-10-15 14:10:48 -07:00
Risha Mars	31a396b631	Fix incorrect test wording (#1767 )	2018-10-15 12:07:06 -07:00
Alex Leong	1fe19bf3ce	Add ServiceProfile support to k8s utilities (#1758 ) Updates to the Kubernetes utility code in `/controller/k8s` to support interacting with ServiceProfiles. This makes use of the code generated client added in #1752 Signed-off-by: Alex Leong <alex@buoyant.io>	2018-10-12 09:35:11 -07:00
Alena Varkockova	5a853e8990	Use ListPods always for data plane HC (#1701 ) * Use ListPods always for data plane HC * Missing changes in grpc_server.go * Address review comments * Read proxy version from spec Signed-off-by: Alena Varkockova <varkockova.a@gmail.com>	2018-10-02 11:45:01 -07:00
Kevin Lingerfelt	b5ff29c8aa	Add data plane check to validate proxy version (#1574 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-09-04 15:22:38 -07:00
Kevin Lingerfelt	e97be1f5da	Move all healthcheck-related code to pkg/healthcheck (#1492 ) * Move all healthcheck-related code to pkg/healthcheck * Fix failed check formatting * Better version check wording Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-08-20 16:50:22 -07:00
Kevin Lingerfelt	00a0572098	Better CLI error messages when control plane is unavailable (#1428 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-08-09 15:40:41 -07:00
Kevin Lingerfelt	bd19e8aaff	Update prometheus to only scrape proxies in the same mesh (#1402 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-08-06 12:05:55 -07:00
Alex Leong	3e1f35913b	Read all bytes of message length header (#1394 ) The `reader.Read` method only reads as many bytes as are currently available from reader. When reading the 4 byte message length header, if not all 4 of those bytes are available, `Read` will only read the available bytes and return. This causes alignment issues when the message body is read and there are still unread header bytes in the reader. These bytes will appear at the beginning of the message body and cause a crash when the message is unmarshalled. Use `io.ReadFull` to ensure that we read all 4 of the message length header bytes. Fixes #1287 Signed-off-by: Alex Leong <alex@buoyant.io>	2018-08-02 10:45:49 -07:00
Risha Mars	fef896011f	Add more filters to the web UI tap form (#1371 ) * Update ant to 3.7.2 * Add autocomplete of namespaces/resources to Tap in web ui * Add form fields for authority/path/method/rps/scheme * Add the ability to clear error messages to the error banner * Add error listener to ws object	2018-07-31 15:48:53 -07:00
Kevin Lingerfelt	4b9700933a	Update prometheus labels to match k8s resource names (#1355 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-07-23 15:45:05 -07:00
Kevin Lingerfelt	e5cce1abaf	Rename CLI from conduit to linkerd (#1312 ) * Rename CLI binary * Update integration tests for new binary name * Rename --conduit-namespace flag, change default ns * Rename occurrences of conduit in rest of CLI * Rename inject and install components * Remove conduit occurrences in docker files * Additional miscellaneous cleanup * Move protobuf definitions to linkerd2 package * Rename conduit.io labels to use linkerd.io * Rename conduit-managed segment to linkerd-managed * Fix conduit references in web project Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-07-12 17:14:07 -07:00
Oliver Gould	941cad4a9c	Migrate build infrastructure to linkerd2 (#1298 ) This PR begins to migrate Conduit to Linkerd2: * The proxy has been completely removed from this repo, and is now located at github.com/linkerd/linkerd2-proxy. * A `Dockerfile-proxy` has been added to fetch the most-recently published proxy binary from build.l5d.io. * Proxy-specific protobuf bindings have been moved to github.com/linkerd/linkerd2-proxy-api. * All docker images now use the gcr.io/linkerd-io registry. * `inject` now uses `LINKERD2_PROXY_` environment variables * Go paths have been updated to reflect the new (future) repo location.	2018-07-09 15:38:38 -07:00
Risha Mars	9050b2d312	Fix authority stat queries when a --from flag is used (#1289 ) * Fix bug where we were using dst_authorities as a group by instead of authorities * Add test to make sure we don't group by dst_authorities Previously, we were only checking to make sure we didn't add dst_authorities in the query labels in promDstQueryLabels but we weren't checking the groupBy labels in promDstGroupByLabelNames - this caused us to try to query for dst_authorities when a --from query was sent. There are no dst_authorities, so there would be no named results.	2018-07-06 17:29:08 -07:00
Kevin Lingerfelt	693acdbf26	Update ListPods endpoint to return all pod owner types (#1275 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-07-05 15:14:16 -07:00
Risha Mars	ba2e13c731	Small tweaks to error modal, add Reason to api error response (#1246 ) - Add Reason to the error data passed from the api - Rewrite error logic in the UI to try to make it clearer - Show 0/0 pods meshed instead of 0/0 pods meshed (N/A) if 0 pods are meshed	2018-07-03 17:14:27 -07:00
Risha Mars	2002a8ba50	Add more tests for the stat summary endpoint --from flags (#1237 ) Also add dst_ labels in the metrics we mock, so we can do --from queries with results.	2018-07-03 14:30:15 -07:00
Risha Mars	8ebc969d2f	Fix bug where we wouldn't run stat table assertions if we expected 0 results (#1235 ) I realized that our stat summary expectation checker would only check the actual proto responses against the expectations if the expectations were non-empty. Problem If we expected empty results and the api returned actual results, we never actually check those results against the expectations. The bug can be reproduced by replacing any nonzero metric we expect in expectedResponse with expectedResponse: genEmptyResponse() The tests on master will still pass. Solution Remove this line and ensure we get the expected number of stat tables.	2018-06-29 14:23:14 -07:00
Risha Mars	5ed7fc563c	Add controller component pod uptimes to the ServiceMesh page (#1205 ) - Return pod uptimes from the GetPods endpoint - Adds filtering by namespace to api.GetPods - Adds a --namespace filter to conduit get pods - Adds pod uptimes to the controller component tooltips on the ServiceMesh page - Moves the ServiceMesh page back to using /api/pods	2018-06-28 15:42:00 -07:00
Risha Mars	5963b2ac24	Better format empty errors (#1202 )	2018-06-28 14:52:04 -07:00
Risha Mars	68586fe697	Add the ability to query stats by authority (#1181 ) Adds the ability to query by a new non-kubernetes resource type, "authorities", in the StatSummary api. This includes an extensive refactor of stat_summary.go to deal with non-kubernetes resource types. - Add documentation to Resource in the public api so we can use it for authority - Handle non-k8s resource requests in the StatSummary endpoint - Rewrite stat summary fetching and parsing to handle non-k8s resources - keys stat summary metric handling by Resource instead of a generated string - Adds authority to the CLI - Adds /authorities to the Web UI - Adds some more stat integration and unit tests	2018-06-28 14:31:44 -07:00
Kevin Lingerfelt	682b0274b5	Add controller admin servers and readiness probes (#1168 ) * Add controller admin servers and readiness probes * Tweak readiness probes to be more sane * Refactor based on review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-20 17:32:44 -07:00
Risha Mars	46c99febf2	Don't panic on stats that aren't included in StatAllResourceTypes (#1154 ) Problem `conduit stat` would cause a panic for any resource that wasn't in the list of StatAllResourceTypes This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files Solution Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes Also adds a unit test and integration test for `conduit stat ns`	2018-06-19 17:00:16 -07:00
Risha Mars	e2c2f19d2c	Propagate errors in conduit containers to the api (#1117 ) - It would be nice to display container errors in the UI. This PR gets the pod's container statuses and returns them in the public api - Also add a terminationMessagePolicy to conduit's inject so that we can capture the proxy's error messages if it terminates	2018-06-14 16:22:31 -07:00
Kevin Lingerfelt	13aaa82c95	Allow k8s API clients to watch a subset of resources (#1118 ) * Allow k8s API clients to watch a subset of resources * Sort resources Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-14 11:09:01 -07:00
Kevin Lingerfelt	9f1df963e9	Move controller/util and web/util packages to pkg (#1109 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-13 11:25:56 -07:00
Kevin Lingerfelt	6e66f6d662	Rename Lister to API and expose informers as well as listers (#1072 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-12 10:27:55 -07:00
Risha Mars	7d4c4aa290	CLI: print resources in the same order every time stat all is run (#1088 ) Previously, in conduit stat all we would just print the map of stat results, which resulted in the order in which stats were displayed varying between prints. Fix: Define an array, k8s.StatAllResourceTypes and use the order in this array to print the map; ensuring a consistent print order every time the command is run.	2018-06-08 15:02:17 -07:00
Ivan Sim	11d1d55632	Filter out failed and completed pods from stats summary result (#1010 ) (#1065 ) Both the conduit stat command and web UI are showing failed and completed pods. This change filters out those pods before returning the result to the client. Fixes #1010 Signed-off-by: Ivan Sim <ihcsim@gmail.com>	2018-06-05 13:19:48 -07:00
Kevin Lingerfelt	ec2433e9bd	Update controller to use 'tls' metric label (#1044 ) * Update controller to use 'tls' metric label * Fix meshed column formatter Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-01 16:44:33 -07:00
Risha Mars	ffabdefc6c	Add queries to prometheus to determine number of fully meshed requests (#983 ) - Update the `response_total` prometheus query of the StatSummary endpoint to also break queries out by a `meshed` label. - Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic starting and ending in the mesh This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we add TLS by default for intra mesh requests). The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will show up as 0% secured in the web/CLI.	2018-05-24 11:05:09 -07:00
Andrew Seigner	84e6eb5c87	Fix nil pointer dereference in StatSummary (#991 ) The StatSummary endpoint was dereferencing StatSummaryRequest.Selector.Resource, causing a panic when it received an empty request. Fix StatSummary to use the nil-friendly StatSummaryRequest.GetSelector().GetResource() methods, and add a test to validate. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-05-23 13:21:49 -07:00
Risha Mars	1e6434f6de	Fix bug in the public-api where conduit stat params were ignored (#971 ) * Fix bug where we were dropping parts of the StatSummaryRequest * Add tests for prometheus query strings and for failed cases Problem In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types, we would copy the Resource, LabelSelector and TimeWindow of the original request, and then go through all the resource types and set Resource.Type for each resource we wanted to get. The bug is that while we copy over some fields of the original request, we didn't copy over all of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would ignore any --to or --from flags, and would ignore requests for a specific named resource. Solution Copy over all fields from the request. I've also added tests for this case. In this process I've refactored the stat_summary_test code to make it a bit easier to read/use.	2018-05-18 16:06:06 -07:00
Risha Mars	b8dc83f9d2	Modify the Stat API to handle requests for resource type "all" (#928 ) Allow the Stat endpoint in the public-api to accept requests for resourceType "all". Currently, this queries Pods, Deployments, RCs and Services, but can be modified to query other resources as well. Both the CLI and web endpoints now work if you set resourceType to all. e.g. `conduit stat all`	2018-05-11 14:35:37 -07:00
Kevin Lingerfelt	4e8e1eb84d	CLI: Fix validation for service stats (#935 ) * CLI: Fix validation for service stats * Address review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-05-11 10:28:49 -07:00
Risha Mars	f94856e489	Modify the Stat endpoint to also return the number of failed conduit pods (#895 ) * Modify the Stat endpoint to also return the count of failed pods * Add comments explaining pod count stats * Rename total pod count to running pod count This is to support the service mesh overview page, as I'd like to include an indicator of failed pods there.	2018-05-08 10:35:21 -07:00
Andrew Seigner	dce31b888f	Deprecate Tap, rename TapByResource to Tap (#844 ) The `conduit tap` command is now deprecated. Replace `conduit tap` with `conduit tapByResource`. Rename tapByResource to tap. The underlying protobuf for tap remains; the tap gRPC endpoint now returns Unimplemented. Fixes #804 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-25 12:24:46 -07:00
Andrew Seigner	a0a9a42e23	Implement Public API and Tap on top of Lister (#835 ) public-api and tap were both using their own implementations of the Kubernetes Informer/Lister APIs. This change factors out all Informer/Lister usage into the Lister module. This also introduces a new `Lister.GetObjects` method. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-24 18:10:48 -07:00
Andrew Seigner	baf4ea1a5a	Implement TapByResource in Tap Service (#827 ) The TapByResource endpoint was previously a stub. Implement end-to-end tapByResource functionality, with support for specifying any kubernetes resource(s) as target and destination. Fixes #803, #49 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 16:13:26 -07:00
Andrew Seigner	79bdc638b3	Service support in stat command (#809 ) The `stat` command did not support `service` as a resource type. This change adds `service` support to the `stat` command. Specifically: - as a destination resource on `--to` commands - as a target resource on `--from` commands Fixes #805 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 16:51:20 -07:00
Andrew Seigner	293e00bc3e	Introduce tapByResource cli command (#802 ) The existing `tap` command is being deprecated. Introduce a `tapByResource` cli command. It supports tapping a Kubernetes resource or collection of resources, optionally filtered by outbound resources. This command will eventually replace `tap`. Part of #778 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 14:44:23 -07:00
Kevin Lingerfelt	653dc6bfaa	Add replication controller stats in CLI (#794 ) * Add replication controller stats in CLI * Fix pod status in stat summary tests Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 18:12:14 -07:00
Oliver Gould	06dd8d90ee	Introduce the TapByResource API (#778 ) This changes the public api to have a new rpc type, `TapByResource`. This api supersedes the Tap api. `TapByResource` is richer, more closely reflecting the proxy's capabilities. The proxy's Tap api is extended to select over destination labels, corresponding with those returned by the Destination api. Now both `Tap` and `TapByResource`'s responses may include destination labels. This change avoids breaking backwards compatibility by: * introducing the new `TapByResource` rpc type, opting not to change Tap * extending the proxy's Match type with a new, optional, `destination_label` field. * `TapEvent` is extended with a new, optional, `destination_meta`.	2018-04-18 15:37:07 -07:00
Kevin Lingerfelt	71a51afb40	Expose pod stats in CLI, web UI, and Grafana (#788 ) * Expose pod stats in CLI, web UI, and Grafana * Fix js api helpers test * Add outbound traffic stats to pod dashboard Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 11:26:47 -07:00
Andrew Seigner	727521f914	Permit arbitrary time windows in public-api (#774 ) The public-api previously only permitted 4 hard-coded time windows: 10s, 1m, 10m, 1h. This was primarily a relic of the recently removed telemetry system. Modify the public-api to validate the time string, but allow for any window size, which is then passed through to Prometheus. Fixes #686 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-16 17:37:17 -07:00
Kevin Lingerfelt	11a4359e9a	Misc cleanup following the telemetry rewrite (#771 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-16 15:51:07 -07:00
Andrew Seigner	77fb6d3709	Add namespace as a resource type in public-api (#760 ) * Add namespace as a resource type in public-api The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This change also includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analogous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io> * Refactor filter and groupby label formulation Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename stat_summary.go to stat.go in cli Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Update rbac privileges for namespace stats Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 16:53:01 -07:00
Andrew Seigner	21886760c6	Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762 ) Conduit was relying on apps/v1 for the Deployment and ReplicaSet APIs. apps/v1 is not available on Kubernetes 1.8. This prevented the public-api from starting. Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on a test cluster. Fixes #761 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-13 12:08:16 -07:00
Kevin Lingerfelt	fb15fe7c1a	Remove the telemetry service (#757 ) * Remove the telemetry service The telemetry service is no longer needed, now that prometheus scrapes metrics directly from proxies, and the public-api talks directly to prometheus. In this branch I'm removing the service itself as well as all of the telemetry protobuf, and updating the conduit install command to no longer install the service. I'm also removing the old version of the stat command, which required the telemetry service, and renaming the statsummary command to stat. * Fix time window tests * Remove deprecated controller scrape config Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 11:21:29 -07:00
Andrew Seigner	e9b209829d	Handle NaN metrics (#750 ) The Prometheus client sometimes returns NaN if a calculation is invalid, such as histogram_quantile when no requests have occurred. Add IsNaN check in the public-api and set output to zero. Fixes #747 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-12 15:21:00 -07:00


86 Commits