* Use Tap data on Resource Detail page to display unmeshed resources
that send traffic to the specified resource.
* Don't update neighbors on every websocket recv; this causes too much rendering.
Instead, store in internal variable and update with the api results.
This branch uses the src data from tap to discern which unmeshed resources are
sending traffic to the specified resource. We then show this resource in the
octopus graph.
Note that tap is sampled data, so it's possible for an unmeshed resource to not
show up. Also, because we won't know about the resource until it appears in the
Tap results, results could pop into the chart at any time.
_.throttle setState for receiving websocket tap events to prevent continuous rerendering
Problem
We receive a lot of websocket events from the tap server. Previously, we
were processing each event as we received it, then calling setState after
processing to update the tables. Each call to setState triggered a re-render of
the whole table. We were rerendering multiple times a second, causing the whole
page to become unresponsive.
Solution
Throttle setState for receiving websocket tap events to prevent
continuous rerendering. Store the tap events in an index outside of state, and
only update the state once every specified interval (currently 500ms).
We can now view entire namespaces with Top and the page won't crash!
To verify: Go to /top and try topping a namespace
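The approach above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's code: it uses a hand-rolled leading-edge throttle instead of `_.throttle` so that it runs standalone, and `createThrottledIndex`, `commit`, and the `{ id }` event shape are hypothetical names. `commit` stands in for whatever wraps `setState`.

```javascript
// Buffers tap events outside of component state and hands a snapshot to
// `commit` at most once per interval. All names here are illustrative.
function createThrottledIndex(commit, intervalMs = 500, now = Date.now) {
  const index = new Map(); // lives outside React state, so writes are cheap
  let lastCommit = -Infinity;

  return function onEvent(event) {
    index.set(event.id, event);
    const t = now();
    if (t - lastCommit >= intervalMs) {
      lastCommit = t;
      commit(Array.from(index.values())); // the only place a re-render is triggered
    }
  };
}

// Usage with a fake clock so the timing is easy to follow.
let fakeTime = 0;
const commits = [];
const onEvent = createThrottledIndex(rows => commits.push(rows.length), 500, () => fakeTime);
onEvent({ id: "a" }); // t=0: first event commits immediately
fakeTime = 100;
onEvent({ id: "b" }); // t=100: buffered only
fakeTime = 600;
onEvent({ id: "c" }); // t=600: commits all three buffered events
```

Unlike `_.throttle` with its default options, this sketch has no trailing call, so the last few events are only committed when another event arrives after the interval has passed.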
When scrollbars are set to always be visible in a browser, we see them appear in the sidebar component of the dashboard.
This PR adds CSS that hides the scrollbar for WebKit browsers (i.e., Chrome and Safari) and uses an overflow: hidden technique inspired by this solution to hide the scrollbar in Firefox.
fixes #1611
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Adds a new page that shows all namespaces in an accordion. This will replace
ServiceMesh as the default landing page.
The page will request stats for all namespaces, and then pick the first meshed
namespace that's not the linkerd namespace to auto-expand in the accordion.
This branch also updates the definition of "added to the mesh" in the frontend
to be runningPodCount > 0 && meshedPodCount > 0 (previously, it was
runningPodCount === meshedPodCount, which would count resources with no pods as
"added").
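To make the difference between the two definitions concrete, here is a small sketch. The field names `runningPodCount` and `meshedPodCount` come from the description above; the function names, the `name` field, and the sample data are assumptions made for illustration.

```javascript
// New and previous definitions of "added to the mesh".
const isAddedToMesh = r => r.runningPodCount > 0 && r.meshedPodCount > 0;
const wasAddedToMesh = r => r.runningPodCount === r.meshedPodCount; // previous rule

// Hypothetical helper: pick the first meshed namespace that is not the
// linkerd namespace, to auto-expand in the accordion.
function pickNamespaceToExpand(namespaces, controllerNamespace = "linkerd") {
  const found = namespaces.find(
    ns => ns.name !== controllerNamespace && isAddedToMesh(ns)
  );
  return found ? found.name : null;
}

const namespaces = [
  { name: "empty", runningPodCount: 0, meshedPodCount: 0 },
  { name: "linkerd", runningPodCount: 4, meshedPodCount: 4 },
  { name: "emojivoto", runningPodCount: 4, meshedPodCount: 2 },
];
console.log(wasAddedToMesh(namespaces[0]));     // true: the old rule counted empty resources
console.log(isAddedToMesh(namespaces[0]));      // false under the new rule
console.log(pickNamespaceToExpand(namespaces)); // "emojivoto"
```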
I've also moved the link to /namespaces out of the top-level sidebar and into
the Resources sub-menu.
* Use url query params for tap/top form filters
* Add comment explaining react-url-query onChange handlers
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Previously, WebSocket error messages would appear with the first
couple characters cut off. I've fixed this by using ws.WriteControl
instead of ws.WriteMessage to write errors, as gorilla does in
their example app.
- Use writeControl to write error messages to the client
- Stop the spinner if there is an error present
For a better UI experience, the sidebar should be able to scroll independently from the detail view. This PR allows both the sidebar and the detail view to scroll independently.
fixes #1547
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
- Use an ant Select instead of Autocomplete for resource list, so that the user
can see all available tappable resources
- Fix bug where the authority autocomplete wasn't showing any options
- Adds "namespace/" as an option in the resource selection dropdown
This required some weird handling because we allow requests of the form
linkerd tap namespace/linkerd (taps namespace linkerd)
but not
linkerd tap namespace --namespace linkerd (does not work as intended,
taps every namespace)
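The special case above could be handled along these lines. This is only a sketch: `toTapCommand` and the shape of its argument are hypothetical, and the only CLI forms used are the two quoted above plus the `--namespace` flag.

```javascript
// Builds the CLI equivalent of a tap query. When the selected resource is a
// namespace, the namespace becomes the resource name and no --namespace flag
// is emitted, since `linkerd tap namespace --namespace linkerd` would tap
// every namespace.
function toTapCommand({ resourceType, resourceName, namespace }) {
  if (resourceType === "namespace") {
    return `linkerd tap namespace/${namespace}`;
  }
  return `linkerd tap ${resourceType}/${resourceName} --namespace ${namespace}`;
}

console.log(toTapCommand({ resourceType: "namespace", namespace: "linkerd" }));
console.log(toTapCommand({
  resourceType: "deployment",
  resourceName: "web",
  namespace: "emojivoto",
}));
```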
Remove twitter, github and slack links from the sidebar.
The "Update Linkerd" menu item will still show up if there's an update.
The "Update now" button will also still show.
Increase the MaxRps on the tap server to 100 RPS.
The max RPS for tap/top was increased for the CLI in #1531, but we were
still manually setting this to 1 RPS in the Web UI and Web server.
Remove the pervasive setting of MaxRps to 1 in the web frontend and server
In #1540 I moved the version check code and was using our prefixed version of
fetch. This is unnecessary because the version check URL doesn't depend on the
dashboard URL prefix.
TLDR Don't use prefixed fetch for version check
A bunch of small items.
This branch:
- filters out un-meshed resources from the Tap and Top autocompletes
- removes an un-rendered title attribute from the sidebar menu items
- formats latency in Tap with a comma
- prevents the grafana link from showing if there are 0 pods in a deployment
When the mesh completion message shows its call to action, it prints a CLI command to copy and paste. The command snippet was visually hard to tell apart from the message, which is what this commit fixes.
Flipped background and font color to create a better visual distinction
Successfully ran web app test suite
Signed-off-by: Sebastian Tiedtke <sebastiantiedtke@gmail.com>
Previously, we included a version check in the server polling loop, which meant
we were hitting the version check endpoint once every 10 seconds from the
sidebar. This PR moves that check out of the loop so that we only hit it once,
upon pageload.
This PR also includes
- some whitespace fixes
- a fix for a console error we were triggering with our tests
Linkerd CLI's "look and feel" is similar to Kubernetes kubectl CLI. Linkerd's dashboard can be extended to match Kubernetes dashboard UI.
This PR serves as a starting point for this work. The new sidebar shows all resources from all namespaces on initial page load. Resources can be filtered to show only items in a given namespace. The sidebar displays authority, deployment, service, and pod resources. We may need to think about whether it is necessary to show all resource types. Some resources, e.g. authorities, have a large cardinality of resource details and may not be very useful to a user.
fixes #1449
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Includes a substantial refactor of Top.jsx to move the websocket
and top-request-aggregation code into a self-contained module
so that this code can be shared by /top and by each resource
detail page.
(This refactor also helps separate concerns in that
page, since that page also makes 10 second requests to the stat
api to populate the autocompletes in the form.)
The TopModule uses the startTap prop to figure out whether it
should start a websocket connection and make a tap request
when mounted. (This is because the resource detail pages
start tapping immediately upon load, whereas /top can only
start once you've entered a query.)
I've removed the spinner and the awaitingWebSocketConnection
state field because that now belongs in the top module. I think a
similar refactor of tap would be good before we re-add it.
Do a little more work to get the octopus graph closer to the mocks.
This version gives you a slightly better navigational sense of where
you are in the app, and gives you a clearer
view of the neighbouring stats
Add a basic top graph depicting the current resource's stats
and its upstreams and downstreams.
Also add upstreams and downstreams tables for this resource
This will be styled more later, but just getting the basic components
and data onto the page.
Add a pod table to the Resource Detail page showing metrics
for pods belonging to a resource.
In the future, I think we'll modify the stat summary endpoint to
take multiple resources as arguments, and have the resource detail page
first query for the pods associated with the resource and then
query for stats for those pods.
See #1467 for discussion.
This PR also modifies the queries to not use the withREST component, in anticipation of the above changes.
* Upgrade to dep 0.5.0, go 1.10.3
* Remove existing dep binary if it's the wrong version
* Add version in filename of dep binary to prevent version conflicts
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
This PR started out as a PR to link to our Resource Detail dashboard in
addition to grafana in the resource list pages, but I decided to refactor
the way we deal with our svgs since I was here.
This branch:
- modifies the GrafanaLink component to consist of the grafana icon
that links to grafana
- adds links to the ResourceDetail page in all our metrics tables
- adds a jsx component we can use to wrap svgs so that we don't get
annoying 404s on images that we have to handle
- removes the relative paths hack for images
- removes unused svg files in /img
Remove old unused graphs from the web code (scatter plot and line graph)
and their associated css
Files removed:
web/app/css/line-graph.css
web/app/css/list.css
web/app/css/scatterplot.css
web/app/css/version.css
web/app/js/components/LineGraph.jsx
web/app/js/components/ScatterPlot.jsx
Currently conduit stat outputs a column that shows the number of meshed pods in the resource being
queried. The web UI does not have this information about meshed pod state.
This commit adds a meshed column for better UI parity with the stat command.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Add a Top page to the linkerd web UI. This is the web equivalent of #1435.
I've used the same fields as in the current implementation.
This branch also includes some slight refactors to the Tap code to enable code reuse.
The request processing logic is pretty similar to that in Tap.jsx, except that we can
immediately discard the result once we receive the response end and aggregate
that result into the top results. So the index of tap results will tend to be smaller
(unless they're long running requests like streaming). But we also add a similar
index of aggregated Top results, and discard oldest results if top has been
running for a long time.
* Add a Top page to the web UI
* Refactor Tap event parsing into common util code
* Small refactors to the TapQueryForm and the CliCmd display to accommodate Top
* Collate tap events based on the ID (src, dst, stream)
* Also refactor keying of req/rsp/end into requestInit/responseInit/responseEnd for clarity
* Use pod labels when present in top
* Fix bug where src/dst were switched in the Tap display table
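The request processing described above can be sketched as follows. The event names `requestInit`, `responseInit`, and `responseEnd` come from the commit; everything else is an assumption for illustration, including `createTopAggregator`, the `{ id, type, path, latencyMs }` event shape, aggregating by path, and evicting the least recently updated row as one reading of "discard oldest results".

```javascript
// Collates tap events by id, then folds each completed request into an
// aggregated "top" row and drops it from the in-flight index.
function createTopAggregator(maxRows = 40) {
  const inFlight = new Map(); // tap results awaiting responseEnd, keyed by id
  const topRows = new Map();  // aggregated rows, keyed by path

  function onTapEvent(event) {
    const entry = inFlight.get(event.id) || {};
    entry[event.type] = event;
    inFlight.set(event.id, entry);

    if (event.type !== "responseEnd" || !entry.requestInit) return;
    inFlight.delete(event.id); // request finished, so discard the tap result

    const path = entry.requestInit.path;
    const row = topRows.get(path) || { path, count: 0, worstMs: 0 };
    row.count += 1;
    row.worstMs = Math.max(row.worstMs, event.latencyMs);
    topRows.delete(path); // re-insert so Map order tracks recency
    topRows.set(path, row);

    if (topRows.size > maxRows) {
      topRows.delete(topRows.keys().next().value); // evict least recently updated
    }
  }

  return { onTapEvent, inFlight, topRows };
}

const top = createTopAggregator(2);
top.onTapEvent({ id: "a:b:1", type: "requestInit", path: "/x" });
top.onTapEvent({ id: "a:b:1", type: "responseEnd", latencyMs: 12 });
console.log(top.inFlight.size, top.topRows.get("/x").count);
```

Because completed requests leave `inFlight` immediately, that index stays small unless requests are long running, which matches the behaviour described above.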
This is an initial implementation of the `linkerd top` command. This command launches an ncurses style tabular view of current requests (using data from tap). Most of the command line arguments are the same as tap and allow selecting the resource to inspect and filtering which requests to view.
Fixes #1283
Signed-off-by: Alex Leong <alex@buoyant.io>
Tap.jsx is really large and contains a lot of logic that pertains only to the Tap Query Form.
This PR tries to separate the concerns of the form and the query display from the main
Tap querying and rendering logic.
This will also allow us to easily reuse this form/CLI formatting for the Top page.
Changes in this PR:
* moves all the code for the form into its own component (TapQueryForm)
* moves the code that displays the current query into its own component (TapQueryCliCmd)
* formats the current tap query as the equivalent command line format that you
can paste into a terminal
Now that we have source metadata in tap events, we can display
the pod name in the UI instead of the IP. I've also added a popover
that shows deploy and pod info if we have it.
Also adds another table in the expanded row view to show all the
metadata we have. This table probably won't stick around forever,
but I'm just displaying all the data we have right now.
We have a new format for displaying errors in ErrorBanner.
When a websocket error occurred, we'd pass in text where ErrorBanner
expects an object. This PR puts the websocket errors in an object.
Also clean up the display of the error by removing redundant text.
Problem:
We depend on the websocketRequestSent bool (renamed to
tapRequestInProgress in this branch) to determine whether the
start/stop button says start or stop. However, we don't change
this value in setState until we open the websocket connection
(which could take some time). This led to a delay between pressing
the Start button and the button changing colour.
Solution:
Set the state before waiting for the websocket to open, so the
button colour changes immediately and the form feels more responsive
* Changing the statusText to be an object with more fields, then displaying them in the ErrorBanner
Signed-off-by: Adam Christian <adam@buoyant.io>
Refactoring karma tests and propTypes and defaultProps per the code review from @rmars
Signed-off-by: Adam Christian <adam@buoyant.io>
Changing the default message to pass the ServiceMeshTest ErrorBanner assertion
Revert "Changing the default message to pass the ServiceMeshTest ErrorBanner assertion"
This reverts commit 2415b7099b03ad7a8deda9f67218bb531111b3ec.
Fixing the failing karma unit tests because the statusMessage wasn't being properly passed into the component rendering stub context
Signed-off-by: Adam Christian <adam@buoyant.io>
merging master in
Signed-off-by: Adam Christian <adam@buoyant.io>
* Export api error type independently from ApiHelpers
Signed-off-by: Adam Christian <adam@buoyant.io>
Problem:
Currently the web UI's resource autocomplete also lists authorities.
However you can't tap authorities in this way, you have to use --authority
in addition to whatever resource you're trying to tap.
The web UI is confusing as it presents authorities in that list.
Those authorities should instead be moved to the Authority box in the advanced filter form.
Solution:
* Don't present authorities as options in the Resource dropdowns
* Add authority autocomplete to authority form input
Follow up to @kl in #1391: there is an error when we try to tap an authority.
Add client side filtering to the tap table, so that we can narrow down
queries while still tapping a whole resource.
There are two general kinds of filters here:
- filters where the number of possible values is bounded/small and
we know them (e.g. inbound/outbound, grpc status). here, I've tried to
hardcode the list of possible options with explanations (see the GRPC status filters)
- filters where the number of possible values can be very large (e.g. paths)
here, I've generated the list of options as we process the incoming data.
I also periodically delete the oldest filter option so the list of filters
doesn't grow unbounded
Filters added:
- GRPC status code filters
- http status filters
- path filters
- scheme filters
- tls, destination and source filters
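For the unbounded kind of filter (e.g. paths), the bookkeeping might look like the sketch below. `createFilterOptions` and its cap are hypothetical, and this version evicts the oldest option as soon as the cap is exceeded rather than on a periodic timer as described above.

```javascript
// Keeps a bounded list of filter options built from incoming data.
function createFilterOptions(maxOptions = 3) {
  const seen = new Set(); // a Set iterates in insertion order, oldest first
  return {
    add(value) {
      if (seen.has(value)) return;
      seen.add(value);
      if (seen.size > maxOptions) {
        seen.delete(seen.values().next().value); // drop the oldest option
      }
    },
    list() {
      return Array.from(seen);
    },
  };
}

const options = createFilterOptions(3);
["/a", "/b", "/c", "/b", "/d"].forEach(p => options.add(p));
console.log(options.list()); // ["/b", "/c", "/d"]
```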
* Make use of the Web UI to render tap events in a table
- Return JSON tap events instead of the command line output
- Experiment with a different way of rendering the EventList
- changed the default width back to 100% of the screen because this
table does not look great squished
* Update ant to 3.7.2
* Add autocomplete of namespaces/resources to Tap in web ui
* Add form fields for authority/path/method/rps/scheme
* Add the ability to clear error messages to the error banner
* Add error listener to ws object
Speed up incremental rebuilds by avoiding relinking the controller
and/or web executables when changes are made to unrelated files.
Before this change, any time the git tag changed, the executables
would have to be rebuilt (relinked at least), even if no Go files
changed.
Validated by running the integration tests.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Allow docker-build-proxy to override the proxy version
* Update based on review feedback
* fetch-proxy should return full path to executable
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Adds a tap endpoint in the web api that communicates with the dashboard
via websockets.
I've moved a bunch of code from the cli tap.go into utils so that the code
can be shared between web and CLI. I think we should consider making the
display more suited to web, but in the short term, reusing the CLI's
rendering of tap events works.
Adds a Tap page in the Web UI that you can use to make tap requests.
The form currently only allows you to enter a resource and namespace,
other filters coming in a follow-up branch.
- Remove a conduit image from our img folder
- Add a linkerd favicon, should no longer get the favicon not found console error
- Configure webpack to not hash image names
* This commit adds an application topology graph within the namespace tab. As a developer / operator one would like to see an overview of the services running to identify dependencies. Adding this graph gives Linkerd2 users a good overview of service dependencies.
* networkgraphtest added
Fixes: #924
Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
* Stop using `installsuffix` when building Go code.
See https://plus.google.com/117192131596509381660/posts/eNnNePihYnK.
`-installsuffix cgo` isn't necessary as of Go 1.10 (where build caching
changed substantially) and it probably wasn't necessary earlier.
Signed-off-by: Brian Smith <brian@briansmith.org>
* update grafana dashboards to remove conduit reference and replace with linkerd instances
* update test install fixtures to reflect changes
Fixes: #1315
Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
This PR adjusts the colour of a popup in the sidebar, as well as removes
references to conduit in the frontend test fixtures.
All that's left in the Web UI code now is a few references to the conduit sites / githubs,
as well as the CLI name.
* Remove a touch of conduit blue from the sidebar popup
* Remove minor references to conduit throughout the web code
* Fully colour the sidebar in new bg colour
This PR begins to migrate Conduit to Linkerd2:
* The proxy has been completely removed from this repo, and is now located at
github.com/linkerd/linkerd2-proxy.
* A `Dockerfile-proxy` has been added to fetch the most-recently published proxy
binary from build.l5d.io.
* Proxy-specific protobuf bindings have been moved to
github.com/linkerd/linkerd2-proxy-api.
* All docker images now use the gcr.io/linkerd-io registry.
* `inject` now uses `LINKERD2_PROXY_` environment variables
* Go paths have been updated to reflect the new (future) repo location.
This PR starts removing all references to the word "Conduit" in the web UI.
In the interest of not making huge changes all at once, I'll gradually start moving away
from the usage of "conduit" in the Web UI. For example, there are a lot of components that
have conduit in their names but they don't need to.
This branch is mostly component / variable names. There should be no visible changes except
the spinner is no longer a Conduit spinner.
See #1262 for visible branding changes.
- Rename ConduitLink to PrefixedLink
- Remove ConduitSpinner in favour of antd.Spin
- Remove css classnames that are conduit- centered
- Parameterize the current Product Name so that it's easier to change in the future
Tracking ticket: linkerd/linkerd#2018
- Add Reason to the error data passed from the api
- Rewrite error logic in the UI to try to make it clearer
- Show 0/0 pods meshed instead of 0/0 pods meshed (N/A) if 0 pods are meshed
Create an ephemeral, in-memory TLS certificate authority and integrate it into the certificate distributor.
Remove the re-creation of deleted ConfigMaps; this will be added back later in #1248.
Signed-off-by: Brian Smith <brian@briansmith.org>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
- Return pod uptimes from the GetPods endpoint
- Adds filtering by namespace to api.GetPods
- Adds a --namespace filter to conduit get pods
- Adds pod uptimes to the controller component tooltips on the ServiceMesh page
- Moves the ServiceMesh page back to using /api/pods
Adds the ability to query by a new non-kubernetes resource type, "authorities",
in the StatSummary api.
This includes an extensive refactor of stat_summary.go to deal with non-kubernetes
resource types.
- Add documentation to Resource in the public api so we can use it for authority
- Handle non-k8s resource requests in the StatSummary endpoint
- Rewrite stat summary fetching and parsing to handle non-k8s resources
- keys stat summary metric handling by Resource instead of a generated string
- Adds authority to the CLI
- Adds /authorities to the Web UI
- Adds some more stat integration and unit tests
Add Sidebar links to Pods, Deployments, and Replication Controllers
In #1016 we removed the sidebar links to individual resource pages in favour of a namespace
page that lists all resources. These resource pages require no additional code so they're still
in our UI (accessible under /pods, /deployments etc), just not easily findable. I find them
useful to check when in development mode, or when debugging something, so I'd like to
re-add links.
If we don't want them in permanently, we can gate them behind `NODE_ENV=development`
* Add CA certificate bundle distributor to conduit install
* Update ca-distributor to use shared informers
* Only install CA distributor when --enable-tls flag is set
* Only copy CA bundle into namespaces where inject pods have the same controller
* Update API config to only watch pods and configmaps
* Address review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Don't allow the CLI or Web UI to request named resources if --all-namespaces is used.
This follows kubectl, which also does not allow requesting named resources
over all namespaces.
This PR also updates the Web API's behaviour to be in line with the CLI's.
Both will now default to the default namespace if no namespace is specified.
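The validation and defaulting rule could be expressed as below. This is a sketch with assumed names: `resolveStatRequest`, its argument shape, and the error text are not taken from the codebase.

```javascript
// Rejects named resources across all namespaces (as kubectl does) and falls
// back to the "default" namespace when none is given.
function resolveStatRequest({ resourceName = "", namespace = "", allNamespaces = false }) {
  if (allNamespaces && resourceName) {
    throw new Error("a named resource cannot be requested across all namespaces");
  }
  if (allNamespaces) {
    return { resourceName: "", namespace: "" }; // empty namespace means "all"
  }
  return { resourceName, namespace: namespace || "default" };
}

console.log(resolveStatRequest({ resourceName: "web" }));
console.log(resolveStatRequest({ allNamespaces: true }));
```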
- If error messages are very long, truncate them and display a toggle to show the full message
- Tweak the headings - remove Pod, Container and Image - instead show them as titles
- Also move over from using Ant's Modal.method to the plain Modal component, which is a
little simpler to hook into our other renders.
* Display proxy container errors in the Web UI
Add an error modal to display pod errors
Add icon to data tables to indicate errors are present
Display errors on the Service Mesh Overview Page and all the resource pages
Add an emitWarning to the webpack config so that webpack will compile despite lint
errors when running in development mode. This is necessary to enable development
on the frontend using webpack-dev-server's automatic reloading.
Also sets a NODE_ENV in travis.yml so that the build will fail if linting fails.
* Update destination service to use shared informer instead of custom endpoints informer
* Add additional tests for dst svc endpoints watcher
* Remove service ports when all listeners unsubscribed
* Update go deps
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Common blacklists have `/api/stat` in them. This causes the dashboard to not load.
`/api/tps-reports` is not in any blacklists, suggests what this route does, and is slightly tongue in cheek. Fixes #970
* Display font-awesome icons no matter what URL is originally loaded
The URLs in the dashboard need to be relative. Unfortunately, this means that if
you load something that isn't the base route ... font-awesome icons look broken.
There's no real way to solve this from within webpack (or the web server without
some work). Instead, just load font-awesome from a CDN as there's no real
benefit we get from including it in the bundle. Fixes #1019.
* Moving font-awesome to styles
* Web: remove ns column from tables on individual ns page
* Add prop types and tests for MetricsTable component
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Add propType validation
When refactoring components, it is hard to know what is required and what isn't.
Adds propTypes to the existing components and enables eslint errors for anything
moving forward. This should keep us documenting the API for components.
* Remove extra newline
On the individual namespace pages, the filter should not be shown, as all results that appear on that page will be for one namespace.
Added a boolean property, showNamespaceFilter, to MetricsTable that allows you to define if the filter should be shown.
Tested that the filter is not shown on namespace pages.
Fixes #972
Signed-off-by: Kim Christensen <kimworking@gmail.com>
As part of the HOC + Context merges, ResourceList missed out on the api injection and errors out on the Namespaces tab.
Wrap the returned HOC in `withContext` to make sure it is there, no matter where it is in the tree. (Fixes #1034)
* Add a HOC for the REST API tooling
We're copying and duplicating logic all over the place with components that need to talk to the API.
Moves most of the REST API tooling into a HOC that can be used by other components. Now, a component can use `withREST`, pass in the promises that it would like resolved and receive the responses as props.
* Show PageHeader whether there's an error or not
* Hiding page header during loading
* Test updates to work with namespace restructuring
In an effort to highlight the namespace overview pages, remove the Deployments,
Replication Controllers and Pods items from the sidebar and replace them with direct
links to individual Namespace pages. If the user has more than 8 namespaces, only
list the first 8 (the rest can be accessed by the namespace list page).
The Deployments/RCs/Pods endpoints are still available if you go directly to
/deployments, /pods, etc. but they're not highlighted to the user.
Previously, we would filter out stats coming from Conduit itself and from the kube-*
namespaces on some views in the Web UI. Remove this filtering, so that we display
all the resource information we get back from the Stat API. (Fixes #997)
On the Resource pages, the call to action would show up when there were no
metrics present, but that's actually not actionable by the user. Instead, I'm
going to show a blank table with a "no s detected" message.
* Remove special-case filtering out of kube-* namespaces, and conduit namespaces
* Remove the call to action for no metrics
* Linkify the namespace column for the resource pages
* Add an app-wide context for global props.
We've been passing the `api` object down from the top of the react tree. With
16.x, there's now the ability to have context that can inject anywhere in the
tree. This creates a top level context provider that contains most of the global
variables we've been using (api, appData, ...). It subsequently cleans up some
of the routes and nested components.
- Bumps `react-dom` to 16.3.2 (to match `react`).
- Adds `enzyme-context-patch` for now. This is fixed in enzyme master, but there
has not been a release yet. Needs to be removed when that is fixed.
* Use a default inside appData for controllerNamespace
* Update syntax of if to use curly brackets
- Update the `response_total` prometheus query of the StatSummary endpoint to also
break queries out by a `meshed` label.
- Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic
starting and ending in the mesh
This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that
starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we
add TLS by default for intra mesh requests).
The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will
show up as 0% secured in the web/CLI.
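As a rough illustration of how the column could be derived from query results split by the `meshed` label: the row shape `{ labels, value }` and the label value `"true"` are assumptions here, as is the function name.

```javascript
// Computes the percentage of requests that carry meshed="true".
// Rows without the label count toward the total only, which is why all
// traffic shows as 0% secured until the proxy supplies the label.
function securedPercent(rows) {
  let total = 0;
  let meshed = 0;
  for (const { labels, value } of rows) {
    total += value;
    if (labels.meshed === "true") meshed += value;
  }
  return total === 0 ? 0 : (100 * meshed) / total;
}

console.log(securedPercent([
  { labels: { meshed: "true" }, value: 30 },
  { labels: { meshed: "false" }, value: 10 },
])); // 75
console.log(securedPercent([{ labels: {}, value: 40 }])); // 0
```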
- Switched from `es2015` to `env` for the default preset. This is the recommended preset and allows us to track the latest and greatest moving forward.
- Added `react-app` as a preset. We get class properties (and thus => for context) as well as the current recommended settings for react apps.
- Created a `web` script that provides functions for common tasks. `react-app` requires that BABEL_ENV/NODE_ENV is set and this guarantees it.
- Updated the web dockerfile to set NODE_ENV correctly and use `bin/web`.
- Moved the babel related modules over to devDependencies.
Debugging issues in the dashboard is a little frustrating without source maps and the full source map takes a while to build.
Just enables one of the cheaper source maps by default. It is good enough (tm) for what is there now.
This PR modifies the Namespace page in the web UI to replace the 3 existing api calls
with a single call.
* Consolidate calls to /metrics to use the new resource type all
* Simplify urlsForResource, add comment with assumptions
Problem
If you navigate directly to (or do a hard refresh on) a path with more than one segment,
e.g. http://localhost:8084/namespaces/conduit, the dashboard js is not served.
Pages with two path segments have to be accessed by loading the dashboard on a different
path and then clicking through.
When accessing the dashboard via conduit dashboard we append a path prefix so that
we can connect using the k8s proxy. This means that moving the dashboard to serve
images off relative paths won't work, because we need to serve images whether the
dashboard is loaded from http://localhost:8084/namespaces/conduit or
from http://localhost:8084/namespaces.
Solution
Check whether we're serving the dashboard with the proxy url, and if we are, adjust
the url from which we serve the index bundle.
I've also added a very manual override if the conduit logo can't be found at the usual url.
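One way to picture the prefix detection is the sketch below. It assumes the Kubernetes API-server proxy path format (`/api/v1/namespaces/<ns>/services/<svc>/proxy/...`). The function names and the bundle path `/dist/index_bundle.js` are hypothetical, and the real check lives in the web server rather than in a helper like this.

```javascript
// Returns the proxy path prefix if the dashboard was loaded through the
// Kubernetes API-server proxy, or "" when it was loaded directly.
function pathPrefix(pathname) {
  const match = pathname.match(
    /^(\/api\/v1\/namespaces\/[^/]+\/services\/[^/]+\/proxy)(\/|$)/
  );
  return match ? match[1] : "";
}

// Absolute bundle URL that works no matter how many segments the page path has.
function bundleUrl(pathname) {
  return `${pathPrefix(pathname)}/dist/index_bundle.js`;
}

console.log(bundleUrl("/namespaces/conduit"));
console.log(bundleUrl("/api/v1/namespaces/conduit/services/web:http/proxy/namespaces/conduit"));
```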
This enables the removal of the inline-block display for links and
fixes menu items not showing up when sidebar is expanded on firefox
Problem
Previously we were linking the icon and expand text of the menu bar separately.
This caused the clickable areas of the menus to be inconsistent, which we were
fixing via css. This wasn't consistently displayed across browsers.
Fix
Linkify the whole Menu Item rather than linking the icon and text separately.
This enables the removal of the inline-block display for links and
fixes menu items not showing up when sidebar is expanded on firefox.
Additionally it makes the clicking of menu links way more consistent.
The frontend assets were not optimized, resulting in suboptimal page load times.
Enabled webpack production mode in the Dockerfile; this still allows a good development
and debugging experience when running the web interface locally during development.
Also added minification of the CSS handled by css-loader.
The web interface still works as expected.
The size of the JS file has been reduced from 3.6 MB to 1.2 MB.
And the CSS minification has reduced sidebar.css from 5.71 kB to 4.33 kB,
and styles.css from 4.18 kB to 3.1 kB.
Fixes #378
Signed-off-by: Kim Christensen <kimworking@gmail.com>
* Fix issue where we were waiting for the next polling interval when switching tabs
When we switch tabs, we update the Props of the ResourceList component, but we weren't
resetting how we poll the server. This meant we'd wait until the end of the current polling interval
(2s) to get the data for the tab we just switched to.
I've added stopServerPolling and startServerPolling methods so that we can cancel the resource
requests of the page we're leaving and immediately start polling for new data if the resource type
changes.
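The stop/start pattern described above might look like this in isolation. `createPoller` and the injectable `timers` argument are assumptions made so that the sketch runs outside a component; in the dashboard the equivalent logic is the stopServerPolling and startServerPolling methods.

```javascript
// A poller that fetches immediately on start instead of waiting for the
// first interval to elapse.
function createPoller(
  fetchFn,
  intervalMs = 2000,
  timers = { set: (fn, ms) => setInterval(fn, ms), clear: h => clearInterval(h) }
) {
  let handle = null;
  return {
    start() {
      fetchFn(); // no initial wait
      handle = timers.set(fetchFn, intervalMs);
    },
    stop() {
      if (handle !== null) timers.clear(handle);
      handle = null;
    },
  };
}

const calls = [];
let poller = createPoller(() => calls.push("deployments"));
poller.start();

// Switching tabs: cancel the old poller and start a new one right away.
poller.stop();
poller = createPoller(() => calls.push("pods"));
poller.start();

poller.stop(); // stop again so this example exits
console.log(calls); // ["deployments", "pods"]
```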
The way that git-related version information is linked into go binaries
busts Docker's cache such that every commit causes all binaries to be
rebuilt.
In order to ameliorate this, we can build each binary once without
version information first so that its artifacts are cached. When Go
sources are not changed and only the version information changes, builds
are 4.3x faster than before (from 5+ minutes to <90s).
On `master`
Branch off of master and build (mostly cached):
```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build 9.10s user 6.30s system 5% cpu 4:26.47 total
```
Rebuild without changing anything (highly cached):
```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build 9.23s user 6.04s system 47% cpu 32.017 total
```
Update only the git sha and rebuild:
```
:; git ci -am 'bump it' --allow-empty
[ver/eg 2749eb3] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build 8.55s user 6.08s system 4% cpu 5:22.25 total
```
On this branch:
Rebuild without changing anything (highly cached):
```
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build 8.94s user 5.97s system 46% cpu 32.257 total
```
Update only the git sha and rebuild:
```
:; git ci -am 'bump it' --allow-empty
[ver/go-docker-cache-versionless 77a80b5] bump it
:; time DOCKER_TRACE=1 bin/docker-build
...
DOCKER_TRACE=1 bin/docker-build-cli-bin 2.02s user 1.34s system 9% cpu 34.144 total
```
* Turn the status bars red if there exist failed pods in the namespace
* Also use failed pods in conduit component table
Now that the API returns the number of failed pods, use this info to indicate failed pods in
the ServiceMesh page.
The bars will turn red if there are any failed pods present in the namespace.
They'll be green if they have non-zero pods meshed, and grey otherwise.
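The color rule above can be sketched as follows. This is written in Go purely for illustration (the dashboard itself is a web UI), and `barColor` is a hypothetical name, not a function from this change:

```go
package main

import "fmt"

// barColor is a hypothetical helper, not the dashboard's actual code.
// It mirrors the rule described above: red when any pod has failed,
// green when at least one pod is meshed, grey otherwise.
func barColor(failedPods, meshedPods int) string {
	switch {
	case failedPods > 0:
		return "red"
	case meshedPods > 0:
		return "green"
	default:
		return "grey"
	}
}

func main() {
	fmt.Println(barColor(1, 3)) // failed pods take precedence over meshed pods
	fmt.Println(barColor(0, 3))
	fmt.Println(barColor(0, 0))
}
```

Checking failed pods first means a namespace with both meshed and failed pods is still flagged red.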
Add namespaces as a top level resource in the Web UI
This PR does the following:
- Replace the deployments table in the service mesh page with namespaces
- Add a Namespaces index page that lists all namespaces and their stats
- Add an individual namespace page showing all resources for that namespace
- Make the incomplete mesh message more generic to any resource type
- Revamp rest of service mesh page to move off ListPods
Make the sidebar icon based and collapsed by default
I had to move the version check call into the sidebar component so that it can
show an indicator when the sidebar is minimized and a conduit update is available.
Currently I just have letters representing the icons for Deployments, RCs and Pods,
but we can change this in the future.
* Modify the Stat endpoint to also return the count of failed pods
* Add comments explaining pod count stats
* Rename total pod count to running pod count
This is to support the service mesh overview page, as I'd like to include an indicator of
failed pods there.
Enables filtering by one or more namespaces. Table updates are prevented
when the filter menu is open, as table updates will rerender the menu,
unselecting anything the user has selected but not confirmed.
* Add a namespace column to the metrics tables, support long resource names
* Add a test for GrafanaLink
* Change the PodList.jsx component to not use the ListPods api
We removed individual Deployment pages a while ago, but left the autocomplete search bar in. Clicking on searches goes to a 404 because we don't have /deployment any more.
This will be revisited in the future with direct links to grafana dashboards to all the
resources we support.
* Add a Replication Controllers page in the Web UI
@siggy pointed out that we don't need to use the PodsList api any more, since the new stats endpoint (#671) includes meshedPodCount and totalPodCount, which is all we need to determine whether the deployment/rc has been added to the mesh (which is what we were using ListPods to determine).
This PR modifies deployments to not use the pods api any more, and adds a Replication Controllers page. This page is quite similar to the Deployments page in logic, so I've made a PodOwnersList component to share the code.
I haven't added Replication Controllers to the Service Mesh page yet, because that page does require a list of component pods. Also, we don't need the calls to Prometheus for the Service Mesh page, so I don't want to use the existing stat apis for it. I figure that is a large enough change for a separate PR.
After this was implemented we found that ExternalName services are
represented in DNS as CNAMEs, which means that the proxy's DNS
fallback logic can be used instead of doing DNS in the control
plane. Besides simplifying the controller, this will also increase
fidelity with the proxied pods' DNS configuration (improve
transparency).
Signed-off-by: Brian Smith <brian@briansmith.org>
The `conduit tap` command is now deprecated.
Replace `conduit tap` with `conduit tapByResource`. Rename tapByResource
to tap. The underlying protobuf for tap remains, but the tap gRPC endpoint now
returns Unimplemented.
Fixes #804
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
public-api and tap were both using their own implementations of
the Kubernetes Informer/Lister APIs.
This change factors out all Informer/Lister usage into the Lister
module. This also introduces a new `Lister.GetObjects` method.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The TapByResource endpoint was previously a stub.
Implement end-to-end tapByResource functionality, with support for
specifying any kubernetes resource(s) as target and destination.
Fixes #803, #49
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Destination service does not provide ReplicaSet information to the
proxy.
The `pod-template-hash` label approximates selecting over all pods in a
ReplicaSet or ReplicationController. Modify the Destination service to
provide this label to the proxy.
Relates to #508 and #741
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Expose pod stats in CLI, web UI, and Grafana
* Fix js api helpers test
* Add outbound traffic stats to pod dashboard
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The public-api previously only permitted 4 hard-coded time windows:
10s, 1m, 10m, 1h. This was primarily a relic of the recently removed
telemetry system.
Modify the public-api to validate the time string, but allow for any
window size, which is then passed through to Prometheus.
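A minimal sketch of validating a window string without restricting it to a fixed list. The function name `validateWindow` and the accepted unit set (s, m, h, d) are assumptions for illustration, not the public-api's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// windowPattern accepts a positive integer followed by a single unit.
// The unit set (s, m, h, d) is an assumption made for this sketch.
var windowPattern = regexp.MustCompile(`^[1-9][0-9]*[smhd]$`)

// validateWindow is a hypothetical helper: it checks only the shape of
// the string, so any window size passes as long as it is well formed.
func validateWindow(w string) error {
	if !windowPattern.MatchString(w) {
		return fmt.Errorf("invalid time window %q", w)
	}
	return nil
}

func main() {
	for _, w := range []string{"10s", "37m", "2h", "1 hour", ""} {
		fmt.Printf("%q -> %v\n", w, validateWindow(w))
	}
}
```

The point is that a value such as `37m` is accepted even though it was never one of the four hard-coded windows.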
Fixes #686
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add namespace as a resource type in public-api
The cli and public-api only supported deployments as a resource type.
This change adds support for namespace as a resource type in the cli and
public-api. This change also includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analogous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes
Part of #627
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Refactor filter and groupby label formulation
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Rename stat_summary.go to stat.go in cli
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Update rbac privileges for namespace stats
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Conduit was relying on apps/v1 for the Deployment and ReplicaSet APIs.
apps/v1 is not available on Kubernetes 1.8. This prevented the
public-api from starting.
Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API
cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on
a test cluster.
Fixes #761
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Remove the telemetry service
The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.
* Fix time window tests
* Remove deprecated controller scrape config
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The new StatSummary endpoint was only providing request volume and
success rate information.
Add support for retrieving latency stats via StatSummary. Also make
all prometheus calls in parallel, and implement kubernetes test
fixtures.
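One way to issue several queries in parallel is a goroutine per query with a `sync.WaitGroup`. The sketch below assumes a hypothetical `queryFn` type standing in for a Prometheus client call, and the query strings are placeholders rather than real PromQL:

```go
package main

import (
	"fmt"
	"sync"
)

// queryFn stands in for a Prometheus client call; it is hypothetical.
type queryFn func(query string) (float64, error)

// runParallel issues every query in its own goroutine and collects the
// results keyed by a caller-chosen name. If any query fails, the first
// error recorded is returned and the partial results are dropped.
func runParallel(queries map[string]string, q queryFn) (map[string]float64, error) {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
		results  = make(map[string]float64, len(queries))
	)
	for name, query := range queries {
		wg.Add(1)
		go func(name, query string) {
			defer wg.Done()
			v, err := q(query)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			results[name] = v
		}(name, query)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	return results, nil
}

func main() {
	// A fake query function so the sketch runs without a Prometheus server.
	fake := func(query string) (float64, error) { return float64(len(query)), nil }
	res, err := runParallel(map[string]string{
		"requests": "placeholder_request_query",
		"latency":  "placeholder_latency_query",
	}, fake)
	fmt.Println(res, err)
}
```

With this shape the total wait is bounded by the slowest query rather than the sum of all of them.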
Fixes #681
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The Grafana dashboards key off of deployment, but had no awareness of
namespaces, causing incorrect metrics aggregation and display.
This change makes the Grafana dashboards key off of namespaces, and also
modifies the Grafana links in the Conduit dashboard to link to
namespace+deployment.
Fixes #704
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.
Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.
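The two-step mapping can be sketched like this. Both lookup tables, the function name `promLabelFor`, and the label names are hypothetical placeholders, not the values used by the actual change:

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalNames maps friendly or abbreviated resource names to one
// canonical name. The alias list is illustrative, not exhaustive.
var canonicalNames = map[string]string{
	"deploy":      "deployment",
	"deployment":  "deployment",
	"deployments": "deployment",
	"po":          "pod",
	"pod":         "pod",
	"pods":        "pod",
}

// promLabels maps a canonical name to the Prometheus label used in the
// query. These label names are hypothetical placeholders.
var promLabels = map[string]string{
	"deployment": "k8s_deployment",
	"pod":        "k8s_pod",
}

// promLabelFor resolves a friendly name in two steps:
// friendly name -> canonical k8s name -> Prometheus label.
func promLabelFor(friendly string) (string, error) {
	canonical, ok := canonicalNames[strings.ToLower(friendly)]
	if !ok {
		return "", fmt.Errorf("unknown resource type %q", friendly)
	}
	label, ok := promLabels[canonical]
	if !ok {
		return "", fmt.Errorf("no Prometheus label for %q", canonical)
	}
	return label, nil
}

func main() {
	for _, name := range []string{"deploy", "Pods", "widget"} {
		label, err := promLabelFor(name)
		fmt.Println(name, label, err)
	}
}
```

Keeping the canonical name as an intermediate step lets the k8s lookup and the Prometheus query share one normalized value.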
Fixes #695
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Link to Grafana from Conduit Dashboard
Previously the only way to access the Grafana dashboards was via direct
link, provided by the `conduit dashboard` command.
Add Grafana links throughout the Conduit Dashboard, next to all
Deployment objects. This change also modifies the behavior of the
ConduitLink helper, to enable linking to other deployments proxied by
the `conduit dashboard` command.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* review feedback
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* review feedback, fix console, remove absolute
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Start implementing new conduit stat summary endpoint.
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.
Current implementation just retrieves requests and mesh/total pod count
(so latency stats are always 0).
Uses API defined in #663
Example queries the stat endpoint will eventually satisfy are in #627
This branch includes commits from @klingerf
* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
* fix pod status and count display in control plane dashboard section:
- the control plane would show terminated and stale deployments in the UI, which is confusing and might suggest errors
- this filters out terminated and failed component deploys from the UI
- note that pending deploys will still be counted and represented with a greyed-out status dot
- Fixes: #606
Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
remove toggle sorting functionality from TableComponent:
- tables displaying metrics toggled between sorted and unsorted when the same button was clicked. This behavior was confusing for the user.
- this PR removes the toggle functionality and introduces a BaseTable component that extends antd's component without the capability to toggle
- Fixes: #566
Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
* Add tests/utils/scripts for running integration tests
Add a suite of integration tests in the `test/` directory, as well as
utilities for testing in the `testutil/` directory.
You can use the `bin/test-run` script to run the full suite of tests,
and the `bin/test-cleanup` script to clean up after the tests.
The test/README.md file has more information about running tests.
@pcalcado, @franziskagoltz, and @rmars also contributed to this change.
* Create TEST.md file at the root of the repo
* Update based on review feedback
* Relax external service IP timeout for GKE
* Update TEST.md with more info about different types of test runs
* More updates to TEST.md based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Shortly after conduit is installed in a k8s environment, the control plane components that establish a watch with k8s run into networking issues during proxy initialization. During failure, each watcher fails to retry its connection to the k8s watch endpoint, which leads to timeouts and, eventually, multiple controller pod restarts.
This PR adds retry logic to each "watch" enabled package.
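A generic sketch of this kind of retry logic. The helper `retry`, the doubling backoff policy, and the error value `errNotReady` are all assumptions for illustration, not the code added by this PR:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a stand-in error for a watch that cannot connect yet.
var errNotReady = errors.New("watch endpoint not reachable yet")

// retry calls fn until it succeeds or the attempts are exhausted,
// doubling the wait between tries. It is a hypothetical helper; the
// backoff policy is an assumption, not taken from the change itself.
func retry(attempts int, initial time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	wait := initial
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNotReady // simulate the network not being ready yet
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Retrying inside the watcher avoids letting a transient connection failure escalate into a timeout and a pod restart.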
fixes #478
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* Use Go 1.10.0 to build Go components.
Take advantage of the new build cache in Go 1.10. Future work on improving
build performance will utilize the build cache further.
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously Dockerfile-go-deps was converted from a multi-stage Dockerfile
to a single-stage Dockerfile in anticipation of enabling efficient use
of `--cache-from` in CI. However, that resulted in the image ballooning
in size because it contained the Git repo for every package downloaded
by `dep ensure`.
Bring the image back down to the proper size by removing the temporary
files created.
Signed-off-by: Brian Smith <brian@briansmith.org>