Commit Graph

79 Commits

Author SHA1 Message Date
Matt Moore 1a3a36a996 Rework how we GlobalResync. (#597)
* Rework how we GlobalResync.

This consolidates the logic for how we do GlobalResyncs in a few ways:
1. Remove SendGlobalUpdates and replace it with FilteredGlobalResync.
2. Have GlobalResync go through the FilteredGlobalResync with a tautological predicate.
3. Change the way we enqueue things during a global resync to avoid flooding the system
   by using EnqueueAfter and wait.Jitter to stagger how things are queued.

The background for this is that when I started to benchmark reconciliation of a large
number of resources we'd see random large stalls like [this](https://mako.dev/run?run_key=4882560844824576&~dl=1&~cl=1&~rl=1&~rvl=1&~il=1&~sksl=1&~pal=1)
fairly consistently.  Looking at the logs, these stalls coincide with incredibly deep
work queues due to global resyncs triggers by configmap notifications. When I disabled
global resyncs the spikes [went away](https://mako.dev/run?run_key=4975897060835328&~dl=1&~cl=1&~rl=1&~rvl=1&~il=1&~sksl=1&~pal=1).

To mitigate the huge pile-up due to the global resync we use `wait.Jitter` to stagger how
things are enqueued with a `maxFactor` of the length of the store.  This also seems to
[keep things flowing](https://mako.dev/run?run_key=5701802099998720&~dl=1&~cl=1&~rl=1&~rvl=1&~il=1&~sksl=1&~pal=1),
although we will possibly need to tune things further.

* Update comment to mention the delay.
2019-08-24 21:23:13 -07:00
Victor Agababov 74c5d67ea0 Fix the issues pointed out by staticcheck (#541)
* Fix the issues pointed out by staticcheck

* review fix
2019-07-23 13:13:36 -07:00
Matt Moore 222dd25986 Migrate pkg to use the knative.dev/pkg import path (#489)
* Manual changes.

* scripted changes.
2019-06-26 13:02:06 -07:00
Matt Moore c6f03fa600 Push the event recorder stuff into knative/pkg (#443) 2019-06-06 07:35:40 -07:00
Matt Moore 985bff446d Simplify the default controller.Impl constructor. (#427)
A while back we added a "StatsReporter" argument to `controller.NewImpl`,
but in serving every callsite of this is passing:
```
controller.NewImpl(r, logger, "my-string", MustNewStatsReporter("my-string", logger))
```

Where `MustNewStatsReporter` is just a form of knative/pkg's `controller.NewStatsReporter`
that logs fatally when an error is returned.  It is notable that Serving's logic has been
duplicated to both Build and Eventing.

There are a handful of changes here:
1. Move MustNewStatsReporter into knative/pkg
2. Expose the current interface as NewImplWithStats
3. Drop the StatsReporter from NewImpl and default to `MustNewStatsReporter()`

This is a breaking change for downstream repositories, but should make their callsites universally simpler.
2019-05-28 13:16:31 -07:00
Matt Moore d1ccfd6652 This upstreams a useful helper from serving. (#428)
In serving we have `reconciler.Handler` that wraps a handler function
(e.g. `Enqueue`) in the `cache.ResourceEventHandlerFuncs`.  This pattern
was becoming pervasive, so this simpler handler dramatically reduced our
boilerplate.
2019-05-27 23:21:29 -07:00
Matt Moore aae68ba9e0 Add methods for attaching resync duration to context. (#426)
Also adds a method for tracker leases, which should be a multiple of the resync period.
2019-05-26 23:44:29 -07:00
mattmoor-sockpuppet 34792a92ce golang format tools (#419)
Produced via:
  `gofmt -s -w $(find -path './vendor' -prune -o -type f -name '*.go' -print))`
  `goimports -w $(find -name '*.go' | grep -v vendor)`
2019-05-18 10:35:26 -07:00
leo james 7b196cff46 add trace id (#405)
* add trace id

* using with list

* add comment
2019-05-16 07:14:22 -07:00
Ville Aikas 5e4512dcb2 Add Filter function by name/namespace (#414) 2019-05-14 13:53:32 -07:00
Victor Agababov c3f131538a add addafter method that permits enqueueing the objects themselves (#408) 2019-05-08 16:14:37 -07:00
Victor Agababov f23f58d373 Expose the EnqueueKeyAfter function on the controller (#402)
This is needed for the positive handoff, so that we can enqueue
the KPA after we received the positive answer from the activator
but still want to have buffer time to make sure the network
changes propagate everywhere.
2019-05-02 17:16:34 -07:00
Matt Moore 7735348150 Add some helpers to reduce boilerplate in controller binaries. (#335)
* Add some helpers to reduce boilerplate in controller binaries.

* Incorporate feedback from Victor.

* Fix a couple test data races.
2019-03-23 10:57:48 -07:00
Victor Agababov c100323403 Fix the sugared logger usage (#300)
* Fix the sugared logger usage

Without `w` the logged messages are quite useless and annoying.

* some more magic
2019-02-28 18:17:38 -08:00
Matt Moore 18a6804326 Wait for work to complete before returning from Run. (#259)
Partially addresses https://github.com/knative/serving/issues/3074
2019-02-03 09:40:39 -08:00
Brad Hoekstra b8c4664801 Don't rate-limit normal events (#242)
* Don't rate-limit normal events

* Fix test - first enqueue is not rate limited
2019-01-25 14:11:34 -08:00
Aldo Culquicondor c2b6cc54fb Clean up controller code (#187)
- Removed unnecessary machinery
- Fixed misleading comments
2018-12-04 17:00:26 -08:00
Mustafa Demirhan 53b1235c2a Use different reporting periods based on the metrics backend (#182)
* Use different reporting periods based on the metrics backend. Original implementation was setting this value to 1 min to accommodate the minimum allowed value for stackdriver, but that is a sub optimal behavior for Prometheus. The change also allows operator to override the default values.

* Fix the units of reconciler latency from nano seconds to milliseconds.

* Address PR comments.

* Add better logging for reconciliation.

* Remove unnecessary lock in the UT.

* Address PR comments.
2018-12-04 15:16:25 -08:00
Matt Moore 102237ce9b Simplify the core controller logic. (#184)
I was hoping to switch from a simple string key to a struct to include additional metadata into our workqueue item (in particular the time at which it was queued), but the great unit tests here reminded me that this would not properly deduplicate keys (they would have different queue times!), so I'm settling for simplifying the loop in a few ways:
1. Keys can only be strings, so remove some error checking and a test.
1. We were wrapping a block in a `func` because it contained `defer`, but the outer function had nothing in it.

I'd still like to try and find a way to track queue times for keys, so that we can surface two interesting metrics:
1. How long are things spending queued?
1. How long between events being observed and us responding? (above + reconcile latency)
2018-12-03 17:02:25 -08:00
lichuqiang c1d9d9552a quieting controller log (#169) 2018-11-26 08:47:23 -08:00
mattmoor-sockpuppet d99eb0732f Run gofmt (#171)
Produced via: `gofmt -s -w $(find -name '*.go' | grep -v vendor)`
2018-11-26 07:18:22 -08:00
lichuqiang a5c260df2a Update EnqueueLabelOf to handle resources in same namespace (#166)
* update EnqueueLabelOf to handle resources in same namespace

* spit the EnqueueLabelOf func for namespaced/cluster-scoped resources
2018-11-19 19:04:20 -08:00
Matt Moore 35322088a8 Move the `getObject` method into `kmeta` (#155)
This promotes the private deletion-handling helper wrapping `meta.Accessor` into `kmeta`.  The wordy name is based on the wordier `cache.DeletionHandlingMetaNamespaceKeyFunc`, since it serves a similar purpose.
2018-11-01 18:48:34 -07:00
lichuqiang fcb0656514 requeue keys on error in controller (#149)
* requeue keys on error in controller

* remove maxRetries limit
2018-10-31 09:59:34 -07:00
lichuqiang d247efe41d update EnqueueControllerOf to properly handle Delete (#146)
* update EnqueueControllerOf to properly handle Delete

* update existing test

* add enqueue label tests

* address comments
2018-10-30 07:13:34 -07:00
Mustafa Demirhan 6c3ae54dda Add reconciler metrics (#127)
* Add reconciler metrics for queue depth, reconcile count & reconcile latency.

* Update deps.

* Change queue depth measurement type to int64 from float64

* Address PR comments

* Add extra unit tests.

* Minor fixes.

* Address PR comments by adding extra documentation

* Add tests for testing/fake_stats_reporter

* Addressed PR comments.

* Address PR comments.
2018-10-22 14:34:30 -07:00
Ivan Mikushin 62bcfe912f Prepare for global resync on ConfigMap changes (#132)
* Prepare for global resync on ConfigMap changes

- Add GlobalResync(cache.SharedInformer) method to *controller.Impl
- Move UntypedStore from serving/pkg/config to pkg/configmap
- Add onAfterStore callbacks to UntypedStore
- Add TypeFilter to enable composable construction of such callbacks

* Address review

- Unit test GlobalResync()
- Reimpl GlobalResync() using ListKeys()

* Update comments

* Add test for onAfterStore

* Update godoc comment for Logger
2018-10-18 09:47:28 -07:00
Matt Moore 3ca427071d Add a logkey for the reconcile key. (#49)
This adds a logkey.Key for attaching the key being reconciled to a logger we embed into the context passed to Reconcile.  The point of this is to enable us to eliminate [this](928d580756/pkg/reconciler/v1alpha1/service/service.go (L101-L104)) boilerplate from each of the Reconcilers.

These aren't identical, this has the form:
  knative.dev/key: foo/bar

What it's replacing has the form:
  knative.dev/namespace: foo
  serving.knative.dev/service: bar

However, the type information present in the latter is redundant with the controller type that [should already be embedded](928d580756/pkg/reconciler/reconciler.go (L80)) in the logger.
2018-08-23 09:48:59 -07:00
Matt Moore bf6f9e373f Move our base controller infrastructure into knative/pkg. (#33)
The `controller.go` is from: https://github.com/knative/serving/pull/1770, however, this adds 100% coverage of `controller.go`, which we have been missing in `knative/serving`.

This also adds a `context.Context` to the `Reconcile` signature, to enable better sharing of logger setup across controllers.

Fixes: https://github.com/knative/pkg/issues/8
2018-08-01 14:32:37 -07:00