The API review for #6279 came up with a more meaningful name that
better explains the intent. A setter for the old name was left in
ManagedChannelBuilder to ease migration to the new name by current
users.
Adds an Executor to NameResolver.Args, which is optionally set on ManagedChannelBuilder. This allows NameResolver implementations to avoid creating their own thread pools if the application already manages its own pools.
Addresses #3703.
This allows plumbing information to the subchannel via Attributes, like
an authority override. RR still does not support multiple EAGs that only
differ by attributes.
ServerImpl uses that ticker to create incoming Deadlines. This feature is specifically restricted to in-process, as it can also customize ScheduledExecutorService, and them together can fake out the clock which is useful in tests. On the other hand, a fake Ticker won't work with Netty's ScheduledExecutorService.
Also improved mismatch detection, documentation and tests in Deadline.
Fixes#5593 and supersedes #5601
Now that https://github.com/census-instrumentation/opencensus-java/pull/1854 has been merged & released as 0.21.0. We can start using the method & status tags.
Background:
Opencensus introduced new tags for status and method (https://github.com/census-instrumentation/opencensus-java/pull/1115). The old views that used those tags were deprecated and new views were created that used the new tags. However grpc-java wasn't updated to use the new tags due to concern of breaking existing metrics. This resulted in the old views being deprecated while the new views were broken (goomics #50).
https://github.com/census-instrumentation/opencensus-java/pull/1854 added a compatibility layer to opencensus that would remap new tags to old tags for old views. This should unblock grpc to switching to the new tags while allowing old views to still be populated. That commit was released as part of opencensus 0.21, which grpc currently uses
* Bump PerfMark to 0.17.0
The main changes how linking is done. Linking is now always done
through the `PerfMark` entry class. This is for two reasons:
1. It make instrumenting the linking calls *much* easier.
2. It follows the API pattern of "verbNoun()". Previous callsites
would have `Link link = PerfMark.link(); link.link()`. This
stuttering is not quick to follow.
Generated using:
```
find -name \*.java -exec sed -i 's#link = PerfMark.link();#link = PerfMark.linkOut();#g' {} \;
find -name \*.java -exec sed -i 's#link.link();#PerfMark.linkIn(link);#g' {} \;
find -name \*.java -exec sed -i 's#command.getLink().link();#PerfMark.linkIn(command.getLink());#g' {} \;
find -name \*.java -exec sed -i 's#cmd.getLink().link();#PerfMark.linkIn(cmd.getLink());#g' {} \;
find -name \*.java -exec sed -i 's#msg.getLink().link();#PerfMark.linkIn(msg.getLink());#g' {} \;
```
Since the deprecated link methods are also `@DoNotCall`, the same
sed calls will need to be used on import.
Works for #4740
- Subclasses of `AbstractClientStream` include remote address in insight if available.
- `DelayedStream` adds buffered time, and the insight of real stream if it's set.
- `RetriableStream` insights outputs of Substreams.
Example error message:
```
deadline exceeded after 8112071ns. [buffered_nanos=24763, remote_addr=foo.test.google.fr/127.0.0.1:44749]
```
or
```
deadline exceeded after 8112071ns. [buffered_nanos=22344324763, waiting_for_connection]
```
This is related to #4776 but taking a more usage-specific approach.
In #5892 getAttributes() is called without any regard of timing.
Currently DelayedStream.getAttributes() wil throw if called before
passThrough was set. Just to be safe, we are removing that
restriction and making it clear on the javadoc.
On the other hand, we intend to keep the timing restriction on
ClientCall.getAttributes().
This change removes the WriteQueue linking and splits it out into each
of the commands, so that the trace is more precise, and the tag
information is correct.
It is still unclear what the initial Tag should be for ClientCallImpl,
since it should not access the TransportState to get the HTTP/2 stream id.
No more methods on the `LoadBalancer` will be called after
`LoadBalancer#shutdown()` is called. This includes
`LoadBalancer#handleSubchannelState()` too. `SubchannelStateListener`
inherited this restriction. However, this special case makes
`onSubchannelState(SHUTDOWN)` an unreliable way of being notified
about `Subchannel` SHUTDOWN, and may confuse/complicate a
wrapping `LoadBalancer` that expects the full notification (e.g., #5875).
The javadoc isn't clear whether this restriction applies. I think
it's more useful to make it no apply.
We were logging when withDeadline() was used, not when the Context was used. As
discovered while looking at https://stackoverflow.com/q/56593692/4690866 .
In e19e8f7d updateTimeoutHeaders was removed and logIfContextNarrowedTimeout
was called directly. However, the two methods had reverse ordering of
callDeadine/outerCallDeadline and the caller did not get their arguments
swapped.
panic mode was temporarily disabled by #4152 and re-enabled by #4245,
but the tests were not. This has caused a few test code that was
broken but not executed at all.
This add perfmark annotations in some key places, notably on transport/application boundaries, and thread hop locations. Perfmark records to a thread-local buffer the events that happen in each thread. Perfmark is disabled by default, and will compile to a noop unless Perfmark.setEnabled is invoked. This should make it free when disable, and pretty fast when it is enabled.
It is important that started tasks are ended, so several places in our code are moved to either try-finally blocks, or moved into a private method. I realize this is ugly, but I think it is manageable. In the future, we can look at making an agent or compiler plugin that simplifies the recording.
Linking between threads is done with a Link object, which is created on the "outbound" task, and used on the "inbound" task. This is slightly more verbose, and does has a small amount of runtime overhead, even when disabled. (for null checks, slightly higher memory usage, etc.) I think this is okay to, because it makes other optimizations much easier.
Previously, AutoConfiguredLoadBalancer was also handling it, but it
doesn't trigger retries. By returning true for
canHandleEmptyAddressListFromNameResolution(),
AutoConfiguredLoadBalancer effectively by-passed the empty-result
handling logic in ManagedChannelImpl, thus resolution retries were
never triggered.
This change requires AutoConfiguredLoadBalancer to stop being a
LoadBalancer, for its tryHandleResolvedAddresses(). It doesn't
cause any trouble because AutoConfiguredLoadBalancer has become
less and less like a LoadBalancer during the service config changes.
As mentioned in 5188[1], the default used with the enableKeepAlive API
conflicted with the default server enforcement. Instead of fixing it,
remove it. These APIs were deprecated in v1.3.0 in April 2017.
1. https://github.com/grpc/grpc-java/issues/5188#issuecomment-482269303
The `method` and `status` shouldn't be propagated in the first place,
but in previous OpenCensus implementation all tags are propagating by
default. Now with the TagMetadata it may make sense to change them to
local tags.
This will be a breaking change to users who depend on the behavior that
these tags propagate through process boundaries.
The old createSubchannel() doesn't require being called from
sync-context, thus its implementation schedules start() on
sync-context because start() has such requirement. However, if a
LoadBalancer call createSubchannel() in sync-context, start() is
queued and only called after the control exits the sync-context. If
createSubchannel() is immediately followed by other Subchannel methods
that requires start(), such as requestConnection(), they will fail.
This will be a very common issue as most LoadBalancers call
createSubchannel() in the sync-context.
This fix splits out the real work of start() into internalStart() which
is called by the old createSubchannel().
Previously PickResult's Subchannel must be the actual implementation
returned from the Channel's Helper, and Channel would cast it to the
implementation class in order to use it. This will be broken if
Subchannel is wrapped in the case of hierarchical LoadBalancers.
getInternalSubchannel() is the guaranteed path for the Channel to get
the InternalSubchannel implementation. It is friendly for wrapping.
Background: #5676
The pick_first policies in core and grpclb previously would call
Subchannel.requestConnection() from data-path. They now will schedule
that call in the sync-context to avoid the warning. They will only
call it for the first pick of each picker, to prevent storming the
sync-context.
This is a revised version of #5503 (62b03fd), which was rolled back in f8d0868. The newer version passes SubchannelStateListener to Subchannel.start() instead of SubchannelCreationArgs, which allows us to remove the Subchannel argument from the listener, which works as a solution for #5676.
LoadBalancers that call the old createSubchannel() will get start() implicitly called with a listener that passes updates to the deprecated LoadBalancer.handleSubchannelState(). Those who call the new createSubchannel() will have to call start() explicitly.
GRPCLB code is still using the old API, because it's a pain to migrate the SubchannelPool to the new API. Since CachedSubchannelHelper is on the way, it's easier to switch to it when it's ready. Keeping
GRPCLB with the old API would also confirm the backward compatibility.
Contrary to #5736, we will still keep the sync-context requirement of
requestConnection(), because it prevents API fragmentation.
PickFirstLoadBalancer is the only known violator. We will fix it on
master, but we don't want to make that change on 1.21.x because the
release is soon. We simply remove the warning in this release so that
users won't be annoyed.
This supersedes #5736
We will require Subchannel.requestConnection() to be called from
sync-context (#5722), but SubchannelPicker.requestConnection() is
currently calling it with the assumption of thread-safety. Actually
SubchannelPicker.requestConnection() is called already from
sync-context by ChannelImpl, it makes more sense to move this method
to LoadBalancer where all other methods are sync-context'ed, rather than
making SubchannelPicker.requestConnection() sync-context'ed and fragmenting
the SubchannelPicker API because pickSubchannel() is thread-safe.
C++ also has the requestConnection() equivalent on their LoadBalancer
interface.
We check for idle mode the first time we try newStream(), but failed to when
newStream races with reprocess(). This would normally be a very rare race,
except when you consider that AbstractChannelBuilder will call
managedChannel.enterIdle() when the network changes.
Fixes#5729
I see more cases of wrapping Helper and Subchannel during the work of
XdsLoadBalancer, we will require that all methods that involve mutable
state to be called from the Synchronization Context. We will start
logging warnings first, and make them throw in a future release.
Helper.createSubchannel() is already doing so. This change adds
warnings to the other eligible methods.
https://github.com/grpc/grpc-java/issues/5015
This follows the sort of changes we've done in the past, where the '2'
implies "version 2." We can end up reclaiming the original name if we
wish in the future.
The main reason for this change is to avoid changing to Observer since
the rest of io.grpc consistently uses Listener.
Also updated CensusModule to use the new helper methods ContextUtils.withValue() instead of directly manipulating the context keys. See census-instrumentation/opencensus-java#1864.
Having two Helpers (the other being LoadBalancer.Helper) is proven to
be confusing, and the one in NameResolver is fundamentally different
from the one from LoadBalancer, as the latter mutates the states while
the former doesn't. Renaming it to Args make it less awkward to
expose it to LoadBalancer, which is needed for creating NameResolvers
for per-locality routing in the XdsLoadBalancer.
As we are now endorsing the wrapping of ClientStreamTracers by
providing ForwardingClientStreamTracer, there is a need for altering
StreamInfo, especially CallOptions before it's passed onto the
delegate. A Builder class and a toBuilder() provides a robust way
to copy the rest of the fields.
This is a breaking change for anybody who creates StreamInfo, which is
unlikely in non-test code, because StreamInfo was added as late as
1.20.0.
NameResolverRegistry takes on all the logic previously in
NameResolverProvider. But it also allows manual registration of
NameResolvers, which is useful when the providers have complex
construction or need objects injected into them.
This also avoids a circular dependency during class loading since
previously loading any Provider searched for all Providers via
ClassLoader since ClassLoader handling was static within the parent
class.
Fixes#5562
io.grpc has fewer dependencies than io.grpc.internal. Moving it to a
separate artifact lets users use the API without bringing in the deps.
If the library has an optional dependency on grpc, that can be quite
convenient.
We now version-pin both grpc-api and grpc-core, since both contain
internal APIs.
I had to change a few tests in grpc-api to avoid FakeClock. Moving
FakeClock to grpc-api was difficult because it uses
io.grpc.internal.TimeProvider, which can't be moved since it is a
production class. Having grpc-api's tests depend on grpc-core's test
classes would be weird and cause a circular dependincy. Having
grpc-api's tests depend on grpc-core is likely possible, but weird and
fairly unnecessary at this point. So instead I rewrote the tests to
avoid FakeClock.
Fixes#1447
Resolves#5497
## Motivation
In hierarchical `LoadBalancer`s (e.g., `XdsLoadBalancer`) or wrapped `LoadBalancer`s (e.g., `HealthCheckingLoadBalancerFactory`, the top-level `LoadBalancer` receives `Subchannel` state updates from the Channel impl, and they almost always pass it down to its children `LoadBalancer`s.
Sometimes the children `LoadBalancer`s are not directly created by the parent, thus requires whatever API in the middle to also pass Subchannel state updates, complicating that API. For example, the proposed [`RequestDirector`](https://github.com/grpc/grpc-java/issues/5496#issuecomment-476008051) includes `handleSubchannelState()` solely to plumb state updates to where they are used. We also see this pattern in `HealthCheckingLoadBalancerFactory`, `GrpclbState` and `SubchannelPool`.
Another minor issue is, the parent `LoadBalancer` would need to intercept the `Helper` passed to its children to map Subchannels to the children `LoadBalancer`s, so that it pass updates about relevant Subchannels to the children. Otherwise, a child `LoadBalancer` may be surprised by seeing Subchannel not created by it, and it's not efficient to broadcast Subchannel updates to all children.
## API Proposal
We will pass a `SubchannelStateListener` when creating a `Subchannel` to accept state updates, those updates could be directly passed to where the `Subchannel` is created, skipping the explicit chaining in the middle.
Also define a first-class object `CreateSubchannelArgs` to pass arguments for the reasons below:
1. It may avoid breakages when we add new arguments to `createSubchannel()`. For example, a `LoadBalancer` may wrap `Helper` and intercept `createSubchannel()` in a hierarchical case. It may not be interested in all arguments. Passing a single `CreateSubchannelArgs` will not break the parent `LoadBalancer` if we add new fields later.
2. This also reduces the eventual size of Helper interface, as the convenience `createSubchannel()` that accepts one EAG instead of a List is no longer necessary, since that convenience is moved into `CreateSubchannelArgs`.
```java
interface SubchannelStateListener {
void onSubchannelState(Subchannel subchannel, ConnectivityStateInfo newState);
}
abstract class LoadBalancer.Helper {
abstract Subchannel createSubchannel(CreateSubchannelArgs args);
}
final class CreateSubchannelArgs {
final List<EquivalentAddressGroup> getAddresses();
final Attributes getAttributes();
final SubchannelStateListener getStateListener();
final class Builder () {
...
}
}
```
The new `createSubchannel()` must be called from synchronization context, as a step towards #5015.
## How the new API helps
Most hierarchical `LoadBalancer`s would just let the listener from the child `LoadBalancer`s directly pass through to gRPC core, which is less boilerplate than before.
Without any effort by the parent, each child will only see updates for the Subchannels it has created, which is clearer and more efficient.
If a parent `LoadBalancer` does want to look at or alter the Subchannel state updates for its delegate (like in `HealthCheckingLoadBalancerFactory`), it can still do so in the wrapping `LoadBalancer.Helper` passed to the delegate by intercepting the `SubchannelStateListener`.
## Migration implications
Existing `LoadBalancer` implementations will continue to work, while they will see deprecation warnings when compiled:
- The old `LoadBalancer.Helper#createSubchannel` variants are now deprecated, but will work until they are deleted. They create a `SubchannelStateListener` that delegates to `LoadBalancer#handleSubchannelState`.
- `LoadBalancer#handleSubchannelState` is now deprecated, and will throw if called and the implementation doesn't override it. It will be deleted in a future release.
The migration for most `LoadBalancer` implementation would be moving the logic from `LoadBalancer#handleSubchannelState` into a `SubchannelStateListener`.
This class is used in other places than just NameResolver.Helper. It
should not be an inner class of Helper.
Strictly speaking this is an API-breaking change. However, this is
part of the service config error handling API that hasn't been done
yet. Nobody has a legitimate reason to use it.