Normally the first exception/event experienced is the cause and is
followed by a stampede of ClosedChannelExceptions. In this case,
SslHandler is manufacturing a ClosedChannelException of its own and
propagating it _before_ the trigger event. This might be considered a
bug, but changing SslHandler's behavior would be very risky and almost
certainly break someone's code.
Fixes#7376
It's hoped that this resolves the "too_many_pings" issue some users are
seeing that is worked around by GRPC_EXPERIMENTAL_AUTOFLOWCONTROL=false.
This change also avoids resetting the ping count for empty data frames
(which shouldn't really happen with gRPC).
The previous code failed to reset the ping count on HEADERS and
WINDOW_UPDATE. The code _appeared_ to have callbacks for WINDOW_UPDATE,
but was layered above the Http2Connection so was never called. Thus,
this version is much more aggressive then the previous version while
also addressing the correctness issue.
It deprecates ExpectedException and Assert.assertThat(T, org.hamcrest.Matcher).
Without Java 8 we don't want to migrate away from ExpectedException at
this time. We tend to prefer Truth over Hamcrest, so I swapped the one
instance of Assert.assertThat() to use Truth. With this change we get a
warning-less build with JUnit 4.13. We don't yet upgrade because we
still need to support JUnit 4.12 for some use-cases, but will be able to
upgrade to 4.13 soon when they upgrade.
This is a very simple change to test for IBMJSSE2 security provider in addition to the others. IBM JRE does not support the Sun provider, but instead has IBMJSSE2 which supports the same API calls.
I tested this on Z/OS machine as now it works when before it couldn't find a security provider
96ad6338 accidentally caused the javadoc and sources jars to no longer
be published for grpc-netty-shaded. It would appear to be due to the
jars being empty. This commit causes them to be published again.
A user has reported a GOAWAY with too_many_pings when using BDP. We
aren't certain why it is happening, but want to provide a way to disable
BDP while we continue investigating. b/162162973
verifyZeroInteractions has the same behavior as verifyNoMoreInteractions. It
was deprecated in Mockito 3.0.1 and replaced with verifyNoInteractions, which
does not change behavior depending on previous verify() calls. All instances
were replaced with verifyNoInteractions, except those in
ApplicationThreadDeframerTest which were replaced with verifyNoMoreInteractions
since there is a verify() call in `@Before`.
Since Travis in on Java 8u252, we won't actually be testing Jetty ALPN at this
point. We're also not testing the Java 9 ALPN API on Java 8, since our current
version of Netty doesn't support it (but an upgrade is available that does).
This provides a substantial ~3x performance increase to Netty async
streaming with small messages. It also increases OkHttp performance for
the same benchmark 40% and decreases unary latency by 3µs for Netty and
10µs for OkHttp.
We avoid calling listener after closure because the Executor used for
RPC callbacks may no longer be available. This issue was already
present in the ApplicationThreadDeframer, but full-stream compression is
not really deployed so was unnoticed.
DirectExecutor saw a 5-6µs latency increase via MigratingDeframer.
DirectExecutor usages should see no benefit from MigratingDeframer, so
disable it in that case.
This PR changes the `NettyServerTransport#getLogLevel` method to log
`SocketException`s to `LogLevel.FINE`, rather than exclusively pure
IOExceptions. This may fix an unintentional regression introduced in
c166ec2c, although the message in my java version (14.0.1) wouldn't have
passed the old logic for quieting either. This also fixes the issue
raised in #6423 that was locked for inactivity.
This fixes
```
[2020/05/14 20:21:52 INFO] [io.grpc.netty.NettyServerTransport.connections] Transport failed
java.net.SocketException: Connection reset
at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:345)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:376)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1125)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:677)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:832)
```
being logged to level INFO, which occurs whenever a socket is improperly
closed by the client, such as with the grpc-health-probe (They've got an
[open issue](https://github.com/grpc-ecosystem/grpc-health-probe/issues/34)
for this)
- Use gradle configuration `api` for dependencies that are part of grpc public api signatures.
- Replace deprecated gradle configurations `compile`, `testCompile`, `runtime` and `testRuntime`.
- With minimal change in dependencies: If we need dep X and Y to compile our code, and if X transitively depends on Y, then our build would still pass even if we only include X as `compile`/`implementation` dependency for our project. Ideally we should include both X and Y explicitly as `implementation` dependency for our project, but in this PR we don't add the missing Y if it is previously missing.
The race between new streams and transport shutdown is #2562, but it is still
far from being generally solved. This reduces the race window of new streams
from (transport selection → stream created on network thread) to (transport
selection → stream enqueued on network thread). Since only a single thread now
needs to do work in the stream creation race window, the window should be
dramatically smaller.
This only reduces GOAWAY races when the server performs a graceful shutdown
(using two GOAWAYs), as that is the only non-racy way on-the-wire to shutdown a
connection in HTTP/2.
maven_install is strongly superior to previous forms of grabbing dependencies
from Maven as it computes the appropriate versions to use from the full
transitive dependencies of all libraries used by an application. It also has
much less boilerplate and includes dependencies with generated targets.
In the future we will drop the jvm_maven_import_external usages and require
maven_install, at which point we can swap to using the `@maven' repository and
no longer depend on compat_repositories.
Fixes#5359
This PR is to add one more Executor parameter when creating the SslContext.
In Netty, we already have this implementation for passing Executor when creating SslContext: netty/netty#8847. This extra Executor is used to take some time-consuming tasks when doing SSL handshake. However, in current gRPC implementation, we are not using this API.
In this PR, the relevant changes are:
1. get the executorPool from ChannelBuilder or ServerBuilder
2. pass the executorPool all the way down to ClientTlsHandler
3. fill executorPool in when creating SslHandler
The allocator has a circular reference that prevents it from GC'ed,
thus causes memory leak if gRPC Channels are created and shutdown
(even cleanly) on a regular basis.
See https://github.com/netty/netty/issues/6891#issuecomment-457809308
and internal b/146074696.
This would cut the amount of per-thread direct buffer allocations by half, especially with light traffic. This will also cut the amount of file descriptors that's created per thread by half.
Internal benchmark results (median of 5 runs) doesn't show any significant change:
```
Before (STDEV) After (STDEV)
grpc-java-java-multi-qps-integrity_only
Actual QPS 711,004 (6,246) 704,372 (6,873)
QPS per Client CPU 23,921 (252) 24,188 (252)
grpc-java-java-multi-throughput-integrity_only
Actual QPS 35,326 (48) 35,294 (29)
QPS per Client CPU 3,362 (17) 3,440 (13)
grpc-java-java-single-latency-integrity_only
Median latency (us) 127 (2.77) 129 (3.13)
grpc-java-java-single-throughput-integrity_only
Actual QPS 581 (11.60) 590 (7.08)
QPS per Client CPU 490 (10.98) 498 (5.63)
```
This would reduce the amount of direct buffer allocations, especially with light traffic. This should mitigate internal issue b/143075435
The change is currently optional and is only effective if system property "io.grpc.netty.useCustomAllocator" is set to "true" ignoring the case.
Internal benchmark results (median of 5 runs) doesn't show any significant change:
```
Before (STDEV) After (STDEV)
grpc-java-java-multi-qps-integrity_only
Actual QPS 717,848 (7,445) 715,061 (2,122)
QPS per Client CPU 23,768 (799) 23,842 (295)
grpc-java-java-multi-throughput-integrity_only
Actual QPS 35,631 (204) 35,298 (25)
QPS per Client CPU 3,362 (56) 3,316 (18)
grpc-java-java-single-latency-integrity_only
Median latency (us) 130 (1.82) 125 (5.36)
grpc-java-java-single-throughput-integrity_only
Actual QPS 593 (5.14) 587 (3.76)
QPS per Client CPU 502 (4.51) 494 (6.92)
```
It appears the problem is that server.close() was missing sync(), so the
event loop was still processing the closure when the next test started.
This change is more aggressive than it needs to be, but should make it
less bug-prone.
Fixes#5574
This implicit loading is more conservative than the loading for
tcnative, as Conscrypt will only be implicitly loaded if there are no
other options. This means the Java 9+ JSSE is preferred over Conscrypt
without explicit user configuration.
While we would generally prefer Conscrypt over JSSE, we want to allow
the user to configure their security providers. There wasn't a good way
to do that with netty-tcnative and the performance of JSSE at the time
was abysmal.
While generally being a good way to allow adopting Conscrypt, this also
allows easily using App Engine Java 8's provided Conscrypt which can
substantially reduce binary size.
See googleapis/google-cloud-java#6425
This change adds two booleans to the ChannelBuilders to
allow transports to use get and put. These are currently defaulted to
on, but unset on the method descriptors. This change is 1/2 that will
allow the safe / idempotent bits to be set on generated proto code.
Part 2/2 will actually enable it.
The use case for this is for interceptors that implement caching logic.
They need to be able to access the safe/idempotent bits on the MD in
order to decide to how to handle the request, even if gRPC doesn't use
GET / PUT HTTP methods.
Examples and android projects were left unchanged. They can be changed
later.
No plugin versions were changed, to make this as non-functional of a
change as possible. Upgrading Gradle to 5.6 was necessary for
pluginManagement in settings.gradle.
This can be used to prevent duplicate classes in the classpath, one via
Maven and one via Bazel-native.
See census-instrumentation/opencensus-java#1963 and #5359
Http2ControlFrameLimitEncoder is from Netty. It is copied here as a
temporary measure until we upgrade to the version of Netty that includes
the class.
See CVE-2019-9515
* Bump PerfMark to 0.17.0
The main changes how linking is done. Linking is now always done
through the `PerfMark` entry class. This is for two reasons:
1. It make instrumenting the linking calls *much* easier.
2. It follows the API pattern of "verbNoun()". Previous callsites
would have `Link link = PerfMark.link(); link.link()`. This
stuttering is not quick to follow.
Generated using:
```
find -name \*.java -exec sed -i 's#link = PerfMark.link();#link = PerfMark.linkOut();#g' {} \;
find -name \*.java -exec sed -i 's#link.link();#PerfMark.linkIn(link);#g' {} \;
find -name \*.java -exec sed -i 's#command.getLink().link();#PerfMark.linkIn(command.getLink());#g' {} \;
find -name \*.java -exec sed -i 's#cmd.getLink().link();#PerfMark.linkIn(cmd.getLink());#g' {} \;
find -name \*.java -exec sed -i 's#msg.getLink().link();#PerfMark.linkIn(msg.getLink());#g' {} \;
```
Since the deprecated link methods are also `@DoNotCall`, the same
sed calls will need to be used on import.
In case a negotiating handler misses a read, and it reaches the WBAEH, it should cause a failure. Also, if closing the channel fails while handling another error, log the second failure.
We only care about when closing is done, not whether it is successful or not.
If there's a failure, we're already going to log a warning. Use await to avoid
throwing unexpectedly.
Maven does not include transitive runtime dependencies in the
compile-time classpath (testing shows Gradle 4 does; docs say
Gradle 5 doesn't). So if a user references the shaded
NettyServerBuilder without also depending on grpc-core directly,
compilation will fail because AbstractServerImplBuilder couldn't
be found.
This isn't technically a problem, since we're not wanting to encourage
users to reference the shaded classes directly. But some users will
certainly reference the classes anyway and the error is pretty confusing
while also being trivially worked around. In other words: it justs
wastes people's time and benefits nobody.
Fixes#5881
This change is needed after trying to use the new style protocol negotiators internally. The problem is that some handlers fire the event in handlerAdded, which is too early. The followup PNE is fired after handlerAdded, which breaks the composibility of the negotiators.
To fix this, this change modifies the negotiation flow. Specifically:
* Negotiators should NEVER fire a negotiation from handlerAdded, instead they should wait until userEventTriggered
* Negotiators now do state checking on the PNE. If it is set twice, it fails. If it has not been received when doing the next stage of negotiation, it fails.
* WBAEH now fires the initial, default event. This is the only handler that can fire it from handlerAdded
The tests updated are ones not using WBAEH (which they probably should). This change ensures attributes aren't lost when doing negotiation.
This change removes the WriteQueue linking and splits it out into each
of the commands, so that the trace is more precise, and the tag
information is correct.
It is still unclear what the initial Tag should be for ClientCallImpl,
since it should not access the TransportState to get the HTTP/2 stream id.
Transport level exceptions (e.g. "Connection reset by peer") are not
useful and clutter the logs. `NettyServerTransport` contains logic to
log such exceptions at level `FINE`.
When running with epoll, transport level exceptions are prefixed with
additional contextual information (e.g. "syscall:read(..) failed:") that
causes the exceptions to be logged at level `INFO`.
Update the filtering logic to match on error messages _containing_ the
blacklisted messages, rather than using string equality.
Closes#5872.
Signed-off-by: Nick Travers <n.e.travers@gmail.com>
This add perfmark annotations in some key places, notably on transport/application boundaries, and thread hop locations. Perfmark records to a thread-local buffer the events that happen in each thread. Perfmark is disabled by default, and will compile to a noop unless Perfmark.setEnabled is invoked. This should make it free when disable, and pretty fast when it is enabled.
It is important that started tasks are ended, so several places in our code are moved to either try-finally blocks, or moved into a private method. I realize this is ugly, but I think it is manageable. In the future, we can look at making an agent or compiler plugin that simplifies the recording.
Linking between threads is done with a Link object, which is created on the "outbound" task, and used on the "inbound" task. This is slightly more verbose, and does has a small amount of runtime overhead, even when disabled. (for null checks, slightly higher memory usage, etc.) I think this is okay to, because it makes other optimizations much easier.