Rebased PR #8343 into the first commit of this PR, then (in the 2nd commit) reverted the metric recording of retry attempts. The PR as a whole is mechanical refactoring. No behavior change (except that some of the old code path that ran when the tracer was created has moved into the new method `streamCreated()`).
The API change is documented in go/grpc-stats-api-change-for-retry-java
Nginx and C core don't do graceful GOAWAY, and retries have matured such
that transparent retries may soon be on by default. Refining the
workaround thus can reduce the error rate for users.
Fixes #8310
This has been wrong since the introduction of the code in df357cb8.
Noticed as part of https://github.com/grpc/grpc-go/pull/4491 . The error
text is generally ASCII, so this probably doesn't matter much.
We used to have two ClientStreamListener.closed() methods, one simply calling the other with a default argument. This doubles debugging (e.g. #7921) and sometimes unit-testing work. Delete the 2-arg method to clean up.
This PR is purely refactoring.
This adds support for creating a Netty Channel with SocketAddress and ChannelCredentials.
This aligns with NettyServerBuilder.forAddress(SocketAddress address, ServerCredentials creds).
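A minimal usage sketch (the address and the TLS credentials here are placeholders, not from this PR):
```java
import io.grpc.ManagedChannel;
import io.grpc.TlsChannelCredentials;
import io.grpc.netty.NettyChannelBuilder;
import java.net.InetSocketAddress;

// Create a channel to an explicit SocketAddress, supplying the credentials
// separately instead of baking them into the builder.
ManagedChannel channel =
    NettyChannelBuilder.forAddress(
            new InetSocketAddress("example.com", 443), TlsChannelCredentials.create())
        .build();
```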
Enables a codepath for zero-copy protobuf deserialization. Two new InputStream extension interfaces are added:
- HasByteBuffer: allows access to the underlying buffers containing inbound bytes directly without copying
- Detachable: allows a custom marshaller to keep the buffers around until the application code is done using the protobuf messages
Applications can implement a custom marshaller that takes ownership of the ByteBuffers and wraps them into ByteStrings with protobuf's UnsafeByteOperations support. Then a RopeByteString, an in-place composite of ByteStrings, can be created. This enables CodedInputStream's zero-copy codepath (which requires indicating that the ByteBuffers are immutable) for deserialization.
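A sketch of such a marshaller's parse side, assuming a placeholder generated message type `MyMessage` (buffer lifecycle management is elided; this is not verbatim from the PR):
```java
import com.google.protobuf.ByteString;
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.UnsafeByteOperations;
import io.grpc.Detachable;
import io.grpc.HasByteBuffer;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Parse one inbound message without copying, taking ownership of the
// transport buffers. "MyMessage" stands in for an application proto type.
MyMessage parseZeroCopy(InputStream stream) throws IOException {
  if (!(stream instanceof HasByteBuffer) || !((HasByteBuffer) stream).byteBufferSupported()
      || !(stream instanceof Detachable)) {
    return MyMessage.parseFrom(stream);  // fall back to the copying path
  }
  // Detach so the underlying buffers stay valid after this call returns; the
  // application must eventually close the detached stream to release them.
  InputStream detached = ((Detachable) stream).detach();
  ByteString bytes = ByteString.EMPTY;
  while (detached.available() > 0) {
    ByteBuffer buffer = ((HasByteBuffer) detached).getByteBuffer();
    // unsafeWrap shares the buffer instead of copying; repeated concat()
    // yields a RopeByteString over the pieces.
    bytes = bytes.concat(UnsafeByteOperations.unsafeWrap(buffer));
    detached.skip(buffer.remaining());
  }
  CodedInputStream cis = bytes.newCodedInput();
  cis.enableAliasing(true);  // immutable-buffer indication: fields alias the buffers
  return MyMessage.parseFrom(cis);
}
```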
Fix a bug in StreamBufferingEncoder: when the client receives a GOAWAY while there are pending streams due to MAX_CONCURRENT_STREAMS, we see the following error:
io.netty.handler.codec.http2.Http2Exception$StreamException: Maximum active streams violated for this endpoint.
Tested with the interop client on Zulu 8 and Zulu 11 with
-XX:+UseOpenJSSE (after disabling tcnative). I was unable to add a new
case to TlsTest because adding OpenJSSE as a dependency in a Gradle
build fails: https://github.com/openjsse/openjsse/issues/19
Fixes #7907
Starting in Netty 4.1.60, Netty will validate Content-Length headers
using getAll() and setLong(). While getAll() was documented as only being
used in tests, it does not appear to have actually been used by any tests.
While Http2NettyTest.contentLengthPermitted() was added to confirm that
Content-Length works, it won't actually exercise any interesting
behavior until we upgrade to Netty 4.1.60. However, I did test with
Netty 4.1.60 and it reproduced the failure in
https://github.com/grpc/grpc-java/issues/7953 and passed with this
change.
Since Netty is now observing/modifying the headers, it would seem
appropriate to implement a substantial portion of the Http2Headers API.
However, the surface is much larger than we'd want to implement for a
'quick fix' that could be backported. In addition, it seems much of the
API is just convenience methods, so it is probably appropriate to split
out an AbstractHeaders class from DefaultHeaders in Netty that doesn't
make any assumptions about the header storage mechanism.
If a handshake is ongoing during shutdown, this substantially reduces
the time it takes to shut down. Previously, you would need to use
channel.shutdownNow() to get fast shutdown behavior, which is an
unnecessary use of that variant.
When the current approach was written, WriteBufferingAndExceptionHandler
didn't exist, so it was hard to predict how the pipeline would react
to events (particularly because of the HTTP/2 handler's redefinition of
close()). Now that WBAEH exists, this is more straightforward.
See this PR in netty: https://github.com/netty/netty/pull/9798 . It's
possible that one peer has closed the stream, yet another frame from
the peer arrives after that. This is largely harmless, as explained in
the PR in the netty repository. If we don't make this change, the log
will be polluted with these harmless messages.
Example that would no longer be logged:
```
Jan 25, 2021 6:23:51 PM io.grpc.netty.NettyServerHandler onStreamError
WARNING: Stream Error
io.netty.handler.codec.http2.Http2Exception$StreamException: Received DATA frame for an unknown stream 27
at io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:147)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.shouldIgnoreHeadersOrDataFrame(DefaultHttp2ConnectionDecoder.java:596)
at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:239)
...
```
- Add APIs to `ClientTransportFactory`:
```java
public interface ClientTransportFactory {
  /**
   * Swaps to a new ChannelCredentials with all other settings unchanged. Returns null if the
   * ChannelCredentials is not supported by the current ClientTransportFactory settings.
   */
  SwapChannelCredentialsResult swapChannelCredentials(ChannelCredentials channelCreds);

  final class SwapChannelCredentialsResult {
    final ClientTransportFactory transportFactory;
    @Nullable final CallCredentials callCredentials;
  }
}
```
- Add `ChannelCredentials` to constructor args of `ManagedChannelImplBuilder`:
```java
public ManagedChannelImplBuilder(
    String target, @Nullable ChannelCredentials channelCreds,
    @Nullable CallCredentials callCreds, ...)
```
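A hedged usage sketch of the swap API above; the surrounding variable names are assumptions, not code from this PR:
```java
// Try to swap in new credentials; null means the current factory's settings
// cannot support them, so the caller keeps the old factory.
ClientTransportFactory.SwapChannelCredentialsResult result =
    transportFactory.swapChannelCredentials(newChannelCreds);
if (result != null) {
  transportFactory = result.transportFactory;
  callCredentials = result.callCredentials;  // may be null
}
```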
Enable this feature by setting the system property
-Dio.netty.ssl.masterKeyHandler=true
or
System.setProperty(SslMasterKeyHandler.SYSTEM_PROP_KEY, "true");
The keys will be written at WARN level to the logger named
"io.netty.wireshark". To export the keys to a file, you can configure
the log factory like this (with log4j.xml, for example):
```
<appender name="key-file" class="org.apache.log4j.RollingFileAppender">
  <param name="file" value="d:/keyfile.txt"/>
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%m%n"/>
  </layout>
</appender>
<category name="io.netty.wireshark">
  <priority value="DEBUG" />
  <appender-ref ref="key-file" />
</category>
```
Wireshark can analyze gRPC-over-TLS messages with this key log file.
Closes #7199
Change InternalServer to handle multiple addresses, with the implementation in NettyServer.
ServerImpl now has a single transport server, and this single transport server (NettyServer) binds to all listening addresses during bootstrap. (#7674)
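Roughly, the resulting shape is sketched below; the method names are illustrative rather than copied from the source:
```java
import java.io.IOException;
import java.net.SocketAddress;
import java.util.List;

// Sketch: one InternalServer (implemented by NettyServer) owns every listen
// address, so ServerImpl needs only a single transport server. ServerListener
// stands for the gRPC-internal transport callback type.
interface InternalServer {
  void start(ServerListener listener) throws IOException;  // binds all addresses
  void shutdown();
  List<? extends SocketAddress> getListenSocketAddresses();
}
```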
The tiny cache size was removed from the ByteBuf allocator and so was
deprecated. TLSv1.3 was enabled by the upgrade, which causes mTLS
connections to fail at various times. Conscrypt is incompatible with the
default TrustManager when TLSv1.3 is enabled, so we explicitly disable
TLSv1.3 when Conscrypt is used, for the moment.
API change (See go/grpc-rls-callcreds-to-server):
- Add `ChannelCredentials.withoutBearerTokens()`
- Add `createResolvingOobChannelBuilder(String, ChannelCredentials)`, `getChannelCredentials()` and `getUnsafeChannelCredentials()` for `LoadBalancer.Helper`
This PR does not include the implementation of `createResolvingOobChannelBuilder(String, ChannelCredentials)`.
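For instance, once implemented, a load balancer opening a control-plane channel might use these together (a sketch; the target string is a placeholder):
```java
// Strip bearer tokens so per-RPC call credentials are not replayed to the
// control-plane server, then build the resolving OOB channel with the rest.
ChannelCredentials creds = helper.getUnsafeChannelCredentials().withoutBearerTokens();
ManagedChannelBuilder<?> builder =
    helper.createResolvingOobChannelBuilder("dns:///lookup.example.com", creds);
```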
This is to be used for xDS to inject configuration for the
XdsServerCredentials. We'd like a cleaner approach, but they mostly seem
to be more heavy-weight. We will probably address this at the same time
we handle the Executor being passed for TLS. In the meantime this is
easy, doesn't hurt much, and can easily be changed in the future.
* fix channel builders ABI backward compatibility broken in v1.33.0
* fix server builders ABI backward compatibility broken in v1.33.0
* makes ForwardingServerBuilder package-private
With this, it will be clear if the RPC failed because the server didn't
use a double-GOAWAY or if it failed because of MAX_CONCURRENT_STREAMS or
if it was due to a local race. It also fixes the status code to be
UNAVAILABLE except for the RPCs included in the GOAWAY error (modulo the
Netty bug).
Fixes #5855
The stream creation was failing because the stream id was disallowed:
```
Caused by: io.grpc.StatusRuntimeException: INTERNAL: http2 exception
    at io.grpc.Status.asRuntimeException(Status.java:533)
    at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:629)
    ... 16 more
Caused by: io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException: Cannot create stream 222691 greater than Last-Stream-ID 222689 from GOAWAY.
```
The problem was introduced in 9ead606. Fixes #7357
Normally the first exception/event experienced is the cause and is
followed by a stampede of ClosedChannelExceptions. In this case,
SslHandler is manufacturing a ClosedChannelException of its own and
propagating it _before_ the trigger event. This might be considered a
bug, but changing SslHandler's behavior would be very risky and almost
certainly break someone's code.
Fixes #7376
It's hoped that this resolves the "too_many_pings" issue some users are
seeing that is worked around by GRPC_EXPERIMENTAL_AUTOFLOWCONTROL=false.
This change also avoids resetting the ping count for empty data frames
(which shouldn't really happen with gRPC).
The previous code failed to reset the ping count on HEADERS and
WINDOW_UPDATE. The code _appeared_ to have callbacks for WINDOW_UPDATE,
but was layered above the Http2Connection so was never called. Thus,
this version is much more aggressive than the previous version while
also addressing the correctness issue.
JUnit 4.13 deprecates ExpectedException and Assert.assertThat(T, org.hamcrest.Matcher).
Without Java 8 we don't want to migrate away from ExpectedException at
this time. We tend to prefer Truth over Hamcrest, so I swapped the one
instance of Assert.assertThat() to use Truth. With this change we get a
warning-less build with JUnit 4.13. We don't yet upgrade because we
still need to support JUnit 4.12 for some use-cases, but we will be able
to upgrade to 4.13 soon, once those use-cases upgrade.
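The swapped assertion looks like this (an illustrative before/after, not the exact test line):
```java
import static com.google.common.truth.Truth.assertThat;

// Before (deprecated in JUnit 4.13):
//   Assert.assertThat(actual, CoreMatchers.equalTo(expected));
// After, with Truth:
assertThat(actual).isEqualTo(expected);
```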
This is a very simple change to test for IBMJSSE2 security provider in addition to the others. IBM JRE does not support the Sun provider, but instead has IBMJSSE2 which supports the same API calls.
I tested this on a z/OS machine, and it now works where before it couldn't find a security provider.
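A minimal sketch of the probing this describes, with a hypothetical helper name (the real change just adds IBMJSSE2 to the providers we test for):
```java
import java.security.Provider;
import java.security.Security;

// Probe the known JSSE providers; IBM JREs ship IBMJSSE2 instead of SunJSSE,
// and both expose the same API calls.
static Provider findJsseProvider() {
  for (String name : new String[] {"SunJSSE", "IBMJSSE2"}) {
    Provider provider = Security.getProvider(name);
    if (provider != null) {
      return provider;
    }
  }
  return null;  // no supported provider found
}
```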
A user has reported a GOAWAY with too_many_pings when using BDP. We
aren't certain why it is happening, but want to provide a way to disable
BDP while we continue investigating. b/162162973
verifyZeroInteractions has the same behavior as verifyNoMoreInteractions. It
was deprecated in Mockito 3.0.1 and replaced with verifyNoInteractions, which
does not change behavior depending on previous verify() calls. All instances
were replaced with verifyNoInteractions, except those in
ApplicationThreadDeframerTest which were replaced with verifyNoMoreInteractions
since there is a verify() call in `@Before`.
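The migration is a drop-in rename, e.g. (mockListener is an arbitrary mock):
```java
import static org.mockito.Mockito.verifyNoInteractions;

// Previously: verifyZeroInteractions(mockListener);
verifyNoInteractions(mockListener);
```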
Since Travis is on Java 8u252, we won't actually be testing Jetty ALPN at this
point. We're also not testing the Java 9 ALPN API on Java 8, since our current
version of Netty doesn't support it (but an upgrade is available that does).
This provides a substantial ~3x performance increase to Netty async
streaming with small messages. It also increases OkHttp performance for
the same benchmark 40% and decreases unary latency by 3µs for Netty and
10µs for OkHttp.
We avoid calling listener after closure because the Executor used for
RPC callbacks may no longer be available. This issue was already
present in the ApplicationThreadDeframer, but full-stream compression is
not really deployed so was unnoticed.
DirectExecutor saw a 5-6µs latency increase via MigratingDeframer.
DirectExecutor usages should see no benefit from MigratingDeframer, so
disable it in that case.
This PR changes the `NettyServerTransport#getLogLevel` method to log
`SocketException`s to `LogLevel.FINE`, rather than exclusively pure
IOExceptions. This may fix an unintentional regression introduced in
c166ec2c, although the message in my java version (14.0.1) wouldn't have
passed the old logic for quieting either. This also fixes the issue
raised in #6423 that was locked for inactivity.
This fixes
```
[2020/05/14 20:21:52 INFO] [io.grpc.netty.NettyServerTransport.connections] Transport failed
java.net.SocketException: Connection reset
at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:345)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:376)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1125)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:347)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:677)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:832)
```
being logged to level INFO, which occurs whenever a socket is improperly
closed by the client, such as with the grpc-health-probe (They've got an
[open issue](https://github.com/grpc-ecosystem/grpc-health-probe/issues/34)
for this)
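A simplified sketch of the resulting level selection (the real NettyServerTransport method also quiets certain messages; this is not the exact code):
```java
import java.io.IOException;
import java.net.SocketException;
import java.util.logging.Level;

// Routine disconnects surface as bare IOException or SocketException; log
// those at FINE and keep everything else at INFO.
static Level getLogLevel(Throwable t) {
  if (t.getClass().equals(IOException.class)
      || t.getClass().equals(SocketException.class)) {
    return Level.FINE;
  }
  return Level.INFO;
}
```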
The race between new streams and transport shutdown is #2562, but it is still
far from being generally solved. This reduces the race window of new streams
from (transport selection → stream created on network thread) to (transport
selection → stream enqueued on network thread). Since only a single thread now
needs to do work in the stream creation race window, the window should be
dramatically smaller.
This only reduces GOAWAY races when the server performs a graceful shutdown
(using two GOAWAYs), as that is the only non-racy way on-the-wire to shutdown a
connection in HTTP/2.
This PR adds one more Executor parameter when creating the SslContext.
Netty already supports passing an Executor when creating an SslContext: netty/netty#8847. This extra Executor is used to take over time-consuming tasks during the SSL handshake. However, the current gRPC implementation does not use this API.
In this PR, the relevant changes are:
1. get the executorPool from the ChannelBuilder or ServerBuilder
2. pass the executorPool all the way down to ClientTlsHandler
3. fill the executorPool in when creating the SslHandler (see the sketch after this list)
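Step 3 ultimately reaches the Executor-taking newHandler overload added by netty/netty#8847; a sketch (the helper name and parameters are illustrative):
```java
import io.netty.buffer.ByteBufAllocator;
import io.netty.handler.ssl.SslContext;
import io.netty.handler.ssl.SslHandler;
import java.util.concurrent.Executor;

// Hand the handshake's time-consuming delegated tasks to the supplied
// executor instead of running them on the event loop.
static SslHandler newSslHandler(SslContext sslContext, ByteBufAllocator alloc,
    String peerHost, int peerPort, Executor handshakeExecutor) {
  return sslContext.newHandler(alloc, peerHost, peerPort, handshakeExecutor);
}
```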
The allocator has a circular reference that prevents it from being
GC'ed, thus causing a memory leak if gRPC Channels are created and shut
down (even cleanly) on a regular basis.
See https://github.com/netty/netty/issues/6891#issuecomment-457809308
and internal b/146074696.
This would halve the number of per-thread direct buffer allocations, especially with light traffic. It would also halve the number of file descriptors created per thread.
Internal benchmark results (median of 5 runs) don't show any significant change:
```
                        Before (STDEV)    After (STDEV)
grpc-java-java-multi-qps-integrity_only
  Actual QPS            711,004 (6,246)   704,372 (6,873)
  QPS per Client CPU    23,921 (252)      24,188 (252)
grpc-java-java-multi-throughput-integrity_only
  Actual QPS            35,326 (48)       35,294 (29)
  QPS per Client CPU    3,362 (17)        3,440 (13)
grpc-java-java-single-latency-integrity_only
  Median latency (us)   127 (2.77)        129 (3.13)
grpc-java-java-single-throughput-integrity_only
  Actual QPS            581 (11.60)       590 (7.08)
  QPS per Client CPU    490 (10.98)       498 (5.63)
```
This would reduce the number of direct buffer allocations, especially with light traffic. This should mitigate internal issue b/143075435.
The change is currently optional and only takes effect if the system property "io.grpc.netty.useCustomAllocator" is set to "true", ignoring case.
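To opt in (the property must be set before the Netty transport is first initialized):
```java
// The value comparison ignores case, per the description above.
System.setProperty("io.grpc.netty.useCustomAllocator", "true");
```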
Internal benchmark results (median of 5 runs) don't show any significant change:
```
                        Before (STDEV)    After (STDEV)
grpc-java-java-multi-qps-integrity_only
  Actual QPS            717,848 (7,445)   715,061 (2,122)
  QPS per Client CPU    23,768 (799)      23,842 (295)
grpc-java-java-multi-throughput-integrity_only
  Actual QPS            35,631 (204)      35,298 (25)
  QPS per Client CPU    3,362 (56)        3,316 (18)
grpc-java-java-single-latency-integrity_only
  Median latency (us)   130 (1.82)        125 (5.36)
grpc-java-java-single-throughput-integrity_only
  Actual QPS            593 (5.14)        587 (3.76)
  QPS per Client CPU    502 (4.51)        494 (6.92)
```
It appears the problem is that server.close() was missing sync(), so the
event loop was still processing the closure when the next test started.
This change is more aggressive than it needs to be, but should make it
less bug-prone.
Fixes #5574
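The core of the fix is waiting on the close future (a sketch; server is the test's Netty Channel):
```java
// Block until the event loop has finished processing the closure so the
// next test cannot race with it.
server.close().sync();
```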
This implicit loading is more conservative than the loading for
tcnative, as Conscrypt will only be implicitly loaded if there are no
other options. This means the Java 9+ JSSE is preferred over Conscrypt
without explicit user configuration.
While we would generally prefer Conscrypt over JSSE, we want to allow
the user to configure their security providers. There wasn't a good way
to do that with netty-tcnative and the performance of JSSE at the time
was abysmal.
Besides generally being a good way to allow adopting Conscrypt, this
also makes it easy to use App Engine Java 8's provided Conscrypt, which
can substantially reduce binary size.
See googleapis/google-cloud-java#6425
This change adds two booleans to the ChannelBuilders to
allow transports to use GET and PUT. These are currently defaulted to
on, but unset on the method descriptors. This change is part 1 of 2 and
will allow the safe/idempotent bits to be set on generated proto code.
Part 2/2 will actually enable it.
The use case for this is for interceptors that implement caching logic.
They need to be able to access the safe/idempotent bits on the MD in
order to decide to how to handle the request, even if gRPC doesn't use
GET / PUT HTTP methods.
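Once part 2 lands, generated code will set the bits; by hand it looks roughly like this sketch (the marshallers, types, and names are placeholders):
```java
import io.grpc.MethodDescriptor;

// Mark the method safe (side-effect free); a caching interceptor can then
// treat it as cacheable even though gRPC doesn't literally use HTTP GET.
MethodDescriptor<Req, Resp> descriptor =
    MethodDescriptor.<Req, Resp>newBuilder()
        .setType(MethodDescriptor.MethodType.UNARY)
        .setFullMethodName(
            MethodDescriptor.generateFullMethodName("my.Service", "Lookup"))
        .setRequestMarshaller(requestMarshaller)
        .setResponseMarshaller(responseMarshaller)
        .setSafe(true)  // setIdempotent(true) similarly enables PUT semantics
        .build();
```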
Http2ControlFrameLimitEncoder is from Netty. It is copied here as a
temporary measure until we upgrade to the version of Netty that includes
the class.
See CVE-2019-9515