For transparent retry.
Also allow non-WFR RPCs to retry indefinitely on errors that resulted in no I/O; the spec used to forbid it at one point during development, but it no longer does.
Performance benchmarks can be found below. Obviously, a 8 KiB
request/response is tailored to showcase this improvement as this is
where codec buffer reuse shines, but I've run other benchmarks too (like
1-byte requests and responses) and there's no discernable impact on
performance.
We do not allow reuse of buffers when stat handlers or binlogs are
turned on. This is because those two may need access to the data and
payload even after the data has been written to the wire. In such cases,
we never return the data back to the pool.
A buffer reuse threshold of 1 KiB was determined after several
experiments. There's diminished returns when buffer reuse is enabled for
smaller messages (actually, a negative impact).
unary-networkMode_none-bufConn_false-keepalive_false-benchTime_40s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_6-reqSize_8192B-respSize_8192B-compressor_off-channelz_false-preloader_false
Title Before After Percentage
TotalOps 839638 906223 7.93%
SendOps 0 0 NaN%
RecvOps 0 0 NaN%
Bytes/op 103788.29 80592.47 -22.35%
Allocs/op 183.33 189.30 3.27%
ReqT/op 1375662899.20 1484755763.20 7.93%
RespT/op 1375662899.20 1484755763.20 7.93%
50th-Lat 238.746µs 225.019µs -5.75%
90th-Lat 514.253µs 456.439µs -11.24%
99th-Lat 711.083µs 702.466µs -1.21%
Avg-Lat 285.45µs 264.456µs -7.35%
`transport/Stream.RecvCompress` returns what the header contains, if present,
or empty string if a context error occurs. However, it "prefers" the header
data even if there is a context error, to prevent a related race. What happens
here is:
1. RPC starts.
2. Client cancels RPC.
3. `RecvCompress` tells `ClientStream.Recv` that compression used is "" because
of the context error. `as.decomp` is left nil, because there is no
compressor to look up in the registry.
4. Server's header and first message hit client.
5. Client sees the header and message and allows grpc's stream to see them.
(We only provide context errors if we need to block.)
6. Client performs a successful `Read` on the stream, receiving the gzipped
payload, then checks `as.decomp`.
7. We have no decompressor but the payload has a bit set indicating the message
is compressed, so this is an error. However, when forming the error string,
`RecvCompress` now returns "gzip" because it doesn't need to block to get
this from the now-received header. This leads to the confusing message
about how "gzip" is not installed even though it is.
This change makes `waitOnHeader` close the stream when context cancellation happens.
Then `RecvCompress` uses whatever value is present in the stream at that time, which
can no longer change because the stream is closed. Also, this will be in sync with
the messages on the stream - if there are any messages present, the headers must
have been processed first, and `RecvCompress` will contain the proper value.
Before these fixes, it was possible to see errors on new RPCs after a
connection began draining, and before establishing a new connection. There is
an inherent race between choosing a SubConn and attempting to creating a stream
on it. We should be able to avoid application-visible RPC errors due to this
with transparent retry. However, several bugs were preventing this from
working correctly:
1. Non-wait-for-ready RPCs were skipping transparent retry, though the retry
design calls for retrying them.
2. The transport closed itself (and would consequently error new RPCs) before
notifying the SubConn that it was draining.
3. The SubConn wasn't synchronously updating itself once it was notified about
the closing or draining state.
4. The SubConn would go into the TRANSIENT_FAILURE state instantaneously,
causing RPCs to fail instead of queue.