Performance benchmarks can be found below. Obviously, an 8 KiB
request/response is tailored to showcase this improvement, since that is
where codec buffer reuse shines, but I've run other benchmarks too (such
as 1-byte requests and responses) and there is no discernible impact on
performance.
We do not allow reuse of buffers when stats handlers or binlogs are
turned on, because both may need access to the data and payload even
after the data has been written to the wire. In such cases, we never
return the buffer to the pool.
A buffer reuse threshold of 1 KiB was determined after several
experiments. There are diminishing returns when buffer reuse is enabled
for smaller messages (in fact, a negative impact).
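As a rough sketch of the gate this implies, assuming a sync.Pool of byte buffers (the names bufferPool, returnBuffer, and bufferReuseThreshold and the boolean flags are illustrative, not the actual grpc-go identifiers):

```go
package codec

import "sync"

// bufferReuseThreshold mirrors the 1 KiB cutoff described above.
const bufferReuseThreshold = 1 << 10

// bufferPool holds marshal buffers between RPCs.
var bufferPool = sync.Pool{
	New: func() interface{} { return new([]byte) },
}

// returnBuffer hands a marshal buffer back to the pool, but only when it
// is large enough for reuse to pay off and when nothing downstream
// (stats handlers, binary logging) may still read the payload after it
// has been written to the wire.
func returnBuffer(buf []byte, statsEnabled, binlogEnabled bool) {
	if cap(buf) < bufferReuseThreshold {
		return // small buffers: pooling costs more than it saves
	}
	if statsEnabled || binlogEnabled {
		return // payload may still be referenced; never recycle it
	}
	bufferPool.Put(&buf)
}
```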
unary-networkMode_none-bufConn_false-keepalive_false-benchTime_40s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_6-reqSize_8192B-respSize_8192B-compressor_off-channelz_false-preloader_false
Title       Before           After            Percentage
TotalOps    839638           906223           7.93%
SendOps     0                0                NaN%
RecvOps     0                0                NaN%
Bytes/op    103788.29        80592.47         -22.35%
Allocs/op   183.33           189.30           3.27%
ReqT/op     1375662899.20    1484755763.20    7.93%
RespT/op    1375662899.20    1484755763.20    7.93%
50th-Lat    238.746µs        225.019µs        -5.75%
90th-Lat    514.253µs        456.439µs        -11.24%
99th-Lat    711.083µs        702.466µs        -1.21%
Avg-Lat     285.45µs         264.456µs        -7.35%
* protoCodec: return early if proto.Marshaler
If the object to marshal implements proto.Marshaler, delegate to that
immediately instead of pre-allocating a buffer. (*proto.Buffer).Marshal
has the same logic, so the []byte buffer we pre-allocate in codec.go
would never be used.
This is mainly for users of gogoproto. If you turn on the "marshaler"
and "sizer" gogoproto options, the generated Marshal() method already
pre-allocates a buffer of the appropriate size for marshaling the
entire object.
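A minimal sketch of this early return, assuming the codec keeps a cached proto.Buffer around; the helper name and signature here are illustrative, not the exact codec.go code:

```go
package codec

import "github.com/golang/protobuf/proto"

// marshal delegates to the message's own Marshal method when it
// implements proto.Marshaler (as gogoproto-generated types do with the
// "marshaler" and "sizer" options enabled); the cached proto.Buffer is
// only used on the fallback path.
func marshal(v interface{}, cb *proto.Buffer) ([]byte, error) {
	if m, ok := v.(proto.Marshaler); ok {
		// The generated Marshal already sizes and allocates its own
		// output buffer, so any buffer we pre-allocate goes unused.
		return m.Marshal()
	}
	cb.Reset()
	if err := cb.Marshal(v.(proto.Message)); err != nil {
		return nil, err
	}
	return cb.Bytes(), nil
}
```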
* protoCodec: return early if proto.Unmarshaler
If the object to unmarshal implements proto.Unmarshaler, delegate to
that immediately. This saves a bit of work preparing the cached
proto.Buffer object, which would never end up being used in the
proto.Unmarshaler case.
Note that I moved the obj.Reset() call above the delegation to
obj.Unmarshal(). This maintains the gRPC behavior of
proto.Unmarshalers always being Reset() before being delegated to,
which is consistent with how proto.Unmarshal() behaves (proto.Buffer
does not call Reset() in Unmarshal).
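And the corresponding unmarshal path under the same assumptions (again an illustrative sketch rather than the exact codec code), with the Reset() happening before the delegation:

```go
package codec

import "github.com/golang/protobuf/proto"

// unmarshal always Resets the message first, matching the existing
// codec behavior, then delegates to the message's own Unmarshal when it
// implements proto.Unmarshaler; the cached proto.Buffer is only touched
// on the fallback path.
func unmarshal(data []byte, v interface{}, cb *proto.Buffer) error {
	pb := v.(proto.Message)
	pb.Reset()
	if u, ok := v.(proto.Unmarshaler); ok {
		return u.Unmarshal(data)
	}
	cb.SetBuf(data)
	err := cb.Unmarshal(pb)
	cb.SetBuf(nil)
	return err
}
```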
* use a global sharded pool of proto.Buffer caches in protoCodec
* fix goimports
* make global buffer pool index counter atomic
* hack to remove alloc in encode_len_struct
* remove extra slice alloc in proto codec marshal
* replace magic number for proto size field length with constant
* replace custom cache with sync.Pool
* remove one-line functions in codec.go and add protoCodec microbenchmarks
* add concurrent usage test for protoCodec
* fix golint, gofmt, and goimports checks
* fix issues in codec.go and codec_test.go
* use go parallel benchmark helpers
* replace proto.Codec with a guess of size needed
* update Fatalf -> Errorf in tests
* wrap proto.Buffer along with the cached last size into a larger struct for pool use (see the sketch after this list)
* make wrapped proto buffer only a literal
* fix style and imports
* move b.Run into inner function
* reverse micro benchmark op order to unmarshal-marshal and fix benchmark setup-in-test bug
* add test for large message
* remove use of defer in codec.marshal
* revert recent changes to codec benchmarks
* move sub-benchmarks into >= go-1.7 only file
* add comment for marshaler and tweak benchmark subtests for easier usage
* move build tag for go1.7 on benchmarks to inside file
* move build tag to top of file
* comment Codec, embed proto.Buffer into cached struct and add an int32 cap
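The sketch referenced in the list above: a rough illustration of the sync.Pool-backed wrapper that embeds proto.Buffer and caches the last marshaled size. Names and fields here are assumptions for illustration and may not match the final code exactly:

```go
package codec

import (
	"sync"

	"github.com/golang/protobuf/proto"
)

// cachedProtoBuffer pairs a reusable proto.Buffer with the size of the
// last message it marshaled, so the next marshal can pre-grow its
// backing slice to roughly the right capacity.
type cachedProtoBuffer struct {
	lastMarshaledSize uint32 // kept within int32 range per the commit note above
	proto.Buffer
}

var protoBufferPool = sync.Pool{
	New: func() interface{} {
		return &cachedProtoBuffer{lastMarshaledSize: 16}
	},
}

// getCachedBuffer and putCachedBuffer show the intended usage pattern:
// fetch a wrapper from the pool, use it for one marshal/unmarshal, drop
// the reference to the payload, and return it.
func getCachedBuffer() *cachedProtoBuffer {
	return protoBufferPool.Get().(*cachedProtoBuffer)
}

func putCachedBuffer(cb *cachedProtoBuffer) {
	cb.SetBuf(nil) // do not pin the marshaled bytes
	protoBufferPool.Put(cb)
}
```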