internal: fix GO_AWAY deadlock
A deadlock can occur when a GO_AWAY is followed by a connection closure. This
happens because onClose needlessly closes the current ac.transport: if a
GO_AWAY already occured, and the transport was already reset, then the later
closure (of the original address) sets ac.transport - which is now healthy -
to nil.
The manifestation of this problem is that picker_wrapper spins forever trying
to use a READY connection whose ac.transport is nil.
internal: fix client send preface problems
This CL fixes three problems:
- In clientconn_state_transitions_test.go, sometimes tests would flake because there's not enough buffer to send client side settings, causing the connection to unpredictably enter TRANSIENT FAILURE. Each time we set up a server to send SETTINGS, we should also set up the server to read. This allows the client to successfully send its SETTINGS, unflaking the test.
- In clientconn.go, we incorrectly transitioned into TRANSIENT FAILURE when creating an http2client returned an error. This should be handled in the outer resetTransport main reset loop. The reason this became a problem is that the outer resetTransport has very specific conditions around when to transition into TRANSIENT FAILURE that the egregious transition did not have. So, it could transition into TRANSIENT FAILURE after failing to dial, even if it was trying to connect to a non-final address in the list of addresses.
- In clientconn.go, we incorrectly stay in CONNECTING after `createTransport` when a server sends its connection preface but the client is not able to send its connection preface. This CL causes the addrconn to correctly enter TRANSIENT FAILURE when `createTransport` fails, even if a server preface was received. It does so by making ac.successfulHandshake to consider both server preface received as well as client preface sent.
Closing `ClientConn` sets `balancerWrapper` to nil.
If service config switches balancer, the new balancer will be notified of the existing addresses.
When these two happens together, there's a chance that a method will be called on the nil `balancerWrapper`. This change adds a check to make sure that never happens.
fixes#2367
internal: fix onClose state transitions
When onClose occurs during WaitForHandshake, it should immediately
exit createTransport instead of leaving CONNECTING and entering READY.
Furthermore, when any onClose happens, the state should change to
TRANSIENT FAILURE.
Fixes#2340Fixes#2341
Also fixes an unreported bug in which entering READY causes a
Dial call to end prematurely, instead of blocking until a READY
transport is found.
Google default creds is a combo of ALTS, TLS and OAuth2. The right set of creds will be picked to use based on environment.
This PR contains:
- A new `creds.Bundle` type
- changes to use it in ClientConn and transport
- dial option to set the bundle for a ClientConn
- balancer options and NewSubConnOption to set it for SubConn
- Google default creds implementation by @cesarghali
- grpclb changes to use different creds mode for different servers
- interop client changes for google default creds testing
This fixes a race in ac.tearDown and ac.resetTransport. If ac.resetTransport
is in backoff when ac.tearDown occurs, there's a race between the state
changing to Shutdown and ac.resetTransport calling ac.createTransport.
This fixes it by returning when ac.resetTransport encounters an error
during ac.nextAddr (specifically ac.ctx.Error()). It also fixes it by
making sure that ac.tearDown changes state to Shutdown before canceling
the context.
Both fixes were implemented because they both seem to be valuable
standalone additions: the former makes ac.resetTransport more
understandable and less dependent on behavior happening elsewhere,
and the latter makes ac.tearDown more correct.
Finally, TestDialParseTargetUnknownScheme had its buffer removed; the
buffer was likely added a while ago to assuage this issue. It should
not be necessary anymore.
internal: remove transportMonitor, replace with callbacks
This refactors the internal http2 transport to use callbacks instead
of continuously monitoring the transport in a separate goroutine. This
has several advantages:
- Less goroutines.
- Less complexity: synchronous callbacks are much easier to reason to
reason about than asynchronous monitoring goroutines.
- Callbacks: these provide definitive locations for monitoring the
creation and closure of a transport, paving the way for GracefulStop.
This CL also consolidates all the logic about backoff and iterating
through the list of addresses into a single method.
This is a breaking change, but the transport package was never intended for use outside of grpc. Any current users that we are aware of are incorrect or have a preferred alternative.
This PR splits out grpclb from grpc. I have made the PR in several commits so you can see more clearly the steps that happened.
There are a few possibly contentious points that I would like to make clear up front:
* grpclb will no longer autoload as a load balancer. I think this is okay, as service config is not widely (at all?) used, and I believe this is the only way to access it.
* `internal` is used more, as a way of having code shared between packages without exposing types
* ConnectivityStateEvaluator, as used by grpclb, is no longer thread safe. I believe there is an outer mutex that guards access, but I want to point out this subtle change up here.
All but one tests pass with this, due to another cyclic dependency. I can fix this, but it is a little more widely scoped (such as exposing grpc.server and grpc.errorDesc in the internal package). This PR is a nearly-passing sample of that last step to get this working.
PTAL @menghanl @dfawley
This fixes:
clientconn.go:948:3: should write m = cc.sc.Methods[method[:i+1]] instead of m, _ = cc.sc.Methods[method[:i+1]] (S1005)
encoding/proto/proto_test.go:43:5: should use !bytes.Equal(p.GetBody(), expectedBody) instead (S1004)
resolver/dns/dns_resolver.go:260:2: should merge variable declaration with assignment on next line (S1021)
resolver/dns/dns_resolver.go:344:2: should use 'return <expr>' instead of 'if <expr> { return <bool> }; return <bool>' (S1008)
This change introduces some behavior changes that should not impact users that
are following the proper stream protocol. Specifically, one of the following
conditions must be satisfied:
1. The user calls Close on the ClientConn.
2. The user cancels the context provided to NewClientStream, or its deadline
expires. (Note that it if the context is no longer needed before the deadline
expires, it is still recommended to call cancel to prevent bloat.) It is always
recommended to cancel contexts when they are no longer needed, and to
never use the background context directly, so all users should always be
doing this.
3. The user calls RecvMsg (or Recv in generated code) until a non-nil error is
returned.
4. The user receives any error from Header or SendMsg (or Send in generated
code) besides io.EOF. If none of the above happen, this will leak a goroutine
and a context, and grpc will not call the optionally-configured stats handler
with a stats.End message.
Before this change, if a user created a stream and the server ended the stream,
the stats handler would be invoked with a stats.End containing the final status
of the stream. Subsequent calls to RecvMsg would then trigger the stats handler
with InPayloads, which may be unexpected by stats handlers.
WithBalancerName dial option specifies the name of the balancer to be used by the ClientConn. Service config updates can NOT override the balancer option.
This allows ClientConn to get more up-to-date addresses from resolver.
ClientConn compares new addresses with the cached ones. So if resolver returns the same set of addresses, ClientConn will not notify balancer about it.
Also moved the initialization of resolver and balancer to avoid race. Balancer will only be started when ClientConn gets resolved addresses from balancer.
- The transport is now responsible for closing its own connection when an error
occurs or when the context given to it in NewClientTransport() is canceled.
- Remove client/server shutdown channels -- add cancel function to allow
self-cancellation.
- Plumb the clientConn's context into the client transport to allow for the
transport to be canceled even after it has been removed from the ac (due to
graceful close) when the ClientConn is closed.
A leak happens when DialContext times out before a balancer returns any
addresses or before a successful connection is established.
The loop in ClientConn.lbWatcher breaks and doneChan never gets closed.
- All logs use 1 severity level instead of printf
- All transport logs only go to verbose level 2+
- The default logger only log errors and verbosity level 1
- Add environment variable GRPC_GO_LOG_SEVERITY_LEVEL and GRPC_GO_LOG_VERBOSITY_LEVEL to set severity or verbosity levels for the default logger
When timeout is not hit `time.After` will leak unnecessary timer, so
it's better to stop timer explicitly.
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
With this change, the default dialer checks environment variables to see if proxy is needed. If so, it dials to the proxy and does an HTTP CONNECT handshake.
This modifies the WithBlock behavior somewhat to block until there is at least
one valid connection. Previously, each connection would be made serially until
all had completed successfully, with any errors returned to the caller. Errors
are now only returned due to connecting to a backend if a balancer is not used,
or if there is an error starting the balancer itself.
Fixes#976
The :authority pseudo-header for a gRPC Client defaults to the host
portion of the dialed target and can only be overwritten by providing a
TransportCredentials. However, there are cases where setting this header
independent of any tranport security is valid. In my particular case,
in order to leverage Envoy for request routing, the cluster/service name
must be provided in the :authority header. This may also be useful in a
testing context.
This patch adds a DialOption to overwrite the authority header,
even if TransportCredentials are provided (I'd imagine you'd only ever
need to specify one or the other).
* Add the initial service config support
* start scWatcher later
* remove timeoutCh
* address the comments
* deal with dial timeout
* defer cancel for the newly created context for correct lifetime management
* fix the defer order
* added other 2 missing cancels