Add document describing current problem with context propagation (#910)

* Add document describing current problem with context propagation * Update docs/contributing/inter-thread-context-propagation.md Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com> * Update docs/contributing/inter-thread-context-propagation.md Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com> * More clarifications * More clarifications * Better example Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
2020-08-12 20:33:07 +03:00 · 2020-08-12 20:33:07 +03:00 · e26801482c
parent 51e06100ca
commit e26801482c
2 changed files with 107 additions and 2 deletions
--- a/docs/contributing/inter-thread-context-propagation.md
+++ b/docs/contributing/inter-thread-context-propagation.md
@ -0,0 +1,106 @@
 # The story of context propagation across threads
 ## The need
 Take a look at the following two pseudo-code snippets (see below for explanations).
 ```
 Executor pool = Executors.newFixedThreadPool(10)
 public void doGet(HttpServletRequest request, HttpServletResponse response) {
    Future f1 = pool.submit(() -> {
        return userRepository.queryShippingAddress(requet)
    })
    Future f2 = pool.submit(() -> {
        return warehouse.currentState(requet)
    })
    writeResponse(response, f1.get(), f2.get())
 }
 ```
 ```
 Executor pool = Executors.newFixedThreadPool(10)
 public void doGet(HttpServletRequest request, HttpServletResponse response) {
    final AsyncContext acontext = request.startAsync();
    acontext.start(() -> {
            String address = userRepository.queryShippingAddress(requet)
            HttpServletResponse response = acontext.getResponse();
            writeResponse(response, address)
            acontext.complete();
   }
 }
 ```
 In both cases request processing requires some potentially long operation and application developer
 wants to do them off the main thread. In the first case this hand-off between request accepting thread
 and request processing thread happens manually, by submitting work into some thread pool.
 In the second case it is the framework that handles separate thread pool and passing work to it.
 In cases like this proper tracing solution should still combine into a single trace all the work
 required for request processing, regardless in what thread that work happened. With proper
 parent-child relationship between span: span representing shipping address query should be the child
 of the span which denotes accepting HTTP request.
 ## The solution
 Java auto instrumentation uses an obvious solution to the requirement above: we attach current execution
 context (represented in the code by `io.grpc.Context`) with each `Runnable`, `Callable` and `ForkJoinTask`.
 "Current" means the context active on the thread which calls `Executor.execute` (and its analogues
 such as `submit`, `invokeAll` etc) at the moment of that call. Whenever some other thread starts
 actual execution of that `Runnable` (or `Callable` or `ForkJoinTask`), that context get restored
 on that thread for the duration of the execution. This can be illustrated by the following pseudo-code:
 ```
    var job = () -> {
        try(Scope scope = withScopedContext(this.context)) {
            return userRepository.queryShippingAddress(requet)
        }}
    job.context = Context.current()
    Future f1 = pool.submit()
 ```
 ## The drawback
 Here is a simplified example of what async servlet processing may look like
 ```
 protected void service(HttpServletRequest req, HttpServletResponse resp) {
    //This method is instrumented and we start new scope here
    AsyncContext context = req.startAsync()
    // When the runnable below is being submitted by servlet engine to an executor service
    // it will capture the current context (together with the current span) with it
    context.start {
        // When Runnable starts, we reactive the captured context
        // So this method is executed with the same context as the original "service" method
        resp.writer.print("Hello world!")
        context.complete()
    }
 }
 ```
 If we now take a look inside `context.complete` method from above it may be implemented like this:
 ```
 //Here we still have the same context from above active
 //It gets attached to this new runnable
 pool.submit(new AcceptRequestRunnable() {
 // The same context from above is propagated here as well
 // Thus new reqeust processing will start while having a context active with some span inside
 // That span will be used as parent spans for new spans created for a new request
    ...
 })
 ```
 This means that mechanism described in the previous section will propagate the execution context
 of one request processing to a thread accepting some next, unrelated, request.
 This will result in spans representing the accepting and processing of the second request will join
 the same trace as those of the first span. This mistakenly correlates unrelated requests and may lead
 to huge traces being active for hours and hours.
 In addition this makes some of our tests extremely flaky.
 ## The currently accepted trade-offs
 We acknowledge the problem with too active context propagation. We still think that out of the box
 support for asynchronous multi-threaded traces is very important. We have diagnostics in place to
 help us with detecting when we too eagerly propagate the execution context too far. We hope to
 gradually find framework-specific countermeasures to such problem and solve them one by one.
 In the meantime, processing new incoming request in the given JVM and creating new `SERVER` span
 always starts with a clean context.
--- a/instrumentation/servlet/servlet-3.0/src/test/groovy/TestServlet3.groovy
+++ b/instrumentation/servlet/servlet-3.0/src/test/groovy/TestServlet3.groovy
@ -16,11 +16,10 @@
 import groovy.servlet.AbstractHttpServlet
 import io.opentelemetry.auto.test.base.HttpServerTest
-
+import java.util.concurrent.Phaser
 import javax.servlet.annotation.WebServlet
 import javax.servlet.http.HttpServletRequest
 import javax.servlet.http.HttpServletResponse
 import java.util.concurrent.Phaser
 import static io.opentelemetry.auto.test.base.HttpServerTest.ServerEndpoint.ERROR
 import static io.opentelemetry.auto.test.base.HttpServerTest.ServerEndpoint.EXCEPTION