New C++ bindings threading guide

The new C++ bindings accommodate a diverse set of threading models. Depending on the architecture of your application, there are different classes and usage styles to choose from. This document covers the tools and techniques to use FIDL in non-trivial threading environments.

Background: life of a FIDL connection

Within the lifetime of a FIDL connection, these occurrences are significant from the perspective of thread-safety and preventing use-after-free:

Figure: user code makes to-binding calls on FIDL binding objects, the bindings make to-user calls on user code, and teardown cancels all of those calls.

To-binding calls: these are calls made by user code on a FIDL messaging object, i.e. inbound from the perspective of the FIDL runtime. For example:
- Making a FIDL method call on a client is a to-binding call.
- Making a reply from a server implementation using completers is also a to-binding call.
To-user calls: these are calls made by the FIDL runtime on user objects (including callbacks provided by the user), i.e. outbound from the perspective of the FIDL runtime. For example:
- A server message dispatcher invoking FIDL method handlers on a server implementation is making to-user calls.
- A FIDL client delivering the response to a two-way FIDL method to the user via a callback is also a to-user call.
- Error handlers are also to-user calls.
To-user calls are also sometimes called "upcalls" since the user objects are one layer above the FIDL bindings from the bindings' perspective.
Teardown: actions that stop the message dispatch. In particular, when teardown is complete, no more to-user calls will be made by the bindings; to-binding calls will fail or produce void/trivial effects. Examples:
- An error happening during dispatch.
- Destroying a {fidl,fdf}::[Wire]Client.
- Calling {fidl,fdf}::[Wire]SharedClient::AsyncTeardown().
Teardown usually leads to the closing of the client/server endpoint.
Unbind: actions that stop the message dispatch, and additionally recover the client/server endpoint that was used to send and receive messages. Doing so necessarily involves teardown. Examples:
- Calling fidl::ServerBindingRef::Unbind().
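To make these terms concrete, here is a minimal single-threaded model in standard C++. It is a sketch, not the FIDL API: `ToyBinding`, `Call`, `DeliverMessage`, `Teardown`, and `Unbind` are hypothetical names, and the endpoint is represented by a plain `int`. It shows to-user calls stopping after teardown, to-binding calls failing after teardown, and unbind handing the endpoint back.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A hypothetical, single-threaded model of a FIDL binding. It is NOT the
// real bindings API; the "endpoint" is just an int for illustration.
class ToyBinding {
 public:
  using Handler = std::function<void(const std::string&)>;

  ToyBinding(int endpoint, Handler handler)
      : endpoint_(endpoint), handler_(std::move(handler)) {}

  // To-binding call (user code -> binding). Fails once torn down.
  bool Call(const std::string& message) {
    if (torn_down_) return false;
    sent_.push_back(message);
    return true;
  }

  // Simulates the dispatcher reading an inbound message. Delivering it is a
  // to-user call, which never happens after teardown.
  void DeliverMessage(const std::string& message) {
    if (torn_down_) return;
    handler_(message);
  }

  // Teardown: stop dispatching and close (drop) the endpoint.
  void Teardown() {
    torn_down_ = true;
    handler_ = nullptr;  // Release the reference to user code.
    endpoint_.reset();
  }

  // Unbind: teardown, but hand the endpoint back to the caller.
  std::optional<int> Unbind() {
    std::optional<int> endpoint = std::exchange(endpoint_, std::nullopt);
    Teardown();
    return endpoint;
  }

  const std::vector<std::string>& sent() const { return sent_; }

 private:
  std::optional<int> endpoint_;
  Handler handler_;
  bool torn_down_ = false;
  std::vector<std::string> sent_;
};
```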

Use-after-free risks during teardown

When destroying a set of related objects including FIDL clients or servers, care must be taken to order their destruction such that to-user calls made by the FIDL bindings runtime do not end up calling into a destroyed object.

To give a concrete example, suppose a MyDevice object owns a FIDL client and makes a number of two-way FIDL calls, each time passing a lambda that captures this as the result callback. It is unsafe to destroy MyDevice while the client may still be dispatching messages. This is often the case when the user destroys MyDevice (or other business objects) from a non-dispatcher thread, i.e. not the thread that is monitoring and dispatching messages for the current FIDL binding.

Similar use-after-free risks exist at destruction time when handling events on a client and when handling method calls on a server.

There are a few solutions to this problem, all in the spirit of adding mutual exclusion between the destruction of user objects and to-user calls:

1. Scheduling: ensure that the destruction of relevant user objects is never scheduled in parallel with any to-user calls.
2. Reference-counting: reference-count the user objects such that they are not destroyed until the binding teardown is complete.
3. Two-phase shutdown: provide a notification when binding teardown is complete, so that the user can arrange for the business objects to be destroyed after that.

The C++ bindings natively support all of the above approaches. Reference counting is inappropriate in some situations, so it is an opt-in feature when using the bindings.

Client-side threading

There are two client types that support async operations: fidl::Client and fidl::SharedClient. For a precise reference of their semantics, refer to their documentation in the client header.

The threading behavior of fidl::Client also applies to fidl::WireClient. Similarly, the threading behavior of fidl::SharedClient extends to fidl::WireSharedClient.

Client

fidl::Client supports solution #1 (scheduling) by enforcing that it is used from a synchronized dispatcher that reads and handles messages from the channel:

You may make FIDL method calls only from tasks running on that dispatcher.
The client object itself cannot be moved to another object which is then destroyed from tasks running on other dispatchers.

This ensures that the containing user object is not destroyed while a FIDL message or error event is being dispatched. It is suitable for single-threaded and object-oriented usage styles.

fidl::Client can only be used with a synchronized async dispatcher. One particular usage of async::Loop is to create a single worker thread via loop.StartThread(), and then join that thread and shut down the loop via loop.Shutdown() from a different thread. Here, two threads are technically involved, but this is safe from the perspective of mutually exclusive access, and fidl::Client is designed to allow this usage.
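The following standalone sketch models that arrangement with the standard library instead of `async::Loop`: one worker thread runs every posted task, and a different thread calls `Shutdown()`, which joins the worker. `ToyLoop`, `DemoLoop`, and `LoopDemoResult` are hypothetical; this model simply drops tasks still queued at shutdown, which may differ from what `async::Loop` does.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a synchronized dispatcher with one worker thread.
// Not async::Loop; in this model, tasks still queued at shutdown are dropped.
class ToyLoop {
 public:
  ~ToyLoop() { Shutdown(); }

  void Post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (shutdown_) return;
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

  // Like loop.StartThread(): a single worker runs every task, so tasks never
  // execute in parallel with each other.
  void StartThread() { worker_ = std::thread([this] { Run(); }); }

  // Like loop.Shutdown(), called from a different thread. Joining the worker
  // guarantees that no task is running or will run after this returns.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      shutdown_ = true;
      tasks_.clear();
    }
    cv_.notify_all();
    if (worker_.joinable()) worker_.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (true) {
      cv_.wait(lock, [this] { return shutdown_ || !tasks_.empty(); });
      if (shutdown_) return;
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      lock.unlock();
      task();  // Run user code without holding the queue lock.
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool shutdown_ = false;
  std::thread worker_;
};

struct LoopDemoResult {
  int tasks_run;
  bool ran_on_worker_thread;
};

// Runs two tasks on the worker, then shuts the loop down from this thread.
LoopDemoResult DemoLoop() {
  ToyLoop loop;
  loop.StartThread();
  int tasks_run = 0;  // Only written by the worker before join().
  std::thread::id worker_id;
  std::promise<void> done;
  loop.Post([&] { worker_id = std::this_thread::get_id(); ++tasks_run; });
  loop.Post([&] { ++tasks_run; done.set_value(); });
  done.get_future().wait();
  loop.Shutdown();  // Joins the worker, so its writes are visible here.
  return {tasks_run, worker_id != std::this_thread::get_id()};
}
```

Because only one worker ever runs tasks, any object that is created, used, and destroyed from those tasks never sees two of them concurrently, which is the property the scheduling approach relies on.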

fidl::Client reports errors via the on_fidl_error virtual method of the event handler. User-initiated teardown (e.g. by destroying the client) is not reported as an error to the event handler.

fidl::Client does not own the event handler. Instead, the user object that owns the client may implement the event handling interface and pass a borrowed pointer to itself to the client.

A typical usage of fidl::Client may look like the following:

class MyDevice : public fidl::AsyncEventHandler<MyProtocol> {
 public:
  MyDevice(fidl::ClientEnd<MyProtocol> client_end, async_dispatcher_t* dispatcher) {
    client_.Bind(std::move(client_end), dispatcher, /* event_handler */ this);
  }

  void on_fidl_error(fidl::UnbindInfo error) override {
    // Handle errors...
  }

  void DoThing() {
    // Capture |this| such that the |MyDevice| instance may be accessed
    // in the callback. This is safe because destroying |client_| silently
    // discards all pending callbacks registered through |Then|.
    client_->Foo(args).Then([this] (fidl::Result<MyProtocol::Foo>&) { ... });
  }

 private:
  fidl::Client<MyProtocol> client_;
};

Notice that nothing in particular is needed when MyDevice is destroyed: the client binding will be torn down as part of the process, and the threading checks performed by fidl::Client are sufficient to prevent this class of use-after-free.

Additional use-after-free risks with `ThenExactlyOnce`

When a client object is destroyed, pending callbacks registered through ThenExactlyOnce will asynchronously receive a cancellation error. Care is needed to ensure any lambda captures are still alive. For example, if an object contains a fidl::Client and captures this in async method callbacks, then manipulating the captured this within the callbacks after destroying the object will lead to use-after-free. To avoid this, use Then to register callbacks when the receiver object is destroyed together with the client. Using the MyDevice example above:

void MyDevice::DoOtherThing() {
  // Incorrect:
  client_->Foo(request).ThenExactlyOnce([this] (fidl::Result<MyProtocol::Foo>& result) {
    // If |MyDevice| is destroyed, this pending callback will still run.
    // The captured |this| pointer will be invalid.
  });

  // Correct:
  client_->Foo(request).Then([this] (fidl::Result<MyProtocol::Foo>& result) {
    // The callback is silently dropped if |client_| is destroyed.
  });
}
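The difference between the two registration methods can be modeled in a few lines of standard C++. In this hypothetical single-threaded `ToyClient` (not `fidl::Client`), destroying the client drops callbacks registered with `Then` and invokes callbacks registered with `ThenExactlyOnce` with a cancellation result. As a simplification, the model delivers the cancellation synchronously from the destructor, whereas the text above says the real client does so asynchronously.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical result: either a reply or a cancellation.
struct ToyResult {
  bool canceled;
  std::string reply;
};

// Hypothetical single-threaded client modeling Then vs ThenExactlyOnce.
// Not fidl::Client: cancellations are delivered synchronously here.
class ToyClient {
 public:
  using Callback = std::function<void(ToyResult&)>;

  ~ToyClient() {
    for (auto& entry : pending_) {
      Pending& pending = entry.second;
      if (pending.exactly_once) {
        ToyResult canceled{true, ""};
        pending.callback(canceled);  // Still runs: its captures must be alive.
      }
      // Callbacks registered with Then are dropped without being called.
    }
  }

  int Then(Callback cb) { return Register(std::move(cb), false); }
  int ThenExactlyOnce(Callback cb) { return Register(std::move(cb), true); }

  // Simulates the reply to transaction |txid| arriving.
  void DeliverReply(int txid, std::string reply) {
    auto it = pending_.find(txid);
    if (it == pending_.end()) return;
    Callback cb = std::move(it->second.callback);
    pending_.erase(it);
    ToyResult result{false, std::move(reply)};
    cb(result);
  }

 private:
  struct Pending {
    Callback callback;
    bool exactly_once;
  };

  int Register(Callback cb, bool exactly_once) {
    int txid = next_txid_++;
    pending_.emplace(txid, Pending{std::move(cb), exactly_once});
    return txid;
  }

  int next_txid_ = 1;
  std::map<int, Pending> pending_;
};
```

If a `ThenExactlyOnce` callback captured a pointer to the object that owns the client, the cancellation call in `~ToyClient` would run while that owner is being destroyed, which is the use-after-free described above.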

You may use ThenExactlyOnce when the callback captures objects that need to be used exactly once, such as when propagating errors from a client call used as part of fulfilling a server request:

class MyServer : public fidl::Server<FooProtocol> {
 public:
  void FooMethod(FooMethodRequest& request, FooMethodCompleter::Sync& completer) override {
    bar_->client->Bar().ThenExactlyOnce(
        [completer = completer.ToAsync()] (fidl::Result<BarProtocol::Bar>& result) mutable {
          if (!result.is_ok()) {
            completer.Reply(result.error_value().status());
            return;
          }
          // ... more processing
        });
  }

 private:
  struct BarManager {
    fidl::Client<BarProtocol> client;
    /* Other internal state... */
  };

  std::unique_ptr<BarManager> bar_;
};

In the above example, if the server would like to re-initialize bar_ while keeping FooProtocol connections alive, it may use ThenExactlyOnce so that pending FooMethod calls are replied to with a cancellation error, or it may introduce retry logic.
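A hypothetical standalone model of `MyServer` illustrates why `ThenExactlyOnce` fits here: every in-flight `FooMethod` gets exactly one reply, even when `bar_` is replaced. `ToyCompleter`, `ToyBarClient`, `ToyServer`, and the `kOk`/`kCanceled` status codes are assumptions for this sketch. A real async completer is move-only, whereas this one shares its reply slot so that `std::function` can copy it.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical status codes for this sketch.
constexpr int kOk = 0;
constexpr int kCanceled = -1;

// Hypothetical async completer that records its single reply in shared
// storage so the caller can inspect it. Replying twice trips the assert.
class ToyCompleter {
 public:
  explicit ToyCompleter(std::shared_ptr<std::optional<int>> reply)
      : reply_(std::move(reply)) {}

  void Reply(int status) {
    assert(!reply_->has_value() && "replied twice");
    *reply_ = status;
  }

 private:
  std::shared_ptr<std::optional<int>> reply_;
};

// Hypothetical stand-in for fidl::Client<BarProtocol> that only supports
// ThenExactlyOnce: each callback runs once, with kCanceled on destruction.
class ToyBarClient {
 public:
  using Callback = std::function<void(int status)>;

  ~ToyBarClient() {
    for (Callback& cb : pending_) cb(kCanceled);
  }

  void BarThenExactlyOnce(Callback cb) { pending_.push_back(std::move(cb)); }

  // Simulates replies arriving for every outstanding Bar call.
  void CompleteAll(int status) {
    std::vector<Callback> pending = std::move(pending_);
    pending_.clear();
    for (Callback& cb : pending) cb(status);
  }

 private:
  std::vector<Callback> pending_;
};

// Mirrors MyServer: FooMethod forwards to Bar and replies from the callback.
class ToyServer {
 public:
  ToyServer() : bar_(std::make_unique<ToyBarClient>()) {}

  void FooMethod(ToyCompleter completer) {
    bar_->BarThenExactlyOnce(
        [completer](int status) mutable { completer.Reply(status); });
  }

  // Re-initializes the Bar connection; pending FooMethod calls are still
  // answered, because destroying the old client cancels their callbacks.
  void ResetBar() { bar_ = std::make_unique<ToyBarClient>(); }

  ToyBarClient& bar() { return *bar_; }

 private:
  std::unique_ptr<ToyBarClient> bar_;
};
```

Had `FooMethod` used a `Then`-style registration instead, `ResetBar()` would silently drop the callback and the completer would never reply.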

SharedClient

fidl::SharedClient supports solution #2 (reference counting) and solution #3 (two-phase shutdown). You may make FIDL calls on a SharedClient from arbitrary threads, and use the shared client with any kind of async dispatcher. Unlike Client, where destroying a client immediately guarantees that there are no more to-user calls, destroying a SharedClient merely initiates asynchronous binding teardown. The user may observe the completion of the teardown asynchronously. In turn, this allows moving or cloning a SharedClient to a thread other than the dispatcher thread, and destroying or calling teardown on a client while there are parallel to-user calls (e.g. a response callback). Those two actions will race (the response callback might be canceled if the client is destroyed early enough), but SharedClient will never make any more to-user calls once it notifies its teardown completion.

There are two ways to observe teardown completion:

Owned event handler
Custom teardown observer

Owned event handler

Transfer ownership of an event handler to the client, in the form of a std::unique_ptr<fidl::AsyncEventHandler<Protocol>>, when binding the client. After teardown is complete, the event handler will be destroyed. It is safe to destroy the user objects referenced by any client callbacks from within the event handler destructor.

Here is an example showing this pattern:

void OwnedEventHandler(async_dispatcher_t* dispatcher, fidl::ClientEnd<Echo> client_end) {
  // Define some blocking futures to maintain a consistent sequence of events
  // for the purpose of this example. Production code usually won't need these.
  std::promise<void> teardown;
  std::future<void> teardown_complete = teardown.get_future();
  std::promise<void> reply;
  std::future<void> got_reply = reply.get_future();

  // Define the event handler for the client. The event handler is always
  // placed in a |std::unique_ptr| in the owned event handler pattern.
  // When the |EventHandler| is destroyed, we know that binding teardown
  // has completed.
  class EventHandler : public fidl::AsyncEventHandler<Echo> {
   public:
    explicit EventHandler(std::promise<void>& teardown, std::promise<void>& reply)
        : teardown_(teardown), reply_(reply) {}

    void on_fidl_error(fidl::UnbindInfo error) override {
      // This handler is invoked by the bindings when an error causes it to
      // teardown prematurely. Note that additional cleanup is typically
      // performed in the destructor of the event handler, since both manually
      // initiated teardown and error teardown will destroy the event handler.
      std::cerr << "Error in Echo client: " << error;

      // In this example, we abort the process when an error happens. Production
      // code should handle the error gracefully (by cleanly exiting or attempting
      // to recover).
      abort();
    }

    ~EventHandler() override {
      // Additional cleanup may be performed here.

      // Notify the outer function.
      teardown_.set_value();
    }

    // Regular event handling code is also supported.
    void OnString(fidl::Event<Echo::OnString>& event) override {
      std::string response(event.response().data(), event.response().size());
      std::cout << "Got event: " << response << std::endl;
    }

    void OnEchoStringResponse(fuchsia_examples::EchoEchoStringResponse& response) {
      std::string reply(response.response().data(), response.response().size());
      std::cout << "Got response: " << reply << std::endl;

      if (!notified_reply_) {
        reply_.set_value();
        notified_reply_ = true;
      }
    }

   private:
    std::promise<void>& teardown_;
    std::promise<void>& reply_;
    bool notified_reply_ = false;
  };
  std::unique_ptr handler = std::make_unique<EventHandler>(teardown, reply);
  EventHandler* handler_ptr = handler.get();

  // Create a client that owns the event handler.
  fidl::SharedClient client(std::move(client_end), dispatcher, std::move(handler));

  // Make an EchoString call, passing it a callback that captures the event
  // handler.
  client->EchoString({"hello"}).ThenExactlyOnce(
      [handler_ptr](fidl::Result<Echo::EchoString>& result) {
        ZX_ASSERT(result.is_ok());
        auto& response = result.value();
        handler_ptr->OnEchoStringResponse(response);
      });
  got_reply.wait();

  // Make another call but immediately start binding teardown afterwards.
  // The reply may race with teardown; the callback is always canceled if
  // teardown finishes before a response is received.
  client->EchoString({"hello"}).ThenExactlyOnce(
      [handler_ptr](fidl::Result<Echo::EchoString>& result) {
        if (result.is_ok()) {
          auto& response = result.value();
          handler_ptr->OnEchoStringResponse(response);
        } else {
          // Teardown finished first.
          ZX_ASSERT(result.error_value().is_canceled());
        }
      });

  // Begin tearing down the client.
  // This does not have to happen on the dispatcher thread.
  client.AsyncTeardown();

  teardown_complete.wait();
}

Custom teardown observer

Provide an instance of fidl::AnyTeardownObserver to the bindings. The observer will be notified when teardown is complete. There are several ways to create a teardown observer:

fidl::ObserveTeardown takes an arbitrary callable and wraps it in a teardown observer:

  fidl::SharedClient<Echo> client;

  // Let's say |my_object| is constructed on the heap;
  MyObject* my_object = new MyObject;
  // ... and needs to be freed via `delete`.
  auto observer = fidl::ObserveTeardown([my_object] {
    std::cout << "client is tearing down" << std::endl;
    delete my_object;
  });

  // |my_object| may implement |fidl::AsyncEventHandler<Echo>|.
  // |observer| will be notified and destroy |my_object| after teardown.
  client.Bind(std::move(client_end), dispatcher, my_object, std::move(observer));

fidl::ShareUntilTeardown takes a std::shared_ptr<T>, and arranges the binding to destroy its shared reference after teardown:

  fidl::SharedClient<Echo> client;

  // Let's say |my_object| is always managed by a shared pointer.
  std::shared_ptr<MyObject> my_object = std::make_shared<MyObject>();

  // |my_object| will be kept alive as long as the binding continues
  // to exist. When teardown completes, |my_object| will be destroyed
  // only if there are no other shared references (such as from other
  // related user objects).
  auto observer = fidl::ShareUntilTeardown(my_object);
  client.Bind(std::move(client_end), dispatcher, my_object.get(), std::move(observer));

Users may create custom teardown observers that work with other pointer types, e.g. fbl::RefPtr<T>.
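As a rough sketch of what such an observer does, the following models a teardown observer as a one-shot callback that holds a reference and drops it when teardown completes. `ToyTeardownObserver`, `ToyShareUntilTeardown`, and `Tracked` are hypothetical and are not `fidl::AnyTeardownObserver` or `fidl::ShareUntilTeardown`; consult the bindings headers for how real custom observers are constructed. The template accepts any copyable smart pointer that has a `reset()` member; `std::shared_ptr` is used in the test.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical teardown observer: a one-shot callback run when the binding
// reports that teardown is complete. Not fidl::AnyTeardownObserver.
class ToyTeardownObserver {
 public:
  explicit ToyTeardownObserver(std::function<void()> on_teardown)
      : on_teardown_(std::move(on_teardown)) {}

  void NotifyTeardownComplete() {
    std::function<void()> callback = std::move(on_teardown_);
    on_teardown_ = nullptr;
    if (callback) callback();
  }

 private:
  std::function<void()> on_teardown_;
};

// Holds a copy of |ptr| until teardown, then releases it. Works for any
// copyable smart pointer type with a reset() member.
template <typename SmartPtr>
ToyTeardownObserver ToyShareUntilTeardown(SmartPtr ptr) {
  return ToyTeardownObserver([ptr = std::move(ptr)]() mutable { ptr.reset(); });
}

// Test helper that records when it is destroyed.
struct Tracked {
  explicit Tracked(bool* flag) : destroyed(flag) {}
  ~Tracked() { *destroyed = true; }
  bool* destroyed;
};
```

The object is destroyed at teardown only when the observer held the last reference, matching the behavior described for `fidl::ShareUntilTeardown`.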

SharedClient caters to systems where business logic state is managed by a framework (drivers are one example, where the driver runtime is the managing framework). In this case, the bindings runtime and the framework co-own the user objects: the bindings runtime informs the framework once it has surrendered all references to the user objects, at which point the framework can schedule their destruction, modulo any other asynchronous teardown processes still in progress for the same group of objects. Asynchronous teardown does not require synchronizing with arbitrary to-user calls, which helps to prevent deadlocks.

The pattern of initiating teardown first, then destroying the user objects after teardown completes, is sometimes called two-phase shutdown.
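Two-phase shutdown can be sketched with standard threads. In this hypothetical model (not the `fidl::SharedClient` API), a dedicated thread stands in for the dispatcher and makes to-user calls on `UserObject`. `AsyncTeardown()` only requests teardown, the binding reports completion after its last to-user call, and only then is the user object destroyed. `ToySharedBinding`, `UserObject`, and `DemoTwoPhaseShutdown` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical user object; counts to-user calls made on it.
struct UserObject {
  std::atomic<int> upcalls{0};
};

// Hypothetical binding whose "dispatcher" thread makes to-user calls until
// teardown is requested. Not the fidl::SharedClient API.
class ToySharedBinding {
 public:
  ToySharedBinding(UserObject* user, std::function<void()> on_teardown_complete)
      : user_(user), on_teardown_complete_(std::move(on_teardown_complete)) {
    dispatcher_ = std::thread([this] { Dispatch(); });
  }
  ~ToySharedBinding() { dispatcher_.join(); }

  // Phase 1: callable from any thread; only *requests* teardown.
  void AsyncTeardown() { teardown_requested_.store(true); }

 private:
  void Dispatch() {
    while (!teardown_requested_.load()) {
      user_->upcalls.fetch_add(1);  // A to-user call.
      std::this_thread::yield();
    }
    // The last to-user call has returned; report teardown completion.
    on_teardown_complete_();
  }

  UserObject* user_;
  std::function<void()> on_teardown_complete_;
  std::atomic<bool> teardown_requested_{false};
  std::thread dispatcher_;
};

struct TwoPhaseResult {
  int upcalls_at_completion;
  int upcalls_after_delay;
};

TwoPhaseResult DemoTwoPhaseShutdown() {
  auto user = std::make_unique<UserObject>();
  std::promise<void> teardown_complete;
  std::future<void> completed = teardown_complete.get_future();
  ToySharedBinding binding(user.get(), [&] { teardown_complete.set_value(); });

  binding.AsyncTeardown();  // Phase 1: initiate teardown.
  completed.wait();         // Wait for the completion notification.

  TwoPhaseResult result;
  result.upcalls_at_completion = user->upcalls.load();
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  result.upcalls_after_delay = user->upcalls.load();  // No further upcalls.

  user.reset();  // Phase 2: destruction is now safe.
  return result;
}
```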

Simple decision tree

When in doubt, here are some rules of thumb to follow when deciding which client type to use:

If your app is single-threaded, use Client.
If your app is multi-threaded but consists of multiple synchronized dispatchers, and you can guarantee that each client is only bound, destroyed, and called from tasks on its respective dispatcher: you can still use Client.
If your app is multi-threaded and the FIDL clients are not guaranteed to be used only from their respective dispatchers: use SharedClient and take on the two-phase shutdown complexity.

Server-side threading

fidl::Client and fidl::SharedClient both tear down the binding when they are destroyed. Unlike clients, there is no RAII type on the server side that tears down the binding. The rationale is that servers in simpler applications are created in response to a connection attempt made by a client, and often stay around processing client requests until the client closes its endpoint. When the application is shutting down, the user may shut down the async dispatcher, which then synchronously tears down all server bindings associated with it.

As applications grow more complex, however, there are scenarios that call for proactively shutting down server implementation objects, which involves tearing down the server bindings. Drivers, for example, need to stop relevant servers when the device is removed.

There are two ways a server can voluntarily tear down the binding on its end:

fidl::ServerBindingRef::Close or fidl::ServerBindingRef::Unbind.
SomeCompleter::Close where SomeCompleter is a method completer provided to a server method handler.

For a precise reference of their semantics, refer to their documentation in the server header.

All of the methods above only initiate teardown, and hence may safely race with in-progress operations or parallel to-user calls (e.g. method handlers). The trade-off is that some care is needed in managing the lifetime of the server implementation object. There are two cases:

Initiating teardown from the synchronized dispatcher
Initiating teardown from an arbitrary thread

Initiating teardown from the synchronized dispatcher

When the async dispatcher (async_dispatcher_t*) passed to fidl::BindServer is a synchronized dispatcher, and teardown is initiated from tasks running on that dispatcher (e.g. from within a server method handler), the binding will not make any calls on the server object after Unbind/Close returns. It is safe to destroy the server object at this point.

If an unbound handler is specified, the binding will make one final to-user call, to the unbound handler, soon after, usually at the next iteration of the event loop. The unbound handler has the following signature:

// |impl| is the pointer to the server implementation.
// |info| contains the reason for binding teardown.
// |server_end| is the server channel endpoint.
// |Protocol| is the type of the FIDL protocol.
void OnUnbound(ServerImpl* impl, fidl::UnbindInfo info,
               fidl::ServerEnd<Protocol> server_end) {
  // If teardown is manually initiated and not due to an error, |info.ok()| will be true.
  if (info.ok())
    return;
  // Handle errors...
}

If the server object was destroyed earlier on, the callback must not access the impl variable as it now points to invalid memory.
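The ordering guarantee for a synchronized dispatcher can be modeled with a single-threaded FIFO task queue. In this hypothetical sketch (`ToyDispatcher`, `ToyServerBinding`, and `DemoUnbindFromHandler` are not the FIDL API), a method handler calls `Unbind()` while another request is already queued: the queued request is never delivered, and the unbound handler runs as a later task.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded dispatcher that runs tasks in FIFO order.
class ToyDispatcher {
 public:
  void Post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Hypothetical server binding (not fidl::BindServer). In this model the
// binding object itself must outlive the tasks it posts.
class ToyServerBinding {
 public:
  using Handler = std::function<void(const std::string&)>;

  ToyServerBinding(ToyDispatcher& dispatcher, Handler handler,
                   std::function<void()> on_unbound)
      : dispatcher_(dispatcher),
        handler_(std::move(handler)),
        on_unbound_(std::move(on_unbound)) {}

  // An inbound request is delivered by a later task on the dispatcher.
  void Receive(std::string request) {
    dispatcher_.Post([this, request = std::move(request)] {
      if (!unbound_) handler_(request);  // No to-user calls after Unbind().
    });
  }

  // Called from a dispatcher task: takes effect immediately, while the
  // unbound handler runs on a later iteration of the loop.
  void Unbind() {
    if (unbound_) return;
    unbound_ = true;
    dispatcher_.Post([this] { on_unbound_(); });
  }

 private:
  ToyDispatcher& dispatcher_;
  Handler handler_;
  std::function<void()> on_unbound_;
  bool unbound_ = false;
};

// The handler for request "a" unbinds while request "b" is already queued.
std::vector<std::string> DemoUnbindFromHandler() {
  ToyDispatcher dispatcher;
  std::vector<std::string> log;
  ToyServerBinding* self = nullptr;
  ToyServerBinding binding(
      dispatcher,
      [&](const std::string& request) {
        log.push_back("handled " + request);
        self->Unbind();
      },
      [&] { log.push_back("unbound"); });
  self = &binding;
  binding.Receive("a");
  binding.Receive("b");
  dispatcher.RunUntilIdle();
  return log;  // {"handled a", "unbound"}: "b" is never delivered.
}
```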

Initiating teardown from an arbitrary thread

If the application cannot guarantee that teardown is always initiated from the synchronized dispatcher, then there could be ongoing to-user calls during teardown. To prevent use-after-free, we may implement a two-phase shutdown pattern similar to the one on the client side.

Suppose a server object is allocated on the heap for each incoming connection request:

// Create an instance of our EchoImpl that destroys itself when the connection closes.
new EchoImpl(dispatcher, std::move(server_end));

We can destroy the server object at the end of the unbound handler callback. The code below does this by deleting the heap-allocated server as the last step of the callback.

class EchoImpl : public fidl::Server<fuchsia_examples::Echo> {
 public:
  // Bind this implementation to a channel.
  EchoImpl(async_dispatcher_t* dispatcher, fidl::ServerEnd<fuchsia_examples::Echo> server_end)
      : binding_(fidl::BindServer(dispatcher, std::move(server_end), this,
                                  // This is a fidl::OnUnboundFn<EchoImpl>.
                                  [this](EchoImpl* impl, fidl::UnbindInfo info,
                                         fidl::ServerEnd<fuchsia_examples::Echo> server_end) {
                                    if (info.is_peer_closed()) {
                                      FX_LOGS(INFO) << "Client disconnected";
                                    } else if (!info.is_user_initiated()) {
                                      FX_LOGS(ERROR) << "Server error: " << info;
                                    }
                                    // This is the last to-user call, so the
                                    // server can safely destroy itself here.
                                    delete this;
                                  })) {}

  // ... Echo method handler overrides elided ...

  // Later, when the server is shutting down (possibly from another thread)...
  void Shutdown() {
    binding_.Unbind();  // Initiates teardown; no new requests are dispatched.
    // The server is destroyed asynchronously in the unbound handler.
  }

 private:
  fidl::ServerBindingRef<fuchsia_examples::Echo> binding_;
};
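The following standalone sketch models why the unbound handler is a safe place to `delete` the server when teardown may start on any thread. The hypothetical `ToyServerBinding` counts in-flight handler calls, and its unbound handler runs exactly once, after the last of them returns, so a slow handler delays the deletion rather than racing with it. Real bindings track this state internally and keep it alive independently of the server object; here, for brevity, the binding is a member of the hypothetical `ToyEchoServer`.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical binding that counts in-flight to-user calls. Unbind() may be
// called from any thread; the unbound handler runs exactly once, after the
// last in-flight handler returns. Not the fidl::ServerBindingRef API.
class ToyServerBinding {
 public:
  explicit ToyServerBinding(std::function<void()> on_unbound)
      : on_unbound_(std::move(on_unbound)) {}

  // Wraps one method handler call. Returns false once unbinding started.
  bool Dispatch(const std::function<void()>& handler) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (unbind_requested_) return false;
      ++in_flight_;
    }
    handler();
    FinishCall();  // May delete the server (and |this|): no member access after.
    return true;
  }

  void Unbind() {
    std::function<void()> callback;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (unbind_requested_) return;
      unbind_requested_ = true;
      if (in_flight_ == 0) callback = std::move(on_unbound_);
    }
    if (callback) callback();
  }

 private:
  void FinishCall() {
    std::function<void()> callback;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (--in_flight_ == 0 && unbind_requested_) callback = std::move(on_unbound_);
    }
    if (callback) callback();  // Outside the lock: it may destroy |this|.
  }

  std::mutex mutex_;
  int in_flight_ = 0;
  bool unbind_requested_ = false;
  std::function<void()> on_unbound_;
};

// Hypothetical heap-allocated server that deletes itself when unbound.
class ToyEchoServer {
 public:
  explicit ToyEchoServer(std::atomic<bool>* destroyed)
      : destroyed_(destroyed), binding_([this] { delete this; }) {}
  ~ToyEchoServer() { destroyed_->store(true); }
  ToyServerBinding& binding() { return binding_; }

 private:
  std::atomic<bool>* destroyed_;
  ToyServerBinding binding_;
};

struct ServerShutdownResult {
  bool alive_while_handler_runs;
  bool destroyed_after_handler;
};

ServerShutdownResult DemoServerShutdown() {
  std::atomic<bool> destroyed{false};
  auto* server = new ToyEchoServer(&destroyed);
  ToyServerBinding& binding = server->binding();
  std::promise<void> started, release;
  std::future<void> started_future = started.get_future();
  std::future<void> release_future = release.get_future();

  // Dispatcher thread: a slow method handler is in flight.
  std::thread dispatcher([&] {
    binding.Dispatch([&] {
      started.set_value();
      release_future.wait();  // Simulates a long-running handler.
    });
  });

  started_future.wait();
  binding.Unbind();  // From another thread: only initiates teardown.
  bool alive = !destroyed.load();
  release.set_value();
  dispatcher.join();  // The unbound handler ran on the dispatcher thread.
  return {alive, destroyed.load()};
}
```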

The two-phase shutdown pattern is necessary to accommodate the possibility of parallel server method handler calls at the point of initiating teardown. The bindings runtime will call the unbound handler after these to-user calls return. In particular, if a server method handler takes a long time to return, the unbinding procedure could be delayed by an equal amount of time. It is recommended to offload long-running handler work to a thread pool and reply asynchronously via completer.ToAsync(), thus ensuring prompt return of method handlers and timely unbinding. The reply will be discarded if the server binding has been torn down in the meantime.
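To illustrate the offloading advice, here is a hypothetical model of an async completer that silently discards its reply if the binding was torn down first. `ToyBindingState`, `ToyAsyncCompleter`, and `DemoOffloadedReply` are assumptions for this sketch, not the type returned by `completer.ToAsync()`, and `std::async` stands in for a thread pool.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical binding state shared with async completers.
struct ToyBindingState {
  std::mutex mutex;
  bool torn_down = false;
  std::optional<int> sent_reply;  // What would have been written to the channel.
};

// Hypothetical async completer, loosely analogous to completer.ToAsync().
class ToyAsyncCompleter {
 public:
  explicit ToyAsyncCompleter(std::weak_ptr<ToyBindingState> binding)
      : binding_(std::move(binding)) {}

  // Returns false if the reply was discarded because of teardown.
  bool Reply(int value) {
    std::shared_ptr<ToyBindingState> state = binding_.lock();
    if (!state) return false;
    std::lock_guard<std::mutex> lock(state->mutex);
    if (state->torn_down) return false;
    state->sent_reply = value;
    return true;
  }

 private:
  std::weak_ptr<ToyBindingState> binding_;
};

// |teardown_first| controls whether teardown happens before the worker
// finishes. Returns whether the reply was actually sent.
bool DemoOffloadedReply(bool teardown_first) {
  auto state = std::make_shared<ToyBindingState>();
  std::promise<void> work_may_finish;
  std::shared_future<void> gate = work_may_finish.get_future().share();

  // Method handler: hand the work and an async completer to another thread,
  // then return immediately so teardown is never blocked on the work.
  std::future<bool> work = std::async(
      std::launch::async,
      [completer = ToyAsyncCompleter(state), gate]() mutable {
        gate.wait();  // Simulates long-running work.
        return completer.Reply(42);
      });

  if (teardown_first) {
    std::lock_guard<std::mutex> lock(state->mutex);
    state->torn_down = true;
  }
  work_may_finish.set_value();
  return work.get();
}
```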

Interacting with the async dispatcher

All asynchronous request/response handling, event handling, and error handling is done through the async_dispatcher_t* provided when binding a client or server. With the exception of shutting down the dispatcher, you can expect that to-user calls will be executed on a dispatcher thread, and not nested within other user code (no reentrancy issues).

If you shut down the dispatcher while there are any active bindings, the teardown may be completed on the thread executing shutdown. As such, while executing async::Loop::Shutdown/async_loop_shutdown, you must not hold any locks that could be taken by the teardown observers provided to fidl::SharedClient or the unbound handler provided to fidl::BindServer. (You should probably ensure that no locks are held around shutdown anyway, since it joins all dispatcher threads, which may take locks in user code.)
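The lock-ordering hazard can be reproduced with a small model. The hypothetical `ToyLoop` below (not `async::Loop`) runs teardown observers on the thread calling `Shutdown()`, and the observer takes a mutex; the sketch updates shared state under that mutex, releases it, and only then shuts down. `DemoShutdownWithoutHoldingLock` is likewise an illustrative name.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical loop whose Shutdown() runs teardown observers on the calling
// thread, modeling the case where teardown completes on the shutdown thread.
class ToyLoop {
 public:
  void AddTeardownObserver(std::function<void()> observer) {
    observers_.push_back(std::move(observer));
  }

  void Shutdown() {
    std::vector<std::function<void()>> observers = std::move(observers_);
    observers_.clear();
    for (auto& observer : observers) observer();
  }

 private:
  std::vector<std::function<void()>> observers_;
};

// Returns the number of live bindings remaining after shutdown.
int DemoShutdownWithoutHoldingLock() {
  std::mutex state_mutex;
  int live_bindings = 1;
  bool accepting_requests = true;
  ToyLoop loop;
  loop.AddTeardownObserver([&] {
    std::lock_guard<std::mutex> lock(state_mutex);  // Same lock as below.
    assert(!accepting_requests);
    --live_bindings;
  });

  {
    std::lock_guard<std::mutex> lock(state_mutex);
    accepting_requests = false;  // Update shared state under the lock...
  }  // ...and release it *before* shutting down.

  // Calling loop.Shutdown() inside the block above would re-lock
  // |state_mutex| on this thread, which is undefined behavior for
  // std::mutex and typically deadlocks.
  loop.Shutdown();

  std::lock_guard<std::mutex> lock(state_mutex);
  return live_bindings;
}
```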