
Out-of-Process SDKs

Author: Agent IO
Do you really want to put that vendor code in your app?

Let’s talk about SDKs

Often we think of SDKs (“software development kits”) as libraries of code that we use and bundle into our apps, and sometimes we hate them for that. They might not be in the language that we want to use, they might not be written very well, or they might be poorly maintained and break our builds or have unpatched vulnerabilities.

The problem with SDKs

I’ve worked on teams that produce SDKs, and I can say pretty confidently that these projects are rarely as high a priority or as well funded as their companies’ other efforts. They are often quixotic labors of love by under-appreciated and under-rewarded engineers.

Why companies make SDKs

Yet companies do have reasons for wanting to publish SDKs. An SDK can make an API easier to use and sometimes even more reliable. SDKs with built-in retry logic let API providers offer more reliable services without worrying that naive customer-initiated retries will overwhelm their servers. SDKs can also cover a multitude of messes, such as when companies rearchitect their serving infrastructure and change the endpoints that their API customers call.

gRPC: the Mothra of SDKs

[Image: Mothra on a building]

gRPC is a Google-initiated project that provides “a high performance, open source universal RPC framework”. gRPC provides patterns for efficiently building APIs on HTTP/2 (and subsequent protocols) that include streaming and the performance advantages of HTTP/2 connection sharing. But gRPC isn’t trivial, and using gRPC almost always requires developers to import support code. Like Mothra, the Queen of the Monsters, gRPC is bearing lots of larval features that its creators would like to place in your apps.

Recent presentations at gRPConf highlighted the broad scope of the gRPC project. Here’s a summary slide from an overview presentation:

[Slide: gRPC Features]

This is a list of features built into the supported gRPC libraries. Here’s a run-through:

  • Name Resolution (Service Discovery, Pluggable)
  • Load Balancer (Manage subchannels, seamless HTTP/2)
  • Interceptor (Powerful middleware)
  • Deadlines/Timeouts, Cancellation (Safeguard against network latency or server issues, Optimize resource usage)
  • Retry (Fault-tolerant and resilient)
  • Termination (Resource cleanup)

The gRPC project currently provides official support for a dozen languages. Some of these are built on common implementations so the number of independent implementations is a bit less:

  • The C#/.NET, Objective-C, PHP, Python, and Ruby libraries are built on the C++ implementation.
  • The Dart implementation is independent.
  • The Go implementation is independent.
  • The Kotlin library is built on the Java implementation.
  • The Node implementation was originally built on the C++ implementation and is now independent.
  • The Swift implementation is independent.

Even with this sharing, this is a lot of implementations to keep up-to-date, particularly when we look at the feature list above. Some of these implementations don’t include everything: Dart and Swift focus mostly (but not exclusively) on client-side features. But in the eyes of the gRPC project, an implementation isn’t complete until it includes the full list of features above. Another presentation at gRPConf discussed the developing support for Rust, which is based on a third-party gRPC-compatible library called Tonic. There we saw the slide shown below, which points out the missing features in Tonic that keep it from being a full gRPC implementation.

[Slide: Tonic Limitations]

Here’s a list:

  • Service Config
  • Advanced Name Resolution
  • Configurable LB Policies
  • Connection Management
  • xDS/Envoy Support
  • Health Checking
  • Observability Integration (OpenTelemetry)

Whew. Now with all of this required to use gRPC, what do we need to do to call gRPC APIs from our apps? gRPC APIs are generally described with Protocol Buffer files that are compiled to generate calling and serving support code. In some languages, like Go, code generation is poorly integrated with the build process, so this generated code winds up checked into GitHub or made available via package managers. But that has hazards – app developers can have build problems if different libraries are linked to different versions of the gRPC libraries or include conflicting generated files for common proto dependencies. And the supported gRPC libraries come with some or all of the features listed above and are regularly updated as these features are added, improved, and debugged. So even a simple API client needs to be updated regularly.

Do we want all of this in our API clients?

All this feature proliferation in networking libraries forces developers to regularly rebuild and republish their apps. It adds complexity to application code, it enlarges each application’s attack surface and vulnerability exposure, and it creates governance challenges for organizations. If your organization has a few dozen applications that use gRPC APIs,

  • you need to be sure that all of your services are built in languages that have a full-featured gRPC library available
  • you need to be sure that all of your services use the correct gRPC library
  • you need to be sure that all of your services can be easily rebuilt and redeployed when your gRPC library needs updating
  • and you need to be sure that all of the networking features that you use are correctly configured

For example, if your applications are supposed to check API keys, how do you know that they do? The gRPC approach might make sense if you can easily rebuild and redeploy all of your services whenever you need to update your RPC library. This is true for Google, which keeps code in a monorepo and builds and deploys with a single integrated system. But for nearly everyone else (and even for Google), this problem is solved with a separation of concerns that moves important common functions out of applications and into a service layer that uses proxies to manage communication. In one popular form, that’s called the “service mesh”. But that’s not necessarily the only way.

IO, the Out-of-Process SDK

We can think of IO as an SDK that runs out of process, in other words alongside an application, rather than inside of it. IO handles many common tasks of SDKs:

  • authentication, adding credentials to API requests
  • retry, allowing clients to robustly handle transient server-side glitches
  • routing, allowing clients to be written without hard-coded addresses of API service endpoints

In short, IO does the advanced networking things that gRPC tries to do in-process. But because IO is out of process, it allows these capabilities to be deployed and upgraded without ever touching the applications. Also, because communication is managed out of process, platform teams can have confidence that network operations are correct and secure without ever looking inside the applications.
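The general idea can be sketched with Go’s standard library. This is not IO’s implementation, only an illustration of the pattern: a small proxy that sits beside the application, forwards its requests to the real API, and attaches a credential on the way out. The header name `X-Api-Key`, the key value, and the functions `newSidecar` and `demo` are all hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newSidecar returns a handler that forwards every request to upstream and
// attaches a credential, so the application never holds the API key itself.
func newSidecar(upstream *url.URL, apiKey string) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	direct := proxy.Director
	proxy.Director = func(r *http.Request) {
		// Routing: point the request at the upstream's scheme, host, and path.
		direct(r)
		r.Host = upstream.Host
		// Authentication: the credential is added outside the application.
		r.Header.Set("X-Api-Key", apiKey)
	}
	return proxy
}

// demo starts a stand-in API server and a sidecar in front of it, then
// calls the API the way an application would: with no credentials at all.
func demo() (string, error) {
	api := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "key=%s path=%s", r.Header.Get("X-Api-Key"), r.URL.Path)
	}))
	defer api.Close()

	upstream, err := url.Parse(api.URL)
	if err != nil {
		return "", err
	}
	sidecar := httptest.NewServer(newSidecar(upstream, "secret-123"))
	defer sidecar.Close()

	// The application only knows the sidecar's address.
	resp, err := http.Get(sidecar.URL + "/v1/things")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	fmt.Println(demo())
}
```

Rotating the key or moving the upstream means restarting the proxy, not rebuilding the application.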

What IO doesn’t do

Notably, IO does not do message serialization and deserialization. But this is arguably the easiest thing that SDKs do! Non-coders often think serialization is the most important reason to offer SDKs, but capable developers find it easy and can be annoyed when SDKs do this badly… which is often! SDK maintainers are often asked to support multiple languages including ones where they aren’t experts. Developers can easily hand-write or generate their serialization code, and for Protocol Buffers it’s not difficult to use a tool like protoc or buf to generate serialization code for the APIs that an application needs.

Just Say No to Vendor SDKs

If we are using a proxy, what do we want from our RPC support code?

  • Simplicity. This keeps our dependencies and security exposure small. Our support code should only contain code that we need and use.
  • Security. If we’re running with a local proxy, we would like to bind the application and proxy together so that no other processes can intercept or interfere with their communication. One nice way to achieve that is to use Linux abstract sockets.
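For readers who haven’t used abstract sockets, here is a minimal, Linux-only sketch of the mechanics in Go. In Go’s `net` package, a Unix socket address that begins with `@` is placed in the abstract namespace, so it has no file on disk and goes away when it is closed. The socket name `@example-sidecar` and the `roundTrip` function are hypothetical, and the sketch shows only how the connection is made, not a complete security design.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// socketName is hypothetical. The leading "@" selects the Linux abstract
// socket namespace instead of a path in the filesystem.
const socketName = "@example-sidecar"

// roundTrip starts an echo listener on the abstract socket (standing in for
// the proxy), dials it (standing in for the application), sends one line,
// and returns the echoed reply.
func roundTrip(msg string) (string, error) {
	ln, err := net.Listen("unix", socketName)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		line, _ := bufio.NewReader(conn).ReadString('\n')
		fmt.Fprint(conn, line) // echo the line back unchanged
	}()

	conn, err := net.Dial("unix", socketName)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	fmt.Fprintln(conn, msg)
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := roundTrip("hello")
	fmt.Printf("%q %v\n", reply, err)
}
```

Abstract sockets are scoped to a network namespace and are not governed by filesystem permissions, so what they buy you depends on how the application and proxy are deployed together.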

These capabilities are not application- or API-specific. They should either be in a language’s standard library or in widely-used shared frameworks.

Also, our networking libraries don’t need to be so thick. That’s led us to create Sidecar, a Go library that supports gRPC clients and servers that communicate through sidecar proxies. Because these applications use sidecars, they don’t need any of the advanced features that complicate gRPC implementations and add performance and security risks.

That’s why we have IO

By moving the picky and tricky networking stuff into IO and leaving application-specific and language-idiomatic tasks to application developers, IO provides a better distribution of work that gives developers more control of things that matter to them and less need to worry about things that don’t (or won’t, because IO takes care of them).
