API descriptions and schemas have become an increasingly obvious choice for companies wanting to build serious APIs. For readers who are used to my GraphQL posts, this may not come as much of a surprise, since GraphQL inherently comes with a typed schema. For other HTTP APIs, a description or schema has not always been adopted, and several competing formats have been available, which probably explains why not all APIs come with a description. This might also explain why some people frame the GraphQL vs REST debate around the typed schema, even though nowadays it’s possible for any HTTP API to have one.
In the HTTP API world, JSON Schema and OpenAPI have emerged as the clear industry leaders for schematized payloads and API descriptions. But along with those technologies often comes an approach for building APIs: “Design First”.
On the surface, the design first approach simply means we must think about the design of our APIs before we start implementing them, which is a very sound approach. Consider the alternative: implementing the API before designing it exposes us to common API design pitfalls, such as designing too closely to how things are implemented internally. Without getting into the whole “big design up front” issue, thinking about design before the implementation is even more important when designing a web API. This is because a consistent interface and a stable API over time are paramount, whereas changing the implementation is usually much easier over time.
API Description first
I’ve been reading a ton about suggested API development workflows and about OpenAPI. In those spheres, the design first approach often seems to imply an OpenAPI first approach, meaning writing the OpenAPI description for your new feature before writing any code. OpenAPI can serve as a common language for speaking API design, and great tools can be used to generate code, mock servers, automated tests, documentation, and SDKs. This is actually wonderful! From one artifact we can generate a lot of what we used to have to implement ourselves, without any common language.
The dangerous and sad part is that a lot of these great things fall apart if we fail to maintain parity between our OpenAPI description and our underlying implementation. Having worked on two quite large APIs with hundreds of engineers building and evolving APIs concurrently, this fact scares me a lot. Keeping parity between two different sources, or requiring something to be added manually, are things that simply tend to fall apart at scale.
In the typical design first workflow, this parity is usually kept through careful integration and contract testing. This is very important if the design first approach is picked, as it is the only thing that can catch divergence between our docs, SDKs, and machine-readable schemas and the actual implementation. This still worries me a bit though, because these tests need to be done very carefully. This can and should be automated, i.e. a developer shouldn't need to add a test themselves when adding new use cases to an API. Thankfully a lot of tools exist to help us with that today.
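To make the idea of an automated contract check a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the DESCRIPTION fragment, the get_user handler, and the violations helper are hypothetical, and the helper only understands a tiny subset of JSON Schema (type, required, properties, items). A real setup would more likely lean on an existing validator or contract testing tool.

```python
# Hypothetical fragment of an OpenAPI 3 description, reduced to what this check reads.
DESCRIPTION = {
    "paths": {
        "/users/{id}": {
            "get": {
                "responses": {
                    "200": {
                        "description": "A single user",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "required": ["id", "name"],
                                    "properties": {
                                        "id": {"type": "integer"},
                                        "name": {"type": "string"},
                                    },
                                }
                            }
                        },
                    }
                }
            }
        }
    }
}

# Mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def violations(value, schema, path="$"):
    """List mismatches between a value and a (very small subset of a) schema."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if bool_as_number or not isinstance(value, JSON_TYPES[expected]):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]
    problems = []
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: missing required property")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                problems.extend(violations(value[key], sub_schema, f"{path}.{key}"))
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            problems.extend(violations(item, schema["items"], f"{path}[{index}]"))
    return problems


def check_operation(description, path, method, status, body):
    """Compare a response body with the schema the description promises."""
    response = description["paths"][path][method]["responses"][status]
    schema = response["content"]["application/json"]["schema"]
    return violations(body, schema)


def get_user(user_id):
    # Stand-in for the real implementation under test.
    return {"id": user_id, "name": "Ada"}


if __name__ == "__main__":
    print(check_operation(DESCRIPTION, "/users/{id}", "get", "200", get_user(1)))
```

Because check_operation is driven by the description itself, a loop over every documented path and method could run it without anyone writing a new test per endpoint, which is the kind of automation I have in mind.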
Is code first always bad?
It certainly sounds like code first is a terrible approach when reading what’s out there these days. I’d like to add a bit of nuance, however. Implementation first is likely almost always a bad idea. But code can also be used to define interfaces, not only the underlying implementation. I truly believe one of the best decisions GitHub and Shopify made initially with GraphQL was to have the schema be an artifact of the code, and not use the popular “schema-first” approach. It gives our teams the confidence that their changes to the interface look right, and also saves them from having to maintain two things at once. What if code can be used to define the interface too? After all, programming languages are quite useful (one could say maybe more than YAML ;) )
If somehow our engineers could write their API interface through a carefully created DSL/API, they could then stop worrying about keeping descriptions in sync, and we could generate those for them using the code as the source of truth.
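Here is a rough sketch of what such a DSL could look like, in Python. The Field and Endpoint classes and the to_openapi function are hypothetical names I made up for this post, not an existing library, and the generated document only covers a 200 JSON response for each operation.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """One property of a response body (hypothetical DSL building block)."""
    name: str
    type: str  # a JSON Schema type name such as "string" or "integer"
    required: bool = True


@dataclass
class Endpoint:
    """Interface-level declaration of an operation, with no implementation."""
    method: str
    path: str
    summary: str
    response_fields: list = field(default_factory=list)


def to_openapi(title, version, endpoints):
    """Generate an OpenAPI 3.0 description (as a dict) from the declarations."""
    paths = {}
    for endpoint in endpoints:
        schema = {
            "type": "object",
            "properties": {f.name: {"type": f.type} for f in endpoint.response_fields},
        }
        required = [f.name for f in endpoint.response_fields if f.required]
        if required:  # leave the keyword out entirely rather than emit an empty list
            schema["required"] = required
        paths.setdefault(endpoint.path, {})[endpoint.method.lower()] = {
            "summary": endpoint.summary,
            "responses": {
                "200": {
                    "description": "Successful response",
                    "content": {"application/json": {"schema": schema}},
                }
            },
        }
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": paths,
    }


# The interface lives in code; the description is an artifact generated from it.
ENDPOINTS = [
    Endpoint(
        method="GET",
        path="/me",
        summary="Fetch the authenticated user",
        response_fields=[
            Field("id", "integer"),
            Field("name", "string"),
            Field("email", "string", required=False),
        ],
    )
]

if __name__ == "__main__":
    import json

    print(json.dumps(to_openapi("Example API", "1.0.0", ENDPOINTS), indent=2))
```

The point of the sketch is the direction of the arrow: engineers only touch the Endpoint declarations, and the OpenAPI document can be regenerated on every build instead of being edited by hand.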
Porque no los dos
There are other benefits to a design-first approach, however. Thinking purely about design at first, and isolating changes to interface changes, generally gives us a better workflow. Communication can be restricted to design, automated checks can be run against the initial descriptions, and reviews are less confusing than when reviewing an implementation and an interface at the same time.
As much as I love a “code as the source of truth” approach, I’m still a big proponent of having at least some design up front. This is why I wonder if a good “hybrid” solution would be more of a design first, code as the source of truth approach.
Design First, Code as the Source of Truth
So what does this actually look like as a workflow? To be honest I’m still thinking about this one! Here’s a potential workflow I’m imagining:
- Initial design of an interface
- Team communicates on a pull request / issue of that design, reviews may happen, automated checks or lints are run.
- Once satisfied with design, code scaffold/generation can happen, helping teams with implementing the API.
- From this point on, the initial design description is historical only (or always kept in sync with the code), and the code is our final source of truth.
- Engineers iterate on the pull request containing the code (interface and implementation)
- On deploy, SDKs, documentation, and other artifacts are generated from the final OpenAPI description.
I want to focus on number one here, as this is the one I’m still wondering about. Is OpenAPI the best language for engineers to communicate a design? Writing and reading YAML (or JSON) is usually not the most pleasant experience. This is where a visual tool, like the amazing Stoplight Studio, comes in. Initial design can be done through a visual tool to avoid the need for hand-writing and reading YAML, and mock servers / description visualization tools can be used for stakeholders to review. Under the hood, it is probably based on OpenAPI, which can then be used to generate initial code for the implementation. Maybe we can even ensure the initial design is not too far from what is found in the code, and ask for a re-design if it is. The cool thing is that once the initial feature is shipped, follow-up bug fixes or small additions can be done in the code, without fear of the final API description being out of sync, since the code is the source of truth after the initial design.
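The “not too far from the code” check could start out as something very simple. Below is a minimal sketch in Python that compares only the set of operations (path plus HTTP method) in the initial design with the set in the description generated from the code. The function names, the tolerance parameter, and the two sample descriptions are all assumptions for illustration; a real check would probably compare parameters and schemas too.

```python
# Keys of an OpenAPI path item that denote operations rather than metadata.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def operations(description):
    """Return the set of (path, method) pairs declared in a description."""
    return {
        (path, method)
        for path, item in description.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS
    }


def drift(design, generated):
    """Report operations that exist on only one side."""
    designed, built = operations(design), operations(generated)
    return {
        "missing_from_code": sorted(designed - built),
        "not_in_design": sorted(built - designed),
    }


def needs_redesign(design, generated, tolerance=0):
    """Flag the change when more operations diverge than the team tolerates."""
    report = drift(design, generated)
    return len(report["missing_from_code"]) + len(report["not_in_design"]) > tolerance


# Hypothetical descriptions: the reviewed design and what the code now generates.
INITIAL_DESIGN = {"paths": {"/orders": {"get": {}, "post": {}}}}
GENERATED = {
    "paths": {
        "/orders": {"get": {}, "summary": "Orders"},
        "/orders/{id}/cancel": {"post": {}},
    }
}

if __name__ == "__main__":
    print(drift(INITIAL_DESIGN, GENERATED))
    print(needs_redesign(INITIAL_DESIGN, GENERATED, tolerance=1))
```

The tolerance is there because small additions after the initial design are expected in this workflow; only a large gap would send the team back to the design step.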
There are many ways of opting for a “design first” approach, and to me it is more of a philosophy than a full method or workflow. Whether your team agrees on a design in a Google Doc or through an OpenAPI description, you’re still doing design first! As important as it is to think about design, I think there is still enormous value in the description being as close as possible to the code, especially at scale.