Why we created Taxi, and why we felt the need for Another Schema Language

I’m often asked why we felt the need for another schema language: why not just use OpenAPI | SQL | Protobuf | GraphQL?
Inevitably, someone links that xkcd cartoon. Fortunately, that cartoon is actually really funny,
so I don’t mind re-reading it over and over and over.

It seems the one standard the internet can agree on is to respond to discussions about schema languages with that cartoon.
Still, it’s a fair question, and one that deserves a good answer. So, to summarize:
- Yes, Taxi is another schema language. It’s different.
- Yes, we felt the need for yet another. It has different goals than the others, and addresses shortcomings we felt existed.
- Yes, there’s now yet another way to do things… kinda. (We don’t expect you to migrate from OpenAPI to Taxi.)
What’s Taxi?
Taxi is a language for documenting data models and data sources (APIs, event streams, databases).
Where most schema languages describe a single data source (API, event stream, database, serverless function), Taxi is designed for describing how
data relates across an ecosystem of data sources (such as an enterprise).
We do this so we can automate orchestration and interoperability between data sources, without writing glue code.
Taxi can partially describe:
- Databases, but you can’t replace DDL with Taxi.
- HTTP APIs, but you’ll probably keep using OpenAPI / gRPC, and that’s cool.
- Message queues, like Kafka and Rabbit, but you can’t replace Protobuf with Taxi.
For example, here’s a Taxi spec describing a database, a Kafka topic, and a REST API:
    // A database table
    service FilmsDatabase {
       table films : Film[]
    }
    model Film {
       filmId : FilmId inherits String
    }

    // A Kafka topic
    service FilmEvents {
       stream newReleases: Stream<NewReviewSubmittedEvent>
    }
    model NewReviewSubmittedEvent {
       filmId : FilmId
       reviewId : ReviewId inherits String
    }

    // A REST API
    service ReviewsApi {
       @HttpOperation(method = "GET", url = "https://reviews/{id}")
       operation getReviews(id: FilmId): FilmReview[]
    }
    model FilmReview {
       id: ReviewId
       filmId: FilmId
       score: ReviewScore inherits Int
    }
There’s enough metadata there for us to understand how everything hangs together. Here’s a diagram of that spec:
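The linkage can also be sketched in a few lines of Python. This is only an illustration, not how the Taxi compiler works: the `SOURCES` table is a hand-written summary of the spec above, and `links_by_type` is a hypothetical helper that finds the semantic types shared by more than one source.

```python
from collections import defaultdict

# Hypothetical, hand-written summary of the spec above: each source lists the
# semantic types it exposes. A real tool would derive this from the Taxi spec.
SOURCES = {
    "FilmsDatabase.films": {"FilmId"},
    "FilmEvents.newReleases": {"FilmId", "ReviewId"},
    "ReviewsApi.getReviews": {"FilmId", "ReviewId", "ReviewScore"},
}


def links_by_type(sources):
    """Map each shared semantic type to the sorted list of sources exposing it."""
    index = defaultdict(list)
    for source, types in sources.items():
        for type_name in types:
            index[type_name].append(source)
    # Only types exposed by two or more sources act as join points.
    return {t: sorted(s) for t, s in index.items() if len(s) > 1}


LINKS = links_by_type(SOURCES)
for type_name in sorted(LINKS):
    print(type_name, "->", ", ".join(LINKS[type_name]))
```

FilmId shows up in all three sources and ReviewId in two, so those are the types that tie the database, the topic, and the API together. ReviewScore appears in only one source, so it links nothing yet.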
Taxi embeds inside existing specs
In practice, Taxi actually breaks down into two separate activities:
- Defining a set of semantic scalar types, each describing a single field, which live in a Taxi project. These are designed for sharing.
- Embedding references to those types inside existing API specs.
Our example above breaks down into a few simple types:
    type FilmId inherits String
    type ReviewId inherits String
    type ReviewScore inherits Int
And some other specs:
    # OpenAPI spec for the REST API, with Taxi types embedded via x-taxi-type
    openapi: 3.0.1
    info:
      title: ReviewsApi
      version: 1.0.0
    paths:
      https://reviews/{id}:
        get:
          parameters:
            - name: id
              in: path
              required: true
              schema:
                type: string
                x-taxi-type:
                  name: FilmId
          responses:
            "200":
              content:
                application/json:
                  schema:
                    type: array
                    items:
                      $ref: '#/components/schemas/FilmReview'
    components:
      schemas:
        FilmReview:
          type: object
          properties:
            id:
              type: string
              x-taxi-type:
                name: ReviewId
            filmId:
              type: string
              x-taxi-type:
                name: FilmId
            score:
              type: integer
              format: int32
              x-taxi-type:
                name: ReviewScore

    // Protobuf message for the Kafka topic, with Taxi types attached as field options
    import "taxi/dataType.proto";

    message NewReviewSubmittedEvent {
      optional string filmId = 1 [(taxi.dataType)="FilmId"];
      optional string reviewId = 2 [(taxi.dataType)="ReviewId"];
    }

    // Taxi model for the database table
    import io.vyne.jdbc.Table
    import FilmId
    import FilmTitle

    @Table(table = "film" , schema = "public" , connection = "films")
    model Film {
       @Id filmId : FilmId
       title : FilmTitle
    }
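To make the embedding idea concrete, here is a minimal Python sketch of how a tool might read those annotations back out of an OpenAPI schema. It is an illustration only: `FILM_REVIEW_SCHEMA` is a hand-copied fragment of the FilmReview schema above (as the dict a YAML parser would produce), and `semantic_types` is a hypothetical helper, not part of any Taxi tooling.

```python
# Hand-copied fragment of the FilmReview schema, as a parsed dict.
FILM_REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "x-taxi-type": {"name": "ReviewId"}},
        "filmId": {"type": "string", "x-taxi-type": {"name": "FilmId"}},
        "score": {
            "type": "integer",
            "format": "int32",
            "x-taxi-type": {"name": "ReviewScore"},
        },
    },
}


def semantic_types(schema):
    """Return {field name: Taxi type name} for every annotated property."""
    result = {}
    for field, spec in schema.get("properties", {}).items():
        annotation = spec.get("x-taxi-type")
        if annotation is not None:  # un-annotated fields are simply skipped
            result[field] = annotation["name"]
    return result


FIELD_TYPES = semantic_types(FILM_REVIEW_SCHEMA)
print(FIELD_TYPES)
```

The existing spec stays the source of truth for structure. The extension only adds the semantic label, which is why a team can adopt it without replacing what they already have.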
Why do this?
At Orbital, we’re on a mission to eradicate integration code.
Even though we were heavy users of OpenAPI, we found that when consuming APIs there’s a tonne of busywork in
writing glue code stitching things together.
Often, we were composing multiple APIs together to achieve a single task. Each API required
more glue code, and meant we were tightly coupling to that spec, meaning that when the spec introduced breaking changes, our
glue code had to be repaired.
That seemed silly. Even though we had all these specs that describe what APIs do, it still
fell to engineers to work out which API to call, track down what data to pass as inputs, and write the glue code.
We wanted to provide a way to let APIs describe themselves richly enough that software could work out how to orchestrate
them together automatically. We also had some other goals:
- We shouldn’t be relying on field names. They’re a poor proxy for semantics, and teams should be free to choose names that make sense in their domain.
- We shouldn’t force teams building APIs to replace their existing specs with something new. We want to complement what’s already in place.
- We wanted to be technology-agnostic. Modern enterprises are heterogeneous, so we needed to work everywhere.
- Teams building APIs need to be able to change their APIs easily, without cascading change onto consumers.
Producer vs Consumer: Once vs Many Times
Taxi shifts the responsibility of describing how things stitch together from consumers to producers.
Traditionally, it falls to consumers to work out how to do this.
And it’s an expensive question to answer… it involves tracking down API specs, reading docs,
and building a mental model of how things hang together.

How integration works today. Over, and over, and over.
That sucks, because consumers are a-plenty, meaning that the “how does this relate to that” question is answered over and over.
Instead, with Taxi, we shift the responsibility of documenting how things relate to producers, by embedding metadata
in their APIs.
This means the work is done once, by the teams who understand the APIs best: the ones building them.
What if teams put the wrong metadata?
Yeah, that’s a problem. If teams map fields together incorrectly, then the wrong APIs are stitched together.
But that problem exists today. Every time a new consumer stitches together some APIs, they’re performing the Field Mapping Foxtrot, and there’s a chance they’ll get it wrong.
So, by leaving it to consumers to map fields together, we’re actually facing this risk over and over.
In practice, the teams who are building the API have a much deeper understanding of how their data relates to the broader ecosystem, as they know their API best. And, whilst producers
can’t map their APIs to all consumers, they can attach a piece of well-defined metadata that assigns a semantic contract we all agree on.
That being said, I think tooling can help more here than it currently does. Watch this space.
What’s TaxiQL?
TaxiQL is part of the Taxi spec. It’s a query language that lets consumers ask for the data they want using the
same semantic types that producers have embedded in their API specs.
    find { Film( Title == 'Gladiator' ) } as {
       title: FilmTitle
       cast : {
          actorName : FirstName + ' ' + LastName
          twitter : TwitterHandle
       }[]
       rating: ReviewScore
       reviews: ReviewText[]
    }
Taxi works just as well with streaming data sources as it does with request-response interfaces:
    stream { FilmReviewSubmittedEvent } as {
       title: FilmTitle
       cast : {
          actorName : FirstName + ' ' + LastName
          twitter : TwitterHandle
       }[]
       rating: ReviewScore
       reviews: ReviewText[]
    }
By using types in the consumer contract, and combining them with the metadata in producers,
there’s enough information to infer how everything hangs together, and to automate the integration.
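As a rough illustration of that inference, the Python sketch below searches for a chain of operations that gets from the types a consumer already has to the type it wants. It is a toy under stated assumptions, not a description of how the TaxiQL query server plans queries: the `OPERATIONS` table is hypothetical, `getReviews` mirrors the spec earlier in the post, and `lookupFilm` is invented purely to give the search a second hop.

```python
from collections import deque

# Hypothetical operations, each reduced to (name, input types, output types).
OPERATIONS = [
    ("lookupFilm", frozenset({"FilmTitle"}), frozenset({"FilmId"})),
    ("getReviews", frozenset({"FilmId"}), frozenset({"ReviewId", "ReviewScore"})),
]


def plan(known, wanted, operations=OPERATIONS):
    """Breadth-first search for a shortest call chain that yields `wanted`."""
    start = frozenset(known)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        have, calls = queue.popleft()
        if wanted in have:
            return calls
        for name, inputs, outputs in operations:
            # An operation is callable once all of its input types are available,
            # and worth calling only if it adds a type we do not have yet.
            if inputs <= have and not outputs <= have:
                reachable = have | outputs
                if reachable not in seen:
                    seen.add(reachable)
                    queue.append((reachable, calls + [name]))
    return None  # no chain of operations yields the wanted type


print(plan({"FilmTitle"}, "ReviewScore"))
```

Nothing in the search looks at field names or URLs. The only inputs are the semantic types attached to each operation, which is the information producers embed in their specs.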
TaxiQL vs GraphQL
Both Taxi and GraphQL share similar goals: providing a single entrypoint for
composing multiple APIs together. GraphQL has been a strong inspiration in how we’ve designed
Taxi.
However, there are key differences in their approach.
No global schema
GraphQL defines a single global schema that all consumers adhere to. Consumers can
cherry-pick the fields they want, but the structure is fixed.
That global schema can create friction to change. If you need to refactor the schema, you either need
to make backwards-compatible changes, or consumers need to update.
Which is to say, evolving a shared structural contract is hard, and someone has to take the hit: either the schema owner (by maintaining a backwards-compatible change),
or the consumers (by fixing what breaks when the schema changes).
This isn’t just a GraphQL issue. It’s true of any structural contract that’s widely shared: they’re hard to change.
Consumer-driven contracts
GraphQL has consumer-driven-contracts-lite, i.e., consumers can cherry-pick the fields they want from
the global schema. However, that’s it: structure, encoding, etc. are fixed.
If you want to consume data in a different shape, you’ll have to map the data on the consumer side.
These extra mapping layers are troublesome. Each mapping layer you build is a tight coupling between contracts, i.e.,
a thing that has to change when the upstream contract changes.
Instead, with Taxi, consumers define the data contract they want to consume in the request they send.
There is no single central schema. Producer APIs are composed together on-the-fly to satisfy the consumer contract.
The types and query language provide enough flexibility that consumers can express the exact
contract they require, and the middleware can work out how to assemble a response.
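A small Python sketch of the idea: two consumers ask for the same producer data under their own field names, and the reshaping is driven by semantic type rather than by name. Everything here is hypothetical and for illustration only. `project` is not a Taxi API, the sample values are made up, and the field-to-type mapping simply mirrors the FilmReview model from earlier in the post.

```python
# Producer fields tagged with semantic types, plus one made-up record.
PRODUCER_FIELD_TYPES = {"id": "ReviewId", "filmId": "FilmId", "score": "ReviewScore"}
PRODUCER_RECORD = {"id": "r-1", "filmId": "f-42", "score": 4}


def project(record, field_types, contract):
    """Reshape a producer record into a consumer contract {field name: type}."""
    # Index the producer's values by semantic type, ignoring its field names.
    value_by_type = {field_types[field]: value for field, value in record.items()}
    # Fill each consumer field from the value carrying the requested type.
    return {name: value_by_type.get(type_name) for name, type_name in contract.items()}


# Two hypothetical consumers pick their own names and shapes.
DASHBOARD = project(
    PRODUCER_RECORD, PRODUCER_FIELD_TYPES, {"rating": "ReviewScore", "film": "FilmId"}
)
AUDIT = project(PRODUCER_RECORD, PRODUCER_FIELD_TYPES, {"reviewKey": "ReviewId"})
print(DASHBOARD)
print(AUDIT)
```

If the producer renames `score` to something else, only its own type mapping changes. Neither consumer contract mentions the producer's field names, so neither has to be touched.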
No resolvers
GraphQL uses resolvers to stitch together APIs, which is exactly the type of integration code
we’re on a mission to eradicate.
As APIs change, resolvers need to be maintained and updated.
Generally, with Taxi you don’t need resolvers. The API specs are rich enough to automate the work resolvers would do.
Technology-agnostic
GraphQL requires GraphQL everywhere, or shims to adapt to GraphQL.
Taxi (and TaxiQL) tries hard to keep the footprint low. Taxi will happily compose together a Kafka topic
publishing Protobuf, with a gRPC service, some REST APIs, and a database.
What about implementations?
Taxi (the spec, compiler, and tooling ecosystem) is all open source. Head over to GitHub and give it some stars.
The TaxiQL query server is currently part of Orbital, which we’re working to open source this summer.
In the meantime, you can take it for a spin by following our getting started guide.
Summary
So, here are the key takeaways:
Taxi exists to describe how data and data sources relate to one another, and to automate interoperability.
OpenAPI, SQL, Protobuf, etc., all do a great job of describing a single data source. By adding extra
metadata into those specs, we have enough information to automate integration between services, without
writing (or maintaining) glue code.
TaxiQL is a way to use Taxi types to ask for data, and for consumers to remain decoupled from producer schemas,
so as things change, there’s no glue code to maintain.
XKCD is the one true standard, which is how it was always meant to be.