
Robust APIs Are Weird - zdw
https://www.aviskase.com/articles/2020/06/18/robust-apis-are-weird/
======
userbinator
IMHO many of the issues around validation are largely self-inflicted by the
use of needlessly complex/verbose/flexible formats. Having worked
(fortunately, not much) with SOAP, I can safely say it's one of the worst
offenders. Text-based formats add so much more flexibility that they
dramatically increase the space of both valid and invalid inputs over a simple
binary protocol. The article shows how this can happen with numbers, but my
favourite example is a boolean - if you use a textual format, you may think
"true" and "false" are valid, but what about "True", "TRUE", "FALSE", " FALSE"
(with whitespace), "tRuE", "falso", "yes", "no", etc.? Maybe you will think it
makes sense to allow only the first two, but in a nearly-optimal binary format
like ASN.1 PER, there's no need to even think about such things: a boolean is
a _single bit_ , representing exactly the two states which need to be
communicated. There's no ambiguity[1], and (even better) no room for
invalidity!

[1]I guess if you really stretch things, you can say 1 is false and 0 is true,
and I have actually seen that before, but I can't think of any other
ambiguities a single bit could have.
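For the strict reading of a textual boolean, the validator can be made just as unambiguous as a single bit. A minimal Python sketch (the function name is made up) that accepts exactly the two canonical spellings and rejects every variant listed above:

```python
def parse_bool_strict(raw: str) -> bool:
    """Accept exactly 'true' and 'false'; reject 'True', 'TRUE', ' FALSE', 'tRuE', etc."""
    if raw == "true":
        return True
    if raw == "false":
        return False
    raise ValueError(f"not a boolean: {raw!r}")
```

Anything other than the two exact strings is an error, so the space of valid inputs is the same two states a single bit would carry.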

~~~
tasogare
Two functions applied to the input (Trim and ToLower in C#) handle all the
examples you listed and any similar cases. Yes and no are obviously very
different from true and false.
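In Python terms, the same normalize-then-map approach looks like this (a sketch mirroring C#'s Trim().ToLower(); the function name is made up):

```python
def parse_bool_lenient(raw: str) -> bool:
    """Normalize whitespace and case before matching, like Trim + ToLower."""
    normalized = raw.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a boolean: {raw!r}")

# "True", "TRUE", " FALSE", "tRuE" all parse; "yes"/"no" are still rejected.
```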

~~~
adrianmsmith
> Yes and no are obviously very different from true and false.

The text-based language YAML allows “yes” and “no” for Boolean in addition to
“true” and “false”. So if multiple bright people think about this topic
they’ll evidently come up with different approaches. Not so with a single bit.

------
jbverschoor
>> Be conservative in what you send, be liberal in what you accept.

Oh no... be very strict in what you accept. Once it's out there, you can't make
it stricter. APIs should be as strict and well-defined as possible. Unless the
API is not really an API but user input; then, yes, be liberal in what you accept.

~~~
saurik
Yeah: exactly. Postel has even gone on the record to note that he is often
misinterpreted, and that what he meant was more like "to maximize ecosystem
robustness, make sure to accept as much of the specification as possible,
including all the optional parts, for the things you accept, but prefer to only
require and use the common subset of required and generally well-implemented
functionality for the things you send".

~~~
CodesInChaos
From what I remember his example was a reserved field which you must set to 0
when writing, and ignore while reading to enable extensibility.

In a modern context this would correspond to ignoring unknown keys in json, or
ignoring unknown fields in protobuf.

Sadly, many interpret this as accepting an ill-defined superset of the
specification, which then turns into a complex de facto specification as
implementations come to rely on those extensions being handled in a specific way.
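The two readings can be shown side by side in Python (a sketch; the field names are made up). The reserved-field advice corresponds to the tolerant parser, which ignores keys it doesn't know so new fields can be added later:

```python
import json

KNOWN_FIELDS = {"id", "name"}

def parse_strict(raw: str) -> dict:
    """Reject unknown keys: any future extension breaks old readers."""
    doc = json.loads(raw)
    unknown = doc.keys() - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return doc

def parse_tolerant(raw: str) -> dict:
    """Ignore unknown keys: new fields can appear without breaking old readers."""
    doc = json.loads(raw)
    return {k: v for k, v in doc.items() if k in KNOWN_FIELDS}
```

Note that the tolerant parser is still strict about the fields it does use; it is not accepting an ill-defined superset, only skipping what it doesn't understand.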

------
furstenheim
Oh please, no auto casting. It has caused far more problems than it has solved
(looking at you, Excel).

If you want to use json, validate, and accept both 10 and "10" you can use
json schema and do:

``` { "type": ["string", "number"]} ```

Depending on the library and language you could extend the rules to accept a
string that can be parsed and throw if not.
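The effect of that schema fragment can be shown without any library. A hand-rolled sketch of what a JSON Schema validator does for `{"type": ["string", "number"]}` (the function name is made up):

```python
import json

ALLOWED = (str, int, float)  # "string" or "number" in JSON Schema terms

def validate_amount(raw: str):
    value = json.loads(raw)
    # bool is a subclass of int in Python, but JSON Schema's "number"
    # does not include booleans, so reject them explicitly.
    if isinstance(value, bool) or not isinstance(value, ALLOWED):
        raise ValueError(f"expected string or number, got {type(value).__name__}")
    return value

validate_amount("10")    # accepted as a number
validate_amount('"10"')  # accepted as a string
```

Crucially, no casting happens: 10 stays a number and "10" stays a string, and the caller decides what to do with each.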

------
nine_k
To tell the truth, there is json-schema[1], and a number of implementations of
it which work pretty well.

You can use it to validate your JSON inputs, much like you would with XSD
(I've used both).

There is OpenAPI, née Swagger [2], which allows you to describe your API in
human- and machine-readable form and generate code for it [3].

But validation is but one step. A robust API needs rate limiting, injection
protection (does your language have tainted strings?), an easy enough authz/authn
story, etc., etc.: all different and non-trivial things.

[1]: [http://json-schema.org/](http://json-schema.org/)
[2]: [https://en.wikipedia.org/wiki/OpenAPI_Specification](https://en.wikipedia.org/wiki/OpenAPI_Specification)
[3]: [https://openapi-generator.tech/](https://openapi-generator.tech/)

~~~
Sherl
The OpenAPI generator uses Python Flask, but to be frank, it makes more sense to
use the schema to generate the client rather than the server. FastAPI and
Pydantic can generate an OpenAPI schema from the data model, which does the type
parsing for you, if parsing is the use case. Writing server code with a library
that can generate the schema is a pro for me, rather than keeping the code in
sync with every update to the schema.

------
ChrisMarshallNY
I’ve been working with XML, and various flavors thereof, for decades.

XML schema is a big reason that I would use XML. Otherwise, it’s a fairly
painful and prolix standard.

XML is also a good way to deliver longer streams of data (for me), because so
many parsers will allow “realtime” parsing of “broken” XML data, while JSON
tends to require the entire document to be delivered (and correct) before
parsing (I’m sure that will be changing, if it hasn’t already).
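The incremental-parsing point can be illustrated with Python's standard library (a sketch): a pull parser yields complete elements as they arrive, before the document is finished or even well-formed.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("end",))
# Feed a stream that is still "broken": the root element never closes.
parser.feed("<items><item>a</item><item>b</item>")

# Both <item> elements are already usable, mid-document.
texts = [elem.text for event, elem in parser.read_events() if elem.tag == "item"]
```

A typical JSON parser, by contrast, only hands back the value after the whole document has been read and validated.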

I’ve written ONVIF software, which is SOAP/WSDL-based.

I’ve written XSLT, which could be used as a form of medieval torture.

I’ve written servers, with REST, and REST-like APIs. I generally prefer REST-
like, where the server response is JSON and/or XML (usually either one), but
the GET/POST/PATCH stimulus is transaction/URI argument-based. That’s the way
most folks seem to like doing it, as well.

Pure REST requires sending XML or JSON, which puts a huge onus on the server and
makes it difficult to be flexible (I like to support both XML and JSON as
“first-class citizens”).

 _> Because the simplest way to validate a JSON document is first to consume
it with some common library that guesses types almost like we do: “does it
have quotes around? string!” And only then, with already cast value, to
compare its type with whatever is defined in a schema._

This is true. JSON schema isn’t particularly useful to me. I feel like it
works against the reason for using JSON. I like JSON, precisely because it is
so lightweight.

XML Schema is incredibly mature and robust (and an _enormous_ pain in the butt
to write). Validation is built into almost every parser; while many folks are
unaware that JSON Schema even exists.

What I tend to do, is serve both XML and JSON from a server, with XML
accompanied by Schema (sometimes, dynamically generated), but the JSON derived
from the XML, so it benefits from the XML validation.

I’m wondering if standards like OpenGL will replace both XML and JSON. I’ve
never used it, because of the requirement to use a third-party library, but it
is a compelling tool.

~~~
skocznymroczny
OpenGL?

~~~
ChrisMarshallNY
_> OpenGL?_

Sorry. My bad. PBC (Posting Before Coffee).

GraphQL

------
somurzakov
There are a lot more things that make APIs "robust". What's described is only the
tip of the iceberg: data/type checks during de/serialization. Your API may be
perfect in terms of type correctness and structure, and still be vulnerable to
SQL injection.

There should also be load balancing/HA, rate limiting, authn/authz, and
protection against injection and the other OWASP Top 10 attacks.
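The SQL injection point is worth making concrete: a value can be a perfectly type-correct string and still be hostile. A sketch using Python's sqlite3 (the table and values are made up); placeholder binding keeps data out of the SQL text entirely:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "nobody' OR '1'='1"  # passes any "is it a string?" check

# Placeholder binding: the driver passes the value strictly as data,
# so the OR clause is matched literally and finds nothing.
rows = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()
```

Had the query been built by string concatenation, the same input would have returned every row.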

------
tango12
On a similar note, Rob Zhu had a great section in his talk about how GraphQL's
typesafety (to make the API more robust) tries to address a similar problem
but ends up being very nuanced in practice.

Linking to the right time:
[https://youtu.be/djKPtyXhaNE?t=1128](https://youtu.be/djKPtyXhaNE?t=1128)

TL;DR:

- GraphQL's type system is descriptive, not prescriptive. This means that
"String" maps to a JSON string, which maps to a JavaScript utf8 string, but what
type that maps to in C++ is not quite clear, because there are many different
ways to represent a string. So which one do we use? As opposed to protobuf
client codegen, which is prescriptive and unambiguous about what you get.

- Encoding custom scalars can cause some confusion because it's not clear
what the spec of the custom scalar is. Although, there's been some recent work
to make that a little better.[1]

- Impedance mismatch with nullability and union types, because support is not
uniform across languages.

[1] [https://github.com/graphql/graphql-spec/issues/635](https://github.com/graphql/graphql-spec/issues/635)

------
opqpo
I wish more people would try gRPC outside of microservices. It is a perfect fit
for building APIs.

~~~
nine_k
gRPC is rather nice. It has good, well-established practices around backwards
compatibility (never ever remove a field), nullability, mapping to traditional
HTTP-based endpoints, and inline documentation.

The downside is that you need a special tool to access it; a plain browser or
curl won't work.

~~~
opqpo
You can use it inside a browser; look at
[https://github.com/grpc/grpc-web](https://github.com/grpc/grpc-web). But I
agree that something like playing with curl would be impossible. I once even
stumbled upon a Go project that generates a RESTful API from your gRPC API
without any modifications on your server side.

~~~
divbzero
> _a golang project that generates a RESTful API from your gRPC API_

You might be thinking of grpc-gateway. [1]

[1]: [https://github.com/grpc-ecosystem/grpc-gateway](https://github.com/grpc-ecosystem/grpc-gateway)

------
aviskase
Wow, that's unexpected attention.

First of all, I agree with all the comments mentioning that being robust
implies much more than validation. This article was a take on one example I
found curious.

And second, as you may notice, that article was written for testers, hence a
certain amount of oversimplification.

Personally, I am on the side of "if the contract is defined, follow it." All
the fiddling with supplied data seems fragile and prone to hidden behaviour.
But there is a valid argument for cases when your data suppliers have a
history of ever changing and/or buggy output. If you pay them money, you can
request them to fix it. If _they_ pay you money, perhaps you would consider
being more flexible.

------
hoppla
I remember a security test I did. One parameter had to be a float value, and the
application validated it by casting it to float. NaN and Infinity are valid
float values in many languages. The application did not like that.
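In Python the same hole looks like this (a sketch; the function name is made up): a bare float() cast happily accepts non-finite values, so a robust validator has to reject them explicitly.

```python
import math

# Both of these cast without error, so "it casts to float" is not validation:
float("NaN")
float("Infinity")

def parse_finite_float(raw: str) -> float:
    value = float(raw)  # also accepts "inf", "-inf", and "1e999" (overflows to inf)
    if not math.isfinite(value):
        raise ValueError(f"non-finite float rejected: {raw!r}")
    return value
```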

