Using JSON Schema to Document, Test, and Debug APIs (heroku.com)
139 points by joeyespo 6 days ago | 48 comments

This is precisely the purpose of the OpenAPI Specification[0], an extended subset of the JSON Schema specification. Plus there's a whole family of supporting tools, such as Swagger UI[1], and lots more[2].

[0] https://swagger.io/docs/specification/about/

[1] https://swagger.io/tools/swagger-ui/

[2] https://openapi.tools/

Some searching turns up some articles on incompatibilities between OpenAPI and JSON Schema [0] [1], and it's not immediately clear if they fit together well at this point.

We've got some JSON Schema APIs in our services already and are using libs like the Python `jsonschema` package to do validation with those. I also briefly experimented with Quicktype [2] to generate some TS types as a proof-of-concept.

We're getting ready to do a pretty big rework of a lot of our services, and I'm interested in any info folks can provide on pros and cons of using OpenAPI vs JSON Schema for API definitions, and tooling around request validation and TS interface / client generation.

[0] https://philsturgeon.uk/api/2018/04/13/openapi-and-json-sche...

[1] https://github.com/OAI/OpenAPI-Specification/issues/1532

[2] https://quicktype.io/typescript/

The choice to fork[1] the schema standard (instead of simply using it as is or with purely additive extensions) is so incredibly damaging. Yes, I get that it makes it a lot simpler to generate code for languages that have bad and static type systems. Meanwhile, the standard is still not powerful enough to describe everything JSON:API thinks is best practice due to not having any way to describe nested properties in queries.

[1] They don't want to call it a fork, but when they invent totally new extensions to support things that are already solved by parts of the standard that they don't want to support, then it is a fork.

Hi- I'm one of the main editors of the JSON Schema specification, and we and the OpenAPI Technical Steering Committee are actively working together to re-converge the specifications.

OpenAPI 3 was developed while JSON Schema progress was stalled due to the prior editors leaving and a new group of us (eventually) picking it up.

OpenAPI 3.1 will most likely include a keyword to allow using standard JSON Schema as an alternative to their customized syntax, and hopefully we can achieve full integration on OpenAPI 4. There are also some other ideas being explored for improved compatibility in 3.x.

Standards work is hard, but the relationship between the OpenAPI TSC and the JSON Schema editors is quite healthy and we are making good progress.

One challenge around OpenAPI right now is that most of it is still stuck in version v2, with relatively immature support for v3. And v3 is quite different.

Not too long ago, I tried to build a project around OpenAPI, trying to generate model structs in Go from OpenAPI definitions. The support just wasn't there. There were code generation tools for v2 and emerging library support for v3, but nothing that covered both.

I went back and used plain JSON Schema for my models, and used GraphQL instead of OpenAPI for the API, and that turned out great.

Please give it a try with OpenAPI Generator (https://openapi-generator.tech), which supports both OpenAPI spec v2 and v3 for code generation.

V3 support is good these days for most languages. It's a delight to generate TypeScript types and then have the compiler tell you all the places you need to change code because your schema changed.

We went with this exact approach in my previous job, using OpenAPI to do everything the blog post mentions, including code generation, diffs for API consumers (published to Slack, for instance), and more. I believe this has a very good shot at success, but it's not complete yet. For instance we had to do a lot of in house work to make code generation work.

Hi- I'm one of the main editors of the JSON Schema specification, and one thing we are doing with the next draft is making it easier to build extensions for things like code generation. We expect to work with the OpenAPI folks on this as it is very relevant to their interests :-)

Also, we and the OpenAPI Technical Steering Committee are actively working together to re-converge the specifications. OpenAPI 3 was developed while JSON Schema progress was stalled due to the prior editors leaving and a new group of us (eventually) picking it up.

OpenAPI 3.1 will most likely include a keyword to allow using standard JSON Schema as an alternative to their customized syntax, and hopefully we can achieve full integration on OpenAPI 4. There are also some other ideas being explored for improved compatibility in 3.x.

Standards work is hard, but the relationship between the OpenAPI TSC and the JSON Schema editors is quite healthy and we are making good progress.

I agree, I've also invested a huge amount of time and effort into my openapi-generator setup [1], and there's tons of room for improvement. We've probably even done some of the same work for code generation (custom Java subclasses, etc.)

[1] https://github.com/OpenAPITools/openapi-generator

Not sure which code generation tool you used but if you've time, please try OpenAPI Generator (https://openapi-generator.tech) and let us know if you've any feedback.

As someone who works with this tooling nearly daily I can tell you that OpenAPI is unfortunately quite messy.

It seems like Swagger v2 was reasonably well-liked so the idea was to make OpenAPI the best possible way to describe any kind of REST API in existence. Since the spec got more or less merged with JSON Schema it has become extremely unwieldy and full of edge cases.

At my company we're trying to develop tooling around OpenAPI but a lot of things (especially related to the 'oneOf'/'anyOf'/'allOf' features) are extremely ambiguous.

Not to mention that at least the Java versions of the parsing libraries are full of inconsistencies as well. For instance, there's an OpenAPI parser which also has a 'compatibility' mode so it can read Swagger V2 specs as well. Unfortunately the data you get from reading a Swagger V2 spec via that compatibility layer is different from when you'd first convert the V2 spec to OpenAPI v3 through an external tool.

The project is certainly ambitious and I completely agree that this is a hard problem to solve, but honestly I would say that if you want universal adoption of a toolkit for writing APIs then the API specification language itself should not be so difficult or ambiguous in its implementation.

I've really enjoyed working with OpenAPI / Swagger while building FormAPI [1]. I use rswag [2] to define and run a set of integration tests that test everything about the API, and then all the requests/responses/schemas are saved into an OpenAPI specification. I can then use that to generate API clients for any language [3]. Then I have test suites for each API client, which spin up the dev server and ensure that all the API calls have the correct behavior (to catch any bugs in openapi-generator, or my custom code that wraps some of the generated functions.)

It took a lot of work to get there, and there's still lots of room for improvement. My ultimate goal is to automatically generate the API client test suite based on the requests/responses. I also want to add some custom logic and workflow rules to the OpenAPI specification, instead of needing to write custom wrapper code in 10+ languages.

I've had to write the same basic code so many times: Make a POST request, then make a GET request once per second until the "status" changes from "pending" to "done", and then finally return the completed result. (I know it would probably be better to set up a websocket connection, but this works fine and it was easier to implement.) It would be so awesome if I could define this workflow inside my API specification, and then the auto-generated client code could include this polling logic without any effort on my part. (Apart from needing to write and support the higher-level generator code for each language, which would actually be a lot more work.)
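For anyone who hasn't written it, that polling loop looks roughly like the sketch below (Python, purely for illustration). The `post` and `get` callables, the `"id"` field on the POST response, and the timeout are assumptions made for this example; they are not taken from FormAPI or from openapi-generator output.

```python
import time


def submit_and_wait(post, get, interval=1.0, timeout=60.0,
                    sleep=time.sleep, clock=time.monotonic):
    """POST once, then poll with GET until "status" is "done".

    `post()` and `get(job_id)` are hypothetical callables that return parsed
    JSON as dicts; a real client would wrap its HTTP library here.
    """
    job = post()
    deadline = clock() + timeout
    while True:
        result = get(job["id"])  # assumes the POST response carries an "id"
        if result["status"] == "done":
            return result
        if result["status"] != "pending":
            raise RuntimeError(f"unexpected status: {result['status']!r}")
        if clock() >= deadline:
            raise TimeoutError("job still pending after timeout")
        sleep(interval)
```

Passing `sleep` and `clock` in as parameters is only there so the loop can be tested without real waiting; a generated client would more likely hard-code them.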

The other really annoying thing is needing to figure out package managers and release steps for every single programming language. I've only figured this out for 6 languages so far (C#, Java, JS, PHP, Python, Ruby), but I want to support far more languages, and it's just exhausting to go through this process each time. It would be so nice if there was an open source project that wrapped all of the different package managers and provided a framework on top of openapi-generator. And if there was a CLI tool (or web UI) that walked you through the process of signing up for accounts and setting up API keys, and then keeping it all in one place. I would honestly be tempted to rewrite openapi-generator from scratch in a better language / template engine, because the Java code is so hard to read and extend. I'm mainly a Ruby developer, but I don't think Ruby would be a great choice for CLI / generator tools, because it gets pretty slow. So maybe Python or Go.

I feel like this would be a really interesting project to work on, and potentially a startup idea. Would anyone be interested in using it? I should probably try to validate this idea. I've set up a Google Form where you can just click a checkbox to register your interest anonymously, or you can also submit your email if you want to get updates and try a beta version: https://forms.gle/7qWzWpC9QrTjUgnU7

[1] https://formapi.io

[2] https://github.com/domaindrivendev/rswag

[3] https://github.com/OpenAPITools/openapi-generator

These days all my needs for "structured data" start with JSON Schema. I refuse to work with schemaless apps and databases anymore.

The workflow which I use for one app is:

* Define a JSON Schema for all the app's models

* Generate static Go types from these definitions

* Generate corresponding GraphQL types and inputs in the Go app for serving the API

* Generate TypeScript types in JavaScript front ends against the same schema

* Same model structs in Go are used to shuffle data in and out of data stores
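As a rough illustration of the type-generation steps, here is a toy Python sketch that turns a flat JSON Schema object into a TypeScript interface. It is not the tooling used in this workflow (which isn't named here); the `to_ts_interface` helper and the `Post` schema are made up for the example, and it ignores nesting, `$ref`, arrays, and unions.

```python
# Mapping from JSON Schema primitive types to TypeScript types.
TS_TYPES = {
    "string": "string",
    "integer": "number",
    "number": "number",
    "boolean": "boolean",
}


def to_ts_interface(name, schema):
    """Render a flat object schema as a TypeScript interface declaration."""
    required = set(schema.get("required", []))
    lines = [f"export interface {name} {{"]
    for prop, sub in schema.get("properties", {}).items():
        ts_type = TS_TYPES.get(sub.get("type"), "unknown")
        optional = "" if prop in required else "?"  # non-required props are optional
        lines.append(f"  {prop}{optional}: {ts_type};")
    lines.append("}")
    return "\n".join(lines)


post_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "title": {"type": "string"},
    },
    "required": ["id"],
}

print(to_ts_interface("Post", post_schema))
```

A real generator has to decide what to do with keywords that have no static-type equivalent (ranges, patterns, `oneOf`), which is where the "common denominator" trade-off shows up.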

Since GraphQL is more limited than JSON Schema (for example, very limited union ("oneof") support), there are some features I'm not able to fully make use of, but the common denominator covers most use cases.

I love having end-to-end static typing, validation and consistency.

I tried to do exactly this (but using C# instead of Go on the back end). The tools basically fell apart when trying to do things that were well within JSON Schema's spec.

Can you link to the libraries you used??

Just like I have been always doing with Web Services.

I was building an app using JSON Schema last year (until Feb this year) and gave up. I read the JSON Schema specification about 10 times, and there are some undefined behaviors. I also read a lot of JSON Schema issues on GitHub; it is basically a one-man show, with the others mostly coming in to review.

Once they added all the logic keywords to the spec, everything became a mess.

Example 1:

{ "oneOf": [ {"type": "number"}, {"type": "number"} ] }

1 or 2 or 3 is an invalid input based on the above schema. It doesn't make sense to me at first glance. (hint: it's XOR)


Example 2:

{ "oneOf": [ {"minimum": 0, "maximum": 10}, {"minimum": 5, "maximum": 20} ] }

These ranges will pass: [0,4] and [11,20]. But you would be surprised that it doesn't reject strings, booleans, or null. Basically it doesn't reject anything except the [5,10] range.


Example 3:

{ "type": ["object", "array", "null"], "not": {} }

Easy but confusing. This rejects everything because comma means AND, and `"not": {}` means false.


Example 4:

{ "allOf": [ { "type": "object", "properties": { "a": {"type": "string"}, "b": {"type": "integer"} } }], "additionalProperties": false }

A well-known problem. This rejects (all - {})

This is because the "allOf" part requires an object, AND the part outside it requires an empty object. "properties" at level 0 is empty if not specified. So this schema only accepts {}: the value must be an object AND must be empty.

They added a new keyword called "unevaluatedProperties" to solve this and I won't explain it to you!


If you just use basic stuff like draft-04, you will be fine.

But I will NEVER touch this spec again!


EDIT: My app was doing a sort of static analysis on schemas, and this spec isn't supposed to help with things like that.

It seems like there are two slight misunderstandings:

> "oneOf": [ {"type": "number"}, {"type": "number"} ]

> { "oneOf": [ {"minimum": 0, "maximum": 10}, {"minimum": 5, "maximum": 20} ] }

"oneOf" requires exactly one match (hence the name), typically you want to use "anyOf" or "allOf". And there's no benefit in putting two identical values inside them.

> { "type": ["object", "array", "null"], "not": {} }

> { "allOf": [ { "type": "object", "properties": { "a": {"type": "string"}, "b": {"type": "integer"} } }], "additionalProperties": false }

JSON Schema is merely a list of assertions. Some test the type of the value (that's the "type" keyword), others test the value ranges within a single type. This way, you can allow values to be one of multiple types, e.g.: {type:["string","object"], minLength:1} means "Value must be a string or object; and if it's a string, it must have at least one character."

Some of the assertions are spread across multiple keywords, "additionalProperties" depends on "properties", for example. So, {additionalProperties:false} means: if value is an object, then only an empty object is permitted.
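To make the `additionalProperties` point concrete, here is a toy Python evaluator for just the keywords in the fourth quoted schema. It is a sketch, not a conforming validator: `accepts` and `TYPE_CHECKS` are names invented for this example, and keywords such as `patternProperties` and `required` are ignored.

```python
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def accepts(schema, value):
    """Toy evaluator: type (single name), allOf, properties, additionalProperties."""
    if "type" in schema and not TYPE_CHECKS[schema["type"]](value):
        return False
    if not all(accepts(sub, value) for sub in schema.get("allOf", [])):
        return False
    if isinstance(value, dict):  # object keywords only constrain objects
        declared = schema.get("properties", {})
        for name, sub in declared.items():
            if name in value and not accepts(sub, value[name]):
                return False
        if schema.get("additionalProperties", True) is False:
            # Only "properties" in this same schema object count as declared;
            # "properties" nested inside "allOf" is invisible from here.
            if any(name not in declared for name in value):
                return False
    return True


inner = {
    "type": "object",
    "properties": {"a": {"type": "string"}, "b": {"type": "integer"}},
}
example_4 = {"allOf": [inner], "additionalProperties": False}
same_level = dict(inner, additionalProperties=False)
```

With these definitions, `accepts(example_4, {"a": "x"})` is false because `a` counts as an additional property at the outer level, while `accepts(same_level, {"a": "x", "b": 1})` is true because `properties` and `additionalProperties` sit in the same schema object.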

Some implementations can tell you if you're trying to do nonsensical things (like test the maximum length of a value that's only allowed to be a boolean), but that's up to the implementation to test for.
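Going back to the two "oneOf" schemas quoted at the top, the exactly-one rule can be sketched with another toy Python evaluator. Again this is an illustration under stated assumptions, not a real validator: `matches` is a made-up helper that only understands `"type": "number"`, `minimum`, `maximum`, and `oneOf`. It follows my reading of the spec that `minimum` and `maximum` are ignored for non-numeric values.

```python
def is_number(value):
    # JSON booleans are not numbers, even though bool subclasses int in Python
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def matches(schema, value):
    """Toy evaluator for "type": "number", "minimum", "maximum", and "oneOf"."""
    if schema.get("type") == "number" and not is_number(value):
        return False
    if is_number(value):  # range keywords pass vacuously for non-numbers
        if "minimum" in schema and value < schema["minimum"]:
            return False
        if "maximum" in schema and value > schema["maximum"]:
            return False
    if "oneOf" in schema:
        hits = sum(matches(branch, value) for branch in schema["oneOf"])
        return hits == 1  # exactly one branch must match, not "at least one"
    return True


same_twice = {"oneOf": [{"type": "number"}, {"type": "number"}]}
ranges = {"oneOf": [{"minimum": 0, "maximum": 10}, {"minimum": 5, "maximum": 20}]}

for value in (3, 7, 15, 25, "abc"):
    print(repr(value), matches(ranges, value))
```

Under this sketch, 3 and 15 match exactly one branch and pass; 7 matches both and fails; 25 matches neither and fails; and "abc" fails too, because both branches accept it vacuously. Every value fails `same_twice`, since a number matches both branches and anything else matches neither.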

Yes, it's my mistake for using JSON Schema for something it isn't supposed to do. Analyzing schemas from user input is very difficult if the "type" keyword is not required (I actually read the GH issue about it). IMHO, JSON is already verbose, so there's no need to make it concise; requiring "type" would make it a lot simpler. I'd also prefer not to allow multiple types in the "type" keyword.

The problem is that the spec is so flexible that it allows users to write all sorts of nonsensical things, as well as sensible-looking things that have holes in their assertions.

I understand your explanation, and thanks for creating a new account to do this. Appreciated!

I also read this paper: https://martinugarte.com/media/pdfs/p263.pdf (Foundations of JSON Schema)

Also this: https://www.genivia.com/sjot.html#SJOT_versus_JSON_schema (JSON Schema problems)


And now the circle is complete, with the difference that XML Schema allows for proper comments.

I never saw XML schema as a bad idea, just overdesigned and verbose, same as all things XML.

I happen to think otherwise regarding XML, and rather enjoy using all high level tooling for XML processing.

As opposed to underspecified, and ill thought out?

To me it seems the XML era had better engineers making the standards.

relax ng?

Well, JSON Schema has "title", "description", and "examples" keywords that are meant to be used for additional info, so not having comments is not as big a deal as it is with plain JSON. For XML Schema, comments are essential as it's super verbose and hard to read. Of course, if you use it a lot you'll train your brain over time to scan through the forest of XML tags without effort, just like in the 90s we were able to easily make sense of multi-level nested tables - but to untrained eyes it's still a mess of tags with a million options, and that's the main reason why XML lost its popularity - too much noise and verbosity.

JSON Schema has "$comment" as of draft-06 or -07 (I forget, which is funny b/c I was the one who added it).

I'm a fan of running these tests live in production. A type check is cheap, so it doesn't affect performance that much. You should not force correctness on others, meaning their code will fail because of your strictness. But you can force correctness on yourself so that your code never causes someone else's to fail. So run the schema validation or type check on outgoing responses, not only in testing, but also in production.

Checking responses in production is also a good idea because it reveals all the edges that may not be exercised in an artificial test suite.

There is still some performance cost for checking a response against a schema though, so a nice compromise is just to check a small % of outgoing responses — the vast majority of requests stay as fast as possible, but given a reasonable traffic load, it's still enough to eventually reveal any places the schema doesn't match reality. (This is an approach we use at Stripe for checking our OpenAPI specification.)
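A sketch of the sampling idea in Python; this is not Stripe's actual implementation. The `validate` and `report` callables, the helper name, and the 1% default are assumptions for illustration. The key property is that a schema mismatch gets reported but never breaks the response being served.

```python
import random


def make_sampled_checker(validate, sample_rate=0.01,
                         rng=random.random, report=print):
    """Return a function that validates roughly `sample_rate` of responses.

    `validate(response)` is a hypothetical callable that raises on a schema
    mismatch; `report` is wherever mismatches should be logged.
    """
    def check(response):
        if rng() >= sample_rate:
            return False  # not sampled: skip validation entirely
        try:
            validate(response)
        except Exception as exc:  # never let validation break the response
            report(f"response schema mismatch: {exc}")
        return True  # sampled (whether or not it matched)

    return check
```

In a web app, `check(body)` would be called just before the response is written, and the return value can feed a metric for how many responses were actually checked.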

We are also validating responses against JSON schemas in our Ruby production apps. I can confirm it's a very helpful practice and I'm a big fan of it.

Currently we are checking 100% of the responses. I even wrote this gem which uses native code to perform the schema validation to minimize the overhead in our endpoints: https://github.com/foxtacles/rj_schema Validating our largest and most complex responses is taking <10ms, on average no more than 2-5ms which is quite affordable.

Side note @brandur, I always look forward to your blog posts, time for a new one!

Hah, thanks! :) I've got a few queued up that I hope to get finished up pretty soon.

Open API Spec (formerly known as Swagger) is purpose-built on top of (or slightly forked from) JSON Schema and has much wider support among API-related tools and libraries.

I tried using JSON Schema for some internal (non-HTTP) APIs and found bugs/inconsistencies in tooling that made me abandon it.

I'm really torn whether to adopt JSON schema validation for a Saas that I run.

pro: It validates API JSON responses based on an open spec.

cons: It's absolutely terrible to work with. XML all over again. Documentation is terrible. Finding examples is, well you get it...

Note that "Understanding JSON Schema" has been updated for draft-07 and is now under the aegis of the main JSON Schema project: https://json-schema.org/understanding-json-schema/

Why not just use something with a natively supported schema at this point like protobuf/grpc or graphql?

The article describes some very good engineering practices. Nevertheless, I'd suggest that the JSON Schema itself should not be the root of the documentation; that should be a higher-level integrated system like API Blueprint's MSON. You can still produce a schema from it, but it can also tie into a much better API design lifecycle.

Yes, JSON Schema describes documents/resources, not entire APIs on its own.

Even Hyper-Schema (to the extent that it's implemented at all yet) is a resource-by-resource system, not an API-scope system.

This is a very powerful approach because there are JSON Schema libraries available in all major languages. TypeScript is a better source of truth because it can represent things JSON Schema can't. It is possible to generate a JSON Schema from a TypeScript definition (see https://github.com/YousefED/typescript-json-schema), which means TypeScript can be the single source of truth for API definitions, be used directly in the frontend and, in JSON Schema form, be used to test directly against API responses.

Coming from protobuf land, this tool seems gross. It's XML schema all over again.

Why are we trying to shoehorn a schema design language into a format that isn't good for human authoring? It's annoying to write JSON, yet the language masquerades as being human-readable.

Frontend engineers should check out protobuf. It's an amazing data definition language that generates bindings in every language under the sun and has a compact binary serialization that is much more efficient than JSON both to encode/decode and transmit over the wire.

JSON should die the same death XML did. It's so bad.

Being able to see JSON responses in browser dev tools is extremely helpful for development and debugging. Is there something similar for protobuf?

I concur, JSON is easy enough to read/write for humans, easier to parse than XML, too, it kind of hits the sweet spot in between protobuf and XML.

Protobuf is a lower level interface. You can't rely on it for validation and use it for complex model / schema.

For example oneof customer (individual / organisation) with different distinct fields. In protobuf, all these fields need to be defined.

This seems really verbose? Very XML-y.

What's wrong with simple https://github.com/omniti-labs/jsend

```
{
  "status": "success",
  "data": {
    "post": { "id": 1, "title": "A blog post", "body": "Some useful content" }
  }
}
```

How is that simple? What does "status" mean? I'm personally familiar with the word "status"—but whatever that property is, I'd have to re-implement it in my user agent, which seems like a waste because my user agent already understands HTTP status codes. How is it simple to have to look at and interpret two different "status" fields? (And that's just the first property!)

What part of the article is verbose? What's wrong with verbose?

For JavaScript, I highly recommend the ajv and tv4 npm packages.

Not directly related but I like writing a json schema for my json configs. In a pinch I can get an interface for editing and validating it very cheaply with https://github.com/json-editor/json-editor

It's not perfect but a great time saver for me and users.

Well written and informative. Thanks for posting/writing this!
