Schemastore.org – schemas for all commonly known JSON file formats (schemastore.org)
219 points by simon04 5 days ago | hide | past | favorite | 66 comments

JetBrains uses this in their IDEs: https://www.jetbrains.com/help/idea/json.html#ws_json_schema...

It's how the IDE autocompletes JSON / YAML files automatically.

The Manifold JSON compiler plugin[1] pairs nicely with IntelliJ's JSON support. It's the only type-safe, truly schema-first framework I've encountered (that's also fully integrated into an IDE).

[1] https://github.com/manifold-systems/manifold/tree/master/man...

I love structured repositories of knowledge and documentation like this because it opens a lot of doors for automating broader software engineering tasks.

This could be a really cool opportunity to programmatically find common subsets of fields/operations that are sent together in many APIs and see if there's a way to build libraries and tooling across a bunch of languages to handle operations on those fields.


I haven't heard of anyone reverse engineering common schemas / parameters from existing APIs (I doubt the commonalities are really there) but there are projects like this one [0] which express schema.org types as JSON Schema objects suitable for use in OpenAPI.

[0] https://github.com/tandfgroup/schema-oas

That's awesome! One more step towards the semantic web.

Anybody else get bummed out that, many times when we create a lightweight thing to compensate for something overcooked (like XML), the new thing eventually ends up overcooked in much the same way?

In all fairness, JSON itself continues to be as simple a format as it ever was. Having schemas is an important adjunct because it frees devs from constantly translating specs into ad-hoc validation routines that all too often miss something or make mistakes. Validation up front saves a tremendous amount of pain: instead of a key error deep in a call stack, a config file's omissions or mistakes are identified the moment it is read.
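To illustrate the up-front-validation point (a minimal sketch using the third-party Python jsonschema package; the schema and config keys here are made up):

```python
import json
from jsonschema import validate, ValidationError

# Hypothetical config schema: fail fast at load time
# instead of hitting a KeyError deep in a call stack.
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "host": {"type": "string"},
        "retries": {"type": "integer", "minimum": 0},
    },
    "required": ["host", "retries"],
}

def load_config(text):
    config = json.loads(text)
    validate(config, CONFIG_SCHEMA)  # raises ValidationError naming the bad key
    return config

load_config('{"host": "db.example.com", "retries": 3}')  # fine

try:
    load_config('{"host": "db.example.com"}')
except ValidationError as err:
    print(err.message)  # points at the missing 'retries' key
```

The error surfaces at the point the file is read, with the offending key named, rather than wherever the value is first used.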

Moreover, and on a related note, in many ways JSON is great as an exchange format, but stifling as a config language. There is a lot of room between JSON and XML to work with.

XML is a document mark-up language, which was repurposed as a data exchange format and configuration format. A mistake in both cases.

JSON is a data exchange format which is sometimes used as a configuration format, also a mistake.

Will we go full circle and start using a configuration language for data exchange or mark-up? TOML APIs?

> In all fairness, JSON itself continues to be as simple a format as it ever was. Having schemas is an important adjunct because it frees devs from constantly translating specs into ad-hoc validation routines that all too often miss something or make mistakes. Validation up front saves a tremendous amount of pain: instead of a key error deep in a call stack, a config file's omissions or mistakes are identified the moment it is read.

I agree. JSON is overall better than XML for most things. It's simple and universal. This simplicity translates to the new incarnations of some good ideas that people were trying to implement on top of XML, like schemas. They are not mandatory, but useful in certain domains. Simpler format means that these things can also become simpler. So eventually we may do them right. It's a process of evolution.

> Moreover, and on a related note, in many ways JSON is great as an exchange format, but stifling as a config language. There is a lot of room between JSON and XML to work with.

Also agreed. There is also a tiny bit of room below JSON and above unstructured text that can fit a universal syntax on top of which other formats can be built, including ones for configuration. A modular one, where you can have different formats that share the same underlying syntactic structure and as many idioms within it as is reasonable.

And here comes the pitch: this is what I'm trying to introduce. A simple universal syntax. It is still somewhat undercooked (less the syntax, which is minimal and stable, than its description and the formats on top of it), but it might be interesting for certain people. The thing is called TAO and it has a website here: https://www.tree-annotation.org/

I hope somebody will find this inspiring at least.

If so, here is a thread on HN for discussion: https://news.ycombinator.com/item?id=23966667

How does your raw operator work with data that contains a closing bracket?

`raw` is more of a type annotation than an operator. You can also impose an implicit interpretation of (some of) your data as raw, without a type annotation.

A description (not necessarily the clearest; that's something I have to work on) is here: https://www.tree-annotation.org/raw.html

To condense: if the data is unbalanced, you have to escape it with the grave accent `. Otherwise, if it's balanced, which is mostly the case, no problem.

Unbalanced brackets and consecutive grave accents (or a grave accent followed by a bracket) are the only things you need to escape in raw data.

For more discussion let's go to the other thread, not to hijack this one.

One of the best representations of the 'simpleness' is the way you can draw it in a graphical loop diagram as they do on the json.org site.

When you can make your 'format' in to a not-too-big diagram like that it makes it far easier for anyone interested to 'get it' and encompass the whole thing in their mind to work with.

SQLite is the place that (to my knowledge) was the first to do these "railroad diagrams". Here are all of theirs (though they're also available inside the documentation): https://sqlite.org/syntaxdiagrams.html

As far as I know, they were originally introduced by Niklaus Wirth to describe Pascal's syntax. See the last pages in this manual from 1973: https://www.research-collection.ethz.ch/bitstream/handle/20....

Definitely not the first: the first release of SQLite was in 2000, and I found these railroad diagrams in my father's old Pascal programming books.

For example, Turbo Pascal included a Language Guide book that documented the entire Object Pascal language with railroad diagrams.


Oracle always had those (even back when only printed manuals were available).

It's because it was easy to use, and so people adopted it long before realizing just how many features they needed that it doesn't provide (for example, lack of types). "Simplicity" is a double-edged sword because you can't actually eliminate natural complexity; you can only move it around into (hopefully) more convenient / less chaotic forms.

I've been developing https://concise-encoding.org over the past two years with this in mind.

A huge improvement over JSON, but you still implement Tony Hoare's billion-dollar mistake! There also appear to be no schemas / static types or sum types (the dual of records)? A schema would allow you to overload the string literal, rather than require all those prefixes.

Designing a data-exchange format is essentially developing a mini-language. Even if we prohibit abstraction, we still want rich static types and the ability to define new types (algebraic types, nominal types). Taking inspiration from programming language theory will hopefully help avoid a result like Google's Protobufs, which is awkward, ad hoc, and non-compositional.

Null is in there because it exists in the real world (unfortunately). There's no putting the genie back in the bottle (although there's an admonition against using it in the spec). Pretending it doesn't exist would cause more trouble in the real world than letting it be.

The string-type prefixing is there so that you don't actually need a schema for 80% of use cases (schemaless data is usually enough). A schema can be added later if desired, and I've been thinking of what that might look like, but it's at a different level than the encoding format and can be developed independently.

The primary purpose of this format is to serve the 80% use case (general purpose, ad-hoc, hierarchical, schemaless data with fundamental type support - or something that easily fits within this paradigm). For the remaining more complicated 20%, there are custom types and at some point in the future, schemas.

> Pretending it doesn't exist would cause more trouble in the real world than letting it be

I don't agree at all with this point. RDBMS types, for example, are non-nullable by default. Protobufs, XML, and many other exchange formats also have optional types, but not null added to every type. If you want better interoperability with programming languages, I think it would be better to go after e.g. IEEE 754 support.

The default nullability of RDBMS types is implementation-specific. Protobufs requires a lot of ugly workarounds for languages such as Java to handle the language's nullable types. You can't eliminate the complexity; only move it, and protobufs moved it in a way that punishes languages with nullable types. XML at least works via omission, but that doesn't handle the case of explicitly signalling "no data", which is a valid signal in many languages.

Concise Encoding does support IEEE754.

> You can't eliminate the complexity; only move it,

Yes, and by using null-in-every-type for your messaging, you've now pushed the problem out of your app and published it to the world.


These schemas are just applied uses of JSON, not modifications of it.

But XML was perverted due to its inherent extensibility. The JSON syntax hasn't changed: scalars, arrays, associative arrays. That's all! So it has remained "pure", so to speak.

Fortunately the only issue I've run into with JSON is the lack of proper NaN and null support, but that is largely a limitation of the host language's implementation (can't support null if you ain't got a null!).

JSON has null. https://www.json.org/json-en.html

The main thing I miss is dates.

Yes, it has null, but when you parse it with, say, Boost::property_tree, null becomes 0, which is not a true null because there is no true null in C++. Language limitation.

Cool link, btw. For all the years I've been using JSON I never visited the main page!

So that's not a problem in JSON. Why can't the library output nullptr?

nullptr isn't "null", it's a null POINTER. In languages with a NULL type, a variable can be null. E.g., in Python you can say "x = None; x = 1"; in C++ you cannot say "int i = nullptr".
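For what it's worth, Python's standard json module round-trips JSON's null to None in both directions:

```python
import json

# JSON null maps to Python's None and back; languages without a
# first-class null (like C++) have to pick some representation instead.
assert json.loads('null') is None
assert json.loads('{"x": null}') == {"x": None}
assert json.dumps({"x": None}) == '{"x": null}'
```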

comments would be nice

It’s almost as if making something ‘simple’ and ‘lightweight’ doesn’t let you escape the complexity of the real world in which you have to use it.

The point is not to "escape" complexity, but to manage it.

Of course, these types of "I hate the word simple" comments are always beside the point. They could be the whines of frustrated, incompetent programmers who continually produce overly complex solutions to relatively simple problems and call it a day. We all know those programmers exist.

In a way it is a side effect of attempting to bolt the same abstractions the 'old and complicated' system had onto the 'new and lightweight' one, which essentially takes the problems of old and puts them back on top of the simple stuff.

Sometimes it doesn't happen like that, i.e. the whole clang-ir-llvm thing, but it often does (mostly during serialisation and API exports for third party consumption).

Take WSDL and XSD as an example, the idea was that you put your details in a file that can build on other abstractions and the result is a universal and portable format that allows any system to work with any other system. That failed of course because you end up exporting implementation details inside of those 'portable' abstracted specifications making it no longer portable because you now have to implement the exact same details on the 'consumer' side as well.

Comparable things happen with Protobuf and gRPC for the same reasons. It's not as bad (yet), due to limitations on the data formats you are allowed inside the protocol, and that makes it a bit more of an RPC with message passing than a complete API encapsulation. If the maintainers can prevent it from eating into serialisation of arbitrary objects, it might not end up all that bad after all. But that was the theory with XML as well, and we all know how that worked out.

To me it doesn't seem like we are going back and forth. Rather that we are moving forward in a zig-zag pattern. One win is cultural.

In XML-world, you felt this shame when you weren't using a schema. You didn't have to, but it was a "you really ought to" kind of deal. Now it's instead "hey, there are schemas if you want that tool". To me that's progress.

In what way is this overcooked?

They made the same poor choice as the designers of XML schema, which is to use the target language itself to describe the schemas. That is, a JSON Schema is written in JSON. As a result, they're extremely verbose and tedious to work with.

If you contrast JSON Schema with the type declaration subset of TypeScript, you'll find that the latter is much more concise and readable.

Having said that, there's an interesting discussion to be had about the distinction (if any) between schemas as understood in the database/XML/JSON world and types used in programming languages. JSON Schema does more than most programming language type systems do (e.g. requiring string lengths/numbers be within a certain range) and this is useful for validation. It still doesn't excuse the syntax though.
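For example, value constraints like string lengths and numeric ranges are checked directly by a validator (a sketch with the Python jsonschema package; the field names are invented):

```python
import jsonschema

# Constraints most programming-language type systems can't express:
# a length range on a string and a numeric range on an integer.
schema = {
    "type": "object",
    "properties": {
        "username": {"type": "string", "minLength": 3, "maxLength": 20},
        "port": {"type": "integer", "minimum": 1, "maximum": 65535},
    },
    "required": ["username", "port"],
}

jsonschema.validate({"username": "alice", "port": 8080}, schema)  # passes

try:
    jsonschema.validate({"username": "al", "port": 8080}, schema)
except jsonschema.ValidationError as err:
    print(err.message)  # fails the minLength: 3 constraint
```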

The advantage is that you don't need to have another parser at the ready to parse your schema definition.

You can also more easily generate schemas, or generate things from schemas, if they're in JSON.

Sure, you can still do these things when schemas are a non-JSON format, but it increases the barrier of entry to writing tooling on top of schemas.
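As a toy example of that low barrier, deriving a schema from a sample object is just ordinary dict manipulation (illustrative only; real generators also handle nesting, arrays, and optional fields):

```python
import json

# Map Python types of a flat sample object to JSON Schema type names.
TYPE_NAMES = {bool: "boolean", int: "integer", float: "number", str: "string"}

def infer_schema(sample: dict) -> dict:
    return {
        "type": "object",
        "properties": {k: {"type": TYPE_NAMES[type(v)]} for k, v in sample.items()},
        "required": sorted(sample),
    }

print(json.dumps(infer_schema({"name": "demo", "port": 8080}), indent=2))
```

Because the output is itself JSON, it can be fed straight back into any validator or editor tooling with no extra parser.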

It's technically possible to devise an alternative syntax and a bidirectional converter between it and the current JSON Schema syntax. Potentially the current JSON syntax could become some sort of "intermediate representation", not meant for direct human consumption but still useful to simplify the implementation of tools that generate or interpret schemas (which wouldn't need a full-fledged parser).

A quick web search didn't yield any existing project attempting to do so, but perhaps I'm not typing the right keywords. Does anybody know if this exists?

If not, would you like it? What would be the main characteristics such a syntax should have? (My pet peeve is comments, i.e. the lack thereof in JSON.)

I am building one, but it has just started. It turns out we can even implement a module system on top of JSON Schema. It's going to produce convention/opinion-based output: a subset of how a thing is described in JSON Schema, e.g. a tagged union.

JSON Schema is a mixed bag of structure and validation. That may sound like a simplification, but its expressive power comes from the mix. For example, I have to limit the use of "applicator" keywords (oneOf, allOf, anyOf) to only my controlled keywords under $defs... urggg.

I hope in 2-3 months I could get a beta out.

I think there's an argument to be made that validation should be done with a Turing-complete language to be most effective. However, JSON Schema, while weaker and more verbose, does have the advantage of programming-language independence, which whatever TypeScript has absolutely does not.

Either way, JSON Schema is fairly lightweight and didn't accidentally create an awful new programming language (cough XSLT cough). I don't see it as indicative of any of the kind of fuck-ups XML went through.

It was a great choice for XML... I've used XSLT to transform schemas into code many times. (Related: XSLT is more readable than its JSON equivalent, jq.)

XML schemas were not overcooked.

In a similar vein, apis.guru has a f*ton of OpenAPI definitions (550) https://apis.guru/openapi-directory/ (I am not affiliated, just a fan)

The count is sadly out of date. We now have over 3000 OpenAPI definitions! (Maintainer)

I just swapped from doing JSON schema to using TypeScript Interfaces for schema I'm working on. https://github.com/ellisgl/keyboard-schema

Aaaaaand now we’re back to XML, but with less readable syntax and implementation consistency. Sounds like a win for everyone.

We're nowhere close to xml.

Namespaces, externally defined entities, CDATA, oh my!

The same thing is happening with the major dynamically-typed languages adding optional static typing.

Things we do/invent for "developer comfort" -- binary serialization FTW.

Wow, that's impressive. It's awesome to be able to see how complex some of these are, like CloudFormation's.

Given that AWS publishes a pseudo-schema[1], in JSON, I don't for the life of me understand why they don't use an actual JSON-Schema spec, saving the world a ton of trouble instead of building a separate project[2] to try and reverse engineer their spec into a standard schema :-(

1 = https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGui...

2 = https://github.com/aws-cloudformation/aws-cloudformation-tem...

Are there any tools which can parse a JSON document and its schema definition to construct a queryable graph? The ability to see and traverse relationships in particularly complex JSON documents (like an OpenAPI definition) could come in handy.

Now someone just has to build a converter (like pandoc) for these formats and YAML and Dhall and XML and INI and gron/greppable-JSON/Mozilla format, and we can finally use any format we want.

and two other questions:

* is there a comparable site for xml?

* is there a python module for each format? :D

For those that enjoy a bit of historical perspective, it is interesting to note that similar schema repositories existed also for XML (for both DTD and XML Schema). They were, however, all behind paywalls or registration forms.

It is impressive to see how much and how quickly the world has switched to "open-first" when it comes to schemas, documentation and software in general.


* BizTalk (by Microsoft) https://web.archive.org/web/20000301195519/http://www.biztal...

* https://web.archive.org/web/20000511095737/http://www.openap...

Bonus: A review of XML schema repositories: https://www.xmltwig.org/article/bw/bw_04-schema_repositories...

Has someone made a version of this but as a type definition for Typescript? I'd find that interesting and wouldn't mind contributing to a Github repo.

Should be technically possible, though I don't know how strictly they map to/from each other.

JSON schema to TypeScript - https://github.com/bcherny/json-schema-to-typescript

TypeScript to JSON schema - https://github.com/YousefED/typescript-json-schema

With version 1.0, TOML is JSON-compatible AFAIK, so it should be possible to check it against JSON Schemas as well.

Interesting. Is there documentation on how to programmatically validate JSON against a schema?

There are some nice Python packages to use it natively: https://pypi.org/project/jsonschema/

We use it at Uclusion to make sure our front end is passing the right arguments, by having it be the first thing our lambdas do with the request body:

  from jsonschema import validate as validate_schema

  def validate_syntax(self, event):
      """
      Subclasses are expected to override this to handle any syntactical
      validation different from here.
      :param event: the request event object
      """
      schema = self.get_schema()  # implemented by individual validator objects
      validate_schema(event, schema)

The JSON Schema specification: https://json-schema.org

Are there any vim or emacs plugins that utilize these?
