Show HN: TypeSchema – A JSON specification to describe data models (typeschema.org)
117 points by k42b3 40 days ago | 49 comments



Three big turn-offs for me:

1) They do not publish the rationale for why the world needs yet another protocol / language / framework on the homepage. It is hidden at https://typeschema.org/history

2) In the history page, they confuse strongly typed and statically typed languages. I have a prejudice about people doing this.

3) The biggest challenge with data models is not auto-generated code (which many people would avoid on principle anyway), but compressed, optimized wire serialization. So you START by selecting this for your application (e.g. Avro, Cap'n Proto, MessagePack, etc.) and then use the schema definition language that comes with the serialization tool you've chosen.


> ... auto-generated code (that many people would avoid in principle anyway)

Auto generated code is 100% enough, sometimes.


I still have not found any way to use autogenerated code for Java/Spring that can handle updates to an external OpenAPI spec.

Any pointers?

(Serious question).


Point #1 was my biggest turn off. Numbers 2 and 3 are good points too.


> 1) Yet another protocol etc.

Agreed.

> 3) The biggest challenge about data models is not auto-generated code

I would say auto-generated code is most definitely the harder problem to solve, and I’d also go out on a limb and say it is THE problem to solve.

Whether it’s JSON, XML, JavaScript, SQL, or what have you, integrating both data and behavior between languages is paramount. But nothing has changed in the last 40+ years solving this problem, we still generate code the same clumsy way… Chinese wall between systems, separate build steps, and all the problems that go with it.

Something like project manifold[1] for the jvm world is in my view the way forward. Shrug.

1. https://github.com/manifold-systems/manifold


also the output in markdown and php doesn't seem good


I mean, Java and Go are strongly typed languages if you consider Object a = Integer.valueOf(1); a = Float.valueOf(1.0f); to be strong.

They are also strict, of course.


Have you heard of wit? I suspect we'll see use outside of WebAssembly. https://component-model.bytecodealliance.org/design/wit.html

It has non-nullable types, via option, which makes non-nullable the default, since you have to explicitly wrap it in option. https://component-model.bytecodealliance.org/design/wit.html...

A way to represent types commonly found in major languages would be nice, but it would be better to start with something like wit and build on top of it, or at least have a lot of overlap with it.


That was a great read and it gave me several ideas. Thank you.


Why reinvent https://json-schema.org ?? Pros/cons?


From my understanding, JSON schema describes the schema of JSON objects with JSON. This one describes a variety of types of schemas with JSON.

So it could be TypeScript, Go, GraphQL, etc. It seems to output to JSON Schema as well. I guess its main purpose is to share the schema between different languages. Which I imagine works with JSON Schema too, but this takes it a step further and handles all the mapping you'd need to do otherwise.


json schema has nuanced and expressive constraints to validate information exchanged in json serialization.

typeschema in contrast seems to focus on describing just the structure of data with the goal to generate stubs in a wide variety of programming languages.


So why not subset JSON Schema? As was done with XML Infoset compared to XSD, for example. And extensions are also possible to add POCO details as needed.


I find it interesting that the Go serialization just duplicates the props rather than using composition: https://typeschema.org/example/go

Seems a bit naively implemented.

Ideally, the duplicated props in Student would just be a single line of `Human`.


Comparison between TypeSchema and LinkML for those interested as I was. https://www.perplexity.ai/search/please-compare-and-contrast...


What's the benefit over existing variants like Swagger/OpenAPI/JsonSchema ?


It feels like a converter, since it can transform TypeSchema into JSON Schema.


Yeah, I'm not really following the line of reasoning presented on the "/history" page: https://typeschema.org/history

It seems to me like a mischaracterization of JSON Schema to say you can't define a concrete type without actual data.

I am a very stupid individual so I could be misunderstanding the argument.


I can't really follow those arguments either. For example the empty object example {}. Why is this bad? Types without properties are a real thing. Also an empty schema is a real thing.

The thought I do get: JSON Schema primarily describes one main document (object/thing). And additionally defines named types (#/definitions/Student). But it's totally fine to just use the definitions for code generation.

The reference semantics of JSON Schema is quite powerful, a little bit like XML with XSD and all the different imports and addons.


Maybe it's just me, but I've never been able to get a complex type schema to work properly with JSON schema.

The moment you have types referencing other types in a way that can become recursive in ANY way, the whole thing seems to explode.
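For reference, here is a rough sketch of how a self-referencing schema can be checked without blowing up. The `Category` schema and the `resolve` and `is_valid` helpers are made up for illustration; this is a toy covering a handful of keywords, not how any real JSON Schema library works. The idea is to resolve each `$ref` lazily while walking the data instead of expanding the schema up front.

```python
# A self-referencing schema: each Category may contain child Categories.
SCHEMA = {
    "$ref": "#/definitions/Category",
    "definitions": {
        "Category": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "children": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/Category"},
                },
            },
            "required": ["name"],
        }
    },
}


def resolve(ref: str, root: dict) -> dict:
    """Follow a local '#/a/b' pointer; remote refs and escaping are not handled."""
    node = root
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


def is_valid(value, schema: dict, root: dict) -> bool:
    """Toy validator covering only $ref, type, required, properties and items."""
    if "$ref" in schema:
        # Refs are resolved lazily, one level per visit, so a recursive schema
        # only recurses as deep as the data actually nests.
        return is_valid(value, resolve(schema["$ref"], root), root)
    kind = schema.get("type")
    if kind == "string":
        return isinstance(value, str)
    if kind == "array":
        return isinstance(value, list) and all(
            is_valid(item, schema.get("items", {}), root) for item in value
        )
    if kind == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        return all(
            is_valid(value[key], sub, root)
            for key, sub in schema.get("properties", {}).items()
            if key in value
        )
    return True  # keywords outside this toy subset are ignored
```

Trouble tends to start when a tool tries to inline every reference eagerly, which never terminates on a schema like this one.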


Heh feels like Json schema to me too... Same, but different.


Feels much weaker/naive than JSON Schema, as TypeSchema barely has any constraints.

The TypeSchema spec is hard to comprehend, as it doesn't delve into any details and looks more like a bunch of random examples with comments than a proper definitive document (e.g. they never seem to define what the "date-time" string format is). I don't see a way to say, e.g., that a string must be a UUIDv7, or that an integer must be non-negative, nor any support for heterogeneous collections, etc.

Maybe it has some uses for code generation across multiple languages for very simple JSON structures, but that feels like a very niche use case. And even then, if you have to hook up per-language validation logic anyway (and probably language-specific patterns too, to express concepts idiomatically), what's the point of a code generator?


"What is the difference to JSON Schema? JSON Schema is a constraint system which is designed to validate JSON data. Such a constraint system is not great for code generation, with TypeSchema our focus is to model data to be able to generate high quality code."

They have more details on the History page.


Those are certainly words, but the words they use to describe what differentiates them from JSON Schema just assert that their thing is for exactly what has always motivated schema languages including, but not limited to, JSON Schema, and since JSON Schema supports that purpose far better, I am left confused.

At best, I can guess that maybe they are trying to get at the fact that JSON schema supports some structures that can be awkward or unidiomatic for data models in some languages (a lot of what you can do via allOf or oneOf fits this) and they want to narrow down to something where what can be defined in the schema language is also simple idiomatic structures nearly everywhere, but a restricted profile of JSON Schema would get you there much faster than starting from the ground up.


> narrow down to something where what can be defined in the schema language is also simple idiomatic structures nearly everywhere

It feels more like a lowest common denominator to me, which is frequently (in presence of anything non-trivial) the opposite of idiomatic.

For example, JSON does not have monetary/decimal type, best option available is a string. It would be very opposite of idiomatic to have a C# or Python code use a string in the record/dataclass, instead of a decimal, if the actual JSON document field has the "monetary value" semantic.
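To make that concrete, a minimal sketch in Python, assuming a hypothetical `Invoice` record whose `total` travels as a JSON string. The names are invented for illustration and nothing here is TypeSchema output.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    # Hypothetical record; the field names are made up for illustration.
    number: str
    total: Decimal  # idiomatic in-memory type, even though the wire type is a string


def invoice_from_json(text: str) -> Invoice:
    raw = json.loads(text)
    # Decimal(str) parses the digits exactly, with no binary float rounding.
    return Invoice(number=raw["number"], total=Decimal(raw["total"]))


invoice = invoice_from_json('{"number": "A-17", "total": "19.99"}')
```

A generator that only knows "string" would give `total: str` here, and the conversion to `Decimal` would have to be written by hand anyway.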

And TypeSchema seems to ignore aspects like nullability and presence requirements, assuming that everything can be null (which can be wrong and even harmful, as if Java hasn't taught us anything).

Maybe I'm thinking wrong about it and the idea is to have separate wire and public API formats, though, where the wire format is minimal JSON (TypeSchema can work, I guess, although I still have concerns about nulls - and distinguishing between nulls and absence of the field) and then that intermediate almost-over-the-wire-but-deserialized-from-JSON-blob object representation is adapted into a language-specific idiomatic structure. I always felt that such approach is way too verbose and adds a lot of boilerplate without a good reason, but I could be wrong about it.
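The null-versus-absent distinction is easy to lose. A small sketch of one way to keep it in Python, using a sentinel; `ProfilePatch`, `MISSING` and `parse_patch` are hypothetical names for illustration only:

```python
import json
from dataclasses import dataclass
from typing import Union


class _Missing:
    """Sentinel type meaning 'the key was not in the JSON object at all'."""

    def __repr__(self) -> str:
        return "MISSING"


MISSING = _Missing()


@dataclass
class ProfilePatch:
    # Hypothetical payload with three distinct states for nickname:
    # a string, an explicit null (None), or not sent at all (MISSING).
    nickname: Union[str, None, _Missing] = MISSING


def parse_patch(text: str) -> ProfilePatch:
    raw = json.loads(text)
    # dict.get with a sentinel default keeps an explicit null distinct
    # from an absent key; a plain raw.get("nickname") would merge the two.
    return ProfilePatch(nickname=raw.get("nickname", MISSING))
```

A schema language that treats every field as nullable has no way to tell a generator which of these states are legal.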


Yeah, “idiomatic” may have been a poor word choice; I really meant something closer to “simply representable”. oneOf, for instance, lets you very easily define flexible, concise structures in JSON Schema that OO languages without union types may not express naturally, if at all, and which may not be natural to work with in many languages even if they can be expressed.
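For contrast, this is roughly what a oneOf looks like in a language that does have union types. A sketch in Python with invented `Card` and `BankTransfer` types, standing in for a hypothetical `oneOf: [Card, BankTransfer]` schema:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Card:
    number: str


@dataclass
class BankTransfer:
    iban: str


# Counterpart of a hypothetical `oneOf: [Card, BankTransfer]` schema.
PaymentMethod = Union[Card, BankTransfer]


def describe(method: PaymentMethod) -> str:
    # Narrow the union by checking the concrete type.
    if isinstance(method, Card):
        return "card ending in " + method.number[-4:]
    return "transfer from " + method.iban
```

Without unions, the same shape usually turns into a base class plus a discriminator field, which is where the awkwardness comes from.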


This makes sense, but I think it's even a better reason to not use a code generator (which forces certain patterns on your code), but rather think about the best language-native way to express a certain concept you want to express.


Priority to high quality generation of good code from nice schemas that allow it (accepting that the schemas will be not very expressive and often too loose) vs. priority to faithfully representing and validating JSON documents that conform to general, detailed schemas (accepting that code generation won't be particularly flexible).


Restricting JSON Schema would've been my approach to this "problem" too.


Yeah, I've edited my comment above and added the last paragraph with a note about it. It must be a really weird use case where you need to write a bunch of code in different languages (probably writing libraries for some API or JSON-based data interchange format?) and are also not concerned about validation and language, because if you need validation, you're writing code by hand either way, so code generation becomes a curse rather than a blessing.

I would've understood if it did the inverse: read source code in any of the supported languages and check whether the structures it defines conform to the schema. That would make sense for testing that those structs don't diverge too much between codebases (that they have the same-shaped fields). Even then I'm not sure I see the point, because various languages tend to use different language-specific things (like a UUID type or enums/symbols/atoms, etc.) to make the developer feel at home rather than in a barren JSONland.


It looks far more constrained, especially when it comes to the validation logic, which makes sense validation-wise but honestly quickly becomes a "fate shovels shit in my face" kind of situation when it comes to code generation. As much as I love this sort of constraints I also find the union-type discrimination style "meh".


Kotlin classes are (seemingly) all generated as open classes, rather than data classes. Surprising choice - is this an intentional design decision? Wondering if I am missing something


The output in various languages is rather questionable. Not wrong per se, as it's totally valid code, but just... not idiomatic and not how a developer fluent in that language would implement it.


Hi man - Don't take my tone the wrong way but it's the only way I can express this. I will never, ever - EVER use your craft project without a complete series of unit tests. Especially one like yours. I stop reading immediately and just go on about my life.

Good effort though.

Edit: Oh I thought it was yours. Well I'll leave this up anyway.


I once read a paper about Apache/Meta Thrift [1,2]. Similarly, it allows the definition of data types/interfaces and code generation for many programming languages. It was specifically designed for RPCs and microservices.

[1]: https://thrift.apache.org/

[2]: https://github.com/facebook/fbthrift


The rust generator seems not to place generic parameters on the type itself?

  use serde::{Serialize, Deserialize};

  #[derive(Serialize, Deserialize)]
  pub struct Map {
      #[serde(rename = "totalResults")]
      total_results: Option<u64>,

      #[serde(rename = "entries")]
      entries: Option<Vec<T>>,
  }


Why is everything nullable?


Kinda crazy question, but why not support SQL table/column DDL (nested JSON or arrays within those for bonus points)?


This is great. Some positivity, since many comments are on the negative side.

It's exactly what I need to connect .py with .ts


A map represents a map/dictionary with variable key/value entries of the same type.

Why?


If I had the spare time I would love to contribute dart support, but alas...


Looking at the Kotlin or TypeScript examples, it would be preferable to use one of them as source and parse it to output other formats. An LLM would probably be good at doing this too. Unless it can do more than generate boilerplate code I can't see needing this.


You don't need an LLM for that task.


TypeScript but no JavaScript is a tiny bit disappointing. I still like to be able to work on front-end code without needing to run separate build tooling.


It can actually generate the JavaScript type definitions for all possible inputs at once! Here, I'll copy and paste the result for you:

Hope that helps.


This tool can convert to JSON Schema, so it can be used with validator libraries. Either way, validation and static duck typing based on schema are two separate concerns, and the latter is impossible without something like a Typescript compiler (or checker if using jsdoc-style Typescript).


From the history page:

> JSON Schema is a constraint system which is designed to validate JSON data. Such a constraint system is not great for code generation, with TypeSchema our focus is to model data to be able to generate high quality code.

Well, types themselves are another type of constraint; specifying something like the type (number) of a property (Age) is a constraint on the possible values for that property.

> For code generators it is difficult to work with JSON Schema since it is designed to validate JSON data

There are lots of features in JSON Schema, but if you're writing a code generator, you don't actually have to support all of them. Some languages like C# don't have native language support for allOf, but do support oneOf or anyOf.

> JSON Schema has many keywords which contain logic like dependencies, not, if/then/else which are basically not needed for code generators and really complicates building them.

Isn't the whole point of code generators for OpenAPI/JSON Schema that they generate code to map foreign data sources into a target programming language, so that the programmer doesn't have to write all this mapping code by hand? The vast majority of programming languages support if/then/else and can even model dependencies like the ones JSON Schema supports. So why is it a bad thing if a schema language like JSON Schema supports these constraints? Wouldn't having support for this in the schema language and a code generator mean less handwritten code and more time saved?

If a schema constrained Person.Age to be an integer always greater than 0, I would actually really love for that validation code to be generated for me. Chances are, if such constraints exist in the schema, then the vendor/original author probably has good reason to include them; having it code generated automatically means I don't have to worry about writing it myself.
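As a sketch of what I mean, here is the kind of output I'd hope a generator could emit for that constraint. The `Person` class and the schema fragment in the comment are hypothetical, not taken from any existing generator:

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical output for a schema fragment along the lines of
    # {"type": "integer", "exclusiveMinimum": 0} on the age property.
    name: str
    age: int

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.age, bool) or not isinstance(self.age, int):
            raise TypeError("age must be an integer")
        if self.age <= 0:
            raise ValueError("age must be greater than 0")
```

The check lives in one generated place, so every consumer of the schema gets it without anyone retyping the rule.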

I mean if you want to generate the code for only defining data structures & their properties, you can still do that, and just stop there. A whole new schema language in JSON seems wholly redundant. It seems like maybe the authors should've started with writing their own better code generator for JSON Schema rather than their whole new schema language and code generator.

Finally, reading the spec https://typeschema.org/specification, I can't see support for optional values or null/nullable types. I'm hoping it's just an oversight (seeing as the spec itself is incredibly brief), but if it isn't, then bluntly, this isn't ready for release.



?



