Show HN: Schema.org with GitHub and scripts: A tool to make data more useful

Gehinnn · on Oct 16, 2019

I liked your survey ;) It would be great if you could disclose (in a few days or so) how many visitors took part in it compared to how many people visited your site!

I find your concept a little bit messy though (while your website design is really clean, I love it!). I think it's because you described so many ideas without a lot of technical explanation. Your scenarios are helpful, but they are also quite superficial - I don't see exactly how your solution might be of help there. I also had difficulty extrapolating how your solution might work technically from your example document.

I think you need a very robust and clear implementation for it to be really useful!

Btw, I spent some significant time on a typed data description language meant to be an alternative to JSON, also with extensive schema support. I had a vision very similar to yours, but I failed to implement it. It just turned out to be too hard to do it so that its usage is always frictionless.

Just a couple of days ago, however, I started to port some of my ideas to a very slim JSON-Schema alternative. A key difference from your idea is that I don't want to include Turing-complete validators in the schema, but rather use globally unique type names so that these validators can be implemented somewhere else in arbitrary programming languages. It also allows for automatic UI generation.
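A minimal sketch of that separation, assuming a plain in-process registry: the schema only carries a globally unique type name, and the validator for that name lives in ordinary code. The names `VALIDATORS`, `register`, `validate`, and the type name `example.org/types/percentage` are hypothetical and only for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: globally unique type name -> validator function.
VALIDATORS: Dict[str, Callable[[Any], bool]] = {}


def register(type_name: str):
    """Decorator that registers a validator under a unique type name."""
    def decorator(fn: Callable[[Any], bool]) -> Callable[[Any], bool]:
        VALIDATORS[type_name] = fn
        return fn
    return decorator


@register("example.org/types/percentage")  # hypothetical type name
def is_percentage(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return 0 <= value <= 100


def validate(type_name: str, value: Any) -> bool:
    """Look up the validator by type name; the schema itself holds no code."""
    validator = VALIDATORS.get(type_name)
    if validator is None:
        raise KeyError(f"no validator registered for {type_name!r}")
    return validator(value)
```

The same type name could be backed by a different implementation in another language, since the schema refers to it only by name.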

If you want to stay in touch and exchange ideas, you can reach out to me on Twitter (hediet_dev)!

IsaiahShiner · on Oct 16, 2019

Hey! Well, I can tell you that as of right now, it's hovering around 0.8% conversion, which isn't terrible. We'll see if I can improve that before activity on the page dies down.

You're right, what I have now is relatively messy. I was balancing brevity and specificity. If I was properly technical in my descriptions, it may have read like documentation. And then the other issue is that I have both business people and programmers on the page, and each gets kinda confused because I'm not sure who to cater to. Same problem with the case studies. I had them more specific before, but I cut out what I thought was unnecessary and now they appear rather simple.

I think the solution is to split the whole thing up into several dedicated pages, which I resisted because it would take longer, and this is all probably less important than building product right?

Regardless, I do now intend to add an FAQ, and write some sort of a technical paper/RFC, (which I should probably do anyway before I run off and start writing serious code). I'm not totally sure what to do about the case study, maybe just replace the entire thing with a video?

I would love to hear more about the project you tried! Naturally, I haven't been able to Google examples of people who tried this idea before and failed. It does sound different in that I really do intend dits to be fully, totally and completely generic. Nothing other than the idea of labels with scripts which describe payloads is sacred. Any language, any style, anything. This means that it can't possibly miss a certain use case (you can just build it), but it does mean that the whole thing is rather complicated, and as others have said, maybe too complicated, like a pipe dream. I am certainly willing to accept that, but there's no way to find out without building it.

I really question whether a more restricted system like JSON-Schema could ever become dominant because it's much more prescriptive, especially when you have a limited number of possible data types. In my mind, data can truly look like anything, so coming up with a rigorous standard doesn't seem realistic: there can always be a broader standard, à la https://xkcd.com/927/ as others have mentioned. I intend to beat 927 by taking the high ground. It's like Lisp: you can't make a more generic Lisp, because anything that satisfies the components of Lisp is just a dialect. You can't have incompatible data or an incompatible competitor to DitaBase; it's totally generic. I also don't see why you couldn't do UI generation with DitaBase using the controllers (a concept I have not explained yet; they allow CRUD for a specific schema).

I am not very knowledgeable on Twitter, but now seems like a great time to start! I appreciate the invitation!

o_x · on Oct 16, 2019

I don't think I understand the concept behind this, it feels a bit like an attempt at using github-as-a-database or turning it into one with schemas, scripts and validators?

thrownaway954 · on Oct 16, 2019

While I get the idea and I get the benefit (I'm a programmer)... I see this as a pipe dream. There is not a manager on this planet that is going to be able to wrap their head around this.

You need to take one of those case studies and create a video that demonstrates everything in a clear and easy to "sell to a manager" way.

IsaiahShiner · on Oct 16, 2019

Hey guys, looking for feedback on the essential concept of the idea. I'd like to hear from real programmers whether they understand what DitaBase is and whether they think they would use it, either now or in the future.

It's been difficult to explain DitaBase, and I still don't know if that's because it's actually a terrible idea and much too complicated. If it is a good idea, then I think sometimes good ideas look complicated like this until people figure out how best to explain them. You wouldn't have expected the internet or, say, object orientation to be instantly obvious when they were first being talked about, and that is how I see DitaBase.

It's basically that DitaBase is 4 essential concepts: schemas, validation scripts, conversion scripts, and public version control. If you take any single one out, then the really useful part, the dit file type, doesn't work. But it's also true that these concepts have never been applied in this way (that I have found), and so there isn't any default, easy language to call upon to explain DitaBase to people readily and easily.

The answer to the question "What is DitaBase?" feels like it has to be "1, 2, 3, 4" where I list out the essential components. Perhaps it's a "common language for data", or a "general data accuracy system", but those concepts sound totally meaningless by themselves.

I really would love to hear some feedback on how you understand the idea and whether my thinking on this is sound, or if I am totally off base, being caught up too close to the idea.

Vinnl · on Oct 16, 2019

I'm not entirely clear yet on what problem this solves, but I'm just going to share my first impressions in the hope that it might be helpful.

So the first term that comes to my mind is RDF. How does it relate to Dit? Schema.org uses it, does Dit?

The problem I think it might be supposed to solve: checking whether data is in a particular shape, and converting from one such shape to another (including from older versions of a shape to newer versions).
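A minimal sketch of that reading of the problem, assuming a made-up "person" record with two versions. The field names and the functions `is_person_v1`, `is_person_v2`, and `convert_v1_to_v2` are hypothetical and are not taken from the Dit example; they only show a shape check paired with an old-to-new conversion.

```python
from typing import Any, Dict


def is_person_v1(data: Any) -> bool:
    """Old shape: a single free-form 'name' string."""
    return isinstance(data, dict) and isinstance(data.get("name"), str)


def is_person_v2(data: Any) -> bool:
    """New shape: separate 'given_name' and 'family_name' strings."""
    return (
        isinstance(data, dict)
        and isinstance(data.get("given_name"), str)
        and isinstance(data.get("family_name"), str)
    )


def convert_v1_to_v2(data: Dict[str, Any]) -> Dict[str, str]:
    """Convert the old shape to the new one, validating both ends."""
    if not is_person_v1(data):
        raise ValueError("input is not a valid v1 person")
    # Naive split on the first space; real names need more care.
    given, _, family = data["name"].partition(" ")
    converted = {"given_name": given, "family_name": family}
    if not is_person_v2(converted):
        raise ValueError("conversion produced an invalid v2 person")
    return converted
```

When one party controls both producer and consumer, this is an ordinary migration; the interesting case is when the validators and converters have to be shared between parties.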

Then I think that's primarily a problem of data from one source that should be interoperable with data from another source? As in: if I control the database and the software interacting with it, then I can simply ("simply") run a migration and update the software. If I control the API and the front-end talking to it, I can update both at the same time, or at least make the front-end able to deal with both the old version and new version without the overhead of adding Dit to my stack. So the use case would be if I don't control both the producer and consumer side of data? And in fact, there might be multiple parties on either side?

If that's the case, then that sounds like the problem the Linked Data community also wants to solve - as far as I know, not extremely successful (but not entirely unsuccessful either). Is there anything in Dit that improves upon a problem that Linked Data is unable to solve?

IsaiahShiner · on Oct 17, 2019

You are correct, DitaBase is in some ways extremely similar to Linked Data/Semantic Web/RDF etc. In fact, I would say that DitaBase is, in some ways, just another attempt at making the Semantic Web deliver on its original promise. DitaBase obviously couldn't exist without HTTP.

"Does dit use RDF?"

The answer to this question is critical to understanding how DitaBase is different: if you want it to, then yes! The validators and converters are fully Turing complete, and not limited to JavaScript either. If you can make a language interpreter in C/C++ (think JavaScript V8), then you can use it with DitaBase. So whatever format you want, just rebuild it in (or port it to) dit schemas. Therefore, it isn't possible for DitaBase to not be compliant with something (at least, as far as I can think) because if someone thinks it's important, they'll rebuild it in dit schemas.

DitaBase will succeed where Linked Data failed by being something that people want to use in their day jobs too, because the tools are just so good. Right now, a company with a data problem is extremely incentivized to seek a solution that solves only their specific problem. Obviously, because it's cheaper. But end-to-end solutions usually are not compatible with things like Linked Data, which needs solutions to be more general.

If a company sees that the best solution to that problem is becoming compliant with DitaBase, then obviously they're going to do it. But surprise surprise, now they're compliant with Linked Data as well.

thawkins · on Oct 16, 2019

I love the idea, but I'm a little wary of the implementation. We are heavy json-schema users, so I worry that this schema data is not going to track the json-schema standard. We got burned with Swagger: in v2 they were pretty much fully J-S compliant, but then for some reason dumped compliance in v3, so we had to kiss them goodbye. I don't need my schemas tied up in yet another proprietary format, even if it is close to J-S. I can't see any compliance statements on the site that tell me what level of compliance exists.

thawkins · on Oct 16, 2019

Bad form replying to my own response, but I have inspected it some more and have some comments.

1. The site says schema.org is included, but schema.org is mainly just documentation. I would like something more like json-schema, where I can actually use the schema data directly.

2. I don't get why the validators have function names. Surely the validator is anonymous, and its name should be a function of the property path; the argument is always "value", and it always returns a boolean. So why do you need to declare a name at all?

3. If you can't support json-schema, then you need an export function that allows you to export valid draft v8 json-schema.

4. Some way of annotating properties for graphql endpoint generation would be cool, i.e. generating .graphql files, or mongodb bson.
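Point 2 can be sketched roughly as follows, assuming validators are stored in a mapping keyed by property path. Each one is an anonymous function that takes `value` and returns a boolean. The paths, the rules, and the helpers `get_path` and `validate` are hypothetical; this is not how Dit defines validators.

```python
from typing import Any, Callable, Dict

# Keys are property paths; values are anonymous validators that always
# take a single `value` and return a bool. No function names needed.
validators: Dict[str, Callable[[Any], bool]] = {
    "name": lambda value: isinstance(value, str) and len(value) > 0,
    "address.postalCode": lambda value: (
        isinstance(value, str) and value.isdigit() and len(value) == 5
    ),
}


def get_path(document: Any, path: str) -> Any:
    """Walk a dotted property path through nested dicts."""
    current = document
    for key in path.split("."):
        current = current[key]
    return current


def validate(document: Any) -> Dict[str, bool]:
    """Run every validator against the value found at its path."""
    results: Dict[str, bool] = {}
    for path, check in validators.items():
        try:
            value = get_path(document, path)
        except (KeyError, TypeError):
            results[path] = False  # a missing property counts as invalid
            continue
        results[path] = bool(check(value))
    return results
```

Here the property path doubles as the validator's identity, so error messages can point at the path without a separately declared name.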

IsaiahShiner · on Oct 17, 2019

Let me see if I can boil down your comments to specific questions that I can answer.

"Is DitaBase compliant with JSON-Schema?"

The answer to this question is critical to understanding how DitaBase is different: if you want it to, then yes! The validators and converters are fully Turing complete, and not limited to JavaScript either. If you can make a language interpreter in C/C++ (think JavaScript V8), then you can use it with DitaBase. So whatever format you want, just rebuild it in (or port it to) dit schemas. Therefore, it isn't possible for DitaBase to not be compliant with something (at least, as far as I can think) because if someone thinks it's important, they'll rebuild it in dit schemas.

"Schema.org is just documentation. Are there programmatic, queryable schemas?"

Yes, but only optionally. In theory, all that is required to be a dit is a validator. You could inherit from "thing" and just directly have some text and a validator. But if you want to use Controllers (CRUD for a specific schema), Containers (abstract the shape of the data from the content) and other things like that, then yes, you need programmatically defined schemas, just like in J-S. If someone has them, then you can use them. If not, in theory, if their schema is public, then you could just make a pull request with the relevant additions. DitaBase is very open compared to something like Swagger.

"Uh, anonymous functions? Duh?"

Oh, yes, that makes much more sense than the ridiculous names I was giving those in the example. That actually just never occurred to me. Of course these things will have to get slowly fleshed out over time, and I suppose there's no reason you couldn't have both.

"GraphQL, BSON, custom tools, cool things"

This is the great thing about DitaBase. Since you know the exact structure of any dit, and everything about dits is generic, you can just build all of this on top, without any overhead. DitaBase will almost certainly build some things like this in house, especially something like GraphQL, and some data analysis and statistical plotting utilities. Or perhaps, similar to how VS Code absorbs open source projects into itself when they get really popular, DitaBase could probably do the same.

jgotti92 · on Oct 16, 2019

Can you expand a bit more on how you got burned with it? Did you have some issues because Swagger changed the usage of some keywords from JSON Schema?

thawkins · on Oct 16, 2019

We were using Swagger to manage both APIs and schemas, and also extracting schema files to validate the results of API calls and the structure of records in MongoDB. When Swagger dropped json-schema compliance, we lost that ability.

jgotti92 · on Oct 17, 2019

Did you try using JSON Hyper Schema for defining APIs maybe?

rubyn00bie · on Oct 16, 2019

I think I like the idea you're trying to express well enough; I'm just unsure how you're concretely proposing to do it. All the code examples I see are just JSON schemas, which I totally get, but they don't at all show me what it's like to implement anything with it...

Specifically...

* How do I pull this into a project?

* How do I access new dits?

* How do I write a dit and publish it?

* Where is the source code? Is it going to be open source and free?

IsaiahShiner · on Oct 16, 2019

The first thing I'll say is that if it would be helpful, I can have an example of a real dit (showing real syntax, the way labels change on the fly, etc.) on https://www.ditabase.io/example in a few hours. I only didn't do that to save some time. Thank you for expressing interest!

So these are all questions I can answer:

1. How do I pull this into a project?

Assuming you have a dit that you can use (full of scripts and such that you need), all you have to do to use it in a project is use existing infrastructure to tell the dit to validate itself, convert to something, query the data inside, etc. The plan is that this would all work with a Linux app. I suppose a REPL code interpreter would also be convenient?

  dit validate workspace/assets/example.dit
  dit convert workspace/assets/example.dit "sy7QvPp9"  # UUID of the schema you want to convert to.
  dit query workspace/assets/example.dit "LengthEnglish"  # Lots of options, etc.

2. How do I access new dits?

All public dits would be available for free on the website. Anyone can view, reproduce, fork, pull request, etc. The starting dits would almost certainly be based on schema.org's object model. If you want something specific, you could start at, say, the DitaBase fork of https://schema.org/Organization and then look at the most popular child schemas until you find what you're looking for. The whole concept is to function the way open source software already does. If someone wants to announce they've made something new, or ask for help finding something, they use existing channels.

Naturally, some people will want to keep their dits closed. In this case, they fork public dits and keep their own closed, using whatever project management stuff they prefer. They never have to inform anyone how they are using dits, the same way you can start a new git project without ever informing GitHub or BitBucket or something.

3. How do I write a dit and publish it?

The same way you would start a new GitHub project, but on DitaBase instead. Create an account, start a new dit, and begin adding whatever you need for your project. Every dit gets its own project page, and dits probably will not really belong exclusively to the original owner of the project, because really popular open dits will get used by lots of people.

Of course, dits are not really like regular code projects, since having a public dit that isn't on the DitaBase system isn't all that useful. You can also imagine that you might want to use tools available on the website specific to dits, like pasting a dit and asking the site to check it for you, or attempt to convert it. It might show you a graphical map of the many different components of your dit and how those relate to where you want to convert to, and if a converter is missing or bugged, you can jump right to that page and start adding one, right on the website (many converters will be extremely simple).

4. Where is the source code? Is it going to be open source and free?

At the moment, there is no product. I'm only about a month into this project and I want to get feedback before I start going nuts making a product without hearing from anyone.

I intend for the overwhelming majority of vital DitaBase features to be open source. The Linux app for example, obviously open source. Any dits someone makes, if they want them to be, totally free and open source. DitaBase can make its money in other ways, such as providing all of this stuff as a service, and capitalizing on the new market advantages of DitaBase. In the long run, I envision a search engine that queries public dit repos, rather than just webpages, and can return data directly instead of just saying "Well, here are some webpages which probably contain what you want." So, yes, I want to challenge Google search.

grenoire · on Oct 16, 2019

example.html shows an embedded JPEG as binary. How exactly does this work in terms of parsing? This question also goes for embedded JS and whatnot. How can you tell where a block ends when the contents of a block do not conform to the block's specification?

IsaiahShiner · on Oct 17, 2019

For block length, if you can't use context or control characters/symbols, then the label would be required to specify the length of the payload in bytes, or similar. When you reach the end of the given length, then the next token should be the closing payload tag/brace.
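A minimal sketch of that length-prefixed approach, assuming a made-up framing of `<decimal byte length>:<payload>}`. This framing and the function `read_payload` are hypothetical and are not the actual dit syntax; the point is only that the parser never scans the payload for a terminator, so the payload may contain any bytes, including `}`.

```python
from typing import Tuple


def read_payload(buffer: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Read one `<length>:<payload>}` block starting at `offset`.

    Returns the payload bytes and the offset just past the closing brace.
    """
    colon = buffer.index(b":", offset)  # raises ValueError if missing
    length = int(buffer[offset:colon].decode("ascii"))
    if length < 0:
        raise ValueError("payload length must not be negative")

    start = colon + 1
    end = start + length
    # After exactly `length` bytes, the next token must be the closing
    # brace. This also catches a buffer that ends too early.
    if buffer[end:end + 1] != b"}":
        raise ValueError("expected closing brace after declared length")
    return buffer[start:end], end + 1


# The payload contains '}' bytes, which would confuse a delimiter scan.
payload, next_offset = read_payload(b"5:a}b}c}rest")
```

A binary payload such as an embedded JPEG works the same way, since only the declared length decides where the block ends.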

The important thing here is that everything gets described. The structures within a field, such as label, meta, validators, etc. always follow the specification set by the previous label. That label then sets the specifications for the next payload. And since the validators and such are Turing complete, the specifications can be anything, rebuilt within dit schemas.

IsaiahShiner · on Oct 16, 2019

An example dit is now live at https://www.ditabase.io/example.html