
One format for all use cases? Databases (passwd,group...), single-word files, key-value(-list?) files, rc files for a thousand programs?

Great idea! We should use XML for that...

XML, YAML, JSON and s-expressions are all just flavours of representing trees.
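As a rough illustration of that claim, the sketch below renders one nested record both as JSON and as an s-expression. The record and the `to_sexp` helper are hypothetical, made up for this example; the record is only loosely modelled on a passwd-style entry.

```python
import json

# Hypothetical example record (not from the thread).
user = {"name": "alice", "uid": 1000, "groups": ["wheel", "audio"]}

def to_sexp(node):
    """Render nested dicts, lists and scalars as an s-expression string."""
    if isinstance(node, dict):
        # Each key/value pair becomes a (key value) sub-list.
        return "(" + " ".join(f"({k} {to_sexp(v)})" for k, v in node.items()) + ")"
    if isinstance(node, list):
        return "(" + " ".join(to_sexp(v) for v in node) + ")"
    if isinstance(node, str):
        # Reuse JSON string quoting for simplicity.
        return json.dumps(node)
    return str(node)

print(json.dumps(user))
# {"name": "alice", "uid": 1000, "groups": ["wheel", "audio"]}
print(to_sexp(user))
# ((name "alice") (uid 1000) (groups ("wheel" "audio")))
```

Both outputs carry the same tree; only the surface syntax differs.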

So yeah, any of those would be a much better idea than unstructured text, and yes, you can serialize all those use cases into trees. I'd steer away from XML for the sake of efficiency and human-readability though.

Not everything is a tree, and neither XML nor JSON nor s-expressions are particularly efficient or "beautiful". And there is no canonical representation. You could strip all whitespace or indent all children, but... And YAML, for example, has no nice way to put lists of single words on one line.
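To make the "strip all whitespace" option concrete, here is a minimal sketch of one ad hoc normal form for JSON using Python's standard `json` module: sorted keys and no optional whitespace. The `canonical` helper is a hypothetical name, and this is just one convention picked for illustration; it does not settle questions such as how floating-point numbers should be written.

```python
import json

def canonical(obj):
    """Serialize with sorted keys and no optional whitespace (one possible convention)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

# Two textually different documents describing the same data.
a = json.loads('{"b": [1, 2],   "a": "x"}')
b = json.loads('{"a":"x","b":[1,2]}')

print(canonical(a))
print(canonical(a) == canonical(b))
```

Both inputs serialize to the same bytes under this convention, but nothing in the format itself obliges two tools to pick the same one.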

I have yet to see a practical data set that could not be encoded as a tree. Maybe if you have a cyclical data structure and you want to save that directly, but then it's a simple meta-level extension. For example, the Lisp reader does that when reading S-expressions. If you want to create a list like this:

  1 ---> 2 ---> 3-|
you write: #1=(1 2 3 #1#), where #n=OBJECT means "this is the object N", and #n# means "here is the very same object N too".
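The same label-and-reference idea can be sketched outside Lisp. The Python below is a minimal illustration, not how any Lisp reader is implemented: `encode` and `decode` are hypothetical helpers, the `{"label": ...}` / `{"ref": ...}` node shapes are invented for this example, and only lists and scalars are handled.

```python
def encode(node, seen=None):
    """Turn (possibly self-referential) nested lists into a plain tree.

    The first visit to a list assigns it a label (like #n=); any later
    visit to the same list emits a reference (like #n#).
    """
    seen = {} if seen is None else seen
    if not isinstance(node, list):
        return node
    if id(node) in seen:
        return {"ref": seen[id(node)]}
    label = len(seen) + 1
    seen[id(node)] = label
    return {"label": label, "items": [encode(item, seen) for item in node]}

def decode(tree, labels=None):
    """Rebuild the original lists, resolving references to labelled lists."""
    labels = {} if labels is None else labels
    if not isinstance(tree, dict):
        return tree
    if "ref" in tree:
        return labels[tree["ref"]]
    result = []
    # Register the list before descending so back-references can find it.
    labels[tree["label"]] = result
    for item in tree["items"]:
        result.append(decode(item, labels))
    return result

# A list whose fourth element is the list itself, as in #1=(1 2 3 #1#).
cyc = [1, 2, 3]
cyc.append(cyc)

tree = encode(cyc)
print(tree)  # {'label': 1, 'items': [1, 2, 3, {'ref': 1}]}

back = decode(tree)
print(back[3] is back)  # True
```

The encoded form is an ordinary tree, so it could be written out as JSON, YAML or an s-expression; the cycle only reappears when the references are resolved on reading.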

Yes, you can encode everything "as a tree". You can also encode everything "as binary", "as a big integer", whatever. That doesn't mean it's a good idea.

Unlike "as binary" or "as a big integer", a tree is structured. "As an arbitrarily-formatted string" would be much closer to those two comparison points.

If you think strings (or "binary") are "unstructured", think again. (Start with: what does that even mean?)

I don't think they're unstructured; rather, I know they're arbitrarily structured, usually requiring a great deal of ad-hockery to deal with them. A standard structure means a lot less work for data consumers and producers alike.

This comment is structured in the sense that it's two paragraphs of more-or-less-correct English. That doesn't make it useful to tools that don't understand English. As far as a tool like 'rm' is concerned, it might as well be unstructured.

I meant that more in terms of IPC (e.g. in pipes), but most of those use cases happen to be handled adequately by YAML specifically, so yeah, why not?

Ideally a simplified subset of YAML; the full specification suffers from feature creep and has actual bugs.

StrictYAML ( https://github.com/crdoconnor/strictyaml ) looks promising to that effect.
