Hacker News new | past | comments | ask | show | jobs | submit login

If your dataset is mostly a list of strings, sure. If it's anything more structured, why exactly?

I'd argue that using "plaintext" for structured data (a.k.a, inventing your own data representation) will set up both you and the users of your dataset for unnecessary pain dealing with unescaping and parsing.






> If it's anything more structured, why exactly?

It is way easier to input / type and change / update. And compared to lets say JSON, YAML or friends at least 5x times more compact (less typing is better). See the world cup all-in-one page schedule in .txt [1] and compare to versions in JSON, XML and friends that are page long dumps, for example.

[1]: https://github.com/openfootball/world-cup/blob/master/2018--...


One advantage I can think of is it doesn't need to be parsed into a node based tree structure like JSON. It's a lot easier to stream parts of it at a time.

if the dataset is "more structured" you can try to simplify this structure for great gains. As a byproduct, you get to use text files for the data.

Could you give an example?

See above the world cup match schedule [1], for another other examples with geo tree (e.g. country/province/city/district/etc.), see the Belgian Football clubs, for example [2] or for yet another example the Football leagues [3] with tiers (1,2,3, etc.) and cups and supercups, playoffs, etc. The .txt version are pretty compact with "tight" / strict error checking and JSON, YAML and friends I'd say it would be 2x, 3x or even more effort / typing. [1]: https://github.com/openfootball/world-cup/blob/master/2018--... [2]: https://github.com/openfootball/clubs/blob/master/europe/bel... [3]: https://github.com/openfootball/leagues/blob/master/europe/l...

I see what you mean. I agree, for a human editor with domain knowledge, those files are easier to read and maintain than JSON. However, it's definitely nontrivial to parse as a machine-readable format. If other projects are supposed to consume the .txt files directly (i.e. not going through the command-line utility), you should at least provide an EBNF grammar.

Example: I assume, the scorer lists are actually lists-of-lists, where equivalent JSON could look like this:

  [
    {"player":"Gazinsky", "goals":[{"minute":12}]},
    {"player":"Cheryshev",
"goals":[{"minute":43}, {"minute":90, "overtime":1}]}, ... ]

... which is absolutely more verbose.

However, if someone just went by the data, they could get parsing wrong: It looks like the outer list (of players) is delimited by spaces - however, there are also spaces inside the player names. A better approach could be to split the list by ' signs as each player has at least one time - however, players can have more than one time and could probably also have apostrophes inside their names (e.g. Irish players). So I guess, the best delimiter would be a letter after an apostrophe after a number. Except, we might also have parantheses, etc etc.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: