
If the dataset is "more structured", you can try to simplify this structure for great gains. As a byproduct, you get to use text files for the data.

Could you give an example?

See the World Cup match schedule above [1]. For other examples with a geo tree (country/province/city/district, etc.), see the Belgian football clubs [2], or, for yet another example, the football leagues [3] with tiers (1, 2, 3, etc.), cups, supercups, playoffs, and so on. The .txt versions are pretty compact with "tight" / strict error checking; with JSON, YAML and friends I'd say it would be 2x, 3x or even more effort / typing.

[1]: https://github.com/openfootball/world-cup/blob/master/2018--...

[2]: https://github.com/openfootball/clubs/blob/master/europe/bel...

[3]: https://github.com/openfootball/leagues/blob/master/europe/l...

I see what you mean. I agree, for a human editor with domain knowledge, those files are easier to read and maintain than JSON. However, it's definitely nontrivial to parse as a machine-readable format. If other projects are supposed to consume the .txt files directly (i.e. not going through the command-line utility), you should at least provide an EBNF grammar.

Example: I assume the scorer lists are actually lists of lists, where equivalent JSON could look like this:

    [ {"player":"Gazinsky", "goals":[{"minute":12}]},
      {"player":"...", "goals":[{"minute":43}, {"minute":90, "overtime":1}]}, ... ]

... which is absolutely more verbose.

However, if someone just went by the data, they could get the parsing wrong: it looks like the outer list (of players) is delimited by spaces, but there are also spaces inside the player names. A better approach could be to split the list at the ' signs, since each player has at least one time. However, players can have more than one time and could probably also have apostrophes inside their names (e.g. Irish players). So I guess the best delimiter would be a letter after an apostrophe after a number. Except we might also have parentheses, etc.
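To make the delimiter problem concrete, here is a rough Python sketch of one way to do it: instead of splitting on a single character, match each scorer as "name, then one or more goal times". This is only my guess at the format from looking at the data, not the project's actual grammar or parser. The function name `parse_scorers`, the `note` key, and the sample names other than Gazinsky are made up for illustration; `minute` and `overtime` follow the JSON sketch above.

```python
import re

# One goal time: 12' or 90+1', optionally followed by a note such as (pen.).
# ASSUMPTION: this shape is inferred from the data, not from a published grammar.
GOAL = re.compile(r"(\d+)(?:\+(\d+))?'(?:\s*\(([^)]*)\))?")

# One scorer: a name (which may contain spaces or apostrophes), followed by
# one or more goal times separated by commas.
SCORER = re.compile(
    rf"\s*(?P<name>.+?)\s+(?P<goals>{GOAL.pattern}(?:\s*,\s*{GOAL.pattern})*)"
)


def parse_scorers(text):
    """Turn a scorer line into a list of {"player": ..., "goals": [...]} dicts."""
    scorers = []
    for match in SCORER.finditer(text):
        goals = []
        for minute, extra, note in GOAL.findall(match.group("goals")):
            goal = {"minute": int(minute)}
            if extra:  # the "+1" in 90+1'
                goal["overtime"] = int(extra)
            if note:  # hypothetical key for text in parentheses
                goal["note"] = note
            goals.append(goal)
        scorers.append({"player": match.group("name"), "goals": goals})
    return scorers


# Made-up sample line covering the tricky cases from the comment above:
# a space in a name, several times per player, an apostrophe in a name,
# and a parenthesised note.
sample = "Gazinsky 12' Player Two 43', 90+1' O'Brien 71' (pen.)"
for scorer in parse_scorers(sample):
    print(scorer)
```

The name is matched lazily up to the first thing that looks like a goal time, so an apostrophe inside a name is only a problem if it directly follows a digit. That is exactly the kind of rule a published grammar would need to pin down.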
