
People will keep laughing at me when I tell them to just produce a binary, pre-parsed format for their data.

Parsing will always be slow no matter what you do, and it's hard to parallelize. Sure, it's fast enough on a non-mobile CPU, but it will always be slow on a smartphone. Hence the need for such a format, and that's the reason smartphones use apps instead of HTML+JS.

It's funny: computers are very fast, but batteries are not so good. Are we only now realizing that well-designed data formats have an effect on battery life?

I've seen several articles about how webpages are bloated, but I don't see any real proposition on how to fix it.




The fastest file format is always the one you can just dump into memory and fix a few pointers. But these tend to be also the hardest formats to change/evolve...
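To illustrate what "dump into memory" buys you, here's a minimal Python sketch. The layout (a count, a table of absolute offsets, then length-prefixed strings) and the names `pack_strings` / `get_string` are made up for this example and don't correspond to any real format. The point is that reading item N touches only the header entry and the item itself, with no up-front parse.

```python
import struct

def pack_strings(items):
    """Build a buffer: u32 count, then `count` u32 offsets, then the data.

    Each string is stored as a u32 length followed by UTF-8 bytes. Offsets are
    absolute positions from the start of the buffer (hypothetical layout).
    """
    encoded = [s.encode("utf-8") for s in items]
    header_size = 4 + 4 * len(encoded)
    offsets = []
    body = bytearray()
    for raw in encoded:
        offsets.append(header_size + len(body))
        body += struct.pack("<I", len(raw)) + raw
    header = struct.pack("<I", len(encoded)) + struct.pack(f"<{len(offsets)}I", *offsets)
    return header + bytes(body)

def get_string(buf, index):
    """Read item `index` directly from the buffer without parsing the rest."""
    view = memoryview(buf)
    (count,) = struct.unpack_from("<I", view, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    (offset,) = struct.unpack_from("<I", view, 4 + 4 * index)
    (length,) = struct.unpack_from("<I", view, offset)
    # Only this one item is copied out of the buffer.
    return bytes(view[offset + 4 : offset + 4 + length]).decode("utf-8")

buf = pack_strings(["alpha", "beta", "gamma"])
print(get_string(buf, 2))  # prints "gamma"
```

It also shows the downside mentioned above: the offsets and field widths are baked into every reader, so changing the layout means changing all of them.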

Densely coded formats, such as msgpack, are neat for archival / long-term storage, but their coding means that the unpackers are quite complex and not very fast (perhaps drawing some lessons from parallel-length x86 decoding could help implementations?); they tend to get slower as the data structure becomes more complex. Since the output format of these is cumbersome, applications will copy everything yet another time. If we're not using C/C++/Rust, this also means a lot of allocator activity. Plus, data validation is separate from format validation, which means at least another pass over the structure.
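To make the cost concrete, here's a toy decoder for a made-up tag/length/value encoding. This is not msgpack's actual wire format; the tags and the `encode` / `decode` names are assumptions for the sketch. Notice that the position of each item is only known after decoding the one before it, and every value ends up as a freshly allocated object.

```python
import struct

TAG_INT = 0x01   # followed by a signed 64-bit little-endian integer
TAG_STR = 0x02   # followed by a u32 length, then UTF-8 bytes
TAG_LIST = 0x03  # followed by a u32 item count, then that many encoded items

def encode(value):
    """Encode ints, strings and (nested) lists into the toy format."""
    if isinstance(value, int):
        return bytes([TAG_INT]) + struct.pack("<q", value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return bytes([TAG_STR]) + struct.pack("<I", len(raw)) + raw
    if isinstance(value, list):
        body = b"".join(encode(v) for v in value)
        return bytes([TAG_LIST]) + struct.pack("<I", len(value)) + body
    raise TypeError(type(value))

def decode(buf, pos=0):
    """Return (value, next_pos). No bounds checks: this is only a sketch."""
    tag = buf[pos]
    pos += 1
    if tag == TAG_INT:
        (value,) = struct.unpack_from("<q", buf, pos)
        return value, pos + 8
    if tag == TAG_STR:
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        # Slicing and decoding copy the payload into a new object.
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if tag == TAG_LIST:
        (count,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            # Item i+1 cannot be located until item i has been decoded.
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown tag {tag:#x}")
```

Checking that the decoded values make sense for the application (ranges, required fields, and so on) would still be a separate pass on top of this.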

So purely from a performance perspective these formats are not that good (despite all their claims). Designing purpose-built formats is obviously superior, but also takes quite a bit of time, and means more parsers to test (and fuzz). Stuff like Cap'n Proto might be an adequate middle ground; I'm not sure, I've never worked with it and don't know much about how they implemented it (I assume custom accessor code that mostly does nothing on x86/ARM).


> The fastest file format is always the one you can just dump into memory and fix a few pointers. But these tend to be also the hardest formats to change/evolve...

FlatBuffers is kind of like this. With untrusted data you just have to run the validation routine (that the code generator spits out) to check that all the internal offsets stay within your buffer.

https://google.github.io/flatbuffers/
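The kind of check described above could look roughly like this. To be clear, this is not FlatBuffers' generated verifier or its wire format; the layout (u32 count, a table of u32 absolute offsets, each pointing at a u32 length plus payload) and the name `offsets_in_bounds` are hypothetical, just to show the idea of validating offsets once before trusting them.

```python
import struct

def offsets_in_bounds(buf):
    """Return True if every offset and length in the buffer stays inside it.

    Hypothetical layout: u32 count, then `count` u32 absolute offsets, each
    pointing at a u32 length followed by that many payload bytes.
    """
    size = len(buf)
    if size < 4:
        return False
    (count,) = struct.unpack_from("<I", buf, 0)
    table_end = 4 + 4 * count
    if table_end > size:
        return False
    for i in range(count):
        (offset,) = struct.unpack_from("<I", buf, 4 + 4 * i)
        # The length field must fit and must not point back into the header.
        if offset < table_end or offset + 4 > size:
            return False
        (length,) = struct.unpack_from("<I", buf, offset)
        if offset + 4 + length > size:
            return False
    return True

good = struct.pack("<III", 1, 8, 2) + b"hi"
bad = struct.pack("<III", 1, 8, 200) + b"hi"  # claims 200 payload bytes
print(offsets_in_bounds(good), offsets_in_bounds(bad))  # prints "True False"
```

In C or C++ the same additions would also need overflow checks; Python integers don't wrap, so the sketch skips that.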


> I've seen several articles about how webpages are bloated, but I don't see any real proposition on how to fix it.

AMP minus Google = fast webpages anyone can host.

Webpages are slow because of either fetch time (which can be addressed by not sending more data than needed) or processing time (which can be addressed by minimizing the time spent in JavaScript).

Pure HTML/CSS pages are always going to be fast. Just look at HN for an example. It's when you ask users to download huge images/movies, or when JavaScript is used to do all of the DOM creation and fetching of assets, that things start to slow down.



I was about to suggest Protobuf, but it looks like it has evolved into this, and/or FlatBuffers.

https://developers.google.com/protocol-buffers




