Hacker News new | comments | show | ask | jobs | submit login

Those 2 ideas are orthogonal and I can't possibly see any connection there.

There's a difference between in-process/in-memory storage/usage of a data-structure and long-term storage on disk or serialization for the purposes of inter-process communications, especially in case you don't control both ends of a conversation or in case your components are a heterogenous mixture of platforms.

Lets say you want to build an index, like say, a B-Tree or something. Are you going to store it on disk as a binary? It surely saves time on rebuilding it and B-Trees can get big, their whole advantage against normal BSTs being that they are more optimal in case their size exceeds the available RAM, so they are meant for being stored. However, the B-Tree itself is not the actual data. It's just an index. You don't care much about losing it, since you can always rebuild it out of the data that it indexes. And most importantly, you aren't going to use Protobuf to store it ;-)

What about the actual data? If it's a database, like a RDBMS, well it's a black box and it's the norm to store things as binary blobs, again for efficiency, but if you haven't followed the trail of people that have had problems migrating their binary storage between different versions of the same RDBMS, or recovering from backups, let me tell you, it ain't nice. Which is why you can't speak of having backups unless you're doing regular SQL dumps. And most importantly, you aren't going to store anything using Protobuf ;-)

What about unstructured data, like terabytes of log lines? Now this is where it gets interesting, as unstructured data is the real source of it all in systems that collect and aggregate data from various streams. You end up storing it, because it's simply data that you may want to parse at some point. Are you going to store that as binary blobs like with Protobuf? You could do that, but it would be your biggest mistake ever, as that data will outlive your software or any of the current business logic, plus the format will evolve a lot ;-)

As for API communications and protocols, I'm unconvinced that something like Protobuf brings performance benefits over plain JSON and I mentioned in another comment that I do have experience and have done comparisons with various bidding exchanges that were sending tens of thousands of requests per second our way. Maybe at a Google-like scale it brings benefits, but for the rest of us it's a nuisance.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: