
There's something to be told about learning stuff that matters. Data-structures, algorithms, statistics, machine learning, math in general don't change and will be as relevant 30 years from now as they were 30 years ago. POSIX doesn't change either.

Learning for learning's sake is actually pointless and can do more harm than good if done tastelessly. The best kind of learning happens either when something is cool and you want more just because you get a boner just thinking about it or when facing a real-world problem that you'd like to solve using a different approach.

For an example of pointless learning - learning a language that's similar in concept and popularity to another one that you already know is pointless (e.g. Ruby vs Python), unless you learn it because you've got specific real-world needs. For learning new languages, I actually apply the following rule - if learning it doesn't hurt, then I'm wasting time.

Another example of pointless learning - anything proprietary, as proprietary things don't survive as well as open-source stuff or stuff based on standards. Actually, taste is required to pick winners - SOAP will be irrelevant in 30 years from now and anybody with taste could have seen it coming since its inception (speaking of which, Google's Protobuf won't survive either, because simple text-based protocols always win).

>Learning for learning's sake is actually pointless

I can't disagree more.

Learning Latin, ancient Greek, French, Italian, Portuguese and Spanish; C, C++, Java, JavaScript; Ruby, Python, CoffeeScript and Groovy; Lisp, Clojure...

Studying the patterns helps you become a better user of each language. Consider it a "classic" coding education. It is totally worth the time, especially because after a while you see the ideas, the broader picture, not the words or the code.

"Being ignorant is not so much a shame, as being unwilling to learn." - Benjamin Franklin

"The mind is not a vessel to be filled, but a fire to be ignited." - Plutarch

See, I can play the quotes game :-P Besides, you missed my point, taking that out of context. Read it again.

> Google's Protobuf won't survive either, because simple text-based protocols always win

I expect Protobuf, Thrift (fbthrift) and Avro to be around for quite some time. Once you know what they're good for, everyone likes binary formats that can be easily shared across multiple languages via code gen. You can turn every binary format into text-based for debugging purposes using standard commands. Heck, even Apple has binary plists. It's natural that a format eventually gains (or is based on) binary alongside text -- for those times when binary is simply faster. Now, if it's just transmission speed, you can convert text to binary using gzip.
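A minimal sketch of the gzip point, using only the Python standard library. The `records` payload and its field names are made up for illustration, and the actual size savings will depend on how repetitive your data is:

```python
import gzip
import json

# Hypothetical payload: repetitive records, roughly what a JSON API might return.
records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(1000)]

# Text form, as it would be sent uncompressed.
text_bytes = json.dumps(records).encode("utf-8")

# Binary form for the wire: the same text, gzip-compressed.
compressed = gzip.compress(text_bytes)

# The receiver decompresses and gets the original text back.
restored = json.loads(gzip.decompress(compressed).decode("utf-8"))

print(len(text_bytes), len(compressed))
```

The format stays text at both ends; only the bytes in transit are binary.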

Frankly, the key to success for protobuf or thrift as independent projects is how much adoption you see outside of the companies.

Oh and I'd also point to the "death of rest" in http://techblog.netflix.com/2012/07/embracing-differences-in... -- just as websites provide front-ends to a myriad of backend services, so too can API servers provide a consistent, simple front-end for devices. Not sure what to call it, since as a pattern, it doesn't require a specific protocol, and it could be considered somewhat HATEOAS, except state is easily maintained client-side these days....

I worked with Protobuf and while doing so I had 3 problems: the generated classes were freaking monsters that crashed my IDE; those classes must be used with the precise library version of the generator that produced them; and in spite of the tools available for easy debugging, I couldn't find one that didn't suck. Plus, the output is not the only problem, input is a problem too. In practice it's actually a bad idea to parse the whole freaking blob and/or carry around monster classes that specify how, when in essence you end up caring about only some paths.

And while I thought that the availability of that spec was an advantage over an undocumented JSON-based protocol, in truth an undocumented spec is just as bad as an undocumented JSON-based protocol and at the very least with a JSON-based protocol you have an easier time doing reverse engineering, since playing around with the request format doesn't involve writing code to do it.

Besides, there's nothing about binary protocols that makes them better for code-generation tools. That people don't do this for JSON is only because with JSON you don't need to and it's often preferable to not do it.

There is something to be said against JSON - for objects in arrays, the keys are redundant for example, which adds up when parsing a long document. But there's nothing inherent in a text-based protocol that prevents you from first defining the shape of the data before describing the data in more concise terms, while keeping it fairly readable at the same time. And I'm unconvinced that something like Protobuf brings benefits unless you're working at a Google-like scale. I did work on an ads platform that integrated with various bidding exchanges and operated at an insane scale, and the bidding exchanges I loved were the ones with a JSON-based protocol, while I had countless problems with those based on Protobuf.
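To make the "define the shape first" idea concrete, here is a rough sketch. The `fields`/`rows` layout, the helper names `to_compact` and `from_compact`, and the sample bid records are all hypothetical, not taken from any real exchange protocol, and the sketch assumes every record has the same keys:

```python
import json

def to_compact(records):
    """Encode same-shaped dicts as one list of field names plus rows of values."""
    if not records:
        return {"fields": [], "rows": []}
    fields = list(records[0].keys())
    # Raises KeyError if a record is missing a field: same shape is assumed.
    rows = [[rec[name] for name in fields] for rec in records]
    return {"fields": fields, "rows": rows}

def from_compact(doc):
    """Rebuild the list of dicts from the fields/rows form."""
    fields = doc["fields"]
    return [dict(zip(fields, row)) for row in doc["rows"]]

# Made-up sample data.
bids = [{"exchange": "ex1", "price": i * 0.25, "currency": "USD"} for i in range(100)]

verbose = json.dumps(bids)
compact = json.dumps(to_compact(bids))

print(len(verbose), len(compact))
```

The keys appear once instead of once per object, and the result is still plain JSON that you can read and edit by hand.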

> Heck, even Apple has binary plists.

And personally I hate it, because instead of opening that file with a plain text editor, or as a text file handle, I now have to use special-purpose tools or libraries. Binary plists sit somewhere between Unix-like text config files and the Windows registry and the closer you get to the Windows registry, the more I hate it.

>>everyone likes binary formats that can be easily shared across multiple languages via code gen

Everyone? Protocol buffers and code gen create a static protocol that's harder to modify or extend than text, with more moving parts: the spec file, the generated code, the code generator, and the lib for your language. Code gen adds complexity to development, build, and deployment. Binary protocols are more difficult to test than typing some JSON into Postman, for example.

HTTP is the canonical example of the successful text protocol.

How can you hold both of these ideas in your head at the same time?

1. "There's something to be told about learning stuff that matters. Data-structures,"

2. "Protobuf won't survive either, because simple text-based protocols always win"

If the simple inefficient way always wins, we don't need data structures.

Those 2 ideas are orthogonal and I can't possibly see any connection there.

There's a difference between in-process/in-memory storage/usage of a data-structure and long-term storage on disk or serialization for the purposes of inter-process communications, especially in case you don't control both ends of a conversation or in case your components are a heterogeneous mixture of platforms.

Let's say you want to build an index, like say, a B-Tree or something. Are you going to store it on disk as a binary? It surely saves time on rebuilding it, and B-Trees can get big; their whole advantage over normal BSTs is that they perform better when their size exceeds the available RAM, so they are meant to be stored. However, the B-Tree itself is not the actual data. It's just an index. You don't care much about losing it, since you can always rebuild it out of the data that it indexes. And most importantly, you aren't going to use Protobuf to store it ;-)
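A toy illustration of "the index is not the data". This is not a B-Tree: a sorted list searched with `bisect` stands in for one, and the tab-separated data format, `build_index` and `lookup` are assumptions made up for the sketch:

```python
import bisect
import io

def build_index(data_file):
    """Scan 'key<TAB>value' lines; return a sorted list of (key, byte offset)."""
    index = []
    data_file.seek(0)
    while True:
        offset = data_file.tell()
        line = data_file.readline()
        if not line:
            break
        key = line.split(b"\t", 1)[0]
        index.append((key, offset))
    index.sort()
    return index

def lookup(data_file, index, key):
    """Binary-search the index, then read the value from the data file."""
    pos = bisect.bisect_left(index, (key, 0))
    if pos == len(index) or index[pos][0] != key:
        return None
    data_file.seek(index[pos][1])
    line = data_file.readline()
    return line.rstrip(b"\n").split(b"\t", 1)[1]

# The data itself is plain text; an in-memory file stands in for one on disk.
data = io.BytesIO(b"pear\t3\napple\t7\nfig\t1\n")

index = build_index(data)
index = None                # pretend the index was lost or corrupted
index = build_index(data)   # rebuild it from the data it indexes

print(lookup(data, index, b"fig"))
```

Losing `index` costs a rescan, nothing more, which is why its on-disk format matters much less than the format of the data.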

What about the actual data? If it's a database, like an RDBMS, well it's a black box and it's the norm to store things as binary blobs, again for efficiency, but if you haven't followed the trail of people that have had problems migrating their binary storage between different versions of the same RDBMS, or recovering from backups, let me tell you, it ain't nice. Which is why you can't speak of having backups unless you're doing regular SQL dumps. And most importantly, you aren't going to store anything using Protobuf ;-)

What about unstructured data, like terabytes of log lines? Now this is where it gets interesting, as unstructured data is the real source of it all in systems that collect and aggregate data from various streams. You end up storing it, because it's simply data that you may want to parse at some point. Are you going to store that as binary blobs like with Protobuf? You could do that, but it would be your biggest mistake ever, as that data will outlive your software or any of the current business logic, plus the format will evolve a lot ;-)

As for API communications and protocols, I'm unconvinced that something like Protobuf brings performance benefits over plain JSON and I mentioned in another comment that I do have experience and have done comparisons with various bidding exchanges that were sending tens of thousands of requests per second our way. Maybe at a Google-like scale it brings benefits, but for the rest of us it's a nuisance.
