Gobs of data (2011) (go.dev)
89 points by ash on Dec 4, 2023 | 58 comments



Interesting. I wonder to what extent it's found use at Google over this past decade.

There are advantages to being language-specific, but a lot of disadvantages as well (speaking as someone who recently had to write some Elixir code to unmarshal a Ruby object...). It seems hard to introduce this, since it forces all communicating services to be Go-based, which is kind of contrary to the independence that microservices usually afford you.

Some of the benefits are simply design goals (e.g., top-level arrays) that could also be met by a language-independent protocol. And even the performance questions probably could be. Cap'n Proto, I think, is designed so that users of the protocol don't have to serialize/deserialize the data at all, right? They just pass it around and work with it directly.

I can see Rob Pike being frustrated with Protocol Buffers at Google, and I don't begrudge anyone for taking a big shot like this, but I wonder if he's found any success with it.


Yeah, after years of dealing with language-specific serialization formats (and inadvertently learning their internals: Go's gob, Python's pickle, PHP's serialize), I'm over them. And gob is not even a schematic serialization format (i.e. not only do you not need to define a schema beforehand, you can't). There are some interesting ideas in it, but that's all. Use a well-known schemaless serialization format with some extensibility [1] if you really need one.

[1] Maybe there was no suitable one when Go was first created. Nowadays I believe CBOR is the best format for this job.


I'm in the same boat. Not to mention the security concerns that often crop up in (interpreted) language-specific deserialization (I'm looking at you, pickle, you thinly veiled `eval()`). I agree that CBOR should generally be the serialization tool of choice for self-describing data (i.e. in places where you might otherwise choose JSON).

And if your language of choice doesn't have a CBOR lib, CBOR is fairly easy to implement, and writing an encoder/decoder is very fun! I finished my implementation for Gerbil Scheme just last week [0].

[0]: https://github.com/chiefnoah/gerbil-cbor
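
To give a sense of how drop-in it is in Go, too, here's a minimal sketch using the fxamacker/cbor/v2 package (also linked elsewhere in this thread); its API deliberately mirrors encoding/json:

    package main

    import (
        "fmt"
        "log"

        "github.com/fxamacker/cbor/v2"
    )

    type Point struct{ X, Y int }

    func main() {
        // Encode: self-describing binary, no schema declared up front.
        b, err := cbor.Marshal(Point{X: 1, Y: 2})
        if err != nil {
            log.Fatal(err)
        }
        // Decode back into a struct, exactly like encoding/json.
        var p Point
        if err := cbor.Unmarshal(b, &p); err != nil {
            log.Fatal(err)
        }
        fmt.Println(p) // {1 2}
    }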


Yep, I very much agree with this. It's probably inevitable that languages with reflection grow some kind of language-specific serialization format because a) they can and b) there's often some use case where it looks handy to have a format adapted to the quirks of the language. Plus, when some of these bespoke formats were created, the world hadn't converged as much as it has now around a few common text and binary formats.

But now the interoperable serialization formats have a lot more energy spent on tools and such, and the formats are better-specified, to the point that you probably want to use them even where interoperability with other stuff doesn't force it.


As an engineer at Google, my opinion on Protocol Buffers changed massively. Pre-Google I found them awkward, the language bindings in Python sucked, and I didn't really see the point. I knew a schema for services was a good idea, but protobufs didn't seem like the best option.

The thing is that at Google protobufs are used everywhere. Like, absolutely everywhere. Think of all the places they could be, and it's way more. All the tooling understands them, code search and go-to-reference work on them everywhere; they are truly transformational in how many different services with many different implementations interact.

Are they perfect? Far from it. But a Go-only implementation misses almost all the value of protobufs. If I were inventing them just for Python I could do better (pickle? maybe not), but the whole point is that they aren't language/ecosystem/use-case specific. If this was Rob Pike's frustration then I can't help but feel he missed the point, or this post is a little disingenuous as to the benefits.

I've not seen gobs used at Google, but I'd imagine an engineer would need to make a very strong case against using protos between services, regardless of whether both services are in Go.


It's called "Larry & Sergey Protobuf Moving Co." for a reason. See the t-shirt design in the video screenshot here: https://isocpp.org/blog/2020/07/cppcon-2019-there-are-no-zer...

Disclaimer: I'm an employee at said moving company.


In my opinion this is pretty on-brand for Pike.

Back when he did his best work, it was possible for one person to "just write the new thing", without making it fit with anything else. There was nothing else to fit it with. You could invent everything from scratch, and not only was it not a waste of time, if you were good enough it had a chance of being the best fit for purpose.

You could take shortcuts. You could have every part of your system be "odd", because nothing was "even".

That's not true anymore. And the way I see it Pike has not moved on.

Science in general went through this switch at some point, too. There was a time when one person could know all of science. But that time is long gone.


This is an unnecessary ad hominem attack. He wrote this over a decade ago, in the midst of doing his best work creating Go itself.

gob was his opinionated way of doing a Go-specific encoding, while also supporting any number of other encodings in the language. Go has incredibly good support for almost every popular encoding there is.

gob has also been used successfully by a number of projects. In many cases, it's a perfectly good way to encode a piece of data that is completely local to a Go program.
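
For that local case it really is minimal; a sketch of a round trip with the standard encoding/gob package:

    package main

    import (
        "bytes"
        "encoding/gob"
        "fmt"
        "log"
    )

    type Point struct{ X, Y int }

    func main() {
        var buf bytes.Buffer
        // No schema to declare: gob derives the wire format from the
        // Go type via reflection and sends it once per stream.
        if err := gob.NewEncoder(&buf).Encode(Point{1, 2}); err != nil {
            log.Fatal(err)
        }
        var p Point
        if err := gob.NewDecoder(&buf).Decode(&p); err != nil {
            log.Fatal(err)
        }
        fmt.Println(p) // {1 2}
    }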


> This is an unnecessary ad hominem attack

I didn't mean it as such.

> in the midst of doing his best work creating Go itself.

Not to go too far off topic, but Go is another example. It famously ignores decades of language theory, and they wrote their own assembler, linker, etc.

Now, much of that has been undone and rewritten, as Go became more adopted, requiring playing well with the rest of the ecosystem.

(but much of it we're unfortunately stuck with, because it's part of the language)

30 years ago there was no ecosystem to play well with, and compared to now we were just banging rocks together. Back then you could be a CS polymath as one person. Well, I couldn't, but Pike could.

Those were the old days of John Carmack starting every game engine from an empty directory.

I'm saying that today nobody can. Even John Carmack could not write a AAA game on his own. (I know id had other coders, but my point stands.)


Well, to go off-topic with you, the idea that Go "ignores decades of language theory" is just an opinion that you hold. A factual statement would be to say that the Go designers omitted a great many features other languages have included.

The idea that they did so out of ignorance is ridiculous, given the background of the Go design team. They made considered decisions about what to include in Go.

A fixed-gear bicycle is not "ignoring" decades of bicycle design theory.

Creating a programming language is an engineering task not a theoretical task. Which means there are major trade-offs to be made. And they chose their trade-offs. The wild success of Go should at least make you consider whether or not they're better at making these trade-offs than you, and most language designers, are.

Maybe they were wrong about some choices, which is why Go is still evolving, but they were self-evidently mostly correct.


Most of the Reddit/HN/Twitter crowd repeating the "Go ignores decades of language theory" line are just parroting something that originated in an incredibly toxic part of the Scala community. The vast majority of them don't actually understand the tradeoffs and implementation complexity associated with the specific subset of language features they desire. The debate about a particular feature is often not even concluded in languages that include said feature; for example, the Swift team still has serious disagreements about generics and their implementation.

I've seen Pike have conversations about language design with SPJ, Hejlsberg, Lattner, Stroustrup, Odersky and other highly respected PL designers, and they would never make such a shallow and trite comment.


Well, that's your opinion.

You seem to have gotten a bit emotional about this, so I don't think this'll go anywhere.


The comment you're replying to is calm, reasoned and courteous.

Claiming the poster is being emotional is... poor form, to put it mildly.


Oh. I found it very defensive and aggressive.

They put scare quotes around what I said, called what I said "ridiculous", and seem to have taken critique of the Go language personally, asserting that the Go designers are better than most language designers.

So that is why I found the reply neither calm nor courteous. As for reasoned, I don't see any reasoning in it, just conclusions.

I don't even see an attempt to refute my point, which isn't so much about whether Go is good or not, but about the needless reinventing and ignoring of other work, and as a result running into problems that would have been predictable had that work been taken into account.

To me it summarizes as "they're smart, you're dumb".

If this is the type of reply this person writes, then this isn't going to be productive.


> and they wrote their own assembler, linker, etc.

True, but that was over 30 years ago. In fact, said compiler was the very first program written for Plan 9. Those decades you speak of came afterwards.


Interesting hypothesis. I can't comment on Pike's history here, but at Google there's certainly a noticeable difference between old special cases that were built pre-2015 ish, and the modern world where everything is very cohesive. I get the impression that there was a big push to achieve that, led from around that time by various products. It's hit some areas more than others, but there do seem to be almost no new products building in the way you've described now.


Well, it is a 2011 post. Pike is retired now; I don't think he cares, or that it matters much anyway.


I think he matters, for computer history. I never aimed to have him care what I say, though.


As awkward as protobufs might be, is there any similar format with so many client bindings? I tried comparing the message formats I could find (e.g. Cap'n Proto, Protobuf, MessagePack and others), and Protobuf still seems to have the most supported languages. I'd be happy for any other suggestions!


> I can't help but feel he missed the point

Did he miss the point, or was the point to light a fire under the protobuf team to improve their product? Which they eventually did (e.g. required fields were removed after this post was made).


FYI: required fields were re-introduced in protobuf. Why? Because despite what the gob team and others think, they are a necessary feature for working with data structures. If "everything is optional", what is the value of any structure over sending an untyped associative array of stuff?


"Optional" fields, as in fields you can tell if they were explicitly set or not, were re-introduced. Depending on your definition either required fields were not reintroduced, or required fields with a mandatory default value were until recently the only kind of field in proto3.


It's possible that was an aim, and I'm not familiar with the timelines here. However, this again focuses on the mechanics of protos, which I still don't think matter as much as their being fully integrated into a company the way they are at Google. The mechanics of them are still not great. Maybe that was a blocker to the integration happening, though?


Nice recap.

By the way, just curious what is your opinion on MessagePack?


Just finished removing this encoding from our production services.

It panics on malformed input, which is a no-go for us since high availability is critical, and it showed up prominently in the performance and memory profiles (roughly 5 times the time and memory of doing the same with JSON).

The code was converting some data to gob, and storing it in the database for later.

We now do the same but in JSON; it's human-readable, and Postgres validates that the data is valid JSON.

And unmarshaling it does not panic.


That's interesting, because I've had basically the opposite experience. I used encoding/json with BadgerDB and saw that json.Unmarshal in a hot loop was using about 68% of total CPU time in a profile taken from production. By switching to gob, that dropped significantly, to around 28% (for gob's decode function). I've read that decoding interfaces in gob is slow [1]; maybe that accounts for the difference, as I don't have any in this particular struct. Also, this was a very read-heavy service, so that could be a major difference as well.

[1]: https://groups.google.com/g/golang-nuts/c/12qhqiG1J70
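
(For anyone who wants to take this kind of measurement themselves, a minimal sketch of the standard net/http/pprof setup; the endpoint and tooling below are the stock ones:)

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* on the default mux
    )

    func main() {
        // While the service runs, sample CPU with:
        //   go tool pprof http://localhost:6060/debug/pprof/profile
        // then `top` inside pprof shows where the time goes
        // (e.g. json.Unmarshal vs gob's decode).
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }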


Have you tried the superset of JSON from Amazon?

https://amazon-ion.github.io/ion-docs/


I've been considering adopting the gob package. I haven't used it before, so I only know what's in the docs -- and all of your claims are surprising to me. Could you share more information?

How is it possible that they were getting malformed input? This was happening in go-to-go communication, or was there some kind of cross-language interop?

Any idea why the performance was so much slower than JSON in your case? The technique described in the OP would seem to make that impossible.

Do you think it's possible the database column type or collation was somehow affecting the gob?


The column type was bytea (basically blob) so it should be stored as is by the database. The profiling showed the hotspots in the gob package directly.

The docs explicitly mention that invalid input will make it panic and that can be confirmed by reading the code or fuzzing the input.

From my understanding there is no compile-time schema, so everything is done with runtime reflection, which is bound not to be super fast. Granted, JSON is the same on paper, but I would guess the JSON package has had more eyes on it and more optimization.

In our case, everything was using JSON except this one component due to some historical oddity so it was also a win in terms of simplifying.


If the issue is the panic, why not create a wrapper func with recover that presents the same interface you want?
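
Something like this, say (a sketch: `safeDecode` is a hypothetical name, and the deferred recover turns a decode panic into an error instead of taking the process down):

    package main

    import (
        "bytes"
        "encoding/gob"
        "fmt"
    )

    // safeDecode converts any panic inside gob decoding into an error.
    func safeDecode(data []byte, v any) (err error) {
        defer func() {
            if r := recover(); r != nil {
                err = fmt.Errorf("gob decode panicked: %v", r)
            }
        }()
        return gob.NewDecoder(bytes.NewReader(data)).Decode(v)
    }

    func main() {
        var n int
        // Malformed input comes back as an error, not a crash.
        fmt.Println(safeDecode([]byte("garbage"), &n))
    }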


>[Required fields are] also a maintenance problem. Over time, one may want to modify the data definition to remove a required field, but that may cause existing clients of the data to crash.

Okay, but would you rather have it crash, or allow a program to run on the wrong data? Especially if you do that and then say that everything has zero as a default value.

The question remains whether the serialization format should be taking care of that, or a later round of parsing with a schema on the side; but if you do the former without the latter, you're setting yourself up for deployment nightmares.


Indeed, I'd rather have the program crash rather than repeat this nightmare: https://specbranch.com/posts/knight-capital/


If you want to deal with the crash and justify why the system went down because you were more correct than the other guys, then sure.

Protocols often represent an interface between organizations. Especially when that is the case, you want to be as charitable as possible when accepting input, because getting any issues resolved may very well require getting the two organizations to agree.

Also, as things change over time, an overly strict interpretation when receiving packets will require unnecessary rework in the future, and possibly down time or lost business.

When dealing with protocols, it's generally best to be strict when emitting packets and as tolerant as possible when accepting them.


> If you want to deal with the crash and justify why the system went down because you were more correct than the other guys, then sure.

I think it's more like, 'if you want to deal with the crash in test instead of having to justify why it crashed in prod.'

> Especially when that is the case, you want to be as charitable as possible when accepting input, because getting any issues resolved may very well require getting the two organizations to agree.

I don't think we've been fans of "be rigorous in what you emit and permissive in what you accept" since IE6 showed us the error of our ways. "Be rigorous in your implementation of a permissive spec" is as far as we should go.


That's the motto for browsers, and I agree with it in that context, but if it's something you control (like the services of a distributed application) then not really. You can just make sure the versions match during deployment and save yourself some debugging headaches.

Nor if it's something sensitive, where maybe crashing is preferable to running the wrong way.


Largely agree, with the addendum that it's a really good idea to collect metrics on how much tolerance your code has been required to show. Whether you need to present those metrics to the sender and ask them to tweak their emissions, or simply keep an eye on them yourself, is situation-dependent, but having them at all is definitely in the "future you will thank current you" category... and I will absolutely confess that current me has cursed past me for not doing so on more than one occasion, and I can only hope I remember more often in the future ;)


Someone made a benchmark of serialization libraries in Go [1], and I was surprised to see gob is one of the slowest, especially for decoding. I suspect part of the reason is that the API doesn't allow reusing decoders [2]. From my explorations it seems like JSON [3], MessagePack [4] and CBOR [5] are all better alternatives.
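
The non-reusability is visible even in a sketch like this: every independently encoded blob is its own gob stream, so it carries its own type descriptors and needs a fresh Encoder, which a hot path pays for on every message:

    package main

    import (
        "bytes"
        "encoding/gob"
        "fmt"
        "log"
    )

    type Item struct{ ID int }

    func encodeOne(v Item) []byte {
        var buf bytes.Buffer
        // Fresh Encoder per blob: the stream re-sends Item's type
        // definition each time, and Encoders can't be pooled across buffers.
        if err := gob.NewEncoder(&buf).Encode(v); err != nil {
            log.Fatal(err)
        }
        return buf.Bytes()
    }

    func main() {
        a, b := encodeOne(Item{1}), encodeOne(Item{2})
        // Both blobs repeat the type descriptor; on one long-lived
        // stream it would have been sent only once.
        fmt.Println(len(a), len(b))
    }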

By the way, in Go there are like a million JSON encoders, because a lot of things in the std library seem to be coded not for maximum performance but for ease of use. Perhaps this is the right balance for certain things (e.g. the http library, see [6]).

There are also a bunch of libraries that let you modify a JSON document "in place", without fully deserializing it into structs (e.g. GJSON/SJSON [7] [8]). This sounds very convenient, and more efficient than fully de/serializing when you just need to change the data a little.

--

1: https://github.com/alecthomas/go_serialization_benchmarks

2: https://github.com/golang/go/issues/29766#issuecomment-45492...

--

3: https://github.com/goccy/go-json

4: https://github.com/vmihailenco/msgpack

5: https://github.com/fxamacker/cbor

--

6: https://github.com/valyala/fasthttp#faq

--

7: https://github.com/tidwall/gjson

8: https://github.com/tidwall/sjson


Gob is a great serialization format! It's super easy to use, and it supports Go's native types (kind of like Python's pickle).

For a recent project, I needed a simple key-value store. I was evaluating using a full RDBMS, but I ended up just putting gob files in a directory.
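
Roughly like this, for the curious (a sketch of the approach, with hypothetical put/get helpers: key = file name, value = one gob-encoded struct per file):

    package main

    import (
        "encoding/gob"
        "log"
        "os"
        "path/filepath"
    )

    type Record struct {
        Name  string
        Count int
    }

    func put(dir, key string, v Record) error {
        f, err := os.Create(filepath.Join(dir, key+".gob"))
        if err != nil {
            return err
        }
        defer f.Close()
        return gob.NewEncoder(f).Encode(v)
    }

    func get(dir, key string) (Record, error) {
        var v Record
        f, err := os.Open(filepath.Join(dir, key+".gob"))
        if err != nil {
            return v, err
        }
        defer f.Close()
        return v, gob.NewDecoder(f).Decode(&v)
    }

    func main() {
        if err := put(".", "alice", Record{"alice", 3}); err != nil {
            log.Fatal(err)
        }
        r, err := get(".", "alice")
        if err != nil {
            log.Fatal(err)
        }
        log.Println(r) // {alice 3}
    }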


FWIW, this isn't used much by the community. Being a standard library package it still gets some use, of course, but for comparison, encoding/gob shows about 22.5K imports [1] to encoding/json's nearly 800K [2], and whereas the JSON search surfaces a whole ecosystem of JSON libraries, gob is basically just gob.

Calling it "dead" just invites a tedious thread about what the definition of "dead" is, so I won't, I'll just sort of imply it in this sentence without actually coming out and saying it in a clear manner. I would generally both A: recommend against this, not necessarily as a dire warning, just, you know, a recommendation and B: for anyone who is perturbed by the idea of this existing, just be aware that it's not like this package has embedded itself into the Go ecosystem or anything.

[1]: https://pkg.go.dev/search?q=gob

[2]: https://pkg.go.dev/search?q=json


Not gobs of comments but discussed at the time:

Gobs of data - https://news.ycombinator.com/item?id=2365430 - March 2011 (2 comments)


> If all you want to send is an array of integers, why should you have to put it into a struct first?

If you're sure that's all you'll ever have to do, then sure. But unless you're 100% certain that the protocol will never evolve further, having a more complex structure allows it to change in a gradual way.


It was clear from the post that they were saying, "If all I need is a simple array, why should I be required to wrap it in a struct?" The whole point (from the post) being that protobuf required structs while gob allowed simpler types _in addition_ to structs.
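
Concretely, gob is happy to take the slice as the top-level value (a minimal sketch):

    package main

    import (
        "bytes"
        "encoding/gob"
        "fmt"
        "log"
    )

    func main() {
        var buf bytes.Buffer
        // No wrapper struct required: []int is a valid top-level gob value.
        if err := gob.NewEncoder(&buf).Encode([]int{1, 2, 3}); err != nil {
            log.Fatal(err)
        }
        var xs []int
        if err := gob.NewDecoder(&buf).Decode(&xs); err != nil {
            log.Fatal(err)
        }
        fmt.Println(xs) // [1 2 3]
    }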


all I need _right now_ is a simple array

Nobody knows the future, and preparing for the future is a huge part of software engineering. Sending top-level arrays instead of sending them inside a struct is never the right way.


> Sending top-level arrays instead of sending them inside a struct is never the right way.

While I understand the sentiment, I 100% disagree on the 'never' qualifier.


Depending on whether the format you use is self-describing, it's possible that "sending a plain array" and "sending a struct with one field that is an array" have the same representation on the wire.

If it is self-describing, the overhead could be very, very minimal.

So, why would you want to send a plain array without wrapping it in a struct?


dmi knows that. dmi was saying that even if the encoding scheme allows encoding simpler types, it's often not smart to use that functionality, because you won't be able to evolve the format in the future. If you encode a message instead of a simple type, you'll be able to evolve it later as you add more features to your program.

Note that even protobuf, which doesn't allow encoding simple types at the top level, still has this debate when deciding whether to encode an array of simple types (inside a struct) or an array of structs (inside a struct). And Google's guidance is to use an array of structs if more data might be needed in the future:

>However, if additional data is likely to be needed in the future, repeated fields should use a message instead of a scalar proactively, to avoid parallel repeated fields.

https://google.aip.dev/144

>// Good: A separate message that can grow to include more fields

https://protobuf.dev/programming-guides/api/#order-independe...


This is quite old, so I'm curious what triggered it being posted again. Has something happened or changed?


I've posted it because I'm always on the lookout for simple solutions for complex problems, and especially for how these solutions are designed. The post describes the design process well.

Also Rob Pike is a great technical writer. Another example of his style is "Effective Go":

https://go.dev/doc/effective_go


Yup, and for people looking for usage, I just found a gist that shows how gob handling can be useful (writing to a cache in a way that lets reads be cast back into the correct structs): https://gist.github.com/pioz/ca5b7a11200f54afbd76dee7acbcc06...


Just because you already knew it all doesn't mean everyone else did. I hadn't seen it before.

Sometimes even when something was posted a few years ago some people just haven't seen it yet.


It's an entirely reasonable question to ask "is there any specific reason why this is being posted today?". If the answer is no, that's fine, but there may be extra context that is interesting and not obvious.


Ten thousand people, to be exact https://xkcd.com/1053/


I used gob for my first client/server Go program, which was a "build one to throw away" experiment in a new language. It worked, but I quickly turned away from it, because it would never be cross-platform.

I saw gob more as an experiment that the Go team used to check the reflect package's usability. (Which sucks anyway, by the way.)

I'm surprised it's still in the stdlib. I would have guessed it would have been removed for Go 1.0, because it was already clear then that it was not suitable for anything more than experiments.


It may not be a good tool for communicating between services implemented in different languages, but I'd happily use it to save stuff to disk where a database is overkill.


I did not know about it, but I think it changes so little compared to Protocol Buffers that it's a waste of time.


Note that this was written in 2011, while the first mention of "proto3" in the protobuf repository was in 2014. So this blog post probably influenced the development of proto3, which fixed many issues of proto2 (which is referred to as just "protocol buffers" in the blog post).


Eh, I like Go and respect Rob Pike, but I seriously doubt gob had any impact on the proto3 design.


I forgot Go was still around. Thanks for reminding me.



