
When Not to Serialize - ingve
https://hacksoflife.blogspot.com/2018/08/when-not-to-serialize.html
======
panic
This quote from
[http://erights.org/e/StateSerialization.html](http://erights.org/e/StateSerialization.html)
has always stuck in my mind:

 _Do you, Programmer, take this Object to be part of the persistent state of
your application, to have and to hold, through maintenance and iterations, for
past and future versions, as long as the application shall live?_

 _\- Erm, can I get back to you on that?_

~~~
0xcde4c3db
See also: configuration file formats. Outside of systems where configuring the
installation is basically part of the sales process, you will probably have at
least a handful of customers raising hell if version 10.0 doesn't seamlessly
run with a config file from version 1.0. And maybe even vice-versa.

~~~
repsilat
An upside to SAAS, I guess -- if the user data lives on your servers in a
reasonably structured format you can try to migrate it, and your migrations
don't have to work across too many versions if that's difficult (unlike file
format compatibility, which is a long term commitment.)

Of course, that only works to a certain extent. Removed features can't be
"migrated" cleanly, and often config files (or worse -- code written by users
in most DSLs) aren't well-structured enough to make migration straightforward.

------
Matthias247
I agree with the overall message of the article, but I don't think "serialize"
is the right term here. Serializing means going from an in-memory data
representation to a flat byte array, which is eventually persisted. There are
numerous ways to perform serialization, from simply memcpy'ing the data
structures to defining a good, extensible persistent format and converting
into it.

The compatibility and extensibility issues mostly come up with the first
approach, and can often be avoided by using a more flexible persistent format,
which can be anything from a totally domain-specific format to JSON, XML,
protobuf, etc.
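A minimal sketch of that difference (the field names and defaults here are invented for illustration): a key/value format like JSON lets an old reader skip unknown fields and fill in defaults for missing ones, which a raw memory dump of a struct cannot do.

```python
import json

def save_settings(settings: dict) -> str:
    # Any dict-shaped data serializes; the writer doesn't freeze a layout.
    return json.dumps(settings)

def load_settings(data: str) -> dict:
    raw = json.loads(data)
    # Unknown keys written by newer versions are simply ignored;
    # keys missing from older versions fall back to defaults.
    return {
        "volume": raw.get("volume", 50),
        "theme": raw.get("theme", "light"),
    }

old = save_settings({"volume": 80})  # written by a v1 that had no "theme" yet
print(load_settings(old))            # {'volume': 80, 'theme': 'light'}
```

A memcpy'd struct, by contrast, breaks the moment a field is added, removed, or reordered.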

------
rb808
I think serialization, and the associated changes in APIs, is my #1 headache in
software development. The problem is that with all the new auto-magic
frameworks it seems to be getting worse, not better. Anyone got any solutions?

~~~
leggomylibro
Do protocol buffers count as a 'new auto magic framework'?

[https://developers.google.com/protocol-buffers/](https://developers.google.com/protocol-buffers/)

~~~
ninkendo
The problem is, protobuf isn't just a serialization protocol, it's also a
bunch of generated model code you have to start using in your application. You
don't just serialize your domain model to protobuf, you tell protoc to build
you some classes that _become_ your model.

Which means if you aren't careful, you can't easily move to anything else for
serialization, ever again. Your code now uses protobuf-specific objects
everywhere, because that's what protobuf encourages. I'm currently in a
codebase where countless method signatures (which _should_ be serialization-
agnostic) take or return `Message`-derived objects because, that's what we get
when we read in a request or emit a response, and using those types everywhere
was just so tempting.

And now, we have new requirements that introduce some dynamism to our data
model, in a way protobuf doesn't provide, so we're trying to move away from
protobuf, and it's turning out to require a rewrite of practically everything
because these protobuf classes _are_ our data model, so everything depends on
them.

What I've come to prefer is for serialization to be implemented at the
boundaries of your service, with your models at least somewhat isolated from
any given serialization technique. Protobuf is a foot-gun here because it
blends these roles in a way that's hard to get away from.

~~~
deathanatos
> _What I've come to prefer is for serialization to be implemented at the
> boundaries of your service, with your models at least somewhat isolated from
> any given serialization technique._

I think this is the right way to do it. Just like how UTF-8 to a string type
is kept at the borders. Inevitably, someone comes along with a requirement
that implies the first iteration of the data modeling was not only wrong, but
backwards-incompatibly wrong.

It's hard to convince coworkers that it isn't code duplication though.

> _Protobuf is a foot-gun here because it blends these roles in a way that's
> hard to get away from._

I'm not sure; in many ways it is just trying to give you a way to supply the
data to serialize with those models. It'd be nice to not have the "foot gun",
but I'm not sure what such a serialization framework would look like.

~~~
ninkendo
IMO the serializers should be their own standalone classes/modules which live
separately from your application’s core types. You can invoke them when you
need to do the serialization and keep parallel versions of them for legacy
clients, etc.
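A rough sketch of this pattern (the type and function names are hypothetical): the domain type stays a plain class with no knowledge of wire formats, and each format version is a standalone function, so a legacy serializer can live alongside the current one.

```python
from dataclasses import dataclass

@dataclass
class User:
    """Domain model: knows nothing about serialization."""
    id: int
    full_name: str

def serialize_user_v1(user: User) -> dict:
    # Kept around for legacy clients that expect a single "name" field.
    return {"id": user.id, "name": user.full_name}

def serialize_user_v2(user: User) -> dict:
    # Current wire format.
    return {"id": user.id, "full_name": user.full_name}

u = User(id=1, full_name="Ada Lovelace")
print(serialize_user_v1(u))  # {'id': 1, 'name': 'Ada Lovelace'}
print(serialize_user_v2(u))  # {'id': 1, 'full_name': 'Ada Lovelace'}
```

Renaming or restructuring the wire format then only touches the serializer functions, not every method signature that passes a `User` around.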

ActiveModel::Serializers work like this in Rails, although I haven’t tried any
similar approaches in statically-typed languages where protobuf is so commonly
used.

~~~
aldarn
For Python there's Marshmallow
([https://github.com/marshmallow-code/marshmallow](https://github.com/marshmallow-code/marshmallow))
and Django REST Framework if you're using Django
([http://www.django-rest-framework.org/api-guide/serializers/](http://www.django-rest-framework.org/api-guide/serializers/)).
Both of these work as you described.

------
osigurdson
Coming up with a format which is independent of your in-memory structure can
be limiting in some situations. A successful strategy is to isolate key
persistable types in a manner that allows you to carry the old types into
later versions of your application at minimal cost. This allows you to
deserialize the data in its exact original form. From that point, a series of
transforms is used to map the data to the current version of the application.
The nice thing about this strategy is that it is entirely additive: transforms
are added as required and chained together, and old transforms are never
mutated.

Having said this, if you can get away with defining your data structure up
front, by all means do it, as there are many advantages to doing so. If you
cannot (unknown requirements, large team, etc.) then a more rigorous
transform-chain approach can be a reasonable option.
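The transform chain described above might be sketched like this (the version numbers, field names, and registry are all invented for the example): each transform lifts a document exactly one version, and migrating just walks the chain until the current version is reached.

```python
def v1_to_v2(doc: dict) -> dict:
    doc = dict(doc)                    # never mutate the caller's copy
    doc["name"] = doc.pop("username")  # field renamed in v2
    doc["version"] = 2
    return doc

def v2_to_v3(doc: dict) -> dict:
    doc = dict(doc)
    doc["tags"] = doc.get("tags", [])  # new field introduced in v3
    doc["version"] = 3
    return doc

# Additive registry: new transforms get appended, old ones never change.
TRANSFORMS = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3

def migrate(doc: dict) -> dict:
    while doc.get("version", 1) < CURRENT_VERSION:
        doc = TRANSFORMS[doc.get("version", 1)](doc)
    return doc

print(migrate({"version": 1, "username": "ada"}))
# {'version': 3, 'name': 'ada', 'tags': []}
```

A v1 document written years ago still loads, because every intermediate step is preserved rather than rewritten.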

------
faragon
Example for C (for structures not using enums or bitfields, because those are
compiler-dependent, and avoiding architecture-dependent types like e.g.
size_t/ssize_t):

    
    
       if (islittleendian() && sizeof(mystruct) == REFSIZE_mystruct)
           memcpy(buffer, &mystruct, sizeof(mystruct));
       else
           conversion_mystruct(buffer, &mystruct);
    

(so you can avoid slow serialization on most platforms)

