
Python Serialization Performance - kespindler
http://www.kurtsp.com/python-serialization-performance.html
======
viraptor
> Disable Schematics Validation ... There's a cool Python trick to create an
> object while skipping the __init__ function.

Yes, but now every single time you update the library that provides the base
class, you need to re-verify that __init__ doesn't do anything new. It may be
worth the tradeoff, but it really should be noted.
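
For reference, a minimal sketch of the trick in question (the class and
field names here are illustrative, not taken from the article):

    # Sketch of bypassing __init__ via __new__. Document stands in for a
    # Schematics model; any class with an expensive __init__ works the same.
    class Document:
        def __init__(self, data):
            self.data = self.validate(data)  # expensive validation

        @staticmethod
        def validate(data):
            # imagine costly per-field checks here
            return data

    # Allocate the instance without running __init__, then set state directly.
    doc = Document.__new__(Document)
    doc.data = {"id": 1}  # trusted input, so validation is skipped

    # The caveat above: if a library upgrade adds work to __init__,
    # this bypass silently skips it.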

------
Igglyboo
Seems like a good use case for protocol buffers [1].

[1] https://developers.google.com/protocol-buffers/?hl=en
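
A rough sketch of the round trip, assuming a hypothetical user.proto
compiled with protoc (the message layout is made up for illustration):

    # Hypothetical user.proto:
    #
    #   message User {
    #     string name = 1;
    #     int64 created_at = 2;  // unix timestamp rather than a date string
    #   }

    import user_pb2  # hypothetical module generated by protoc

    msg = user_pb2.User(name="kurt", created_at=1420070400)
    wire = msg.SerializeToString()   # compact binary encoding

    decoded = user_pb2.User()
    decoded.ParseFromString(wire)
    assert decoded.name == "kurt"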

~~~
kyzyl
It might be that they consider it important to be able to interrogate the in-
memory or on-disk representations without the help of a decoding step.
Protobufs are great for getting objects into a nice compact format to throw on
the wire, but god help you if you end up with multiple actors filling up a
queue with inconsistently encoded objects.

~~~
mahmoudimus
doesn't Apache Avro solve this problem?
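
If it does, it's because Avro container files embed the writer schema, so a
reader can decode records without out-of-band knowledge. A rough sketch using
the fastavro library (the record layout here is made up for illustration):

    import io
    from fastavro import writer, reader

    schema = {
        "type": "record",
        "name": "User",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "created_at", "type": "long"},
        ],
    }

    buf = io.BytesIO()
    writer(buf, schema, [{"name": "kurt", "created_at": 1420070400}])

    buf.seek(0)
    for record in reader(buf):  # schema is read back from the file header
        print(record)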

------
melted
TL;DR: Things get faster if you disable validation and parse dates using more
specialized code. Duh.
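
(For the curious, the kind of specialization meant: replace a generic
strptime call with direct slicing of a fixed-format timestamp. Sketched from
the TL;DR, not copied from the article.)

    from datetime import datetime

    def parse_generic(s):
        return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")

    def parse_specialized(s):
        # Assumes input is always exactly "YYYY-MM-DDTHH:MM:SS".
        return datetime(int(s[:4]), int(s[5:7]), int(s[8:10]),
                        int(s[11:13]), int(s[14:16]), int(s[17:19]))

    assert (parse_generic("2015-01-01T00:00:00")
            == parse_specialized("2015-01-01T00:00:00"))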

It's humorous when someone who presumably cares about performance tells you
they use Python. Python is a wonderful language, but performance is not what
it is designed for. Basically _anything_ that does not require an interpreter
to run will be 10-30x faster on the same hardware, and most will also consume
less RAM and be able to use more than one core on the system efficiently. It
used to be that Python's lack of performance didn't matter because disks and
networks were so slow that everything was IO bound. In more and more cases
that's just not true anymore. You could easily be reading at 1GB+/sec and
pushing 10-20Gbps to NICs, depending on the hardware.

~~~
viraptor
It depends what "using python" means though. Cython is pretty good at
optimizing basic code. Numpy will process your matrices and vectors using
specialised libraries faster than most manual C approaches. Shedskin will give
you a nice code framework which you can optimise in parts that matter. (insert
other specialised examples)
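
The NumPy point in a few lines, for anyone who hasn't seen it (array sizes
are arbitrary):

    # The vectorised dot product runs in optimised C/BLAS code, paying
    # interpreter overhead once per array instead of once per element.
    import numpy as np

    a = np.random.rand(1000000)
    b = np.random.rand(1000000)

    dot = np.dot(a, b)  # one call into an optimised BLAS routine

    # The pure-Python equivalent pays interpreter cost per element:
    # dot = sum(x * y for x, y in zip(a, b))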

CPython is slow as an interpreter, true. "Programming in Python" may or may
not be many times slower than compiling the comparable code in another
language. It depends on what you're doing and how you're doing it.

Also, I care about performance in any language to some extent. If I can write
a backup bash script that takes 2h, or one that takes 20min, I do care about
performance and will choose the second one. Why shouldn't I?

~~~
kyzyl
Is Shedskin still actively maintained? I haven't looked in a long time, but
last I saw it hadn't gotten any updates in a couple of years (maybe I have
corrupted memory, though ;)

Anyhow, the relatively new Nuitka project aims to tackle the Python-to-C++
compiler problem, and it seems to have a lot of promise. Really good
compatibility, apparently decent speedups, and cross-platform support. It
works with Python 3 too. I have a lot of hope!

~~~
viraptor
I don't think it is. But that doesn't mean it doesn't still work :) That
said, Nuitka is probably a more interesting target for new code.

