Marshmallow: Simplified object serialization for Python (github.com)
91 points by sloria on Dec 24, 2015 | 15 comments

I used it on a huge project where I wanted clean interfaces within the Python project at runtime and also needed to serialize the Python objects to the database. Pickling gave me sleepless nights until I moved to Marshmallow. Whilst Django REST Framework comes with nice serializer and parser modules, marshmallow's ability to do just that part was a godsend.

This alone gave us the flexibility to expose the Python modules and objects as a simple JSON API and the DB load / save came for free.

Highly recommended.

Can you talk about the use case a little more? I'm trying to figure out why you would use marshmallow instead of pickling, especially in the context of building a REST API.

I use Flask and I am not sure where pickling comes in. I have built a desktop application in PyQt, though, and the multiprocessing module needs picklable data.

If a Python class overrides __getattr__() or has a deep inheritance hierarchy, pickle sometimes fails in spectacular ways. Sometimes the fixes were extremely painful and more or less defeated the purpose of having a __getattr__() in the first place. More importantly, the serialized data that pickle generates (1) is an opcode stream in pickle's own format rather than readable text, so it is hard for a human to reason about during debugging, and (2) gets inflated even for a simple object because of a deep inheritance tree.
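One well-known way the __getattr__() problem shows up can be sketched as follows. This is only an illustration, not the commenter's actual code: the Proxy and GuardedProxy classes are hypothetical, and the guard shown is one common workaround among several.

```python
import pickle


class Proxy:
    """Hypothetical wrapper that forwards unknown attributes to a dict."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Runs only when normal lookup fails. pickle.loads() builds the
        # instance without calling __init__, so self._data is missing and
        # this line re-enters __getattr__ until the recursion limit is hit.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None


class GuardedProxy(Proxy):
    def __getattr__(self, name):
        # Common workaround: never forward private or dunder names.
        if name.startswith("_"):
            raise AttributeError(name)
        return super().__getattr__(name)


def roundtrip(obj):
    """Pickle and unpickle obj, returning the copy or the error name."""
    try:
        return pickle.loads(pickle.dumps(obj))
    except RecursionError:
        return "RecursionError"


naive_result = roundtrip(Proxy({"x": 1}))
guarded_result = roundtrip(GuardedProxy({"x": 1}))
```

The naive class pickles without complaint and only blows up on load, when pickle probes the half-built instance for __setstate__. The guard fixes that, at the cost of no longer forwarding underscore-prefixed names, which is the kind of trade-off the comment describes.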

However, by using marshmallow for serializing, I found the resulting JSON a much more manageable format to 'reason with'. In my specific case, the lifetime of a Python object could extend over multiple sessions and cross the runtime -> save -> db -> load -> runtime barrier, so JSON was a hugely meaningful choice. The project used a graph of connected Python objects whose relations & states needed to be retained across time and memory barriers.

With a general HTTP REST API, I find that bundling all the related records of an entity in a single API call saves the client round-trip time. In this case, instead of hand-building a dict and then calling json.dumps() on it, I find marshmallow a better choice: the declaration of the serializer itself reflects the structure & the entity relations clearly.

Just wanted to say, this project has some of the best documentation around. Really happy to have this library around :)

Marshmallow is really useful. +1. If you like Marshmallow, you might also like Pilo (https://github.com/eventbrite/pilo), which solves similar problems. Marshmallow excels at ORM object serialization. Pilo is really good at parsing JSON into Python objects, and has several features to support this, such as polymorphic downcasting and programmable parsing via hooks. Marshmallow is also very good at parsing/validating Python dictionaries, but in my experience, Marshmallow's API is more focused on serializing objects into dictionaries so that you can call json.dumps on the output dictionary.

I'm also a fan of Schematics (https://schematics.readthedocs.org/en/latest/), which is more or less identical in intent.

It looks good, but using it in a naive manner to interface with a database, as shown in the example, without binding variables leaves you totally open to injections, which is quite bad. What would be good is a lightweight binding to SQLAlchemy, leaving the dangerous part to the experts. That said, there is a real place for this library, and quite a few new ones have popped up in the last year.
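The example the comment objects to is not quoted in this thread, so the following is only a generic sketch of what "binding variables" looks like, using the standard library's sqlite3 module. The users table and its columns are invented for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, payload TEXT)")

# A hostile-looking value that would break naively formatted SQL.
name = "Robert'); DROP TABLE users;--"
payload = json.dumps({"name": name, "age": 36})

# The ? placeholders bind values as data; they are never spliced
# into the SQL text, so the quote characters are harmless.
conn.execute("INSERT INTO users (name, payload) VALUES (?, ?)", (name, payload))

row = conn.execute(
    "SELECT payload FROM users WHERE name = ?", (name,)
).fetchone()
restored = json.loads(row[0])
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The serialized JSON is stored and read back unchanged, which is independent of whichever library produced the dict in the first place.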

It's definitely some bad example code, but it's not really related to the use of the library

Maybe object versioning can get added as this gains more and more usage... That would really make this a library that IMHO is production-worthy; large applications can't easily be upgraded in one go, and having a library that helps with translating across versions would be super.

https://github.com/marshmallow-code/marshmallow/issues/171 (if anyone is interested).

Awesome project. Using it on several customer projects (non public code) and probably soon on Abilian Platform.

I really hate this property magic, it was bad in the Django ORM and it'll be bad in Marshmallow too. It looks nice in the examples, but then when you have to use it in anger and a certain problem requires metaclasses to fix...

Please elaborate. I share your sentiment, but I also see the value in this kind of api.

Has anyone used ANY object serialization library (other than pickle) with multiprocessing?

I just can't get anything else to work. We have a sophisticated desktop PyQt application that could really do with better serialization for multiprocessing.

I have heard of Pathos[1] as a replacement for multiprocessing, but never gave it a try.

[1] https://github.com/uqfoundation/pathos

And then there is Clojure, where you just keep things as data all the time.
