> Then the obvious question is: Why? Why use pickle? The most likely answer is “because <X> can’t represent what I need to transmit”, but for that to be at all useful to your proposal, you need to show examples that won’t work in well-known safe serializers.
Pickle still is good for custom objects (JSON loses methods and also order), Graphs & circular refs (JSON breaks), Functions & lambdas (Essential for ML & distributed systems) and is provided out of box.
We're contemplating protocols that don't evaluate or run code; that rules out serializing functions or lambdas (i.e., code).
Custom objects in Python don't have "order" unless they're using `__slots__` - in which case the application already knows what they are from its own class definition. Similarly, methods don't need to be serialized.
A general graph is isomorphic to a sequence of nodes plus a sequence of vertex definitions. You only need your own lightweight protocol on top.
Because globals(), locals(), Classes and classInstances are backed by dicts, and dicts are insertion ordered in CPython since 3.6 (and in the Python spec since 3.7), object attributes are effectively ordered in Python.
Object instances with __slots__ do not have a dict of attributes.
__slots__ attributes of Python classes are ordered, too.
Are graphs isomorphic if their nodes and edges are in a different sequence?
assert dict(a=1, b=2) == dict(b=2, a=1)
from collections import OrderedDict as odict
assert dict(a=1, b=2) != dict(b=2, a=1)
To crytographically sign RDF in any format (XML, JSON, JSON-LD, RDFa), a canonicalization algorithm is applied to normalize the input data prior to hashing and cryptographically signing. Like Merkle hashes of tree branches, a cryptographic signature of a normalized graph is a substitute for more complete tests of isomorphism.
Also, pickle stores the class name to unpickle data into as a (variously-dotted) str. If the version of the object class is not in the class name, pickle will unpickle data from appA.Pickleable into appB.Pickleable (or PickleableV1 into PickleableV2 objects, as long as PickleableV2=PickleableV1 is specified in the deserializer).
So do methods need to be pickled? No for security. Yes because otherwise the appB unpickled data is not isomorphic with the pickled appA.Pickleable class instances.
One Solution: add a version attribute on each object, store it with every object, and discard it before testing equality by other attributes.
Another solution: include the source object version in the class name that gets stored with every pickled object instance, and try hard to make sure the dest object is the same.
> There should be a data-only pickle serialization protocol (that won't serialize or deserialize code).
> How much work would it be to create a pickle protocol that does not exec or eval code?
"Title: Pickle protocol version 6: skipcode pickles" https://discuss.python.org/t/create-a-new-pickle-protocol-ve...
reply