
How do you mean?

So for prepare, imagine you have a class whose entire job is to be serialized to some external format. I'll use json, but it could be csv, yaml, whatever. Heck, maybe you're doing code generation and it's a group of functions that will be placed into another file.

Say I have an instance of this class:

    class JsonObject(Serializable):
        key = "value"
        second_key = "second_value"
This gets serialized to

    {
        JsonObject: 
        {
            key: "value",
            second_key: "second_value"
        }
    }
Except that sometimes what you get is

    {
        JsonObject: 
        {
            second_key: "second_value",
            key: "value"
        }
    }
Minor, but important, difference. Python stores object attributes in a dict. Python's dicts are unordered, so when serializing, the order in which things are printed is arbitrary rather than guaranteed. That means that instead of just using a normal diffing module, you now need to write some json differ that parses and compares the json, and you lose the ability to do side-by-side comparisons. So you want deterministic, ordered output generation.
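To make the diffing problem concrete, here is a minimal sketch using the standard `json` and `difflib` modules. It assumes Python 3.7+, where dicts keep insertion order, so the two orderings are simulated by building the dicts in different orders; `to_lines` is a hypothetical helper, not part of any library.

```python
import difflib
import json

# Two dicts with identical contents, built in a different order.
first = {"key": "value", "second_key": "second_value"}
second = {"second_key": "second_value", "key": "value"}


def to_lines(data, sort_keys=False):
    """Render data as indented JSON, split into lines for a text diff."""
    return json.dumps(data, indent=4, sort_keys=sort_keys).splitlines()


# The data compares equal, but the rendered text differs, so a plain
# line-based diff reports changes that are not really there.
noisy_diff = list(
    difflib.unified_diff(to_lines(first), to_lines(second), lineterm="")
)

# With a deterministic key order both renderings are identical,
# and the diff is empty.
clean_diff = list(
    difflib.unified_diff(
        to_lines(first, sort_keys=True),
        to_lines(second, sort_keys=True),
        lineterm="",
    )
)

print(first == second)
print("\n".join(noisy_diff))
print(clean_diff)
```

The sorted rendering is only one way to get determinism; the point is that any stable order lets an ordinary text diff do the job.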

Now, to be clear, you could model your api like this:

    output = JsonObject()
    output.append(key="value")
    output.append(second_key="second_value")
And that works well for this simple example, but as soon as you start nesting things, it gets confusing, so just assume that for reasons you want this DSL for code generation.

You have 3 options:

1. Create some determinism: your serialize function looks something like this (pseudopython):

    def serialize(self):
        for k, v in self.attrs.items():
            write(jsonify(k, v))
A fix is really easy:

    def serialize(self):
        for k in sorted(self.attrs.keys()):
            write(jsonify(k, self.attrs[k]))
Not bad, but there are a few problems: you can't customize the output order, every key now needs to be comparable, and it's a smidge slower, especially for really big objects (remember: you're writing a DSL for generating large serializable things, so there's a good chance you'll want some way to autogenerate large quantities of data to be serialized).

2. Add an `_order` attribute to your class; then your serialize method becomes

    def serialize(self):
        for k in self._order:
            write(jsonify(k, self.attrs[k]))
Well, now you have to forward declare everything, which is kinda annoying; you're populating your namespace with crap (what if your generated json/python/whatever needs an `_order` attribute!); and if you ever forget to update your order attribute, your stuff doesn't work right.

3. Replace your class's dict with an OrderedDict. Now, you've done some dark magic to do this, but you don't need to forward declare, your users control the output order naturally in a way they expect, and you don't have to sort a bunch of things every time you want to serialize any data. (Admittedly, python 3.6, I think, voids this issue by making the class dict an OrderedDict anyway, but that's technically an implementation detail.)
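For the curious, the dark magic in option 3 might look roughly like the sketch below: a metaclass whose `__prepare__` hook returns an OrderedDict, so the class body is executed into an ordered mapping. This is only one way to do it, in Python 3 syntax. `OrderedMeta`, `_field_order` and the `Reversed` class are names made up for illustration, and the sketch ignores inherited attributes.

```python
import json
from collections import OrderedDict


class OrderedMeta(type):
    """Hypothetical metaclass that remembers attribute definition order."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body runs inside the mapping returned here.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace), **kwargs)
        # Record plain data attributes in the order they were written,
        # skipping dunder names (__module__, __qualname__) and methods.
        # _field_order is an assumed name, filled in automatically.
        cls._field_order = [
            key for key, value in namespace.items()
            if not key.startswith("__") and not callable(value)
        ]
        return cls


class Serializable(metaclass=OrderedMeta):
    def serialize(self):
        cls = type(self)
        fields = OrderedDict(
            (key, getattr(self, key)) for key in cls._field_order
        )
        return json.dumps(OrderedDict([(cls.__name__, fields)]), indent=4)


class JsonObject(Serializable):
    key = "value"
    second_key = "second_value"


class Reversed(Serializable):
    # Same attributes, written in the opposite order.
    second_key = "second_value"
    key = "value"


print(JsonObject().serialize())
print(Reversed().serialize())
```

The recorded order still lives on the class under a private name, but nobody has to maintain it by hand: whatever order the user writes the attributes in is the order that comes out.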

Does that make sense?



