Customizing class creation in Python (snarky.ca)
32 points by joeyespo on March 30, 2017 | hide | past | favorite | 17 comments



I don't see why you'd need this sort of indirection. It's a very strange way of hiding behaviour that is currently clearly understood. Should metaclasses really be able to magic up decorators out of thin air, and why is importing a decorator and using it explicitly bad?

As for __init_subclass__() it literally allows you to mess with things when subclassing objects in totally unexpected ways!

"Adjusting how classes are created can be very difficult to debug and so should only be used when you have a really legitimate use-case."

I'd say never use it - if anyone has a good reason to do magic like this I'd like to see it!


Take a look at SQLAlchemy/django's ORMs.

Personally I have two uses. One is in proprietary code, but the idea is basically to use python to generate json, while allowing composition of the json pieces via natural python flows. Metaclasses specifically allowed me to reverse the flow of creating objects, so that visually, code wouldn't look like

    inner1 = JsonObject(config)
    inner2 = JsonObject(config)
    outer = JsonObject(config, children=[inner1, inner2])
but instead

    class outer(JsonObject):
        inner1 = JsonObject(config)
        inner2 = JsonObject(config)
        _config = config
ish. From this example it isn't clear why this matters, but with highly nested structures, having the python code visually mirror the output's nesting is really nice.
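A minimal sketch of the idea (simplified: plain dicts stand in for nested JsonObject instances, and the names `JsonMeta`/`to_json` are mine, not from the proprietary code): a metaclass harvests the pieces declared in the class body, so nesting in the source mirrors nesting in the output.

```python
import json

class JsonMeta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Collect class attributes that are themselves json pieces,
        # in declaration order (class bodies are ordered in 3.6+).
        cls._children = {k: v for k, v in namespace.items()
                         if not k.startswith('_') and isinstance(v, dict)}
        return cls

    def to_json(cls):
        return json.dumps({cls.__name__: cls._children})

class JsonObject(metaclass=JsonMeta):
    pass

class outer(JsonObject):
    inner1 = {"a": 1}
    inner2 = {"b": 2}

# outer.to_json() -> '{"outer": {"inner1": {"a": 1}, "inner2": {"b": 2}}}'
```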

The second is at https://github.com/gtagency/pyrostest/blob/master/pyrostest/.... This isn't strictly necessary, but "the documentation states that all test_* methods in a RosTest subclass spin up roscore" is a lot simpler and safer than "if you forget to add `RosTest.setUp()` and `RosTest.tearDown()` at the beginning and end of all of your setUp and tearDown methods, your tests will fail (and further, future invocations of tests may fail until you run something like `killall -9 roscore && killall -9 rosmaster` in a shell, because you've unintentionally broken your environment)".

Generally speaking, metaclasses allow you to make better user interfaces for developers, and avoid the repetitive kinds of things you encounter in "enterprisey" code. Specifically, they allow you to make interfaces much more declarative than you otherwise could (i.e. the SQLAlchemy example of "this is a table and here are its columns" vs. "my table is a function that takes in some stuff and magic happens inside it").
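A toy illustration of that declarative style (this is not SQLAlchemy's actual machinery; `Column`, `TableMeta` and `columns` are invented for the sketch): the metaclass turns bare class-body assignments into collected table metadata.

```python
class Column:
    def __init__(self, col_type):
        self.col_type = col_type

class TableMeta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Gather every Column declared in the class body.
        cls.columns = {k: v for k, v in namespace.items()
                       if isinstance(v, Column)}
        return cls

class Table(metaclass=TableMeta):
    pass

class User(Table):
    uid = Column(int)
    name = Column(str)

# User.columns maps field names to their Column objects.
```

The user just writes "here are my columns"; the function-call plumbing lives once, in the metaclass.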


As further support of this argument: the Ruby community is rich with metaprogramming, sometimes going to extreme lengths, frequently for API-improvement purposes.

With a little knowledge about how it all works, it's not too obtuse, and the developer benefits can be pretty significant - ever found an API annoying, or resisted doing something "right" because it's more work or less performant than a simple option? With rich enough metaprogramming, the best option can be the simplest and most-obvious in nearly all cases.

Maybe Ruby has irreparably tainted me, but as time goes on I fall further towards "more power is more better". 99.9% of the time you won't touch it, but it's a lifesaver when it's available and you need it, and that comes up much more often in libraries. The community as a whole reaps the benefit even if less than 1% know how to use it.


Personally I find a language with macros easier to understand and far more powerful for such things, and it's not a pile of special cases that are still being added to (in this case, `__init_subclass__` added in 3.6).

Fine about metaclasses for some things, but you haven't covered why you'd use the two examples in the article.


I actually wanted `__prepare__` in production, because it would have allowed me to override the class dict with an `OrderedDict`, which was preferable to sorting the components as a way of making the output jsons deterministic, but you do what you can.

As for `__init_subclass__`, if you're alright with a DSL that does weird things with the class line, it can be useful. As an example applied to SQLAlchemy, they could do this (in pseudo SQLA):

    class MyTable(Table, primary_key='uid'):
        uid = Column(int)
which would enforce via the api that there can only be 1 primary key.
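A runnable sketch of that pseudo-SQLA idea (the `Table`/`Column` classes here are toys, not SQLAlchemy's): `__init_subclass__` (Python 3.6+) consumes keyword arguments written on the class line, so the constraint is checked once, at class-creation time.

```python
class Column:
    def __init__(self, col_type):
        self.col_type = col_type

class Table:
    def __init_subclass__(cls, primary_key=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Fail fast if the declared primary key isn't actually a column.
        if primary_key is not None and primary_key not in vars(cls):
            raise TypeError(f"primary key {primary_key!r} is not a column")
        cls.primary_key = primary_key

class MyTable(Table, primary_key='uid'):
    uid = Column(int)

# MyTable.primary_key == 'uid'; a mistyped key name raises at class creation.
```

Because there is exactly one `primary_key=` slot on the class line, the API itself enforces a single primary key.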


I don't actually understand either example which kinda proves my point ;-)


How do you mean?

So for `__prepare__`, imagine you have a class whose entire job is to be serialized to some external format. I'll use json, but it could be csv, yaml, whatever. Heck, maybe you're doing code generation and it's a group of functions that will be placed into another file.

Say I have an instance of this class:

    class JsonObject(Serializable):
        key = "value"
        second_key = "second_value"
This gets serialized to

    {
        JsonObject: 
        {
            key: "value",
            second_key: "second_value"
        }
    }
Except that sometimes what you get is

    {
        JsonObject: 
        {
            second_key: "second_value",
            key: "value"
        }
    }
Minor, but important, difference. Python stores object attributes in a dict. Python's dicts are (historically) unordered, so when serializing, the order things are printed in is arbitrary. That means that now instead of just using a normal diffing tool, you need to write some json-differ that parses and compares the json, and you lose the ability to do side-by-side comparisons. So you want deterministic, ordered output generation.

Now, to be clear, you could model your api like this:

    output = JsonObject()
    output.append(key="value")
    output.append(second_key="second_value")
And that works well for this simple example, but as soon as you start nesting things, it gets confusing, so just assume that for reasons you want this DSL for code generation.

You have 3 options:

1. Create some determinism: your serialize function looks something like this (pseudopython):

    def serialize(self):
        for k, v in self.attrs.items():
            write(jsonify(k, v))
A fix is really easy:

    def serialize(self):
        for k in sorted(self.attrs):
            write(jsonify(k, self.attrs[k]))
Not bad, but a few problems: you can't customize the output order, everything now needs to be comparable, and it's a smidge slower, especially for really big objects (remember: you're writing a DSL for generating large serializable things, so there's a good chance you'll want some way to autogenerate large quantities of data to be serialized).

2. add an `_order` attribute to your class, then your serialize method becomes

    def serialize(self):
        for k in self._order:
            write(jsonify(k, self.attrs[k]))
Well, now you have to forward declare everything, which is kinda annoying; you're polluting your namespace (what if your generated json/python/whatever needs an `_order` attribute!); and if you ever forget to update your order attribute, your stuff doesn't work right.

3. Replace your class's dict with an OrderedDict. Now, you've done some dark magic to do this, but you don't need to forward declare, your users control the output order naturally in a way they expect, and you don't have to sort a bunch of things every time you want to serialize any data. (Admittedly, I think Python 3.6 voids this issue by making the class dict an ordered dict anyway, but that's technically an implementation detail.)
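A minimal sketch of option 3 (names like `OrderedMeta` and `_fields` are mine; on 3.6+ the class namespace is already ordered, so this mostly matters for earlier versions): the metaclass's `__prepare__` hands the class body an `OrderedDict`, so serialization follows declaration order with no sorting and no `_order` attribute.

```python
import json
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body executes inside this mapping.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Remember the body's entries in declaration order.
        cls._fields = OrderedDict((k, v) for k, v in namespace.items()
                                  if not k.startswith('_'))
        return cls

class Serializable(metaclass=OrderedMeta):
    @classmethod
    def serialize(cls):
        return json.dumps({cls.__name__: cls._fields})

class JsonObject(Serializable):
    key = "value"
    second_key = "second_value"

# JsonObject.serialize() always emits keys in declaration order.
```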

Does that make sense?


While I agree that _this_ is an extreme demonstration, metaclasses are a powerful tool for developing dev-friendly base classes/APIs in your libraries. Django models, for example, use metaclasses so that the fields you define read more expressively when creating instances.


That's okay as long as the magic holds. If it starts to fail, you have to peek behind the curtain, and you lose yourself among the meta.


Django famously had a "magic removal" refactor before 1.0 as they collectively decided they had gone too far. They've since plotted a more conservative path but the ORM and similar magic was deemed reasonable.

I've found Django's metaclass usage pretty robust over the years. The curtain mostly remains unpeeked behind.


> aside: please don't abuse collections.namedtuple to make a simple Python object; the class is meant to help porting APIs that return a tuple to a more object-oriented one, so starting with namedtuple means you end up leaking a tuple API that you probably didn't want to begin with

I haven't heard this before. I subclass namedtuples regularly—whenever I need a simple immutable object. I haven't had any problems personally. (Except when pickling them. I would not recommend that.)

Can any experienced Pythonistas weigh in? Is this bad practice?


The biggest gotcha is that namedtuples compare exactly like regular tuples. Thus, two namedtuple types with identical elements will compare equal even if the type and field names differ. Usually that isn't what you want.
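The gotcha is easy to demonstrate: two unrelated namedtuple types with the same values compare equal, because comparison is plain tuple comparison.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
Vector = namedtuple('Vector', ['dx', 'dy'])

p = Point(1, 2)
v = Vector(1, 2)

# Different types, different field names, yet equal.
assert p == v
# Equal to a bare tuple, too.
assert p == (1, 2)
```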

I've taken to using attrs everywhere for exactly this reason: http://attrs.readthedocs.io/en/stable/why.html#namedtuples


I don't really see how it's a problem that it's "leaking" a tuple API; it's not like there are private methods in Python. Everything can see everything. If anything, having the state of an object available explicitly as a tuple seems like a good idea to me, as opposed to trying to figure out what's a stateful variable and what's a method, etc.


You'll get surprises such as your subclass being indexable when you (probably) don't want that.
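For example (with a made-up `Config` record): a namedtuple subclass silently supports indexing, slicing, and unpacking, even when it's meant purely as a namespace.

```python
from collections import namedtuple

class Config(namedtuple('Config', ['host', 'port'])):
    pass

c = Config('localhost', 8080)

# All of these work whether you wanted them to or not:
assert c[0] == 'localhost'          # indexing
host, port = c                      # unpacking
assert list(c) == ['localhost', 8080]  # iteration
```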


What problem would that cause? Wouldn't it only be a problem if you attempted to index into it, which you wouldn't do (on purpose) if you didn't think it was indexable? And if you did try to index into it thinking that it wasn't indexable, it would be a bug, which is the same as when it's indexable...


Exactly. You probably don't want this thing that is essentially a namespace to be indexable, but if you do index into it (a bug), it quacks like a tuple, so your mistake propagates silently. That's a bad thing.

It's more permissive than one would expect, although the bigger issue is what dragonne mentioned.


> Adjusting how classes are created can be very difficult to debug and so should only be used when you have a really legitimate use-case.

...and you never have a really legitimate use-case. Seriously, please don't use this sort of magic in production code that you expect other people to use and depend on. I'm looking at you, Django project. Bloody cowboys.



