PEP 435 Accepted – Adding an Enum type to the Python standard library (python.org)
188 points by randlet on May 10, 2013 | 106 comments

Can someone help me understand how this helps? I'm only a mediocre programmer, and I'm having trouble understanding the implications.

It seems like you declare these like any other class. So what's the difference between this, and creating a class full of ints?

If I have class Color: red = 1 blue = 2

etc, wouldn't I get the same thing?

Is the difference that you don't need to instantiate an instance of it?

I know I'm being dense, and I've read the PEP, but it's not quite clicking yet.

In this case, the attributes are not ints. If you defined that class, functions that took its attributes would need to accept an int, and if I passed max-int or something it'd be up to your function to go figure out whether my argument was a valid value for Color.

The enum class gives you that stuff. The argument you'd expect would satisfy isinstance(arg, Color) (I think that's how they did it), and if it did, you'd be assured that the value was a valid value for Color.

So you could do it yourself, sure, but to get all the little benefits you'd have to do more coding than naming a bunch of ints in a new class. It's subtle though. Not a wild game-changer or anything :)
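To make that concrete, here's a small sketch of the validation you get essentially for free (the `paint` function is invented for illustration):

```python
from enum import Enum

class Color(Enum):
    red = 1
    blue = 2

def paint(color):
    # Validation comes for free: anything that isn't a Color member
    # is rejected up front, instead of silently accepting any int.
    if not isinstance(color, Color):
        raise TypeError("expected a Color member")
    return color.name

print(paint(Color.red))   # red
# paint(99) would raise TypeError
```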

Thanks, that makes sense. And I suppose it also helps satisfy the "explicit is better than implicit" edict, by making it clearer what you are doing.

In addition to what famousactress said, you also get:

    >>> Color.red
    <Color.red: 1>

So when you print out an enumeration, you get its name instead of just the number. This can be a big deal for debugging e.g. OpenGL code, which has a lot of enumerations. Which of the following would you prefer?

    >>> gl.getError()
    1280

Or would you prefer,

    >>> gl.getError()
    <ErrorCode.invalid_enum: 1280>

Read the section on IntEnums[1]:

  IntEnum values behave like integers in other ways
  you'd expect:

  >>> int(Shape.circle)
  1
  >>> ['a', 'b', 'c'][Shape.circle]
  'b'
  >>> [i for i in range(Shape.square)]
  [0, 1]

[1] http://www.python.org/dev/peps/pep-0435/#intenum

The syntax strikes me as somewhat strange. It requires writing out the enum values (and thus manually avoiding duplicates, or keeping them sorted for a cleaner look), it demands unnecessary effort when new values are added in the middle, and it misleadingly implies that the enum values are merely class-level integer constants. I would have preferred a more explicit API that used a list of strings for the enumerators.

As an optional implementation that'd be nice, but this syntax I assume is meant to solve two problems:

1. Making the integral values of an enum clear and unchanged by re-ordering the names. If the integral values matter to you (because you serialize enums that way), then I would want them to be explicit.

2. The integer values of these enums aren't exclusive. Two enums can reference the same value.

That said, I agree a shorthand for people who don't care seems reasonable, where the integers could be auto-assigned in definition order, or just all set to 0.

I completely agree, this disgusts me:

    >>> Animal = Enum('Animal', 'ant bee cat dog')
    >>> Animal.ant
    <Animal.ant: 1>

That being said, I welcome enums - I made classes to basically represent an enum a bunch of times.

The second parameter can also be a list/tuple of strings. e.g.:

>>> Animal = Enum('Animal', ('lion', 'tiger', 'liger', 'tigon'))

What disgusts you about it, in particular?

The fact that it allows whitespace-separated words. It's just wrong.

It's weird, unnecessary, it just shouldn't be there.

Personally, I would've preferred:

>>> Enum('Example', 'foo', 'bar', 'baz')

(ie. Enum(name, *values))

That syntax is a bit unusual, but it has the clear advantage of being far less verbose. Consider this:

* The functional API is mostly for short code snippets and experiments in the shell. For real programs, the standard class syntax is preferred and recommended.

* namedtuple already uses the same functional API allowing space-separated members, and it's a popular tool in the stdlib. Yes, the first time you run into it it feels a bit odd, but then you get used to it and it's not a big deal.

Looks exactly like namedtuple, why do we need this?

The only similarity is the signature; semantics are completely unrelated.

I guess I'm looking for a use case that namedtuple (or set...) doesn't support. Here's one example:


In that example the values of the enumerated constants are simply integer values; they do not have their own type. So there is nothing stopping you from comparing values from two unrelated enumerations and getting a nonsensical True or False result, instead of a TypeError.

I suppose that's true and theoretically useful. I don't find myself coding nonsense very often, and Python has never been a language that gives much support to (or is appropriate for) use cases that can't tolerate any.

I now see some good uses for Enum, but (with a skeptical eye to additions) I believe that the reasons to keep the language/libs small are equally important.

Strange that the auto-assigned values start at one.

Not that strange, actually. Reserving zero allows using it as an 'undefined' enumeration value that plays well with basic Boolean tests, e.g. 'if not animal: ...' and such.
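Note that plain Enum members are always truthy; the boolean test described above works when the enum mixes in int. A sketch with IntEnum (the `Animal` class and `describe` function are invented for illustration):

```python
from enum import IntEnum

class Animal(IntEnum):
    unknown = 0   # reserved "undefined" value
    ant = 1
    bee = 2

def describe(animal):
    # The zero-valued member is falsy (IntEnum inherits int's truthiness),
    # so it plays well with plain boolean tests.
    if not animal:
        return 'no animal given'
    return animal.name

print(describe(Animal.unknown))   # no animal given
print(describe(Animal.bee))       # bee
```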

Now all I want in Python is a real switch statement...

If this was a strongly-typed language I wouldn't want enums to be nullable, but maybe people would use that in Python.

Incidentally, this makes me wonder if one could implement something similar for flags (powers of two values, overloading | and &). Though sets of enums might be sufficient.
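The PEP doesn't include a flags type, but a rough approximation is possible by giving IntEnum members power-of-two values (the `Perm` class below is invented for illustration):

```python
from enum import IntEnum

class Perm(IntEnum):
    # Powers of two, so members can be combined and tested bitwise.
    READ = 1
    WRITE = 2
    EXEC = 4

# IntEnum members are ints, so | and & work; the result is a plain int.
mode = Perm.READ | Perm.WRITE
print(mode)                      # 3
print(bool(mode & Perm.WRITE))   # True
print(bool(mode & Perm.EXEC))    # False
```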

Python is a strongly-typed language.

Perhaps you're thinking of static versus dynamic typing?

The only languages I'd consider strongly typed are statically typed.

Python functions are always 'a -> 'b (any to any), the weakest possible type for a function. Same for variables (as opposed to values), they are all 'a; same for list membership or attribute membership. There is no place in the language with room for a nullable / nonnullable distinction.

A statically-typed language is one in which (1) names and values both are typed, and the only legal binding of a value to a name is when the type of the value is compatible with the type of the name; (2) all expressions have types, and only values or names of the appropriate type may be used or returned. Often but not always, some form of source-code processing prior to execution will evaluate all types in the program and determine whether these conditions are met.

A dynamically-typed language is one in which only values have types, and there are no restrictions by type on what values may be bound to particular names. Expressions may or may not have defined types, and may or may not check these types (some dynamically-typed languages do allow statements to be made about the types of functions, for example; some use these in a "static-y" way, while others treat them more as hints for runtime optimization).

Strong and weak typing are far less precisely defined. The most broadly-used criterion I've personally seen is that in a strongly-typed language, attempting to perform an operation with values whose types are incompatible with the operation will fail immediately, and will not try to implicitly coerce the values to acceptable types for the operation.

Python is generally considered strongly typed, and dynamically typed. There is no consensus that these two labels are incompatible, just as there is no consensus that weak and static typing are incompatible (C is often described as being both statically and weakly typed, for example, due to C's casting and conversion facilities).

In a dynamic language it's the values that are typed (at runtime), not variables, functions, etc. I think your interpretation of weak vs. strong is already covered by static vs. dynamic, whereas the other interpretation of weak typing is about silent coercion of types such as in Javascript and PHP where for example an integer can be concatenated to a string, or in C where a string can be interpreted as an integer with a cast.

You might consider that, but frankly, you'd be wrong. Python is unambiguously strongly (and dynamically) typed.

Python accepts:

     1 + True == False + 2

It should have to be:

     1 + int(True) == int(False) + 2

Python accepts this because bool is a subclass of int, and True and False have defined integer values (1 and 0 respectively).

Which means there's no coercion or conversion going on here. For what you think it "should" be, Python would have to break Liskov substitution for subclasses of int.

It shouldn't be a subclass of int. http://arxiv.org/abs/math/9205211

You're free to argue that. But the fact that it is, and behaves correctly as a subclass of int, does not make Python be weakly typed. In fact, rather the opposite.

Then it gets fixed, and everyone screams because it's not backward compatible (Python 3).

It should have just been part of Python 3.

Yep, but they didn't aim to or succeed in 'fixing everything' in python3, just the most pressing items.

Here's to Python 4!

The syntax is easy for us who have been using ghetto enums:

  class Colors:
      red, green, blue = range(3)

Adding new values in the middle is exactly why it needs to be explicit. If you're pickling them, for example, or otherwise recording them, you certainly care about the values. Auto-assignment is often convenient, but it certainly must never be the only way to do it.

Similar to this?


Having to throw in a " ".join(...) isn't that bad.

No need for join().

> It can be a whitespace-separated string of names, a sequence of names, a sequence of 2-tuples with key/value pairs, or a mapping (e.g. dictionary) of names to values.

Even better. The many levels of skimming on display.

Would love to see functionality here similar to the iota keyword in Go.

I prefer Michael Foord's proposed syntax and would gladly port the implementation if possible. Having to specify either an ordinal number like in BASIC of 1980, or having to specify symbols as strings, looks dodgy to me.

Since Enums are mostly unmagical (apart from the member wrapping), you can use a compact syntax:

  class Colors(Enum):
      red, green, blue = range(3)


Would

    red, green, blue = itertools.count()

work too?

If it works in a Python shell, it will work for an enum (this doesn't).

The PEP mentions these explicit values allow the creation of aliases in a very natural way.

  class Keys(Enum):
      space = 1
      esc = 2
      enter = 3
      return = 3 # Alias to "enter".

Also, less magic.

return = 3 shouldn't work.

you know what they meant...

Michael Foord's proposed syntax looks much nicer. I don't understand why the enum values have to be assigned when they are first looked up. Why couldn't the commas in the definition signal that the enum members should have automatic values assigned at the definition?

Doesn't the 'functional API' make up for it ? `Animal = Enum('Animal', ['ant', 'bee', 'cat', 'dog'])`

The functional API could stand to be improved a bit:

    >>> Animal = Enum(['ant', 'bee', 'cat', 'dog'])
    >>> Animal.ANT
I'm not convinced it's useful to pass the class name into the functional API, but I would see value in having an easy API to create enumerated strings. It's easier to see what's going on in other environments (e.g. in a datastore, or in JavaScript) if you pass around string identifiers instead of ints.

Here's the class I've been using to do that:

    class Constants(object):
        def __init__(self, names, enum=False):
            self._names = frozenset(names)
            if enum:
                for (i, name) in enumerate(self._names):
                    setattr(self, name.upper(), i)
            else:
                for name in self._names:
                    setattr(self, name.upper(), name.lower())
            self.frozen = True
        def keys(self):
            return [name.upper() for name in self._names]
        def values(self):
            return [getattr(self, name.upper()) for name in self._names]
        def __setattr__(self, key, value):
            if getattr(self, 'frozen', False):
                raise Exception("Constants cannot be modified after instantiation")
            object.__setattr__(self, key, value)

    >>> ImportMethod = Constants(
    ...     [
    ...         'email',
    ...         'manual_upload',
    ...         'api',
    ...     ]
    ... )
    >>> print(ImportMethod.API)
    api

There is a technical reason why the class name is passed into the API, and it's the same for other functions which create types (collections.namedtuple(), and type(), for example). You want to be able to print out the class name when you print an enum, but there is no other reliable way to figure out the class name. In Python, a class is just another value, like functions and integers, and its name is nothing special. When you define a class normally, the name and dictionary get passed to the type() function.
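For comparison, here's the same constraint with type() directly; it's the dynamic equivalent of a class statement, and it too needs the name spelled out (the `Color` class is just an illustration):

```python
# type(name, bases, namespace) creates a class dynamically.
# The name must be passed explicitly -- just like Enum's functional API --
# because the class object has no other way to learn what it's called.
Color = type('Color', (), {'red': 1, 'blue': 2})

print(Color.__name__)   # Color
print(Color.red)        # 1
```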

I understand that, but I'm not sure it matters. Is it worth the ugliness in the API just to print out the class name? Is 'api' that much worse than "ImportMethod.API"?

That's not all it's there for. Python programmers expect to be able to do things like print(obj.__class__.__name__). You're definitely free to write your own alternative Enum, however, since it's just a library function.

I don't think that enums should have values (i.e. numbers).

Nearly every language does it that way but isn't that just because C originally implemented enums with numbers? Modern languages are much higher level.

Java enums don't have any numeric equivalent value; they are just objects, with textual names if you must convert them into something more primitive.

Java enums have int ordinal values: http://docs.oracle.com/javase/7/docs/api/java/lang/Enum.html...

I think what you mean though is that in 99.9% of cases a user of a Java enum never needs to (or shouldn't) refer to the ordinal value.

It can actually be pretty useful to be able to add them up and then combine them in OR expressions.

Examples from C#:


It's also used to be sort of useful for serialization or storing them in a DB as they'd take up less space than a string, but these days that's not so important and I'd actually caution against it.

Basically you don't need it often, but when you do it's extremely useful.

Enum values don't have to be ints, they can have any value you want. The explicit values are there because Python's "explicit is better than implicit" zen, and the functional API gives you a way to create enums without assigning explicit values if you really want to.
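A quick sketch with non-integer values (the `Signal` class is an invented example):

```python
from enum import Enum

class Signal(Enum):
    # Values need not be integers; any constant object will do.
    start = 'begin'
    stop = 'end'

print(Signal.start.value)   # begin
print(Signal.stop.name)     # stop
```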

I always thought the fact that Java enums don't have explicit numeric values was a nice idea that ended up being a mistake, specifically because .ordinal was still exposed and ends up getting used by people who want to serialize the enum as something other than a string. .ordinal, of course, is affected by the order the enum is defined in, which is sketchy.

My memory could be a little faded on this though, I haven't coded in Java in a few years..

If you use "None" with this PEP, you'll get the same effect.

One approach would be to set values like "red = object()". I use that often to create guaranteed-unique sentinel values.
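A minimal sketch of that sentinel pattern (the `MISSING` name and `lookup` function are invented for illustration):

```python
# A module-level sentinel: an object() compares equal only to itself,
# so it can't collide with any value a caller might legitimately pass.
MISSING = object()

def lookup(table, key, default=MISSING):
    if key in table:
        return table[key]
    # Identity check distinguishes "no default given" from default=None.
    if default is MISSING:
        raise KeyError(key)
    return default

print(lookup({'a': 1}, 'a'))          # 1
print(lookup({}, 'x', default=None))  # None
```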

My approach was this:

class UniqueValue: pass


The user doesn't necessarily need to have access to the raw numbers I guess, but when it comes down to comparisons/branching etc, numbers are pretty much always going to be fastest and simplest.

And if they are going to be based on numbers underneath, why not let the user/coder/whatever have access to them?

Everything is numbers underneath, so you could ask the same question about all data types. For example, if pointers are going to be based on numbers underneath, why not let the programmer have access to them? Lots of good reasons, actually, which is why most languages don't provide that access. Many of those reasons apply to enumerations too.

Well, you've outed me as a C programmer there I guess :)

I don't really subscribe to the idea that languages should force people onto rails. I love higher level languages for their rapid development of course, and the high level constructs. But there's something about being able to get your hands dirty, right in the guts of a problem...

As an option, it can be handy in some cases (less in Python than C, but still somewhat); requiring that you write them out for all enums, however, is somewhat annoying. C itself gets this right: you can write '= 1' to choose a value or leave it out to assign one automatically, and in either case you can convert to an integer (actually, they're integers in the first place, but that's C).

Doing binary arithmetic on enums can be very useful! First | Second = Both.

That's what enum flags in c# are for, it's a very slick feature.

  Enumerations support iteration, in definition order:

  >>> class Shake(Enum):
  ...   vanilla = 7
  ...   chocolate = 4
  ...   cookies = 9
  ...   mint = 3
  >>> for shake in Shake:
  ...   print(shake)
  Shake.vanilla
  Shake.chocolate
  Shake.cookies
  Shake.mint

Python's execution model sez that the class declaration is nothing more than code that is exec'd in a dictionary. We know that Python dictionaries do not preserve order, so why is this iteration in definition order possible, barring modification to the interpreter or dict type?

It's possible via the __prepare__ method on Enum's metaclass, which lets the metaclass return an ordered mapping to use as the namespace (__dict__) of enumeration classes.

I guess this is new to Py 3k? Thanks for the tip!

Right, PEP 3115. Indeed, one of the reasons given is that "there is an important body of use cases where it would be useful to preserve the order in which a class's members are declared."


There is an OrderedDict class in the collections module of the standard library. Enums store their members in a special __members__ attribute, which is such an OrderedDict. See this section of the PEP: http://www.python.org/dev/peps/pep-0435/#duplicating-enum-me...

You more or less ignored the contents of my comment w.r.t. execution model. I am well aware of OrderedDict, and I still do not understand how this behavior is possible unless Python 3 differs from 2 in the kind of dictionary used in code block executions.

There's no need to modify the interpreter. Redefining __dict__ to support ordering would be enough, or perhaps defining a purpose-built __metaclass__[1].

[1] http://stackoverflow.com/a/6581949/183481

No-go on metaclass, the received type for metaclass __new__ method's members argument is a plain dict.

Unless this is changed in Py 3, I still don't understand how this works.

Here's an example from the Python 3 docs[1] that specifically uses metaclass for exactly this scenario:

    import collections

    class OrderedClass(type):

        @classmethod
        def __prepare__(metacls, name, bases, **kwds):
            return collections.OrderedDict()

        def __new__(cls, name, bases, namespace, **kwds):
            result = type.__new__(cls, name, bases, dict(namespace))
            result.members = tuple(namespace)
            return result

    class A(metaclass=OrderedClass):
        def one(self): pass
        def two(self): pass
        def three(self): pass
        def four(self): pass

    >>> A.members
    ('__module__', 'one', 'two', 'three', 'four')

[1] http://docs.python.org/3/reference/datamodel.html#metaclass-...

When will this ship in python? I don't know a lot about the typical time between a PEP acceptance and a release containing the feature.

We expect the implementation will be committed in the next few weeks, and you'll be able to give it a try using Python's tip revision ("trunk"). It will actually _ship_ with Python 3.4.

Before March next year: http://www.python.org/dev/peps/pep-0429/.

Python has been getting bloated in recent years. Why not just do the following?

  def enum(*args):
      return dict(zip(args, range(len(args))))

  colors = enum('red', 'green', 'blue')

  # {'blue': 2, 'green': 1, 'red': 0}

There are a few things this proposal does that improves upon yours:

+ Enum values of different types are incomparable. This is widely seen as a good thing.

+ Enum values are instances of their Enum; this allows more flexibility in user code.

+ As a result of the above, you can have a dictionary without Enum values of two different types colliding.

+ Enum values print in a more friendly way. This is expected to help debugging.

+ To support the above, enum values know their own name. This is likely helpful both for debugging and for various introspection hacks.

+ Enums can be iterated over in a fixed order. This allows automated help systems and similar to maintain consistency between runs, improving user experience.

+ There's a lot more error checking provided to avoid cases like defining the same enum value twice.

But I understand your sentiment. We always enjoy smaller languages because it helps us keep them in our heads. But note that Enums aren't a new language feature -- this PEP simply adds another module to the standard library. The code you provide, flawed as it is, is still a pattern common to many different libraries, so it would be good to put it into the stdlib; but if we're doing that, might as well do it right, don't you think?

Fair enough. The enums themselves aren't a problem; they just seem symptomatic of a pile-on-more-stuff mentality.

"Python has been getting bloated in recent years"

Didn't Python 3 do more removing/tidying up than actually adding of stuff?

Interesting that this is PEP 435 and the prior attempt was PEP 354.

The permutations will continue until the PEP improves.

Yes, I noticed that curiosity :-) Not intentional!

Luckily there's no need for more permutations - this PEP was accepted. I expect PEPs 534 and 543 to surface at some point, but it isn't likely they will deal with enums.

Stupid me, I was using enums the wrong way. I was using an enum like a constant.

I.E. If my database takes a 'Sex' parameter, with Male as 1, Female as 2, Unknown as 3, Both as 4 - I'd use an enum like so:

    update person set sex=Sex.Male

Looks like I can't do that with this enum class. Well, I suppose I'd have to do it like so:

    sex = Sex.Male.value

not very sexy...

Wouldn't IntEnum fit your case better?

Could you not use mock.sentinel values as an enum? Also, you can create a poor man's enum from 2.7+ with:

    from collections import namedtuple
    myenum = namedtuple('myenum', ['a', 'b', 'c'])(*range(3))

    myenum.a == myenum.a

That's not even a poor man's enum, it's just an enum. Named tuples can be used for everything an enum can be. This doesn't seem like a necessary PEP, but I've made my peace a long time ago about Python's bloated standard library. It's not the end of the world, and I still love Python, but this doesn't seem necessary to me.

You don't have methods and you don't have a proper repr(). Also, even if you don't assign numbers, the fields are still ordered (which they shouldn't be).

What purpose would Enums serve in a dynamically-typed language?

Read the PEP motivation :-)

It's still useful to have mnemonic names for special values, with nice string representations and some guarantees on what is equal to what.

Even though dynamically typed, it still have the concept of types. There are good examples on how it works in the PEP.

And more to the point, Python is strongly typed in the sense that it has very few automatic conversions between types, in contrast to, say, Perl. As the PEP says, this means that you can't accidentally compare an enum value of one type to an enum value of another without a runtime error occurring, instead of hitting an infuriatingly subtle bug.

Python may defer its type checking to run time, but it still has it.
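To make that concrete, a small sketch (the `Color` and `Animal` classes are invented examples):

```python
from enum import Enum

class Color(Enum):
    red = 1

class Animal(Enum):
    ant = 1

# Members of different enums never compare equal, even with equal values:
print(Color.red == Animal.ant)   # False

# And ordered comparisons across enums raise rather than guess:
try:
    Color.red < Animal.ant
except TypeError:
    print('TypeError')
```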

Seems a bit dodgy that enum values can be any value. What happens if you use mutable values and later change them to be the same?

Objects can be mutable or immutable. The value of a mutable object can change to another value, but it's not that the value-in-itself is mutable.


So can I declare an enum value to be e.g. a list? Is that value "frozen" when the enum declaration is processed or what?

A list is not a value, it is a type of object. The second to last paragraph of the section in the reference manual I linked may help:

  Some objects contain references to other objects; these are called
  containers. Examples of containers are tuples, lists and dictionaries. The
  references are part of a container’s value. In most cases, when we talk
  about the value of a container, we imply the values, not the identities of
  the contained objects; however, when we talk about the mutability of a
  container, only the identities of the immediately contained objects are
  implied. So, if an immutable container (like a tuple) contains a reference
  to a mutable object, its value changes if that mutable object is changed.

PEP-435 says enums are bound to "unique, constant values" -- not that you can bind enums to arbitrary types of objects.

You could say that about a lot of things in Python. If you add a constant value as a class attribute, anyone using that class has the ability to change it, even if that would screw things up.

> Iterating over the members of an enum does not provide the aliases

Anyone know the motivation for this? Seems like a source of frustration to me.

The aliases aren't part of the enum, if you think of the enum as a set of values. Iterating is done on the values, not the names.

(the next paragraph gives you the __members__ ordered dict, if you want to iterate on something else)
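A small sketch of the alias behavior (using `ret` here since `return` is a keyword; the `Keys` class is illustrative):

```python
from enum import Enum

class Keys(Enum):
    space = 1
    esc = 2
    enter = 3
    ret = 3   # alias -- same value as enter

# Iteration yields only the canonical members; the alias is skipped:
print([k.name for k in Keys])     # ['space', 'esc', 'enter']

# But the alias still resolves to the canonical member,
# and __members__ lists it by name:
print(Keys.ret is Keys.enter)     # True
print('ret' in Keys.__members__)  # True
```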

Barry gave a good talk about this at DCPython on Tuesday.

Interesting. It looks like if you want to use enum in 2.7 you have to use a package called flufl[1].

[1] http://bazaar.launchpad.net/~barry/flufl.enum/trunk/view/hea...

flufl.enum was the initial candidate for PEP 435, but the approach changed along the way, and the current one uses implementation techniques not available in Python 2.7.

We may create an almost-compliant back-port for 2.7 as an external library, though.

You sure it's not better spending the energy elsewhere? Python 3 needs more exclusive features to encourage people to migrate from Python 2.

"Almost compliant" would mean compliant except for guaranteeing the correct order of iteration, would it not?
