Python 3.7: Introducing Data Classes (jetbrains.com)
407 points by ingve on April 18, 2018 | 158 comments



As noted in the PEP, data classes are a less fully-featured stdlib implementation of what attrs already provides. Unless you're constrained to the stdlib (as those who write CPython itself of course are), you should consider taking a look at attrs first.

http://www.attrs.org/en/stable/


Less fully-featured, sure, but also a bit more cleanly designed (in my opinion). Are there features from attrs that you would miss with dataclasses?


This is spot on. The design of attrs reminds me a little bit of the syntax from a declarative ORM, for example. I'm sure it can do very powerful things that I've not had occasion to use, but it is heavy. The @dataclass format is very clean and seems more like the syntactic sugar that I expect from Python.

One of the prime uses of a dataclass is to be a mutable namedtuple. And the syntax can be almost identical:

    Part = make_dataclass('Part', ['part_num', 'description', 'quantity'])
(from Raymond Hettinger's twitter)

This has the added benefit of not requiring type hinting, if you don't want to bother with such things.
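
A rough sketch of what that gives you (Python 3.7+; the field values here are made up):

  from dataclasses import make_dataclass

  Part = make_dataclass('Part', ['part_num', 'description', 'quantity'])
  p = Part(1001, 'widget', 3)
  p.quantity = 5   # unlike a namedtuple instance, fields can be reassigned
  print(p)         # Part(part_num=1001, description='widget', quantity=5)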


in a freshly created venv, with attrs and ipython (and deps, of course):

  (attrs) % ipython
  Python 2.7.14 (default, Oct 19 2017, 23:28:49)
  Type "copyright", "credits" or "license" for more information.
  
  IPython 5.6.0 -- An enhanced Interactive Python.
  ?         -> Introduction and overview of IPython's features.
  %quickref -> Quick reference.
  help      -> Python's own help system.
  object?   -> Details about 'object', use 'object??' for extra details.
  
  In [1]: import attr
  
  In [2]: Part = attr.make_class("Part", ["part_num", "description", "quantity"])
  
  In [3]: Part?
  Init signature: Part(self, part_num, description, quantity)
  Docstring:      <no docstring>
  Type:           type

Apart from the name of the factory function, the syntax is identical to what you have here.

edit: removed triple backticks, indented with 2sp according to https://news.ycombinator.com/formatdoc rules


attrs also has a feature that dataclasses don't currently [0]: an easy way to use __slots__ [1].

It cuts down on the per-instance memory overhead, for cases where you're creating a ton of these objects. It can be useful even when not memory-constrained, because it will throw AttributeError, rather than succeeding silently, if you make a typo when assigning to an object attribute.

0: https://www.python.org/dev/peps/pep-0557/#support-for-automa...

1: http://www.attrs.org/en/stable/examples.html#slots
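
For reference, a minimal sketch of the attrs slots flavor (the class and fields are just placeholders):

  import attr

  @attr.s(slots=True)
  class Point:
      x = attr.ib()
      y = attr.ib()

  p = Point(1, 2)
  p.z = 3   # AttributeError: 'Point' object has no attribute 'z'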


Does that still matter now that PEP 412 (https://www.python.org/dev/peps/pep-0412/) is implemented in Python 3.3 and newer?


No, I wouldn't bother with __slots__ in 3.7, especially with the newly optimized dict.


PEP 412 makes __dict__s more memory efficient than they were before, but not more efficient than no __dict__, which is the point of __slots__. The following program demonstrates the difference. Note that it lowers the available address space to 1GB so that memory exhaustion occurs sooner, and thus only works on UNIX-like systems that provide the resource module.

  import resource
  import sys

  class WithoutSlots:
      def __init__(self, a, b):
          self.a = a
          self.b = b

  class WithSlots:
      __slots__ = ('a', 'b')
      def __init__(self, a, b):
          self.a = a
          self.b = b

  resource.setrlimit(resource.RLIMIT_AS, (1024 ** 3, 1024 ** 3))
  cls = WithSlots if sys.argv[1:] == ['slots'] else WithoutSlots
  count, instances = 0, []
  while True:
      try:
          instances.append(cls(1, 2))
      except MemoryError:
          break
  count = len(instances)
  del instances
  print(cls, count)

Here are numbers from my laptop:

  $ python3.6 /tmp/slots.py 
  <class '__main__.WithoutSlots'> 5830382
  $ python3.6 /tmp/slots.py slots
  <class '__main__.WithSlots'> 16081964

That's almost 3x more instances with __slots__! This isn't the case with PyPy, though, thanks to a more efficient representation of objects:

https://morepypy.blogspot.com/2010/11/efficiently-implementi...


That's a silly example. If you're making billions of integers, use NumPy. If it's just one pass, use a generator. If you're making lots of objects with more interesting attributes, the attribute storage will overwhelm the difference the instance dicts make.

My point was not that __slots__ does nothing, but that there are more important things to worry about.


Suppose I want to run algorithms on large arrays of 2D points while maximizing readability. I want to store the x and y coordinates using Python integers so I don't have to worry about overflow errors, but I expect that most of the time the numbers will be small and this is "just in case".

I claim that in this case, __slots__ is exactly the right thing to worry about.


It's hard for me to imagine that situation coming up, but yes, __slots__ does indeed have a purpose.

BTW, have you considered using the complex type to handle that for you? It's 2d and ints should be safe in float representation. If it overflows it'll crash nicely.


Good one. But let's say I want something mutable, so complex won't do.


That's an interesting example, and thanks for demonstrating it with a modern version! I definitely wouldn't have expected that result.


Does attrs support type hints? I didn't see it in a quick skim...

One thing the stdlib implementation has going for it: better naming. attr.ib() is not exactly crystal-clear.


You can `from attr import attrs, attrib` and use those instead of `@attr.s` and `attr.ib()`.
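
A quick sketch of those aliases in use (the class here is invented):

  from attr import attrs, attrib

  @attrs
  class Point:
      x = attrib()
      y = attrib()

  print(Point(1, 2))   # Point(x=1, y=2)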




Yep, I’ll add the URL in my comment, sorry about that.


Raymond Hettinger had a pretty good presentation on Data Classes and how they relate to things like named tuples and a few recipes/patterns. It was linked on Reddit[0] but it looks like the video has been removed from YouTube. His slides are online[1], though.

[0] https://www.reddit.com/r/Python/comments/7tnbny/raymond_hett...

[1] https://twitter.com/i/web/status/959358630377091072


I love using attrs, like the idea of bringing something similar to the standard library, but strongly disagree with the dataclasses API. It treats untyped Python as a second class citizen.

This is what I'd prefer

  from dataclasses import dataclass, field

  @dataclass
  class MyClass:
    x = field()
but it produces an error because fields need to be declared with a type annotation. This is the GvR recommended way to get around it:

  @dataclass
  class MyClass:
    x: object
You could use the typing.Any type instead of object, but then you need to import a whole typing library to use untyped dataclasses. I highly prefer the former code block.

There's a big thread discussing the issue on python-dev somewhere. Also some discussion in https://github.com/ericvsmith/dataclasses/issues/2#issuecomm...

Anyway, it's not a huge issue—attrs is great and there's no reason not to use it instead for untyped Python.


Yeah, it seems strange to force people to use type hints when it has had such a mixed reception. I really tried to use type hints with a new project a few months ago, but ended up stripping it all out again because it's just so damn ugly. I wish it were possible to fully define type hints in a separate file for linters, and not mix it in with production code. It's kind of possible to do it, but not fully [1], and mixing type hints inline and in separate files is in my opinion even worse than one or the other.

[1] https://stackoverflow.com/questions/47350570/is-it-possible-...


I've always wanted a programming UI similar to RapGenius's UI. With annotations and docs being opened in a form panel.


It's great that we have simple/clean declarations for NamedTuples and (Data)classes now. But I wonder why they chose two different styles for creating them. This for NamedTuples:

    from typing import NamedTuple

    class Foo(NamedTuple):
        bar: str
        baz: int
and this for DataClasses:

    @dataclass
    class Foo:
        bar: str
        baz: int


When you write it that way it makes me wonder why there isn't a DataClass type


The short answer is that the only way to do what dataclasses do as a base class is via Python metaclasses, and you can only have one metaclass. So this way, you can dataclassify something whose base class already uses a metaclass.
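
A small illustration of the point, assuming abc.ABC as the other metaclass-driven base (the Shape class is made up):

  import abc
  from dataclasses import dataclass

  @dataclass
  class Shape(abc.ABC):   # no metaclass conflict: @dataclass is only a class decorator
      name: str

      @abc.abstractmethod
      def area(self) -> float: ...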


I love that Peter Norvig left an improvement to the __post_init__ method in the comments section of the JetBrains blog. I wonder if he uses PyCharm?


I'm happy to see data classes. I think something like this exists in 3.6:

    class Person(typing.NamedTuple):
        name: str
        age: int
But I don't think it supports the __post_init__; however, constructors have no business doing parsing like this anyway, so unless I'm missing something, deriving from `typing.NamedTuple` seems strictly better than `@dataclass` insofar as it seems less likely to be abused.


Tuples are read-only.


Ah, of course. Good point. I tend to write things in an immutable style, so I don't usually pay attention to this.


I see immutability as a feature.

There are uses cases for mutability, but it should be opt-in, not opt-out. So I'm not loving the fact that "frozen" (the mutability param for data classes) defaults to False.


Python is very much not immutable by default, it'd be weird to subvert it in a specific case.


I agree, which is part of the reason I think I'd be inclined to derive from NamedTuple and forego data classes.


Also, AFAIK you cannot add methods to NamedTuple, making it much less flexible than data classes.


One of the examples in the documentation uses a method. https://docs.python.org/3/library/typing.html#typing.NamedTu...


Yeah, you're right. You just cannot overwrite __init__:

> AttributeError: Cannot overwrite NamedTuple attribute __init__


Coming from C++ it feels really weird that you can simply assign instance.new_name = value from anywhere without properly declaring it beforehand. You also never really know what you get or if somebody modified your instance members from the outside.


I can only imagine how weird it must seem that you can override methods of instance objects and even classes, or even replace a whole class of an instance with another.

    >>> class Foo:
    ...     def bar(self):
    ...         print('foo')
    ... 
    >>> class Baz:
    ...     def bar(self):
    ...         print('baz')
    ... 
    >>> f = Foo()
    >>> f
    <__main__.Foo object at 0x7fa311e7a278>
    >>> f.bar()
    foo
    >>> f.__class__ = Baz
    >>> f
    <__main__.Baz object at 0x7fa311e7a278>
    >>> f.bar()
    baz


Does that work even if the types had fields? What about if the fields had a different total size? What if Baz had no parameterless constructor (i.e. only had a constructor that guaranteed arg > 0, for example)?

Is this like an unsafe pointer cast where “you are responsible, and it will likely blow up spectacularly if you don’t know what you are doing” or is it something safer that will magically work e.g. with types of different size?


Inline:

- Does that work even if the types had fields? Yup!

- What about if the fields had a different total size? Totally fine!

- What if Baz had no parameterless constructor (i.e. only had a constructor that guaranteed arg > 0, for example)? Then you throw an exception when you call the constructor.

- Is this like an unsafe pointer cast where “you are responsible, and it will likely blow up spectacularly if you don’t know what you are doing” or is it something safer that will magically work e.g. with types of different size?

Mostly the former, but if you're coming from a strongly typed compiled language, it may feel like a bit of the latter too, since if you don't run into any obvious runtime incompatibilities, it'll all "just seem to work" even if the underlying classes are 200% different.

(Disclaimer: I've been using Python for about a decade now but am still always nervous to speak authoritatively about it, since I work with peers who are FAR deeper in the actual implementation than I am, and I run the risk of being subtly incorrect.)


> Then you throw an exception when you call the constructor.

I was assuming that doing

   myobj.__class__ = whatever
Would not call any constructors?


You're correct, sorry, I was not clear. The assignment you cite will succeed pretty much regardless; about the only thing Python insists on is that the new value actually be a class.

The problems would come later when you try to use any functionality of the new __class__ in the manner of the old __class__ for which the new one is not compatible. e.g.:

    class A:
        def go(self):
            print("A ran!")

    class B:
        def go(self):
            print ("B ran!")

    class C:
        def go(self, foo):
            print ("C ran!", foo)

    a = A()
    a.__class__ = B
    a.go()

    B ran!

    a = A()
    a.__class__ = C
    a.go()

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-11-b67fe8fb94e1> in <module>()
          1 a = A()
          2 a.__class__ = C
    ----> 3 a.go()

    TypeError: go() missing 1 required positional argument: 'foo'


Thanks. The constructor thing can be illustrated similarly (found an online compiler, I don't normally python...).

    class A:
        i = -1
        def __init__(self, num): 
            if num <= 0:
                raise(Exception('noo!'))
        def go(self):
            print('A has value {}'.format(self.i))

    class B:
        def go(self):
            print ("B ran!")

    b = B()
    b.go()

    b.__class__ = A
    b.go()


This is one of my favorite explorations of the crazy things that are possible in Python: https://www.youtube.com/watch?v=H2yfXnUb1S4


JS & PHP let you do this as well. One advantage is that you don't have to adhere to a rigid class structure and be forced to refactor or create a new class every time you need to add a new property or method. And sometimes you want a property/method for just that particular instance, and not all members.

As with most things, there are trade-offs.


> One advantage is that you don't have to adhere to a rigid class structure and be forced to refactor or create a new class every time you need to add a new property or method.

I wouldn't qualify this as an advantage; it encourages bad code and it precludes a lot of good tooling (including tooling which would automate the sort of refactoring you'd like to avoid).


Well this sort of argument has been going on for several decades between the dynamic and static proponents.


The gradual typing movement is a concession from the dynamic community that there is indeed value in formalism. BTW, I'm a professional Python developer.


Just like introduction of generics and interfaces to static languages is a concession that there is value in "informalism" (if that's a word).

Basically, there are values in both. There is no silver bullet, these are all different tools in our belt, and some work better in some situations and others in others -- hammers and screwdrivers.

One thing I love about Python is that it allows a lot of different tools and methods, allowing you to select what works in a given situation. Many of those tools are far from perfect, but they get the job done in a very satisfying manner, more often than not.


> Just like introduction of generics and interfaces to static languages is a concession that there is value in "informalism" (if that's a word).

You have it exactly backwards. Generics and interfaces are commitments to formalism and the promises of static typing. Before that there was ‘void’, the canonical dynamic type. ‘void’ (or ‘Object’ in Java and other static OOPs) became less common, not more.

If there is value in dynamic types, I’ve scarcely seen it, (and I use Python every day). I’m told that dynamic typing really shines through Clojure and other lisps, but I haven’t gotten to that level yet.

> One thing I love about Python is that it allows a lot of different tools and methods, allowing you to select what works in a given situation. Many of those tools are far from perfect, but they get the job done in a very satisfying manner, more often than not.

This is a nice property when you want to play around with a new paradigm without learning a new syntax and toolchain, but when you’re working on a team, agreeing about the paradigm and features and style quickly becomes tedious.


One spot where I think that dynamic typing shines is dealing with anything from the schema-on-read approach to things, including reading JSON.

In Python, for example, you can just parse your JSON or XML into a dict and start grabbing what you need, typically with fairly minimal hassle involved in dealing with things like missing values or some clown sticking a string into a field you thought could only contain ints.

In Java, by contrast, ugh. You can laboriously litter your code with a whole mess of bean classes and have Jackson take care of the parsing. But if you want to obey Postel's Law, this quickly turns into a foamy quagmire of beans and annotations and whatnot that's at least an order of magnitude larger than the code that actually interacts with the input, and still throws an exception the moment someone populates the timestamp field with ISO 8601 instead of seconds since 1970/01/01. Or leaves it unpopulated. Or alternatively you can ask for a Map<String, Object>. That approach isn't actually any more strongly typed than what a dynamic language offers, but it does have the advantage of justifying the price of that fancy ergo keyboard.

I'm not a big dynamic typing fan overall, but I do think that the .NET folks were on to something when they added the DLR extensions to .NET and gave us ExpandoObjects.


I actually disagree here. I can't speak to Java, but dealing with JSON in Python is actually pretty tedious compared to Go. For example, with Go, if I have a type `Person struct { Name string; Age int }`, I can pass that to `json.Marshal()` (the Go equivalent of `json.dumps()` in Python) and it will render the expected JSON. In Python, we have to marshal everything explicitly (with a `to_dict()` method or equivalent).

And Go isn't even a particularly good example of JSON handling in statically typed languages; OCaml, for instance, does a much better job.


> You have it exactly backwards. Generics and interfaces are commitments to formalism and the promises of static typing. Before that there was ‘void’, the canonical dynamic type. ‘void’ (or ‘Object’ in Java and other static OOPs) became less common, not more.

This has nothing to do with dynamic/static typing and everything to do with weak/strong typing.


You’re mistaken. In C, void pointers (the “dynamic type” in C) were used to make algorithms generic. This would be necessary regardless of whether or not C was strongly typed. For instance, Go is strongly typed but lacks generics, and it too needs a dynamic type (interface{}) to produce many generic algorithms.


Oh grief. Just noticed the asterisk after ‘void’ became formatting.


> it encourages bad code

That's a matter of opinion, not a fact. It can be thought of as a shortcut for an applied mixin, without the needless pollution and boilerplate.


It’s a matter of opinion insofar as code quality is subjective and loosely defined, but we do have a general consensus about what code ought to look like, and the discussed features get us farther from that ideal far more often than they get us closer.


It doesn't happen maliciously in practice, and it can also be very handy when you need to attach a little extra data for the ride. If you need extra assurance, there are techniques to make the instance "very" read-only.


If you run a linter, the cases where you are doing this outside of __init__ will usually be pointed out. You can silence the warning/error on a case by case basis if you really need to do it.


Which linter does that? flake8 doesn't AFAIK.



pylint does.


I have to be honest, coming from Python 2.3 (2004ish), I don't recognize "new" Python anymore. I think it's mostly regarding type definitions.


It's not too bad I think, it's just an evolution really. You can probably grok the basics of the type annotations in a short sit-down. I can't even remember when decorators were introduced, but that even more greatly changed how Python was written. I've been using Python since 1.6 and I always thought the amount of repetition was ridiculous. I bet I'm not the only one that has written a "dsl" of what attrs and this PEP do, 1000 times over, using the facilities Python had at the time: metaclasses, then decorators. Of course all those implementations were rushed, half-assed and barely production quality. Despite any warts, attrs is a pleasure to use. Type annotations boost IntelliJ/PyCharm's already quite clever assistance. One lingering thing is attrs' named_attrs which, while syntactically the best approach in my mind, doesn't work well with IntelliJ. So hopefully this will address it.


It's relatively recent. IMHO Python 3.5 to 3.7 feel like the language is going in a different direction than it did before -- type hints and the handling of asynchrony in particular.


I've been using a lot of Python/JS/TypeScript in the last couple of years and it seems like each new release brings them closer together.


"If been using a lot of Python/JS/TypeScript in the last couple of years and it seems like each new release brings them closer together."

Yeah, I've just moved from a Python thing to doing some modern JS (async/await etc.), and they feel like cousins. To my surprise, I find that I like the comparative minimalism of JS: it has most of the key features of Python, but doesn't feel like it has the same cognitive load.


which IDE do you use these 3 with?


VSCode


After seeing the huge improvements that JavaScript has gone through over the years I'm all for language updates. Same with Java and C++ (although not as much for Java and I don't know C++ but I always hear C++11 is "new").


Java 8 (and 9 more so) bring a lot of changes but more focused on core libraries than the language itself. Not as drastic as the changes JS has seen.

C++11 though is quite a bit different. Probably the biggest change being that raw pointers * for the most part should not be used anymore.


Python has grown a lot since then. Back then it was this "better scripting language" that every Linux user kinda knew. Now it's being used much more widely and that just wouldn't cut it any more.


Looks like how a lot of languages already work out of the box. E.g. whenever I create a data type in Julia I automatically have such a constructor.

Static languages such as Go and C already essentially let you do this through brace initialization.



While 3.7 is not here yet, there's a back-port: https://github.com/ericvsmith/dataclasses


Also on PyPI.


I think it's because I'm working with elm right now, but this kind of thing scares me:

    created: datetime
But

   if type(self.created) is str:
       self.created = dateutil.parser.parse(self.created)
So basically, the type annotation cannot be trusted.
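
For context, a fuller sketch of the pattern the snippet comes from, using an invented class name (requires the third-party python-dateutil package):

  from dataclasses import dataclass
  from datetime import datetime

  import dateutil.parser

  @dataclass
  class Article:
      created: datetime

      def __post_init__(self):
          # accept a string for convenience, but normalize to a real datetime
          if isinstance(self.created, str):
              self.created = dateutil.parser.parse(self.created)

  print(Article('2018-04-18'))   # Article(created=datetime.datetime(2018, 4, 18, 0, 0))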


In Python, type annotations are (expressly) just a special type of comment that's been given a regularized format for the sake of both human and computer readability.

I don't know that it should scare you, but, like with many Python features, it's something that should be approached in a sensible manner. The Python philosophy is to leave you with the freedom to monkey around, and leave it to you to decide whether you want to abuse that ability.

(Look at me, all talking like someone who didn't spend 2 hours diagnosing a type error that a static language would have found and forced me to fix within half a second. :P )


No, they can't be, but Python doesn't pretend that they can. They are just that: annotations. They don't ensure correctness. It's up to other programs (like mypy) to analyze them and warn during development. That's it.


This __post_init__ thing is shit IMO. Why does it make sense for the language feature to allow you to completely disregard the type hint and fix it later?

I have to imagine that the type checkers would factor in anything that occurs in __post_init__ when evaluating whether the class conforms to the type hints, but it still feels like this python static typing stuff is drifting in the wrong direction.


Judging from your confrontational tone I will probably regret commenting but here goes.

Arguments accepted in __init__() are not necessarily tied to the so-called "shape of the type" at runtime. Fields with init=False, and the presence of ClassVar and InitVar is just two examples.

In terms of types of arguments passed to __init__(), they totally can be incompatible with the runtime type of the given field. There's nothing wrong with that. On the contrary, that is a useful feature. The obvious example is factories for field defaults. E.g. you can't store a list instance as a default value, it would be reused across your type's instances. Instead, you use default_factory=list. But when you do, what type does the respective argument to __init__ accept? Well, it's Union[YourType, MISSING] because you can omit it and it will initialize your type for you.

Type conversion is another common example. Your __init__ might accept timestamps as strings but always convert them to datetimes. Maybe you accept Unicode strings but internally store as bytes or vice versa? Maybe you support an external API so you have to accept a bunch of arguments that you later convert to a saner internal state? There are numerous use cases.

Summing up, don't confuse the signature of __init__ with the type shape.
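
A minimal sketch of the default_factory case mentioned above (the Basket class is invented):

  from dataclasses import dataclass, field
  from typing import List

  @dataclass
  class Basket:
      # a plain `items: List[str] = []` is rejected as a mutable default;
      # default_factory builds a fresh list for every instance instead
      items: List[str] = field(default_factory=list)

  b1, b2 = Basket(), Basket()
  b1.items.append('apple')
  assert b2.items == []   # b2 did not inherit b1's list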


I disagree. __init__() should do little more than assign attributes to parameter values, with the exception of handling `None` default parameters, and this exception only exists because Python's default parameters don't behave intuitively.


They're not intuitive, but they are consistent. It would be very surprising if `a=[]` behaved differently in a function definition (which isn't just syntax, but an executable Python statement) than elsewhere.


I disagree. ‘a=[]’ is inconsistent in function definitions today because the rhs is evaluated at interpreter load time but the assignment is evaluated at call time. The intuitive thing is would be to evaluate the whole expression at once (at call time) just like everywhere else.
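
For reference, the behavior being debated is the standard mutable-default gotcha (the function here is made up):

  def append_to(item, target=[]):   # the [] is evaluated once, when `def` executes
      target.append(item)
      return target

  print(append_to(1))   # [1]
  print(append_to(2))   # [1, 2] -- the same list is reused across calls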


I don't like `__post_init__` either. Wouldn't it make more sense to just override `__init__` similar to:

    @dataclass
    class DataObj:

        some_date: datetime

        def __init__(self, **kw):
            kw['some_date'] = dateutil.parser.parse(kw['some_date'])
            super().__init__(**kw)


To allow you to override init that way, @dataclass would have to create an additional superclass with the generated init method, rather than generating the init method in the class intended for use, which seems to have more overhead.


You could attach the __init__ method to dataclass (as if it were a class) and call it like this, couldn't you?

  dataclass.__init__(self, **kw)
Like in the olden days before super().


From the description, the dataclass decorator generates an __init__ method for the decorated class, which seems different than using a common generic __init__ method for all classes with the decorator.


Yeah, good point. I didn't think that through very much.


If you want to do it that way, I imagine you could use regular subclassing

    @dataclass
    class BaseDataObj:
        some_date: datetime


    class DataObj(BaseDataObj):
        def __init__(self, **kw):
            kw['some_date'] = dateutil.parser.parse(kw['some_date'])
            super().__init__(**kw)


Would be convenient if it also supported generating a to-dict method, with dunder hooks for customizing the translation ...


Looking forward to having ORMs support this way of defining models


I just hope the `__post_init__` method catches on and becomes a regular Python dunder method. I actually find it to be a very good quality in this new implementation.

Example from the PEP 557:

  @dataclass
  class C:
       a: float
       b: float
       c: float = field(init=False)

       def __post_init__(self):
           self.c = self.a + self.b

https://www.python.org/dev/peps/pep-0557/#post-init-processi...

I could use this all over the place.


Where would it be useful outside of an automagically generated __init__ method?


It will be nice to have this, but for data deserialization, which is the case in the example, I would still use DRF serializers or marshmallow schemas.


Re: serialization, it's worth mentioning the cattrs project (it works with attrs classes, but would work the same way with the stdlib dataclass):

https://github.com/Tinche/cattrs


Is Python becoming more statically typed then?


Dynamic and static languages are fundamentally different. There is no more or less.

I've tried to elaborate here why annotating variables with type doesn't make a dynamic language static:

https://medium.com/@Jernfrost/dynamically-typed-languages-ar...

You can't look at the presence of type information to determine if a language is static or dynamic. What matters is when and how that type information is used. In static languages, expressions have types. In dynamic languages, values have types.

The implication is that you can't know the type of something in a dynamic language until the expression has been evaluated at run time. With a static language we can determine the type of every expression at compile time, which requires full knowledge of the whole program. It is why dependencies become much more problematic in static languages and why they are so poor as glue languages.


> In static languages expression have type. In dynamic languages values have type.

I don't think this is true. In a dynamic language all values, including expressions, share a single type - a union of all possible types. In a static language you can limit which possible types a value may hold. A dynamic language is like if you cast every single type to Any in a static language.

In Python this is entirely possible with mypy - a static type system for the Python language, which works through annotations.

> in static languages they are used to prevent the compilation of programs containing expressions where types don’t match up.

This is already the case with mypy in Python. It seems very much static. It's like everything is statically determined to be Any, then you specify in certain areas a more limited type.

In your case with Julia, the type annotations are static. Even if by default Julia is dynamic, it allows for static annotations.

I disagree that there is no in-between. I think Julia is actually a great example of an in-between. And, in fact, it has a name - gradual typing.


> In a dynamic language all values, including expressions, share a single type - a union of all possible types. In a static language you can limit which possible types a value may hold. A dynamic language is like if you cast every single type to Any in a static language.

No, you're thinking of a weakly- versus strongly-typed language. Python is dynamically and strongly typed - values have a definite type, but a variable can be assigned a value of any type.


As RussianCow said, I think you're misreading my post. A Python variable, from a type perspective, is a variable with type Any, i.e. a type that is a union of all possible types.

Using annotations you can then specify the type.

Nothing to do with weak/strong, which merely implies some level of implicit casting.


I think you misread the OP: they are saying that Python is like a statically typed language in which every value has the type Any. That has nothing to do with strong/weak typing.


Julia is always dynamically typed and type annotations do not make it static, in contrast to how some "traditional" gradually typed research languages have worked. You can find a more detailed explanation here:

https://stackoverflow.com/a/28096079/659248

The notion that dynamic languages are "unityped" comes from looking at them through the lens of static typing, leading to a disconnect about what "type" means and a correspondingly meaningless answer. Since the static notion of type applies to expressions and the dynamic notion of type applies to values, when you ask what the type of an expression in a dynamic language is, you get a useless answer since dynamic languages don't—by their very nature—assign types to expressions. Yes, you can sometimes figure out what the type of an expression must be, but the ability to do so is incidental and not guaranteed by the language, nor an inherent part of its semantics.

The dynamic notion of a type belonging to a value, as distinct from an expression, is present to a limited extent in object-oriented static languages with subtyping: the static type of an expression and the actual runtime type of its value can be different. The static type corresponds to the static notion of what a type is, while the runtime type corresponds to the dynamic notion of what a type is.


I don't get where this whole "expressions vs values" thing is coming from. Why are you making this distinction? I don't get it. The expression '2 + 2' has a type in mypy - so why are we drawing this distinction here of all places?

> Yes, you can sometimes figure out what the type of an expression must be, but the ability to do so is incidental and not guaranteed by the language, nor an inherent part of its semantics.

It's not guaranteed with most languages - they build up from primitives, and almost all of them have some sort of soundness holes.

If I have every primitive type in Python correspond to a type in mypy, I don't get how that isn't a basis of a static type system.

It seems like you're talking about the difference between evaluation and a value itself. But mypy types are evaluated...

> The notion that dynamic languages are "unityped" comes from looking at them through the lens of static typing, leading to a disconnect about what "type" means and a correspondingly meaningless answer.

Pretty sure it's just the simple, type-theory way of defining it, and I don't see why we would define types in a way that isn't consistent with type theory.


> I don't get where this whole "expressions vs values" thing is coming from. Why are you making this distinction? I don't get it.

What's your definition of what makes a language static verus dynamic?

> If I have every primitive type in Python correspond to a type in mypy, I don't get how that isn't a basis of a static type system.

Being able to describe and annotate types doesn't make a static type system. A static type system is a way of associating with each valid program a proof that there will be no runtime type errors (or at least that certain entire classes of runtime errors will not occur). That's the entire premise of type theory.

The fact that you can assert types and determine the types of some expressions in mypy doesn't make it static (just like it doesn't make Julia static). You can prove that `2 + 2` is an integer in any language. Being able to do that does not make a language static or the term "static" is vacuous.

> Pretty sure it's just the simple, type-theory way of defining it, and I don't see why we would define types in a way that isn't consistent with type theory.

When a definition isn't useful, you don't just throw up your hands and say "oh well, guess we can't do anything about this"—you use a definition that is useful for the problem at hand. The fact that type theory's entire conclusion about dynamic languages is "they only have one type" is about as clear evidence as possible that, despite the name, "type theory" is not a useful tool for understanding types in dynamic languages. Yet systems like Julia and mypy are clear evidence that interesting things can be said about "types" even in systems that traditional type theory would call unityped.


> What's your definition of what makes a language static verus dynamic?

Types are checked before program execution.

I think this definition is very, very standard, and in keeping with a type theory view.

> Being able to describe and annotate types doesn't make a static type system. A static type system is a way of associating with each valid program a proof that there will be no runtime type errors (or at least that certain entire classes of runtime errors will not occur).

These two statements seem to contradict each other. Adding the type annotations is exactly what allows mypy to associate a proof with code.

I don't think the definition is useless... it seems entirely consistent with gradual typing.


> Types are checked before program execution. I think this definition is very, very standard, and in keeping with the type theory view.

This is a common informal understanding of the distinction, but when types are checked is not a property of a language and does not agree with the type theoretic definition of what a type system is. For example, you can defer type checking of Haskell programs until runtime [1]. Does Haskell suddenly become a dynamic language just because you decided to check types later even though your code is the same and the program behaves the same? No. What makes Haskell static is the fact that it comes with a set of rules that assign a proof of type-correctness to every valid Haskell program. When or even if you choose to check whether those rules are followed is not the deciding factor.

Similarly, the same Python 3 program can be run with or without running mypy on it first. The Python language is the same either way and a correct program will behave exactly the same since running mypy has no effect on program execution. Does whether Python is a dynamic language or not depend on whether I happen to have run mypy on it first? In this view, the adjectives "dynamic" and "static" do not describe the language and its semantics, they describe how one happens to use it.

> > Being able to describe and annotate types doesn't make a static type system. A static type system is a way of associating with each valid program a proof that there will be no runtime type errors (or at least that certain entire classes of runtime errors will not occur).

> These two statements seem to contradict each other. Adding the type annotations is exactly what allows mypy to associate a proof with code.

Type annotations are neither necessary nor sufficient to be able to type check a program. Type annotations are almost entirely unnecessary in Hindley-Milner languages (ML, Haskell), yet these are very much static languages—these are the languages of type theorists. Conversely, the mypy developers describe mypy as a "type linter" [2] for a reason: mypy is not what type theorists would consider to be a "type checker" precisely because you cannot associate a proof of the lack of type errors with every valid Python program. You can give a proof of the correctness of some programs, but that’s true in any language, so if that’s the criterion for being “static” then every language is static, so the term is meaningless.

> I don't think the definition is useless... it seems entirely consistent with gradual typing.

This HN post about gradual typing is relevant and worth reading: https://news.ycombinator.com/item?id=8595116. Mypy has optional typing, not gradual typing because the Python type system is not complete, even with type annotations. Perhaps you disagree with this perspective and consider Python 3's types to be "real types" and believe that mypy is a "real type checker". In that case, you are in direct disagreement with type theorists because they consider Python to be a unityped language and would take umbrage at calling mypy a "type checker" insisting instead that it is merely a "type linter". This is exactly why I feel that the type theoretic perspective should be broadened to consider systems like mypy to be "real" type systems, albeit dynamic ones, and that they should be studied and formalized rather than dismissed with unhelpful terms like "unityped".

[1] https://ghc.haskell.org/trac/ghc/wiki/DeferErrorsToRuntime

[2] http://mypy.readthedocs.io/en/latest/basics.html


So what this comes down to is that python is not statically typed, but mypy+ python is.

And given how mypy works, you can enforce types over only parts of a program.

That, to me, sounds like the python ecosystem is becoming more statically typed.


> With a static language we can determine the type of every expression at compile time, which requires full knowledge of the whole program. It is why dependencies becomes much more problematic in static languages and why they are so poor as glue languages.

But you have to have this information anyway, or at least most of it. Even in a dynamically typed language, you have to know what sort of arguments a function expects; otherwise, how can you write any code at all to operate on those values if you don't even know what they are? Static typing just forces you to be more explicit about encoding those constraints. Whether that's worth the tradeoff depends on a lot of factors, but its use as a "glue" language is certainly not one of them.

Edit: Moreover, the line between statically typed and dynamically typed languages isn't as well-defined as you claim. For instance, TypeScript is a statically typed language, but it successfully compiles regular JavaScript code because, by default, every expression is given the type `any`. This means you can start out with dynamic typing and add static types progressively. And at that point, how is that any different than Python with mypy[0]?

[0]: http://mypy-lang.org/


> Even in a dynamically typed language, you have to know what sort of arguments a function expects; otherwise, how can you write any code at all to operate on those values if you don't even know what they are?

Not necessarily...

If the value passed to the function implements whatever functionality is called, then its type doesn't really matter. That's really the whole theory behind duck typing.

  >>> i = 1 + "1"
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unsupported operand type(s) for +: 'int' and 'str'
while javascript (apparently) will assign "11" to i.


If you operate on a value, you are assuming that the value supports that operation, which means you have some information on the interface that the value supports (i.e. a type). If you had absolutely no information about the interface of a value, you could not possibly do anything with it.

There is actually an entire category of static type systems called "structural type systems"[0], which is basically duck typing checked at compile-time rather than runtime.

[0]: https://en.wikipedia.org/wiki/Structural_type_system


> Even in a dynamically typed language, you have to know what sort of arguments a function expects; otherwise, how can you write any code at all to operate on those values if you don't even know what they are?

True, but the Smalltalk idea was that you send a message to an object, and that object decides how (or whether) to handle it. So in Ruby, you send the + message to some kind of number object along with the objects you want added, and the receiving object decides how to perform that addition, assuming the objects you sent can be summed.


Sure, but even then, you still have to make assumptions about what messages an object accepts. In other words, you have to have some information about the interface of every object.


You seem to contradict yourself.

> With a static language we can determine the type of every expression at compile time, which requires full knowledge of the whole program.

> Dynamic and static languages are fundamentally different. There is no more or less.

For many Python programs, every expression can be determined statically. While this may not be true for all expressions in all programs, surely this means the OP's claim that "Python is becoming more statically typed" is true and your claim that "There is no more or less" is false, no?

Besides, there are languages like Go which have dynamic types (interface{}) and which are considered statically typed languages. You might say that the static type is interface{}, but then you could say that every type in an un-annotated Python program is also interface{} (or whatever the equivalent Python type would be).

It seems like your arguments hinge on semantic games.


>Besides, there are languages like Go which have dynamic types (interface{}) and which are considered statically typed languages.

And C# has a var type (dynamic) and Dart has gradual typing.


var is type inference, not dynamic typing. Unless something has changed since I last C#’d. The docs seem to say this still holds: https://docs.microsoft.com/en-us/dotnet/csharp/language-refe...


I meant this variable declaration:

C# Dynamic Type: C# 4.0 (.NET 4.5) introduced a new type that avoids compile-time type checking. You have learned about the implicitly typed variable `var` in the previous section, where the compiler assigns a specific type based on the value of the expression. ... A dynamic type can be defined using the dynamic keyword.


Oh neat. Not too sure where I’d use that. But nice to have around.


It's for COM/Reflection when the type can't be static because the library with the type isn't available to link against.


Python, like many dynamic languages, is gaining increasing support for optional static type declarations and AOT type checking and is also leveraging the same structure to make code more concise.


It certainly seems to be allowing the option to be. I welcome this reduction in boilerplate-ness.


Well, considering they are type hints that aren't actually checked, the answer is no.


That was my take on this too: why do the type hints need to be there for this to work? They don't seem to be involved at all.


> That was my take on this too, why do the type hints need to be there for this to work

Because (1) you need to declare the fields, and (2) it's good to do that in a way that will work with typecheckers even though they are external, and (3) there are actually two particular type declarations that are used by the implementation, even aside from external typecheckers.
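
Presumably the two declarations meant are typing.ClassVar and dataclasses.InitVar, which the generated __init__ treats specially; a rough sketch with an invented class:

  from dataclasses import dataclass, InitVar
  from typing import ClassVar

  @dataclass
  class Account:
      owner: str
      balance: float = 0.0
      currency: ClassVar[str] = 'USD'       # class-level; excluded from __init__
      opening_bonus: InitVar[float] = 0.0   # accepted by __init__ but not stored as a field

      def __post_init__(self, opening_bonus):
          self.balance += opening_bonus

  print(Account('alice', 10.0, 5.0))   # Account(owner='alice', balance=15.0)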


There are a few options to use them without declared types, but for some reason there was a noticeable reluctance to document them, perhaps due to types coming back into fashion.

For example you could set the type to object, typing.Any, ... (Ellipsis), or None, etc.


attrs does a pretty good job, while also supporting older Python versions: http://www.attrs.org/en/stable/


Finally! Attrs is great, but I'm glad to see this in the standard library.


Looks great, too bad my project will forever be in 2.7


Presumably (hopefully?) not in 2020, when it stops getting security fixes?

https://legacy.python.org/dev/peps/pep-0373/


It will probably be a fair bit longer than that before 2.7 stops being relevant. 2.7 is still the default system Python on a lot of Linux distros, which will be in vendor support for longer. It doesn't really matter if the fixes are coming from the PSF or someone else - nice thing about Foss.

RHEL/CentOS 6 is still reasonably widely used, and the system Python there is 2.6.


> 2.7 is still the default system Python on a lot of Linux distros

I will never understand why that matters. "ed" is the default system editor, but I'm only "{apt-get,yum} install {vim,emacs}" away from having something I actually want to use. That's the whole point of a distro. You don't have to use Python 2.ancient just because /usr/bin/paleolithic is written with it.


In this context it doesn't really matter that it's the "default", just that it's supported by the distro maintainers, and so receiving security updates regardless of what the PSF supports.


Not all environments have unfettered access to the internet to download whatever arbitrary packages the user decides they need that day. Sometimes you're stuck with whatever the distro shipped with.


Most distros have been shipping Python3 for years, just not set as the target for `/usr/bin/python`. You can run `/usr/bin/python3` currently, almost anywhere.

And "software that has been gradually going EOL for the last 5 years" is not "packages the user decides they need that day". I'd be very surprised if any distros that ship with Python2 do not ship with Python3 in 2020, and I'd even wager that most will default to python3 by then.


I'm not buying that excuse. It implies machines that are installed and deployed exactly once and then never updated in any way, even for security stuff. In that case, an old version of Python would be so far down the list of problems that it wouldn't even be brought up.


What if you have a gigantic Python 2 codebase?


You've only got 2 years, get started now! If you need to do it gradually (as would be wise for a gigantic codebase), start using `six` to make your codebase forwards-compatible. Then when all your tests pass on python3, cut over, and remove `six`.

(Or take a gamble on Google taking over support for Python2 when it officially goes EOL, I suppose).
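
A tiny sketch of the kind of six shim meant here, assuming code that iterates over dict items and uses ranges:

  import six

  d = {'a': 1, 'b': 2}
  for key, value in six.iteritems(d):   # dict.iteritems() on 2, dict.items() on 3
      print(key, value)

  for i in six.moves.range(3):          # xrange() on 2, range() on 3
      print(i)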


Redhat (or someone like them) might keep the lights on for Python 2 if a large enough customer leans on them.


There's approximately zero chance that RedHat will support backports of huge, many-contributor projects like NumPy or Django that are dropping Python 2 support (see: https://docs.scipy.org/doc/numpy/neps/dropping-python2.7-pro... and https://docs.djangoproject.com/en/2.0/releases/2.0/). They just can't - there aren't enough hours in the day.

Five years from now, you don't want to be explaining to your CEO why it's literally impossible for you to integrate with a third party SDK because your stack is so ancient and petrified that new code can't incorporate it.


That's a different but unrelated problem. In that situation, consider that official support is almost over for Python 2 and extended support from third party vendors is going to get nothing but more expensive with time. It will be harder to maintain those projects as the dependencies they use drop support for an increasingly obsolete release. It will become much harder to hire great engineers willing to work on legacy versions.

Basically, Python 2 is going to become a very serious technical debt very soon. It's past time to start paying that down.


You can use attrs then.


Why too bad? Sounds like you have a useful and realistic project and you don't plan to waste time on 2 -> 3 busy work.


Reminds me a lot of the attrs module.


I have used dictionaries for this for the longest time. The thing is I sort of rely on the type being flexible, so what can I do?


If you rely on your type being "flexible" I would argue that you have a design problem.


My working code beats your ideological purity.


Seconded...I swear to God "that's a symptom of bad design" and its ilk are the most hackneyed type of response to a situation that has the bonus of making you seem "above it all". It absolves you of even having to bother to look into the situation to see if maybe there is nuance that you can't just throw platitudes at.


This was really coming from practical experience, but no offense taken.


I asked for how this PEP could or could not be used for my use case, and your response was "your use case is invalid" without knowing why I do the things I do. No one should have to respond kindly to bullies.


Calm down, no one is bullying you.


You can continue using dictionaries


I'm piqued by one thing in this article. Using an object as a dictionary key. What's the use for that? I don't think it's ever occurred to me to do that.


Anything hashable can be a dict key, so if you have some mapping between one thing that’s hashable and other things, you can use a dict directly rather than having to come up with an index and keep track of that association.


Dynamic dispatching comes to mind.

  d = {Foo : do_something_with_foo,
       Bar : do_something_with_bar}

  d.get(type(x), default_function)(x)
There's probably some more efficient way, but honestly I've used this before instead of writing a whole if/elif chain.

--edit--

Makes python's lack of a switch operator sometimes less painful...


Oh, that makes sense. Not sure why this question rubbed people the wrong way. I was genuinely curious about when you would use an object as a key.

I guess my uses for dictionaries are pretty vanilla. But that’s a kind of thing I do all the time. I set up dictionaries where the key is a possible value from some operation or query, and the value is the function I want to perform on that value.

This extends that concept. Thanks!


I'm not sure what part of the article you refer to, but:

- If the objects are value types (as dataclasses are, since they respect equality): it makes sense to use them as dict keys, just as it makes sense to use tuples as dict keys. Changing a tuple to a namedtuple is natural.

- If the objects have reference semantics rather than value semantics, you can still use them as dict keys in Python. The default __hash__ is the id/address of the object (the default __eq__ uses id() too). I don't particularly like this style but I have seen it used. It feels error-prone to me and not "algebraic".
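
A minimal sketch of both cases, with invented class names:

  from dataclasses import dataclass

  @dataclass(frozen=True)   # frozen + eq makes instances hashable by field values
  class GridPos:
      row: int
      col: int

  terrain = {GridPos(0, 0): 'grass', GridPos(0, 1): 'water'}
  print(terrain[GridPos(0, 1)])   # 'water': an equal value finds the same entry

  class Node:   # plain class: hashing and equality fall back to identity
      pass

  n = Node()
  registry = {n: 'payload'}
  print(registry[n])   # works for this exact instance; a new Node() would be a KeyError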


Memoization is a common use case. If you are doing an expensive computation that gives the same answer for the same inputs, you can build a lookup table of cases you have already encountered to avoid recomputing. You can use an object to encapsulate the inputs (or the inputs might be objects already).


> Using an object as a dictionary key. What's the use for that?

I've used tuples as dict keys several times, so I don't see why I conceivably wouldn't use a Data Class containing the same data as a dict key.


Avoids the extra step of deriving a separate key from the object; you simply pass the object itself.



