Mypy - An experimental Python variant with dynamic and static typing (mypy-lang.org)
108 points by room606 on Sept 23, 2012 | 39 comments



why doesn't this use the python standard for adding type information? it should be:

   def foo(n: int):
not

   def foo(int n):
then it would be valid python 3 code (and would also work with pytyp (disclaimer: mine)).

http://docs.python.org/py3k/reference/compound_stmts.html#fu...
http://www.python.org/dev/peps/pep-3107/
http://www.acooke.org/pytyp/
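For comparison, the PEP 3107 form suggested above is already valid Python 3. A minimal sketch (`foo` is just the placeholder name from the comment): the interpreter stores the annotations on the function object, but does not enforce them by itself.

```python
def foo(n: int) -> int:
    """Double n; the annotations are metadata only."""
    return n * 2


# Annotations are collected in a plain dict on the function object.
print(foo.__annotations__)   # {'n': <class 'int'>, 'return': <class 'int'>}

# Nothing checks them at call time: a str is accepted and "doubled".
print(foo("ab"))             # abab
```

Enforcement is left to libraries such as pytyp, which is why the same annotations can serve several different tools.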


Thanks for mentioning pytyp!

I have been using this recipe: http://code.activestate.com/recipes/572161/ to add type checking to Python for a while now, so it's nice to see a more mature and developed library that does the same (and more.)

In your code examples I didn't see any decorators around the functions that were being typechecked, so how do you then enforce the type checking?

In addition, is pytyp as fully-featured as the recipe above? (The recipe is short but well-written.) The comments at the beginning of the recipe show a bunch of examples that demonstrate all that it can do.


huh. i will fix that, thanks. there should be a @checked annotation on the function that is checked in the example there.
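To illustrate how a decorator like this can enforce PEP 3107 annotations, here is a minimal sketch. The `checked` below is hypothetical and is not pytyp's implementation: it only handles plain classes such as `int` or `str`, skips unannotated parameters, `*args` and `**kwargs`, and does not check return values.

```python
import functools
import inspect


def checked(func):
    """Hypothetical @checked: enforce plain-class annotations at call time.

    Unannotated parameters, *args and **kwargs are left unchecked.
    """
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            if param.annotation is param.empty or param.kind in (
                param.VAR_POSITIONAL,
                param.VAR_KEYWORD,
            ):
                continue
            if not isinstance(value, param.annotation):
                raise TypeError(
                    f"{name}={value!r} is not an instance of "
                    f"{param.annotation.__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@checked
def repeat(s: str, n: int) -> str:
    return s * n


print(repeat("ab", 3))        # ababab
try:
    repeat("ab", "3")
except TypeError as exc:
    print(exc)                # n='3' is not an instance of int
```

Because the check runs on every call, this style is better suited to debugging and tests than to hot paths, which matches the caveat about speed below.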

pytyp is pretty comprehensive. for example, Rec(a=Seq(Opt(int)),b=Alt(float,str)) is the type for something like a dict where a is a list that contains ints or None and b is either a float or a string.
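A rough sketch of how such composable type expressions can be checked structurally. These `Rec`, `Seq`, `Opt` and `Alt` classes are hypothetical stand-ins that only borrow the names above; pytyp's real classes work differently (they integrate with ABCs).

```python
class Opt:
    """Value is either None or matches the wrapped type."""
    def __init__(self, t):
        self.t = t

    def check(self, value):
        return value is None or matches(value, self.t)


class Alt:
    """Value matches at least one of the alternatives."""
    def __init__(self, *alts):
        self.alts = alts

    def check(self, value):
        return any(matches(value, t) for t in self.alts)


class Seq:
    """Value is a list whose items all match the element type."""
    def __init__(self, t):
        self.t = t

    def check(self, value):
        return isinstance(value, list) and all(matches(v, self.t) for v in value)


class Rec:
    """Value is a dict with exactly these keys, each matching its type."""
    def __init__(self, **fields):
        self.fields = fields

    def check(self, value):
        return (isinstance(value, dict)
                and value.keys() == self.fields.keys()
                and all(matches(value[k], t) for k, t in self.fields.items()))


def matches(value, t):
    """Plain classes use isinstance; combinators supply their own check()."""
    return t.check(value) if hasattr(t, "check") else isinstance(value, t)


spec = Rec(a=Seq(Opt(int)), b=Alt(float, str))
print(matches({"a": [1, None, 3], "b": "x"}, spec))   # True
print(matches({"a": [1, "2"], "b": 2.5}, spec))       # False: "2" is not int or None
```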

BUT it's pure python so it's not at all fast. i wouldn't use type checking throughout a codebase - only for debugging, or tests.

it's perhaps more useful as a start for building other things. for example, it includes mapping from JSON to python objects - it can use type annotations to guide the construction of python objects from JSON lists and maps. but on the other hand the internals are quite complex (i have been working with it recently, after not using it for a year or so, and it's taken some effort to understand everything).
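The JSON mapping idea can be sketched in a few lines: when a decoded JSON object is passed for a constructor parameter annotated with a class, build that class recursively. `from_json`, `Point` and `Line` are hypothetical names for illustration only, not pytyp's API, and lists, optional fields and validation are deliberately left out.

```python
import inspect
import json


class Point:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y


class Line:
    def __init__(self, start: Point, end: Point):
        self.start, self.end = start, end


def from_json(data, cls):
    """Hypothetical helper: build cls from a decoded JSON object.

    Each JSON key is matched to an __init__ parameter; when the value is
    itself an object, the parameter's annotation says which class to build.
    """
    kwargs = {}
    for name, param in inspect.signature(cls).parameters.items():
        value = data[name]
        if isinstance(value, dict) and param.annotation is not param.empty:
            value = from_json(value, param.annotation)
        kwargs[name] = value
    return cls(**kwargs)


doc = '{"start": {"x": 0, "y": 0}, "end": {"x": 3, "y": 4}}'
line = from_json(json.loads(doc), Line)
print(type(line.end).__name__, line.end.x, line.end.y)   # Point 3 4
```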


Another similar project: https://github.com/podio/valideer. The (rough) equivalent schema for the example above can be expressed as {"a": ["?integer"], "b": AnyOf("number", "string")}.


also, i should perhaps mention, compared to some other libraries, pytyp is integrated with ABCs and the Python type system. so you can use isinstance() with these expressions, for example.
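The hook that makes `isinstance()` work with type expressions is a metaclass `__instancecheck__`; `abc.ABCMeta` uses the same mechanism. A minimal sketch with a hypothetical `SeqOf` factory (not pytyp's API), using a bare metaclass for brevity:

```python
class _SeqMeta(type):
    """Metaclass whose __instancecheck__ makes isinstance() test contents."""
    def __instancecheck__(cls, obj):
        return isinstance(obj, list) and all(isinstance(x, cls.item_type) for x in obj)


def SeqOf(item_type):
    """Hypothetical factory: a class usable as isinstance(obj, SeqOf(int))."""
    return _SeqMeta(f"SeqOf_{item_type.__name__}", (), {"item_type": item_type})


print(isinstance([1, 2, 3], SeqOf(int)))          # True
print(isinstance([1, "two"], SeqOf(int)))         # False
print(isinstance([[1], [2]], SeqOf(SeqOf(int))))  # True: expressions nest
```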


Mypy needs some extra syntax not provided by standard Python annotations (e.g. local variable annotations, casts), so it makes sense to have the syntax related to static typing clearly different from Python to avoid confusion. C-style annotation syntax also has the benefit of being familiar to a very large programmer population. There will also be a converter from mypy syntax to Python syntax.
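To make the syntax difference concrete, here is a toy sketch of what such a converter does for simple function headers, rewriting C-style `def foo(int n):` into PEP 3107 `def foo(n: int):`. `to_python_syntax` is a hypothetical illustration, not mypy's converter (which has not been published); it only handles one-line headers with plain `Type name` parameters and leaves anything else untouched.

```python
import re

# A one-line def header: "def name", the parameter list, a trailing colon.
_DEF = re.compile(r"^(\s*def\s+\w+\s*)\((.*)\)\s*:\s*$")
# A C-style parameter: "Type name".
_PARAM = re.compile(r"^(\w+)\s+(\w+)$")


def to_python_syntax(line):
    """Rewrite 'def f(int n):' as 'def f(n: int):' (toy converter)."""
    m = _DEF.match(line)
    if not m:
        return line
    head, params = m.groups()
    converted = []
    for p in params.split(","):
        p = p.strip()
        pm = _PARAM.match(p)
        # Swap "Type name" into "name: Type"; leave other forms as they are.
        converted.append(f"{pm.group(2)}: {pm.group(1)}" if pm else p)
    return f"{head}({', '.join(converted)}):"


print(to_python_syntax("def foo(int n):"))            # def foo(n: int):
print(to_python_syntax("def bar(int n, str s, x):"))  # def bar(n: int, s: str, x):
```

Local variable declarations and casts, which the comment mentions, have no PEP 3107 counterpart, which is why a real converter would need more than this.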


just seen this reply; thanks. i encourage you to think again, though. one reason is that you would be more likely to influence the future evolution of python itself if you worked more within the system...


  > Mypy will get rid of the Global Interpreter Lock (GIL)
  >Details of the concurrency model are still undecided, though
Wait, what? There have been numerous attempts to get rid of the GIL, most of which failed miserably. A promise to get rid of the GIL without even deciding on a concurrency model seems a little premature.


This is not true. There have been plenty of successful attempts to remove the GIL, including from CPython. It's not impossibly hard to do so by a long shot.

The GIL is an optimization though, and so the performance hit from removing it on single threaded code is prohibitive. You can watch any of David Beazley's talks on the GIL for more info.


I stand corrected. Still, saying they are going to remove the GIL without addressing the challenges faced by other people who did the same - most notably the performance hit - makes me skeptical as to their ability to fulfill their promises.


Prohibitive is too strong, I think. The patch to 1.4 discussed in a blog post by Beazley came with a 2x slowdown, but that was an initial patch without much work on optimization. Besides that, I think a 2x slowdown isn't that bad, considering the language is not used for its raw efficiency anyway.


Well, the only reason to remove it is to run multiple Python threads in parallel for efficiency...so efficiency is definitely the top concern here.


No, the reason to remove it is to get parallelism from threads. Once you have parallelism, you can overcome a 2x slowdown for a single thread by using more cores.


If 99% of your codebase uses one thread, then no, a 2x slowdown is not acceptable, even if you can overcome it by using more cores.

Remember, the idea was not to have CPython and CPython-without-GIL; it was to have one unified GIL-less CPython.

For that, a 2x performance drop would not be acceptable for most code.
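The disagreement in this subthread is mostly arithmetic, so here is a back-of-the-envelope model. It assumes Amdahl's law, a uniform 2x per-operation slowdown without the GIL (the figure quoted for the initial patch), and that the GIL build effectively uses one core; it ignores lock contention and memory effects.

```python
def relative_speed(parallel_fraction, cores, slowdown=2.0):
    """Speed of a GIL-free interpreter relative to the GIL build (1.0 = same).

    Assumptions: the GIL build effectively uses one core; without the GIL
    every operation is `slowdown` times slower, and the parallel fraction
    scales perfectly across `cores` (Amdahl's law).
    """
    serial_fraction = 1.0 - parallel_fraction
    time_without_gil = slowdown * (serial_fraction + parallel_fraction / cores)
    return 1.0 / time_without_gil


print(round(relative_speed(0.01, 8), 3))   # 0.504: mostly single-threaded code
print(relative_speed(1.0, 2))              # 1.0: break-even on two cores
print(relative_speed(1.0, 4))              # 2.0: perfectly parallel, four cores
```

Under these assumptions both sides are right: fully parallel code breaks even at two cores and wins beyond that, while code that is 99% single-threaded runs at roughly half speed no matter how many cores are available.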


But 2x was just with the initial, non-optimized patch. A language like Java seems to perform quite well even with fine-grained locking, so it's possible.

And I don't see why performance is suddenly used as a deal breaker for a language primarily used for scripting and other non-CPU-bound purposes.

I speculate that the performance issue wasn't really the most important reason for rejecting the patch, but more so not wanting to deal with the complexity of maintaining the patch. It's a shame because the future is definitely going to have lots of cores, and not all problems are well-suited to multiprocessing.


The difference is that Java's memory model was designed from the ground up to work with multiple threads. Python's semantics are that only 1 thread is running at any point in time. Keeping that illusion while running multiple threads in parallel is what causes the huge overhead. The alternative is to do away with CPython semantics, but that would break a ton of existing code.


>And I don't see why performance is suddenly used as a deal breaker for a language primarily used for scripting and other non-CPU-bound purposes.

People want to use Python for lots of CPU bound processes. A big Python niche is numeric and scientific computing.

You say that it is "primarily used for scripting and other non-CPU-bound purposes", but this is also a consequence of it being slow. If it were faster, it would open up MORE uses. Lua and C# got the games action. Go gets attention on the networking apps end, etc...


The mypy FAQ now answers this question:

We don't believe that it is practical to adapt the CPython VM incrementally for running parallel workloads efficiently. Instead, we are going to get rid of the GIL by having a new VM for running mypy code. The VM will be based on the Alore VM, but it will use compiled native code from the start (the Alore VM uses a bytecode interpreter). It will use a garbage collector instead of reference counting (CPython primarily uses reference counting), and it will also support a modified C Python extension API that will support proper multithreading without a GIL.

The CPython VM will still be used to run Python code and modules, and it will still be limited to running a single thread at a time (most of the time). Objects passed between the VMs will have to be copied or accessed using proxy objects, so there will be some performance overhead compared to native mypy modules. We plan to port commonly used, performance-sensitive Python modules to mypy and the modified C API to minimize performance bottlenecks.


From the FAQ:

The initial compiler will compile into C or use LLVM for the back end

This is... a rather fundamental point. More critically, there is no link that I could see to any code.

It is an interesting idea, but it seems a little too under-cooked to merit much discussion right now. However, one way that this could get really good traction is if it were to support Cython code (which doesn't seem to be the case, from the FAQ)


From their homepage:

Access to Python libs

Mypy will support accessing Python modules from mypy programs, running in the stock CPython virtual machine for the best compatibility. You can also access your existing Python code.


Cython, not CPython. I took it as a given that it will run "pure" CPython-compatible code. I meant that the ability to JIT down Cython code (without the C translation step) would be a strong hook to get some people interested. There have been steps toward this in PyPy, and I expect it will happen in Numba sooner or later.

This would actually be even more compelling than I initially thought, since you pointed out that mypy runs in stock CPython (I overlooked that and assumed it would have a separate interpreter). All hypothetical, of course -- where's the GitHub link?


As much as I'd like to believe that there is a Python interpreter without a GIL, type inference, and LLVM backend that can support the full standard library I'm doubtful until I see evidence.


How will mypy give satisfactory array and integer performance? In the case of arrays, unless they are homogeneous, you have to implement them as arrays of objects. And in the case of integers, they won't have the Python semantics unless they are the same number of bits as Python integers.
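The integer concern can be shown concretely. Python ints grow without bound, while a fixed-width machine integer wraps (modelled here by the hypothetical helper `wrap_int64`), and a homogeneous array of machine integers from the standard `array` module refuses values that do not fit.

```python
from array import array


def wrap_int64(x):
    """What a fixed-width 64-bit signed integer would hold after overflow."""
    return (x + 2**63) % 2**64 - 2**63


big = 2**63 - 1                 # largest signed 64-bit value
print(big + 1)                  # 9223372036854775808 (Python int just grows)
print(wrap_int64(big + 1))      # -9223372036854775808 (machine int wraps)

# A homogeneous array of machine integers rejects values that do not fit.
packed = array("q", [1, 2, 3])  # "q": signed long long, 64-bit on common platforms
try:
    packed.append(2**63)
except OverflowError as exc:
    print("OverflowError:", exc)
```

A compiler that wants machine-speed integers has to either prove values stay in range or insert overflow checks that fall back to arbitrary precision.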

Also, what about things like metaclasses, changing methods of classes at runtime, etc? It seems like it won't be much like Python at all, and certainly not compatible, if it is to be efficient!
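A small example of the runtime flexibility in question: a method can be replaced on the class, or shadowed on a single instance, after objects already exist. A compiler that turned `g.greet()` into a direct call would produce the wrong answer after either change, so it must guard such calls or give up on optimizing them.

```python
class Greeter:
    def greet(self):
        return "hello"


g = Greeter()
print(g.greet())                         # hello

# Replace the method on the class while instances already exist.
Greeter.greet = lambda self: "patched"
print(g.greet())                         # patched

# Or give one instance its own override, shadowing the class attribute.
g.greet = lambda: "only this instance"
print(g.greet())                         # only this instance
print(Greeter().greet())                 # patched
```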

It seems to me that Python is fundamentally dynamic in nature, not just programmatically, but also in terms of its type system.

Cobra was an attempt at a Python-like language with static compilation and type annotation. But so far, I don't see a mass exodus from Python to Cobra.

If mypy isn't going to be fully Python compatible, why wouldn't people who were unhappy with Python just switch to another statically typed/compiled language that already exists?


I've been thinking about this for a while: the benefit of static typing doesn't really come from being compile-time, it comes from making sure the causes of a certain category of errors must be local to the function where the error's seen. I suspect that a run-time static typing framework, which simply did the equivalent of asserting the type of any variable whenever it was assigned, would provide almost all of the safety/development advantages of compile-time typing, and be much easier to implement for python.

Has anything like this been attempted?
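One way to approximate "assert the type whenever it is assigned" in plain Python is a data descriptor. This minimal sketch (the `Typed` and `Account` names are hypothetical) only covers attributes declared on a class; Python offers no comparable hook for ordinary local variables.

```python
class Typed:
    """Hypothetical descriptor: checks the type on every assignment."""

    def __init__(self, expected):
        self.expected = expected

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if not isinstance(value, self.expected):
            raise TypeError(f"{self.name} must be {self.expected.__name__}, "
                            f"got {type(value).__name__}")
        obj.__dict__[self.name] = value


class Account:
    balance = Typed(int)

    def __init__(self, balance):
        self.balance = balance   # checked here...


acct = Account(10)
acct.balance += 5                # ...and here
try:
    acct.balance = "15"          # the error points at the bad assignment itself
except TypeError as exc:
    print(exc)                   # balance must be int, got str
```

The useful property is the one described above: the failure is reported where the wrong value enters, not later where it is used.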


A subtle advantage of static typing is documentation: you can see what kinds of arguments are required. This can also be used in auto-generated documentation, creating hyper-links to the type and implementations. (I noticed this in python docs vs. javadocs.)
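Python 3 annotations already give documentation tools something to work with. A short example: the standard `inspect.signature` renders the declared types, which a doc generator could turn into links.

```python
import inspect


def scale(values: list, factor: float = 1.0) -> list:
    """Multiply every value by factor."""
    return [v * factor for v in values]


# A documentation generator can read the same signature a checker sees.
print(f"scale{inspect.signature(scale)}")
# scale(values: list, factor: float = 1.0) -> list
```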

There's an old question: which benefits of static typing are needed - and are they worth it? (to the extent that static typed languages are successful). Performance is obviously important. But if dynamically typed languages were fast enough for most needs, would the bulk of users switch to them? (note: it doesn't matter if static typing is faster still, just that dynamic typing is fast enough.)

Would it follow the long-term trend, of performance being traded for developer productivity? The real question behind this being whether dynamic typing actually increases developer productivity (and if so, for which tasks). Obviously, dynamic typing requires fewer keystrokes, less up-front design, and is more flexible (when iterating lean/agile). But static typing prevents some bugs, aids documentation, and helps manage projects as they increase in size.

However, this may be moot. We're no longer getting the long-term increases in single-core performance, so this trade-off may no longer apply. Note, too, that we do have some excess performance: in some cases it is absorbed by dynamic languages (Python/Ruby on the server side), and in others it is spent on building smaller, lower-performance computers that use less power (smartphones/tablets). iPhones use Objective-C; Android phones use a Java variant.

Maybe we've reached a point of equilibrium in the trade-off, if we include the end-user's demands?

BTW: I'm personally at home with static typing, but I can't remember the last time it prevented a bug for me (other people using it in other ways might have different experiences).

PS: to give another answer to your question: Flash's ActionScript 3.0. It's JavaScript plus optional static typing. Coincidentally, Flash is dying/dead.


Dylan, Common Lisp, Dart, many others.


An interesting idea. I do like Python's syntax - it's how I indent my C code so it's quite natural. Having to hit backspace is a bit annoying though when you're used to the IDE auto inserting a } you can down-arrow over.

Personally I'd prefer explicit dynamic types and implicit type inference, but that's a personal preference. From working with C# and F# in my day job, F# is much more graceful without the 'var' everywhere that you get with C#'s inference.


Reminds me a bit of Julia[1], which allows optional static typing.

[1] http://julialang.org/


Perl6 also has optional typing:

  sub fib (Int $n) returns Void {
    my Int ($a, $b) = 0, 1;
    while $a < $n {
      say $a;
      ($a, $b) = ($b, $a + $b);
    }
  }
NB. At the moment, Void doesn't appear to be implemented in the current version of Rakudo. Removing "returns Void" will make it work!
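For comparison, a Python 3 rendering of the same loop with optional annotations. Unlike the Perl 6 version, this sketch returns the numbers as a list rather than printing inside the function, so that the result is easy to inspect; the annotations are optional and unchecked, as with PEP 3107.

```python
from typing import List


def fib(n: int) -> List[int]:
    """Fibonacci numbers below n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result


for x in fib(20):
    print(x)    # 0 1 1 2 3 5 8 13, one per line
```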


It looks like they could've chosen to extend one of the established projects (e.g. Cython, PyPy, CPython), maybe making some compromises. Instead they chose to start from scratch, which means that the chances of this actually becoming usable are... rather slim. What a pity. The Python community could really use something like this.


What about "hints"?

  def foo( often int n ):
This one would generate both the dynamic & the static versions.

Once all call sites stop using the dynamic version, one can move on to strict typing (and remove the "often" qualifier), if such is the goal.

Think of it like the old "register" hint in C code (now obsolete).
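A rough runtime sketch of the "often" idea: dispatch to a specialized implementation when the arguments have the hinted types, fall back to the generic one otherwise, and count the fallbacks so you can tell when no call site still needs the dynamic version. The `often` decorator here is hypothetical, and the "specialized" function is ordinary Python standing in for a statically compiled version.

```python
import functools


def often(fast_types, fast_impl):
    """Hypothetical 'often' hint: specialized path plus dynamic fallback.

    dispatch.dynamic_calls counts calls that still needed the generic path.
    """
    def decorate(generic):
        @functools.wraps(generic)
        def dispatch(*args):
            if len(args) == len(fast_types) and all(
                type(a) is t for a, t in zip(args, fast_types)
            ):
                return fast_impl(*args)
            dispatch.dynamic_calls += 1
            return generic(*args)

        dispatch.dynamic_calls = 0
        return dispatch

    return decorate


def _square_int(n):
    return n * n          # stands in for a statically compiled int-only version


@often((int,), _square_int)
def square(n):
    return n * n          # generic, duck-typed version


print(square(3), square(2.5))    # 9 6.25
print(square.dynamic_calls)      # 1: only the float call took the dynamic path
```

When the counter stays at zero across a test suite, the hint could be tightened into a strict type, as the comment suggests.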


> dynamic (or "duck") typing

Nit-pick: This post seems to say that dynamic typing and duck typing are the same thing. They're not. Duck typing means looking up methods by name at runtime, not at compile time by interface. Dynamic typing is a much bigger concept.
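A small Python illustration of the distinction drawn above: duck typing is about how a call is resolved (by method name, with no shared interface), while dynamic typing is the broader property that types belong to values rather than variables.

```python
class Duck:
    def quack(self):
        return "Quack"


class Robot:                       # unrelated class, no shared base or interface
    def quack(self):
        return "beep-quack"


def make_it_quack(thing):
    # Duck typing: the method is found by name at runtime;
    # no interface or base class is consulted.
    return thing.quack()


print(make_it_quack(Duck()), make_it_quack(Robot()))   # Quack beep-quack

# Dynamic typing, the broader idea: the same name can be rebound
# to values of different types.
x = 42
x = "forty-two"
print(type(x).__name__)                                # str
```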


This looks like an honestly interesting project, but the motivation wording has a few other nitpicks that ring a sort of alarm in my head.

> Performance > Static typing can give you high, scalable and predictable efficiency, without the slow warm-up seen in many JIT compilers. These are important for interactive applications and games, for example.

Static vs. dynamic typing has little to do with performance, at least as presented here: the most widespread JIT out there is for a statically typed language (Java), while both Python and Ruby can be compiled ahead of time (both to bytecode and to native code). Type information could feed static analysis, which may bring some performance benefits, but given that static typing is optional, such analysis (and the resulting benefits) would only go so far.

> Compile-time and runtime type checking > Static typing makes it easier to find bugs and with less debugging (and with less staring at long stack traces)

Well, for the breadth and depth of Java and C# stack traces I have enjoyed, they can be equally confusing, just as dynamic typing stack traces can be equally informative. More often than not, a confounding stack trace is a code smell of some architectural issue.

> Grow your programs from dynamic to static typing > You can develop programs with dynamic types and only add static typing after your code has matured. This way you do not have maintain type declarations in initial development when the code is still changing rapidly.

This sounds interesting, but it also sounds like in reality, few areas would end up being typed and probably not enough for static typing to be meaningful. Humans are lazy, and programmers even more (that's why we tell machines to do stuff for us after all). I'm thinking a bit like for test suites, written after the fact or too late in the game. I guess that, just like tests, responsible developers will end up doing the right thing.

Maybe it's just the way it's worded and the project fundamentals are solid, but this sounds too much like a bullet list of "why dynamic typing is bad" (not that they're a silver bullet either) from someone who does not grok dynamic typing (and from the about page that does not seem to be the case, which only adds to my unease).

Anyway I'll keep mypy in my peripheral radar.


Mypy itself may be vaporware at the moment, but it's the successor to an existing language which has been in development for several years- http://www.alorelang.org/


What are the benefits of this over Cython?


From the FAQ:

How is mypy different from Cython?

Cython is a variant of Python that supports compilation to efficient C modules. Mypy differs in the following aspects, among others:

Mypy will have a powerful type system that can detect many type errors while supporting a very Python-like programming model. Cython has simpler types that primarily serve to speed up code.

Mypy will be able to speed up most programs, even programs that heavily use object-oriented features. Cython is primarily focused on speeding up numerical code and tight loops.

Mypy will have new virtual machine that allows speeding up all parts of the VM, including standard libraries and the garbage collector. Cython uses the normal Python VM.

Cython supports accessing C functions directly and many features are defined in terms of translating them to C. Mypy is not bound to any particular target language and can support both C and Java backends, for example. However, accessing C library functionality in mypy will not be as easy as in Cython.


It's a new programming language rather than a tool to generate C code that ties in with the CPython API.


Boo, Shed-skin


For people who don't know what you're talking about:

Boo - http://boo.codehaus.org/

Shed-skin - http://code.google.com/p/shedskin/

Both are projects that take a crack at adding or creating a static type system in concert with Python like syntax. The above take very different approaches.



