Hacker News new | comments | show | ask | jobs | submit login
Why Python is Slow: Looking Under the Hood (jakevdp.github.io)
208 points by underyx 1145 days ago | hide | past | web | 176 comments | favorite

    Dynamic typing makes Python easier to use than C
I disagree with the implications. The main reasons Python is easier to use is independent of type system. Not having to manually manage memory, for example! Overall it is a higher-level language, whereas C is designed for maximum performance.

Dynamic typing is probably preferable to a 40 year old type system. But Python could be easier to use (catch bugs ahead of time!) and execute faster by adding a modern type system. Optional typing (like TypeScript or Facebook's new PHP implemntation) would probably be appropriate.

>> Dynamic typing makes Python easier to use than C

> Overall it is a higher-level language, whereas C is designed for maximum performance.

I agree with most of your points, but on this I think you are comparing apples to oranges. C is not designed for maximum performance, but as a system language and as such only as slim abstraction over assembler. As a side effect it allows for maximum performance. A more fair comparison (pretty much every widely used language is easier to use than C) would be C++ vs. Python, where in C++ most - if not all - of the higher level abstractions were explicitly designed not to worsen performance compared to low level code. And in my opinion Python is so much easier to use than C++ because it values simplicity and consistency more than efficiency.

For example the general loop constructions in Python which are possible because of nice generators and the wonderfully straightforward `yield` construction are unacceptable in C++ as they lack the flexibility to be as efficient as possible in every case.

The advantage of dynamic typing only becomes apparent in a third argumentative step; when you try build a language with consistent and efficient abstraction and you end up with something like Haskell whose type system complexity is huge entry barrier. That's why I think dynamic typing is good for Python.

I find the Haskell type system much easier than the Python type system.

I'd argue that Haskell actually has a pedagogy problem, because many of its proponents are academics or mathematicians, who are comfortable with that kind of language. Talking about monads, functors, and arrows is a very natural activity for them, and clarifies rather than complicates the subject. However, you don't really have to understand any of that to get started in Haskell, and much of the blogging community has (I think wrongly) focused on explaining advanced concepts rather than making easy introductions.

You don't have to deal with monads any more than you have to deal with Python's OO behavior - which is to say, for quite a while.

I'd also argue that once you take in to account the sometime strange OO implementation of Python, and that Haskell has type inference for most things, Haskell is easier to understand the high level concepts in. Typeclasses and higher order functions (functions which act on functions) provide a very neat way to substitute in different types to your algorithm without having to change the whole thing, and don't even delve in to the complicated world of functors and monads.

> much of the blogging community has (I think wrongly) focused on explaining advanced concepts rather than making easy introductions.

I would like to make two notes about this:

- Most of the Haskell tutorial examples are about computing pi, fibs, and other math functions or series. Which is fine, but rather obvious beyond the recursion vs iteration bit. It's not just that newcomers arrive to Haskell with an imperative mindset, it's that they (we?) come with imperative (stateful) algorithms, which of course turn out to be a nightmare to implement.

- I found those explanations about Monads, etc necessary because as soon as I wanted to do anything less obvious than calculating a function or series, it was pretty hard to figure out how to get it done. For example, adding print logs to my code to debug, or building the code to output the formatted result of my program. "Why can't I use map here?" "What is is this mapM thing?" are very real roadblocks.

Those problems are not unique to Haskell, of course, but Haskell's purity makes it much more inflexible - you can't do some part the Haskell way to learn the ropes while you do the rest imperatively to get actual, satisfying results.

You can maybe put off the theory of monads, but their machinery hits you fast and early. For example, the meet-and-greet from Learn You A Haskell:

    main = do  
        putStrLn "Hello, what's your name?"  
        name <- getLine  
        putStrLn ("Hey " ++ name ++ ", you rock!")
The monad effectively introduced a separate syntax for declaring variables, which you must use in preference to let/where, and the compiler will emit a gibberish error message if you mix them up. So you must acquire at least a working understanding of monadic syntax before you can write a trivial program.

> I'd argue that Haskell actually has a pedagogy problem, because many of its proponents are academics or mathematicians, who are comfortable with that kind of language. Talking about monads, functors, and arrows is a very natural activity for them, and clarifies rather than complicates the subject.

Btw. I am a mathematician and the only time I ever had contact with category theory was in a seminar presentation. The great majority of students at our university would be as new to functors as most of you.

But on the other hand the reason I am fascinated by Haskell are all these advanced methodologies that promise to make the act of solving a problem more manageable. To contrast that against the Python/C comparison I find that Python gets less in my way (probably even less than any other programming language I've tried) but hasn't taught me concepts that I didn't already know from C++/Java/…

So the question would be: If you don't want to learn about Monads, why would you come to Haskell in the first place?

Category theory is pretty popular IME, I've seen lots of mathematician talk about it. It seems fairly popular especially with algebraic geometrists and algebraic topologists.

OTOH I've never seen any mathematician use (perhaps with the exception of myself, depending on at what point you start counting a person as "mathematician" and at what point you consider someone to be "using haskell") haskell. I doubt the haskell community actually has any mathematicians, it's mostly just people who'd like to be mathematicians but are too lazy to get a degree.

The overlap of mathematicians who care about category theory (some logicians, algebraic geometrists, algebraic topologists, ...) and mathematicians who program (mostly mathematicians from applied disciplines, numerical methods, computational physics, ...) is pretty much zero.

You probably won't find your algebraic topology professor programming in any language anytimes soon, except LaTeX.

> The overlap of mathematicians who care about category theory [...] and mathematicians who program [...] is pretty much zero. You probably won't find your algebraic topology professor programming

Well, unless perhaps s/he has some involvement in a project like this one http://comptop.stanford.edu/ ("The overall goal of this project is to develop flexible topological methods which will allow the analysis of data which is difficult to analyze using classical linear methods"; first named person in left sidebar is an algebraic topologist).

Or this one http://www.esf.org/index.php?id=8764 ("Applied and Computational Algebraic Topology"; first named person on steering committee is an algebraic topologist).

Or this one http://munkres.us.es:8080/groups/catam/ ("Computational Applied Topology and Applied Mathematics"; directory is an algebraic topologist).

Or, turning away from algebraic topology as such to mathematics inspired by algebraic topology and involving both category theory and programming: http://homotopytypetheory.org/ ("... new program for a comprehensive, computational foundation for mathematics based on the homotopical interpretation of type theory. [...] currently being implemented with the help of the automated proof assistant Coq"; note that in practice doing nontrivial things in Coq is a matter of functional programming).

For sure there are plenty of mathematicians (in algebraic topology and elsewhere) who don't program, but I think the intersection between people interested in categories and people interesting in programming is bigger than you suggest.

I don't know how many of them are using Haskell, though.

Yeah, I'm aware that some projects that apply computer science to pure mathematics (in various forms like Kenzo/CAT, PLEX (not CPLEX, that's for applied mathematics/finance/etc), agda, coq, GAP, ...) exist, but I don't think the claim that they are very very niche with mathematicians is exaggerated.

If you go to your average mathematics department and ask a few professors there that work in logic, algtop or alggeom, I doubt you'll find any that do any programming for their research. You probably won't find that many logicians that use proof-assistants either, although those are probably the least niche category of the things listed (and they are fairly popular with the CS folks)

> However, you don't really have to understand any of that to get started in Haskell

For sure. Then you get a compiler error.

Programming Haskell without understand the type system and lambda calculus is not possible.

Edit: thanks for downvoting me. If you believe that you can program Haskell without understanding the basics of Functional Programming and without understanding the type system, then you live in an alternate reality or want to evangelize Haskell.

I'm sorry, but to be an amazing programmer in Haskell, the understanding you need of the lambda calculus is minimal.. For example, consider this function to compute the Ackermann function in the lambda calculus:

    λx. x (λf y.y f (f 1) ) (λx1 f x.f (x1 f x))
or even just addition

    λx y f x.(x f) (y f x)
none of that will even come up in Haskell.

Even the type system - you need to understand some tiny amount - you can only add ints to ints etc. The more complicated stuff, the type inference, the monads etc - can be ignored until later.

Haskell can be used without understanding - almost all programming languages can. That's why we have so many newbie bugs.

Until later? What later. An hour later?

Let's look at a few chapter titles of Real World Haskell.

Chapter 2: Types and Functions

Chapter 3: Defining types, streamlining functions

Chapter 4: Functional Programming

Chapter 6: Using Typeclasses

Chapter 14: Monads

Chapter 15: Programming with Monads

Chapter 18: Monad transformers

It's a matter of perspective -- to someone who has taken a university class on lambda calculus/on logic that had a chapter on lambda calculus (e.g. as part of a M.Sc. in mathematical logic or CS), anything in RWH would probably seem extremely basic and superficial. It doesn't even introduce most of the basic things a first chapter in a book on lambda calculus would probably introduce you to.

My degree was also heavy on FP and LP alongside many other themes.

I think the biggest issue is with those developers that never got a degree or were unfortunate to land in a degree that taught programming languages instead of programming concepts.

The pretentious view that Haskell's core is a mathematical structure familiar to mathematicians is wrong.

Haskell borrows concepts from category theory (a "field" so abstract from most math that most mathematicians don't need but a handful of its concepts) to label typeclasses, and those typeclasses don't always follow from their namesake.

Further, Haskell could exist and keep its property of referential transparency without the the 'mathematical' structures or their names. Here's a version of Haskell without monads: http://donsbot.wordpress.com/2009/01/31/reviving-the-gofer-s...

Monads, functors, arrows, etc. are more aesthetic than fundamental to the core nature of Haskell. They're just a bookish (and sometimes bureaucratic, IMO) design choice. What I mean by that, is that the language designers took a concept (referential transparency) and built an understandable structure, but beside and aside from this, made a bunch of nested, derived typeclasses of dubious value or purpose. Sometimes I look at a Haskell library and have a more academic version of the, "Why the fuck is this a class?" moment.

I'd like to comment tangentially that Haskell is almost the OOP/Java of the 2010's - the programming community claims that it makes your code "safer", and in some sense it very much does, but its features are being perverted and overhyped while its caveats are being forgotten.

I'd love to see optional typing in Python, I wonder if there is any official reasons about why it never got introduced.

I very rarely change the type of a variable, so it would be essentially free speed and free speeds for me.

I actually wonder, do people use dynamic typing so often? I mean, it's nice to do "variable = int(variable)" when I know that I'm getting an integer in a string, but that's probably the only use case can I can think off that doesn't just reuse variables for something else.

> I'd love to see optional typing in Python, I wonder if there is any official reasons about why it never got introduced.

Here are a few from Guido himself (not really anti static-typing, but just walking you through pros/cons as he sees them). Great reads:




I consider http://cython.org/ to be the "optional typing" implementation of Python.

I started programming in Java, and moved to Perl and now Python. At first it seemed a bit weird, dynamic typing, but now I embrace it and enjoy the flexibility it offers. I see that many people prefer static typing and I am not sure why.

Can someone give me a concrete example of when static typing would be beneficial? (The majority of my work revolves around databases, so I guess I rely on that as a kind of type checking to a degree).

Static typing makes it a lot easier to reason locally about code.

I can't understand what a function is doing in isolation when the types are unknown. A function takes a variable foo and returns foo[bar], what did it return? Is that even valid code? What does that even mean if foo is a string and bar is a database handle? Ok, this is a lookup of some kind... is it an index or a hash table? I have no way of knowing without reading the comments (which may or may not exist). The only other way to figure out what the function is used for is to trace the data all the way back to its creation... literally find every call site and slog my way up the call stack.

In a dynamic language you can't even tell if all your code will execute until you run it and test all the code paths, let alone attempting to convince yourself that it's correct. I don't understand why people want that cognitive burden when the compiler can tell you straight away that you've typed something nonsensical.

I've just never seen a code sample that made me want to throw away static typing. Not even one.

If you use a language with type inference, you won't see what it returns either. Because it is not in the code. You would need an IDE which finds it out.

Sometimes it computes a type and you have no idea what the type is about.

Also dynamic typing does not mean you don't know what a function receives or returns. Many dynamically typed languages are object oriented and use classes.

For example in Lisp I would write:

    (defmethod collide ((s1 ship) (s2 submarine))
      (the collision-event (make-collision-event ...)))
Then I know that the arguments are objects of class ship and submarine and it returns an collision-event.

It's not statically typed, but the code is nicely readable and testable.

With inferred types you still know that callers must be passing something for which the operations performed are valid.

Many languages allow you to specify types even when they may be inferred to add clarity/preconditions. This is useful for modelling the domain as well as local reasoning. Perhaps the current implementation is valid for all integers, but in the domain context it only makes sense for the range 0-100.

If the compiler can infer the type locally then so can I. At least for reasonably well structured code.

I thought type inference in Haskell or ML was a global algorithm.

Static typing à la C,C++,Java,js,etc... is from my experience inferior to dynamic typing.

On the other hand, the Haskell typing is superior in many ways to dynamic typing. Because you lose very few power of expression but you gain a lot of time in your workflow (and also security).

Concrete example (I'll use js):

    function showField(data) {
you relaunch your application, you click on three to four elements, then you enter your name and a password. You click on the button, and "BAM, filed2 doesn't exists". Correct the typo or add a test and replay the game of executing your new code.

Static typing:

    showField : ConcreteData -> IO ()
    showField data = putStrLn $ filed2 (field1 data)
Try to compile, get the error: showField : filed2 doesn't exists, may be you mean "field2".

correct your code. Now you're done.

Here is another example:

Dynamic typing:

    function showField(data) {
      if (data.field) { console.log("OK"); } 
      else { console.log("Not OK")}
Try it using all manipulations to reach the test case, then: ERROR, couldn't find field for null. :-| replace by if (data && data.field).

In static typing:

    showField data = if (field data) 
                        then putStrLn "OK"
                        else putStrLn "Not OK"
compile: could not match String with (Maybe String) at ... now:

    showField data 
      | field data == Nothing = putStrLn "Not OK"
      | otherwise = putStrLn "OK"
You fixed it, but you didn't had to lose your time searching the error.

The BIG bonus is that you detect the error before you discover it at runtime. Imagine you didn't detected the error during your test (even with unit tests). This is why it is so easy to push this kind of error in production in a dynamic typed language.

And I only scratched the surface.

I program mostly in JS now, and I love even more when I am doing Haskell.

Very well put, I am learning haskell at the moment, and beginning to appreciate the power of its type system in a way never which I never felt in Java, C etc.

When you chain a series of functions/actions together, only to supply data for computation last, type signatures almost always ensures the correctness of the entire computation. This is true even the computation occurs at a high abstraction level. It's almost like magic, when dealing with the more difficult concepts (well for beginners) like monads. There has been a number of times when I wrote something but cannot make sure it does what I want, yet it compiles and a few tests reveals it indeed works like a charm.

Whereas in dynamic languages, it'd be a nightmare to dig into the dirty details at runtime. I always have this problem in R, a series of supposedly elegantly linked operations fail for mistakes like forgetting to convert character values into factor type. To debug such errors, I always have to explicitly carry out middle steps and store intermediary results, until finally locating the problem.

Another note, even in javaScript, sometimes there at patterns encourage the use of implicit types or interfaces(typeclass in haskell). The canonical example is jQuery, most of the times you can safely do:

This is because any jQuery object inherits methods from $.fn, and most of these methods returns either the same jQuery object, or another jQuery object. In a sense, jQuery itself is a typeclass, anything looks like $(something) is a instance type.

To find undefined slots or undefined functions, you don't need a type system. A competent compiler is sufficient.

The type system enforce some kind of usage.

    myObject[ someFunction() ] = 1;
No compiler in the world would be able to detect if myObject.field should exists or not.

In a (good) type system oriented language, you have to declare that myObject is a Map, and therefore when you want to access "field" you know this access could be "Nothing". The type system alert you if you don't handle the "Nothing" value.

Static typing lends itself to static analysis and other defensive programming practices that are easy to enforce at compile-time (such as design by contract.) A lot of bugs and design errors die long before erroneous code hits source control as a result - definitely not all of them, but many of them.

The best compromise between static typing and dynamic typing, IMHO, is what C# (and later generations of C++) do: implicit typing via the VAR keyword.

You can't implicitly type the members of an object or the arguments of a function, but you can implicitly type variables in local scope. This is how C# is able to support anonymous types and functions so easily - implicit typing abstracts away the gangly looking static objects that are produced by LINQ queries and the like.

Implicit typing makes it very easy to refactor large blocks of statically typed code without compromising the benefits of static typing. So you get some of the flexibility of dynamic typing with the predictability of static typing.

Realistically - you don't utterly redefine the type of objects at runtime very often in production code even with dynamic languages, so you don't lose much with implicit typing.

>You can't implicitly type the members of an object or the arguments of a function

That was due more to architectural issues in the C# compiler than any principled reason. C#'s type inference is extremely weak and incomplete and is a poor example of "getting it right".

I'm failing to imagine a scenario where implicitly typing either of these would benefit me as the developer - having those elements remain statically typed seems like "the right thing to do" given their role in defining public code contracts. Could you help me understand why this is a bad thing?

They are always statically typed; no one is talking about dynamic typing. Inferred types are just as static as any others.

Example: Dictionary<string, Func<string, int, Dictionary<string, string>>> somedic = new Dictionary<string, Func<string, int, Dictionary<string, string>>> { ... }

Is almost entirely better served by

  let somedic = dict [ ... ]
Your point about defining a public interface is separate. If you have a specific API that you cannot break, then feel free to type it out all you want. If you feel a piece of code is confusing or unclear without an annotation, go ahead and add it.

For the majority of code, having to specify the types is an exercise in verbosity, nothing more. Unless you're only writing public libraries with no internal implementation, I don't see how there's not a scenario.

I agree that doing that type of inference would be convenient - I deal with a lot of nested generics like that on daily basis.

But this is just syntactic sugar, and the var keyword already takes care of the left-hand operand in this equation. Generic functions also do a reasonably good job of type inference under this set of circumstances, although I've found that generic delegates and lambdas not so much.

whoops, meant to say "gangly looking HIDDEN objects" instead of "static" objects

Your only experience with static typing is with Java which has one of the lowest "value for effort" type systems around. Java's type system is mostly bookkeeping and doesn't help you functionally really at all while taking a lot of effort to deal with at the programmer end. By contrast there are much more powerful type systems where the typing takes much less effort AND delivers you actual functional benefits. Crucially Java's generics are not preserved through to runtime, so there is no actually difference between List<Integer> and List<String> to the JVM, only the compiler. If there was a difference all sorts of powerful things could be done with the types at runtime, but as it is it's mostly just the compiler complaining about whether you dotted i's and crossed t's.

I program a lot in Groovy these days, and it now has optional static typing with a limited amount of type inference. What I observe is that I voluntarily prefer to type my variables much of the time because it really does save time and takes only a fraction more effort than leaving them untyped. I find Python quite hard to decipher much of the time because it's nearly impossible to know what type a variable passed into a function is unless you trace back to the caller.

> I program a lot in Groovy these days, and it now has optional static typing with a limited amount of type inference

Groovy has enabled types on variables and functions (and fields and methods) since Groovy 1.0 but that makes the code run slower.

That is not what I am referring to. 2.0 introduced true static typing that compiles to code that not only enforces complete type correctness at compile time (unlike the old option) but executes at near Java speed.

In Javascript (which is also dynamically typed) I've had a few times where adding 1 and 0, say, yielded "10" because either the 1 or the 0 originally came from a string. OK, I can live with that, but I find the fact that you can use that string in numeric comparisons potentially confusing.

The general argument is that if you write enough unit tests, you can guarantee that your code doesn't make those kinds of mistakes. The statically-typed response is "why should you have to write unit tests when the compiler can make that guarantee for you?"

Adding strings to integers is usually referred to as weak typing, which is a seperate and orthogonal concept to dynamic typing. The runtime of a dynamic language could look at the two types of the values, decide they're different and throw a type error. Similarly, a static type checker could look at the types of the values and decide it knows how to do an implicit cast from one to the other.

That's a valid point. But, being dynamically typed, you don't find out about the type error until runtime. It's possible, although unlikely, that a Python web application can run for months before hitting a particular code branch where a string and an int are added together, and an exception is thrown. (In Python, it's also possible to get syntax errors long after you start your program).

It's possible that the compiler would have had enough information to determine this, if the compiler were to check types. Again, the proposed solution is to simply write enough tests to exercise all code paths.

For what it's worth, I enjoy working in Perl. But I realize when I do that some things are deferred until runtime. I also enjoy working in C and C++, partly because I can push some things to compile time (type checks, sure, but also arithmetic ( http://en.cppreference.com/w/cpp/numeric/ratio/ratio ), unit checking ( http://www.boost.org/doc/libs/1_55_0/doc/html/boost_units.ht... ), a decent amount of reflection ( http://en.cppreference.com/w/cpp/header/type_traits ), some assertions (static_assert), etc.). I don't necessarily push it all to compile time, but C++ allows me to.

Python is a strong dynamically typed language. Unlike JavaScript, which is a weak dynamically typed language. In Python `'1' + 0` throws a `TypeError`. While JavaScript attempts to figure out what you meant.

Which would be fine if I only ever used "+" with literals. But normally, I write things like

x + y

where x and y are values I got from who knows where. Javascript shouldn't pretend to figure out what I mean by this, because in many cases, I'm talking gibberish, and all I have is a bug.

I'd rather not talk about this stuff in terms of "types." It's about assertions, and Python makes them early and throws the exceptions early. It's a defensive practice that you can emulate in any language, and your assertions can be much richer than anything that is typically thought of as a type.

I'm definitely a fan of Python's strategy. It's much better at detecting bugs early and often. That, and there's a distinction in Python between identity and equality, with the `is` and `==` operators respectively.

Frankly, I think Perl has the best approach to this specific problem. We all want to do some string concatenation. We all want to render some non-string values as strings sometimes. So:

    1 + 0 = 1
    1 . 0 = 10
Just stop using + to concatenate strings.

> In Javascript (which is also dynamically typed) I've had a few times where adding 1 and 0, say, yielded "10" because either the 1 or the 0 originally came from a string. OK, I can live with that, but I find the fact that you can use that string in numeric comparisons potentially confusing.

There are two problems demonstrated by your example; the other is the overloading of the + operator as a design decision in JavaScript. Easier, but worse.

That's a JS probelem. The majority of dynamic languages would handle this "properly".

We are currently refactoring a bunch of stuff at work, and by now we can largely manage dependencies between various subsystems with the type system - without the type checking being too intrusive. Basically this means: If I have the current application and I add a new subsystem because I need it, and I don't have all the requirements of this system in place already, I get an error at compile time telling me to fix this. This makes it very hard to do certain classes of mistakes.

The interesting thing is that a sufficiently powerful type system with strong type inference eventually converges with dynamic typing in ease of use, but it offers the option to do rather complex reasoning about the code at compile time, again, preventing certain classes of errors.

Beyond the classes of errors being prevented, I feel like a mature, strongly typed architecture offers a certain amount of guidance in implementing new solutions. If you start implementing certain interfaces in our more mature codebases, you get a number of situations you need to handle - since the interface forces you to implement certain methods. You can't forget these situations, again, preventing mistakes or recreating the same bugs.

Note that both situations assume a certain level of maturity in a solution. Initially, a type system can be somewhat annoying while you work out the kinks, I'm perfectly fine to admit this. This requirement also implies what I feel - in small, hacky scripts, static type systems tend to be annoying to some degree. But in large, complex systems, I don't want to miss them.

I recently made a class implementing an interface from a library in python. That interface had a "canonical" implementation, in that most instances of that interface were of that class. When I tried to call some functions from that library with my own class, they tried to call methods from the "canonical class", which were neither present in the interface nor my implementation.

Result: program crashed when trying to call not present method. Had to clear up a few of these before it would work. If the type system hadn't been so forgiving the library implementers would have realized how their abstractions were leaking, and it would have been a lot easier for me to implement the interface correctly.

Static typing provides more information for the programmer and for the runtime.

More information makes the code somewhat readable and understandable 10 years after when it was written. Dynamic typing often becomes a much more messy affair.

Static typing is beneficial when you want to catch a certain type of errors before your code enters production, it makes you write less error prone code.

Not my experience. Dynamically typed OO language carry enough information around.

the point is that in statically typed languages, the type information is used at compile time, and the computer can reason about certain types of errors for you prior to execution. in dynamic languages, the type information may well be there, but you need to exercise the code at runtime (via tests or whatever, but still runtime) to make use of it.

True, but code is running. If it's not used, then the error won't be relevant anyway. I'm not talking about rocket, aircraft, power plant software. But even there, I never heard that any of those really critical domains use Haskell (or similar) in their control systems.

There is no reason you can not have something like "variable = int(variable)" in a strongly typed system. In fact, Haskell lets you do (essentially) this by shadowing variables. This particular example wouldn't work, because Haskell would view it as a recursive definition, but there is no reason you couldn't design a language so that the asignee side looks at the pre-shadow variable.

In my experience the dynamic typing is very useful when you start to do collections and nested data structures. You easily create a mixed type nested directory tree with only a few lines of code without all that boilerplate.

And because you can create them so easily, soon you end up with hundreds of them and you have no idea which one contains what :)

> You easily create a mixed type nested directory tree with only a few lines of code without all that boilerplate.

A similar thing can be achieved in Haskell, with the Typeable typeclass. The Scrap Your Boilerplate also fixes problems like this.

There has to be some use for dynamic typing when even Haskell (ghc) adds support for it, I guess.

Cython[1] is the closest I'm aware of to optional type annotation for Python. It works quite well, although requires a separate compilation step and the source code is no longer Python.

I assume by "new PHP implementation", you're referring to Hack[2] which is a different language, although heavily based on PHP. (I believe Hack is a superset?)

[1] http://cython.org/ [2] http://hacklang.org/

Cython actually shows the cost of interpretation quite well. Even without added type declarations it speeds up a quicksort of a list of integers by anywhere from 40-50%.


I certainly don't see Cython as anything but a quick way to get efficient C code in to Python module though... you typically get the worst of both worlds with compilation + dynamic typing.

Why are you not using existing, efficient sort implementations?

The github repo is from Alex Stepanovs course at A9, which is largely focused on data structures and algorithms. The point of the benchmarks is to compare 'FORTRAN like' code in many languages and to demonstrate that languages don't really matter if you pick the right DS&As.

That quicksort algorithm is about as efficient as it gets btw.

Well yes, Hack is a "new language" in a similar way TypeScript is, but Hack is built to interoperate with regular PHP code. It's a superset, similar to how Objective-C was/is with C, only even more integrated with it's subset language :)

Regardless, it's damned nice (especially with TypeScript on the front end: static optional typing everywhere!)

There is support in Cython for doing the compilation transparently:


> Dynamic typing is probably preferable to a 40 year old type system.

I don't think ML users will agree ;)

Both System-F and Martin-Löf type theory are roughly 40 years old as well.

What is wrong with type theory that it needs to be replaced every few decades, just because it got "old"?

Nothing is wrong with it, it's just as relevant today as it was 40 years ago, if not more. Most of the really interesting modern work is just building on top of some form intensional Martin-Löf theory.

I don't think there's anything implicitly wrong with a 40 year old type theory. Rather, C's type system is pretty decrepit.

This was a rhetorical question, it is clearly ridiculous to toss things out just because they are 40 years old

Because older type systems aren't webscale /s.

Please continue! Don't leave us hanging for the riveting conclusion :) I don't know ML so I would love to hear another perspective

Take a look at http://en.wikipedia.org/wiki/ML_%28programming_language%29

ML dates from the 70s and its type system is the inspiration for Haskell's, although Haskell also adds typeclasses (and various Haskell extensions add a bunch of other stuff).

The most widely used versions today are Standard ML and Ocaml, although many languages take inspiration from ML, for example Coq, Haskell, Clean, etc.

Just to add to the list: Scala takes a lot of nods from ML and F# is a nice variant if you're on .NET.

Optional typing is the main goal of the mypy project: http://www.mypy-lang.org/

In my land we call this "julia"

I always wondered why more languages didn't do this with the goal - ostensibly - being faster overall performance.

At the same time I hate it when syntax vacillates wildly from application to application.

Ok, I wanted to write "julia, julia, julia..." (check out the varargs) but I shall be sensible. Julia's syntax is what an algol culture programmer would expect in order to deal with a late 2000's concept of typing. It has the HCI virtue of being what I (you?) expect.

At the same time it is 40x faster (in my applications developed in a fortune 500 company, ok the research bit - but still delivered for use if not prod) than python.

Now, if only we got it running in SPARK...

Modern type systems tend to be complicated and tend to hinder incremental development.

It wouldn't make Python easier to use. Just the opposite.

Whoah, hold on there, I disagree entirely. Modern optional type systems are great for incremental development, as I can ignore them for the most part in my code when rapidly iterating, but then when my APIs are starting to solidify, bring in some proper guarantees about my code's correctness. I've used Hack and TypeScript in production, and while they're no Haskell in terms of "moderness" of the type system, they're both quite nice.

Python4! More sharding! :D

The bigger picture is that CPython's core team simply does not care too much about performance. Performance has never been a fundamental requirement, but merely an afterthought.

The biggest reminder of this was Python 3, which for me was a complete disappointment. They could have limited python's super-dynamic behavior (e.g. changing builtin functions, patching classes on the fly, etc.) or made them optional. They could've added optional typing annotations a la Cython. Or even changing the builtins and language syntax to allow more inplace operations and preallocations, so that temporary results wouldn't have to be allocated on the heap over and over again. All of these changes would have made python faster and more JIT-able. None of these things happened. Performance-wise, python 3 is no step forward.

Python+Cython is still a powerful combination, but eventually Julia or similar languages will eat python's lunch with respect to scientific computing.

I think if python3 had made major performance improvements (not to mention the GIL) there wouldn't be nearly as many 2.7 holdouts.

And it still wouldn't be done.

Yup. the HUGE "CPython is only reference implementation" ego

Not for pure performance Fortran still seems to be king for that type of scientific computing.

One thing that bothers me - and has for a long time - is why Python (Perl, Ruby, etc), never have really leveraged the work Common Lisp systems have done (CMUCL, SBCL, etc), which provide very good performance without sacrificing dynamic typing or the REPL.

It bothers me too.

Dynamically typed does not imply no types. A good CL implementation takes that to heart and produces really nice code out of the box. And the language spec provides for an API to allow program developers to specify types and hints to the compiler to optimize certain functions and code paths where it's needed.

The benefit from this is that you can rapidly prototype your application and harden it to the platform you deploy on when performance becomes an issue... and with built-in conditional compilation you can do that in a cross-platform way.

Python has types; I would even describe it as strongly-typed. It is thus relatively JIT-able (as pypy and psyco have shown).

The chief problem is the high level of dynamic typing going, plus the huge amount of globally-changable assumptions (you can add methods to any instance of a class, you can globally change the definition of language built-ins, etc.)

Some charity, please; I know Python is strongly-typed. It just doesn't take advantage of it and doesn't have facilities for producing fast code.

Common Lisp not only provides type hinting facilities in the language it also provides hinting to the compiler to turn off all run-time type checks in critical code sections.

Languages like Python and Ruby have generally looked to the SmallTalk and Self literature (inline caches, dynamic deoptimization, hidden classes etc) rather than Common Lisp, so people aren't ignorant of the research, they're just looking in a different area. If you think there's something in Common Lisp we're missing, point it out!

Well, you are missing compilation to native code. Common Lisp has very good performance characteristics in the hands of an expert, whereas $modern_dynland usually winds up requiring a call out to C, fragmenting the codebase and requiring dual-expertise (one in $dyn_lang, one in C) to maintain effectively.

I don't meant to say USE LISP (but you know, it's available and ready to go!), but I do mean to say: the knowledge (algorithms, research papers, implementations) on how to do this relatively efficiently was developed in the late 70s and early 80s.

edit: SBCL is a good example of what can be done with this knowledge. I would be 100% unsurprised if perl/python/ruby could benefit from these algorithms. See phkoung's blog for what he did with SBCL on numerical analysis and high performance Lisp.

But all these algorithms (including compilation to native code) are already in use in JRuby, Topaz, Truffle, PyPy, ZipPy, Unladen Swallow, Pyston, etc. In fact many of these projects use research developed only over the last decade or two, which CL does not use, such as partial evaluation, partial escape analysis, tracing JITs, etc.

So we all know the research, and we know how to put it into practice to make both Ruby and Python run very efficiently. There's no question that Ruby and Python can benefit from this work, because it's already been done many times. What does not look tractable is cramming it all into the existing code bases of CPython or MRI.

I know about those projects too; but at the end of the day, Python is basically CPython. Maybe PyPy should replace Python 2 (I'd argue so, loudly), but for now... CPython is the definition of Python. And it's not a easy problem to make CPython (Python) faster (GIL, effective dynamic analysis), like you said.

What's frustrating is that there's no visible effort to make the Standard Python very fast. Personally, I'd rather see the money poured into Python actually poured into SBCL, but that's just me.

You saying "I'd rather see the money poured into Python actually poured into SBCL" because CL is a better language?

> What does not look tractable is cramming it all into the existing code bases of CPython or MRI.

Of course what is now called "MRI" since 1.9 isn't what was "MRI" before; its what was, previously, one of the alternative implementations (YARV), which got adopted as the mainline implementation.

For some time, there was lots of talk about YARV maybe getting replaced with Rubinius as the engine for the main implementation somewhere down the line, though I think if that happens, its still a few years off. (For one thing, Rubinius will need to get to the point where "Support for Microsoft Windows is coming soon" isn't a thing -- and AFAICT, that "soon" may well be only true on a geological timescale.)

AFAIK dynamic languages have gone away from compiling to native code because there is almost no benefit to it, if all you're doing is basically unrolling your interpreter into a unnecessarily huge binary. Methods like tracing JITs are much more effective at delivering good performance in the hotspots of your program.

The exception is when you allow yourself to annotate your code with type information, which allows you to eliminate a lot of the code you'd otherwise emit. Python for instance allows you to do this with Cython/snakeskin/etc., so the technique (that I think) you're talking about is applied there, because it is effective.

And the easiest way to leverage that effort is to use CL as an implementation platform. Whatever you think of CL as a language to program in directly, it really shines as a language to implement dynamic languages in. You get a lot of useful pieces, including native-code compilation, some very powerful control-flow operators, multithreading (in most implementations), and of course garbage collection, for free.

There's a more subtle benefit as well. Because compilability was a key goal in the design of CL, using it as a translation target while you're designing a new dynamic language will help you make sure your language is also efficiently compilable.

EDITED to add: in CL, the whole tagged-pointer thing has been taken care of for you, so you don't have to think about it (cf. [0]). This means you get fixnums (unboxed immediate integers), with automatic overflow to bignums (arbitrary-precision integers), for free. In fact there's the whole CL numeric tower, including ratios and complex numbers. Other goodies: multiple inheritance. Multimethods (multiple dispatch). Class creation and redefinition at runtime.

It baffles me that anyone would want to reimplement all that stuff, or even part of it.

[0] https://mail.python.org/pipermail/python-dev/2004-July/04614...

Python is a very different language than Lisp. It's much more dynamic. Every method call conceptually involves a hash table look up. Lisp on the other hand is much closer to the bare metal, and much easier to implement efficiently. For Python the implementation techniques of Self would be more appropriate.

> Every method call conceptually involves a hash table look up

Like CLOS.

What work specifically are you referencing?

JIT compiling to native code from the REPL is a big one.

Python (and Ruby for that matter) are slow because there are no billion-dollar companies stuck with an initial choice of language that impedes their ability to grow. PHP and Javascript used to be extremely slow and now after several dozens of millions thrown at JIT, rewrites, forks and redesigns they're starting to get much much faster.

Not so. Google uses Python extensively in-house (it's one of the four "blessed" languages) and, in fact, employed GVR until he was recently hired away by Dropbox--another billion-dollar company which relies heavily on Python. At one point Google even spearheaded an ambitious effort (https://code.google.com/p/unladen-swallow/) to make CPython much faster. It failed.

At the point where even Google can't make it happen, it really starts to look like Python performance is limited at some very fundamental level to what we see today. Personally I think this is fine. I use Python for everything day-to-day and offloading the intensive stuff to a C extension (a la Numpy) works just great. There are very few instances where I find myself wishing Python was faster.

> At one point Google even spearheaded an ambitious effort (https://code.google.com/p/unladen-swallow/) to make CPython much faster. It failed.

The difference between Unladen Swallow and V8 is that while V8 was "make JavaScript fast on the workloads we care about by any means necessary", Unladen Swallow was "make CPython fast without disturbing the implementation's interface to native extensions". A V8-style approach to speeding up Python might well be practical even if a project with Unladen Swallow's constraints to using, and preserving the same relationship to extensions, was not.

> At the point where even Google can't make it happen, it really starts to look like Python performance is limited at some very fundamental level to what we see today.

For the reasons above, "Google can't make it happen" isn't really a good description. Google choose to try with constraints it didn't apply to JavaScript, because for their purposes with Python -- unlike their purposes with JavaScript -- if they were going to go with an unconstrained approach, then they were quite happy tossing the language aside. OTOH, with V8, running actual JavaScript was important but existing external modules called from JavaScript weren't a constraint, so the interpreter had fewer constraints than with Unladen Swallow, and making the language faster (rather than getting something faster, but not necessarily keeping the same language if it made anything else more difficult) was a higher priority.

So V8 succeeding while Unladen Swallow fails isn't necessarily a sign that making a fast Python interpreter is harder than making a fast JavaScript interpreter -- making a fast JS interpreter (language non-negotiable) was a priority for Google, making Python faster if it could be done was a nice-to-have that wasn't worthwhile if it doing it in Python was disruptive enough that they wouldn't be able to leverage their existing library of stuff-built-to-interface-with-CPython without additional effort. The efforts weren't parallel, and the results aren't comparable.

Unladen Swallow started as an intership project and didn't get very much support at Google [1]. When US started (~2009), Google had been writing their core, high-performance stuff in Java and C++ for many years already so there was no internal or existential need to make Python faster.

[1] http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospect...

On the other hand, there may have been applications higher up the stack that would've benefited from a more efficient Python. I don't know if it still is but I heard GMail was written in Python, you'd think an application of that scale would benefit from better performance.

The internet says it is written in Java: http://en.wikipedia.org/wiki/Gmail (possibly with some Python build tools, for which performance doesn't matter)

The Gmail frontend is Java. The backend (mail routing, spam filtering, virus checking and parts of storage etc) is C++.

I think the point is that companies that use Python tend to be able to switch out to other languages where they need speed very easily, where it makes sense; for example, a lot of Python developers got very excited about Go, and now there's mostly-Python apps with certain functionality written in Go dotted about the place. The idea of calling out to services to do things is decidedly normal in Python.

PHP companies tend to have more of an issue with this, for various reasons. PHP apps can suffer from the "big ball of muck" issue quite easily, where there's not nearly enough abstraction to allow for parts of the code to be written in other languages. Such an architecture would also require additional deployment steps, vs PHP's "copy everything to the right place", and there's a lot of PHP devs who, while rather good at PHP, are quite resistant to new languages.

PyPy, Cython are examples of similar examples for Python.

Also, extension modules, Pillow, Numpy, Scipy, Pandas greatly reduce need to make Python it self faster.

Yes, Python is slower in execution than some other languages, but:

   ... efficient use of development time ...
That is the reason, why Python counts (not only references). Python has many very good libraries, is a good OOP language, easy to learn, but still very, very powerful. You can express in Python some things in a single line, where you need hundreds of lines in C++ or other languages.

The few percent of running speed that you might loose, are neglect-able in most cases against the win in development speed.

In many applications you don't need the full CPU power, but often times you are hindered by e.g. the disk speed or other factors ... and than you don't lose anything when you are a little bit slower in some minor tasks.

> You can express in Python some things in a single line, where you need hundreds of lines in C++ or other languages.

citation needed.

Example: x = {'foo':34,'bar':[1,2,3],'baz':'quux'}

The headers alone for STL maps are hundreds of LOC.

That's tricky, because C++ is statically typed and you've selected a varying value type, but here's the closest thing in real terms.

    #include <map>
    #include <vector>
    #include <string>
    #include <boost/any.hpp>
    using boost::any;
    using std::map;
    using std::string;
    using std::vector;

    int main() {
        map<string,any> x = {{"foo", 34}, {"bar", vector<int>{1,2,3}}, {"bar", "quux"}};
Syntax overhead amortizes to constant factor ;)

Counting headers is rather unfair.


map<int, char> x = {{1, 'a'}, {3, 'b'}, {5, 'c'}, {7, 'd'}};

Any not-ancient C++ using boost Assign:

map<int, char> x = map_list_of (1, 'a') (3, 'b') (5, 'c') (7, 'd');

Shamelessly stolen from here:


I agree with the upstream comment about expressiveness, but this isn't a fair example.

I have none, it is my personal experience, since I programmed in C++ (and other languages) before. So cite me, if necessary.

And if you really need to have a piece of your code execute really quickly, you can just make C extension that python can easily invoke.

Python's support for this is awesome, but there is a gap: you lost cross platform. You now have to figure out how to make it compile on every platform that anybody could every possibly want to run it on, including ones not yet invented in the future ... Jython / Java (or other JVM languages) are quite a nice option in that space.

Of course you loose a little platform independence, but that is something you don't have anyway when you program in C++ or that kind. Also you get rather rich support for makefiles that are cross platform. I had no problems till now with C code I wrote that integrates into the Python environment. When you don't need OS specifics and have knowledge how to implement things CPU independent (given a common >=32bit architecture) in C, you will have little to no trouble at all.

Also one little hint: Use Cython for enhancements. Cython is great for implementing small enhancements when you don't want to hassle with refcounts and the normal Python API. Also your enhancements are automatically compatible to Python 2 + Python 3. You can also call C functions from Cython very easily for full speed. Cython is really great, when you don't need to optimize to the last nanosecond.

> Python ends up being an extremely efficient language for the overall task of doing science with code.

This is pretty close to how I characterize my time, 'doing X with code', and Python yields great returns in these terms.

While everyone is on the subject, I thought I'd mention some weird behavior I saw today.

I just switched my program from using a defaultdict to a regular dict.

I.e., from defaultdict(lambda:'NA') to regular dict using get(val, 'NA') for access.

And I got a something like a 100x speedup. It runs in two minutes instead of two hours. I had no idea a defaultdict would be so much slower.

Unless there's something funny going on in my program and it's unusual behavior.

The docs for the defaultdict are here: https://docs.python.org/2/library/collections.html#collectio...

And they pretty much say that there is only one method that's been overridden from the normal dict class. And all it does is it executes the anonymous method (lamda) you supplied instead of returning the default value you pass in.

Also, this depends on your usage, and what you have in the lambda expression. I'm curious about your code, care to paste/link some of it for us?

I would guess that the code tries to get the value for keys that do not exist in the defaultdict. Well, if the key does not exist then it is being added with the default value 'NA'. So in this way you can end up with a huge dictionary where most of the values are just 'NA'.

Oh, wait, so defaultdicts actually insert missed keys?? I thought they simply gave you a default value instead of a keyerror. If you're right, then that would certainly explain it.

They have to, because they cannot, in general, decide whether the lambda you gave it is idempotent _and_ returns an immutable value.

For example, if you provide a function that returns an empty list, and do:

    l = foo['bar']
    print len(foo['bar'])
That should print 2 (apologies if this isn't correct Python)

If one had a defaultdict that took a default value in a language with enough reflection, you might be able to deduce that the lambda always returns a simple value such as 3.1415927, and not store copies in the dictionary.

What you have there will print 1.

Yep, take a look!

    >>> import collections
    >>> dd = collections.defaultdict(int)
    >>> print dd['doesnt exist']
    >>> print dd
    defaultdict(<type 'int'>, {'doesnt exist': 0})

Correct. The behavior you are looking for is already in the standard dict class. You just pass in an extra parameter to the get method like so: dict.get("key", "my_default_value_if key_is_not_inserted_already")

Use .get() instead of [] to avoid setting the default value.

That's what they said they're doing.

If you are predominately adding entries to a dictionary, the standard dict will be faster.

Where defaultdict shines is the case where you hit an entry a bunch of times, but you'd like to automate initializing an entry the first time it is hit.

The most common example is probably the frequency distribution, where the frequency needs to start out zero and be incremented each time a key is seen. Using int or lambda:0 as defaultdict's argument works fine for the simple case. If you want to tabulate more than just the count, e.g. a count and one or more sums, you can pass a constructor that produces an object with an update method, to which you pass whatever your key represents.

Simple microbenchmark: http://pastebin.com/vkacByzp

It appears that defaultdict saves time as well as code for the normal use case.

This is convenient, since it's almost exactly what I plan to talk about in the undergrad class I'm teaching today (https://github.com/williamstein/sage2014). I view this sort of article as good motivation for Cython, and the value of this article is merely that it aims at typical undergraduates not in computer science.

Good read. Whenever I read stuff like this, I always wonder if it is always true that dynamically typed languages are slower than statically typed languages ? Also, do we have to take this for granted that more higher level the language is, the slower it will be ? Or are there exceptions ?

Also, it is worth asking if for majority of use cases of python for data/analysis, the ease and flexibility outweights the slowness.

Always is a bit of a strong word but yes, it's always true.

Virtual functions in C++ which allow some form of dynamic behaviour are slower than static function calls because they inherently involve another level of indirection. Static calls are known at compile time, they can be inlined by the compiler, they can be optimized in the context in which they're called. Now, nothing prevents the C++ run-time from trying to do the same thing at run-time but you can relatively easily see that it'll have to make some other compromised to do so. Nothing prevents a C++ program from generating C++ code at run-time, compiling it, and loading it into the current process as a .so. Now that's a pretty dynamic behaviour but there's again an obvious price. You can also write self modifying code. At any rate, static languages are capable of the same dynamic behaviour that dynamic languages are capable of but you often have to implement that behaviour yourself (or embed an interpreter...).

Fundamentally, a dynamic language can't make the kinds of assumptions a more static language can make, it can try and determine things at run-time (ala JIT) but those take time and still have to adapt to the dynamics of the language. The same line of code "a = b + c" in Python can mean something completely different every time it's executed so the run-time has to figure out what the types are an invoke the right code. Now the real problem is that if you take advantage of that then no one can actually tell what this code is doing and it is utterly unmaintainable.

To compound the problems facing dynamic languages is the fact that CPUs are optimized for executing "predictable" code. When your language is dynamic there are more dependencies in the instruction sequence and things like branch prediction may become more difficult. It also doesn't help that some of the dynamic languages we're discussing have poor locality in memory (that's an orthogonal issue though, you could give a dynamic language much better control over memory).

EDIT: One would think it should be possible to design a language which has both dynamic and static features where if you restrict yourself to the static portion runs just as fast as any other statically compiled language and still allows to switch to more dynamic concepts and pay the price when you do that.

Apologies if you've already read this, but this paper was absolutely enlightening for me on what a more performant Python would look like: http://julialang.org/images/julia-dynamic-2012-tr.pdf especially section 2 on language design, which discusses how it's actually different.

(And yes, I know I'm just adding to the people plugging Julia here) Among other things, Julia actually avoids forms of dynamism that are in practice rarely employed by programs in dynamic languages. Existing code is immutable, but new code may always be generated if different behaviour is desired.

I'm not entirely sure if I could accurately say Julia allows one to trade between efficiency and dynamism when each is desired, but it does appear to be a much better balanced approach than is normal.

It's not always true in practice. See for instance: http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...

Common List on SBCL is often faster than F# on Mono, eve though Mono is statically typed, as the former runtime has had a lot more work put into optimising it.

Or, compare F# on Mono with Clojure on the OpenJDK: http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?t...

Clojure again is faster, due to a more optimised runtime.

Notably, tho, Both SBCL and Clojure have optional types. So the number of indirections necessary is already substantially reduced, as per the GP. For this to be the case in practice, tho, requires that type annotations actually be commonly used in both lisps.

A more interesting counter example would be LuaJIT, since it has to support completely dynamic code with a fair bit of monkey patching going on. But that is more of a showcase for tracing JITs being fairly powerful (for hard to predict code bases) than that indirections are cheap.

That's a good point. SBCL at least attempts type inference though, so I often find the code is quite fast without annotations. But then I suppose SBCL with type inference is probably closer to a statically typed language in that sense.

Jake's blog is pretty great. I really suggest reading more stuff in there if you care about scientific computing.

Related: "Python is only slow if you use it wrong" http://apenwarr.ca/diary/2011-10-pycodeconf-apenwarr.pdf [pdf]

The 'slow' part wasn't so new to me, but the `id` command and the attendant understanding that smply typing 110 in the interpreter creates an OBJECT and when you assign a=110 `a` points at that object (and when you reassign a=30, a new object is created and a points at that), blew my mind. Thanks for this!

Previously I had thought that doing a=110 creates an object 'a' that stores the value 110 (and when we do b=a, b simply points to a). I had no idea there is a third object in play.

Actually, it's a bit more complicated: for small integers, the Python runtime has a stash of pre-made objects. For compiled code, the code may include a static reference to a constant object.

Which means that, if you execute "a = 110" a bunch of times, no new objects get created. If you execute "a= 110+300" a bunch of times, a whole bunch of new objects is created and destroyed.

By the way, this boxing/unboxing logic is also at work when you stick numbers into a standard HashMap, since it stores Integer objects and not the primitive type. However, in Java you can use GNU Trove's IntObjectHashMap or ObjectIntHashMap and save some memory, whereas Python's "everything is an object" rules this out (unless you're using PyPy and its tracing jit which figures out that a always contains a number).

That's pretty central to any OO language. "Everything's an object." Veering away from that, only serves to turn the language into a mess of boxing and unboxing.

That's not to say that OO is necessarily the ideal language paradigm, but it has certainly been the most dominant in the era Python has existed.

OO languages do not have to do this. Ruby doesn't, for instance (that is, it turns the language into a "mess of boxing and unboxing", but that's invisible to the programmer and makes working with integers so much more efficient). There's simply no reason to allocate integers on the heap -- it's a bad design desicion.

Python statically allocates small integers (-5 to 256), so you're not actually creating a new object unless you're creating one outside that range.


Dynamic allocation isn't the only cost there -- a cost that's just as big when you're doing number crunching in Python is cache efficiency. If every number is on the heap, and every number in Python is ~20 bytes long, you're going to have caching problems, even if your numbers only ever range from -5 to 256.

How does Ruby know that 'a' is an integer and not any other type? Java and C# can use the variable types to track this, but I don't see how would a dynamic language work without tagging the value.

My understanding from 5 years ago is that in MRI the leading bit is reserved to indicate whether something is an integer. This of course reduces the size of numbers that fit into a machine word but it seems like a pretty good tradeoff.

Yeah, apparently they're called tagged pointers. The More You Know.

EDIT: And here's Guido explaining why he didn't want that in Python: https://mail.python.org/pipermail/python-dev/2004-July/04614...

Most Lisp systems use tags. Several data types will have tags included in the data word, like fixnums, characters. Other data types have an extra word.

Ruby -- or more precisely, MRI, though some other interpretations may do something similar -- uses the a fixed set of object_ids (which is what is stored on the stack to locate an object on the heap) for small integers, so if the object_id is within that range, the interpreter uses it directly to calculate the value, and doesn't need to go to the heap to get a value. (MRI does a similar thing with the values nil, true, and false, as well.)

Python does the same thing for integers in the range (-5, 256)

One easy way to know is to run it a few times and observe the behavior.

This is why JIT runtimes need a "warm up time" to get really fast, say V8 for JavaScript.

`a=110; b=a` involves only one object: 110. a and b are just names.

And 110 is probably an interned int, so a and b are likely referencing an object that was compiled into the interpreter.


  >>> a = 110
  >>> b = 110
  >>> id(a) == id(b)


Uh, I didn't use assignment between variables; my code is the same as your second example (both variables independently assigned to 110).

Indeed; I was confused by the dekhn's aside.

Something I noticed a while back is that Python already has objects pre-made for the small integers.

That's common. Same for the one-character strings.

That's a sticking point with a lot of young Java programmers encountering boxed types for the first time. They see that x is true here:

  Integer x = new Integer(5);
  Integer y = new Integer(5);
  x == y
But this is false:

  Integer x = new Integer(5000);
  Integer y = new Integer(5000);
  x == y
(The real fix, of course, is to use the 'equals' method when comparing objects.)

I'd argue the real fix is to use a language that doesn't have insane semantics for ==.

No, Java's semantics for == are entirely sane and straightforward: It compares primitives by value, and objects by identity in memory.

The true problem is that newbies who haven't grasped the difference see that x==y in a sample program and then leap to all sorts of conclusions, overgeneralizing what they observed because they don't understand how it happened.

The problem is that comparing by value is not a good default for ==. A language should strive to make the common tasks as simple as possible to do right, while not making it impossible to check object equality or some other kind of equality.

Compare Ruby:

  1 == 1 => true
  4711 == 4711 => true

  1.object_id == 1.object_id => true
  4711.object_id == 4711.object_id => false

The #equal? method does your object_id comparison, by the way.

Ruby's default #== for new classes simply performs identity comparison, just like in Java. The difference is that Ruby has operator overloading, so all the builtin classes overload it with something sensible.

(This could work in Java, or it could go the .NET route and ban using the == operator entirely until a certain interface (.NET's IEquatable) is fulfilled.)

  > binary_add<int, int>(a, b)
no... there is simply not function call overhead for adding two integers in C.

It's just notation, to make it clear that two integers (and not some other type of object) are being added.

It's just not good notation at all, though. It strongly implies "call". And "object". This is misleading.

It's inlined ;)

This is hardly breakthru stuff.

True. It also never claims to be.

The very first part says that this is only a writeup for people who are not intimately familiar with why "dynamically typed" might slow down Python.

Based off my own experience, dynamic typing has much less to do with slowness than the plethora of string copying which occurs in your typical python script.

but for scientific computing with numbers - the author's target audience - dynamic typing is the main bottleneck.

Boxing and memory locality are the the main bottlenecks in scientific computations with Python, dynamic typing is orthogonal.

If a language is dynamically typed, it is very likely that it will use boxing, so they do not seem orthogonal to me.

There's ways to avoid that, where it matters, though its a pretty common thing in naive Python/Ruby, true.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact