
Measuring Polymorphism in Python Programs [pdf] - ashitlerferad
https://people.dsv.su.se/~beatrice/python/dls15_large_images.pdf
======
btilly
The article's spin is that since so little of the program uses complex
features, we don't really need those features in our language. However, this
conclusion is not justified. The right question is how much more work it would
take to write the program without those features.

The real use of this kind of work is studying how best to optimize dynamic
programs. The conclusion is that a Python JIT optimizer can assume the first
data type it sees at a call site is likely to be the type it will always see,
add a small sanity check for that case, and then go down a fast path. This will
correctly optimize 98% of the code, and there is not much to be gained from the
remaining 2%.
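
In Python terms, the specialized code amounts to something like this hand-written
sketch (a real JIT emits the equivalent machine code; `generic_add` here is just
a made-up stand-in for the normal dynamic dispatch):

    def generic_add(a, b):
        # Slow path: full dynamic dispatch through __add__/__radd__, etc.
        return a + b

    def add_specialized(a, b):
        # Guard: this call site has only ever seen ints, so check that first.
        if type(a) is int and type(b) is int:
            return a + b  # fast path: a real JIT emits unboxed integer code here
        # Unexpected types (the remaining ~2%): fall back to the generic path.
        return generic_add(a, b)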

This is a concrete data point. JIT optimizers for other dynamic languages
(particularly JavaScript) have discovered and taken advantage of similar
things.

~~~
brianwawok
Or maybe not... what about mypy?

[http://mypy-lang.org/](http://mypy-lang.org/)

If 98% of the code has very basic types (e.g. a function takes an Integer and
returns a Decimal)... we could encode that in type hints with something like
mypy, and get much of the type checking that statically typed languages get,
without a crazy type system.

Hey, I will take a 98% statically type checked language over 0%... if in so
many cases I clearly know the types of stuff, let's put that info in there and
detect errors at code time. So much of the standard library could use this to
be cleaner.
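
For example, with the standard PEP 484 hint syntax that mypy checks (made-up
function):

    from decimal import Decimal

    def unit_price(cents: int) -> Decimal:
        """Convert a whole number of cents into a Decimal price."""
        return Decimal(cents) / 100

    unit_price(199)    # fine
    unit_price("199")  # flagged by mypy: "str" is not compatible with "int"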

~~~
btilly
That would be an interesting tool.

Run a program while it is instrumented. The tool figures out what every type
seems to be, then goes back through the program and tries to prove that those
types are correct.

It could then automatically annotate large chunks of code in a way that lets
them run faster, and would produce a warning if an unexpected type makes its
way in during a future iteration. (An unexpected type, of course, being either
a bug or a likely candidate for something that might expose latent bugs.)
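
A rough sketch of the recording half of such a tool (the names here are made
up):

    import functools
    from collections import defaultdict

    observed = defaultdict(set)  # qualified function name -> argument type tuples seen

    def record_types(func):
        """Instrument a function to record the concrete argument types of each call."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            observed[func.__qualname__].add(tuple(type(a).__name__ for a in args))
            return func(*args, **kwargs)
        return wrapper

Run the test suite with that applied; any function whose observed set has
exactly one entry is a candidate for an automatic annotation (and for a warning
if a later run disagrees).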

~~~
jerf
That's basically how most JIT compilers work. But it would be nice to be able
to get the annotations back out of the JIT. Anyone know of a JIT that lets you
do that? (In any language, since now I'm just academically interested.) That
would be darned useful when one "inherits" a large, dynamically-typed
code base... it would greatly increase freedom to refactor if you were more
sure that such and such a function really does only ever get a number, despite
all the code you see for handling objects and strings, for instance.

~~~
webwarrior
In Pharo (a modern Smalltalk implementation) you can get some info from the VM
and the JIT.

Someone is already working on type inference using VM caches.

See [http://forum.world.st/parameters-of-the-virtual-machine-td48...](http://forum.world.st/parameters-of-the-virtual-machine-td4897121.html)

------
raymondh
I wonder whether the authors consider "value = somedict.get('somekey')" or
"match_object = re.search(pattern, somestring)" as returning a single type.
Both of those common calls can return None.

I'm also curious as to what they consider "unbounded polymorphism". Calls like
"len(s)" or "with cm" or "for x in s" work with many different types (though
usually only on one type at a particular part of a program). Likewise, most of
the collection types and classes are necessarily generic (in one place you make
a set of integers and in another place you use a set of strings). The _use_ of
the collections is typically monomorphic but the collections themselves are
necessarily generic.

Another thought is that when I write programs that use a single type for a
given variable, I still place value on the duck-typing and polymorphism (to
ease future maintenance, support debugging, and leave the code loosely
coupled). For example, when I write a function that accepts a file object and
the calling code only passes in file objects, I still value my ability to pass
in a StringIO object instead.
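
Something like this toy example:

    import io

    def count_lines(f):
        """Works on anything that iterates over lines: a real file, a StringIO, ..."""
        return sum(1 for _ in f)

    # Production code only ever passes real file objects:
    #     with open("log.txt") as f:
    #         count_lines(f)
    # ...but a test can pass a StringIO and nothing breaks:
    print(count_lines(io.StringIO("a\nb\nc\n")))  # -> 3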

Another thought is that I find the mechanical extraction of percent usage
statistics to be dubious since the results are profoundly biased by the kind
of code being sampled. For example, my data analytics code is nearly 100%
monomorphic. However, code that uses ORMs (like SQLAlchemy, Peewee, or Django),
or that uses templating engines (like Jinja2 or Cheetah), or that does anything
interesting would tend to have very different statistics. (Profile-guided
optimization in C has taught us that data and usage patterns greatly affect the
statistics.)

All that said, I don't disagree with the authors that a lot of Python code
could be statically typed. Tracing JITs have already proven the value of call
site specialization to a particular type.

~~~
leovonl
> I wonder whether the authors consider "value = somedict.get('somekey')" or
> "match_object = re.search(pattern, somestring)" as returning a single type.
> Both of those common calls can return None.
    
    
      val get : ('k, 'v) dict -> 'k -> 'v option
      val search : re -> string -> match option


Above is a hypothetical type notation in OCaml for both functions. I think it
pretty much covers everything.
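
FWIW, the same distinction can be spelled in Python's own hint syntax (a small
sketch; the function name is made up):

    from typing import Optional
    import re

    def find_version(text: str) -> Optional[re.Match]:
        """The search may fail, and Optional makes the possible None explicit."""
        return re.search(r"\d+\.\d+", text)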

------
catnaroek
> In simple terms, only 2.5% of call sites in most programs can't be handled
> by name-based typing with single inheritance. Adding parametric polymorphism
> (i.e., generics) only makes half a percent of difference, and most of the
> remaining cases can't be handled by widely-used mechanisms.

If parametric polymorphism only helps with 0.5% of all call sites, then
probably:

(0) You're relying too much on implementation details across module
boundaries. This destroys opportunities for type abstraction.

(1) You're unwittingly repeating the same logic over and over for multiple
types. Not likely the case in Python.

(2) You're relying too much on unenforced conventions.

------
p4wnc6
I don't understand the link's final remark:

> This doesn't mean that languages shouldn't include more complex type
> systems, but it does (or should) mean that the onus is on their designers to
> show that the complexity is worthwhile.

In general, the "complexity" of more advanced type systems is not presented to
the programmer, with C++ being a notable exception.

Instead, the programmer benefits from ease of expression of types _even when
they are mostly writing monomorphic functions_. For example, I would say that
over 95% of the benefit I've ever derived from Haskell's static typing system has
been due to the clarity of type expression and the codification of intentions
_in monomorphic functions_! All the extra stuff with advanced type class
features, higher order types, quantified types, etc., is nice and all, but it
probably has only ever mattered to me for at most 5% of the cases.

Regardless though, in those other 95% of monomorphic cases, the clarity of the
static type system, the way it has made me clarify my design and think about
the type constraints in function call chains, the way it has made me codify my
intentions for other developers to see, the way it has prevented silly bugs or
highlighted misconceptions I wouldn't have otherwise caught -- this has all
been very valuable, all without me ever having to really deal with any
"complexity" of the Haskell type system. As a mere Haskell user, I don't have
to fiddle with that. The existence of the fancier type options never gets in
my way if I don't need it for anything.

Now, I love Python and I'm not saying static typing is always better. I'm just
saying that if a language has a fancy and "complex" type system, that's not
the same thing as saying that a day-to-day programmer will ever have to
interact with that complexity in order to get benefits from it. They probably
won't. They'll get lots of valuable benefits more or less for free even
(perhaps _especially_ ) when their programs are mostly monomorphic.

------
Sonata
What features would a type system require to be able to soundly type check
100% of the patterns that are found in dynamically typed programs? Dependent
types? Refinement types? Is it possible?

~~~
btilly
It is impossible. Dynamically typed programs are able to look at the name of a
type, run arbitrary Turing-complete code reasoning about it, and then do
whatever they want with it.

For example a unit test framework might walk your class hierarchy, identify
all classes whose name matches a particular pattern, and then start doing
stuff with that.
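
Roughly this sort of thing (a simplified sketch of what unittest-style
discovery does):

    class TestBase:
        pass

    class TestMath(TestBase):
        def test_addition(self):
            assert 1 + 1 == 2

    def run_all_tests():
        # Walk the class hierarchy and pick methods by naming convention.
        # Which methods get called is decided by runtime reflection, not by types.
        for cls in TestBase.__subclasses__():
            instance = cls()
            for name in dir(instance):
                if name.startswith("test_"):
                    getattr(instance, name)()

    run_all_tests()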

Of course there is always a way to accomplish the same thing without abusing
the type system. But as soon as you do so, it is a different program.

~~~
lmm
Many type systems allow encoding arbitrary Turing complete code, so that
doesn't necessarily imply typing those programs correctly is impossible.

~~~
scott_s
That is not what btilly means. You are talking about using a type system to
write an arbitrary program. btilly is talking about using an arbitrary program
to specify the typing rules for a set of values. His assertion is correct: if
you can use an arbitrary program to determine what the types are in your
program, then figuring out the types of your program is undecidable.

Again, this is different from using a type system to write an arbitrary
program. But, it _is_ true that type systems which allow you to write
arbitrary programs _are_ undecidable. Because it's possible for such programs
written in the type system to never finish, they cannot be shown to be
well-typed. _Most_ programs will be okay, and can be verified to be well-typed,
but there still exist some that cannot be.

~~~
lmm
> That is not what btilly means. You are talking about using a type system to
> write an arbitrary program. btilly is talking about using an arbitrary
> program to specify the typing rules for a set of values. His assertion is
> correct: if you can use an arbitrary program to determine what the types are
> in your program, then figuring out the types of your program is undecidable.

Right, but my point is that it's not necessarily impossible to just express
the program that figures out the type in the type system.

~~~
scott_s
I'm not quite sure what you mean. Being able to "express the program" (which I
take to mean a human writes the program) is independent of being able to
statically verify what it does.

------
inglor
Not only does this make sense, Python offers pluggable type systems that let
you type some of your program, or all of it, as much as you'd like.

A similar trend emerged in JS land with TypeScript, which is gaining traction
(and flowtype to some extent, although that is seeing a lot less adoption).

------
m_mueller
Interesting that type parameters make only half a percent of difference. So for
the remaining 2-3% of Python code, the equivalent statically typed code would
get really hairy, I assume?

~~~
chubot
I think it's analogous to generics in Go. Go left out generics, which isn't a
problem for most code. But when you need them, it's super annoying not to have
them.

------
dang
Url changed from [http://neverworkintheory.org/2016/06/13/polymorphism-in-pyth...](http://neverworkintheory.org/2016/06/13/polymorphism-in-python.html), which points to this.

