Grumpy: Go running Python (googleblog.com)
1411 points by trotterdylan on Jan 4, 2017 | 451 comments



- Amusingly, it runs Python 2.7, even though this project started long after Python 3.x came out.

- It's a hard-code compiler, not an interpreter written in Go. That implies some restrictions, but the documentation doesn't say much about what they are. PyPy jumps through hoops to make all of Python's self modification at run-time features work, complicating PyPy enormously. Nobody uses that stuff in production code, and Google apparently dumped it.

- If Grumpy doesn't have a Global Interpreter Lock, it must have lower-level locking. Does every built-in data structure have a lock, or does the compiler have enough smarts to figure out what's shared across thread boundaries, or what?


> Amusingly, it runs Python 2.7, even though this project started long after Python 3.x came out.

Basically, we needed to support a large existing Python 2.7 codebase. See discussion here: https://github.com/google/grumpy/issues/1

> It's a hard-code compiler, not an interpreter written in Go. That implies some restrictions, but the documentation doesn't say much about what they are. PyPy jumps through hoops to make all of Python's self modification at run-time features work, complicating PyPy enormously. Nobody uses that stuff in production code, and Google apparently dumped it.

There are restrictions. I'll update the README to make note of them. Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.

> If Grumpy doesn't have a Global Interpreter Lock, it must have lower-level locking. Does every built-in data structure have a lock, or does the compiler have enough smarts to figure out what's shared across thread boundaries, or what?

It does fine-grained locking. Mutable data structures like lists and dicts do their own locking. Incidentally, this is one reason why supporting C extensions would be complicated.
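
For illustration, here is the idea rendered as a Python sketch (hypothetical; Grumpy's actual locking lives in the generated Go runtime, not in Python code like this):

  import threading

  class LockedList(object):
      # Conceptual sketch only: each mutable container carries its own
      # mutex, taken around every mutation.
      def __init__(self):
          self._lock = threading.Lock()
          self._items = []

      def append(self, item):
          with self._lock:
              self._items.append(item)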


> Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.

What about stuff like literal_eval? Or even just monkeypatching with name.__dict__[param] = value ?

> It does fine-grained locking. Mutable data structures like lists and dicts do their own locking. Incidentally, this is one reason why supporting C extensions would be complicated.

Would there be a succinct theoretical description of exactly how that's implemented anywhere? What about things like numpy arrays?


> > Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.

> What about stuff like literal_eval? Or even just monkeypatching with name.__dict__[param] = value ?

literal_eval could in principle be supported I think. name.__dict__[param] = value works as you'd expect:

  $ make run
  class A(object):
    pass
  a = A()
  a.__dict__['foo'] = 'bar'
  print a.foo
  bar
EDIT: fixed formatting


Hmm, numpy isn't pure python, is it? If I read correctly this only works with pure python.


By volume numpy is mostly assembler written to the Fortran ABI (it's a LAPACK/BLAS-etc wrapper).


NumPy is a library that provides typed multidimensional arrays and functions that run atop them. It does provide a built-in LAPACK/BLAS or can link externally to LAPACK/BLAS, but that's a side effect of providing typed arrays and is nowhere near the central purpose of the library.

Also, NumPy is implemented completely in C and Python, and makes extensive use of CPython extension hooks and knowledge of the CPython reference counting implementation, which is part of the reason why it is so hard to port to other implementations of Python.


Having typed arrays without efficient functions over them would be rather pointless.


Are you sure you aren't mistaking numpy for scipy?


numpy is the foundation of scipy.


Is there not a single namedtuple in the entire Google codebase? That's strange :o


Heh, I came across the namedtuple exec thing the other day when I was trying to get the collections module working :\

namedtuple will have to be implemented differently. I think it can be accomplished by defining the class with type()? Maybe with a metaclass...
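
A minimal sketch of that type()-based approach (hypothetical, not necessarily how Grumpy would implement it):

  from operator import itemgetter

  def make_namedtuple(typename, field_names):
      # Build a tuple subclass with named accessors via type(),
      # avoiding the exec'd source template collections.namedtuple uses.
      fields = tuple(field_names.split())

      def __new__(cls, *values):
          if len(values) != len(fields):
              raise TypeError('expected %d arguments' % len(fields))
          return tuple.__new__(cls, values)

      ns = {'__new__': __new__, '_fields': fields}
      for i, name in enumerate(fields):
          ns[name] = property(itemgetter(i))
      return type(typename, (tuple,), ns)

  Point = make_namedtuple('Point', 'x y')
  p = Point(1, 2)
  print p.x, p.y  # 1 2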


> I think it can be accomplished by defining the class with type()?

I've done it using more or less that method. The code is in the "coll" sub-package of my plib.stdlib project; the Python 2 version is here on bitbucket:

https://bitbucket.org/pdonis/plib-stdlib/src


You won't get exact compatibility, but a metaclass implementation would give almost all the features. I can't remember what exactly you give up, but I did that once and I lost some introspection friendliness.


Nevermind, all you need is type(). Metaclass unnecessary.


Are namedtuples that popular? They always felt awkward to me. For a temp variable with multiple values inside a loop, I use a normal tuple or a dict. For passing data around, a dict or a real class. I never got the huge win from namedtuples.


namedtuples are tuples, meaning they are stored efficiently and are immutable (thus they can also be used as dictionary keys). Unlike regular tuples, they can be accessed like a class/dictionary for readability, but require far fewer allocations (compared to a dict/class), so they're much faster. Also, as they are tuples, you get well-defined methods (printing, comparison, hashing, etc.) that you'd have to implement yourself for a dict/class.

If you like writing in functional style, namedtuples are much more natural than dict or classes, and more efficient to boot.
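
A quick illustration of those properties:

  from collections import namedtuple

  Point = namedtuple('Point', ['x', 'y'])
  p = Point(1, 2)
  print p.x, p[1]       # attribute and index access both work
  d = {p: 'origin-ish'} # hashable, so usable as a dict key
  print p == (1, 2)     # True: it is still just a tuple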


Attrs (https://attrs.readthedocs.io/) replaced namedtuple for us (and many others). It's slightly more verbose but allows all class goodness such as methods, attribute validation, etc.


Doesn't work for everything, but you can subclass a namedtuple:

  from collections import namedtuple
  
  class Foo(namedtuple("Foo", "a b c")):
      @property
      def sum(self):
          return self.a + self.b + self.c
  
  
  f = Foo(1,2,3)
  print f.sum


That doesn't look super awesome to me. I.e. classes or attrs both seem better.


Aaaaaarrrrrgggggh! I've had that particular itch for every one of my ten years with python, and at last I get to scratch it!

Thanks so much for bringing it up.


We use them extensively in our API client code to pass back immutable, well-defined data structures. Dictionaries and classes are mutable and then each layer of code tends to sloppily change them however is convenient, meaning the underlying data can end up being represented differently in different code flows.

Namedtuples are a way to preserve the data unless the consuming code _really_ wants to change it, which is sometimes legitimate.

I'm not totally sold, as in some cases dictionaries or classes would add nice value. But namedtuples have a rigidity that makes you think twice before tampering with retrieved data.


In every introductory Python course, tuples are presented as just immutable lists. However, a more accurate way of describing tuples is to think of them as records with no field names. When you see tuples as records, the fact that they are immutable makes sense, since the order and quantity of the items matter (they remain constant). Records usually have field names, and this is where namedtuples come in handy. It also helps to clarify what tuples are (see https://youtu.be/wf-BqAjZb8M?t=44m45s, just a 2-minute clip). If you are wondering why not just define a class, here are a couple of reasons:

1) You know beforehand that the number of items won't be modified and that the order matters, since you are handling records. So it is a simple way of enforcing that constraint.

2) Because they extend tuple, they are immutable too, and they don't store attributes in a per-instance __dict__; the field names are stored on the class, so if you have tons of instances you save a lot of space.

Why create a class if you probably just need read-only access? But what if you need some method? Then you can extend your namedtuple class and add the functionality you want. If, for example, you want to validate the field values when creating the namedtuple, you can define your own by overriding __new__ (a sketch follows). At that point it is worth taking a look at https://pypi.python.org/pypi/recordclass.
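
For example, a minimal sketch of that __new__ override:

  from collections import namedtuple

  class Point(namedtuple('Point', 'x y')):
      # Validate field values at construction time by overriding __new__.
      def __new__(cls, x, y):
          if x < 0 or y < 0:
              raise ValueError('coordinates must be non-negative')
          return super(Point, cls).__new__(cls, x, y)

  p = Point(1, 2)   # fine
  q = Point(-1, 2)  # raises ValueError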


(1) A grep can identify all the occurrences; a sed might even fix them. (translated to Google internal tools obviously)

(2) Apparently setting a __dict__ key works; they could be implemented like that.


There are some defined in built-in modules. Even in Python 2.7, where sys.version_info is a namedtuple.


Yeah, one of the motivations for adding namedtuple to stdlib was a drop-in compatible upgrade of existing interfaces returning tuples. Notable atrocities included `time.localtime()` returning a 9-tuple, and `os.stat()` returning a 10-tuple...
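
Both of those now return struct-sequence objects, so named access works alongside the old positional access:

  import os
  import time

  t = time.localtime()
  print t.tm_year, t[0]  # named and positional access both work

  st = os.stat('.')
  print st.st_mtime      # much clearer than st[8]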


> There are restrictions. I'll update the README to make note of them. Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.

I'm guessing pretty much the entire AST module is a no-go?


I think the CPython AST module is written as a C extension module so currently it's a no-go. I don't think there's a fundamental reason Grumpy couldn't run a pure Python AST module, though.


The ast module itself is in Python, but it imports the _ast module which is an extension module. This actually isn't that big of a deal, though, as the entire AST is defined in a DSL (see https://cpython-devguide.readthedocs.io/en/latest/compiler.h... for some details), so you just have to write some code to generate _ast in Python instead of C (which PyPy may have already done).


So I take it that means Grumpy can't run itself?


Correct, Grumpy cannot yet run Grumpy :)


> I'll update the README to make note of them.

I managed to run into 2 trying to build a 5-line program :-)

  $ cat t.py; ./tools/grumpc t.py  > t.go;go build t.go;echo '----';./t
  import sys
  print sys.stdin.readline()
  ----
  AttributeError: 'module' object has no attribute 'stdin'
  $

  $ cat t.py ;./tools/grumpc t.py
  c = {}
  top = sorted(c.items(), key=lambda (k,v): v)
  Traceback (most recent call last):
    File "./tools/grumpc", line 102, in <module>
      sys.exit(main(parser.parse_args()))
    File "./tools/grumpc", line 60, in main
      visitor.visit(mod)
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
      return visitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stmt.py", line 302, in visit_Module
      self._visit_each(node.body)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stmt.py", line 632, in _visit_each
      self.visit(node)
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
      return visitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stin visit_Assign
      with self.expr_visitor.visit(node.value) as value:
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
      return visitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 101, in visit_Call
      values.append((util.go_str(k.arg), self.visit(k.value)))
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
      return visitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 246, in visit_Lambda
      return self.visit_function_inline(func_node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 388, in visit_function_inline
      func_visitor = block.FunctionBlockVisitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/block.py", line 432, in __init__
      args = [a.id for a in node_args.args]
  AttributeError: 'Tuple' object has no attribute 'id'


Ugh, sorry about that. There's a couple issues here:

1. Lambda tuple args are not yet supported -- I actually didn't know that was a thing :\ -- https://github.com/google/grumpy/issues/17

2. Even if that worked properly, sorted() is not yet implemented: https://github.com/google/grumpy/issues/16


Yeah.. It also used to work with def, but it was removed in python3. You can do this in 2.7:

  def func((a,b)):
      return b

  mytuple = 1,2
  print func(mytuple)
in py3 you need

  def func(t):
      a,b = t
      return b
Not sure if

This is probably the cleaner way to write that:

  key=operator.itemgetter(1)
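
Spelled out, with illustrative values for the dict c from the earlier snippet:

  import operator

  c = {'a': 3, 'b': 1}
  top = sorted(c.items(), key=operator.itemgetter(1))
  print top  # [('b', 1), ('a', 3)]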


sorted() is widely used; adding it will extend coverage considerably.


> Basically, exec and eval don't work.

Couldn't they be supported with a slower runtime implementation? Either way, I still love the idea.


>- Amusingly, it runs Python 2.7, even though this project started long after Python 3.x came out.

Python 2.7 is what's running at Google. Not really surprising they're looking at this, considering the fast-approaching end of (core dev) support for Python 2.7.

Write an interpreter in another language and programmatically port modules to Go. Seems pretty sensible to me.


Given the failure of their unladen-swallow work to make it into CPython, I think Google is probably tired of trying to make Python faster. Some of their stated goals with Go were to be a faster, compiled Python, so this makes a lot of sense for their use case. They face the choice of fixing all their existing Python code to run in Python 3 (which won't make anything faster), or just porting everything to a different language. They chose the latter, and this lets them incrementally convert Python code to Go. I don't know that this makes sense for anyone but Google, just like Hack probably doesn't make sense for most PHP development that's not at Facebook.


> I don't know that this makes sense for anyone but Google

I'm not sure; as a Go developer, I kind of like the idea of having access to the Python library ecosystem from Go, without being forced to create an IPC bridge and building up the requisite release-management and deploy-time goop.

Plus, I'm just not a Python developer; in the case where the only library that exists to do something is written in Python, I'd much rather write Go that calls that Python library than Python that calls that Python library.


IIUC, it's not about accessing Python libs from Go. It's for accessing Go libs from your Python program, transpiling that Python code to Go source, and compiling it with the Go toolchain.

Eg: python code (from blog post)

  from __go__.net.http import ListenAndServe, RedirectHandler

  handler =  RedirectHandler('http://github.com/google/grumpy', 303)
  ListenAndServe('127.0.0.1:8080', handler)


sure but the reverse should be equally feasible. It's transpiling Python to Go, so theoretically we should be able to (eventually) "convert" Python libs to Go and call them from Go. A lot of utility libs are available in Python... the Go library ecosystem is relatively sparse


I'm not sure if that is already possible. Is it?


> I kind of like the idea of having access to the Python library ecosystem from Go

I'd like to see an example of this, because from the blog post I get the impression that this mostly allows accessing the Go ecosystem from Python, rather than the other way around. For example, how would Python classes be handled from Go?


Embed a CPython interpreter into the Go runtime?


Statically linked Python interpreter sounds pretty great to be honest.


> Python 2.7 is what's running at Google. Not really surprising they're looking at this considering the fast approaching end of (core dev) support for Python 2.7.

I'd prefer that all new Python tools that need to support 2.x also support 3.x. It's an additional development cost, but IMHO, a worthwhile investment in the future.


Well, the difference here is that Google seems to be looking at Golang as the future for their internal tooling currently implemented in Python 2.x, instead of Python 3.x. I'm curious to know how much additional work might be necessary for this to support 3.x, but it doesn't sound like that's part of their use case.


While python2 may be the past, I think that for many python3 is not the future.


Highly depends on the use case.

Python 3.5 with uvloop+sanic can be faster than node.js without any JIT:

https://github.com/channelcat/sanic#benchmarks


Still slower than OCaml, Haskell, Java or .NET.


Mmm, "faster than node.js" isn't a great benchmark out in the wide world. Although node.js being as fast as it is remains an astonishing thing.


Who cares about the speed of the interpreter? The interpreter's job is to orchestrate high-performance components written in some high-performance language. If your interpreter is dominating execution time, you should move some of your logic to native code.


Better not to use an interpreter in the first place, but rather a language with a REPL that allows compilation straight to native code.


Why? Native code is costly: machine code generally has a much bigger footprint than interpreter bytecode. In some cases, interpreted code can be faster due to cache effects and reduced IO load making it faster to be smaller.


I have yet to see such benefits in action.

The fact that Google has started this project to migrate away from Python to an AOT compiled language, shows where the performance wins are.


>I am yet to see such benefits in action.

Here you go:

https://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-...

https://morepypy.blogspot.com/2011/02/pypy-faster-than-c-on-...

For more examples, just search "pypy faster than c".

Also, here is an article from the Python wiki about why speed doesn't really matter a lot of the time:

https://wiki.python.org/moin/PythonSpeed

And, my own two cents:

Speed is relative. Does every piece of code need to be as performant as possible? No. I would argue that, in most cases, speed of development is far more important than speed of execution. This is, of course, not true for things like drivers or statistical analysis.

Writing a web application? Speed isn't that important as the whole process is i/o bound anyways.

Writing a machine learning algorithm? Depends.

Web scraping? I/O bound, speed not really important.

Image processing? Speed matters at least a little bit.

Writing networking glue for distributed systems? Speed probably doesn't matter.

It's all relative. If it needs to be fast, it needs to be fast. Most things don't really need to be fast. For the things that don't need to be fast, why build them with C/C++/Rust/Go when you could spend half the time building them in Python/Ruby/js/etc?


PyPy isn't an interpreter; you are just validating my assertion about JIT/AOT compilers.

I usually ignore it when talking about Python, because that is what the community does, by gathering around CPython.

> For the things that don't need to be fast, why build them with C/C++/Rust/Go when you could spend half the time building them in Python/Ruby/js/etc?

Because one can use languages like OCaml, Haskell, Lisp, Scheme, Racket, F#, C#,... thus having both the productivity of a REPL environment and the execution speed of native code.


>PyPy isn't an interpreter, you are just validating my assertion about JIT/AOT compilers.

I thought we were talking about Python implementations exclusively. My mistake.

>OCaml

I have no experience with OCaml, so I won't make any comments regarding its efficacy.

>Haskell

Well-thought-out language. I like the purity, but it's too academic for real-world use outside of scientific computing. FP isn't for everyone, and my personal belief regarding it is that it is better used as a tool alongside other paradigms than all by itself as the only paradigm.

>Lisp

Lisp is useful for a lot of things. It's also not very popular for new projects as far as I've seen. There are also a ton of different versions, so I don't know if "Lisp" is really a good descriptor.

>Scheme

As far as I know, Scheme is the de facto teaching language for most CS programs. Or, at least, it was for a long time. Once again, FP is not for everyone. A lot of people also dislike Lisp-style syntax, myself included.

>Racket

Same issues as Scheme.

>F#

F# is a fantastic language. There's not really a whole heck of a lot to complain about other than the .NET implications. The only detriment relative to Python is F#'s much smaller ecosystem and community.

>C#

Once again, some people just don't like .NET stuff. A lot of people also see static typing to be a detriment in many use cases.

Relative to Python, these languages also share several other problems when it comes to real-world application: lack of competent developers, stagnating ecosystems, lack of third-party libraries, ecosystem lock-in, and cross-platform compatibility issues. In the case of languages like Haskell, they could even be considered "esoteric".

I'll give you that many of these languages are more "pure" or "logical" than Python. I'll even give you that most of them are designed much better than Python. None of that changes the fact that Python is overall easier to read, easier to learn, easier to write, has a better ecosystem, is platform and file system agnostic, has a very non-restrictive license, and is, overall, very pleasant to work with.


Actually, they said they would keep programming in python.


Well... yes. That's basically my point. Comparing speed to node.js is pretty much useless because if you really care about speed you're not using node in the first place.


From Google's POV, Go is the future, not Python 3; they created it for this reason. As mainly a sysadmin these days, I tend to agree with them for their use case. For deployment, performance and overhead, Go is great for system tools, which is reflected in the new toys in the "sysops" toolbox. Pretty much every one of them is written in Go these days, where that used to be Python, or for a brief period Ruby, or C/C++ for the more performance-sensitive stuff.


Python3 was never the future of Python. It never got over the hump that all new languages need to clear to reach relevance.

It's likely code using lots of C-extensions will continue with CPython2 and new code will be written in Grumpy (pure Python2).


That is pretty ridiculous. Pretty much all major libraries are Python 3 compatible and everyone is writing Python 3 (or should be). (Yes, I'm still on Python 2 but moving soon).


My python course at uni focused on python3. We talked about differences and toyed with both interpreters, but ultimately wrote all projects and assignments in 3. I'm sure this is the case for students at other schools too.


I've been writing Python 3 for years.


That's top-down change from the PSF/core dev team, and some major library developers added dual 2/3 support. The users never arrived, and next December it will be 10 years that Python3 has been available. I'd like to unify Python again, but we as users didn't break it either. After 10 years, any reasonable person in charge would hang it up or change course.

Grumpy is pretty much what most everyone would actually want out of a new Python and may have arrived just in time.


> some major library developers added dual 2/3 support

> The users never arrived

That's the way it used to be a few years ago - it changed a lot the last few years. Pretty much all libraries are ported and many new libraries are Python 3-only.

asyncio is nice.

All Python devs I personally know moved to Python 3. Porting is a lot less painful than it used to be.


Gevent is nice. AsyncIO is terrible[0].

[0]http://lucumr.pocoo.org/2016/10/30/i-dont-understand-asyncio...


It's actually ridiculously nice, especially with the new async/await constructs.


Python 3 is largely cross-compatible with Python 2. If you're not working with unicode, not reliant on true (floating-point) division, and not using asyncio, then chances are what you produce will work just fine on Python 2.
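
For code that has to straddle both versions, the usual trick is the __future__ imports:

  from __future__ import division, print_function, unicode_literals

  print(7 / 2)   # 3.5: true division, as in Python 3
  print(7 // 2)  # 3: floor division in both versions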


>(Yes, I'm still on Python 2 but moving soon).

I'm like really new to programming and I'm still just learning the basics, but I see this little addendum a lot from people who say everyone should be writing Python 3.


Me too. I migrated a big project from 2 to 3 two years ago. And let's get this straight: innovation happens on Python 3. Staying on 2 would feel like riding a dead horse. I know I did the right thing.

And if Google can make an interpreter for 2, then sooner or later, one for 3 will pop up. Since Google made some restrictions on what Grumpy supports from python 2, I'm sure someone somewhere will be able to do the same stuff for three.


It's true, but moving our huge codebase to Python 3 is a big undertaking. We're making progress towards it by using Python 3 constructs for new files, etc. For my personal projects I'm already in the process of moving over.


So is it just a time issue, or are there compatibility reasons for why not all of you code is Python 3?

I ask because I started to learn to code with Python 2 because that's what was preloaded on my system. Is one over the other a big enough deal at a beginner level that I should switch to 3 now? How much of a learning curve am I in for?


> Is one over the other a big enough deal at a beginner level

No, at a beginner level it's not, there are many guides that explain the differences at a beginner level, and you can go through those in a few hours at most, for example http://python-future.org/compatible_idioms.html

But, if you start working now on a Python 2 project and that project starts growing significantly, then it will be hard to convert the codebase. That's why you can see people saying that they didn't switch yet, it's not that they don't know Python 3, it's that upgrading large legacy code bases is hard (not only in Python).


For me, the hardest part was migrating from Python 2 string to Python 3 Unicode string. But at the same time, this was a huge improvement for my code base because I work with several languages and unicode makes that much easier/safer. So it was a good thing.

Now, the rest of the upgrades were a bit painful (I was using some functional programming stuff, HTTP request libraries, etc.).


Okay. Thank you for the advice.

My biggest program is like 100 lines of code maybe. So I will go ahead and switch now. But it's like 10pm where I'm at, so here's hoping I don't play too much...


Just a time issue. I don't believe the learning curve is that big, most of it would be in how you deal with strings and the `print` statement, which isn't that heavily used in web apps.


What system is that? A Mac? If so, then sadly you do have to get py3 yourself. If a Linux distro, then you likely already have Python 3 preinstalled. `/usr/bin/python` won't point to python3 anytime in the near future.


Haha it's Ubuntu. After I downloaded Python 3.6 last night, I found out I had Python 3.4 already on my machine. I got excited about new software and forgot to check. It's a crutch :/


Here, I'll give you a counterexample:

We're on Python 3 for all new stuff, and are migrating the old whenever we can.


Python 2.7 is EOL in 2020. Make of that what you will :/


The community will fork the 2.7 codebase and continue to support it, even if Python.org EOLs it.


It looks like all Python 2.8 is missing is a new name:

https://www.naftaliharris.com/blog/why-making-python-2.8/


I can see "the community" doing security fixes, maybe some bug fixing and a few backports, but so far I haven't seen much effort from any community to bring active development of new features.


It seems like the 2.7 community is happy enough without the new features -- just need the bugfixes and continued backwards compatibility to keep that segment happy.

From a new features perspective, the other reply's Placeholder is fascinating. (I haven't looked into it thoroughly yet.)


It is certainly interesting if it truly materializes. But it looks like the plan is to mostly backport stuff from py3- so py3 is still the future, with Placeholder getting some of those features eventually. And that is great from a legacy codebase perspective.


Right.


That means nothing. Python 3 was also expected to be mainstream by 2015, but it's nowhere near even 30% yet.


It's mainstream for new projects. Which is what was expected and aimed for.


Citation needed. Most companies I know that have 2.7 older projects also do new projects in 2.7. They don't want to introduce 2 different versions of the language, set of dependencies etc to their production.


absolutely true. imagine porting a decade or more of code for only the promised benefit of a more "pure" language. asyncio is nice, but excluding it and enhanced generator syntax from 2.7+ is >policy<, not engineering. python 3 is bootheel-style top-down engineering >management<, not good engineering. The BDFL is fallible. Placeholder looks like a great Python 2.8+ to me. Runs all my old code and gives me new syntax, without rejiggering the stdlib for purity's sake? Twist my arm.


>and everyone is writing Python 3 (or should be)

You'd be surprised.

If anything, the numbers show the opposite. The vast majority of Python codebases, legacy or new, are 2.7 or older.


Which numbers? People keep saying things like that, but things like the Python 3 Wall of Superpowers https://python3wos.appspot.com/ don't seem to support it. Do you have something more concrete?


The "Wall" is just numbers for Python3, without context for how it compares to Python 2.

For 2.7 Pypy reported 419,227,040 downloads for 2016.

At the same time, for ALL 3.x versions combined (up to 3.6) there are just: ~52 million downloads.

That's 1/8th of the Python 2 downloads.


Given that there are only 7.5 billion humans on the planet, and that rather significantly fewer than 1 in 20 people are PyPy-using developers, perhaps those numbers should be taken with a grain of salt?

The message I would take from those statistics is that needing a fresh download of Pypy is less common among 3.x users than among 2.7 users, who apparently needed to reinstall from the web at least a few times a day during 2016.


> The message I would take from those statistics is that needing a fresh download of Pypy is less common among 3.x users than among 2.7 users, who apparently needed to reinstall from the web at least a few times a day during 2016.

Occam's Razor would suggest that there are fewer Python 3 users


It should be PyPI rather than PyPy in the parent, FWIW.


Oops, mea culpa.


>Given that there are only 7.5 billion humans on the planet, and that rather significantly fewer than 1 in 20 people are PyPy-using developers, perhaps those numbers should be taken with a grain of salt?

Those are not downloads of PyPi, but of packages. It's not like "number of downloads == number of individual developers". Those are packages, including package updates. A single developer can download 50 deps across his codebase, and update them to later versions 2-3 times a year.


> A single developer can download 50 deps across his codebase, and update them to later versions 2-3 times a year.

As if individual developers are the reason behind the bulk of the downloads. I wonder how many downloads Travis alone counts for?

Your hate of Python 3 in every discussion about it is frankly baffling.


>As if individual developers are the reason behind the bulk of the downloads. I wonder how many downloads Travis alone counts for?

Travis runs/tests user projects, so there's nothing about it that's especially partial to Python 2 over Python 3.

>Your hate of Python 3 in every discussion about it is frankly baffling.

Or, you know, my pragmatic assessment of its popularity.

That you'd even use the word "hate" (when in fact I like Python 3 over 2.7, even if it's mostly tame updates over what 2.7 offers) shows that you're probably too partisan. I was enthused about Python 3 even when it was only a vision called Python3K back in 2000-ish. My personal preference has nothing to do with whether I see more people using it or not.

The situation is not unlike the perennial "next year is when Linux dominates the desktop", which has been every year since 1999.


> even if its mostly tame updates over what 2.7 offers

> The situation is not unlike the perennial "next year is when Linux dominates the desktop", which has been every year since 1999.

Your bias is showing, as it does in every comment section on this site regarding Python 3, as you make comment after comment about how inferior Python 3 is and how nobody is using it at all because your sample of 2 companies shows this and how it personally hurt your family or whatever. You don't stop. Either you hate it or you hate something else and use Python 3 as a vent.


If by "your bias" your mean my assesment of the state of Python 3 vs 2, that doesn't change depending on whether I like the language or not, then we agree.

>as you make comment after comment about how inferior Python 3 is

Actually, I've never made any such comment. In fact, "tame updates" means that it IS an update over 2.x, only not as much of one as it could be. Which most people I've read agree with, or at least agreed with until the async stuff.

>and how nobody is using it at all because your sample of 2 companies shows this and how it personally hurt your family or whatever.

Notice how I never said that, but actually gave concrete numbers that put those using it at far fewer (down to 1/8) of those who use 2.x?

So why the lie? Fewer is not the same as "nobody at all", and it doesn't change just because you really, really wish more people used 3.

>You don't stop.

Yeah, I continue expressing my opinion and my argumentation. I should stop because you happen not to like it?

Please don't bring "the feelz" into technical and community discussions. It cheapens the argumentation. If anything, it's you who are biased: 80% of your submissions on HN are for Python stories.

One can acknowledge that D is way less popular than Golang or that Perl 6 failed to gain traction over 5, without hating Perl 6. Ditto for Python.


How can you accuse him of that? You're rebutting everyone with Python3 criticism, including myself. This is a prime example of projection. You're the one on a rampage, against Python2.

From what I've seen of his posts, he's only talking about the reality of the situation.. not "how it should be".

Go look at the stats on PyPI and other metrics. Python3 failed; there is a cutoff time for adoption. It's no different than the first 24 hours of a missing person report. You don't get eternity to see if something is going to pan out or not. We're past that point for Python3. It may survive as its own (smaller) thing, but Python2 isn't going to die either, and that's more assured than Python3's fate.

And coldtea is right, but we're not going to do your research for you. What I'm saying needed to be said to you, but you need to find better ways to contribute than just rebutting everyone who has something to say about Python3. Talking about how he hates "something else" and using Python3 as a vent is just ridiculous.


That response is talking about something else because your original comment accidentally said "PyPy", which is an implementation of Python, instead of "PyPI", the package repository.


A huge confounding factor: newer Py3 codebases are more likely to be built with newer pipeline tooling like devpi (to cache PyPI downloads), wheel (to cache locally-built packages), and Docker (which caches all the things).

Our legacy Python 2 build pipelines that we're actively moving off of hit PyPI far more often than our Py3 processes.


>A huge confounding factor: newer Py3 codebases are more likely to be built with newer pipeline tooling like devpi (to cache PyPI downloads), wheel (to cache locally-built packages), and Docker (which caches all the things).

Maybe in your case, but from what I've seen, I seriously doubt use of Docker or Devpi makes any dent in newer Py3 codebase dependency downloads. Besides, tons of new codebases for greenfield projects are still done in 2.x Python.

Not sure how it is in scientific computing area, but for enterprise/web apps, any company that has legacy 2.x code and libs in production (which is most of them) will continue to write new parts (including new projects) in 2.7 for compatibility with their Python production setup.

3.x is either from companies that didn't already have significant 2.x Python code in production (generally newer companies that for some reason went with Python instead of Node or Go that the cool kids use) or new programmers that just get started and start with 3.x.


The PyPy statistics aren't worth much since they're counting all sorts of automated downloads/dependencies/etc.

That's why packages like supervisor and graphite - which aren't libraries - are among the top downloads.


>The PyPy statistics aren't worth much since they're counting all sorts of automated downloads/dependencies/etc.

Those would exist for both 2.x and 3.x so it's not a differentiating factor.


There's plenty of 2.7 out there, but we're moving over slowly, basically due to the nice function annotation/type checking work. That's, to me, the first really compelling reason to use Python 3.


Hell, even OpenStack (which is giant tangled mess of code written by like a few dozen teams) is making good progress.


> - It's a hard-code compiler, not an interpreter written in Go. That implies some restrictions, but the documentation doesn't say much about what they are. PyPy jumps through hoops to make all of Python's self modification at run-time features work, complicating PyPy enormously. Nobody uses that stuff in production code, and Google apparently dumped it.

Does it really imply many restrictions? Common Lisp, for example, is probably more dynamic than Python and it's been a compiled language for ~20 years.


> ~20 years.

Common Lisp was designed for interpretation and compilation from day one. The first implementations from 1984/85 already had compilers.

> Common Lisp, for example, is probably more dynamic than Python

Some parts are more dynamic than Python, some not. For example everything that uses CLOS+MOP is probably more dynamic. Also some stuff one can do when using a Lisp interpreter may be more dynamic. CL is more static, where one uses non-extensible functions, type declarations, static compilation, inlining, ... The parts where a CL compiler achieves good runtime speed may not be very 'dynamic' anymore.


20 years only takes us back to 1997, several years after Common Lisp was finally standardized. The ancestral dialects of Common Lisp were compiled as far back as the early 60's.


I didn't mean to imply that Lisp was only 20 years old or that common lisp was precisely 20 years old (I used the `~` to indicate that Common Lisp was approximately 20 years old), I just wasn't sure whether Lisp has always been a compiled language, so I restricted my claim to being a claim about Common Lisp and estimated its age conservatively.


In regards to the third point: the global interpreter lock protects Python's GC scheme (reference counting), which is not thread safe. It does not coordinate accesses across threads, and therefore Grumpy's replacement would not either. In Grumpy, the GIL is replaced by Go's GC implementation, which is specifically tuned for multithreaded execution. Any additional synchronization would need to be done with individual locks, etc.


The GIL is not just for GC; it does coordinate access across threads, albeit at a very low level -- if two threads execute "mylist.append(v)" at the same time, it is the GIL that makes sure it actually works as expected, and from comments above it seems grumpy uses per-object locks for that.


That's not a good characterization of the GIL. It doesn't prevent races or make multithreaded mylist.append safe. It makes sure that mylist.append doesn't cause a segfault as the bytes in RAM are in an inconsistent state during an update. Beyond that, it doesn't really protect you from your bad threaded code.


(assuming mylist is a standard python list) it does prevent races inside mylist.append. It does make mylist.append safe.

When I wrote "append from two threads .. as expected" I meant "two items will be added, which one first is unspecified", and the GIL certainly takes care of that.

I agree it does not protect you from your bad threaded code - but then, nothing short of STM does (and even STM doesn't guarantee freedom from starvation in the general sense - nothing can).


An unintended side effect of the GIL is that all calls to a C implemented function are atomic and single threaded, provided that the C function doesn't release the GIL. In practice this means lists and dicts are thread safe and existing Python code relies on this.
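
A small illustration of code that implicitly relies on this CPython behavior:

  import threading

  items = []

  def worker():
      for _ in range(100000):
          items.append(1)  # atomic under CPython's GIL

  threads = [threading.Thread(target=worker) for _ in range(4)]
  for t in threads:
      t.start()
  for t in threads:
      t.join()
  print len(items)  # 400000: no appends were lost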


No support for 3 makes sense. This is all about building an off-ramp to put Python behind them.


I was in my second year of college when the Python 2 vs Python 3 debate had already been running for a couple of years. Is this fight -still- not resolved? I'm not a Python developer, so I'm out of the loop.


The arguing will continue for years after Python 2 is legitimately dead, but the shift has been happening and will continue to happen. Python3 is the future, and more and more new projects are being started with it, and more and more legacy 2.7 codebases are being moved to Python 3 or deprecated in favor of Python 3 replacements.


> Nobody uses that stuff in production code

Nobody uses the features of Python which make it a dynamic language? Google must write some really weird Python if their compiler is that strict.


In Python, you can get at a variable, or even code, in another thread with "getattr()". You can monkey-patch another thread while it is running. This is not very useful, but it's easy to implement in a naive interpreter such as CPython. Part of the price for this is the Global Interpreter Lock, so you don't really have two threads running at once. PyPy has a huge amount of machinery so that stuff will work.

Grumpy doesn't even seem to try to implement that. That's a good thing. If you restrict Python a little, it's much easier to compile.


> That's a good thing. If you restrict Python a little, it's much easier to compile.

Isn't that more or less what RPython does? https://rpython.readthedocs.io/en/latest/architecture.html I mean, I know that starting with a full-fledged(?) Py27 codebase rules out _actually_ using RPython for the stated goals of Grumpy, but I think the two projects agree in principle and differ about the definition of "restricted" :-)


RPython is a restricted (hence R) language specifically for VM development, it is not a general-purpose language.


>Nobody uses the features of Python which make it a dynamic language?

Python has TONS of dynamicity besides those (eval and co), which are seldom used by anyone anyway...

If you think eval is what makes Python dynamic you're doing it wrong...


I'm not sure what is meant by "dynamic language" in that sentence, but examining the compiler output, it supports the features of Python which I think of when I think of "dynamic language" (e.g., a class `Foo` gets compiled into a runtime `*Object` with collections of properties and methods, not into a `type Foo struct` with fixed fields and methods).


Anybody running Django uses this. It uses the pattern of specifying plugins as class paths in strings in the config, which are then looped over and instantiated at runtime.

Frameworks do lots of such dynamic tricks in order to provide nice DSLs for building apps.
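
A sketch of that dotted-path pattern (the module and class names here are made up, but Django does something like this for settings such as MIDDLEWARE_CLASSES):

  import importlib

  PLUGINS = ['myapp.middleware.AuthMiddleware']

  def load_class(dotted_path):
      # Resolve a class from a "package.module.ClassName" string at runtime.
      module_path, class_name = dotted_path.rsplit('.', 1)
      module = importlib.import_module(module_path)
      return getattr(module, class_name)

  plugins = [load_class(path)() for path in PLUGINS]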


'exec' and 'eval'? No it's not, the importlib machinery is used (which doesn't just eval(read('import.py')))


I have never used exec or eval in production Python, and I doubt I could get them past code review because of the possible security impact.


Their python is weird if it doesn't use eval?


or getattr/setattr, or dynamically building classes with type(), or probably more features I can't remember now.


> getattr/setattr, or dynamically building classes with type()

I think Grumpy handles both those things fine.
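
For reference, the kind of dynamism in question:

  class Config(object):
      pass

  c = Config()
  setattr(c, 'debug', True)         # dynamic attribute assignment
  print getattr(c, 'debug', False)  # True

  Point = type('Point', (object,), {'dims': 2})  # class built at runtime
  print Point().dims                # 2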


What does "hard-code compiler" mean?


It seems to be that it pesudo-transpiles python to go and compiles that down using a normal go toolchain.


What's the difference between transpiling and pesudo-transpiling? (Even if you meant pseudo-transpiling, I still don't know what the difference between that and transpiling is.)


That was a typo. I meant pseudo. I don't actually know that there is any difference in this case. I made that statement hastily. It transpiles, nothing pseudo about it.


Transpilers with no runtime.


It's written to run Python 2.7 because these problems are largely solved in Python 3, and need solving for people on Python 2.x versions.

"Upgrade to Python3" is the usual defense to that, but it's not really practical for large companies with software such as YouTube completely written in Python 2.x.


No, it's written to support 2.7 because the majority of Google's code they want to overhaul is in 2.7. I don't think much of this is fixed by Python 3; at the very least, the speed benefits you get don't even compare. See the graph in the OP comparing numbers of threads using CPython and Grumpy.


How is it not practical to upgrade to Python 3, yet it is practical to rewrite in Go?


As someone who works on both python and go day to day, I find this to be quite interesting.

Just tried this out on a reasonably complex project to see what it outputs. Looks like it only handles individual files and not any python imports in those files. So for now you have to manually convert each file in the project and put them into the correct location within the build/src/grumpy/lib directory to get your dependencies imported. Unless I missed something somewhere.. The documentation is a bit sparse.

Overall I think the project has a lot of potential and I'm hoping it continues to be actively developed to smooth out some of the rough edges.


Thanks for trying it out! And sorry about the lacking documentation. I'll be fleshing it out over the next little while.

Your assessment is right: the grumpc compiler takes a single Python file and spits out a Go package. Incidentally, this means you can import a Python module into Go code pretty easily.

I don't have a ready solution for building a large existing project but I'll write up a quick doc to outline the process. The trickiest bit is that the Python statement "import foo.bar" translates to a Go import: import "prefix/foo/bar". Currently prefix always points at the grumpc/lib directory so that's one way to integrate your code, but I need to make it more configurable.


I hope this is a well thought out solution that can evolve into something great... and not just something built for a single purpose.

I question the transpiler. I think I'd much rather prefer a solution like Jython.


I'm confused because Jython runs on the JVM, but Go is a compiled language. Can you clarify?


Jython is a Python interpreter written in Java. Grumpy is a Python transpiler that converts Python to native Go object code.

Edited to add: The difference is that Jython doesn't convert Python to JVM bytecode.


What's the advantage of writing an interpreter? Go already has an excellent runtime (scheduler, GC, etc)--why should this project reimplement it?


The interpreter could still use that stuff.

One advantage of an interpreter in general is that one important use case for Python is interactive scripting, as data scientists do.


Fair point. I would think it shouldn't require too much work to build a REPL on top of this. Rather than transpiling, you would parse the Python AST into the same runtime Objects that Grumpy constructs statically. Seems straightforward conceptually, though I'm sure it would be complex in practice.


I'm sure as this gets hacked on they'll be able to support consuming imports and doing all the conversion to go recursively

