Grumpy: Go running Python (googleblog.com)
1411 points by trotterdylan 287 days ago | hide | past | web | 451 comments | favorite



- Amusingly, it runs Python 2.7, even though this project started long after Python 3.x came out.

- It's a hard-code compiler, not an interpreter written in Go. That implies some restrictions, but the documentation doesn't say much about what they are. PyPy jumps through hoops to make all of Python's self modification at run-time features work, complicating PyPy enormously. Nobody uses that stuff in production code, and Google apparently dumped it.

- If Grumpy doesn't have a Global Interpreter Lock, it must have lower-level locking. Does every built-in data structure have a lock, or does the compiler have enough smarts to figure out what's shared across thread boundaries, or what?


> Amusingly, it runs Python 2.7, even though this project started long after Python 3.x came out.

Basically, we needed to support a large existing Python 2.7 codebase. See discussion here: https://github.com/google/grumpy/issues/1

> It's a hard-code compiler, not an interpreter written in Go. That implies some restrictions, but the documentation doesn't say much about what they are. PyPy jumps through hoops to make all of Python's self modification at run-time features work, complicating PyPy enormously. Nobody uses that stuff in production code, and Google apparently dumped it.

There are restrictions. I'll update the README to make note of them. Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.

> If Grumpy doesn't have a Global Interpreter Lock, it must have lower-level locking. Does every built-in data structure have a lock, or does the compiler have enough smarts to figure out what's shared across thread boundaries, or what?

It does fine grained locking. Mutable data structures like lists and dicts do their own locking. Incidentally, this is one reason why supporting C extensions would be complicated.
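The README doesn't spell out the mechanism, but the idea of fine-grained locking can be sketched in plain Python. This is only an illustration of per-object locking (it is not Grumpy's actual Go implementation, and `LockedList` is a made-up name): each mutable object carries its own lock instead of one global interpreter lock serializing everything.

```python
import threading

class LockedList:
    """Sketch: a list whose mutating operations each take a per-object
    lock, rather than relying on a global interpreter lock."""
    def __init__(self, items=()):
        self._lock = threading.Lock()
        self._items = list(items)

    def append(self, item):
        with self._lock:  # lock only this object, not the whole runtime
            self._items.append(item)

    def __len__(self):
        with self._lock:
            return len(self._items)

# Two threads appending concurrently never corrupt the list.
lst = LockedList()
threads = [threading.Thread(target=lambda: [lst.append(i) for i in range(1000)])
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(lst))  # 2000
```

The cost is a lock acquisition on every container operation, which is one reason this design trades single-threaded speed for parallelism.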


> Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.

What about stuff like literal_eval? Or even just monkeypatching with name.__dict__[param] = value ?

> It does fine grained locking. Mutable data structures like lists and dicts do their own locking. Incidentally, this is one reason why supporting C extensions would be complicated.

Would there be a succinct theoretical description of exactly how that's implemented anywhere? What about things like numpy arrays?


> > Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.

> What about stuff like literal_eval? Or even just monkeypatching with name.__dict__[param] = value ?

literal_eval could in principle be supported I think. name.__dict__[param] = value works as you'd expect:

  $ make run
  class A(object):
    pass
  a = A()
  a.__dict__['foo'] = 'bar'
  print a.foo
  bar
EDIT: fixed formatting
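For context on the literal_eval question: ast.literal_eval only accepts literal syntax (strings, numbers, tuples, lists, dicts, booleans, None), so it needs a parser but not a runtime code generator the way eval does, which is why supporting it is plausible. A quick demonstration (Python 3 print syntax):

```python
import ast

# Literals parse fine, no compiler machinery needed at run time.
print(ast.literal_eval("{'a': [1, 2, 3]}"))  # {'a': [1, 2, 3]}

# Anything beyond literals (calls, imports, attribute access) is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    print("rejected")
```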


Hmm, numpy isn't pure python, is it? If I read correctly this only works with pure python.


By volume numpy is mostly assembler written to the Fortran ABI (it's a LAPACK/BLAS-etc wrapper).


NumPy is a library that provides typed multidimensional arrays and functions that run atop them. It does provide a built-in LAPACK/BLAS or can link externally to LAPACK/BLAS, but that's a side effect of providing typed arrays and is nowhere near the central purpose of the library.

Also, NumPy is implemented completely in C and Python, and makes extensive use of CPython extension hooks and knowledge of the CPython reference counting implementation, which is part of the reason why it is so hard to port to other implementations of Python.


Having typed arrays without efficient functions over them would be rather pointless.


Are you sure you aren't mistaking numpy for scipy?


numpy is the foundation of scipy.


Is there not a single namedtuple in the entire Google codebase? That's strange :o


Heh, I came across the namedtuple exec thing the other day when I was trying to get the collections module working :\

namedtuple will have to be implemented differently. I think it can be accomplished by defining the class with type()? Maybe with a metaclass...


> I think it can be accomplished by defining the class with type()?

I've done it using more or less that method. The code is in the "coll" sub-package of my plib.stdlib project; the Python 2 version is here on bitbucket:

https://bitbucket.org/pdonis/plib-stdlib/src


You won't get exact compatibility, but a metaclass implementation would give almost all the features. I can't remember what exactly you give up, but I did that once and I lost some introspection friendliness.


Nevermind, all you need is type(). Metaclass unnecessary.
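A sketch of what that looks like: a namedtuple-like factory built with type() instead of the exec'd source template that CPython's collections module uses. `make_namedtuple` is a made-up name, and this skips extras like _replace and _asdict:

```python
from operator import itemgetter

def make_namedtuple(typename, field_names):
    fields = tuple(field_names.split())

    def __new__(cls, *args):
        if len(args) != len(fields):
            raise TypeError('expected %d arguments' % len(fields))
        return tuple.__new__(cls, args)

    namespace = {'__new__': __new__, '__slots__': (), '_fields': fields}
    for i, name in enumerate(fields):
        # each field name becomes a property reading the tuple by index
        namespace[name] = property(itemgetter(i))
    return type(typename, (tuple,), namespace)

Point = make_namedtuple('Point', 'x y')
p = Point(1, 2)
print(p.x, p.y)     # 1 2
print(p == (1, 2))  # True
```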


Are namedtuples that popular? They always felt awkward to me. If I need some temp variable with multiple values inside a loop, I either use a normal tuple or a dict. If I'm passing data around, a dict or a real class. I never got the huge win from namedtuple.


namedtuples are tuples, meaning they are stored efficiently and are constant (thus they can also be used as dictionary keys). Unlike regular tuples, they can be accessed like a class/dictionary for readability, but require far fewer allocations (compared to a dict/class), so they're much faster. Also, as they are tuples, you get well-defined methods (printing, comparison, hash value, etc.) that you'd have to implement yourself for a dict/class.

If you like writing in functional style, namedtuples are much more natural than dict or classes, and more efficient to boot.
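To make those claims concrete (Python 3 print syntax; the names here are made up for the example):

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

print(p.x)              # 1 -- readable access, unlike p[0]
print(p == (1, 2))      # True -- still an ordinary tuple underneath
d = {p: 'origin-ish'}   # hashable, so usable as a dict key
print(p._replace(x=5))  # Point(x=5, y=2) -- "changing" a field returns a new tuple
```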


Attrs (https://attrs.readthedocs.io/) replaced namedtuple for us (and many others). It's slightly more verbose but allows all class goodness such as methods, attribute validation, etc.


Doesn't work for everything, but you can subclass a namedtuple:

  from collections import namedtuple
  
  class Foo(namedtuple("Foo", "a b c")):
      @property
      def sum(self):
          return self.a + self.b + self.c
  
  
  f = Foo(1,2,3)
  print f.sum


That doesn't look super awesome to me. I.e. classes or attrs both seem better.


Aaaaaarrrrrgggggh! I've had that particular itch for every one of my ten years with python, and at last I get to scratch it!

Thanks so much for bringing it up.


We use them extensively in our API client code to pass back immutable, well-defined data structures. Dictionaries and classes are mutable and then each layer of code tends to sloppily change them however is convenient, meaning the underlying data can end up being represented differently in different code flows.

Namedtuples are a way to preserve the data unless the consuming code _really_ wants to change it, which is sometimes legitimate.

I'm not totally sold, as in some cases dictionaries or classes would add nice value. But namedtuples have a rigidity that makes you think twice before tampering with retrieved data.


In every introductory Python course tuples are presented as just immutable lists. However, a "more accurate" way of describing tuples is to think of them as records with no field names. When you see tuples as records, the fact that they are immutable makes sense, since the order and number of the items matters (it remains constant). Records usually have field names, and this is where namedtuples come in handy. It also helps to clarify what tuples really are (see https://youtu.be/wf-BqAjZb8M?t=44m45s, just a 2-minute clip). If you are wondering why not just define a class, I will give you a couple of reasons:

1) You know beforehand that the number of items won't be modified and that the order matters, since you are handling records. So it is a simple way of enforcing that constraint.

2) Because they extend tuple they are immutable too, and they don't store per-instance attributes in __dict__; field names are stored on the class, so if you have tons of instances you save a lot of space.

Why create a class if you probably just need read-only interaction? But what if you need some method? Then you can extend your namedtuple class and add the functionality you want. If, for example, you want to validate the field values when creating the namedtuple, you can create your own namedtuple subclass by overriding __new__. At that point it is worth taking a look at https://pypi.python.org/pypi/recordclass.
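A minimal sketch of that __new__ validation trick (Python 3 syntax; `Price` and its fields are invented for the example):

```python
from collections import namedtuple

class Price(namedtuple('Price', 'amount currency')):
    __slots__ = ()  # keep the no-__dict__ space savings

    def __new__(cls, amount, currency):
        # validate at construction time, then defer to the tuple
        if amount < 0:
            raise ValueError('amount must be non-negative')
        return super().__new__(cls, amount, currency)

print(Price(10, 'USD'))  # Price(amount=10, currency='USD')
```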


(1) A grep can identify all the occurrences; a sed might even fix them. (translated to Google internal tools obviously)

(2) Apparently setting a __dict__ key works; they could be implemented like that.


There are some defined in built-in modules. Even in Python 2.7, where sys.version_info is a namedtuple.


Yeah, one of the motivations for adding namedtuple to stdlib was a drop-in compatible upgrade of existing interfaces returning tuples. Notable atrocities included `time.localtime()` returning a 9-tuple, and `os.stat()` returning a 10-tuple...
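Both are struct sequences now, so old index-based callers keep working while new code gets field names:

```python
import os
import time

st = os.stat('.')
print(st.st_size == st[6])  # True: named access over the old 10-tuple index
t = time.localtime()
print(t.tm_year == t[0])    # True: same for the old 9-tuple
```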


> There are restrictions. I'll update the README to make note of them. Basically, exec and eval don't work. Since we don't use those in production code at Google, this seemed acceptable.

I'm guessing pretty much the entire AST module is a no-go?


I think the CPython AST module is written as a C extension module so currently it's a no-go. I don't think there's a fundamental reason Grumpy couldn't run a pure Python AST module, though.


The ast module itself is in Python, but it imports the _ast module which is an extension module. This actually isn't that big of a deal, though, as the entire AST is defined in a DSL (see https://cpython-devguide.readthedocs.io/en/latest/compiler.h... for some details), so you just have to write some code to generate _ast in Python instead of C (which PyPy may have already done).


So I take it that means Grumpy can't run itself?


Correct, Grumpy cannot yet run Grumpy :)


> I'll update the README to make note of them.

I managed to run into 2 of them trying to build a 5-line program :-)

  $ cat t.py; ./tools/grumpc t.py  > t.go;go build t.go;echo '----';./t
  import sys
  print sys.stdin.readline()
  ----
  AttributeError: 'module' object has no attribute 'stdin'
  $

  $ cat t.py ;./tools/grumpc t.py
  c = {}
  top = sorted(c.items(), key=lambda (k,v): v)
  Traceback (most recent call last):
    File "./tools/grumpc", line 102, in <module>
      sys.exit(main(parser.parse_args()))
    File "./tools/grumpc", line 60, in main
      visitor.visit(mod)
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
      return visitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stmt.py", line 302, in visit_Module
      self._visit_each(node.body)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stmt.py", line 632, in _visit_each
      self.visit(node)
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
      return visitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/stin visit_Assign
      with self.expr_visitor.visit(node.value) as value:
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
      return visitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 101, in visit_Call
      values.append((util.go_str(k.arg), self.visit(k.value)))
    File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ast.py", line 241, in visit
      return visitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 246, in visit_Lambda
      return self.visit_function_inline(func_node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/expr_visitor.py", line 388, in visit_function_inline
      func_visitor = block.FunctionBlockVisitor(node)
    File "/Users/foo/src/grumpy/build/lib/python2.7/site-packages/grumpy/compiler/block.py", line 432, in __init__
      args = [a.id for a in node_args.args]
  AttributeError: 'Tuple' object has no attribute 'id'


Ugh, sorry about that. There's a couple issues here:

1. Lambda tuple args are not yet supported -- I actually didn't know that was a thing :\ -- https://github.com/google/grumpy/issues/17

2. Even if that worked properly, sorted() is not yet implemented: https://github.com/google/grumpy/issues/16


Yeah... It also used to work with def, but it was removed in Python 3. You can do this in 2.7:

  def func((a,b)):
      return b

  mytuple = 1,2
  print func(mytuple)
in py3 you need

  def func(t):
      a,b = t
      return b
Not sure if

This is probably the cleaner way to write that:

  key=operator.itemgetter(1)
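Spelled out in full (with the hypothetical dict from the failing snippet), the itemgetter version avoids the tuple-parameter unpacking that Python 3, and apparently Grumpy, dropped:

```python
import operator

c = {'a': 3, 'b': 1}
# itemgetter(1) pulls the value out of each (key, value) pair,
# with no lambda (k, v): v tuple unpacking required.
top = sorted(c.items(), key=operator.itemgetter(1))
print(top)  # [('b', 1), ('a', 3)]
```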


sorted() is widely used; adding it will extend coverage considerably.


> Basically, exec and eval don't work.

Couldn't they be supported with a slower runtime implementation? I still love the idea either way.


>- Amusingly, it runs Python 2.7, even though this project started long after Python 3.x came out.

Python 2.7 is what's running at Google. Not really surprising they're looking at this considering the fast approaching end of (core dev) support for Python 2.7.

Write an interpreter in another language and programmatically port modules to Go. Seems pretty sensible to me.


Given the failure of their unladen-swallow work to make it into CPython, I think Google is probably tired of trying to make Python faster. Some of their stated goals with Go were to be a faster, compiled Python, so this makes a lot of sense for their use case. They face the choice of fixing all their existing Python code to run in Python 3 (which won't make anything faster), or just porting everything to a different language. They chose the latter, and this lets them incrementally convert Python code to Go. I don't know that this makes sense for anyone but Google, just like Hack probably doesn't make sense for most PHP development that's not at Facebook.


> I don't know that this makes sense for anyone but Google

I'm not sure; as a Go developer, I kind of like the idea of having access to the Python library ecosystem from Go, without being forced to create an IPC bridge and building up the requisite release-management and deploy-time goop.

Plus, I'm just not a Python developer; in the case where the only library that exists to do something is written in Python, I'd much rather write Go that calls that Python library than Python that calls that Python library.


IIUC, it's not about accessing Python libs from Go. It's for accessing Go libs from your Python program, transpiling that Python code to Go source, and compiling it with the Go toolchain.

Eg: python code (from blog post)

  from __go__.net.http import ListenAndServe, RedirectHandler

  handler = RedirectHandler('http://github.com/google/grumpy', 303)
  ListenAndServe('127.0.0.1:8080', handler)


sure but the reverse should be equally feasible. It's transpiling Python to Go, so theoretically we should be able to (eventually) "convert" Python libs to Go and call them from Go. A lot of utility libs are available in Python... the Go library ecosystem is relatively sparse


I'm not sure if that's already possible. Is it?


> I kind of like the idea of having access to the Python library ecosystem from Go

I'd like to see an example of this, because from the blog post I get the impression that this mostly allows accessing the Go ecosystem from Python, rather than the other way around. For example, how would Python classes be handled from Go?


Embed a CPython interpreter into the Go runtime?


Statically linked Python interpreter sounds pretty great to be honest.


> Python 2.7 is what's running at Google. Not really surprising they're looking at this considering the fast approaching end of (core dev) support for Python 2.7.

I'd prefer that all new Python tools that need to support 2.x also support 3.x. It's an additional development cost, but IMHO, a worthwhile investment in the future.


Well, the difference here is that Google seems to be looking at Golang as the future for their internal tooling currently implemented in Python 2.x, instead of Python 3.x. I'm curious to know how much additional work might be necessary for this to support 3.x, but it doesn't sound like that's part of their use case.


While python2 may be the past, I think that for many python3 is not the future.


Highly depends on the use case.

Python 3.5 with uvloop+sanic can be faster than node.js without any JIT:

https://github.com/channelcat/sanic#benchmarks


Still slower than OCaml, Haskell, Java or .NET.


Mmm, "faster than node.js" isn't a great benchmark out in the wide world. Although node.js being as fast as it is remains an astonishing thing.


Who cares about the speed of the interpreter? The interpreter's job is to orchestrate high-performance components written in some high-performance language. If your interpreter is dominating execution time, you should move some of your logic to native code.


Better not to use an interpreter in the first place, but rather a language with a REPL that allows compilation straight to native code.


Why? Native code is costly: machine code generally has a much bigger footprint than interpreter bytecode. In some cases, interpreted code can even be faster thanks to cache effects and reduced I/O load: being smaller can make it faster.


I have yet to see such benefits in action.

The fact that Google has started this project to migrate away from Python to an AOT compiled language, shows where the performance wins are.


>I am yet to see such benefits in action.

Here you go:

https://morepypy.blogspot.com/2011/08/pypy-is-faster-than-c-...

https://morepypy.blogspot.com/2011/02/pypy-faster-than-c-on-...

For more examples, just search "pypy faster than c".

Also, here is an article from the Python wiki about why speed doesn't really matter a lot of the time:

https://wiki.python.org/moin/PythonSpeed

And, my own two cents:

Speed is relative. Does every piece of code need to be as performant as possible? No. I would argue that, in most cases, speed of development is far more important than speed of execution. This is, of course, not true for things like drivers or statistical analysis.

Writing a web application? Speed isn't that important as the whole process is i/o bound anyways.

Writing a machine learning algorithm? Depends.

Web scraping? I/O bound, speed not really important.

Image processing? Speed matters at least a little bit.

Writing networking glue for distributed systems? Speed probably doesn't matter.

It's all relative. If it needs to be fast, it needs to be fast. Most things don't really need to be fast. For the things that don't need to be fast, why build them with C/C++/Rust/Go when you could spend half the time building them in Python/Ruby/js/etc?


PyPy isn't an interpreter, you are just validating my assertion about JIT/AOT compilers.

I usually ignore it when talking about Python, because that is what the community does, by gathering around CPython.

> For the things that don't need to be fast, why build them with C/C++/Rust/Go when you could spend half the time building them in Python/Ruby/js/etc?

Because one can use languages like OCaml, Haskell, Lisp, Scheme, Racket, F#, C#,... thus having both the productivity of a REPL environment and the execution speed of native code.


>PyPy isn't an interpreter, you are just validating my assertion about JIT/AOT compilers.

I thought we were talking about Python implementations exclusively. My mistake.

>OCaml

I have no experience with OCaml, so I won't make any comments regarding its efficacy.

>Haskell

Well-thought-out language. I like the purity, but it's too academic for real-world use outside of scientific computing. FP isn't for everyone, and my personal belief regarding it is that it is better used as a tool alongside other paradigms than all by itself as the only paradigm.

>Lisp

Lisp is useful for a lot of things. It's also not very popular for new projects as far as I've seen. There are also a ton of different versions, so I don't know if "Lisp" is really a good descriptor.

>Scheme

As far as I know, Scheme is the de facto teaching language for most compsci programs. Or, at least, it was for a long time. Once again, FP is not for everyone. A lot of people also dislike Lisp-style syntax, myself included.

>Racket

Same issues as Scheme.

>F#

F# is a fantastic language. There's not really a whole heck of a lot to complain about other than the .NET implications. The only detriment relative to Python is F#'s much smaller ecosystem and community.

>C#

Once again, some people just don't like .NET stuff. A lot of people also see static typing to be a detriment in many use cases.

Relative to Python, these languages also share several other problems when it comes to real-world application: lack of competent developers, stagnating ecosystems, lack of third-party libraries, ecosystem lock-in, and cross-platform compatibility issues. In the case of languages like Haskell, they could even be considered "esoteric".

I'll give you that many of these languages are more "pure" or "logical" than Python. I'll even give you that most of them are designed much better than Python. None of that changes the fact that Python is overall easier to read, easier to learn, easier to write, has a better ecosystem, is platform and file system agnostic, has a very non-restrictive license, and is, overall, very pleasant to work with.


Actually, they said they would keep programming in python.


Well... yes. That's basically my point. Comparing speed to node.js is pretty much useless because if you really care about speed you're not using node in the first place.


From Google's POV, Go is the future, not Python 3; they created it for this reason. As mainly a sysadmin these days, I tend to agree with them for their use case. For deployment, performance and overhead, Go is great for system tools, which is reflected in the new toys in the "sysops" toolbox. Pretty much every one of them is written in Go these days, where that used to be Python, or for a brief period Ruby, or C/C++ for the more performance-sensitive stuff.


Python 3 was never the future of Python. It never got over the hump all new languages need to clear to reach relevance.

It's likely code using lots of C-extensions will continue with CPython2 and new code will be written in Grumpy (pure Python2).


That is pretty ridiculous. Pretty much all major libraries are Python 3 compatible and everyone is writing Python 3 (or should be). (Yes, I'm still on Python 2 but moving soon).


>(Yes, I'm still on Python 2 but moving soon).

I'm like really new to programming and I'm still just learning the basics, but I see this little addendum a lot from people who say everyone should be writing Python 3.


Me too. I migrated a big project from 2 to 3 two years ago. And let's get this straight: innovation happens on Python 3. So staying on 2 would feel like riding a dead horse. I know I did the right thing.

And if Google can make an interpreter for 2, then sooner or later one for 3 will pop up. Since Google made some restrictions on what Grumpy supports from Python 2, I'm sure someone somewhere will be able to do the same for 3.


It's true, but moving our huge codebase to Python 3 is a big undertaking. We're making progress towards it by using Python 3 constructs for new files, etc. For my personal projects I'm already in the process of moving over.


So is it just a time issue, or are there compatibility reasons why not all of your code is Python 3?

I ask because I started to learn to code with Python 2 because that's what was preloaded on my system. Is one over the other a big enough deal at a beginner level that I should switch to 3 now? How much of a learning curve am I in for?


> Is one over the other a big enough deal at a beginner level

No, at a beginner level it's not, there are many guides that explain the differences at a beginner level, and you can go through those in a few hours at most, for example http://python-future.org/compatible_idioms.html

But, if you start working now on a Python 2 project and that project starts growing significantly, then it will be hard to convert the codebase. That's why you can see people saying that they didn't switch yet, it's not that they don't know Python 3, it's that upgrading large legacy code bases is hard (not only in Python).
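For a taste of what those compatible-idioms guides cover, here's the classic starting point, a minimal file that behaves the same under 2.7 and 3.x:

```python
# With these __future__ imports, print and division behave
# identically on Python 2.7 and Python 3.x.
from __future__ import print_function, division

print('print is a function now')
print(7 / 2)   # 3.5: true division on both versions
print(7 // 2)  # 3: explicit floor division
```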


For me, the hardest part was migrating from Python 2 string to Python 3 Unicode string. But at the same time, this was a huge improvement for my code base because I work with several languages and unicode makes that much easier/safer. So it was a good thing.

Now the rest of upgrades were a bit painful (I was using some functional programming stuff, httprequest libraries, etc.)


Okay. Thank you for the advice.

My biggest program is like 100 lines of code maybe. So I will go ahead and switch now. But it's like 10pm where I'm at, so here's hoping I don't play too much...


Just a time issue. I don't believe the learning curve is that big, most of it would be in how you deal with strings and the `print` statement, which isn't that heavily used in web apps.


What system is that? A Mac? If so, then sadly you do have to get py3 yourself. If a Linux distro, then you likely already have Python 3 preinstalled. `/usr/bin/python` won't point to python3 anytime in the near future.


Haha it's Ubuntu. After I downloaded Python 3.6 last night, I found out I had Python 3.4 already on my machine. I got excited about new software and forgot to check. It's a crutch :/


Here, I'll give you a counterexample:

We're on Python 3 for all new stuff, and are migrating the old whenever we can.


Python 2.7 is EOL in 2020. Make of that what you will :/


The community will fork the 2.7 codebase and continue to support it, even if Python.org EOLs it.


It looks like all Python 2.8 is missing is a new name:

https://www.naftaliharris.com/blog/why-making-python-2.8/


I can see "the community" doing security fixes, maybe some bug fixing and a few back ports, but so far I havn't seen much effort from any community to bring active development of new features.


It seems like the 2.7 community is happy enough without the new features -- just need the bugfixes and continued backwards compatibility to keep that segment happy.

From a new features perspective, the other reply's Placeholder is fascinating. (I haven't looked into it thoroughly yet.)


It is certainly interesting if it truly materializes. But it looks like the plan is to mostly backport stuff from py3- so py3 is still the future, with Placeholder getting some of those features eventually. And that is great from a legacy codebase perspective.


Right.


That means nothing. Python 3 was also expected to be mainstream by 2015, but it's nowhere near even 30% yet.


It's mainstream for new projects. Which is what was expected and aimed for.


Citation needed. Most companies I know that have 2.7 older projects also do new projects in 2.7. They don't want to introduce 2 different versions of the language, set of dependencies etc to their production.


absolutely true. imagine porting a decade or more of code for only the promised benefit of a more "pure" language. asyncio is nice, but excluding it and the enhanced generator syntax from 2.7+ is >policy<, not engineering. python 3 is bootheel-style top-down engineering >management<, not good engineering. The BDFL is fallible. Placeholder looks like a great Python 2.8+ to me. Runs all my old code and gives me new syntax, without rejiggering the stdlib for purity's sake? Twist my arm.


My python course at uni focused on python3. We talked about differences and toyed with both interpreters, but ultimately wrote all projects and assignments in 3. I'm sure this is the case for students at other schools too.


I've been writing Python 3 for years.


That's top-down change from the PSF/core dev team, and some major library developers added dual 2/3 support. The users never arrived, and next December it will be 10 years since Python 3 became available. I'd like to unify Python again, but we as users didn't break it either. After 10 years any reasonable person in charge would hang it up or change course.

Grumpy is pretty much what most everyone would actually want out of a new Python and may have arrived just in time.


> some major library developers added dual 2/3 support

> The users never arrived

That's the way it used to be a few years ago - it changed a lot the last few years. Pretty much all libraries are ported and many new libraries are Python 3-only.

asyncio is nice.

All Python devs I personally know moved to Python 3. Porting is a lot less painful than it used to be.


Gevent is nice. AsyncIO is terrible[0].

[0]http://lucumr.pocoo.org/2016/10/30/i-dont-understand-asyncio...


It's actually ridiculously nice, especially with the new async/await constructs.


Python 3 is largely cross-compatible with Python 2. If you're not working with unicode, not reliant on perfect floating point division, and not using asyncio, then chances are what you produce will work just fine on Python 2.


>and everyone is writing Python 3 (or should be)

You'd be surprised.

If anything, the numbers show the opposite. The vast majority of Python codebases, legacy or new, are 2.7 or older.


Which numbers? People keep saying things like that, but things like the Python 3 Wall of Superpowers https://python3wos.appspot.com/ don't seem to support it. Do you have something more concrete?


The "Wall" is just numbers for Python3, without context for how it compares to Python 2.

For 2.7 Pypy reported 419,227,040 downloads for 2016.

At the same time, for ALL 3.x versions combined (up to 3.6) there are just: ~52 million downloads.

That's 1/8th of the Python 2 downloads.


Given that there are only 7.5 billion humans on the planet, and that rather significantly fewer than 1 in 20 people are PyPy-using developers, perhaps those numbers should be taken with a grain of salt?

The message I would take from those statistics is that needing a fresh download of Pypy is less common among 3.x users than among 2.7 users, who apparently needed to reinstall from the web at least a few times a day during 2016.


> The message I would take from those statistics is that needing a fresh download of Pypy is less common among 3.x users than among 2.7 users, who apparently needed to reinstall from the web at least a few times a day during 2016.

Occam's Razor would suggest that there are fewer Python 3 users


It should be PyPI rather than PyPy in the parent, FWIW.


Oops, mea culpa.


>Given that there are only 7.5 billion humans on the planet, and that rather significantly fewer than 1 in 20 people are PyPy-using developers, perhaps those numbers should be taken with a grain of salt?

Those are not downloads of PyPi, but of packages. It's not like "number of downloads == number of individual developers". Those are packages, including package updates. A single developer can download 50 deps across his codebase, and update them to later versions 2-3 times a year.


> A single developer can download 50 deps across his codebase, and update them to later versions 2-3 times a year.

As if individual developers are the reason behind the bulk of the downloads. I wonder how many downloads Travis alone counts for?

Your hate of Python 3 in every discussion about it is frankly baffling.


>As if individual developers are the reason behind the bulk of the downloads. I wonder how many downloads Travis alone counts for?

Travis runs/tests user projects, so there's nothing about it that's especially partial to Python 2 over Python 3.

>Your hate of Python 3 in every discussion about it is frankly baffling.

Or, you know, my pragmatic assessment of its popularity.

That you'd even use the word "hate" (when in fact, I like Python 3 over 2.7, even if it's mostly tame updates over what 2.7 offers) shows that you're probably too partisan. I was enthused with Python 3 even when it was only a vision called Python 3000 back in 2000-ish. My personal preference has nothing to do with whether I see more people using it or not.

The situation is not unlike the perennial "next year is when Linux dominates the desktop", which has been every year since 1999.


> even if its mostly tame updates over what 2.7 offers

> The situation is not unlike the perennial "next year is when Linux dominates the desktop", which has been every year since 1999.

Your bias is showing, as it does in every comment section on this site regarding Python 3, as you make comment after comment about how inferior Python 3 is and how nobody is using it at all because your sample of 2 companies shows this and how it personally hurt your family or whatever. You don't stop. Either you hate it or you hate something else and use Python 3 as a vent.


If by "your bias" you mean my assessment of the state of Python 3 vs 2, which doesn't change depending on whether I like the language or not, then we agree.

>as you make comment after comment about how inferior Python 3 is

Actually I've never made any such comment. In fact, "tame updates" means that IT IS an update over 2.x, only not as big a one as it could have been. Most people I've read agree, or at least agreed until the async stuff.

>and how nobody is using it at all because your sample of 2 companies shows this and how it personally hurt your family or whatever.

Notice how I never said that, but actually gave concrete numbers that place those using it at a much smaller fraction (as little as 1/8) of those who use 2.x?

So why the lie? Less is not the same as "nobody at all", and it doesn't change by itself just because you really, really wish more people used 3.

>You don't stop.

Yeah, I continue expressing my opinion and my argumentation. I should stop because you happen not to like it?

Please don't bring "the feelz" into technical and community discussions. It cheapens the argumentation. If anything, it's you who are biased: 80% of your submissions on HN are for Python stories.

One can acknowledge that D is way less popular than Golang or that Perl 6 failed to gain traction over 5, without hating Perl 6. Ditto for Python.


How can you accuse him of that? You're rebutting everyone with Python3 criticism, including myself. This is a prime example of projection. You're the one on a rampage, against Python2.

From what I've seen of his posts, he's only talking about the reality of the situation.. not "how it should be".

Go look at the stats on PyPI and other metrics. Python3 failed; there is a cutoff time for adoption. It's no different than the first 24 hours of a missing person report. You don't get eternity to see if something is going to pan out or not. We're past that point for Python3. It may survive as its own (smaller) thing, but Python2 isn't going to die either, and that's more assured than Python3's fate.

And coldtea is right, but we're not going to do your research for you. What I'm saying needed to be said to you, but you need to find better ways to contribute than just rebutting everyone who has something to say about Python3. Talking about how he hates "something else" and using Python3 as a vent is just ridiculous.


That response is talking about something else because your original comment accidentally said "PyPy", which is an implementation of Python, instead of "PyPI", the package repository.


A huge confounding factor: newer Py3 codebases are more likely to be built with newer pipeline tooling like devpi (to cache PyPI downloads), wheel (to cache locally-built packages), and Docker (which caches all the things).

Our legacy Python 2 build pipelines that we're actively moving off of hit PyPI far more often than our Py3 processes.


>A huge confounding factor: newer Py3 codebases are more likely to be built with newer pipeline tooling like devpi (to cache PyPI downloads), wheel (to cache locally-built packages), and Docker (which caches all the things).

Maybe in your case, but from what I've seen, I seriously doubt use of Docker or Devpi makes any dent in newer Py3 codebase dependency downloads. Besides, tons of new codebases for greenfield projects are still done in 2.x Python.

Not sure how it is in scientific computing area, but for enterprise/web apps, any company that has legacy 2.x code and libs in production (which is most of them) will continue to write new parts (including new projects) in 2.7 for compatibility with their Python production setup.

3.x is either from companies that didn't already have significant 2.x Python code in production (generally newer companies that for some reason went with Python instead of Node or Go that the cool kids use) or new programmers that just get started and start with 3.x.


The PyPI statistics aren't worth much since they're counting all sorts of automated downloads/dependencies/etc.

That's why packages like supervisor and graphite - which aren't libraries - are among the top downloads.


>The PyPI statistics aren't worth much since they're counting all sorts of automated downloads/dependencies/etc.

Those would exist for both 2.x and 3.x so it's not a differentiating factor.


There's plenty of 2.7 out there, but we're moving over slowly, basically due to the nice function annotation/type checking work. That's to me the first really compelling reason to use the latter.


Hell, even OpenStack (which is giant tangled mess of code written by like a few dozen teams) is making good progress.


> - It's a hard-code compiler, not an interpreter written in Go. That implies some restrictions, but the documentation doesn't say much about what they are. PyPy jumps through hoops to make all of Python's self modification at run-time features work, complicating PyPy enormously. Nobody uses that stuff in production code, and Google apparently dumped it.

Does it really imply many restrictions? Common Lisp, for example, is probably more dynamic than Python and it's been a compiled language for ~20 years.


> ~20 years.

Common Lisp was designed for interpretation and compilation from day one. The first implementations from 1984/85 had already compilers.

> Common Lisp, for example, is probably more dynamic than Python

Some parts are more dynamic than Python, some not. For example everything that uses CLOS+MOP is probably more dynamic. Also some stuff one can do when using a Lisp interpreter may be more dynamic. CL is more static, where one uses non-extensible functions, type declarations, static compilation, inlining, ... The parts where a CL compiler achieves good runtime speed may not be very 'dynamic' anymore.


20 years only takes us back to 1997, several years after Common Lisp was finally standardized. The ancestral dialects of Common Lisp were compiled as far back as the early 60's.


I didn't mean to imply that Lisp was only 20 years old or that common lisp was precisely 20 years old (I used the `~` to indicate that Common Lisp was approximately 20 years old), I just wasn't sure whether Lisp has always been a compiled language, so I restricted my claim to being a claim about Common Lisp and estimated its age conservatively.


In regards to the third point: the global interpreter lock exists because Python's GC scheme (reference counting) is not thread safe. It does not coordinate accesses across threads, and therefore Grumpy's runtime would not need to either. In Grumpy the GIL is replaced by Go's GC implementation, which is specifically tuned for multithreaded execution. Any additional synchronization would need to be done with individual locks etc...


The GIL is not just for GC; it does coordinate access across threads, albeit at a very low level -- if two threads execute "mylist.append(v)" at the same time, it is the GIL that makes sure it actually works as expected, and from comments above it seems grumpy uses per-object locks for that.


That's not a good characterization of the GIL. It doesn't prevent races or make multithreaded mylist.append safe. It makes sure that mylist.append doesn't cause a segfault as the bytes in RAM are in an inconsistent state during an update. Beyond that, it doesn't really protect you from your bad threaded code.


(assuming mylist is a standard python list) it does prevent races inside mylist.append. It does make mylist.append safe.

When I wrote "append from two threads .. as expected" I meant "two items will be added, which one first is unspecified", and the GIL certainly takes care of that.

I agree it does not protect you from your bad threaded code - but then, nothing short of STM does (and even STM doesn't guarantee freedom from starvation in the general sense - nothing can).


An unintended side effect of the GIL is that all calls to a C implemented function are atomic and single threaded, provided that the C function doesn't release the GIL. In practice this means lists and dicts are thread safe and existing Python code relies on this.
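The atomicity the parent describes is easy to observe in CPython. A minimal sketch (the thread and iteration counts are arbitrary): several threads hammer one shared list with no explicit lock, yet no appends are lost, because each `list.append` is a single C call executed while holding the GIL.

```python
import threading

items = []  # shared, unlocked list

def worker():
    for _ in range(10000):
        items.append(1)  # one atomic C call under the GIL

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(items))  # 40000 -- no lost updates despite no explicit lock
```

Note this only covers single operations; a read-modify-write like `items[0] += 1` spans multiple bytecodes and is not atomic.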


No support for 3 makes sense. This is all about building an off-ramp to put Python behind them.


I was in my second year of college and the Python 2 vs Python 3 fight had already been running for a couple of years. Is this fight -still- not resolved? I'm not a Python developer so I'm out of the loop.


The arguing will continue for years after Python 2 is legitimately dead, but the shift has been happening and will continue to happen. Python3 is the future, and more and more new projects are being started with it, and more and more legacy 2.7 codebases are being moved to Python 3 or deprecated in favor of Python 3 replacements.


> Nobody uses that stuff in production code

Nobody uses the features of Python which make it a dynamic language? Google must write some really weird Python if their compiler is that strict.


In Python, you can get at a variable, or even code, in another thread with "getattr()". You can monkey-patch another thread while it is running. This is not very useful, but it's easy to implement in a naive interpreter such as CPython. Part of the price for this is the Global Interpreter Lock, so you don't really have two threads running at once. PyPy has a huge amount of machinery so that stuff will work.

Grumpy doesn't even seem to try to implement that. That's a good thing. If you restrict Python a little, it's much easier to compile.
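The cross-thread monkey-patching described above can be sketched like this (a hypothetical demo, not from the thread; the two events exist only to make the interleaving deterministic):

```python
import threading

class Worker(object):
    def step(self):
        return "original"

w = Worker()
results = []
first_call_done = threading.Event()
patch_applied = threading.Event()

def loop():
    results.append(w.step())   # before the patch
    first_call_done.set()
    patch_applied.wait()
    results.append(w.step())   # picks up the patched method

t = threading.Thread(target=loop)
t.start()
first_call_done.wait()
Worker.step = lambda self: "patched"   # monkey-patch while t is running
patch_applied.set()
t.join()
print(results)  # ['original', 'patched']
```

The running thread sees the new method immediately because method lookup happens dynamically on every call, which is exactly the flexibility a naive interpreter gets almost for free and a compiler has to work for.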


> That's a good thing. If you restrict Python a little, it's much easier to compile.

Isn't that more or less what RPython does? https://rpython.readthedocs.io/en/latest/architecture.html I mean, I know that starting with a full-fledged(?) Py27 codebase rules out _actually_ using RPython for the stated goals of Grumpy, but I think the two projects agree in principle and differ about the definition of "restricted" :-)


RPython is a restricted (hence R) language specifically for VM development, it is not a general-purpose language.


>Nobody uses the features of Python which make it a dynamic language?

Python has TONS of dynamicity besides those (eval and co), which are seldom used by anyone anyway....

If you think eval is what makes Python dynamic you're doing it wrong...


I'm not sure what is meant by "dynamic language" in that sentence, but examining the compiler output, it supports the features of Python which I think of when I think of "dynamic language" (e.g., a class `Foo` gets compiled into a runtime `*Object` with collections of properties and methods, not into a `type Foo struct` with fixed fields and methods).


Anybody running Django uses this. It uses the pattern of specifying plugins as class paths in strings in the config, which are then looped over and instantiated at runtime.

Frameworks do lots of such dynamic tricks in order to provide nice DSLs for building apps.
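The pattern looks roughly like this (a hedged sketch; `load_class` is a made-up helper name, though Django ships something similar as `django.utils.module_loading.import_string`):

```python
import importlib

def load_class(dotted_path):
    # "pkg.module.ClassName" -> the class object itself
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)

# stand-in for a framework instantiating plugins listed in a settings file
PLUGINS = ["decimal.Decimal", "fractions.Fraction"]
classes = [load_class(p) for p in PLUGINS]
print(classes[0]("1.5") + classes[0]("2.5"))  # 4.0
```

Nothing here needs exec or eval, but it does need module loading and attribute lookup to work at runtime, which is why a compiler can drop eval yet must keep the rest of the dynamic object model.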


'exec' and 'eval'? No it's not, the importlib machinery is used (which doesn't just eval(read('import.py')))


I have never used exec or eval in production Python, and I doubt I could get them past code review because of the possible security impact.


Their python is weird if it doesn't use eval?


or getattr/setattr, or dynamically building classes with type(), or probably more features I can't remember now.


  > getattr/setattr, or dynamically building classes with type()
I think Grumpy handles both those things fine.
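For the record, the features the parent lists in one small (hypothetical) demo: runtime attribute access with getattr/setattr, and building a class on the fly with three-argument `type()`.

```python
class Config(object):
    pass

c = Config()
setattr(c, "debug", True)          # attribute invented at runtime
assert getattr(c, "debug") is True

# three-argument type() builds a class at runtime from a name,
# a bases tuple, and a namespace dict
Point = type("Point", (object,), {"dims": 2, "label": lambda self: "pt"})
p = Point()
print(Point.__name__, p.dims, p.label())  # Point 2 pt
```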


What does "hard-code compiler" mean?


It seems to be that it pesudo-transpiles python to go and compiles that down using a normal go toolchain.


What's the difference between transpiling and pesudo-transpiling? (Even if you meant pseudo-transpiling, I still don't know what the difference between that and transpiling is.)


That was a typo. I meant pseudo. I don't actually know that there is any difference in this case. I made that statement hastily. It transpiles, nothing pseudo about it.


Transpilers with no runtime.


It's written to run Python 2.7 because these problems are largely solved in Python 3, but still need solving for people on Python 2.x versions.

"Upgrade to Python3" is the usual defense to that, but it's not really practical for large companies with software such as YouTube completely written in Python 2.x.


No, it's written to support 2.7 because the majority of the Google code they want to overhaul is in 2.7. I don't think much of this is fixed by Python 3; at the very least the speed benefits you get don't even compare. See the graph in the OP comparing # of threads using CPython and Grumpy.


How is it not practical to upgrade to Python 3, yet it is practical to rewrite in Go?


As someone who works on both python and go day to day, I find this to be quite interesting.

Just tried this out on a reasonably complex project to see what it outputs. Looks like it only handles individual files and not any python imports in those files. So for now you have to manually convert each file in the project and put them into the correct location within the build/src/grumpy/lib directory to get your dependencies imported. Unless I missed something somewhere.. The documentation is a bit sparse.

Overall I think the project has a lot of potential and I'm hoping it continues to be actively developed to smooth out some of the rough edges.


Thanks for trying it out! And sorry about the lacking documentation. I'll be fleshing it out over the next little while.

Your assessment is right: the grumpc compiler takes a single Python file and spits out a Go package. Incidentally, this means you can import a Python module into Go code pretty easily.

I don't have a ready solution for building a large existing project but I'll write up a quick doc to outline the process. The trickiest bit is that the Python statement "import foo.bar" translates to a Go import: import "prefix/foo/bar". Currently prefix always points at the grumpc/lib directory so that's one way to integrate your code, but I need to make it more configurable.


I hope this is a well thought out solution that can evolve into something great... and not just something built for a single purpose.

I question the transpiler. I think I'd much rather prefer a solution like Jython.


I'm confused because Jython runs on the JVM, but Go is a compiled language. Can you clarify?


Jython is a Python interpreter written in Java. Grumpy is a transpiler that converts Python to native Go code.

Edited to add: The difference is that Jython doesn't convert Python to JVM bytecode.


What's the advantage of writing an interpreter? Go already has an excellent runtime (scheduler, GC, etc)--why should this project reimplement it?


The interpreter could still use that stuff.

One advantage of an interpreter in general is that one important use case for Python is interactive scripting, of the kind data scientists do.


Fair point. I would think it shouldn't require too much work to build a REPL on top of this. Rather than transpiling, you would parse the Python AST into the same runtime Objects that Grumpy constructs statically. Seems straightforward conceptually, though I'm sure it would be complex in practice.


I'm sure as this gets hacked on they'll be able to support consuming imports and doing all the conversion to go recursively


For those who are interested, I've used grumpy to compile the following Python code and placed it at https://play.golang.org/p/YP1SP7WsdR . (Note the playground can't run this, it just had convenient formatting support for Go; the generated source wasn't 100% gofmt compliant.)

    class Test(object):
        def __init__(self, value):
            self.value = value

        def method(self):
            print(self.value)

    class Test2(Test):
        pass

    t = Test("hello")

    t.method()
Pythonistas, note I had to have "class Test(object):" and not just "class Test:". The former compiled successfully into a Go program but that program then failed at runtime with "TypeError: class must have base classes".


Thanks for trying it out!

Yeah, Grumpy does not currently support old-style classes. Since all of our code internally requires new-style classes, this was not a high priority feature. It is something that we'll get to.


I did not mean that as a complaint against your very young codebase, I meant that as a defense against Python people complaining about my code. :)


I don't think they'd do that. All Python 2 code I've seen uses `class Foo(object)`, at least since 2.2 came out.


That's fascinating. It's creating run-time data structures similar to CPython's for data, and manipulating them with very general code. There seems to be a type comparable to Python's internal CObject, and it's used for most (all?) data. It's not generating Go that looks anything like human-written Go. There's no sign of type inference, although it's hard to tell from such a simple example. It's a lot like a Python run time environment, where everything is a CObject. Still, once you can do that, you can start optimizing, such as inferring that something is an integer and using ordinary Go arithmetic types.

All that stuff with "switch" seems to be to handle Python exceptions in a language that doesn't have exceptions. Maybe later, analysis can tell that some function can't raise an exception, and translated calls for such functions can be simpler.


> Still, once you can do that, you can start optimizing, such as inferring that something is an integer and using ordinary Go arithmetic types.

I was hoping for something more aggressive even, like compiling Python classes to Go structs so long as the program doesn't need the dynamic behavior. Alternatively, Grumpy could support declaring native Go types via some sort of pragma or a new `struct` keyword or some such, which would be treated like a normal Go object (rather than defining your Go objects in a separate Go package).


I'd expect to see that in time. If you analyze the whole program to find all the fields of an object and verify the absence of code which dynamically adds a field, you can then make it a struct of "CObject" like entries. Then, try type inference on the fields. Some will clearly be integers, booleans, floats, or strings. Those can be represented with type-specific representations.

If you can identify the built-in types, that's most of the potential win; you get to do hardware arithmetic. If you represent integers as 64 bits and check for overflow, you probably don't need bignum promotion outside of crypto code.


Are Go's integers unbounded? If not, proving the value never exceeds the range, in order to silently convert what is essentially a BigInteger into an int, might be hard.


Go tends to use machine sizes like C, but this implementation appears to properly handle it. The following Python code has identical output for me under Python and grumpy:

    two_32 = 4294967296
    print(two_32 * two_32)
    print(type(two_32 * two_32))
And I tested some other things I won't burden HN with, but promotion is implemented, yes.


Interesting to note that `print()` is supported out of the box--no need to `from __future__ import print_function`.


Sure, but that works without `from __future__ import print_function` in Python 2 as well.

But it gives different output; the print() prints a tuple whereas the function print() prints a newline.

Also compare print(1) and print(1,2) with and without the __future__ import.


Oh, weird. I could have sworn I've seen errors for using parens with `print` in Python 2...


That's a print statement with an expression surrounded by parentheses. You can try it yourself in a Python 2 interpreter.


I can't help but see this balkanization of Python as a sign that the core language is falling apart.

How many interpreters are there now? And how many of them have even close to 100% compatibility with Python 2.7 or 3.N? Guido has lost control of the language, but as he's still officially the BDFL there's no real standardization body. His stubborn views on functional mechanisms have held the language back syntactically, and Python 3 broke backwards compatibility without fixing the language's fundamental problems... it really feels like Python is lost in the desert.

Which doesn't mean the language is dead, but it's rudderless. I think we were all hopeful when Guido joined Google that we'd see real direction for Python, but that obviously didn't happen.

Not that Python is dead, obviously - still lots of great projects are written in Python. But I don't like the language's future.


Why do you think multiple implementations are a problem? We've got multiple ruby runtimes, multiple basics, multiple JavaScript engines, multiple C compilers, etc. None of those languages are falling apart because of it.


It's not that different from what's happening with Java (via Android) or Go (with GopherJS, for example), or even C (many, many extensions). A popular language attracts implementors, even if they don't implement the whole thing on their target platform.


Actually, historical evidence suggests the opposite of your argument: multiple implementations of a language typically mean it is succeeding, not failing.


Really then, it suggests that Python 2 is succeeding, not Python 3.


GVR, the PSF and the core dev team overestimated their influence. They still truly believe the majority will come around to Python3. I agree with your sentiments and balkanization is the right word. Guido won't even read these comments. He thinks it's all some unjust slander and nonsense that will be forgotten in 3 years. :)

It is a good time to jump off the Python train in general, and I say that as someone invested in Python who loves it. If possible I'd recommend people reach for Go or Elixir depending on their needs or requirements.

I will admit I'm a little shocked how much of a failure Python3 adoption has been. I think if it had been Grumpy from the start it would've been a huge success. This is exactly what people want and Google should be commended for sharing this.

Here's to hoping Grumpy takes on a life of its own and is the new de facto Python.


Both my workplace and the one big open source project I use, homeassistant, use Python 3. Do you have any data to back up Python 3 adoption being a failure? I know it was certainly painful for many years.

As a Python 3 user everything seems fine on my end. Though 3 has its own new warts. They are smaller and more forgivable warts for now, but it's probably not a good sign.

I do agree the direction Python is heading is not very interesting anymore, but that doesn't mean it's dead or useless now.


We try to use Python 3 at work but have to run Python 2 as well, because there are still packages that weren't upgraded and it's too much work re-writing them all for no direct benefit.

We also have third party vendors that only support Python 3 in experimental versions, and even those aren't recent versions (Bloomberg is a great example).

I really like Python3 features, but the pain of using them drives me towards using other languages. I hope Julia will be stable and mature enough soon so that I can dump Python all together. I really like Julia, but currently the changes in the languages are too fast and there are constantly incompatibilities with packages that don't update fast enough. But I'm reasonably certain that this will be fixed once they reach 1.0.

I'm sure Python is far from dead, but can imagine that Julia has the potential to kill it in many domains.


Do you do data science?


Yes, mostly for finance.


What python 3 features are painful and why?


> Here's to hoping Grumpy takes on a life of its own and is the new de facto Python.

I can appreciate you have that opinion, but I'll be livid if that's true - the last thing in the entire world I want is to do battle with dependencies and the very, very strict/opinionated Go build system.

If your idea is that all Py27 is just transpiled into Go and then jettisoned, that's fine, but keeping one foot in each world sounds terrible.


Go compiles down to native code (x86, ARM assembly, etc). That's what Grumpy-generated code compiles down to as well, but it still needs to be maintained as the original Python 2 or Go (depending on what your original source is). What I'm suggesting is that Python 3 is jettisoned and Grumpy takes the forsaken throne that Python left behind.

One thing is clear with all these new compilers/runtimes: you want to be writing Python 2 syntax, because that's where all the action is. I hope Grumpy succeeds, gains new features, and becomes its own ecosystem that plays nicely with Go code. These folks at Google have really done what Guido & Co should've done.

This is Python3 as most of us wanted it to be, it's worth rewriting all your code for... but you don't even have to do that. Valid Python2 is Grumpy already. I don't know what else I'd want. It compiles existing Python2 AND offers a legitimate upgrade from CPython at the same time.

As far as all of the lost C extensions? You won't need them with the performance that the Go runtime has. That's been the answer all this time, not maintaining C-extension compatibility.

They nailed this thing, it's the answer to "what's the future of Python?" that everyone has been wondering for the past 9 years.


> you want to be writing Python2 syntax because that's where all the action is.

> This is Python3 as most of us wanted it to be.

> Valid Python2 is Grumpy already.

> It compiles existing Python2 AND offers a legitimate upgrade from CPython at the same time.

> As far as all of the lost C extensions? You won't need them

None of the statements are true. You seem to be very confused about what Grumpy can and can't do, and what the needs of actual Python developers are.

The only benefit of Grumpy is speed (I don't think go "interop" counts). Now, that's a pretty big benefit for some, but comes with significant drawbacks and probably always will. Even though CPython is only the reference implementation, many clever people have worked to make it faster. Getting rid of the GIL is also very difficult. The easiest way to gain speed is to limit Python to a subset of features and then optimise for that. While this is a fair approach, hailing it as the future of Python is terribly misguided.


Citations needed for your assertion on each point being false.

Grumpy proves Python2 is where the action is at. Everyone wanted a speed improvement with a new Python, that's the ultimate carrot.. instead Python3 was and still is in some ways slower than 2. Other than exec, eval and C-extensions, Python2 is valid Grumpy.

You didn't provide any reasoning or proof that my points, which were just reiterated, were false. If you're going to "port" anywhere from Python2, removing C extensions (which no language should have to be dependent upon anyway, so it's an improvement) and exec/eval usage is a bigger win than Python3.

The future of Python is what the users decide, not what the PSF decided. I recognize there's a lot of confusion and propaganda surrounding that. This is open source, not top-down control.


Lead on the project said he would like to support python 3 at some point.


That would be great. But the issue is really: what do people do who have all this mass of Python 2 source? The Python 3 people are off doing what they want to do, which I consider the experimental branch. There are just so many new mistakes made with Python 3 that it's not a slam dunk for people to move to. At this point, it's become more of a social pressure / political thing (2020?) than a logical decision to move to 3. Something like Grumpy is definitely going to take the throne that Python 3 abandoned.


You are upset about Python 3 adoption, so you advise reaching for much less popular Elixir or for brand new Google-specific Grumpy? If insufficient popularity were the problem, those choices wouldn't make any sense.


> They still truly believe the majority will come around to Python3.

People will upgrade if they make py3 more appealing, something like a 20% speed boost would be nice.


> His stubborn view on functional mechanisms

Reference, for a non-Python dev who hasn't kept up with it?


Guido has said that he wishes Python didn't have lambdas.

Also, in Python 3, reduce was removed from the global namespace and moved into the functools module (map stayed a builtin, but now returns an iterator rather than a list).
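Concretely, in Python 3 `reduce` has to be imported from `functools`, while `map` remains a builtin but is lazy:

```python
from functools import reduce

# reduce folds a sequence with a binary function and an initial value
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4], 0)
print(total)                    # 10

# map is still a builtin, but returns an iterator in Python 3
print(list(map(str, [1, 2])))   # ['1', '2']
```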



Python needs a new runtime. This talk shows how bad of shape it's really in.

https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seL...

Basically, the language doesn't have a "spec" per-se. The language is whatever the de facto CPython implementation happens to do within its giant eval loop.

Another great talk about CPython internals:

http://pgbovine.net/cpython-internals.htm


> Basically, the language doesn't have a "spec" per-se.

It does[1]. And process of improving it is called PEP[2].

[1]: https://docs.python.org/

[2]: https://en.wikipedia.org/wiki/Python_(programming_language)#...


Uh, what? Claiming that your [1] is in any way a specification for the language is utterly absurd. It's far too vague. (Compare to even an IETF RFC, and you'll see what I mean. If you want to compare to a real language spec, compare to ISO C++.)


> to a real language spec, compare to ISO C++.

No thanks. Written specs can always have interesting implications or undefined behaviour. Just because it's written in a more verbose language (English) doesn't mean it's less vague.

E.g. GCC is the de facto C spec for many. Code/platforms as spec makes more sense and is easier to maintain/update, with quicker iterations of language features (cf. Ruby/Python vs C++).


My issue isn't necessarily with the fact that it's in English. It's that it's hopelessly imprecise English. Maybe you'd have less of an issue with the Java Language Spec? (Which, IIRC, even left out some memory model problems until recently.)


CPython is the spec (or really more the CPython test suite). Just like the Ruby MRI. It's a simple, plain interpreter without many frills, and to add or remove a feature you have to submit a PEP which goes through a specification process.

Python started as a one-man-band project and of course didn't have a specification.


Yes, but that's what the PP was arguing wasn't the case (by linking to a "spec"). I'm not sure why you're restating the obvious (OP's position).


> Python started as a one-man-band project and of course didn't have a specification.

C started as a one-man-band project and of course does have a specification.

JavaScript started as a one-man-band project and of course does have a specification.


C wasn't a one-man-band project (it was at least a two-man-band at the start!) and neither was JavaScript. C also didn't have a formal specification for over 20 years (only an informal one), and JavaScript had a strong selection bias for interpreters that roughly conformed to the specification. But that didn't exactly help it; JS is/was notorious for differing implementations of browser APIs.

Each of them also has a strong need for a specification, as there are many differing compilers and interpreters. There are a few for Python, but they are specialized; the CPython interpreter is good enough for 90% of cases.


For your narrow-minded understanding of what a programming language specification is: https://en.wikipedia.org/wiki/Programming_language_specifica...

You can say it's imprecise or lack of ratification from one or many international organization(s), but you cannot say it doesn't exist. End of story.


The reference [1] is a decent spec. It might not be as formally rigorous as an ISO standard, but it's probably as good as Go's [2], which is also a "reference".

[1] https://docs.python.org/3/reference/index.html

[2] https://golang.org/ref/spec


In my experience with both languages, while the Go spec is incredibly readable, navigable, and succinct, the Python reference is a sprawling mess that is difficult to navigate or even to Ctrl-F in.


To be fair, Go is also a much smaller language, which hasn't gone through the process of collecting and shedding multiple layers of legacy, and exposes far fewer implementation details.


CPython is the reference implementation, so it makes sense that it's:

A) Not well optimised.

B) Touting features before the spec/standard.

EDIT: people really dislike that I said this, and I'm having trouble finding my original citation; it was in one of the many Python books I own. Most likely "Learn Python The Hard Way", but I'll dig out the exact chapter where they compare PyPy to CPython and mention that because CPython is the reference implementation, it values code clarity over performance optimisation.


Who says CPython is not well-optimized?

CPython is 25 years old -- people have been making it faster for a long time. Python 3.6, the latest release, has many performance improvements, cf. http://www.infoworld.com/article/3120952/application-develop...


Interestingly, achieving performance parity with CPython is one of the biggest challenges of this project. There are certain things CPython does very fast, like allocating and freeing many small objects.
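To make that concrete, here's an illustrative micro-benchmark (timings vary by machine) of the small-object allocation path CPython optimizes with pymalloc and per-type free lists:

```python
import timeit

# CPython's small-object allocator (pymalloc) and per-type free lists make
# creating and discarding many tiny objects cheap; each iteration here
# allocates a fresh list plus a string and immediately discards them.
elapsed = timeit.timeit("[i, i + 1, str(i)]", setup="i = 7", number=1_000_000)
print(f"1M small-list allocations: {elapsed:.3f} s")
```

A transpiler targeting Go has to route the same churn through Go's allocator and garbage collector, which is tuned for quite different workloads.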


So, Grumpy currently isn't faster than CPython?


Notably, some of the 3.6 performance improvements were merged in from PyPy :-)


It makes sense that the official, primary and by far most popular implementation of one of the most used languages in existence is not well optimized?

(edit: I'm just being polemic about your statement here. CPython is reasonably optimized within the constraints it currently has).


It only makes sense because it's Python: a language where style is part of the syntax, and readability is something many libraries focus on, the so-called "pythonic" way.

It makes sense that the reference implementation mirrors the same patterns as the language itself.


Someone else already posted this link: https://www.youtube.com/watch?v=qCGofLIzX6g&list=PLRdS-n5seL...

It explains why CPython can't improve on many things.


Please watch that first video, it's a good one. It explains how CPython essentially _is_ the spec because its internals leak into the spec when they have no business being there.


Pretty much the same is true about most other languages that have a single main implementation. This is even true, to some degree, for Java, which had competing implementations relatively early on.


At some point it leaked pretty hard as well. The package scope was an unspecced implementation behaviour that became a standard later. (If I recall the story correctly)


The main reason I moved from coding in Python to Go as my main language many years back is that concurrency was such a pain in standard Python (the other reason was compile-time error checking).

It's interesting to see the same pain has now caused the runtime itself to be implemented in Go.

It's a pity C extensions (often used in scientific computing) are not supported but Go does have support via CGO, so maybe some approach can be worked out to access C routines in the future.


I "grew up" on Python, then wrote a whole bunch of Go for my job. Then this past Autumn re-visited Python to implement a networked terminal based game[1].

With what I learned about Go and concurrency, I would say that currently in Python, writing concurrent code is not very hard, and is as close to Go as you can get without actually just writing Go.

Now, you may be saying "but Python has the GIL, how can concurrency be easy in Python?" I'd say, you're definitely not wrong that the GIL is a problem, but it's not much of a problem for concurrency.

This goes back to the heart of Rob Pike's classic talk, "Concurrency Is Not Parallelism"[2]. To quote Wikipedia:

        In computer science, concurrency is the decomposability property of a
        program, algorithm, or problem into order-independent or partially-ordered
        components or units.
In Python, you can pretty easily emulate the conceptual properties of Goroutines and Go channels with Python threads and queues. The problem is that doing this in Python won't net you the performance increases you get with Go. And I believe that is an important distinction. There are plenty of cases where you don't care so much about the performance benefits of parallelism, but you want the conceptual and implementation benefits of concurrency.

In closing, concurrency in Python is pretty easy to work with, it just performs very poorly.

[1] - https://github.com/lelandbatey/defuse_division

[2] - https://blog.golang.org/concurrency-is-not-parallelism
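A minimal sketch of the pattern described above (names are mine, not from the linked project): a Python thread plays the goroutine and a queue.Queue plays the channel, with None as a "channel closed" sentinel.

```python
import queue
import threading

def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    # Behaves like a goroutine ranging over a channel: block on receive,
    # exit when the sentinel arrives.
    while True:
        n = jobs.get()
        if n is None:
            break
        results.put(n * n)

jobs: queue.Queue = queue.Queue()
results: queue.Queue = queue.Queue()
t = threading.Thread(target=worker, args=(jobs, results))
t.start()

for n in range(5):
    jobs.put(n)
jobs.put(None)  # sentinel: "close" the channel
t.join()

print(sorted(results.queue))  # [0, 1, 4, 9, 16]
```

Conceptually identical to a Go worker reading from a channel; the GIL just means the workers never square numbers in parallel.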


I wrote a little task module that does precisely that and provides you with a go() function and "channels":

https://github.com/rcarmo/python-utils/blob/master/taskkit.p...

Obviously, it wasn't amazingly performant. But it did help a lot for doing concurrent stuff, and I've been pondering re-doing it for asyncio.


Using Gevent[1] is quite similar to Go when it comes to concurrency in Python.

[1] http://sdiehl.github.io/gevent-tutorial/


Concurrency is even easier with the new async/await syntax
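For example, with the asyncio standard library (Python 3.5+ syntax, asyncio.run from 3.7):

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Simulate an I/O-bound task; await yields control to the event loop.
    await asyncio.sleep(delay)
    return f"{name} done"

async def main() -> None:
    # Both coroutines run concurrently: total wall time is ~0.02s, not ~0.03s.
    results = await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))
    print(results)  # ['a done', 'b done'] -- gather preserves argument order

asyncio.run(main())
```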


I'm not philosophically opposed to supporting C extensions. The additional complexity just was not deemed to be warranted since the YouTube frontend doesn't use a lot of C extensions.

In principle it's possible to implement something like JyNI (http://jyni.org/) or CPyExt (https://morepypy.blogspot.de/2010/04/using-cpython-extension...) to bridge the CPython and Grumpy APIs. In practice, marshalling data across the interface can be very expensive.


Out of curiosity, what C extensions does YouTube use?

If this is good enough to run YouTube's python code already it's honestly super impressive. Well done.


To be clear, Grumpy cannot yet run YouTube's Python codebase. There's still a lot of work to do on the standard library.

There are a handful of C extensions for JSON, protobufs, etc that YouTube uses, but mostly they're small utility functions written by us to optimize particularly hot code paths.


Is the plan to use grumpc to transpile the code to Go and then work in Go in the future, or is the plan to keep coding in Python and add a grumpy step before deploying?

(If the former, you could just update the code to use the standard Go JSON/etc packages..)


The idea is to continue to write code in Python. The transpiled code is not suitable for working with directly. That said, there is the possibility of rewriting bits and pieces in Go (e.g. performance critical stuff) and then call into it from Python. Sort of a hybrid approach.


I'm also curious about this.


Will these be rewritten in Go then?


Does anyone know the technical reasons why C extensions are not (at least easily, apparently) supported by Go? Is it to do with Go's being a GC'd language? I would have thought that should not be a reason per se, since Python also has GC, but has plenty of extensions written in C. But I'm not a language internals expert.

Also, further signs that GC may not be the reason, is that D also has GC, but can link to C libraries somewhat easily (not sure about all cases or how far the ease goes).


It's because Go uses a different stack structure, called "segmented stacks", in order to enable cheap goroutines. Basically, Go stacks start tiny (8 KiB, as opposed to much larger C stacks), then it grows them in small segments. Additionally, Go code runs inside an event loop, which enables excellent I/O performance without kernel context-switches, and ordinary C function calls conflict with this event loop.


Segmented stacks in Go went away in Go 1.3 (https://golang.org/doc/go1.3#stacks; June 2014).

The alternate stack structure is indeed one issue. The bigger one is the GC, though; the Go runtime needs to know which pointers it is responsible for freeing, and which are the responsibility of the C code.


> The bigger one is the GC; the Go runtime needs to know which pointers it is responsible for freeing

That is not the bigger issue, and AFAIK already handled for C types.

The stack/calling conventions are the reason why cgo is "not Go": cgo calls have significantly more overhead than just about every other FFI. The overhead of a cgo call is roughly two orders of magnitude more than a "native" Go call (or was around this time last year); that is, you could perform ~100 non-inlined native calls to a do-nothing function in the time it takes to make a single cgo call to the same.


The original question was about why C extensions are not supported by Go. It's not a matter of performance; it's a matter of correctness.


>Additionally, Go code runs inside an event loop, which enables excellent I/O performance without kernel context-switches

Interesting, didn't know this (that Go code runs in an event loop). Is the reason something to do with goroutines and channels? Something like: a routine gets info that data is available for it to read (on a channel, sent by another goroutine), via an event it receives?

Also, can you explain this point:

"which enables excellent I/O performance without kernel context-switches" ?


Since we're also talking about Python: if you've ever used asyncio, you'll see that programming with it is a bit different from how you usually write code without it.

Before you can call any coroutine you first need to start an event loop and schedule something in it. This essentially enables the language to schedule another async function each time you use await.

Since Go is always async by default, before your main function is called it sets up the event loop and then calls your main, which technically is also a coroutine. Your code appears to be sequential, but it is not executed that way.
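A small illustration of that scheduling behaviour on the Python side (the task names are mine): each await is a point where the event loop may switch to another coroutine, so two sequential-looking loops interleave.

```python
import asyncio

order = []

async def task(name: str) -> None:
    for step in range(2):
        order.append(f"{name}{step}")
        # Awaiting yields to the event loop, which can resume the other task.
        await asyncio.sleep(0)

async def main() -> None:
    await asyncio.gather(task("a"), task("b"))

asyncio.run(main())
print(order)  # typically ['a0', 'b0', 'a1', 'b1'] -- the tasks interleave
```

In Go the compiler and runtime insert the equivalent yield points for you; in Python's asyncio you mark them explicitly with await.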


Interesting ...


One of the reasons an asynchronous I/O event loop can be faster than a threaded model is that the CPU spends more of its time in a single userspace thread per core, switching between clients that are ready. A threaded server will incur a kernel-space context switch each time, while an asynchronous loop will keep the processing time in userspace.


Go has cgo, which works fine for most purposes; native code interop is not an issue for Go.

Grumpy likely doesn't support the C extensions due to time constraints and the complexity of having to actually emulate the GIL, since CPython does not have fine-grained locking for its structures. C extensions that work with Python data structures need to hold the GIL first.
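To illustrate the difference (a toy sketch, not Grumpy's actual implementation): fine-grained locking means each container guards itself, instead of one interpreter-wide lock that every C extension must hold.

```python
import threading

class LockedList:
    """Toy list with a per-instance lock, roughly the fine-grained model."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def append(self, item):
        with self._lock:  # lock only this list, not the whole runtime
            self._items.append(item)

    def __len__(self):
        with self._lock:
            return len(self._items)

shared = LockedList()
threads = [
    threading.Thread(target=lambda: [shared.append(i) for i in range(1000)])
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared))  # 4000
```

A C extension written against CPython's API assumes the single GIL instead, so a runtime with per-object locks has no lock for it to take.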


> Does anyone know the technical reasons why C extensions are not (at least easily, apparently) supported by Go?

It's because Python's C API is inherently non-thread safe. The API lacks passing an interpreter pointer as a parameter (as Lua's API does for example). So Python is forced to use a terrible thread local storage hack involving the Global Interpreter Lock to swap interpreter instances which is insanely inefficient and limits compute-bound programs to a single thread.

Python 3.x had a chance to fix the API and do away with the GIL once and for all, but inexplicably they did not. There was a misguided notion that C extensions between 2.x and 3.x could be interoperable.


I never saw Python as anything more than a scripting language to portably automate tasks across UNIX and Windows environment, even back in the Zope days.

My experience with Tcl taught me to stay away from languages that don't have either a JIT or AOT compiler in their reference implementation.


I think that that's rather dismissive of a language that runs huge web, scientific, and general purpose applications daily.


It might be, but that it is usually a consequence of not knowing any better or making use of existing libraries.

Just like people learned 8-bit BASIC and went on to do business applications and games on it. I went Z80 ASM instead.

Personally I would only use Python for shell scripting and advise for using Julia instead.

Of course, others see it differently.


Julia? Look, I enjoy playing with Julia, but have you actually used it? Because most of the people I meet who turn up their nose at Python and say nice things about Julia haven't ever used Julia. It's got great potential but has some major gaps and warts, and is just plain dog shit slow for certain things compared to SciPy, which, you may not be aware, is basically C and Fortran code wrapped in Python API.


> is just plain dog shit slow for certain things compared to SciPy

Do you have real examples here or is this just FUD? SciPy is not known for using absolute state of the art algorithms or perfectly optimized implementations. It can be pretty easy to improve on the naively implemented or legacy pieces of SciPy, e.g. http://tullo.ch/articles/python-vs-julia/


I do follow Julia development and I am aware that it isn't quite there yet, but at least their community embraces JIT compilation, unlike Python, where PyPy is just yet another project, ignored by the reference implementation.

> SciPy, which, you may not be aware, is basically C and Fortran code wrapped in Python API.

Which for me personally means that I would rather use C and Fortran directly, or better yet, a C++, .NET or Java binding to them.


> Which for me personally means that I would rather use C and Fortran directly, or better yet, a C++, .NET or Java binding to them.

Being able to describe things with a syntax that looks almost like pseudocode and runs highly-optimized C/Fortran code to do heavy lifting has huge, huge advantages.
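A concrete (if small) example of that point, using NumPy: the code reads like the maths, while the heavy lifting is done by compiled LAPACK routines underneath.

```python
import numpy as np

# Solve the linear system  2x + y = 3,  x + 3y = 5.
# Reads almost like pseudocode, but np.linalg.solve dispatches to LAPACK.
a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(a, b)
print(x)  # [0.8 1.4]
```

Doing the same through a raw C or Fortran LAPACK binding means managing workspace arrays, leading dimensions, and error codes by hand.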


Scala, Clojure, F# also allow it, with the added benefit of industrial strength JIT/AOT compilers.


Except without the great Python ecosystem and without great projects like Numpy/Scipy


Numpy/Scipy are only relevant to a minor set of computer users, and even then, there are alternatives like LAPACK and BLAS.

Also Java and .NET ecosystems are just a little bigger than Python.


> there are alternatives like LAPACK and BLAS.

LAPACK + BLAS = scipy

> Numpy/Scipy are only relevant to a minor set of computer users..

> ... Java and .NET ...

Being able to describe things with a syntax that looks almost like pseudocode and runs highly-optimized C/Fortran code to do heavy lifting has huge, huge advantages.


> LAPACK + BLAS = scipy

Thanks, I already knew that.

> Being able to describe things with a syntax that looks almost like pseudocode and runs highly-optimized C/Fortran code to do heavy lifting has huge, huge advantages.

Hence we are back at Scala, Clojure, F#, enjoying the respective AOT/JIT native code compilers, and integrating with that highly-optimized C/Fortran code.


I use Scala and Clojure. I like F#. Have you ever worked at an actual business that needs to hire developers that know these languages WELL? I can tell you that the significant added cost can't be justified by the not-very-big benefits compared to just using Scipy.


> Scala, Clojure, F#

Functional languages. The "year of Linux on the desktop" of programming languages. Also none of those look like pseudocode (Scala does if you ignore bits and squint).


The amount of job postings say otherwise.


No they don't. On Stack Overflow Jobs, F# has 14 posts, Clojure 28, and Scala ~100. Python has 400+.

Other sites have similar totals. But those were filtered for London/UK/Europe, so there might be a bias.


This made me wary of Julia: http://danluu.com/julialang/


Julia's nice, but it's playing catch up to R & Python. Name any statistical algorithm, and R probably has it. Name any scientific field, and there's probably a Python library for it.


Some people are very productive in Django and Rails (and the like), in non-performant languages. Many sites never grow to the size where running on the JVM (or AOT-compiled Go) would save you money from fewer servers vs. more coder-hours.


The scientific community thinks otherwise.


The scientific community uses it as a tool to automate specification of tasks to be performed inside of C libraries. It's a case pjmlp may not have explicitly named, but it's of the same kind.


PyPy is making good progress at implementing and JITing numpy as native Python code.


They've been working on NumPyPy since at least 2011, and to my knowledge it's still not in serious use.


AOT can be done with Cython, and it's fully compatible with the reference implementation as far as I know.
