GoPy – Extend CPython by writing Go (qur.me)
209 points by spdy on Apr 7, 2013 | 36 comments



This looks really slick, but it's a bit unfortunate that it was implemented in a way that's tied to the CPython API specifically, given that there are Python-VM-neutral methods of low-level interop available (ctypes, CFFI, etc.). My general sense (and certainly my own experience) is that many people doing speed-sensitive Python work are already running on PyPy now, and that writing lower-level extensions has become a way to squeeze out an extra level of performance beyond that. Having to go back to CPython to take advantage of this infrastructure would be kind of a bummer, and almost certainly a non-starter in my projects, which leaves me stuck with C.


> My general sense (and certainly my own experience) has been that many people doing speed-sensitive Python work are already running on PyPy now

It depends; another category drops down to Cython (which, in many cases with little type dynamism, amounts to just running the existing Python code through Cython).


Right, I definitely don't think PyPy+[ctypes|cffi] is universal ("many," not "all"). Lots of people are doing Cython. Also, anyone who has a performance-sensitive workload that depends on SciPy is out of luck on PyPy at least for the time being. All I'm saying is that PyPy as a performant-Python strategy is common enough now that it's worth supporting in new performant-Python projects.


My thoughts exactly. Great concept, and I will definitely follow it as it develops, but I'd love to see an abstraction layer that takes the pain out of the Python C API. I left that behind years ago and I'm not going back.


I've been using the Python C API recently, and I find it the most straightforward option for getting performance out of Python. If you write a few helper classes in C++ using RAII, the Python C API actually becomes pretty pleasant to use.


I'm jealous. Granted, I learned the API before I re-learned C, so I admittedly shot myself in the foot. Now that I'm back up to speed in C (I was in a Java shop for 5 years) and doing more work in Go for my own start-up, I shy away from using Python for anything other than one-off numerical analyses.


This looks fantastic. Having only worked extensively with Python and Javascript, I haven't been looking forward to learning C or C++ to write performant bits of an application, but Go really excites me.

What are the limitations? Are there situations where GoPy won't work that (eg) ctypes would?


The main limitation atm is that it looks like a very experimental project, nowhere near production-quality.


Nice! I assume this only works with gccgo (to create a shared library)?


Nice! Does this let you get around the GIL?


I believe so; at least you can in C. You have to release the GIL in your extension, do your work, and clean up before giving control back to Python.

http://docs.python.org/3.3/c-api/init.html

Edit: Even though you can do it in C, Go is a no-go. Under item four in the caveats [1] it is stated:

  | When using goroutines py.Lock must be used. No other 
  | GIL or threading interface functions have been 
  | tested - and they most likely will not work.
[1] http://gopy.qur.me/extensions/


> Calling Go code from a second thread will currently cause the code to raise an exception (i.e. all calls into Go code must be made from the same thread)

That's ... interesting.


I haven't looked at this much (not much of a Python user these days), but presumably you can still multithread your Go code using Go semantics regardless of this restriction. The Go code you call from Python should be able to create goroutines as needed, which then report back to the main Go thread, as long as they follow the Go conventions about pointer ownership (the channel receiver owns the memory sent to it).

Of course it would be nicer if the main Go thread didn't lock out the Python caller from running (which it seems to do), but there should still be lots of concurrency available on the Go side.
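A minimal sketch of that pattern in plain Go (illustrative names, independent of GoPy): a single entry point, as a Python caller would use it, fans work out to goroutines and collects the results on a channel it owns.

```go
package main

import "fmt"

// SumSquares is a single entry point: it fans the work out to
// goroutines and owns the result channel, so all communication
// follows Go's channel-ownership conventions.
func SumSquares(nums []int) int {
	results := make(chan int, len(nums))
	for _, n := range nums {
		n := n // capture the loop variable for the goroutine
		go func() {
			results <- n * n
		}()
	}
	total := 0
	for range nums {
		total += <-results
	}
	return total
}

func main() {
	fmt.Println(SumSquares([]int{1, 2, 3, 4, 5})) // prints 55
}
```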


Yes. See lock.UnblockThreads() in the parallel example from the article (the sleep is executed without holding the GIL).


Nice! It looks pretty simple, but it makes me wonder: why bother interfacing your Python code with Go? Why not go straight to C for more performance?


1. Because it can be done.

2. The native way of doing this in C (without using SWIG or Cython or any other such tool) is more verbose.

3. It may be easier (though not necessarily faster) to write certain code in Go as opposed to either Python or C.


Would Go's static binaries be an added advantage?


No, because in Python's interop model, importing a compiled module involves a shared object (so you don't have to relink the interpreter to add one).


Strictly speaking, it is not required: you can build a giant Python binary with all the extensions you need, though it can be quite a bit of work to do so.

Granted, the vast majority of users use the loadable-module approach.


Because Go is way more pleasant to write and is still very performant.


Pure Python doesn't do parallel computation (the GIL keeps threads on one core); Go does it very well. I honestly don't know how to do parallel computation in C or C++, but I doubt it's very easy. Slower code can run faster overall if you have multiple processors, so pure processing speed has the potential to matter less.
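For illustration, a rough sketch of parallel computation in Go — a Monte Carlo estimate of pi split across goroutines, each with its own PRNG, with hit counts combined over a channel (the code is made up for the example, not from GoPy):

```go
package main

import (
	"fmt"
	"math/rand"
)

// EstimatePi splits nshots Monte Carlo samples across nworkers
// goroutines and combines the hit counts over a channel.
func EstimatePi(nshots, nworkers int) float64 {
	hits := make(chan int, nworkers)
	per := nshots / nworkers
	for w := 0; w < nworkers; w++ {
		go func(seed int64) {
			// Each goroutine gets its own generator, so they
			// don't contend on the global rand lock.
			r := rand.New(rand.NewSource(seed))
			n := 0
			for i := 0; i < per; i++ {
				x, y := r.Float64(), r.Float64()
				if x*x+y*y < 1 {
					n++
				}
			}
			hits <- n
		}(int64(w))
	}
	total := 0
	for w := 0; w < nworkers; w++ {
		total += <-hits
	}
	return 4 * float64(total) / float64(per*nworkers)
}

func main() {
	fmt.Printf("pi = %.3f\n", EstimatePi(1000000, 4))
}
```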


You can do parallel computations in Python; e.g., many numpy functions can utilize more than one CPU:

  import numpy as np

  N = 5000
  # create a couple NxN matrices with random elements
  a = np.random.rand(N, N) 
  b = np.random.rand(N, N)
  # perform matrix multiplication
  c = np.dot(a, b)
Cython makes it easy to write C extensions for Python if necessary.

You can also do compute-intensive work in parallel using multiple processes:

  import random
  from multiprocessing import Pool
  
  def fire(nshots, rand=random.random):
      return sum(1 for _ in range(nshots) if (rand()**2 + rand()**2) < 1)
  
  def main():
      pool = Pool() # use all available CPUs
      nshots, nslices = 10**6, 10
      nhits = sum(pool.imap(fire, [nshots // nslices] * nslices))
      print("pi = {pi:.5}".format(pi=4.0 * nhits / nshots))
  
  if __name__ == '__main__':
      main()
The code estimates pi using the Monte Carlo method: http://demonstrations.wolfram.com/MonteCarloEstimateForPi/


> I honestly don't know how to do parallel computation in C or C++, but I doubt it's very easy.

Actually, with OpenMP, parallelization is often easy in C and C++. In embarrassingly parallel problems, it is often just a matter of adding an OpenMP pragma before the for-loop. To give one example, in some machine learning code I wrote, it was simply a matter of replacing

  for (int i = 0; i < static_cast<int>(contexts.size()); ++i)
by:

  #pragma omp parallel for
  for (int i = 0; i < static_cast<int>(contexts.size()); ++i)
Obviously, you have to ensure that shared mutable variables are properly protected. But in many cases it's fairly simple. I have tried to emulate some of OpenMP's functionality in the par package:

http://github.com/danieldk/par

I think the more convincing argument is that many Python programmers will feel far more comfortable writing Go than C or C++. It offers good performance and some control over memory layout, while being substantially higher level than C and not the enormous language C++ has become.


> In embarrassingly parallel problems

To be fair, it is trivial to parallelize embarrassingly parallel problems in most languages. Including Python (with `multiprocessing`).

Go separates itself from C/C++ by making concurrent programming much easier. Namely, channels are the predominant form of both communication and synchronization between concurrent processes. This removes much of the need for locks to protect state, which is a common source of bugs.
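A minimal sketch of that style (an invented example, not from GoPy): one goroutine owns the state, and all mutation happens via channel messages, so no mutex is needed.

```go
package main

import "fmt"

// Counter keeps its count inside a single goroutine; other
// goroutines communicate increments and reads over channels
// instead of sharing the integer behind a mutex.
type Counter struct {
	inc  chan int
	read chan chan int
}

func NewCounter() *Counter {
	c := &Counter{inc: make(chan int), read: make(chan chan int)}
	go func() {
		count := 0 // owned exclusively by this goroutine
		for {
			select {
			case n := <-c.inc:
				count += n
			case reply := <-c.read:
				reply <- count
			}
		}
	}()
	return c
}

func (c *Counter) Add(n int) { c.inc <- n }

func (c *Counter) Value() int {
	reply := make(chan int)
	c.read <- reply
	return <-reply
}

func main() {
	c := NewCounter()
	done := make(chan bool)
	for i := 0; i < 10; i++ {
		go func() {
			for j := 0; j < 1000; j++ {
				c.Add(1)
			}
			done <- true
		}()
	}
	for i := 0; i < 10; i++ {
		<-done
	}
	fmt.Println(c.Value()) // prints 10000
}
```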

Go goes an extra step with M:N scheduling, but that is orthogonal to concurrent programming.

Also, you might be interested in my type-parametric version of `ParMap` [1]. It loses compile-time type safety, but performance still seems to remain roughly the same.

[1] - http://godoc.org/github.com/BurntSushi/ty/fun#ParMap


> To be fair, it is trivial to parallelize embarrassingly parallel problems in most languages. Including Python (with `multiprocessing`).

Of course, but usually you are switching to C because the code that is being parallelized needs to be faster.

> Go separates itself from C/C++ by making concurrent programming much easier.

Right. I was reacting to the grandparent, who mentioned parallel computation, not concurrency.

> Also, you might be interested in my type parametric version of `ParMap` [1].

Nice :). Though my Haskell heart cries when people use interface{} plus reflection as a replacement for parametric polymorphism. Before we know it, the library stack will effectively turn Go into a dynamically typed language.


> Right. I was reacting to the grandparent, who mentioned parallel computation, not concurrency.

Ah! I missed that. Mea culpa.

> Nice :). Though my Haskell heart cries when people use interface{} plus reflection as a replacement for parametric polymorphism.

As does mine :P But it was a fun exercise nonetheless. If you have a Haskell heart, you might be interested in my write up [1] on writing type parametric functions (using new Go 1.1 features). You will do some bleeding, but presumably you like types, so you might find it interesting. :-)

> Before we know it the library stack will effectively turn Go into a dynamically typed language.

Fortunately, the performance cost of reflection makes practical usage of it very limited. `ParMap` is in the minority here (where the overhead of concurrency shadows the overhead of reflection).

[1] - http://blog.burntsushi.net/type-parametric-functions-golang


Outsourcing parallel work to goroutines? Yes please!


I would love to see some benchmarks here! Great idea!


This looks brilliant. My only concern: considering that an extra layer comes in, how much does it affect performance?


For new code, I would rather use Julia.


I was thinking the same thing. Julia seems much nicer for the use case of writing scientific-computing code, but Go overall will be a much more popular language. How would you choose between them for writing performance-tuned Python extensions?


If it is for writing extensions, then PyPy is the better option if you want to stay in Python land.

For writing new code from scratch, Julia is a much better option. The language is not yet at 1.0 and already achieves parity with C compilers on many algorithms.

Go compilers have yet to match JVM and .NET speeds, let alone C.


Thanks, I hadn't previously heard of Julia.


You will probably find comments from me here dismissing the language when it was initially announced, but after watching the MIT presentations on its current state, I think the language has a future in technical computing.

http://julialang.org/blog/2013/03/julia-tutorial-MIT/


I respectfully disagree with some of your other opinions, but I think you might be on to something with Julia :D


When can we expect a Windows port?



