
PyPy.js: First Steps - rfk
http://www.rfk.id.au/blog/entry/pypy-js-first-steps/
======
jnbiche
Numba is a fast Python compiler that uses LLVM as an intermediary. In fact, I
just checked, and it has a switch to emit LLVM directly.

Emscripten is an LLVM-to-JavaScript transpiler.

Is there any way the two of those could be hooked up?

~~~
kingkilr
Numba isn't a general purpose python compiler, it compiles a small subset of
the Python language that targets numeric computations.

~~~
jnbiche
Yes, Numba isn't a general purpose compiler of Python -- sorry if I implied
that. And to be clear, this isn't a Python runtime you'd compile to JS for the
browser; it's something that could compile scientific/numeric Python into
JS. However, it is exactly the type of code that Numba targets that would be
the most interesting to port to JS. And it's quite simple to port such code in
Numba; usually all that's required is some incantation involving 'jit' or
'autojit'.

I mean, you're not going to be using tkinter or running Python networking
modules in the browser anyway. If you could compile some of the scientific
and numeric Python that exists into JS, it would allow a lot of scientific
computing to be distributed and run on the web, client-side, including most of
the machine learning and NLP code that exists now in Python.

From just an hour of research, it doesn't look like it would be very hard. But
I only know enough about the two projects to be dangerous.

At the very least, it's an interesting idea.

EDIT: To clarify, by "distributed" I mean distributing an asset (in this case
a JS script) to a computer to be run, not distributing a workload.

~~~
sleepingpills
The problem with your suggestion is that scientific/numeric python tends to
rely very heavily on numpy/scipy/pandas, most of which are in turn written in
C/Fortran/Cython. Meaning, you could only funnel the glue code to JS, not the
code that does the heavy lifting.

You'd need to also port the above libraries to JS, which would be a pretty
large undertaking.

Finally, grid computing is not any easier in JS than in python, since python
already has lots of bindings for the required tools (e.g. take a look at
[http://star.mit.edu/cluster/](http://star.mit.edu/cluster/) which even comes
preinstalled on EC2 AMIs).

Edit: I just noticed that you suggested using JS for distributed computing in
lieu of python. Care to suggest a scenario where JS would be a better-suited
language? Why would I want to run scientific computing in browsers?

~~~
jnbiche
You know, sometimes people just do stuff like this for the hell of it. We call
it hacking. There doesn't have to be a well-founded reason for people to
wonder if they can make Part A fit Part B. I'm not proposing this to a
dissertation committee. I'm not suggesting it as a means of grid computing.
I'm simply suggesting that it might be fun/interesting to play around with
some scripts in Python and try to port [edit: this should be _transpile_, not
port] them to JS via Numba, LLVM and Emscripten. Obviously, a rewrite of
pandas in JS is not in the cards.

Usually, hacking like this leads to nothing, but sometimes it leads to
everything.

Care to suggest a reason why I shouldn't do it if I feel like it? I'm really
gritting my teeth in forbearance here.

PS: I wasn't even thinking of grid computing when I mentioned this. I meant
"distributed" in the sense of distributing an asset, not a work load. Nor was
I thinking of whether or not JS was a "better language". I was just wondering
if it could be done.

Edit: I can't reply to your comment below, so I'll write a few words here.
First, apologies if you weren't trying to be snarky. Phrases like "Care to
suggest...?" can in certain contexts imply disdain and snark. If I misread
your intent, then my apologies.

Second, Numba actually _is_ a kind of JIT, so it's interesting on several
levels. I don't think you'd get any added benefit from the JS JIT in terms of
performance, but instead it would be another way to distribute scientific
Python scripts. For example, as far-fetched as it is right now, this would
allow IPython notebooks to be distributed and run in their "static" form
without needing a local or remote Python server (note that here again I don't
mean any kind of distributed or grid computing).

Finally, scientific Python may actually not be the least amenable code to
bring into JS, since pure computational code has limited I/O and thus we don't
have to worry about the DOM. Also, note that the C libraries that are used in
pandas and elsewhere could in theory be compiled into JS using Emscripten. A
large task, to be sure, but an interesting idea.

Unfortunately, I'm too busy at the moment to start up a project like this, but
it's definitely going on my list of things to hack around with.

~~~
sleepingpills
There is no need to be defensive, I wasn't implying that you mustn't do it.

You brought up scientific python specifically, but this tends to involve (in my
experience) overwhelmingly native code with small amounts of plumbing written
in python. Numba is already a restricted subset of python, and without porting
at least parts of numpy, you are further restricted to just a subset of that
subset (by giving up vectorized numpy ops).
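To make the restriction concrete: without NumPy, a vectorized expression such as `a * b + c` has to be written out as an explicit element-by-element loop, which is the style of code a restricted compiler subset like Numba's handles. A minimal sketch, assuming plain Python lists in place of arrays so it runs without NumPy; the function name `fused_multiply_add` is hypothetical:

```python
def fused_multiply_add(a, b, c):
    """Element-wise a * b + c, written as an explicit scalar loop
    instead of a vectorized array operation."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("inputs must have the same length")
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] * b[i] + c[i]
    return out


print(fused_multiply_add([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.5, 0.5]))
# [4.5, 10.5, 18.5]
```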

I totally agree that playing around with it will be fun, I'm merely pointing
out the fact that you chose perhaps the least amenable kind of python code for
this particular task.

Again, by all means go for it! I agree that there are lessons to take away
even if it doesn't lead to anything specific.

Edit: you also seemed to imply that there is some inherent benefit in moving
scientific/numeric computation from python to JS (and the browser). I was
honestly interested in the reasoning behind this. I know that numeric python
could benefit from a proper JIT, for example, and a lot of research has gone
into JS VMs (pypy doesn't count yet).

------
pudquick
How does this compare to [http://repl.it/](http://repl.it/) ?

Anyone figured out how to get a comparable benchmark out of it without loading
a full browser?

The reason I ask is that
[http://repl.it/languages/Python](http://repl.it/languages/Python) loads
quickly and can actually run in my iPhone 4S Mobile Safari. At 139 MB
uncompressed currently, I'm not sure this new project will ever result in
something that would let me write python code natively in a <script>. I know
that he's currently running with node.js, so no browser required, but if this
project is only ever going to be yet another desktop/server interpreter, I
would be gobsmacked if it could remotely approach PyPy or CPython performance,
even once they get the JIT going.

~~~
quacker
repl.it uses a version of CPython compiled with emscripten[1], so the version
in the article should be similar in size.

1:
[https://github.com/replit/empythoned](https://github.com/replit/empythoned)

~~~
pudquick
This is incorrect.

The article involves compilation of the PyPy interpreter (not CPython) into
JS.

This is why I asked about someone attempting a benchmark comparison between
the two.

Still, it's a fun project. Good luck to the author.

~~~
quacker
The quoted 139 MB includes the full Python standard library, which certainly
accounts for a majority of that size. repl.it also compiles the entire Python
standard library, so why should the two be significantly different in size?

~~~
pudquick
For anyone following here at this point, I've downloaded the entirety of the
repl.it python engine by using os.walk() on the root directory (/), causing my
browser to download every .js, .py, etc. file it can find and store them
locally, uncompressed, on my machine. It even amusingly found some .exe files
hosted in the distutils directory.
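The same kind of inventory can be taken of an ordinary local directory with `os.walk()`, totalling file sizes per extension. This is a minimal sketch assuming a normal filesystem; it does not reproduce how repl.it's in-browser filesystem was exposed, and the helper name `size_by_extension` is hypothetical:

```python
import os
from collections import Counter


def size_by_extension(root):
    """Walk `root` recursively and total file sizes (in bytes) per extension."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1] or "(none)"
            try:
                totals[ext] += os.path.getsize(path)
            except OSError:
                # Broken symlinks or permission problems: skip the file.
                continue
    return totals


if __name__ == "__main__":
    # Print the ten extensions that take up the most space under the cwd.
    for ext, size in size_by_extension(".").most_common(10):
        print(f"{ext:10} {size / 1e6:8.2f} MB")
```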

The entirety of the repl.it emscripten CPython project is 24MB, uncompressed.
This includes the entire standard library that it ships with and all the
'_underscore.so.js' emscripten compiled shared objects. Compressed via zip
it's 4.7MB. For comparison, this is almost identical (within a few MB) to a
clean install of python locally on my workstation, size-wise. I am assuming at
this point that it's most if not all of the standard modules included at
repl.it.

(And for reference: The core CPython engine, translated minus modules, weighs
in at 4.6MB uncompressed and 800KB compressed)

Downloading the prebuilt PyPy project from
[http://pypy.org/download.html](http://pypy.org/download.html) I see that
uncompressed the project is 55MB in size.

Removing all .txt and pure .py files, I'm left with 37MB (and that's being
generous) of 'code' files that potentially are being translated. And that's
with shared objects - not a static compile - so there's possibly duplicated
code in there that wouldn't be present in a single monolithic executable.

I stand by my assertion that 139MB is significantly different in size and that
the translation is what accounts for the majority of that size (84MB if I'm
generous, 102MB if I'm slightly more realistic).
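The two estimates are simply the 139MB total minus each of the PyPy sizes quoted in this comment (all figures in MB, taken from the comment itself):

```python
TOTAL_MB = 139           # size of the PyPy.js build quoted in the thread
PYPY_DOWNLOAD_MB = 55    # uncompressed prebuilt PyPy download
PYPY_CODE_ONLY_MB = 37   # the same download minus .txt and pure .py files

generous = TOTAL_MB - PYPY_DOWNLOAD_MB     # overhead if the full download is the baseline
realistic = TOTAL_MB - PYPY_CODE_ONLY_MB   # overhead if only the code files are the baseline
print(generous, realistic)  # 84 102
```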

Even if there is eventually a speed benefit, assuming everything works out
here, the current size of the project definitely moves it out of the realm of
anything I'd want to attempt loading into a browser.

~~~
rfk
A big part of the current size problem is the way that the stdlib files are
bundled - the contents of each file are encoded, byte-for-byte, as a list of
base-10 integers. So "hello" gets bundled as "[104, 101, 108, 108, 111]",
resulting in quite a bit of overhead.
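To see how much this costs, compare the length of the raw bytes with the length of the same bytes written out as a list of base-10 integers. A minimal sketch; the function name `bundle_as_int_list` is hypothetical, and the `", "` separator is taken from the example in the comment rather than from the actual bundler. Base64 is shown only as a familiar point of comparison, not as what PyPy.js uses:

```python
import base64


def bundle_as_int_list(data: bytes) -> str:
    """Encode raw bytes as an array literal of base-10 integers."""
    return "[" + ", ".join(str(b) for b in data) + "]"


raw = b"hello"
encoded = bundle_as_int_list(raw)
print(encoded)                        # [104, 101, 108, 108, 111]
print(len(encoded) / len(raw))        # 5.0  (characters per original byte)
print(len(base64.b64encode(raw)))     # 8    (base64: 4 chars per 3 bytes, padded)
```

For ASCII text, most byte values have two or three decimal digits, so each byte turns into roughly four to five characters once the separator is included.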

I agree that 139M is pretty ridiculous for any practical purpose! I'm going to
work on lazily loading just the files that are needed, which should make a big
difference.
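In plain CPython terms, "load only what is needed" can be done with an import hook that fetches a module's source the first time it is imported. The sketch below uses the standard `importlib` machinery; the in-memory `SOURCES` dict is a hypothetical stand-in for an on-demand download, and this is not how PyPy.js actually implements lazy loading:

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for a remote store: module name -> source code.
# In a browser build, this lookup might instead be an on-demand fetch.
SOURCES = {
    "lazy_greeting": "def greet(name):\n    return 'hello, ' + name\n",
}

FETCHED = []  # records which modules were actually fetched


class LazySourceFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds modules listed in SOURCES and fetches their source only on import."""

    def find_spec(self, fullname, path, target=None):
        if fullname not in SOURCES:
            return None  # let the normal import system handle everything else
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = SOURCES[module.__name__]  # the "fetch" happens here, lazily
        FETCHED.append(module.__name__)
        exec(compile(source, f"<lazy:{module.__name__}>", "exec"), module.__dict__)


sys.meta_path.insert(0, LazySourceFinder())

import lazy_greeting  # noqa: E402  (this import triggers the fetch)

print(lazy_greeting.greet("world"))  # hello, world
print(FETCHED)                       # ['lazy_greeting']
```

Because imported modules are cached in `sys.modules`, a second `import lazy_greeting` does not fetch the source again.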

------
apendleton
Perhaps I'm missing something fundamental here, but my understanding of the
way that JITs work is that they inspect a bunch of bytecode, then generate a
bunch of machine code, then execute it. No Javascript interpreters are going
to let you write executable code into a random chunk of memory and run it, so
compiling a JIT with emscripten seems like a non-starter.

~~~
ot
Instead of generating machine code, the (guest) JIT can generate optimized JS,
possibly in the asm.js subset. The host JIT can then JIT the generated JS
into optimized machine code, thus the overhead could in principle be minimal.
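The idea can be illustrated in miniature with Python standing in for JavaScript: a tiny "guest" interpreter for a made-up stack bytecode, plus a "JIT" that translates that bytecode into host-language source and hands it to the host's own compiler via `compile()`/`exec()` instead of emitting machine code. The bytecode format and all names are hypothetical, the "JIT" compiles eagerly with no profiling or tracing, and CPython only compiles to its own bytecode, whereas a JS engine would go on to compile hot code to machine code. It only shows the shape of "generate host source, let the host optimize it", not how PyPy's JIT backend works:

```python
# Hypothetical bytecode: (opcode, argument) pairs operating on a stack.
# Opcodes: PUSH_ARG (push argument n), PUSH_CONST, ADD, MUL.

def interpret(code, args):
    """Guest interpreter: executes the bytecode one instruction at a time."""
    stack = []
    for op, arg in code:
        if op == "PUSH_ARG":
            stack.append(args[arg])
        elif op == "PUSH_CONST":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def jit_compile(code, nargs):
    """Guest "JIT": turns the bytecode into host (Python) source and lets the
    host compile it, rather than writing machine code into memory."""
    stack = []
    for op, arg in code:
        if op == "PUSH_ARG":
            stack.append(f"a{arg}")
        elif op == "PUSH_CONST":
            stack.append(repr(arg))
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            sym = "+" if op == "ADD" else "*"
            stack.append(f"({a} {sym} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    params = ", ".join(f"a{i}" for i in range(nargs))
    source = f"def jitted({params}):\n    return {stack.pop()}\n"
    namespace = {}
    exec(compile(source, "<jit>", "exec"), namespace)
    return namespace["jitted"], source


# (x + 3) * y
program = [("PUSH_ARG", 0), ("PUSH_CONST", 3), ("ADD", None),
           ("PUSH_ARG", 1), ("MUL", None)]

fn, src = jit_compile(program, nargs=2)
print(src)                          # the generated function definition
print(interpret(program, [2, 5]))   # 25
print(fn(2, 5))                     # 25
```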

EDIT: If this seems too mind-bending, consider that _even machine code_ is not
really machine code: the CPU actually JITs the "native" code (say, in the
x86-64 ISA) into a "more native" code that is what is actually executed by the
CPU (for Intel CPUs, these are "micro-ops"), and in doing that it uses a lot
of compilation tricks (such as trace caches), including optimizations driven
by runtime feedback (you can think of branch prediction this way).

Of course Javascript is a much thicker abstraction, but conceptually it is not
so different.

~~~
azakai
Very correct, but a side note: asm.js might not be useful or necessary for
this. asm.js makes more of a difference in large, hard to optimize codebases,
and less in small amounts of code, because JS engines have been optimizing
small amounts of code extremely well for a while now (using TraceMonkey,
Crankshaft, DFG, IonMonkey, etc.).

Also, asm.js code is structured in a way that makes it obvious the code will
not change over time (it's in a closure, where functions cannot be modified),
again, in order to make optimizing large projects easier. When JITing however
you do want to add new code all the time.

But this could work great without generating asm.js code. The VM itself is C
code that can be compiled wholesale into asm.js (when the Lua VM was compiled
that way it was quite fast, about 50% of native speed), and it would then JIT
at runtime normal JS and call into that.

~~~
slacka
"asm.js might not be useful or necessary for this" So for situations like
this, would Duetto's approach of mapping C++ objects/functions to native
JavaScript objects/functions be better? If so, is this a feature that could be
added in the future to Emscripten or are the designs mutually exclusive?

~~~
azakai
Both the emscripten/mandreel and duetto approaches map functions to functions.
The difference is that duetto maps objects to objects as well.
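A rough Python analogy for that difference: Emscripten-style output keeps C/C++ data in one flat typed-array heap and reads fields at fixed byte offsets, whereas mapping objects to objects represents each struct as a native object of the host language. The sketch contrasts the two for a hypothetical `struct Point { int32 x; int32 y; }`; the layout, offsets and names are illustrative assumptions, not either compiler's actual output:

```python
import struct

# Linear-memory style (roughly how Emscripten treats C data).
HEAP = bytearray(64)                     # one flat "heap", like a JS typed array
POINT_FMT = "<ii"                        # hypothetical Point { int32 x; int32 y; }
POINT_SIZE = struct.calcsize(POINT_FMT)  # 8 bytes


def point_store(addr, x, y):
    """Write a Point's fields into the heap at byte offset `addr`."""
    struct.pack_into(POINT_FMT, HEAP, addr, x, y)


def point_x(addr):
    """Read the x field: a load from a fixed offset, no host object involved."""
    return struct.unpack_from("<i", HEAP, addr)[0]


# Object-mapping style (roughly the Duetto approach).
class Point:
    """Each C++ object becomes a native host-language object."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


point_store(16, 3, 4)    # the "pointer" to this Point is just the integer 16
p = Point(3, 4)
print(point_x(16), p.x)  # 3 3
```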

AFAIK the duetto approach brings no benefits in this case.

------
__alexs
This is an utterly ridiculous idea and I'm amazed it works at all, well done
:)

If the CPython API is so easy to get going on emscripten I wonder if anyone
has tried using emscripten to compile Nuitka
([http://nuitka.net/](http://nuitka.net/)) output to asm.js yet?

~~~
slacka
"This is an utterly ridiculous idea and I'm amazed it works at all..." Yes,
reminds me of the people that like to run NES emulators, inside of Dreamcast
emulators, inside of their virtual PC running on a Mac. Yes it's impressive,
but as the author admits running a VM inside of a VM results in a "two orders
of magnitude" loss of performance.

------
hencq
While it's certainly an interesting project, I wonder how practical it is. All
javascript engines have powerful JITs and it seems like a waste not to use
them, but instead to implement your own interpreter. I think the approach
taken by e.g. clojurescript or dart is more viable, where they compile to
javascript and let the JIT do its magic.

~~~
azakai
As mentioned in the link, the goal is to eventually use the JIT - the title
does contain "First Steps" in it ;)

No reason this approach cannot JIT into JS just like clojurescript and dart.

~~~
hencq
If I read it correctly, his goal is to use the PyPy JIT. So what he's doing is
translating the PyPy JIT (which is created automatically from the interpreter
written in RPython) into javascript. Because he's using emscripten this
javascript code can be statically compiled by virtue of asm.js (in Firefox at
least).

~~~
azakai
Yes, he plans to use the PyPy JIT, but it will generate code that is then
JITed by the JS engine JIT.

It doesn't require special static compilation to optimize small amounts of
JITed code, all modern JS engines can do that extremely well. asm.js is not
necessary there.

------
Hello71
> two orders or magnitude speed difference right now

1\. I think you meant "of".

2\. log(781250 / 877) is closer to 3 than 2.
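The comment doesn't state the base; taking base 10, since the claim is about orders of magnitude, the ratio of the two quoted figures works out as:

```python
import math

ratio = 781250 / 877
print(round(ratio, 1))              # 890.8
print(round(math.log10(ratio), 2))  # 2.95
```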

------
albertzeyer
Really interesting project!

In a comment, I found out about another project, ShedSkin, which seems to be
similar to RPython, i.e. it compiles a subset of Python to C++.
[https://code.google.com/p/shedskin/](https://code.google.com/p/shedskin/)
[http://shed-skin.blogspot.de/](http://shed-skin.blogspot.de/)
[https://news.ycombinator.com/item?id=6091123](https://news.ycombinator.com/item?id=6091123)

------
spankalee
Implementing an interpreter on typed arrays is leaving a lot of the power of
the dynamic OOP host untapped. Translating Python classes to JS objects like
Skulpt is a much better approach.

~~~
rfk
The RPython toolchain has another mode of operation, which outputs higher-
level class-based code rather than low-level C-style code. They use this for a
CLR backend, but it would be interesting to try implementing a JavaScript
backend at that level and compare it to the low-level + Emscripten approach.

(This may have been tried in the past; in the post "10 years of PyPy" it's
mentioned that there was once a JavaScript backend but it was removed because
it was a horrible idea: [http://morepypy.blogspot.com.au/2013/02/10-years-of-pypy.htm...](http://morepypy.blogspot.com.au/2013/02/10-years-of-pypy.html))

~~~
voltagex_
Common Language Runtime? I can't find any mention of this backend.

~~~
rfk
I got my acronyms wrong; it's called the "CLI" backend, but it's definitely in
there: [http://doc.pypy.org/en/latest/cli-backend.html](http://doc.pypy.org/en/latest/cli-backend.html)

~~~
rfk
Heh, actually it might be possible to cheat a little with this backend as
well, using [http://www.jsil.org/](http://www.jsil.org/) to translate the
output of the cli backend into javascript. Trying C+Emscripten versus CIL+JSIL
would be a very interesting comparison.

------
dangayle
A hearty +1 to anything that lets us legitimately run a different scripting
language in a <script> tag

~~~
jnbiche
It's really worth looking into the recent project that transpiled the Lua VM
into JavaScript, and which allows Lua to be used in the browser -- even to
interact with the DOM. The size of the Lua VM in JS is under 200 kB, not much
bigger than a large JavaScript library.

~~~
dangayle
I've seen that. I need to learn Lua. I have a friend who uses it as the
scripting layer on top of all his C work, and he says it's the best setup he's
ever tried.


