PyPy.js: Python in the web browser (pypyjs.org)
483 points by LopRabbit 6 months ago | 120 comments

It's really awesome to see how far this has come. I believe we at Repl.it were the first to try something like this in production. We compiled CPython to JavaScript with Emscripten and contributed quite a bit to the project in the process (including the initial virtual filesystem implementation).


I deployed this at scale at Codecademy, where millions of users were using it to learn Python, but hit a lot of problems with it. For one, if you're in a 3rd world country then the bundle size is a non-starter (this is potentially solved with WASM being a binary format). Even if you managed to download the bundle, lots of old computers would run out of memory trying to parse the JS (again probably solved by the binary AST format of WASM). Because of this and a few other issues (the dev experience of emscripten was really hard) we had to move away to running code on remote containers.

Now that I'm back working on Repl.it full time I'm excited to play again with the tech and see what we can do with it this time around.

I just wanted to say how much I love Repl.it and use it all the time. I use it as a testing scratchpad mostly but I know it's much more powerful than that. I appreciate its simplicity. Thanks for a great product!

Thanks so much :)

Thank you for your work at Codecademy -- a place where I learned how to program. I'm working through my 4th year as a developer :)

I'm curious about the virtual filesystem:

  Welcome to PyPy.js!
  >>> import os
  >>> os.listdir('/')
  ['tmp', 'home', 'dev', 'lib']
  >>> f = open('/what', 'w')
  >>> f.write('hey')
  >>> f.close()
  >>> os.listdir('/')
  ['tmp', 'home', 'dev', 'lib', 'what']
  >>> open('/what').read()
What is in the stack that makes that work?

The Emscripten File System API, "The API is inspired by the Linux/POSIX File System API, with each presenting a very similar interface." https://kripken.github.io/emscripten-site/docs/api_reference...


I believe emscripten provides a virtual filesystem interface. Since it's essentially translating/running C, it's either hooking through glibc or the syscall interfaces.
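To make the idea concrete, here is a toy in-memory filesystem in Python. It is only a sketch of the concept, not Emscripten's actual implementation; the `MemFS` class and its methods are hypothetical names made up for illustration.

```python
class MemFS(object):
    """Toy in-memory filesystem: paths map to file contents held in a dict."""

    def __init__(self, dirs=("tmp", "home", "dev", "lib")):
        # Top-level directories are tracked only by name in this sketch.
        self.dirs = list(dirs)
        self.files = {}

    def write(self, path, data):
        # "Writing" just stores the string under its path; nothing touches disk.
        self.files[path] = data

    def read(self, path):
        return self.files[path]

    def listdir_root(self):
        # Directory names first, then files stored directly under "/".
        names = [p[1:] for p in self.files if p.count("/") == 1]
        return self.dirs + sorted(names)


fs = MemFS()
fs.write("/what", "hey")
listing = fs.listdir_root()
content = fs.read("/what")
print(listing)
print(content)
```

Everything lives in ordinary interpreter memory, which is why a page refresh would wipe it unless the state were persisted through some browser storage API.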

    >>> import os
    >>> for root, dirs, files in os.walk('/'):    
    ...   for d in dirs:
    ...     print os.path.join(root, d) + '/'
    ...   for f in files:
    ...     print os.path.join(root, f)

The speed is surprisingly reasonable. Testing with a simple loop, I get about 1.3-1.4 million loops per second in python3, 2 million loops per second in python2, 14.5 million loops in pypy, and 0.61 million loops in the browser.

My code basically calls int(time.time()) until its value changes, doing i+=1 for every time it did not change (n=10 for every platform (python2/3/pypy/browser)). My browser is Firefox 61, full code here: https://hastebin.com/turenofise.py

Fetching time in the browser isn't accurate because a lot of trackers exploit timing in order to fingerprint users. A lot of browsers now purposefully reduce the precision of the clock. See:



Oh, how convenient. Sometimes I wonder if we shouldn't find a way to be able to run trusted applications instead of making everyone's life difficult all the time, and that's speaking as a security consultant, not even a developer... but yeah I don't see a good way to make that happen.

This reminds me a bit of ActiveX.

The whole Meltdown/Spectre thing also contributed greatly to reducing timer precision.

I wonder if the time.time call is the slow part there? Maybe worth doing a busy loop for, say, 5 million iterations, timing that, then calculating loops per second from it.

I'm sure it is a thousand times slower than i++, but it should be the same code on each platform.

But you made me think: indeed, the underlying call could be slower from JS whereas from pypy/python{2,3} it's fast (or vice versa). So I just tested a version where it checks how long ten million iterations take (n=3). Code here: https://hastebin.com/uyuhecabek.py
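A minimal sketch of that fixed-iteration approach, for anyone who can't open the paste. This is not the hastebin code; `time_loop` is a hypothetical helper, and the iteration count is reduced here to keep the demo quick.

```python
import time


def time_loop(iterations):
    """Return the wall-clock seconds taken by a bare counting loop."""
    start = time.time()
    i = 0
    while i < iterations:
        i += 1
    return time.time() - start


# Three runs, mirroring n=3; time.time() is called only twice per run,
# so its cost no longer dominates the measurement.
timings = [time_loop(1000000) for _ in range(3)]
for run, seconds in enumerate(timings):
    print("Iteration %d took %.3fs" % (run, seconds))
```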

    browser/pypy   5.25 seconds
    python 3.6.6   2.16 seconds
    python 3.7.0   2.08 seconds
    python 2.7.15  1.42 seconds
    pypy   6.0.0   0.01 seconds
Since there is no syscall to be made, pypy can do this in compiled code and is blazingly fast (I bet C/assembly isn't much faster there), and the rest seems comparable to the previous estimate: the browser version is 2.25× slower compared to python3 (now 2.43×) and 3.33× slower compared to python2 (now 3.70×).
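The "now" ratios follow directly from the timings in the table; a quick check, using only the numbers quoted above:

```python
browser = 5.25  # seconds for ten million iterations in the browser
python3 = 2.16  # CPython 3.6.6
python2 = 1.42  # CPython 2.7.15

# How many times longer the browser run took than each native interpreter.
ratio_py3 = round(browser / python3, 2)
ratio_py2 = round(browser / python2, 2)
print(ratio_py3, ratio_py2)
```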

Tested on Brython (with Firefox) - http://www.brython.info/tests/editor.html?lang=en

    Iteration 0 took -0.9660000801086426s
    Iteration 1 took -0.9519999027252197s
    Iteration 2 took -0.9499998092651367s
It is about 10 times slower with Chrome on my computer.

Wow. That makes a huge difference.

    Iteration 0 took 0.6579999923706055s
    Iteration 1 took 0.6540000438690186s
    Iteration 2 took 0.6460001468658447s
(I noticed the code from hastebin was an old version, I forgot to swap the starttime-time.time() in the version I posted. Not that it matters much, just change the sign.)

Running it on OP's website again just to be sure, it now takes 4.9 seconds instead of 5.25, so there is some variance there, but this is still a big difference. This is Firefox as well (see another comment in this subthread), I don't have Chromium installed.

I'm surprised it's that different; in the optimum case, it feels like several layers of JITing interpreter should get to the same performance in the end. Might not be enough to trigger it I guess.

I can't see the code because I'm on mobile, but if the JIT is function based and you're only calling the benchmarked function once it may not have it optimized for your first call. It certainly can't replace the instructions mid-loop.

Edit: SO answer explaining the PyPy JIT: https://stackoverflow.com/questions/37377787/what-kind-of-ji...

Are you sure pypy didn't just optimize out the loop entirely (as an aggressive C compiler would have done)?

Though when I tried adding the numbers from 1 to a million in a loop, JavaScript was about 10 - 20 times faster.

You would use sum(range()) in Python though. Remember that many things you do manually in JS have a better, automatic and faster way in the stdlib.
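A small sketch of the two styles side by side (the function names are made up for illustration). Both produce the same total; the built-in version keeps the loop inside the interpreter's own implementation of sum instead of executing Python bytecode per element.

```python
def manual_sum(n):
    # Explicit loop, the way you would write it in JS.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def builtin_sum(n):
    # Let the built-in sum() iterate over the range.
    return sum(range(1, n + 1))


loop_total = manual_sum(1000000)
builtin_total = builtin_sum(1000000)
print(loop_total == builtin_total)
```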

Besides, if calculations are important, you would use numpy.

Ah yes. sum(range()) in Python vs for(i=1...) in js was much closer, about 20% quicker in javascript.

Won't this calculation overflow to +Infinity in JavaScript after a while? On the other hand, Python will try to compute the actual result using arbitrary precision integers, which is slower.
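For scale: the sum of 1 to a million is 500000500000, far below 2**53, so that particular loop is still exact in a double. A JavaScript number starts losing integer precision above 2**53 and only becomes Infinity past roughly 1.8e308. A sketch of the difference, using a Python float as a stand-in for a JavaScript number (both are IEEE 754 doubles):

```python
big = 2 ** 53

# Python int: arbitrary precision, so adding 1 stays exact.
exact = big + 1

# IEEE 754 double: 2**53 + 1 is not representable and rounds back to 2**53.
as_float = float(big) + 1

print(exact - big)
print(as_float == float(big))
```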

Which version of python 3 did you test with? I expected 3.6+ to be faster than python 2.7

int in python2 is a simpler structure (fixed precision for non-long integers) than int in python3 (arbitrary precision)

Python 3.6.6 from Debian Buster in a VirtualBox VM on a Windows 10 host. Testing with 3.7.0, the result is no different. Python 3.5 is no longer available in the repositories to test whether that is indeed slower.

This pastebin requires JS...

Here's the code:

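  import time
  i = 0                  # loop iterations counted so far
  j = 0                  # number of one-second ticks observed
  a = int(time.time())   # current second, used to detect the next tick
  while j < 10:
      if a != int(time.time()):
          a = int(time.time())
          j += 1
      i += 1
  # i / 10 is then the average number of loops per second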

There are some pypy.js vs cpython benchmarks at http://arewepythonyet.com/ - those look very good as well.

The print will slow everything down as well.

There is a similar Python interpreter here: http://www.codeskulptor.org/

It is used by a class taught on Coursera (Introduction to Interactive Programming in Python)

From what I can see:

- PyPy.js benefit: runs faster

- codeskulptor benefit: can use GUI components

Great project. My team used it for a live-coding Python notebook: https://github.com/inkandswitch/livebook#readme

To my surprise it was faster than Jupyter / CPython for some tasks (e.g. loading and parsing large CSVs).

Shouldn't be too surprising, given how PyPy is generally faster than CPython by a factor of about 7-8:


I'd expect a much larger difference in tasks that are pure Python (loading a CSV is done primarily through the csv module, i.e. all the hot loops are in C).

Yes, but PyPy is usually compiled to machine code (and emits machine code), not to JS.

True, but the asm.js is then compiled to machine code by the browser's JS JIT.

I'd expect some overhead from sandboxing etc., but it should be fairly close to native speed.

It's not "overhead from sandboxing" that's a problem, it's the fact that translating from bytecode to bytecode to bytecode will cause it to lose some ... succinctness at each step, giving the final machine code generator a harder time coming up with efficient code for it.

Indeed, but many of the optimizations introduced by PyPy should be visible regardless of compilation target.

I love Python but js is close to an order of magnitude faster

Have you tried PyPy?

Unfortunately it hasn’t been a drop-in replacement for our CPython codebase (lots of Numpy/Pandas/etc).

no numpy & no tensorflow

Wow, that's awesome! Direct link to the live notebook: https://livebook.inkandswitch.com/fork/welcome

Naturally, I changed the first code block to this:

    from random import randint, seed
    "\n".join([randint(5, 25) * ' ' + s for s in ['such code', 'much live', 'wow']])

This is great, was looking for this exactly!

I'd love there to be a client-side webapp development system using Python with feature and output parity to JavaScript. If it was a product, I would buy it.

I've been thinking about building this for Tcl. Tcl has been available in the browser in various forms for decades, but one thing that makes Tcl uniquely suited to a web application style of development is its threading model. In Tcl, you access a thread by sending messages to it, which a Tcl interpreter processes inside its event loop. My plan was to have a client-side Tcl program in the browser that automatically instantiates a second thread, but the thread is a remote thread running on the server, so you would be able to access the server just as if it were another thread.

Something like (client-side):

    thread::send $::tcl-web::server -async {
        do stuff
    } placeToStoreResult
    vwait placeToStoreResult
    do stuff with result $placeToStoreResult

But then you have to code in... Tcl...

Then you GET to code in Tcl !

> Then you GET to code in Tcl !

I don't care about Tcl but your enthusiasm is golden. Sort of why I write silly stuff in bash or in an esoteric manner: "Why?" "Cause we can!!"

Given how terrible GUI development in javascript is for the developer, this would be a net improvement.

However the language I yearn for in browsers is Lua.

Exactly. Ugh.

You should give Nim a try. It's a Python-like language which can be compiled to C and JavaScript. There is even a SPA framework[1] for it already and the NimForum[2] is written in it.

1 - https://github.com/pragmagic/karax

2 - http://forum.nim-lang.org/

While not exactly that, https://anvil.works/ is kind of heading that way (https://news.ycombinator.com/item?id=15584124)

Pity it's not Python3.

Yeah weird. I would expect Python2 to be dying off more by now. Wonder if we're headed to an environment where there's essentially 2 separate languages as Python 3 continues to change and grow?

It's a pretty old side-project, based on an interpreter that itself took quite a while to support Python 3 well due to lack of funding, so it's not that weird it hasn't been updated.

I still get surprises like starting to learn Google Cloud Functions and realizing that up until July of this year they only supported Python 2.

I have no idea why a project of this caliber would start by using Python 2 instead of Python 3.

Edit: When I started reading about GCF, all docs said I could only use Python 2. Later I found that they seem to be on the way to change this. But still, I was very surprised that Python 2 was even an option to begin with.

Well this project has pretty much died off. No update in over a year, and nothing significant in 3-4.

I think we're roughly already there. I think the pivotal moment where Python 2 could've died out rapidly passed and they kept officially supporting it for too long.

As far as I know, permanent EOL for Python 2 is still Jan 1, 2020 (https://pythonclock.org/). Python 3 was first released December of 2008. That's an awfully long tail.

Fascinating. I always thought Python 2.7 support would end July 3, 2020 (exactly 5+5 years from when 2.7 was released), but for some reason they recently decided to go with Jan 1st.

> Specifically, 2.7 will receive bugfix support until January 1, 2020. All 2.7 development work will cease in 2020.

> I've updated the PEP to say 2.7 is completely dead on Jan 1 2020. The final release may not literally be on January 1st, but we certainly don't want to support 2.7 through all of 2020.

https://www.python.org/dev/peps/pep-0373/#maintenance-releas... https://mail.python.org/pipermail/python-dev/2018-March/1523...

Assuming the work here was primarily the asm.js based emitter, maybe it can be ported to PyPy3 without needing to go from scratch.

The most important import doesn't work, unfortunately:

  import antigravity

At least all is not lost:

  Welcome to PyPy.js!
  >>> from __future__ import braces
    File "<console>", line 1
  SyntaxError: not a chance

That was the first thing I tried :)

This is pretty old, and hasn't seen an update in over a year. I never understand why links like this get posted with no discussion to start it off with.

Also, wouldn't this be better implemented with WASM?

Maybe something like this? https://github.com/iodide-project/pyodide

CircleCI: Failed :(

No good deed goes unpunished.

> This is pretty old, and hasn't seen an update in over a year

So pretty much like every Python project then. What's happening to the Python community?

It's stable and mature?

If you've been writing JavaScript your whole career you might not know what that looks like.

Such a snarky response, but I can't deny it. I see a JS package that hasn't been updated in a year and I expect that it won't even build with my environment because webpack evolution. I see a python package that hasn't been updated in a year and it doesn't faze me. Not a whole lot that can break in terms of building and integration with my python project.

Perspective is everything. Now and again I use C libraries that haven't been updated in 15 years. Why? There isn't a need. Everything builds perfectly, and some of those libraries are still the go-to solutions in my field.

Its front page describes it as an experiment, and it has 98 open bugs. Both PyPy and Python have released new versions since the last update of this project, so it's at least missing important updates.

> What's happening to the Python community?

It's moved from "growing" to "well established." I don't see many people lamenting something is in CPAN and not pip anymore. It shouldn't be surprising that you find more stale projects and packages as time goes on. Most of the interesting stuff is in the development with specific packages (Pandas, Salt, OpenStack, etc.).

Limited man-hours, and more time being spent on the data-science community? I've seen a fair number of impressive things done targeting data-science, but fewer big generally useful libraries.

A lot of the low-hanging fruit for generally-useful libraries have already been created, and have reached maturity?

Micropython is pretty recent, and pretty impressive.

Will web assembly have any effect on a future web based python interpreter? Is there one in the works?

Someone posted this on another thread: https://github.com/iodide-project/pyodide

What I gather is that it's compiling CPython to WASM and ships it along with some other utilities. It looks pretty cool.

WASM plans on adding GC, polymorphic inline cache, direct DOM access, etc. That would allow for Python without having to download the whole runtime. Whenever that happens, it might become more mainstream. Similar for other languages.

There have been js/asm.js CPython builds demoed since early days of emscripten, and emscripten is the main WebAssembly toolchain, so it's "just worked" since the beginning.

Tried Fibonacci, and the thing crashed for 100. I'm not sure that a wasm python library can do any heavy lifting.
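The comment doesn't say how Fibonacci was written. If it was the naive doubly recursive version (an assumption), fib(100) is infeasible on any interpreter, because the number of calls grows exponentially. A sketch of an iterative version, which needs only 100 additions and relies on Python's arbitrary precision integers:

```python
def fib(n):
    # Walk the sequence forward, keeping only the last two values.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


value = fib(100)
print(value)
```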

seems brython is more appropriate, mature and usable.

I tried to do a little something with Emscripten and WebAssembly, but I stopped on the sockets implementation. It relies on WebSockets. I wonder how well we could emulate real sockets using WebRTC to make WebAssembly apps more sophisticated.

Last I looked Emscripten already used WebRTC's data channel to emulate UDP.

Is there an up-to-date/maintained comparison of python-in-the-browser systems? As well as this, there's brython and skulpt, and it would be nice to be able to compare various implemented features, quirks and speed.

Really amazing. I use python every day at work, and most of the crucial libraries have tight inner loops written in C or Cython. I wonder what the web assembly future holds for those languages, and inter-language FFI.

    >>> os.getlogin()
Nice joke :)

    File "<console>", line 2
      return f'{fname} {surname} is {age}.'
  SyntaxError: invalid syntax

So this is a subset of Python or something like that?

I believe this feature was introduced in Python 3.6. This implementation may be older than 3.6.

Edit: Just checked the site and see it uses PyPy which states -- on their site -- the python version is 3.5.3

Or you know, just import sys and print sys.version instead of reverse-engineering it from the syntactic features introduced over time.
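For example, something like the following should work unchanged on both Python 2 and Python 3 interpreters, so it is safe to paste into the console whichever one it turns out to be:

```python
import sys

# Full version string, including the implementation name where present.
version_string = sys.version

# Structured form, easier to compare against than the string.
major, minor = sys.version_info[0], sys.version_info[1]

print(version_string)
print("Python %d.%d" % (major, minor))
```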

I can't do:

    a, *b = range(3)

which should work since Python 3.0 (PEP 3132), I think.

Other comments in this thread point to the implementation likely being Python 2.x.

This is awesome. They mention that it is a "compliant" implementation of python. How does one go about proving this?

There's no Python standard, but my guess is that, since this is based on the same source code as native PyPy with a different compiler backend, they are making the claim based on the fact that native PyPy is widely considered to be similar enough to CPython for many workloads. Thus, assuming the correctness of the RPython -> JavaScript transformation (which they have not proven, and which would be a monumental task), you should expect this version to work like native PyPy.

this is pretty neat! I was wondering, are there any projects to transpile python to js, but targeting the server side? It would be interesting to port a flask application, for example, to run on node and benefit from the jit, or maybe use express and call flask through its wsgi interface.

You're better off running it on the original version of PyPy, which already uses a JIT.

Why not just run these in their native environment?

Why use this instead of jupyter? Some educational applications maybe?

I think jupyter works with a backend server to execute python (right?).

Correct. It's just a view to a server running python.

This runs directly in the browser, whereas Jupyter needs a server.

Would it be sane to use this to run Python modules in node.js?

If the Python module is large and complex, does something that would be incredibly difficult to replicate in Node.js given the current ecosystem or the cost of a rewrite, and is not available in Rust/C/C++ that could be WASM'd (not mentioning others here due to large WASM overhead); if you accept the large binary sizes of these things (i.e. you're not embedding the Python interpreter/runtime and putting it on npm); and if you're willing to put a node-friendly layer on top that makes the lower-level calls... then it might be sane. But I'd consider modernizing to an updated emscripten that targets WASM, and seeing whether you can avoid some of the emscripten runtime (i.e. WASM standalone compilation).

For all other cases, stay in Node or use a lib from a language more suited to WASM compilation. Now, if it's an option between this and a native Node module, that depends on your opinion of Node FFI and native Node modules.

If you have significant amounts of Python code to reuse in another language, I would recommend:

A) Run it as a subprocess

B) Call it over HTTP request/response

C) Use a message broker like AMQP
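A rough sketch of option A, exchanging JSON over stdin/stdout. The caller is written in Python here only to keep the example self-contained; in practice it would be whatever language is doing the reuse. The `WORKER` script and `call_worker` helper are hypothetical stand-ins for the existing Python code, and `capture_output` assumes Python 3.7 or newer.

```python
import json
import subprocess
import sys

# Stand-in for the existing Python code: reads a JSON request on stdin
# and writes a JSON response on stdout.
WORKER = (
    "import json, sys; "
    "data = json.load(sys.stdin); "
    "json.dump({'total': sum(data['values'])}, sys.stdout)"
)


def call_worker(values):
    """Run the worker in a child interpreter and return its decoded reply."""
    proc = subprocess.run(
        [sys.executable, "-c", WORKER],
        input=json.dumps({"values": values}),
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return json.loads(proc.stdout)


result = call_worker([1, 2, 3])
print(result)
```

The same request/response shape carries over to options B and C; only the transport changes.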

>>> print "hello"


>>> import json


Dope, everything I need.

Jeff Atwood's law of JavaScript:

    everything that can be written in JS will be written in JS!!

is that interpreter faster than the default one?

    >>> import sys
    >>> sys.version
    '2.7.9 (?, Jul 03 2015, 17:08:29)\n[PyPy 2.6.0]'

Why, exactly, does this have such a garbage-ancient version of Python?

Didn't we decide a long while back that fully featured programming languages were dangerous in the browser, especially if they could do disk I/O? Even if sandboxed? A la Java Applets and Adobe Flash?

    import os
    for subdir, dirs, files in os.walk('./'):
        for file in files:
            print file
Perhaps the environment is "fake", and these files don't really exist, even in some sandbox... otherwise, seems like this might pose a security risk.

This is Javascript, not real Python. If you're so afraid that this is dangerous, why do you post malicious code? And if you're so clever to post that code, why don't you try similar code to see if it really has access to your files?

    import os
The answer, by the way, is no.

> If you're so afraid that this is dangerous, why do you post malicious code?

That's not malicious code... it removes files from a sandbox... you can refresh the page and see for yourself.

> And if you're so clever to post that code

There's no need to be rude... there's an honest question and lack of understanding in my post - educate me, don't talk down to me.

> why don't you try similar code to see if it really has access to your files

The question was if someone could break out of a sandbox, such as with Java Applets and Adobe Flash. I have no idea how to do that - I'm not a security specialist, nor some sort of hacker guy.

> The answer, by the way, is no.

From the sandbox and using the standard `import os`, ya, you're right. The question, again, is what if someone got outside the sandbox?

Thanks for the elaborate response, now I much better understand what you're trying to say.

This is different from the examples you mention: there is no external software that creates the sandbox (whereas with Java and Flash you had to install those as plugins). It's really just Javascript as it is built into the browser. Any vulnerability in this "sandbox" is a vulnerability in the browser, not some 3rd party plugin.

Put differently, this pypy emulation is no different from the Javascript API that was in your browser already. This isn't a new plugin, just using Javascript to emulate Python through the Pypy interpreter.

Well why doesn't this apply to JS? The python interpreter is running in the JS sandbox

I suppose that was the explanation I was looking for.

I don't know about PyPy.js specifically, but most systems like this provide a virtual filesystem that maps back to things like indexedDB and localstorage which are browser APIs.

There is no way that pypy.js is able to access local files on your machine outside of the browser sandbox.

I bet there's a few cheeky hackers around the world who may disagree with that statement. Always assume it's broken.

No, not in this case. It's running in JS land in the browser so any exploits to break out of the browser sandbox using this could be done simpler with raw JS.

1. I'll copy&paste this to test it!

2. uh ok I'll paste it into an editor and remove the added indentation and then copy&paste that to test it

3. ... ok. I'll look at the code and reformat every single line and copy&paste that

the oldest complaint abides
