Python 3.11 in the Web Browser (pycon.de)
292 points by tosh on March 26, 2022 | 135 comments



I really, really wish it were possible for wasm modules like this to be shared between sites. That is, python.org could host an official wasm module, and the first time you visited a site using python the browser would download and compile it once and cache the result, and then any other website could use python with no download size or load time penalty. But unfortunately privacy concerns killed this utopia. Caches are now partitioned per origin, so every site that uses python has to download and compile the wasm module separately even if it's the exact same module from the same place, and that is unlikely to ever change: https://developer.chrome.com/blog/http-cache-partitioning/


While I understand how finding a particular unique resource cached may allow an attacker to glean some privacy-infringing info, I don't see how detecting a common library downloaded from a site that provides it for everyone (like python.org) would help reveal anything.

If users could define a list of sites trusted for cross-site caching (like fonts.google.com and other known library CDNs), this could help cache quite a few common resources without downsides to the user's privacy.


But then those 'common trusted sites' can spy on users massively.

fonts.google.com is, after google analytics, the most effective spyware around today.


The problem is that you can identify people by checking what resources they have already cached


Wait, so the traditional "use the JavaScript CDN as a cross-site cache" method does not work anymore? How did I not know that!


It was a huge privacy hole: malicious pages could measure how long it took for URLs to load and use that to tell which other sites you had visited in the past.



We could solve the privacy issue by implementing an installable plugin system into the browser. Let's call it NPAPI!


It seems like there could be some sort of cross origin cache header that could be set? That way by default you couldn't store in a global cache but server operators would be able to mark certain resources that could be.


Server operators would just set that header on everything. There's no downside for them. The privacy issue only impacts users and it would still exist even if only a subset of libraries were shared.


One proposed solution is checksums on CDN provided javascript:

https://w3c.github.io/webappsec-subresource-integrity/


Surprised the clients couldn't just add a small random delay on N percent of requests to nullify the side-channel info leakage.


It's my understanding that that type of countermeasure can be defeated with some statistical analysis.


Normally they could be, but it might actually work in this case. To do a statistical analysis, you need to be able to make many measurements. However, you only get one measurement per resource, as subsequent attempts will be cached regardless.


AFAIU you can invalidate the browser cache for a certain resource by hand:

> To delete the HTTP Cache, you just have to either issue a POST request to the resource, or use the fetch API with cache: "reload" in a way that returns an error on the server (eg, by setting an overlong HTTP referrer), which will lead to the browser not caching the response, and invalidating the previous cached response.

https://sirdarckcat.blogspot.com/2019/03/http-cache-cross-si...


This is very much not my area of expertise, but there are a few things that I think could be challenging. Let's assume that a cached resource loads in ~0.1 seconds and an uncached one loads in ~1.0 second:

* Any random delay capped below 0.9 seconds is completely pointless, since we would know that any resource that loads in 0.5 seconds has to be cached - the 0.4 seconds spent waiting is useless.

* A random delay of up to a second creates ambiguous cases: is 1.1 a cached + maximally delayed response or an uncached but only slightly delayed response? But, with a random delay, we're still going to have pretty different peaks in the expected latency graphs for cached and uncached resources - cached ones would peak around 0.5 seconds and uncached ones would peak around 1.5 seconds.

* A random delay much more than a second would start causing the peaks on our expected latency graphs to be closer together - at a 10 second delay, we'll have a peak in the graph around 5 seconds for cached resources and 6 seconds for uncached ones. I'm a bit fuzzy on how many resources we'd have to load to get a statistically significant fingerprint. I suspect it's not all that many - but I could be wrong. However, we're also talking about a pretty significant delay by this point.

We also have the issue of how to handle AJAX requests. You only get one shot to measure the initial load. But, after that you can make an AJAX request to re-load that same resource with caching disabled and measure that. I'm not a statistician, but, I suspect that that information will be pretty helpful in figuring out what was cached vs not cached. Of course, we could add in some random delays here too - but since we can measure this an unlimited number of times, I suspect this would be even easier to defeat.

So my suspicion is that these types of random delays are possible to defeat if you have a better grasp of statistics than I do.

But, we could also assume that that is not the case and that they can't be defeated - what does that get us? We've had to add all these random delays in to block the side channel which is going to hurt latency - the very thing we're trying to improve with caching. I also suspect there is a ton of complexity to consider if you want to avoid having all of the same random delays on repeated visits.
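
To make the peaks argument concrete, here's a throwaway simulation (the 0.1s/1.0s numbers are the made-up ones from above):

  import random

  def observed_latency(cached, max_delay):
      base = 0.1 if cached else 1.0            # the made-up cached/uncached times
      return base + random.uniform(0, max_delay)

  max_delay = 1.0
  cached   = [observed_latency(True,  max_delay) for _ in range(10_000)]
  uncached = [observed_latency(False, max_delay) for _ in range(10_000)]

  # Means land near 0.6s vs 1.5s, and the two ranges overlap only between
  # 1.0s and 1.1s, so a handful of probes is enough to tell them apart.
  print(sum(cached) / len(cached), sum(uncached) / len(uncached))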


Isn't the obvious solution here to just cache the delay? The browser presumably knows how long the original resource request took when site A made it, so it can just expose the same latency to site B and - since the cache by definition only hits if the request is identical - site B can't tell whether it's delayed by artificial cache latency or actual network latency. Chuck a symlink to the resource into each per-origin cache so non-initial requests still finish immediately, and possibly add a small chance of silently retrying the underlying request to see if it gets faster (with probability that can be computed entirely from site-B-observable load times).


Caching the delay defeats a big purpose of the cache in the first place - to reduce latency. But, also, the entity trying to fingerprint the request might also control the CDN - in which case, they can control the latency of non-cached requests. So, maybe in even minutes non-cached requests complete fast, but in odd minutes they are slow - in which case, a deviation from that behavior could be detected as a signal.


> a big purpose of the cache in the first place - to reduce latency.

You can't simultaneously make the first visit to site B faster than it would have been if you had never visited site A (the whole point of cross-site caching) and keep that first visit indistinguishable from the never-visited-site-A case (the privacy requirement); no possible delay policy will help with that.

> they can control the latency of non-cached requests.

This is more of a problem, but it doesn't need latency as such - site B could compare request traffic to b.example versus cdn.example to see whether a request was skipped due to already being cached.

(To be clear, I'm not sure you can actually do cross-origin caching securely for web traffic, for the above reason; I was addressing the narrow question of how to pick a delay - namely, don't delay cached and uncached responses by the same amount relative to their naive latency, because the whole point is that their naive latencies are different and we're trying to make that not true.)


You should be able to do something like this:

- if the resource isn't in cache, download it and note the time it took.

- if the current site has requested the resource before, just return it instantly.

- otherwise, wait the exact same time it took originally.

With this the first request a site makes for a resource will always take the same time and you have no way of knowing if that was the first time it was downloaded or if it's served from cache. It's obviously not quite that simple, you'll need to factor in which connection is in use etc, but it should be possible to keep cross site caching.
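
Something like this rough sketch, with hypothetical names (not any browser's actual cache internals):

  import time

  global_cache = {}   # url -> (body, seconds the original fetch took)
  seen = set()        # (origin, url) pairs that have requested url before

  def fetch_from_network(url):
      start = time.monotonic()
      body = b"resource bytes"           # stand-in for a real network fetch
      return body, time.monotonic() - start

  def request(origin, url):
      if url not in global_cache:        # first fetch ever: note the timing
          body, took = fetch_from_network(url)
          global_cache[url] = (body, took)
          seen.add((origin, url))
          return body
      body, took = global_cache[url]
      if (origin, url) in seen:          # repeat request from this origin:
          return body                    # serve instantly from cache
      time.sleep(took)                   # first request from this origin:
      seen.add((origin, url))            # replay the original latency
      return body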


Weren't the shared libraries used on countless sites though? How would this work?


Yes but each site uses a lot of libraries too. The set of libraries and specific versions of those libraries used by a site is a pretty good fingerprint.


Still doesn't make sense - there is only a single set of libraries - the ones cached by a user's browser, and since each library in that set is used by thousands of sites, there's no way to know which site it was cached for. For each version, there are certainly thousands of sites using it. Since I'm still not getting it, perhaps you could explain how a particular set of libraries is fingerprinted?


The New York Times uses libraries (versions) A,B,C, and D. Wikileaks uses A,B,D and F. I see that you have A,B,D and F cached - that allows me to rule out the NYT and predict with some degree of confidence that you visited Wikileaks.

You can make up theoretical models for how many popular libraries there are, or how many sites, and say this should or shouldn't be possible. I haven't seen any such models, but advertisers were definitely using this technique in the wild so the models that say it's impossible are all wrong.

Adding noise to a small sample of requests doesn't buy you that much entropy - the signal is a little noisy anyway.
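
As a toy Python sketch of that inference, using made-up site and library names:

  site_libs = {
      "nytimes.com":   {"A", "B", "C", "D"},
      "wikileaks.org": {"A", "B", "D", "F"},
  }
  observed_cached = {"A", "B", "D", "F"}   # inferred from per-resource load times

  # Any site that needs a library you *don't* have cached can be ruled out.
  candidates = [site for site, libs in site_libs.items() if libs <= observed_cached]
  print(candidates)   # ['wikileaks.org'] - nytimes.com is ruled out by the missing C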


This would work if you only visited like 5 sites. If you visit thousands, and more like tens of thousands when you consider 3rd party embedded content, you can't figure out which sites a user visits, but rather only which ones they don't.


It worked pretty effectively.

Think about the 10,000s of JavaScript libraries out there, and the 100s of versions of each.


number of libraries * number of versions ~= bits of information to classify you

A good intuition is that although there's low certainty which site you have visited, there's high certainty which sites you HAVE NOT visited

I think it's easier to see how you can fingerprint someone based on the set of sites they HAVE NOT visited

Edit: I'm not so sure anymore; I think that requires testing looots of libraries. It's not practical unless you are a really nasty ad company that tests hundreds of libraries in the background, really sucking up your bandwidth


I got a better idea

Imagine ad companies choose libraries that are cached by roughly 50% of users; if they test you with 10 of those, they learn ~10 bits of entropy to classify you, i.e. which of 1024 buckets you belong to
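
Back-of-the-envelope in Python (just the standard entropy formula, nothing site-specific):

  from math import log2

  p = 0.5                              # fraction of users with the resource cached
  bits_per_probe = -(p * log2(p) + (1 - p) * log2(1 - p))   # = 1.0 bit
  probes = 10
  print(bits_per_probe * probes, 2 ** probes)               # 10.0 bits -> 1024 buckets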


Because you'd make your own library used exclusively for tracking, or check whether another site's specific set of resources had been loaded.


How is that any privacy hole at all?

Millions of websites use vue (as an example) from the same CDN. There's no way to know which site it was cached from.

What a terrible thing they've done, making the internet slower for everyone on earth for no good reason.


It's pretty bold of you to assume that you know more about this problem than every browser developer.

Websites don't just "use Vue". They use some particular version of Vue along with particular versions of other libraries. My understanding is that once you put all of that together, this can create some significant privacy leaks.


By "they" do you mean advertisers? If so, I agree


It was discussed here at the time: https://news.ycombinator.com/item?id=24894135


I wonder if a browser vendor could host a repository of common libraries? With a limited number of versions available that changes only at the same time as the browser, maybe it wouldn't leak much info. It would be almost like including the code in the browser.


Isn't that almost like all browsers including a language with the browser?


It is, except the feature is downloaded on demand.

Users need to trust the browser's producer anyway.


How could this go wrong - Bill Gates


Couldn’t browsers theoretically “bake in” the python.org binary to the browser itself to avoid this?

Kind of like how macOS (used to) ship with python preinstalled?


If browsers already have a multi-paradigm, dynamic/duck-typed, garbage-collected scripting language in JavaScript, why would they add another one--especially since JS is incredibly backwards-compatible to not break the web? In the broader scope of programming languages, Python and JS are the same thing. If they're going to add a second language, the least they could do is make it interesting in comparison.


Javascript will be 30 years old in 3 years. It's been successful, sure, but it's also been controversial and a lot of language mistakes have had to be carried forward. The 5 minute "wat" talk still lingers in my head about those basic mistakes.

"Oh they fixed that in Typescript."

But they could have fixed it with practically any other language as well.


The Wat talk is mostly BS for laughs. It's hilarious but to someone familiar with the language it's a bunch of stuff you'd mostly never run into. Some are even outdated.

Like he makes fun of the fact that Array(16).toString() prints 15 commas. In the context of the talk it's funny but in reality, what would you expect? You made an array with 16 empty elements. Array.toString() calls toString() on each element and separates them by commas. Why is that unexpected?

He then shows Array(16).join("wat") which is the same as the previous except JS uses "wat" between elements instead of ","

"wat" + 1 is string + number coerced to string so string + string. string + is defined as concatenation

"wat" - 1. There is no override for - for a string so numeric minus tries to add a string to a number and returns NaN. Ok, why is that unexpected? When will this bite you. You shouldn't be adding numbers to string or strings to numbers. I've been programming JS for ~20 years, I don't remember running into any of these issues.

I've also never tried to add two arrays, add an object and an array, add an array and an object, nor add two objects. It's funny that the language does something but so what.

You wanna talk about a language that sucks try bash. Meanwhile I've had no problems shipping 100s of projects in JS. (also, C, C++, C#, perl, python, assembly, others)


> Javascript will be 30 years old in 3 years

Python is already >31 years old. Python is even older than Java. It is so slow - it can barely crawl compared to other languages. Maybe we should retire grand-daddy Python?


Or maybe just have google invest resources to make it faster like they did with JS?


On top of it, JS is significantly faster.


Faster than what?

JS was considerably slow until Google decided to spend resources to make it faster.


Current standard JS runtimes (e.g. V8) are faster than current standard Python runtimes (such as CPython).

Not that Python can't be made faster (though many architectural decisions of the language resist it), but it hasn't been so far, so there's little incentive to include it in a browser. It adds too few new capabilities on top of JS.

OTOH, say, WASM gave many new capabilities, and has been included.


Yes. It absolutely can. And it would be even more efficient than using WASM. However, I guess adding a new language would introduce a lot of security issues and what not.


This is what https://decentraleyes.org/ does, right?


While I get your point, I think it's of limited usefulness for Python. You would end up with Snekr.io using 3.11 and cheeseshoppr.ws using Python 3.8.4, etc. You can get some packages for any Python 3, but they are the smaller ones. The larger wheels like numpy and pandas have C code which needs to be compiled, or there's a giant version-dependent wheel.

Server side Python is often deployed in a venv which has the desired python version and all the dependencies copied with the code.


The caching sites would host a small number of versions and the publishers would be motivated to select from among them.


That page says there's a model for shared libraries being considered, so maybe there's hope.


You'd need trusted providers, and a mechanism to 'import' a trusted something into your 'domain'.

What am I missing?


You’re basically describing the old Java plugin paradigm.


I’m a big fan of Python in WASM! It really reduces the friction of playing around with things. Something I find pretty useful that I hacked together using it is https://pyground.vercel.app/ - I use it whenever I have a CSV or JSON file I want to do some poking around in using Python.

You can just drag and drop a file in, it’ll be imported in as the ‘data’ variable, and then you can run Python code/do matplotlib visualisations without installing anything.



Why use anything else when Jupyter is on WASM?


JupyterLite would probably be my go-to for working on a script or just trying out some Python or something like that. It looks like it'd be way less friction than getting Jupyter running locally, especially if you don’t use Python often.

Pyground is specifically written for a use case I wanted to optimise: to get data from a file on your local machine into a structured Python variable fast. I think with Jupyterlite you’d have to upload the file and then write your own code to read it/parse timestamps, which is just boilerplate. So if you're trying to do something like that and don't need anything else that JupyterLite offers then pyground might get you there faster. JupyterLite is way more flexible though.

Also you can use pyground to load the data, then do `import pickle; pickle.dumps(data)` in pyground, copy the output and then do `import pickle; data = pickle.loads(<copied output>)` in JupyterLite and you'll have the loaded data variable way faster than writing that code yourself and all the flexibility of JupyterLite :)


Have you seen Skulpt [1]? It doesn't do matplotlib (or most other libraries) but it's pure JS.

[1]: https://skulpt.org/


MatPlotLib support in Skulpt for BlockPy [1]. Not complete, but workable for a lot of common graphs that I use in my CS1 course.

[1]: https://github.com/blockpy-edu/skulpt/blob/master/src/lib/ma...


Ah yep! I came across it a few years ago when I was chatting to the team at anvil.works, they were using it for their client side python. Looks really cool, really ambitious and impressive project.


Anvil is pretty cool...I spent a good bit of time experimenting with it. Unfortunately, it has some edges you hit quickly as an experienced developer which get in your way. However, for their target audience, which I infer to be skilled business folks who don't necessarily develop full time, it's awesome ;-)


Interesting, is it possible to write some sort of frontend framework like SolidJS with this? What would the performance be like, I wonder.

It would be really great if we could have full python/pip support in the browser but with some vetting done (just realized we don't have such a thing in pip, other than relying on pip lockfiles and pipenv).


Interesting! So I can write any Python code here in the web app, and the Python code will run locally on my client machine without sending any information to 3P web servers?


Yep, it’s all static with no server side and the Python all runs locally. The source is on GitHub: https://github.com/mcintyre94/pyground so you can run it locally or deploy it yourself if you’d like too. Any files you load always stay local too.

There are some limitations around what Python code you can run, there’s some details in the readme about those.


Did you use tailwind css for that?


Yep, I think it might have some tailwind UI bits and pieces too. The source is on GitHub: https://github.com/mcintyre94/pyground


Those interested in this should check out Pyodide[0]. It basically "just works" so long as the libraries you import are pure Python or are part of the core scientific stack (the Pyodide authors have done the work to manually port all the C code behind numpy, scipy, etc.).
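
For a feel of the workflow, here's a rough sketch of the Python side once Pyodide is loaded (as you'd type it into the Pyodide console, where top-level await is allowed; the package names are just examples, and I may be off on details of micropip's resolution):

  import micropip
  await micropip.install("attrs")    # pure-Python wheels come straight from PyPI
  await micropip.install("numpy")    # resolves to a Pyodide-built wheel

  import numpy as np
  print(np.arange(5).mean())         # 2.0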

What I really wish for is for ~all Python packages to work in the browser without manual porting of the underlying C/Rust/etc. being needed, since a lot of the interesting and useful libraries aren't pure Python, and manual porting is non-trivial.

I'm not sure what the best route to that future is, but I'm guessing it'd probably help if Python had a wasm runtime in its standard library[1], since then authors of libraries that use C/Rust/etc. might make cross-platform builds (perhaps by default).

Regarding this Pycon speech, it seems that it's related to the following entry in the 3.11 changelog[2], which the speaker was heavily involved with:

> CPython now has experimental support for cross compiling to WebAssembly platform wasm32-emscripten. The effort is inspired by previous work like Pyodide. (Contributed by Christian Heimes and Ethan Smith in bpo-40280[3])

But maybe Christian has more to reveal here? In any case, I'm hugely appreciative of all the work that is being done to bring Python to the browser!

[0] https://github.com/pyodide/pyodide

[1] https://discuss.python.org/t/add-a-webassembly-wasm-runtime/...

[2] https://docs.python.org/3.11/whatsnew/3.11.html

[3] https://bugs.python.org/issue40280


I think the closest to that goal is GraalVM. It can run LLVM bitcode for the C parts while natively running Python. Since the whole thing is Java, and Java bytecode can be compiled to wasm/js by TeaVM, it should work even now, though it is definitely not streamlined yet.


I think it would be cool if browsers shipped with WASM-compiled interpreters and runtimes for other languages, like they do with JavaScript.

That way you'd be able to use other languages in the browser without needing users to download 20MB of a WASM-compiled interpreter just to run 1KB of code.


You're still downloading the 20MB WASM-compiled interpreter, it's just that now you're redownloading it every time your browser updates.


Yes, but that's the point, if the user has it installed by default then there won't be much of a penalty if a website chooses to use another language than JavaScript.

Right now, WASM interpreters only really make sense for teaching and exposition with REPLs, where the user probably won't mind large downloads in order to do something out of the ordinary. Shipping interpreters would instead make that ordinary.


It's still a big penalty, you're just changing who pays the penalty, when they pay the penalty, how often they pay the penalty.

Personally, I'm pretty happy that we have WASM at all, and I think there's a lot of work we can do (over the next years) to make interpreters that work well in WASM.


A 20MB interpreter bundled in the browser is basically nothing considering the browser sizes. A 20MB script in a page is massive.


> A 20MB interpreter bundled in the browser is basically nothing considering the browser sizes.

My install of Google Chrome is around 85 MB. 20 MB is a lot more than nothing.


For comparison, pulled up a computer that's running Windows and has Edge installed (which will be indicative of a large part of the population that doesn't care much about the intricacies of their installs):

  Edge: almost 500 MB
  EdgeCore: almost 400 MB
  EdgeUpdate: about 20 MB
In the grand scheme of things, 20 MB is indeed nothing, because many browsers out there (that cannot be uninstalled without crippling the OS in some regards) are already pretty bloated, use bunches of plugins anyways and just generally have untold amounts of cruft in a variety of other software (e.g. just compare MS Office vs LibreOffice and look at how much space professional software like Photoshop or Blender or whatever takes up).

What i'd like:

  - to optionally be able to maximize the browser size install to minimize the amount of data that would have to be fetched over the network (e.g. one bundle for Python, one for .NET/Blazor, one for Rust, one for Go etc., based on what you need, maybe just all of them), the same way that plugins work
  - to have these bundles support being toggled (or even downloaded, if allowed) on a case by case basis, as they become necessary (e.g. enabled in daily driver device, disabled and not installed in a Firefox/Chrome/... install inside of a Docker container for testing)
  - somehow have the industry force everyone to slow down - e.g. you'd have new updates for all of these come out perhaps once a month instead of every other day, as you already do with front end plugins and needless updates of JS bundles
  - so, since CDNs were crippled and browser caching is impossible for common resources across different sites, re-introduce that mechanism in some capacity with these bundles that'd be installed locally
  - enjoy the ensuing hell that'd be like the Java Applet idea which was brilliant for rich content but had untold flaws in regards to sandboxing and other things (similarly to how Flash was later also killed off, really torn about that one)
So obviously we cannot win and will never have that.

Alternatively:

  - build more static sites
  - realize that you can't implement everything you need without JS
  - reinvent your own minimalist framework/library, badly; though hopefully use something like Svelte or Alpine.js for cases like that
What we'll realistically have instead:

  - unique bundles of resources for every site, no caching across sites, no proper ways to utilize CDNs in a cross-domain context due to fears of being spied on
  - fast development velocity with lots of updates, continuation of needing to download hundreds of KB or even multiple MB of JS assets to keep browsing the same content in almost the same way
  - the problem will only be made worse by developers pursuing larger WASM bundles, like Blazor in .NET, with few advantages for developers but many disadvantages for everyone else
  - this problem generally will not be regarded as serious, because most don't care about how much bandwidth they waste, a la Wirth's law


> it's just that now you're redownloading it every time your browser updates

There's no reason browsers couldn't have a persistent cache for this. Think of it more as an integrated dependency manager and VM that just happens to use web infrastructure. No one would blink at downloading 20MB of dependencies anywhere else, after all.


Browsers already have a persistent cache!

If you're downloading it on-demand and caching it, why would it need to be integrated in the browser at all? Why not just stick it in a CDN and treat it like a regular file, the way it works right now?


Different origins don't share a cache anymore. A Google Font on one origin doesn't benefit from it being cached after it's used on another origin.

The browser handling this as a special feature by default avoids cache isolation issues. It also makes it trivial to avoid leaning on even more third parties (CDNs, package managers) to run your code: as long as your browser is supported, code won't stop working in it.


Caching is partitioned by origin for privacy reasons; caching still helps, but a public CDN solution doesn’t improve things any.

That does open an argument for having a kind of global interpreter cache, though it could still be used for fingerprinting.


> Browsers already have a persistent cache!

And yet whenever this comes up, someone always insists that you'd have to re-download the entire runtime with every request, as if caching wasn't a thing.

As far as shipping vs caching goes, I see no reason not to do both. Maybe ship with the latest version of a few popular languages including javascript already pre-cached and allow for downloading others as required. I don't know what would be more optimal. Maybe when you download a browser, you can select language support options.

My point is, this is just an implementation detail, it doesn't have to be awkward or inefficient.


That's better than downloading it 100 times for various sites.


>WASM-compiled interpreters and runtimes

in which version?


The browser could ship with LTS versions and download other versions on the fly with a prompt to the user. There's no reason the browser needs to have every version always, just the ones most likely to be used.


I'd love that too; if I was a better programmer I'd likely fork Chromium and build in Python support.


I did a quick speed comparison:

   import numpy as np
   import time
   n = time.time()*1000
   r = np.random.rand(10000000)**2
   print('done in', time.time()*1000-n, 'ms')
In the browser I was getting around 200ms and on my computer I was getting around 65ms, so around 3 times slower.

Also, if I try to allocate an array of length 100,000,000 then I get a `MemoryError` in the browser.

Does anyone know any ways to get around these limitations?


This is coincidentally really freaking good. To be able to run numpy in a browser without having to rely on a JavaScript port? And it's only about 3 times slower? I can live with this for now and I think this will improve drastically in the future.


Depending on how the wasm was compiled, there are options to limit or prevent memory growth. But I think the total limit at the moment is 4GB because of 32-bit addressing.


that's... really not bad at all. Kind of amazing, in fact. Thanks for putting in the work.


I find WASM in general very cool. Is there a way to run a WASM "binary" locally? For example, can I distribute a program as WASM and execute ./mywasmprogram and have it run?

I imagine I need a WASM runtime installed, and to somehow get my shell to recognize that WASM programs should be loaded with it (for lack of a hashbang line), but is that actually doable?


Yes, definitely. Wasmer is one way to do that: https://docs.wasmer.io/ecosystem/wasmer/getting-started


Note: There was some discussion of Wasmer here a couple of days ago: https://news.ycombinator.com/item?id=30758651


Excellent, thank you! I knew of Wasmer but didn't realize that's what it was. I look forward to the day where we can just distribute one architecture-independent binary.


Like a .jar file?


Yes, but with a much leaner, memory-efficient, and more secure runtime. WASM is kinda "JVM without the Java bits" but that's a good thing. Having the VM provide facilities for GC is not necessary. Go shows that you can embed it in the executable. And Java bytecode is too close to Java code and not close enough to machine code, and so the VM has to provide a full interpreter - also unnecessary.


Wasm requires a full JS runtime no? At this point those are no lighter than a full JVM.


Nope! Much like Node allows you to run JavaScript without a browser, there are plenty of standalone WebAssembly runtimes which don't need a host JavaScript runtime.


You can also use node to execute webassembly programs.

You do have to create a js host file, load in your webassembly and then run it with node.


The WASI interface even has system file access:

https://nodejs.org/docs/latest-v17.x/api/wasi.html


wasmtime, Standalone JIT-style runtime for WebAssembly, using Cranelift: https://github.com/bytecodealliance/wasmtime

wasm3, WebAssembly interpreter: https://github.com/wasm3/wasm3

… and many more


The irony is that in the old plug-in days ActiveState did have a Python plugin for the browser.


We've gone full circle


Except that unlike NPAPI plugins, WASM is sandboxed.


So we got rid of browser plugins like flash because they were "unsafe" and we "magically" couldn't "sandbox" them for a decade. Then somehow we come up with a new thing that is "sandboxed" (trust us, it is, because WASM!), so let's use that instead and recompile arbitrary binary plugins for that and run it in the browser (what could go wrong).

Really, without all the details/nuance, it sounds like we shot ourselves in the foot, went down a wrong path because of it, and now we're full-circle back to where we started. Except now the web is drastically different and the browser is now our one and true only terminal to the Holy Server that is Google et al.

/end of rant


This is hyperbolic. We could sandbox browser plugins, and did, but this broke a ton of things that used plugins because they expected not to be run in a sandbox.

Are you afraid of JavaScript in your browser? Perhaps you are, and that's fair, but WASM is no more dangerous than the JavaScript that everyone already runs.


It does very little, when the sand can be tainted or be coerced to call external imports in the wrong order.


If you are ready for some inception, go watch David Beazley use Python to implement a complete WASM interpreter in under 1h. https://www.youtube.com/watch?v=VUT386_GKI8 One of the best coding presentations I've seen.

This should make it possible to run python in wasm in python.



> there is one place that Python has not yet conquered: the browser

Well, there was that one web browser written in Python, with built-in Python scripting support: Grail.


I don't know what Python's async support is like, but the biggest issue with porting to the web is that the browser mostly requires that you return from event handlers before it will do anything. This is in contrast to pretty much all other environments. That means there are tons of patterns and libraries that just don't work and require major re-writes or big AST manipulations like asyncify from emscripten.

I'm not saying you can't do async work in any language. Rather I'm saying that likely the patterns you're used to using in some language that's not JavaScript won't work in the browser and you'll have to massively change your style.
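
Roughly the style change involved, as a sketch (assuming an asyncio loop bridged to the browser the way Pyodide does it; the blocking example in the comments is hypothetical):

  import asyncio

  # Desktop-style code blocks the thread, which would freeze a browser tab:
  #
  #     while True:
  #         line = input("> ")       # blocks until the user types something
  #         handle(line)             # hypothetical handler
  #
  # Browser-friendly code does a little work and then awaits, so control
  # returns to the event loop and the page stays responsive:

  async def main():
      for i in range(3):
          print("tick", i)
          await asyncio.sleep(1)     # yields to the event loop between ticks

  asyncio.run(main())                # on the desktop; in the browser you'd
                                     # schedule the coroutine on the running loop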


I can't wait for the browser to be released from the shackles of JS.

Just imagine how awesome it would be to use the full power of Python to build a React-like library.

It is a source of great disappointment to me that WASM cannot directly control the DOM.


Why does it need to be direct? A bit of glue code doesn't hurt, especially since most of the other glue is going away with WASI.


That glue code is a massive cognitive burden on a programmer. It's not a big deal when you already know it.

But when you don't, a thin, glue layer of code following different paradigms than the other 90% of your project will crush your development agility anytime you have to touch that glue layer.


WASM Interface types will fix that


This is huge and comes at the right moment for me. The Web Browser part is far less important to me than being able to compile programs into small and autonomous wasm binaries. I am indeed building an open-source (soon) business app platform based on code: https://windmill.dev

Right now you can define any scripts and run them, but behind the scenes, the way the workers work is that they have a static list of dependencies they can handle and always fork a Python process to run the code in that environment.

I was scratching my head about how to provide proper isolation and handling of custom dependencies for Python, short of zipping the entire list of pip dependencies and unzipping it at runtime. For TypeScript, this can be achieved easily using deno compile: store the small output bundle and run it with the proper runtime restrictions. With the ability to do more or less the same for Python, this is a huge game changer.


How large is the wasm binary?

I would guess it's only for core python, without modules.

Brython was nice to use, although its dom syntax was a bit awkward.


Can someone explain to me, really slowly please, how to take an existing Python codebase such as https://github.com/infojunkie/mma and run it in the browser using one of the technologies mentioned in this thread? Especially, how to deal with filesystem calls that expect some files to be present in various folders. Thanks!


According to the documentation and discussion about this feature, some unit tests are currently simply skipped: the virtual filesystem provided by Emscripten is not fully POSIX compliant, there are no processes, no sockets, and async only works via the browser event loop (source: https://bugs.python.org/issue40280)

This would mean to me that they simply do not work, or only to an extent. As you can read, the virtual filesystem seems not fully compatible.
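
For the filesystem part of the question, the usual workaround is to populate the in-memory tree before running the code. A rough sketch, assuming a Pyodide/Emscripten-style virtual filesystem where ordinary Python file APIs work (the paths and contents are just examples):

  import os

  # Recreate the folders and files the codebase expects inside the virtual FS:
  os.makedirs("data/config", exist_ok=True)
  with open("data/config/settings.ini", "w") as f:
      f.write("example = 1\n")

  # The ported code can then read them back with plain open() calls:
  print(open("data/config/settings.ini").read())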


I’m super excited about this being integrated upstream.

Long ago we compiled Python 3.6 and published to WAPM: https://wapm.io/python/python (you can run it online there!)

I wonder if we could publish the new version also! I think we got the Python repl properly running so it would be interesting trying to have that working on the new version too


Does the talk do a comparison to Pyodide? Seems very similar and I'd be very interested to hear what the advantages and disadvantages are.


I worry that we'll now see more broken web apps, given the lack of type safety in Python. I know that JS doesn't guarantee that either, but at least you can choose to work with TypeScript. True, there's mypy and the like, but they're hard to integrate into existing projects, and are also not perfect.


Type hints[0] and mypy[1] provide type annotations for Python.

And type hints are something that JS lacks. How many times I've tried to copy-paste TypeScript code into a nodejs/browser console and failed miserably.

[0] https://docs.python.org/3/library/typing.html

[1] http://mypy-lang.org/


Type annotations do not guarantee correctness or performance, nor do they make your code bug free. Type annotations do help when writing code, however, as the tooling will know that something annotated as type X has these prototype variables, or that an object annotated with an interface should (but is not guaranteed to) have the annotated interface members. This can also be achieved via type inference, but inference is more complicated than simply annotating (writing down) the type in the code. Before TypeScript, some people used to put the type in the variable name, like strName, fltCost, arrPeople. Bugs are detected by running the program in a chaotic environment (live/prod) where there are humans involved. Bugs can also be found by carefully and critically studying the code. Some bugs can also be found by inference or type-annotation tooling, or usually when trying to run the code for the first time.
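
A tiny illustration of that point (plain Python plus mypy, nothing browser-specific; the function is made up):

  def scale(values: list[float], factor: int) -> list[float]:
      return [v * factor for v in values]

  # mypy flags this call statically (list[str] is not list[float]); at
  # runtime it "works" silently and returns ['2.52.5', '4.04.0'] instead
  # of numbers - exactly the kind of bug annotations plus tooling catch.
  print(scale(["2.5", "4.0"], 2))
  print(scale([2.5, 4.0], 2))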


Sorry, don't quite follow: why would the lack of type safety in Python make this a problem, and how would TypeScript help in this situation?

This comes from a Python dev with no idea how to make web apps. E.g. I've seen packages on PyPI not maintained for years that continue to work fine.

Also I think Python is strongly typed while js isn’t. I would think that’s a bigger problem?


Having static types helps programmers make fewer mistakes by allowing the compiler to perform stronger forms of static analysis. This prevents programmers from accidentally passing in arguments to a function that have types the function was not defined to handle (since the compiler/interpreter will error out).

Also "strong typing" is not well-defined across the literature as acknowledged by this one link from Cornell (which lists strong typing as having "the type of every variable and every expression is a syntactic property" and variables that are "used only in ways that respect its type"): https://www.cs.cornell.edu/courses/cs1130/2012sp/1130selfpac...

You also have lecturers at Carnegie Mellon teaching that strong typing means "Types must be explicitly converted": https://www.cs.cmu.edu/~07131/f18/topics/extratations/langs....

Personally I think the historical definitions from Wikipedia are the most concise (paraphrased by me as saying "arguments to a function should have the same type as the parameters the function was defined with"): https://en.wikipedia.org/wiki/Strong_and_weak_typing

In terms of 'strong typing' (with the definition of not being able to do any kind of implicit type conversion), Python can be seen as weakly typed, since there are forms of implicit type conversion like adding different numerical data types together. I would say it has fewer instances of implicit type conversion than something like JavaScript though, so if there were a metric of "strength" inversely correlated with the number of implicit type conversions, then I would say Python could be considered "stronger" (although a lot of JavaScript's quirkiness is removed through the introduction of static types).
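
Concretely, in plain CPython:

  print(1 + 2.0)    # 3.0 - int is implicitly promoted to float
  print(1 + True)   # 2   - bool is an int subtype, also implicit
  try:
      1 + "2"       # but there is no implicit str/number conversion...
  except TypeError as e:
      print(e)      # ...unlike JS, where 1 + "2" quietly gives "12"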

All that being said, the argument against Python seems to be a gripe against implicit and dynamic typing, whereas TypeScript has neither and instead brings explicit and static typing (the argument doesn't seem web specific).


I think I don't disagree with anything you said fundamentally, but the main point in the parent comment is

> I worry that we’ll now see more broken web apps, given the lack of type safety in Python.

I just don't see why "broken web apps" would be related to "type safety in Python", and how JS/TS solved the issue.

Put it another way: we could forget about types and not even have unit tests, and if it works the first time you deploy it, then with sensible deployment you should expect it to keep working "perpetually". That's why I mentioned that some unmaintained packages on PyPI continue to work for years, because Python 3.x doesn't break backward compatibility. And if compatibility with any dependencies is a worry, "pinning" to fixed versions or minor versions is better practice. (I know it doesn't solve all problems, and ideally people should keep maintaining them. But again the statement is about "broken web apps" and "type safety".)

P.S. Python being a strongly-typed language is about, say, the `1 + "2"` situation, which JS allows but Python does not (although Python's object/data model can handle it via `__add__`). Rebinding a name to another variable is ok in Python, but that is a separate characteristic.

Also, while I don't know TS, from other comments here it seems that type hints can be more powerful in some cases, if not strictly more powerful. I'm a believer in type hints when using Python and I find them very useful (and recent Python versions are making them more and more powerful). They are more flexible than static typing but still useful for static analysis, which has detected bugs in my programs even when they pass unit tests. (Often it is sloppiness rather than "real" bugs though. Having good types makes reasoning about the program easier.)

Lastly, runtime type check is a thing in Python too. E.g. I think it is very common for the `__init__` to perform runtime type checking. And then there's some packages allowing you to define a schema where the library would perform runtime type check automatically given the schema.


People who want Python on the browser would have probably not chosen Typescript anyway, it's mostly not the same demographic. Not everyone cares about types.


At this point, as a Python dev, I wouldn't even bother. The web is dirty and unless you have huge community mindshare or oodles of funding (a-la Google), you're not going to get any meaningful traction with whatever python-based UI framework you create. The JS community (funded by big players like Google and Facebook and 20 million liters of starbucks hipster coffee) has figured out all the tooling, libraries, and practically taken over every facet of the web development ecosystem so it caters for JS, that it's going to be severely impractical and hopeless to jump in at this point.


When can we expect a recording of this to be uploaded to YouTube? Is there any existing content out there that talks more about wasm and Python? This seems really important, especially for the many researchers and data scientists who use Python primarily, to be able to cross over to the browser.


Is there any way to compile a subset of Python itself to Wasm? (As opposed to compiling its interpreter to Wasm.) Same question for Javascript.


WASM is a virtual machine, but it's missing support for some things that would make that practical. GC support, polymorphic inline cache, char/byte types, reference types, etc. Some of that is on the WASM roadmap.


Thanks. I'd read that GC support might be on the way (and that that might make Go a bit more attractive for Wasm than it is currently). But I don't know anything about the other features you mention. Is it your understanding, or guess, that it is the "community's intention" to have (a subset of) JS or Python compiling to Wasm soonish and if so any guess as to how long?


CPython uses reference counting plus cycle-breaking, though, so it's not like you'd have to bundle a full GC implementation with your Python code.


[flagged]


You could have asked for benchmarks without saying, "Yaaawn." I think the author would be more likely to volunteer to improve the code if it were encouraged rather than yawned at.


Can't see the comment anymore as it's been flagged, but in general some of the most useless feedback you can give to a project in its infancy is: what doesn't work, or what doesn't work well enough. If it takes you 5 minutes to come up with such feedback, hands down, the author knows it already, and well-informed readers know it already. For example, no one familiar with porting things to WASM would think that a large project like Python would just work with no performance issues on a first release.

It's the thing I look forward to the least about sharing stuff that I've worked on. I know all the issues, and sometimes you just want to show off a big endeavor before continuing with the nitty gritty of making better software. But you're gonna run into people who just want to poo-poo on your hard work.


It wasn't that bad of a comment, but was pretty much exactly what you said. Was having a pretty bad day when I posted.

Keep on working OP, I want to be able to pick python instead of JS someday.



