What is the core of the Python programming language? (snarky.ca)
232 points by BerislavLopac 8 months ago | 129 comments



As a maintainer of the Skulpt Python-to-JS compiler (https://skulpt.org), I can reasonably attempt to answer this question.

Skulpt compiles Python to JS in one shot, so it's not a million miles away from the architecture Brett envisages. (I gave a lightning talk about it here: https://anvil.works/blog/pycon-talk).

Things I've observed:

- "Ordinary code" requires remarkably little fidelity. Python has a very strong "simplicity culture", so most code avoids being clever. If you're creating an environment for users to program against (as we do in Anvil), it's astonishing how little of Python you need to support.

- "Be able to use popular Python libraries" is really, really hard. The ordinary user doesn't use all the esoteric stuff Brett mentions, but libraries do, left and right. Even porting PyPy's `datetime` implementation to Skulpt required filling out a lot of odd little corners of the object implementation.

We still have a pretty crummy subset of `unittest` because it's eye-achingly dynamic - for example, it really puts metaclasses through their paces. (Metaclasses are a perfect example of a powerful feature that really depends on details that descend directly from CPython: The API exposes the fact that variable scopes are literally `dict` objects, that classes are created dynamically at runtime, etc.)

- Finally (and most unfortunately): A lot of the strength of Python is in its ecosystem, and a lot of that ecosystem relies on native code. (Again, much more than you'd think. When porting stdlib modules from PyPy to Skulpt, we have often been disappointed by how many of them use native code - even in PyPy!)

The Python-to-C interface is really, really CPython-shaped. Anyone who wants to be compatible with numpy, scipy, etc, needs to literally emulate CPython's data structures, and then translate them to whatever faster/simpler thing they use internally. This emulation overhead is the main reason PyPy is slower than CPython on some workloads. (In Skulpt, we haven't even attempted this. It doesn't bother us much in Anvil, of course, as the server-side CPython is just a function call away, but for Brett's use case I'm guessing it would really hurt. One day we might try, with a JS-to-wasm bridge and doing the emulation in wasm, but it would be a hell of a job.)
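To make the metaclass point above concrete, here is a minimal sketch in plain CPython. The names `RecordingNamespace`, `Meta`, `Example` and `Dynamic` are made up for illustration; only `__prepare__`, `__new__` and the three-argument `type(...)` call are standard Python.

```python
class RecordingNamespace(dict):
    """A dict subclass that remembers the order in which names are assigned."""

    def __init__(self):
        super().__init__()
        self.order = []

    def __setitem__(self, key, value):
        self.order.append(key)
        super().__setitem__(key, value)


class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body runs with this mapping as its local scope.
        return RecordingNamespace()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Keep only the user-visible names, dropping dunder bookkeeping.
        cls.assigned = [k for k in namespace.order if not k.startswith("__")]
        return cls


class Example(metaclass=Meta):
    x = 1

    def method(self):
        return self.x


# Classes are ordinary runtime objects: this builds one with no class statement.
Dynamic = type("Dynamic", (), {"x": 2})
```

An implementation that wants to run code like this has to execute the class body against a user-supplied mapping and build the class object at runtime, which is the kind of detail that is cheap in CPython and awkward to reproduce elsewhere.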


>> - Finally (and most unfortunately): A lot of the strength of Python is in its ecosystem, and a lot of that ecosystem relies on native code.

I was thinking the same thing... One can wonder how much value there is left if you build an alternative Python interpreter that can e.g. run in the browser, but cannot support C extensions.

I mean I like Python for what it is, but once you stray from the common/intended use cases, e.g. trying to embed the interpreter in a C++ program, skeletons fall out of closets around every corner.

For all its merits, CPython feels like a house of cards, burdened by a legacy of bad engineering decisions it cannot get rid of anymore. It made me realize it's better to stop fighting it, use CPython as-is (which is great), and use other scripting languages for embedding, live-coding, web, etc. I very much prefer Python over JavaScript for example, but even then it wouldn't cross my mind to use some kind of Python-to-JS implementation for web frontend code, honestly.


> CPython feels like a house of cards, burdened by a legacy of bad engineering decisions it cannot get rid of anymore

At PyBay last year, I ended up with a knot of Python core devs talking about this problem, and talking about building a better, more portable API for native modules. The transition would be nasty (the 2->3 transition was... scarring), but it would solve this problem at a stroke.

> I very much prefer Python over JavaScript for example, but even then it wouldn't cross my mind

I would encourage you to at least check out Anvil (https://anvil.works). It's more than a Python->JS compiler; it's a reimagining of what the web as a platform could look like.

FWIW, I agree with you that it's a bad idea to swap out JS for Python at one layer, while keeping the rest of the stack intact. It's extra complexity, and it doesn't solve the actual problem with the Web, which is the sprawling complexity of the stack and all the weird frameworks that work around it (while themselves making the problem worse).

The Big Idea of Anvil is to replace all those different layers of abstraction with one big, coherent abstraction. That's one reason why we deliberately chose Python rather than JS, so that users wouldn't reflexively reach out and break that abstraction every time they Googled how to do something.


> a better, more portable API for native modules

Pretty sure this exists already: https://github.com/pyhandle/hpy


I really liked Anvil, the thing that made me go back was the requirement for an enterprise account to not host on AWS.


Good news! We open-sourced the runtime, and the standalone App Server, so you can now deploy anywhere you like:

https://anvil.works/open-source

(And yes, it even works with the Free plan)


I'll give it another look, thanks for letting me know!


Anvil looks really interesting, and I appreciate your free forever tier, but I think you're missing a hobbyist tier.

I'd be concerned about investing too much effort in case my project strayed over one of the limits and I have to either pay £39 or kill the project. Whereas I would pay £5 without even thinking about it for access to all the libraries and perhaps a handful of emails etc etc.

Once I'm paying the £5, I can see my requirements potentially growing to reach £39/m but as a hobbyist I'd definitely never get there without stepping stones.


Just as a curious reader, what exactly is it that makes you very much prefer Python to JS? As someone who uses both professionally I rarely if ever miss Python features in JS (or vice versa for that matter, aside from the lack of multi-line anonymous functions or block scoped variables in Python)


Obviously some of this is subjective, but the main points of frustration are the terrible standard library, the type system requiring you to use things like ===, !== to prevent crazy type conversions, the total lack of advanced built-in data structures, and the fact that you pretty much need to use transpilers to use all the language features, things like coffee script to get a sane type system etc. JavaScript lugs around a lot of legacy and it leaks out in bad ways. That, and I absolutely detest the whole npm ecosystem and most of the rest of the typical JavaScript toolchain. That said, I appreciate its ubiquity, and it does have some nice things like very good support for functional, reactive and/or async programming styles.

But like I said, I may not be entirely objective as I work almost exclusively in C++ and backend Python these days, so JavaScript is pretty far outside my comfort zone anyway.


Tooling and libraries mostly. In Python the ideal is that there "should be one, and preferably only one, obvious way to do things". This makes it a lot easier for me to reason about other people's code. The tooling is also an issue, with how much it can take to get started coding.

There's also stuff that surprises me around variable scoping, I imagine I'd get used to it but...

There are also just a bunch of foot guns I have to re-learn, like the comparisons often being bizarre, it's easy to work around all those when you're aware of them but there are a lot of those little eccentricities.


Starlark is a Python dialect implemented by Google. It is rather liberal with compatibility, but as you observed, none of that matters for "ordinary code". It doesn't even pretend to support existing Python library ecosystem.

In exchange, Starlark is GIL-free. This was the primary motivation to reimplement. It really is that easy if you don't need to be compatible.

I think Starlark is a good approximation of the core of Python. It is easy to implement. It DOES 100% look and feel like Python, and if you use it as Python-as-pseudo-code mode, you will never realize it is not Python. On the other hand, almost no Python library will run. This feels paradoxical, but that's what it is.

https://github.com/bazelbuild/starlark


Wait I didn't realize it wasn't python. Google engineer for 3 years now.


Haha. It is 100% obvious if you are a Python implementer that Starlark IS NOT Python, but it is ALSO 100% obvious if you are a Python user that Starlark IS Python. Strange, isn't it? Such is life.


A lot of Starlark code at Google is still tested using a Python framework. So the code is compatible with Python to a fairly large extent.

A Starlark interpreter can easily be compiled to webassembly and run in a browser. I did it here: http://laurent.le-brun.eu/starlark/


Just to emphasize this:

https://github.com/bazelbuild/bazel-skylib/commit/9948d5538b...

It's possible to `exec` bazel and have it work, with some trickery.


This is quite intriguing since my first thought after reading this is "it's too bad we don't have a language that's all the syntax of Python but none of the baggage" and then the #1 child of the #1 comment is almost that. Not quite, but almost.


It’s interesting to me that Python depends so much on native code to have even reasonable performance, but because so much of the ecosystem is native code and because the native code interface is “CPython shaped”, it becomes incredibly difficult to improve CPython’s performance because it would very likely change the shape of CPython. Of course we could have the best of both worlds (a fast Python that allowed you to use native extensions), but the native extension interface would have to be minimized to something that would give Python implementations some ability to look different than CPython today. And when you have a performant Python implementation, much less needs to be implemented natively, which makes things like package management and cross platform development much easier (specifically you don’t have to worry about cross-compiling, and testing on your target becomes considerably less important because Python is portable while native extensions likely are not). The latter becomes a lot more pertinent if you’re on x86 targeting ARM or if you’re on a new ARM MacBook targeting x86 or if you’re targeting WASM from either. These cases aren’t common today, but ARM and WASM seem to be becoming increasingly likely in the future.


I think that what's even more interesting to me is that people perceive this as such a preoccupying problem.

When I was a young little programmer, the dream was to have a base layer of highly performant native code for handling the heavy lifting, tightly coupled to a very high level scripting language that wasn't really even trying to be performant. The thought being that the scripting language should be more focused on flexibility than performance, because, for the bits of the software that it handled, developer productivity was more important than raw performance.

Fast forward a year or two, and we are living the dream. It's Python, and, while I could nitpick implementation details and language features all day, the big picture is that it's pretty awesome. So awesome that, for the machine learning and data engineering work that I do, the most performant and most productive language I have to work in is Python. (Also R, which is a pretty similar technical story, but I realize that R lost the popularity contest, and I understand why.)

I guess it starts to feel to me like we're making the perfect the enemy of the good, here. We've got a well-established, productive, flexible, and easy-to-learn language where top-tier performance can generally be implemented as a library solution. Despite its many imperfections, that's pretty darned good. In my career to date, I haven't seen much cause to believe that I could realistically expect much better than that. I think that, when I do need better performance, I'd rather write a library than rewrite the language.


Some people have performance requirements to meet, and we’re pretty locked into Python (because some of our engineers foolishly bought into the promise that we could just rewrite the slow parts in C/Pandas/multiprocessing/etc).

If this is “living the dream” then Go and other languages will blow your mind. You can have iteration that is at least as fast (especially when collaborating with other engineers) and performance that is 2-3 orders of magnitude better than Python. Not to mention a better packaging and deployment story.

The sad thing is Python could be much better, but it has to back out of some bad design decisions first.


This was known at the dawn of python and before, though. For example lisps in various flavors were doing this high-and-low very well decades earlier; arguably nothing has done it radically better since.

So it's worth asking why that never took over the world as a counter to why python exploded.

It's tempting as a technology person to wish a "better python" had been the one to gain that sort of market impact. However, we have to at least ask the question if some of its technical flaws actually contributed to its success....


> So it's worth asking why that never took over the world as a counter to why python exploded.

JavaScript would like a word.


> Not to mention a better packaging and deployment story.

Go's "packaging and deployment" is enough to rule it out for new projects for me.


They said they work on machine learning and data engineering, which means they are very likely using very optimized native code libraries with Python bindings. Go's FFI story has a lot of overhead, so it's actually not inconceivable that Python could be faster for their specific needs.


His first paragraph suggests he was making a general comment, but I agree that if you’re doing very specific things like calling into optimized libraries then it’s not much of a problem; however, if you have to call into those libraries O(n) or worse, which is often the case, then you’ll very likely lose all or most of the performance you gained due to marshaling costs (not sure how these marshaling costs stack up to Go FFI, but in Go you would just write optimized Go in this situation, e.g., move allocations out of the hot path, make sure critical paths are inlined, etc., which isn’t an option available to Python developers). But yes, if your use case is calling into an optimized C library O(1) (and you have a high degree of confidence that it will never be worse than that) then Python will work fine (or rather, you won’t be fighting with performance, but you’ll still have to deal with package management, poor documentation, huge binaries, etc).


do you mind sharing exactly what problems you've had with Python?


Performance and dependency management are the big ones. Huge binaries are another pain point as well. We would like to run code in lambda functions, but the binaries (which aren’t that complicated) are too big 250MB so we have to run them in ECS tasks which take something like 30s to start up and building the images takes a lot longer as well (we could probably spend time to optimize the docker image builds).


I run lambda functions in Python but I just upload the code. I suppose you are using lambda functions in a way that isn't really promoted for Python.


We also “just upload the code”; however, the issue is that the code size is too large. Lambdas can only be 250MB, and if your dependency tree includes Numpy that’s 70MB alone. Go binaries are about 2 orders of magnitude smaller than Python binaries by my crude estimation. Go compiles source code to native machine code and the linker trims dead code. Neither of these is possible in Python, so you get enormous binaries.


I can see why that's a problem. No tree shaking either in Python. Still it seems like aws can solve the problem by lifting your limits or something.


I think though the perceived problem is that as soon as you have to shift between high and low levels, you have a boring friction which gets people dreaming of a different way, hence Go, Julia et al.


It’s that the “shift between high and low levels” costs performance, and if you have to do it O(n) times or worse (as is often the case), then that marshaling cost is a significant portion of (or even more than) the savings that you get from using a lower level language. This isn’t an issue in Go because you can just write faster Go without a marshaling overhead (or you can pay a similar overhead and call into optimized C libraries, but this is rarely necessary or worth the trouble since Go is far more maintainable and generally fast enough).
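A rough way to see the O(1) versus O(n) crossing cost from Python itself, using only the standard library. This is a stand-in rather than a real extension benchmark: `struct.pack` is implemented in C in CPython, and the function names `pack_one_call` and `pack_per_item` are made up for this sketch.

```python
import struct
import timeit


def pack_one_call(values):
    # One crossing into C: the whole list is marshaled in a single call.
    return struct.pack(f"<{len(values)}d", *values)


def pack_per_item(values):
    # One crossing into C per element: same bytes, n times the call overhead.
    return b"".join(struct.pack("<d", v) for v in values)


if __name__ == "__main__":
    data = [float(i) for i in range(10_000)]
    for fn in (pack_one_call, pack_per_item):
        seconds = timeit.timeit(lambda: fn(data), number=100)
        print(f"{fn.__name__}: {seconds:.3f}s for 100 runs")
```

Both functions produce identical output; any timing gap comes from per-call overhead at the boundary, not from the work done in C. Actual numbers will vary by machine and Python version.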


I think the answer to the "how to support enough of the ecosystem to be useful" question really lies in making it super straightforward for package authors to integrate it into their packaging/testing/ci pipelines. If people test for your implementation, they will very quickly pick up when they're starting to use a feature that may unnecessarily exclude users and hopefully avoid it before it becomes too deeply ingrained in their design.
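As a sketch of what "testing for your implementation" can look like on the package author's side, a test suite can probe for the feature instead of assuming CPython. The test class and `run_suite` helper below are hypothetical; `sys.implementation`, `hasattr` and `unittest.skipUnless` are standard.

```python
import sys
import unittest

# Name of the running implementation, e.g. "cpython" or "pypy".
IMPLEMENTATION = sys.implementation.name


class PortabilityTests(unittest.TestCase):
    def test_portable_behaviour(self):
        # Plain-language behaviour that any implementation should pass.
        self.assertEqual(sorted([3, 1, 2]), [1, 2, 3])

    @unittest.skipUnless(
        hasattr(sys, "_getframe"),
        f"sys._getframe is not available on {IMPLEMENTATION}",
    )
    def test_frame_introspection(self):
        # Relies on an implementation detail, so it is skipped where missing.
        frame = sys._getframe()
        self.assertEqual(frame.f_code.co_name, "test_frame_introspection")


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PortabilityTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

A skipped test shows up in the CI report for the alternative implementation, so the author sees exactly which parts of the package lean on implementation details.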


I love Skulpt! I'm so glad that it's being actively maintained. Thanks for all your hard work!


Lurking here is the main problem, not only of python but of so many langs: Dependency on the C-ABI.

EVERY interfacing with C sucks, even made on C!

Hopefully WebAssembly becomes good enough to change that...


> we have often been disappointed by how many [stdlib modules] use native code - even in PyPy!

The CPython interpreter is so slow that it makes sense to implement as much as possible in C. In the past, I've used this as an argument against slow interpreted languages - a faster implementation would not only run code faster, it would allow code to be written faster because more of it could be in a higher-level language.

However, I'm surprised that PyPy uses a lot of native code - not only does it have a fast JIT, but its whole raison d'etre is to be Python-written-in-Python - that's even the project's name!


Out of curiosity, why not just compile cpython to webasm and call it a day by shimming in module loading? I'm sure it's for performance (or extensibility?), but I'm curious where it really falls down.


This is exactly what Pyodide (mentioned in the article) does, and it works great for some use cases. The problem is that downloading and wasm-compiling the entirety of CPython and all its native modules is big and slow.

A colleague of mine collected and compared a few Python-in-the-browser implementations - including Skulpt and Pyodide - with code samples and a description of the trade-offs.

You might find the write-up interesting: https://anvil.works/blog/python-in-the-browser-talk


Not to mention any native module you might want to use from Pypi must also be compiled for WASM which is probably not very easy to come by.


That totally does work, but in practice it makes the initial bundle download too big to really be workable for the average website.


Would the same be true of javascript? If chrome didn't come with V8 in it, would it be a lot for the average website?


One way to find out would be to compile QuickJS [0] to WebAssembly.

[0] https://bellard.org/quickjs/


Seems to be 945KB in WASM.

http://numcalc.com/


When thinking about a minimal subset of Python to be truly useful, my mind jumps to MicroPython [0].

Which can be compiled to run on a standard platform, instead of the more common baremetal situation.

MicroPython does feature a REPL - though you can disable it so you can control UART in a more expected way.

A surprising amount of plain Python programs can run with the stripped down set.

Even though you won't find any of the referenced functions in sys. However, locals does work, as does builtins. You also get access to pip and a few other unexpected things like that.

[0] https://micropython.org


MicroPython is an amazing achievement, and Damien has done fantastic work growing the project.


MicroPython is addressing a different problem though: Minimalism wrt. libraries -- opposed to being more "static" in some aspects... In contrast, the author seems to be looking for an "acceptable" core that would still run most libraries -- but omitting some of the more dynamic aspects that are hard to emulate in WebASM....


Nonetheless, MicroPython is more static than CPython. Of the three examples of dynamism in the article (`locals`, `sys.settrace` and `sys._getframe`), MicroPython only supports `locals`.
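For readers who haven't used them, the three hooks named above look roughly like this in CPython. The helper names are made up for this sketch, and each one probes for the feature first, since an implementation (or a particular build of one) may not provide it.

```python
import sys


def snapshot_locals():
    answer = 42
    # locals() exposes the function's local variables as a mapping.
    return dict(locals())


def caller_name():
    # sys._getframe is an implementation detail, so probe before using it.
    getframe = getattr(sys, "_getframe", None)
    if getframe is None:
        return None
    return getframe(1).f_code.co_name


def traced_calls(func):
    settrace = getattr(sys, "settrace", None)
    if settrace is None:
        return None
    seen = []

    def tracer(frame, event, arg):
        if event == "call":
            seen.append(frame.f_code.co_name)
        return None  # do not trace individual lines

    previous = getattr(sys, "gettrace", lambda: None)()
    settrace(tracer)
    try:
        func()
    finally:
        settrace(previous)  # restore whatever tracer was installed before
    return seen


def outer():
    return caller_name()
```

On an implementation without `sys._getframe` or `sys.settrace`, the last two helpers return `None` instead of raising.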


MicroPython does support sys.settrace


However, it isn't documented [0].

And the sys.settrace tests are allowed to fail: [1]

PY_SYS_SETTRACE is a compile time option. It won't necessarily be a part of the build. I don't think it is by default on any of the ports at the moment, though I could be wrong.

Based on that, I'd call it an "unstable" part of the implementation at the moment.

[0] https://docs.micropython.org/en/latest/library/sys.html

[1] https://github.com/micropython/micropython/blob/master/tests...


Fair point about the documentation, but I doubt it's allowed to fail: the 'except AttributeError' is only meant to skip the complete test for builds without support for it, builds which do support it will run the test and will result in a failed build status if the test fails. And yes it's a compile-time option, just like a 100 other features, that's just how it's done in MicroPython and doesn't necessarily equate to 'unstable'.


Being a compile time option isn't what contributed to the unstable part there - MicroPython is modular. That's part of the benefit of the system.

That I couldn't find it enabled on any of the default builds is what made me conclude it might be unstable, especially as it is undocumented.


The reason for it not being enabled by default is probably twofold: code size increase/performance decrease, and it's not a feature which is used often especially not for the majority of targets.


Ah, so it does. I tested this on https://micropython.org/unicorn/ and didn't notice that it's running quite an old version.


It saddens me that one of Python's strengths - its C extensions - appears as a weakness in this context. The culprit is that we perceive python as 'what you can DO through python' - which is a very reasonable perspective. But precisely python's useful boosting through C extensions also anchors it to its friends. So, it's the old conundrum of what goes into a language core, versus what api library environment it lives in.


This is why I still use Python in performance critical code (but not so performance critical that you need to squeeze each CPU cycle). It's simply too easy to write performance critical functions in C and ffi call it in python. Of course, compared to many other languages it's still slow "in general" but for your given particular task it's pretty easy to make it very fast. Just write the bottlenecking tight loop in C and everything else can be in python.


Rust with its Cargo package manager, unicode string handling and strong type system is really great for Python speed-ups: https://developers.redhat.com/blog/2017/11/16/speed-python-u...


As I point out elsewhere it’s possible to have both performance and C-extensions, but the extension interface needs to be minimized so Python implementations have some breathing room to make optimizations—today almost any optimization would break the expansive extension interface. There are projects (like this: https://github.com/pyhandle/hpy) which aim to make a smaller interface, but the community must adopt this overwhelmingly before the old interface could be deprecated and work could begin to optimize any Python implementation.


The C extensions are partly why Python reigns in the machine learning and data science world.


As decidedly a non-insider, I'll tell you that, for me, Python's core strength is to be a beginner's all-purpose programming language, and not very much else. I had a really good experience in teaching Python to 8-10 year olds; actually, the kids were able to work out most things for themselves as I couldn't help all that much really, since I'm not much into Python. And in fact, Python was forked or at least inspired from/by ABC which was specifically designed to be a less idiosyncratic PL.

Fun fact: ABC was developed by Steven Pemberton (among others) who later led many W3C efforts, including the failed XHTML2 effort. I didn't get around to asking how he went from ABC to XForms (which is as idiosyncratic as it gets) when I saw him as a speaker a couple years ago ;) but anyway he's always been an inspiration.


That's a good use of Python, and if that's all it's useful for for you, that's fine. I'm not sure whether or not you're trying to argue that that is the only reasonable use of Python? To me, it sounds like you are.

If so, that is contradicted by the many people who have successfully used Python for many other purposes. That includes run-once scripts (where you might have used Perl in the past) where run time speed is not as important as speed of writing the code, but also certain types of code running in production. Like any other tool, you have to be aware of its strengths and weaknesses: maybe it's not the best choice for a code base running into 10s of KLOC or more, or for doing CPU-intensive work that can't be vectorised (e.g. multiplying a few large matrices in numpy is fine, whereas multiplying an extremely large number of very small matrices is maybe not fine). The fact it has weaknesses doesn't mean it should be written off for all tasks, even those where the weaknesses aren't so important, especially when it has strengths too (e.g. if Python would do for a particular task then using C++ blindly because it's "better" would be wasting developer time for no reason). It's often used for glue code where all the work is actually done in C libraries like deep learning, and it works very well for that: if 0.1% of your runtime is spent in the top-level glue instead of 0.01%, that's probably not a problem.

Also, this discussion is fairly off topic to begin with. The article really isn't about the strengths and weaknesses of the language. It's about what constitutes the language versus an implementation of it (CPython).


I guess what I'm trying to say is that Python is fine as it is; I believe (relatively) recent additions such as type annotations and chasing trends like async only serve to question Python's core strength.

I know full well that Python is used a lot for "scripting" tasks (such as in yum/dnf and countless others), and has apparently a good standing in ML; in fact I'm using it daily (for example, I'm using BackInTime for backup). But whenever I come across e.g. Python bindings in a package I want to compile, or Python used as part of a build step, I know I'm in for extra work since rarely does it work out of the box (relying on fragile "/usr/bin/env python" hashbangs, or making me maneuver around missing Python packages or versions, etc.). Saying this from the perspective of someone who has seen shell alternatives come and go, even Perl has, in practice, better backward-compat and has generally aged better.

But yeah, it's offtopic I guess.


> chasing trends like async

Python is fairly popular for web backends, given the ecosystem built around WSGI, I would say that's one of its core strength, so async makes obvious sense.


Python does have some warts with, like you said, hashbangs, but it's still eons ahead of other similar languages in this regard (like Ruby) and that is a small thing to relegate it in your mind to a 'beginner only' language, since tons of other languages do things much worse than Python does.

Python's simplicity - which makes it appealing for beginners - is its core strength that has made it so widely adopted for so many different things.


> That includes run-once scripts (where you might have used Perl in the past)

I never learned perl, but nearly every time I try to do something complex in bash (more than a dozen lines or so containing control flow) I regret it and end up rewriting it in python. Then the next time comes around and I go "I really should get better at shell scripting, I'll do that," and bash my head against the wall for a while then end up going back to python again.


> bash my head against the wall for a while

I'm similar with shell scripts - awkward at best. The longest one I've written is only a few hundred lines, but it was tough to write and test. It sits in the middle of a busy build pipeline, and while it does the job, I'm wary of making any changes.

For long-term maintenance, especially for other people that may need to keep it running, I'd trade some performance for a higher-level language like Python or JavaScript - the code and logic would be so much simpler.


> Python's core strength is to be a beginner's all-purpose programming language, and not very much else

Wow - I do think that it's a great teaching language although this definition could apply to Java, JavaScript, PHP and others. But "and not very much else" seems to me to be a crazy limited view of Python.

Many of the world's most popular websites (including the one I work for) use Python successfully in production.

It's also now become the standard language (replacing R) for doing data science and machine learning.

I'm not in the science community, but my understanding is that it is used heavily in scientific programming as well.

I'm sure there are plenty of other use cases I haven't mentioned here.


Just because a tool can be used for a given case does not mean it should be used.

Python has a lot of drawbacks which make it a poor choice for use in production.

Duck typing means many classes of errors which could be caught by a compiler will make it into production, necessitating much more rigorous testing process (i.e. developer time)

Dependence on the system python interpreter and poor dependency management solutions make environment encapsulation at runtime a more serious problem in Python than other languages.

Python performs poorly and uses lots of memory compared to other languages, making it an expensive choice to build an application on top of.

Python's USP is that it has a shallow learning curve, which is why it has uptake in domains like science and ML which are populated by "non-programmers", but virtually every other property of the language makes it a poor choice for serious engineering work.
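The duck-typing point above can be illustrated with a small sketch (all names here are made up). The mistake sits on a rarely taken branch, so nothing fails until that branch actually runs; a static type checker run over annotated code is the usual way to catch this earlier.

```python
class Duck:
    def quack(self):
        return "Quack"


class Robot:
    def beep(self):
        return "Beep"


def make_noise(animal, loud=False):
    # Nothing checks that `animal` has a quack() method until this line runs.
    if loud:
        return animal.quack().upper()
    return "..."


if __name__ == "__main__":
    print(make_noise(Duck(), loud=True))  # fine
    print(make_noise(Robot()))            # also fine: the quiet branch never calls quack()
    try:
        make_noise(Robot(), loud=True)    # only now does the mismatch surface
    except AttributeError as exc:
        print("failed at runtime:", exc)
```

This is why the testing burden grows: every branch has to be exercised with every kind of object that might reach it.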


> and not very much else

Companies whose success relied on writing v1 in Python and still using Python

- YouTube

- Instagram

- Pinterest

- Dropbox


That's appeal to authority.

The real question would be: would the people who wrote these apps and especially the people who maintain them, write these apps in Python if they had to do it again?


> Python's core strength is to be a beginner's all-purpose programming language, and not very much else.

I think you'll get a lot of pushback for this sentence; not the first clause, but the second, since it contradicts the lived experiences of many.

But all sentiments come from somewhere. I'd be interested to understand how you arrived at that position?


Yeah sorry to Pythonistas. That wasn't good wording at all; what I meant was what language I'd personally consider for what kind of projects.


It's gonna be an unpopular opinion here I suppose, but I agree with you. The only times I ever find myself reaching for python are one-off scripts, and when I have a simple problem that can be solved in < 50 loc with some combination of pandas + numpy + matplotlib. It's a good language for gluing together libraries, but I would never pick it for anything involving "actual" programming.


That's fair. Every language has different degrees of suitability to different problem domains.


RPython is relevant. He does mention PyPy, which uses RPython, but I am surprised that he does not mention RPython itself -- a restricted subset of the Python language used for similar purposes to the ones that he describes.


RPython is so restricted it prevents you from using most of the standard library.


I was interested a while ago (2013...) in this question, whether RPython might be an interesting general language by itself.

I asked about that in the pypy-dev mailing list. Unfortunately I cannot find a link to that thread, but here are some answers (from Armin Rigo and Carl Friedrich Bolz):

> we (= the general PyPy developers) are not really interested in this direction [...]. There is notably the fact that RPython is not meant as a general language used to produce stand-alone C libraries. It's certainly possible to make one in theory, but cumbersome.

> One of the reasons why the RPython project does not care much about this use case anymore is that we had a way to make CPython extensions. It was called the "extension compiler". It turned out to be a) extremely annoying to implement b) not very useful in practice. Thus a totally frustrating experience overall.

> The reasons for slowness was mainly: when compiling RPython code to be a C extension module, reference counting is used to reclaim garbage. That is extremely slow. Therefore the modules never got the performance anybody expected.

> When people look at RPython, an obvious feature is that it is syntactically identical to Python. "RPython must be an easy language, given that it has got the syntax of Python, which is easy". This is a common misconception. In fact, pleasing the automatic type inference process can be difficult. It requires the programmer keeping in his head the global types of his whole program, and carefully writing code according to these implicit types. The process is much harder for newcomers, which don't have any written-down example to learn how to manipulate the types --- precisely because they are implicit.

> So this is the reason we are recommending against RPython now (and for many years now). Anybody who thinks RPython is as easy as Python is someone who will happily try out RPython and be burned alive by it.

Edit: I found one link to the thread: https://mail.python.org/pipermail/pypy-dev/2013-June/011503....

Edit: I also found a (closed, opinion based, obviously...) related StackOverflow question: https://stackoverflow.com/questions/17134479/why-do-people-s...

From the FAQ (https://rpython.readthedocs.io/en/latest/faq.html):

> First and foremost, RPython is a language designed for writing interpreters. It is a restricted subset of Python. If your program is not an interpreter but tries to do “real things”, like use any part of the standard Python library or any 3rd-party library, then it is not RPython to start with. You should only look at RPython if you try to write your own interpreter.

A related project (probably long abandoned): https://code.google.com/archive/p/rpythonic/


The typing issue should have changed for the better due to type annotations, right?


No, RPython doesn't support type annotations; it's a subset of Python 2.7. You basically have to write Java or Haskell but your types are globally checked and you have to guess at what will compile.
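A small illustration of the "types are globally checked" point, written as ordinary Python. As I understand RPython's restrictions, a variable is expected to have a single inferred type at each point in the control flow, so I would expect its translator to reject the hypothetical `describe` function below, even though CPython runs it without complaint. This is not verified against the RPython toolchain.

```python
def describe(flag):
    # Ordinary Python: `result` is an int on one branch and a str on the other.
    # A whole-program type inferencer has no single type to assign to it.
    if flag:
        result = 42
    else:
        result = "forty-two"
    return result


as_int = describe(True)
as_str = describe(False)
```

Nothing in the source marks the function as problematic, which is the "you have to guess at what will compile" part.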


Give up (at least temporarily) everything listed - eval(), locals(), REPL etc and it still is going to be extremely useful for, arguably, most of the real use cases. E.g. I personally haven't ever needed any of these. REPL happens to be handy occasionally to quick-check something but I can hardly imagine it being used in production, it only seems useful in development time and for education. eval() seems a big red flag everybody should avoid at all cost.

Please do your best to support all the latest syntax though. Lack of support for some standard functions that are particularly hard to implement is OK; lack of support for syntactic elements (e.g. type hints) is worse and, usually, easier to fix.


namedtuple and dataclasses both use eval(), so if you want to give up eval(), you'd need to give them up, too.


Interesting they use eval(), I wonder if their implementations could be changed to not use eval(). I guess in the end you could just use actual classes for these cases, but I sure do like namedtuple and dataclasses.


My bad: it's exec(), not eval(). And I wrote dataclasses, so you'd think I'd know better!

The issue is that it's dynamically generating methods. There has to be a way to get dynamically created code into python for this use case.
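A simplified sketch of the technique, not the actual dataclasses or namedtuple source. `make_init` and `Point` are hypothetical names, and the sketch assumes at least one field name is given.

```python
def make_init(field_names):
    # Build the source text of an __init__ method for the given fields.
    # Assumes field_names is non-empty and contains valid identifiers.
    args = ", ".join(field_names)
    body = "\n".join(f"    self.{name} = {name}" for name in field_names)
    source = f"def __init__(self, {args}):\n{body}\n"
    namespace = {}
    # exec() compiles and runs the generated source, which defines
    # __init__ inside `namespace`.
    exec(source, namespace)
    return namespace["__init__"]


class Point:
    pass


Point.__init__ = make_init(["x", "y"])
p = Point(1, 2)
```

The generated function has a real signature with named parameters, which is hard to get from a generic `*args` closure. That is one reason to generate source text instead.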


REPL-centric development is something that has caught on a lot lately and is something that I find myself gravitating towards more and more. For 'production' the eval() might be able to be cast off, but for development the REPL is indispensable - if not a first class requirement.


Sure, but you can develop on a local instance before compiling to WebAssembly for production.


Well, I just wanted to say that it is definitely possible for native-compiled languages to do things like that, mostly by just bundling up the compiler itself. (Not sure about wasm, but looks like the article is a bit more generic than just about wasm.)

One great example is SBCL, a Common Lisp implementation which only has a compiler, but provides REPL, `eval`, and `compile` functions as per the CL spec.

For the question about the core of Python, I would consider everything not tied to the external environment to be Python. So `locals`, `eval`, `compile` must be provided for an implementation to be called a full-fledged Python. On the other hand, I won't include all of the stdlib functions in the definition of Python, like handling the file system or environment variables.

Think about node and the web browser - both are JS runtimes and they share the core language (like `this` or `with`, which provides similar capabilities of `locals`, and `eval`), but provide different stdlibs based on the environment. (Node provides web servers, file server interactions, etc... and the web browser provides APIs to manipulate the DOM, attach event handlers, etc...)
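For reference, this is roughly what those three built-ins give you in Python. `run_snippet` and `show_locals` are made-up helper names for this sketch.

```python
def run_snippet(source, variables):
    # compile() turns source text into a code object at runtime.
    code = compile(source, "<snippet>", "eval")
    # eval() executes the code object against the given namespaces.
    return eval(code, {}, variables)


def show_locals():
    a = 1
    b = 2
    # locals() exposes the current function's variables as a mapping.
    return dict(locals())


result = run_snippet("x * y + 1", {"x": 6, "y": 7})
snapshot = show_locals()
```

An ahead-of-time compiled implementation has to ship a compiler (or interpreter) alongside the program to support the first function, and has to keep variable names around at runtime to support the second.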


> which only has a compiler

Nitpick: SBCL ships with an interpreter, but yes, it is turned off by default in favor of the native code compiler. See http://www.sbcl.org/manual/#Interpreter for details.


I'm no expert here but it seems like porting a duck-typed language like Python to WebAssembly is a hard feat.

Arbitrary evaluation of source code at runtime seems to me exactly like the type of thing WebAssembly wants to prevent. This limits a lot of the flexibility of a language like Python (REPL, eval, etc.).

I feel like CPython isn't so much the problem as third-party libraries that people have come to rely on that, under the hood, rely on architecture-dependent code (numpy, scipy, psycopg, etc.). I would imagine that porting these is impossible.

I think a repl is fundamental to python more than any other language. I wouldn't say that about something else, but python, yes.


While juggling ideas for a web replacement a while back, I came across python2wasm[1], part of the pure python compiler infrastructure. Not complete, but interesting and well worth taking a look at.

1. https://bitbucket.org/windel/ppci/src/default/ppci/lang/pyth...


Boo is a Python-like language for .NET that took a swing at answering this question:

https://boo-language.github.io

Boo's answer is "just syntax" -- libraries are .NET assemblies.


Oh Man, I totally forgot about Boo. I remember looking at it years ago. This is just the thing I didn't know I was looking for. Thanks


MicroPython seems to have done a fairly good job of deciding what’s important in a python subset.


Good question about the value of a REPL to Python. My first experience with a REPL was the SQL*Plus tool from Oracle. I don't remember it being called a REPL. But, as I read about Common Lisp's REPL, I realized that I had been using one.

Early in my experience with Oracle's DBMS, I found the REPL handy to work out ideas. But a REPL is essentially a single line editor from which you can evaluate expressions/statements. As my ideas grew, the REPL became more of a hindrance. Having a full screen editor became more necessary.

Nowadays I work out all ideas within an IDE, firing up the REPL only for quick data checks. So my answer is that a REPL is helpful but not necessary.


> But a REPL is essentially a single line editor from which you can evaluate expressions/statements.

It is not always single line. Look at IDLE; even Firefox supports multiline REPL capabilities, although it still needs some improvements (like not rerunning var declaration statements).


Thanks. Multiline certainly raises the value of a REPL.


I think so too, though I guess single line makes more sense to an extent, there's a lot more that could be done with REPLs, like I'm surprised you can't just click on a previously submitted segment of code to be auto copied in the input text field (in the case of GUI based REPLs).


The CPython C API is one of the main culprits in making Python hard to implement and optimize.

As some commenters said, I don't think Python without access to C/C++ projects would be worth much right now, given how slow CPython is. About 100% of machine learning, data science and general science Python code relies on C/C++ extensions.

The current CPython API and the need to keep compatibility limit how much CPython can be improved.

Look at PEP-620 (https://www.python.org/dev/peps/pep-0620/) for more info on how python developers are addressing this.


> science Python code relies on C/C++ extensions

Don't forget Fortran. A good portion of numpy is written in Fortran.


That's scipy, as far as I am aware, but that does not change your point about the role of Fortran in Python.


PyObject graphs that represent dynamic variables through pseudo-OOP legacy C code. Check it out, it's a total nightmare of downcasting pointers. It amazes me that in 29+ years it hasn't been replaced by a C++ alternative that could retain compatibility.

I think AST interpreted languages are becoming something of the past. Yes I'm aware there's bytecode in CPython. I like the idea of ILs as target platforms rather than 100 leaky interpreters that you have to manually bind your C code with if you want to support any one of them.


This is extraordinarily normal C code. I’ve worked with several large code bases that look like CPython code. Heck, a lot of Linux looks like that.

It really speaks more to the failure of C++ than anything about CPython. C++ is so much more than C with objects, and all that extra is what folks don’t want.


> This is extraordinarily normal C code. I’ve worked with several large code bases that look like CPython code. Heck, a lot of Linux looks like that.

Manually-implemented dispatch / vtbls you mean?


Yes, a lot of C libraries reinvent dynamic dispatch with vtables or some other way. Sometimes they reinvent exceptions as well with longjmp.


> reinvent exceptions as well with longjmp.

What a bizarre way to phrase it. Shouldn't it be the other way round? Exceptions are just a "reinvention" of longjmp under a different name (yet identically noxious to program readability).


Exceptions are a programming flow control concept.

longjump is a specific family of processor op codes.

This is the same idea that while loops are implemented using goto.


Well then, I agree with jnwatson.


Other C vtable implementations

- GTK (Gnome / GIMP)

- COM (Windows - though it is more intertwined with C++)

- Quake / Doom and other old game engines, though nobody uses C for that anymore today


I never understood the argument of C++ being too big of a language. Code bases can enforce restrictions on what can be used.


Can you give a practical example of how a codebase can restrict typical 'here's a bazooka to shoot yourself in the foot' C++ code?


> I think AST interpreted languages are becoming something of the past. Yes I'm aware there's bytecode in CPython.

Then why are you mentioning AST interpreters? Python is not an AST interpreter.


Yeah I didn't realize that. What I should've said was language specific run-time environments are becoming something of the past.


Truffle is built around AST interpretation so that’s not going to happen any time soon.


> It might make sense to develop a compiler that translates Python code directly to WebAssembly and sacrifice some compatibility for performance.

This seems to be the author's preferred approach, but I'm not sure it would provide significant performance improvements over simply compiling the existing Python interpreter to WebAssembly.

Compiling dynamically-typed Python to statically-typed WebAssembly would be similar to how Nuitka [0] compiles Python to C. This compiled code may be a little faster than interpreted Python, but it's still going to be quite slow - much slower in a browser than JIT-compiled JavaScript.
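A small sketch of why a straight translation does not buy much. This is an illustration of dynamic dispatch in general, not of how Nuitka works internally; `add` and `Money` are made-up names.

```python
def add(a, b):
    # The meaning of `+` is decided at runtime from the operand types,
    # so a compiler cannot emit a single machine-level add for this line.
    return a + b


class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # User-defined classes can hook into `+` as well.
        return Money(self.cents + other.cents)


results = [
    add(1, 2),
    add("py", "thon"),
    add([1], [2]),
    add(Money(150), Money(250)).cents,
]
```

Without type information, compiled code for `add` still has to look up and call the right `__add__` on every invocation, which is the same work the interpreter does. A JIT can observe that a call site only ever sees ints and specialize for that.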

AFAIK the ideal way to maximize the performance of Python in the browser would be to JIT-compile Python code to WebAssembly. Are WebAssembly implementations able to support that?

[0] http://nuitka.net/


> It's no secret that I want a Python implementation for WebAssembly. [...] with the fact that both iOS and Android support running JavaScript as part of an app it would also get Python on to mobile.

WebAssembly is not JavaScript. So I assume iOS and Android also support running WebAssembly as part of an app?


WebAssembly is compiled and executed by JavaScript though. You can create a runtime with no JS on the other hand probably for mobile. The beauty of WASM is eventually you could have it on all platforms as a way to make apps and expose API libraries to said runtime and now anyone can make apps in any language to run on WASM. So long as said language can be compiled to WASM.

You also need JS to interop back from WASM.


I suspect that Python's use in education would be an early casualty if the interactive REPL were removed.


> It's no secret that I want a Python implementation for WebAssembly. It would not only get Python into the browser, but with the fact that both iOS and Android support running JavaScript as part of an app it would also get Python on to mobile. That all excites me.

If the main goal is to be able to run Python code in a browser, another option is Brython[0]. According to the FAQ[1] it should work in all modern browsers, including mobile.

[0] https://brython.info/

[1] https://brython.info/static_doc/en/faq.html


Here's a great summary of the existing options https://stackoverflow.com/a/58684358/308851


Python for WebAssembly already exists and runs fine on iOS. https://holzschu.github.io/a-Shell_iOS/


Pretty neat. I didn't know that the app store now allows stuff like this. I thought they had banned programming interfaces.


Pythonista is still there.




A possible approach is to use Cython to convert Python code to C code and then compile that code to WebAssembly. I would not be surprised if there is a tool for that already


from my experience digging through Cython output: unless you type-annotate EVERYTHING C-style, Cython just uses the CPython C-API (PyObject_New, Py_INCREF, etc). so you'd have to ship all of CPython anyway... as mentioned elsewhere in the thread, that's too big for most websites


Some of the confusion is that people often blur languages and runtimes together. I started work on and then abandoned work on a Node implementation in Java backed by Nashorn. These distinctions become really clear when there's a mismatch like that. That, and some of the cpython issues would have been handled by pragmas in other languages (__slots__, for one), but got baked into the runtime. The unwritten zen of python is "bolt it on."
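To make the `__slots__` example concrete, here is a sketch with two made-up classes. The declaration looks like an ordinary class attribute, but the runtime treats it as a storage-layout directive.

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    # __slots__ is a plain class attribute that the runtime interprets:
    # instances get fixed storage for these names and no per-instance __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)

plain.z = 3  # fine: stored in plain.__dict__
try:
    slotted.z = 3
except AttributeError:
    slot_rejected = True
else:
    slot_rejected = False

has_dict = (hasattr(plain, "__dict__"), hasattr(slotted, "__dict__"))
```

In a language with pragmas this would be a compiler hint that leaves program semantics alone. Here it changes observable behaviour, so any alternative implementation has to reproduce it.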


Or just use transcrypt to transcode your Python into native JavaScript.

http://www.transcrypt.org/


I came here to say this. Transcrypt takes the approach that there are already many good JavaScript libraries that exist for web development, and rather than try and replace those, it embraces them. What it allows you to do though is to write code that uses those libraries in Python, and then transpile that into JS for deployment. It has allowed me to write React + MaterialUI apps using 99% Python, without taking a performance hit in the browser.



It's Python if it can run popular libraries, such as scipy. That's already an achievement to get working outside CPython.


> But when thinking about the daunting task of creating a new implementation of Python, my brain also began asking the question of what exactly is Python?

The Python documentation is pretty clear about (1) What is the Python language, (2) What is the Python stdlib (“Python” is either the first or the combination of the two), and (3) what are CPython implementation quirks. If you have all of #1, you have a Python implementation, #1 + #2 is, I guess, a Python distribution.

And, yes, most of the things the article questions are part of #1, and definitively part of the essentials of the language.



