A Python Interpreter Written in Rust (github.com)
430 points by rch 17 days ago | 194 comments



This is wonderful. This could become the best way to move Python projects to Rust: initially just run on the RustPython interpreter, but then optimize low level routines in Rust. In 15 years I wouldn't be surprised if this or something like it surpasses CPython in popularity.

Still, no discussion about Python implementations is complete without some mention of the infamous GIL (global interpreter lock). :-) CPython has it, Pypy doesn't (I think) (EDIT: yes it does), Jython doesn't, etc. What is the GIL plan for RustPython?


The GIL comes with great convenience as you don't have to worry about a whole host of data races. It's no silver bullet but it's mighty convenient if you only do some multi-threading like in a web-server.

Many libraries are not prepared for the disappearance of the GIL and while it's not a general problem for python per se it will be a great amount of work to make every library compatible with GILless python.

Therefore I think that you must always provide an option for the GIL that is enabled by default in order to provide backward compatibility.


> The GIL comes with great convenience as you don't have to worry about a whole host of data races.

This is true, but it doesn't mean that a GIL-less Python would need to have an option for "enable GIL so I don't have to worry about data races". It means that a GIL-less Python would have to ensure that there are no data races, without having to have a GIL.

> it will be a great amount of work to make every library compatible with GILless python

No, it won't; libraries won't have to change at all. The current interpreter makes a guarantee that those libraries rely on: "you don't have to worry about data races". A GIL-less interpreter would have to make the same guarantee; it just wouldn't have to have a GIL to do it. That requirement is what makes a GIL-less Python interpreter hard to do.


Right now you have one mutex for everything (the GIL itself) and everything else doesn't need locking. In order to achieve similar convenience without the GIL you would have to trade this for one mutex for every single data structure. Because the data structures in python are so flexible, every single variable needs its own mutex then. Locking every single variable access would be enormously costly.

Other languages achieve a good compromise by clustering data structures into fewer parts with only a handful of mutexes that are locked while other threads work on different datasets. This is usually done manually and with great care as it is the heart of both safety and performance. I don't know if there is an automatic solution to this problem that is compatible with the way python programs are structured.

The libraries basically assume that, while you call them, nothing else changes. In order to ensure that you need to lock everything down. Because you don't know what these libraries do and what data they access it needs to be everything (like it is today). It should be possible to only lock the GIL when such a library is called, so there should be kind of a middle way forward.


> Right now you have one mutex for everything (the GIL itself) and everything else doesn't need locking.

If this were true, all of the explicit locking mechanisms in Python's threading module would be pointless. But in fact the GIL's "mutex" is quite a bit more limited than you are saying. It does not prevent all concurrent code from running. It only prevents Python bytecode from running concurrently in more than one thread. But the GIL allows switching between threads in between individual bytecodes, and "one Python bytecode" does not correspond to "one Python statement that performs an operation that you want to be atomic"; plenty of Python expressions and statements are implemented by multiple bytecodes, so it is perfectly possible for multiple threads executing concurrently to modify the same data structures with such statements, creating race conditions if explicit locking mechanisms are not used to prevent it. That's why Python's standard library provides such explicit mechanisms.
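
To make the bytecode point concrete, here is a minimal sketch using the standard `dis` module. The `counter` variable and `increment` function are made up for illustration, and the exact opcode names between the load and the store vary across CPython versions.

```python
import dis

counter = 0  # hypothetical shared variable


def increment():
    # One statement in the source, but several bytecode instructions.
    global counter
    counter += 1


# Collect the opcode names that the function body compiles to.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)

# The load and the store are separate instructions, so a thread switch
# can happen between them and another thread's update can be lost.
load_index = opnames.index("LOAD_GLOBAL")
store_index = opnames.index("STORE_GLOBAL")
print(load_index < store_index)
```

Anything that must appear atomic across that load/store gap needs an explicit lock.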


> If this were true, all of the explicit locking mechanisms in Python's threading module would be pointless.

Not true. You can have serialized access to the same data structure and still have a data race.

But as long as each Python process doesn't keep its own local copies of those shared data structures, like free lists, no explicit locking is required if the GIL is present.


> You can have serialized access to the same data structure and still have a data race

How?

> as long as each Python process doesn't keep its own local copies of those shared data structures, like free lists, no explicit locking is required if the GIL is present.

I have no idea what you're talking about. Different Python processes each have their own GIL, and they don't share data at all (except by explicit mechanisms like communicating through sockets or pipes). Different Python threads share the GIL for their interpreter process, and if each thread doesn't keep its own local copy of data, there is explicit locking required if you don't want data races.


> How?

Simplest scenario: the read-increment-write cycle with 2 threads. Even with a mutex, it is still possible to have a data race if the lock is taken per operation.

For the second part, yep, it is a mistake, not processes, but threads.

With the GIL, the thread is given permission to operate on certain interpreter-related data structures, like reference counts, or the free_list in PyIntObject. What I mean is that the active thread is free to modify those data structures without fear of data races, and there is no explicit locking required, if it doesn't hold its own copies of those interpreter internal states.

But the GIL can only guard the interpreter's own state, not any user program's state. And yes, explicit locking for operating on your own data is still required.

https://docs.python.org/3/c-api/init.html#thread-state-and-t...
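
A rough sketch of the read-increment-write scenario, assuming nothing beyond the standard `threading` module. `LockedCell` and `Counter` are hypothetical names. The first half replays one unlucky interleaving by hand (no real threads, so it is deterministic); the second half holds one lock across the whole cycle.

```python
import threading


class LockedCell:
    """Each individual read and write is locked, but not the whole cycle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def get(self):
        with self._lock:
            return self._value

    def set(self, value):
        with self._lock:
            self._value = value


# Per-operation locking: replay one bad interleaving of threads A and B.
cell = LockedCell()
a = cell.get()       # A reads 0
b = cell.get()       # B reads 0 before A has written
cell.set(a + 1)      # A writes 1
cell.set(b + 1)      # B also writes 1, so A's increment is lost
lost_update_result = cell.get()


class Counter:
    """The lock covers the entire read-increment-write cycle."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:
            self.value += 1


counter = Counter()


def add_many(n):
    for _ in range(n):
        counter.increment()


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(lost_update_result, counter.value)  # 1 40000
```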


> Simplest scenario: the read-increment-write cycle with 2 threads. Even with a mutex, it is still possible to have a data race if the lock is taken per operation.

What you're describing is not "serialized access with a data race"; it's "multi-thread access that you didn't explicitly control properly".

> For the second part, yep, it is a mistake, not processes, but threads.

Ok, that clarifies things.

> the active thread is free to modify those data structures without fear of data races, and there is no explicit locking required, if it doesn't hold its own copies of those interpreter internal states.

I'm not sure I see why a thread would want to hold copies of those interpreter internal states, since if it did the issue would not be modifying them properly but having the local copies get out of sync with the interpreter's copies, since other threads can also mutate the latter.


The problem with the GIL is that it's a mutex you don't control. So you can't use it to make an update atomic if that update involves more than one atomic low-level operation.

So in practice I don't think it simplifies things all that much. If anything, it creates a false sense of security - first developers get used to the fact that they can just assign to variables without synchronization, and then they forget that they still need to synchronize when they need to assign to more than one atomically.


I'm pretty rusty on Python but my impression wasn't that the GIL meant just "no data races" but that it also meant "data can't change out from under you in the middle of executing a statement". You could write a Python interpreter that ensured no data races and yet still had divergent behavior by allowing shared data to be mutated by another thread halfway through executing a statement.


> my impression wasn't that the GIL meant just "no data races" but that it also meant "data can't change out from under you in the middle of executing a statement".

That's not quite what the GIL guarantees. It guarantees that data can't change out from under you in the middle of executing a bytecode. But many Python statements (and expressions) do not correspond to single bytecodes.


You don’t have to worry about data races at the expense of parallelism that is slower than single thread execution.



Jython is however stuck with Python 2.x compatibility only as far as I know (I'd be happy to be proven wrong).


Jython 3.x development is "in progress":

https://wiki.python.org/jython/JythonFaq/GeneralInfo

Last commit was over a year ago:

https://github.com/jython/jython3


Writing a concurrent runtime system including garbage collector is a serious effort, and that's why all those other versions of Python don't support it and are stuck with a GIL. Hence, I highly doubt that this Rust version of Python has gotten rid of the GIL.

I'd love to see a better separation of language and VMs. I think it's a bit sad that a language designer has to either implement their runtime system from scratch, or has to run it on top of a VM that was designed for another language (Java in the case of Jython).

Therefore, the thing I'm looking forward to most is a concurrent, generic and portable VM written in Rust.


I think there's an analogy between the two issues you brought up.

1. A concurrent garbage collector is 10x more work than a single-threaded one. People often don't realize this.

2. A language-independent VM is 10x more work than a VM for a given language. People often don't realize this.

In other words, VMs are tightly coupled to the language they implement, unless you make heroic efforts to ensure otherwise.

WebAssembly is a good example of #2. I think the team is doing a great job, but they are inevitably caught between the constraints of different languages (GC, exceptions, etc.)

The recent submission The Early History of F# sheds some light on this with respect to the CLR:

https://news.ycombinator.com/item?id=18874796

An outreach project called “Project 7” was initiated: the aim was to bring seven commercial languages and seven academic languages to target Lightning at launch. While in some ways this was a marketing activity, there was also serious belief and intent. For help with defining the academic languages James Plamondon turned to Microsoft Research (MSR).

I think this is the only way to design a language-independent VM -- port a whole bunch of languages to it. And there are probably 4 or 5 companies in the world with the resources to do this.

I've seen some VM designs that aim to be generic, but since they were never tested, the authors are mistaken about the range of languages they could efficiently support.

Of course, you can always make a language run on a given VM, but making it run efficiently is the main problem.


> I'd love to see a better separation of language and VMs. I think it's a bit sad that a language designer has to either implement their runtime system from scratch, or has to run it on top of a VM that was designed for another language (Java in the case of Jython).

Wasn't Perl 6's Parrot kind of meant to fulfil that role?


> Wasn't Perl 6's Parrot kind of meant to fulfil that role?

Yes, that was an original project goal. You can see this as far back as Larry's State of the Onion 2003:

https://www.perl.com/pub/2003/07/16/soto2003.html/

... the "Parrot: Some Assembly Required" article written by Simon Cozens in September 2001:

https://www.perl.com/pub/2001/09/18/parrot.html/

... or, if you trust Git commits rather than articles which could have been edited in the meantime, the same article revised as introductory docs in the Parrot repository in December 2001:

https://github.com/parrot/parrot/blob/9bc8687beb5180e4cc8971...


I stand corrected.

I'd say not originally. But over time, as the Perl 6 project got delayed in the 2000's, it was decided that Parrot would be a runtime for all scripting languages. Which in turn meant it couldn't cater well enough for any. Which led to its demise.


Having recently been down this particular rabbit hole myself, I just want to note that there are other possible strategies; a GIL is not the only alternative to a fully concurrent runtime.

My own baby, Snigl [0], doesn't even support preemptive threads, only cooperative multitasking, with a twist: blocking operations are delegated to a background thread pool, and the calling task yields while it waits.

[0] https://gitlab.com/sifoo/snigl#tasks
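
For readers more at home in Python, the same general shape (cooperative tasks, with blocking calls pushed to a background thread pool) can be sketched with `asyncio`. This is only an analogy, not how Snigl is implemented; `blocking_read` and `task` are made-up names.

```python
import asyncio
import time


def blocking_read(name):
    # Stand-in for a blocking call such as file or socket I/O.
    time.sleep(0.05)
    return f"data from {name}"


async def task(name, log):
    log.append(f"{name} start")
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the default thread pool; awaiting the
    # result yields control so other tasks can run in the meantime.
    result = await loop.run_in_executor(None, blocking_read, name)
    log.append(f"{name} done")
    return result


async def main():
    log = []
    results = await asyncio.gather(task("a", log), task("b", log))
    return results, log


results, log = asyncio.run(main())
print(results)
print(log[:2])
```

Task "b" starts before "a" has finished, because "a" yields while its blocking call runs in the pool.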


That is a nice strategy but it only allows IO to be parallel, and leaves the CPU sequential. Programs may be concurrent (because of the cooperative multitasking) but not parallel (because there is only one CPU thread). Users may find it disappointing that they can only use a single core.

Also, keep in mind that cooperative multitasking may cause unexpected high latencies, which is unfortunate e.g. in GUI applications and web servers; this is a result of queueing theory, and an example is given here: https://www.johndcook.com/blog/2008/10/21/what-happens-when-...

By the way, on POSIX systems there is a way to schedule background IO operations without even using threads: http://man7.org/linux/man-pages/man7/aio.7.html


I am aware, it's simply the most portable universal solution to the problem that I could think of. AIO comes with its own share of issues; Unix (and thus Posix) are pretty much dead; and it's not universal, I can't use it to open a database connection unless the database has support built-in.

From what I know, cooperative multitasking suffers significantly less from unpredictable performance than preemptive threading. The biggest source of uncertainty in Snigl is the preemptive IO loop.

The thing is that I really don't feel like writing a concurrent runtime; been there, done that. I'm planning something along the lines of Erlang's processes and channels based on separate interpreters for those scenarios.


Ok. A problem with separate processes is that you have to serialize your data when passing messages. This is a pity because functional data structures + structural sharing can be very efficient. For example, I can't imagine how someone would implement a high performance database without structural sharing.


Processes by name, from Erlang.

They're implemented (in my mind so far) as preemptive threads, one per interpreter; which makes them slightly more heavy-weight than Erlang's NxM and a nice complement to purely cooperative multitasking.


Sounds interesting. I'm not familiar with Erlang, and I still wonder how shared memory is managed.


It's one of those languages that does things differently to solve actual issues, not to check boxes.

From my limited experience, Erlang doesn't share data between processes; you throw it over the fence by sending to the process inbox, which is where the locking takes place.

Still, shuffling data between OS threads is an easier problem to solve than serializing between OS processes.
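
A minimal sketch of the "throw it over the fence" style using Python threads and `queue.Queue` as the inbox. This illustrates the idea only and is not Erlang's actual mechanism; `worker`, `inbox` and `outbox` are made-up names.

```python
import queue
import threading


def worker(inbox, outbox):
    # The worker owns its state; no other thread touches `total`.
    total = 0
    while True:
        message = inbox.get()   # the queue does the locking
        if message is None:     # sentinel value: stop
            break
        total += message
    outbox.put(total)


inbox = queue.Queue()
outbox = queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

# The sender never shares mutable state, it only sends messages.
for n in range(1, 6):
    inbox.put(n)
inbox.put(None)
thread.join()

result = outbox.get()
print(result)  # 15
```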


> Therefore, the thing I'm looking forward to most is a concurrent, generic and portable VM written in Rust.

There is an effort to get the BEAM ported to rust, which would be very exciting.


Do you have a link to this info?


It might have been just one guy, but it was here on hn.


I've seen this one before here: https://github.com/kvakvs/ErlangRT


I am working on something similar but from a slightly different direction. The project is mainly focusing on compiler infrastructure for now, but I have a reference interpreter I use for validation. The short term goal is to make it able to run the erlang `compile` module. https://github.com/hansihe/core_erlang


If we are ever going to get rid of the GIL, we need to get rid of Python's C extensions altogether.

It is not a real Python implementation if it is not compatible with C extensions; it is just an embedded DSL with Python-flavored syntax.


Compatibility with C extensions seems to be the most difficult thing for an alternative implementation to achieve. PyPy struggled with this for a long time, and IIRC extension compatibility also caused the failure of Dropbox's Pyston.

Is there really no situation in which an alternative implementation that only supports "pure Python" would be useful?


> Is there really no situation in which an alternative implementation that only supports "pure Python" would be useful?

This really depends on your definition of 'being useful'. Jython is useful in a sense: it is used in many Big Data solutions as a way to embed Python as a DSL/UDF, as in Pig, Hive, etc.

However, without support for C extensions it is not really a Python implementation, in the sense that I can't run a Python script I just grabbed from the internet using the so-called 'alternative' implementation. So if the point of being useful is to be a replacement, then sadly, the answer is no, it is an everything-or-nothing situation.


In the old days, before Eclipse had a scratchpad area or Java finally got a shell, it was useful to me as a means to explore some Java APIs, as I did not want to spend time with either Beanshell or Groovy, having known Python superficially since 1.6.


Indeed, I'm very excited about this.

Rust + Python seems a natural combination to me, and being able to have one single dev env (and maybe in the end, one single deployment mechanism) to do both is a killer feature.

And actually, I think having Python written in Rust would provide some other very nice properties:

- limit the number of bugs you can introduce in the implementation because of the Rust safety nets;

- can still expose a C compatible ABI and hence be compatible with existing extensions;

- the Rust toolchain being awesome, it may inspire people to make it easy to compile a Python program. Right now I use Nuitka, which is great, but has to convert to C then compile, which makes it a complex toolchain.


What advantages would it have over using pypy with cffi as is currently done?


Do you mean, apart from the 3 points I just mentioned?


> This is wonderful. This could become the best way to move Python projects to Rust: initially just run on the RustPython interpreter, but then optimize low level routines in Rust. In 15 years I wouldn't be surprised if this or something like it surpasses CPython in popularity.

What you are describing is simply a JIT compiler. Maybe you are suggesting rewriting PyPy (its C part) in Rust?


Actually I was referring to manual translation of Python code to Rust. My experience with PyPy has been rather bad, unfortunately (it runs my code at less than half the speed of CPython) and I figure human translation should be a lot more effective than JIT optimization.


Maybe you should send your code to the PyPy team. I bet they will look into it and see how they could improve their JIT compiler. Last time I checked the project, they had a full benchmark running for each release to spot regressions and to track improvements


"What you are describing is simply a JIT compiler."

The user is describing the opposite of a JIT compiler: a gradual rewrite of Python apps in Rust to feed into an ahead-of-time, highly-optimizing compiler. A JIT compiler would do quick compiles of Python code while it's running. The performance, reliability, and security capabilities of JIT vs AOT vary considerably with context. For predictability and security, I avoid JITs wherever possible in favor of AOT compilers.


Agreed, this looks like a great project!

Grumpy was supposed to accomplish the same for Python->Go, and although now abandoned, probably holds some lessons in how to design a platform to help Python projects get Rusted.

Grumpy compiled python code to fairly unreadable Go, and then quickly compiled the result. One effect of this is that a programmer could theoretically refactor the resulting Go code gradually.


Grumpy was actually forked and is still seeing development here: https://github.com/grumpyhome/grumpy


I think rustpython tries to generate the same bytecode as CPython. Not the same road.


I would personally try to move away from Python at this point for greenfield projects. The GIL is so baked into the language that if you removed it, a bunch of current Python code would probably break in subtle ways.

Modern languages need proper multithreading support, static types and fast compile speeds. Use golang, use kotlin, use dart, use anything but python & javascript.


It's basically impossible to implement a CPython-compatible language without a GIL (or you lose single thread performance by using very fine-grained atomics/locking). Python has very specific multithreading semantics that are a function of the CPython bytecode and the GIL, and programs rely on this.


Other than refcounting (which is not a part of the Python language spec - it even specifically says that conforming code shouldn't rely on it), what other semantics did you have in mind?


You can list.append from 2 threads without worrying about crashing, although I would not recommend it.

It's possible to do without the GIL, but up to now there hasn't been a good way of doing that.
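
A small sketch of the `list.append` case. It relies on how CPython behaves in practice rather than on a documented language guarantee, and the names `shared` and `append_many` are made up for illustration.

```python
import threading

shared = []


def append_many(tag, n):
    # No explicit lock: each append is a single call into the list object.
    for i in range(n):
        shared.append((tag, i))


threads = [
    threading.Thread(target=append_many, args=(tag, 50_000))
    for tag in ("a", "b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No crash and no lost elements on CPython; the interleaving of "a" and
# "b" items is unspecified, but each thread's own order is preserved.
print(len(shared))
```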


I don't think that's guaranteed, though.


loeg says, "programs rely on this".

Guarantee or not, it constrains whether something is usable as a drop-in replacement interpreter, especially if people can't tell which programs will break, and doubly so if the breakage is a subtle data corruption race that doesn't show up in tests.


PyPy has the GIL.


Oh right--I was thinking of the STM (software transactional memory) version of PyPy, which is still very experimental.


I thought they got rid of it. News to me.


I want it to have the GIL, because I want it to maintain compat with C extensions.

We already have a plan to bypass the GIL: multi interpreters.

Having an implementation in Rust may make future improvement to Python easier, so it's better to have something exactly similar first, then start to hack it.


By multi interpreters do you mean several interpreter processes without shared memory? (or only limited memory sharing)

CPython can do that too, but this isn't really multi-threading, and it only bypasses the GIL in a very trivial sense.

But yeah, keeping the GIL is probably the only reasonable way to go if you want compatibility with existing extensions.


Yes. Actually you get one GIL per interpreter, and can spawn your threads in a sub-interpreter.

CPython could do it, but currently can't offer much, since the API to do it is only accessible from C.


Doesn't CPython do this with the multiprocessing module?

https://docs.python.org/2/library/multiprocessing.html


No, this spawns several processes, while sub-interpreters allow you to have several of them in one single process, making data sharing much cheaper.


Neat. Was able to clone the repo, run cargo run, and drop into a Python shell. Doesn't seem like it can do much right now, but I really like the idea.

  >>>>> a = [1,2,3]
  >>>>> a[2:]
  [3]
  >>>>> a[1:]
  [2, 3]
  >>>>> fh = open('~/.ssh/id_rsa.pub', 'r')
  thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: RefCell { value: [PyObj instance] }', src/libcore/result.rs:999:5
  note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

What would be really cool is if this could one day be like Nuitka, but in Rust. Write in Python, compile into Rust. Maybe even support inline Rust like cPython supports inline C.


While your code was valid, open('~/.ssh/id_rsa.pub'...) won't work on any other Python interpreter either; you need to expand ~ in the path. For example you can use:

  open(os.path.expanduser('~/.ssh/id_rsa.pub'), 'r')
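
For comparison, a short sketch of what CPython does with the unexpanded path: it raises an ordinary exception rather than crashing. The path below is a made-up one that should not exist on any machine.

```python
import os

path = "~/no-such-dir-for-demo/file.txt"  # hypothetical path

try:
    open(path, "r")
    error_name = None
except FileNotFoundError as exc:
    # open() looked for a directory literally named "~" in the
    # current working directory; "~" is a shell convention.
    error_name = type(exc).__name__

# expanduser() replaces the leading "~" with the home directory.
expanded = os.path.expanduser(path)
print(error_name)
print(expanded)
```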


Does this line not cause the above stacktrace?


> cPython supports inline C

First time I am hearing this. Can you share an example?


https://cffi.readthedocs.io/en/latest/ & https://docs.python.org/3/extending/extending.html

There also is https://github.com/rochacbruno/rust-python-example which is something I want to look into.

As for an example, I'm using Snappy right now, so here you go: https://github.com/andrix/python-snappy/blob/master/snappy/s... & https://github.com/andrix/python-snappy/blob/master/snappy/s...

It still requires you to compile the C part- which is why sometimes you need GCC when you're doing a pip install.


I’ve done a lot of interfacing with C and C++. pybind11 [0] has been the easiest and most effective for me to use. It targets C++, but it’s easy enough to wrap C code with it. Cython brings along a lot more bookkeeping code and overhead in my experience. cffi isn’t bad, but it’s not as flexible/expressive.

[0] https://github.com/pybind/pybind11


CFFI has the huge advantage that you don't need to link against libpython, which in turn allows you to have a single wheel for multiple Python versions.


Cython supports inline C, CPython does not.

Maybe GP got confused?


It's a naive interpreter. The source is parsed into a tree and then flattened into byte code. All values are stored in PyObjectPayload structures. Each operation has a function. Here's "hex":

    fn builtin_hex(vm: &mut VirtualMachine, args: PyFuncArgs) -> PyResult {
        arg_check!(vm, args, required = [(number, Some(vm.ctx.int_type()))]);

        let n = objint::get_value(number);
        let s = if n.is_negative() {
            format!("-0x{:x}", n.abs())
        } else {
            format!("0x{:x}", n)
        };

        Ok(vm.new_str(s))
    }

There's a big dispatch table, generated at compile time, and an interpreter loop. Just like you'd expect. It's useful if you happen to need a Python implementation to embed in something and want something cleaner than CPython. Probably slower, though.
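
As a toy illustration of the "dispatch table plus interpreter loop" shape (not RustPython's actual code), here is a minimal stack machine in Python. The opcode names and the `run` function are invented; `builtin_hex` mirrors the sign handling of the Rust function above.

```python
def builtin_hex(n):
    # Same logic as the Rust version: sign first, then hex digits.
    return f"-0x{abs(n):x}" if n < 0 else f"0x{n:x}"


def op_push(stack, arg):
    stack.append(arg)


def op_add(stack, _arg):
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)


def op_hex(stack, _arg):
    stack.append(builtin_hex(stack.pop()))


# Dispatch table: opcode name -> handler function.
DISPATCH = {"PUSH": op_push, "ADD": op_add, "HEX": op_hex}


def run(program):
    stack = []
    for opcode, arg in program:  # the interpreter loop
        DISPATCH[opcode](stack, arg)
    return stack.pop()


program = [("PUSH", -300), ("PUSH", 45), ("ADD", None), ("HEX", None)]
print(run(program))  # -0xff
```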


Serious question with no irony at all. But why? How is this useful?


At least 1/3 of Python's CVEs could have been prevented by using a memory safe language like Rust: https://www.cvedetails.com/product/18230/Python-Python.html?...


- You eliminate a whole class of errors from the Python implementation, making it not only safer, but easier to contribute to. Indeed, contributing as a beginner to a big C project is super hard, and needs a lot more supervision than in Rust.

- You leverage higher level concepts. This makes it easier to debug, easier to read, easier to maintain, and all in all more productive.

- You get a vastly better toolchain. So of course again more productivity, but also potentially the possibility to include external dependencies or split the project into several parts. You can do that in C, but it's a hell of a lot more work. Cargo in Rust has a stellar reputation.

- Instead of providing Python, you can just provide cargo. Suddenly, your dev platform is Rust AND Python. Together. With C it's very hard to do, but with cargo, it's possible to abstract all that and make them work like a singular entity. The possibilities are amazing.

- You prepare for the future. C is a legacy language. We use it because we don't have anything better now, and because of tons of existing code and documentation, plus experienced devs. But 20 years from now, you will wish Python were not written in C.

- Free WebAssembly: being able to emit WebAssembly out of the box is going to get more and more important, as everybody wants it to become the lingua franca. Rust offers this for free. But just as importantly, it may help us to use WebAssembly dependencies in our Python projects.


Serious Question: does everything need to be useful?


I think everything in the Universe is useful for some purpose. Can you name anything that is completely useless?


Apparently the word "useless" itself, from what you're saying


If the word "useless" is actually useless then the word "useless" cannot be useless


Oops, Russell's paradox strikes again


Most of the universe is completely useless. And I mean unbelievably vast quantities of space and matter.


Space and time could be a quantum error-correcting code. Also, the Universe might not be stable enough for us to live in if that matter and space were missing.

https://www.quantamagazine.org/how-space-and-time-could-be-a...


>Also, the Universe might not be stable enough for us to live in if that matter and space were missing

Let's put it this way: a million galaxies could blow up entirely to bits, and as far as we are concerned it wouldn't even make any difference.


It's probably useless for us now but a space faring civilization can use them for expanding their civilization :)


It interprets Python - what about that doesn't appear useful to you?


Serious question: why not?


It's not.


If anyone is at FOSDEM this will be presented tomorrow in the rust room. Can't wait!


I found the slides and info at [1] but does anyone have a recording of this presentation?

[1] https://fosdem.org/2019/schedule/event/rust_python/


The video is now available on the page.


If anybody had any doubt about how welcome the project is...

Since this was posted to HN, the repo has received 8 new PRs.


Now this is something I will donate money to…

Python with Rust as its foundation sounds like the best idea ever.

I’m curious to know whether or not it would be possible (or reasonable) to eventually get the same or better performance as CPython.


CPython is really quite slow by design. The reference implementation is meant to be obvious and pretty easy to interoperate with C.

I think a RustPython implementation would be pretty cool. You could definitely take that opportunity to worry about performance more than CPython does while also worrying about interoperability more than PyPy does.

Or I'm missing your point and you're suggesting a drop-in replacement for CPython that supports all the same C-based libraries as CPython does.


PyPy is doing really well on compatibility these days. It can use all the C libraries natively, and a lot of the speed issues there have been resolved.


> It can use all the C libraries natively

Literally all of them, without any issues? I'm working on implementing support for Ruby's C extensions in an alternative implementation and it's a right slog.


PyPy dev here. Yes to both - all of them and it was a right slog. If it doesn’t work report an issue. We have a proposal looking for funding to make it fast.


This is wonderful news to me. So broadly speaking I should be able to drop in replace cpython with pypy for my fairly not so special projects?

I have a Django app that does some heavy data serialization and I'm not yet ready to optimize those serializers in another language.

I can't wait to try this out.

Crumb. I didn't realise pypy is on 3.5.3. Loves me my f strings.


We backported f-strings to 3.5. We also have an alpha-quality Linux 3.6 build available in the nightly builds.


Great! I appreciate you sharing this. I checked the PyPy main page, Downloads page, and Compatibility page and while I didn't look exhaustively, nothing mentioned 3.6 or 3.6 features.


f-strings were "backported" to pypy and work fine.


No, numpy and pandas break with every new release, support for them is constantly lagging behind. Other C libraries too...


PyPy should be releasing soon, if you use cutting edge then you have to use cutting edge PyPy as well


From my limited experience with Rust, having support for C-based libraries shouldn't be too hard to do- as Rust (like Go) has support for embedding C. I'm not sure how portable the solution would be- I would guess you would need to compile the VM with the modules you want baked in.

I think not having support for C modules would hamper long term adoption. I would absolutely love to adopt this for my stuff, but off the top of my head- I use uvloop and confluent-kafka, both of which are largely written in C. Moving away from those would be hard-ish.


Calling into and out of C (whether dynamically or statically linked) isn't the hard part; the hard part is implementing the somewhat arbitrary interface against which C extensions are written in a way that is functionally compatible with CPython without losing the advantages of not being CPython.


> CPython is really quite slow by design.

The data structures are slow by design.

The way we use these data structures is quite inefficient.

Having a fast interpreter only about doubles the execution speed in most programs leaving another factor of 50 open for future generations.


Really? I thought Python dicts and lists were actually quite fast for a dynamic language.

Ints and classes can be slow though.

Anything I don't know ?


They are probably the fastest dicts and lists we can imagine today. If you really need lists and dicts for your data, you need them and won't gain much. But if you don't need them and instead could use something simpler, an array of C structs would be much faster than a list of dicts.
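
A rough way to see the gap from inside Python, as a sketch rather than a benchmark: the standard-library `array` module stands in for "an array of C structs" here (that substitution is my assumption), and only the memory footprint is compared, not speed. The `x`/`y` record layout is made up for illustration.

```python
import sys
from array import array

N = 10_000

# One dict per point: every record carries its own hash table, and every
# coordinate is a separately allocated float object.
points_as_dicts = [{"x": float(i), "y": float(i) * 2.0} for i in range(N)]

# Two flat arrays of C doubles: 8 bytes per value, no per-record objects.
xs = array("d", (float(i) for i in range(N)))
ys = array("d", (float(i) * 2.0 for i in range(N)))


def dict_layout_bytes(records):
    """Approximate size: the list, each dict, and each float value."""
    total = sys.getsizeof(records)
    for rec in records:
        total += sys.getsizeof(rec)
        total += sum(sys.getsizeof(v) for v in rec.values())
    return total


def array_layout_bytes(*columns):
    """Size of the flat arrays, including their object headers."""
    return sum(sys.getsizeof(col) for col in columns)


dict_bytes = dict_layout_bytes(points_as_dicts)
array_bytes = array_layout_bytes(xs, ys)
print(f"list of dicts: {dict_bytes} bytes")
print(f"flat arrays:   {array_bytes} bytes")
```

The exact numbers depend on the interpreter version, so none are quoted here. The point is only that the flat layout avoids a hash table and boxed floats per record.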


I thought you were stating that Python was slow compared to what it could be, being a dynamic language.

Of course if you compare to static languages it's slow. Of course you can write low level specialized DS. Duh.


Awesome!

No, that was my point basically, although I could imagine something like this shipping with some basic libraries and package support, and then having a similar ecosystem to the current python ecosystem.


> Python with Rust as its foundation sounds like the best idea ever.

Why do you think it's the "best idea ever"? What are the benefits over any other Python implementation?


Python has massive use in a diverse array of fields with lots of educational resources helping beginners. There is and will be for a while lots of Python code. It's increasingly used in business-critical systems, like at Bank of America. That it's built on an unsafe language puts that all at risk of unreliability and security vulnerabilities. Long ago, I wanted to rewrite it in Ada with all safety features on to mitigate that risk while maintaining its convenience. Rust is another safe, systems language with extra measures for temporal errors. So, this project is doing the same thing.

Full security for Python apps would require consideration of each layer of abstraction:

1. User's code in Python.

2. The interpreter and extensions.

3. How these interact.

4. If added for performance or security, any assembly code plus its interactions.

Rewriting the Python interpreter in Rust mainly addresses No. 2. An example of a method to address all of them would be Abstract State Machines, which can simultaneously represent language semantics, software, and hardware. Tools like Asmeta exist to make them feel like programming languages. The verification would probably be manual, specialist work, whereas Rust's compiler gives you key properties with just annotations for many of them and some work with the borrow checker for a few.


> That it's built on an unsafe language puts that all at risk of unreliability and security vulnerabilities

Has this actually been a problem, though? I'm no lover of python, but tons of people seem to use, for example, Django, without incident.


No, but you need an incredibly high level of mastery to work on such a big, widely deployed C project.

Rust will make it possible to safely invite a broader range of contributors, because there are so many things you don't need to check. This also means a smaller number of required tests and, because Rust uses higher-level constructs than C, more productivity in general.

So basically, in the long run, more people, able to do more things.

Besides, one of the goals of the main implementation is to stay simple, which is hard to do in C. For those reasons, and because of the potential for unreliability and security problems, CPython is quite slow.

We can't optimize it, because it would make it too complex.

But with a rust implementation, one can hope to suddenly be able to apply more optimizations.

It's all theoretical of course, but it's a nice hope.


[flagged]


I'll bite - why do you believe this is a bad idea?


He doesn't. He just thinks it's a stupid idea.

There's a mature 30-year-old codebase, with huge industry uptake and tons of major players using it, and some random project to rewrite it in a new fashionable language, which will more likely than not go nowhere (as 99% of such attempts do) and be abandoned when the committers lose interest and recognize how big the full task is.


I'd send you a bottle of wine for saving me the time and effort.


I'm curious why that isn't the case right now. Isn't that one of the selling points of Rust, being blazing fast? Or perhaps just this particular piece of software has been implemented poorly?


Rust is approximately as fast as C. So a straightforward reimplementation of CPython in Rust would be approximately as fast as the original.

Rust makes it easier to write programs that don't leak memory and don't have data races, but it doesn't make them run faster.


Here the optimism would be around removing the GIL and similar stuff, it is not that rust would make the same architecture magically faster but that it could allow a better one.

Also others have noted that speed was not a primary focus of CPython


Rust is definitely not as fast as C, it's most likely not as fast as C++


It's within 5-10% on average, sometimes faster, usually a little slower, very rarely slower by much. The person you're responding to said "approximately". I feel like this is needless pedantry.

Rust also allows you to make architectural decisions in the name of performance that would be completely unmaintainable in C. See: the Servo project.


Further, it should be possible for Rust to eventually reach a state where it's faster than C because of the lack of pointer aliasing. This opens up a whole host of optimization opportunities, allowing it to get closer in performance to Fortran (which is intrinsically the fastest of the bunch).


Indeed, it's available but currently disabled due to a bug in LLVM that can cause incorrect code generation.

https://github.com/rust-lang/rust/issues/54878

This is not the first time it has been disabled due to an LLVM bug.

https://github.com/rust-lang/rust/issues/31681


What makes you say that? Optimized machine code is for the most part optimized machine code, and unsurprisingly benchmarks tend to come down about even. The benchmarks game currently has them down as

C vs. Rust: 6 wins for C, one draw, 3 wins for Rust

C++ vs. Rust: 5 wins for C++, two draws, 3 wins for Rust

The wins one way or another are also not by particularly large margins.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

This may go on to shift marginally in Rust's favor once a soundness bug related to non-aliasing of references is fixed in LLVM and the compiler can safely leverage some guarantees that Rust provides that C and C++ cannot.


Any update on that bug being fixed?


That’s a false dilemma. We’ve had almost three decades to optimize CPython. It will take a while to catch up.


>We’ve had almost three decades to optimize CPython

Assuming that was actively pursued. But if it was, all those other projects (Unladen Swallow, Dropbox's Python project, PyPy, etc., whose intent was exactly to make Python faster) wouldn't have been started.


Some of those lessons have most definitely been used to improve the reference implementation.

It's not remarkably slow as long as you compare apples to apples, that is non-jitted vm-interpreters.

Jits pay a steep complexity- (and hence maintenance) price for their performance that should be taken into account when comparing.

Designing a significantly faster interpreter with comparable features is non-trivial from my experience [0].

[0] https://gitlab.com/sifoo/snigl


It's not, and I'm curious as to why, I even suggested a reason very similar to your answer. I guess people assumed malice. Oh well.


The dilemma klodoph mentions is:

> [Performance] isn't [...] one of the selling points of Rust

or

> this [...] software has been implemented poorly

It sounds like you're maligning Rust (isn't keeping promises) or RustPython (is implemented poorly), and it's easy to read "implemented poorly" as an attack on the implementers.


Well, it's definitely a possibility. I am not trying to attack them, it might be the case, or it might not. I was just trying to get possible reasons, and that's what came to mind. I'm sure there are other reasons, that's why I asked.

I really don't know how slow it is, I've not done benchmarks, but given that Rust is supposed to be efficient, it certainly can't be that slow, unless the implementation is really poor, I guess. I'm not saying it is because I don't know. If anyone has done benchmarks, please do share!

The reason for why I got the idea that it is slow, is that I believed the parent[1], and people have repeatedly claimed that CPython is slow.

[1] "to eventually get the same or better performance as CPython."

I assume this means that it is slower than CPython, and CPython is already extremely slow according to some people even on this page.

Sorry for the confusion. :)


Thanks for the explanation!

One thing to consider is that CPython isn't slow because of the language it's written in, but because of optimizations it isn't doing (namely JIT, I think). Rust can't do the same things any faster than C can, and an early implementation of Python in Rust isn't likely to be much faster than an early implementation in C. Rust has the potential to make certain classes of optimization easier, eventually.


For an example of the huge developer productivity multiplier Rust is for this kind of thing check out what Yorick Peterse has done in his spare time with Inko: https://gitlab.com/inko-lang/inko


Even if RustPython is faster at the architectural level (which it probably is, CPython is infamously slow), CPython has the advantage of a mature codebase and two and a half decades of accumulated performance tuning.


>and two and a half decades of accumulated performance tuning.

Or two and a half decades of performance neglect.

I've read about things like dicts, sort and the like getting faster implementations, but I've never seen a big effort to make CPython faster in general. In fact the first versions of 3.x were even allowed to regress to slower than 2.x.


This is certainly the case when it comes to startup time (an important thing in a scripting language). On my linux system, python2 starts in about 10ms, python3 takes 20-25 ms.
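
If anyone wants to reproduce that kind of number, here is a minimal sketch. It times only the interpreter that runs the script (via `sys.executable`) and includes process-spawn overhead, so treat the result as an upper bound; the helper name `startup_seconds` and the choice of five runs are mine. To compare two interpreters you would pass their paths instead.

```python
import subprocess
import sys
import time


def startup_seconds(interpreter, runs=5):
    """Best-of-N wall time to start an interpreter and exit immediately."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" runs an empty program, so almost all time is startup.
        subprocess.run([interpreter, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


elapsed = startup_seconds(sys.executable)
print(f"{sys.executable}: {elapsed * 1000:.1f} ms")
```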


I wish, I wish, I wish .. something like this is done for Tcl/Tk

I like Tcl the language a lot, and I love the idea of two language systems

One high level, for scripting: Tcl.
One low level, for high-performance commands and parts: Rust.

You can of course do that today, using Tcl and C. But... well, C is no Rust.


Why do you like Tcl? I worked at a place that used it for their general purpose web servers... and most of their programmers complained that it was like using Fortran for modern web development. That's the only experience I've had with Tcl professionally in the field.


>and most of their programmers complained that it was like using Fortran for modern web development

That's because they weren't capable of appreciating a language outside of fads and posts to social media.


I think you really have to like the idea of a two-language system to appreciate the potential of Tcl.

Have one team create complex tools using Rust, OCaml, Go, or C++.

And have another, more domain-oriented team create UIs, interfaces, and scripts using Tcl (a language that is simple enough, yet powerful enough).

The one-size-fits-all language doesn't exist. I think a two-language team is a very good option.

A (stretched) example is SQL and the DBMS. Expert programmers enrich the DBMS using C++ or whatever, and domain experts use SQL and procedural SQL to solve business problems.

Tcl as a universal declarative language is an idea I like.


> The one-size-fits-all language doesn't exist

Lisp?

(or perhaps lisp is an n language system)


I think any one-size-fits-all language will have to have optional typing, and be both compiled and interpreted.

I don't know of any language that has both and was successful.

Not sure if clojure specs, achieve the same outcome of optional typing, so maybe you have a point :)


> optional typing

The problem with that is that all the languages seem either to inherently be dynamic or static. Any attempt to add the other kind is hamstrung by the language's inherent tendencies, and it doesn't really work. Perl or python, for instance, have optional typing, but it's not checked at compile-time. And then there are things like C++ or D variants that, again, don't feel quite as dynamic as they would in a dynamic language. I don't think these features can truly coexist well in the same language.

> both compiled and interpreted

That's no hard ask. You just have to fight inertia, but there's really not much standing in the way of something like that.


> The problem with that is that all the languages seem either to inherently be dynamic or static. Any attempt to add the other kind is hamstrung by the language's inherent tendencies, and it doesn't really work. Perl or python, for instance, have optional typing, but it's not checked at compile-time.

Python optional typing is checked in an optional pre-compilation step. Except that there is no opportunity to use typing for optimization, this isn't meaningfully different, when used, from being checked as part of compilation. In fact, other than using type information for optimization it's pretty much what most compile time type checking does; compilation isn't an indivisible atomic step.
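
To make the "optional step" part concrete, a small sketch (the function `double` is invented for illustration): the interpreter records annotations but never enforces them, so a mistyped call only gets caught if you actually run an external checker such as mypy beforehand.

```python
def double(n: int) -> int:
    """Annotated to take and return an int."""
    return n * 2


# The interpreter stores the annotations but does not enforce them,
# so this call runs and repeats the string instead of raising.
result = double("ab")
print(result)                      # abab
print(double.__annotations__)      # the stored, unenforced annotations
```

A static checker run over the same file would flag `double("ab")` as passing a `str` where an `int` is expected, which is the pre-compilation check described above.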


Racket does have Typed Racket, if that counts?


Tcl is definitely not like Fortran. Everything is a string.

https://en.wikipedia.org/wiki/Tcl


Only if you are speaking of pre-Tcl 8.0.


In the early 2000s, I was part of a startup delivering an application framework for web development based on Tcl/C, similar to AOLServer, just with our own view of how it should actually look.

It was a memorable experience, which I still fondly remember.

However, it was also what made me not invest in languages without JIT/AOT thereafter, having had to dive into C all the time.

The relation of Tcl/C code changed quite heavily during the growth of the company, until we eventually rebooted our stack on top of the newly released .NET.

Something that we keep seeing on those "X rewritten in Y" over here.


Given all the great ideas here, maybe it’s time to fork Python. Ok, so maybe a close derivative as opposed to a true fork.

Given the language names involved (Rust & Python), I’d like to suggest “Copperhead” as a name for it.


Forking implies keeping the C code.

I think a rewrite in Rust is more future-proof: we benefit from a safer, more modern language to implement it, which comes with cargo, and hence the potential of a hybrid Python/Rust toolchain and dev platform.


copper doesn't rust (it does patina though)


How would your Python derivative differ from RustPython?


I stopped paying attention to the Rust parsing ecosystem for a while, curious how LALRPOP compares to nom/pest/combine and if something about python's grammar led to the choice.


Totally different use cases. nom/pest/combine are parser combinators, where you stitch the functions together yourself. LALRPOP is more in the vein of yacc where you specify the grammar and it generates the Rust parsing code for you in a build step.
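
For anyone unfamiliar with the "stitch the functions together yourself" style, here is a toy sketch. It is in Python rather than Rust to keep it short, and the helpers `char`, `either`, and `many1` are invented for this example; they are not the API of nom, combine, or any other crate.

```python
def char(c):
    """Parser that matches the single character `c`."""
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse


def either(*parsers):
    """Try each parser in order and return the first success."""
    def parse(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many1(parser):
    """Apply `parser` one or more times, collecting results in a list."""
    def parse(text, pos):
        items = []
        while (result := parser(text, pos)) is not None:
            value, pos = result
            items.append(value)
        return (items, pos) if items else None
    return parse


# The "grammar" is just ordinary function composition.
digit = either(*(char(d) for d in "0123456789"))
number = many1(digit)

print(number("42x", 0))   # (['4', '2'], 2)
print(number("x", 0))     # None
```

With a generator in the yacc vein, you would instead write a rule like "number is one or more digits" in a grammar file and have the parsing code produced for you in a build step.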


pest isn't a parser combinator. It also uses a grammar file.


Oh, wow! Thanks for the correction. I had somehow missed that. I'm working on a rust parser of Elixir myself, and using LALRPOP for it. But looks like I'll have to check out pest, too. LALRPOP has the advantage that it takes a BNF grammar, which the Elixir parser in erlang includes, so it's a somewhat straightforward translation. Not too sure how similar it is to a PEG grammar that pest takes.


LALRPOP is an LR parser, which is a very different formalism to PEG: it's easy to write a grammar in either that's difficult/impossible to express in the other.

If you'll permit the immodesty, another Rust parser is lrpar (https://crates.io/crates/lrpar) which is a more direct drop-in replacement for Yacc, but with better error recovery. [Note: I'm biased because I wrote parts of lrpar and the wider framework, grmtools, it's a part of.]


Yes. In particular, I like the potential for:

* using Rust's borrow-checking to develop new lightweight/shared-memory multiprocessing tools for Python (think "from multiprocessing import SharedMemoryPool") without having to mess with the GIL, so as to maintain compatibility with existing libraries;

* using Rust's type inference on Python code for applications in which type safety is highly desirable; and

* compiling Python code for speed, targeting all architectures and platforms supported by Rust (e.g., WebAssembly).


How does an interpreter written in a statically compiled language use the type system of said language after being compiled?


Not sure. Perhaps this project could eventually lead to compiled Python...?


I want to create Ramon’s Law (as a corollary of Atwood’s Law):

> Anything that can be Written in Rust, will Eventually be Written in Rust

Please go on, cite me, consider it «Attribution, Share-alike»


Maybe, "Anything that can be Written in Rust Will Eventually be Suggested, and then Celebrated with Great Fanfare on HN as the Obvious Future of Reality."

Atwood's Law was a prediction about the future that was based on something that had already happened (zillions of apps and libraries rewritten in JS). I'll quote your law when we have anything like comparable evidence that this is happening with Rust.


Fair enough.


Cool. Rust with a garbage collector :)


Curious what the general story on Rust <=> Python interop is like now (beyond writing a Python interpreter in Rust). I'd checked out rust-cpython a couple years ago, and I see there's now PyO3, but I just looked at some code samples and it seems like they're a long way away from the convenience that you get with CFFI + C code, boost::python, or even SWIG. Anyone have experience writing Python extensions in Rust or embedding a Python interpreter into a Rust program? Could you comment on how hard the process was and how robust the result is?


This is awesome. I hope more features from CPython are ported. It looks a little bare bones right now. Is there any chance it will gain full Python compatibility ?


Let's wait for it to support dictionaries first, before we start dreaming about full compatibility.

edit - it looks like it supports dictionaries with string keys, but not with integer keys.


How nice, even a wasm interpreter, that's dandy


Neat. Now keep a hash of the types/info for instructions at runtime, and when a set is called enough times and appears reasonably pure/simple, JIT it with cranelift (akin to a tracing JIT).
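
A sketch of just the bookkeeping half of that idea, with no actual code generation: count executions per (site, operand types) and mark a key as hot once it crosses a threshold. The class name, the `"add@line7"` site label, and the threshold of 3 are all assumptions made for the demo, and nothing here touches cranelift.

```python
from collections import Counter

HOT_THRESHOLD = 3  # assumed value, chosen only for the demo


class HotSpotProfile:
    """Counts how often each (site, operand types) pair is executed."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()
        self.hot = set()

    def record(self, site, *operands):
        """Count one execution of `site` with these operand types."""
        key = (site, tuple(type(op).__name__ for op in operands))
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:
            self.hot.add(key)  # a real VM would hand this to the JIT here
        return key


profile = HotSpotProfile()
for a, b in [(1, 2), (3, 4), (1.5, 2), (5, 6)]:
    profile.record("add@line7", a, b)

print(profile.hot)   # only the (int, int) variant crossed the threshold
```

Keying on operand types is what would let a compiler specialize the hot path, for example emitting a plain integer add for the `(int, int)` case and falling back to the interpreter for everything else.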


Is there anything about Rust that would make it hard to use existing python extensions that are written in C? I see the FFI docs, so it seems like it would be fine, but have no Rust experience.


Fwiw, nothing about rust (probably) but a whole lot about CPython's implementation. Extensions can and do poke around in object internals and will break if the internals don't work the way the extension expects them to (that is, the way CPython works). This has long been PyPy's big stumbling block (though it seems to have gotten better).


I don't get the wasm craze in the rust crowd. There is no popular language that cannot be compiled to wasm. It just feels sad to spend time there.


Rust is particularly suited to WASM due to the lack of GC and runtime (which tend to be quite large, and have to be downloaded in a web context), and due to it being able to achieve very high performance (which is the whole point of WASM). The only other languages that really compete here are C and C++, and they don't have the great, easily installable library ecosystem that Rust has, and they aren't very accessible to JavaScript developers.

Personally I'm more excited about other uses of Rust, but I can see why people are excited about Rust and WASM.


You're talking about web technologies, and needing to download a runtime is the problem?

C++ isn't accessible but Rust is? Only because they can't be bothered to learn: it's complicated, but it's really not that hard.

Plug: D can happily compile to wasm


Well yes, the need to download a new runtime for every app you run is a problem if that runtime is several megabytes.

C++ isn't accessible to web developers compared to Rust. It's not just that I can't be bothered to learn, it's that I'm scared of all the security vulnerabilities and memory corruption bugs I will write while I'm learning. And why put in that effort when I can learn Rust more easily, and get the ongoing benefits of my code being safe and reliable?

True, D also competes here. I should probably have put D on the list. Although my understanding is that a large part of the D ecosystem still relies on GC.


Most runtimes are fairly small, and most issues could be done away with by either the browser storing common runtimes (Search for x, then checking hashes etc.) or by compiling statically against the parts you actually use.

This is still not conclusive, as the runtimes will probably have to be significantly modified (at the ABI/System level, so around the edges) given that they will have to get memory from the browser etc. This leaves much room for WASM specific optimisation, especially given that the actual (let's say) garbage collector implementation is probably quite small compared to the code used to interface to it.

Writing C++ defensively (i.e. Do what the guidelines tell you, Preach Andrei and Bjarne etc), and using sanitizers cleans up a huge amount of C++ code.


> Writing C++ defensively (i.e. Do what the guidelines tell you, Preach Andrei and Bjarne etc), and using sanitizers cleans up a huge amount of C++ code.

Well sure, but the joy of Rust is that I don't have to worry about any of that. I can write my code naively, and the compiler will throw an error if I do anything stupid.

> This is still not conclusive, as the runtimes will probably have to be significantly modified (at the ABI/System level, so around the edges) given that they will have to get memory from the browser etc. This leaves much room for WASM specific optimisation

Certainly if/when this happens, other languages will be a lot more viable in the compile-to-wasm space. But you can run Rust (and C/C++) in the WASM runtime without issues today. And Rust even has a number of high-level libraries which provide bindings to JavaScript APIs (e.g. https://github.com/rustwasm/wasm-bindgen)


>they aren't very accessible to JavaScript developers

I wonder how Rust is more “accessible” to JS developers than C or even C++. If you find C too hard to comprehend, you’re definitely not ready for Rust...


As someone who learnt Rust after trying to learn C, I disagree. Basic C is easy. I could write a data structure. But as soon as you want to do anything useful, like parse command line arguments or use open gl, C becomes much more complicated.

Rust code is actually pretty similar to JavaScript code, in that I can pull in a library `cargo add regex`, and work with high level abstraction right away.

Of course, there are new concepts to learn, but the Rust book covers these pretty well (I've been unable to find similar documentation for C/C++ that doesn't run to hundreds of pages).

My observation is that many people learn C/C++ at university (where there is lots of support for learning the arcane folk knowledge of "the right way" of doing things in those languages), and subsequently find Rust hard, because it introduces new concepts, and doesn't work in the same vein as C/C++.

For those of us coming from higher level languages, Rust is much easier, because it provides guard rails and prompts us when we go wrong, and because once a few new concepts have been learnt, a lot of our existing concepts can still be applied.


I think the point is Rust is a very good language in its own right. The fact it compiles to WASM is a great feather in the cap.

They are able to deliver a demo in the browser for something that would normally require downloading and compiling. I think that’s pretty cool to show for a project people wouldn’t normally be able to try out with such low barrier of entry.


If this ended up making it possible to run Python in the web browser, that might be interesting to a fair number of people.

(There are advantages to running the same language on the server and client, and there's plenty of server-side web application Python code out there.)


There's already a client-side implementation of Python in JavaScript. See: https://brython.info/

But it might not give you the sort of compatibility between the client and server code that you're looking for.


You mean like repl.it or skulpt or ... actually there are a few different options with different features, but doing Python in the browser has been a thing for many years.


Running Python, client-side in the browser.


There are already various options, many using Emscripten.


>There are advantages to running the same language on the server and client

I see this said quite often. What are they?


If you have a traditional server-rendered web application and you want to add a little bit of scripting on the client side, it can be nice to be able to share pieces of domain-specific code (for example, you might want the client and server to have the same notion of what a well-formed stock code looks like).
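
As a sketch of what that shared piece might look like: one small module that both the server and the client import. The format rule below (one to five upper-case letters) is purely hypothetical, as is the function name.

```python
import re

# Hypothetical rule: one to five upper-case letters, e.g. "AAPL".
STOCK_CODE = re.compile(r"[A-Z]{1,5}")


def is_valid_stock_code(code):
    """Return True if `code` matches the (assumed) stock-code format."""
    return STOCK_CODE.fullmatch(code) is not None


print(is_valid_stock_code("AAPL"))      # True
print(is_valid_stock_code("aapl"))      # False
print(is_valid_stock_code("TOOLONG"))   # False
```

With two languages, this rule would exist twice and the two copies could drift apart.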

In more sophisticated systems, people sometimes like the client to "optimistically" do the same processing as it expects the server to do, so it can update its display more quickly.

Or if you have a mostly client-side application which builds up some fancy widget tree, sometimes people like to have the server do the same rendering as the client would so that it's there on initial page load, or so that search engines can see it.


You often have the same data types on the server and the client for data exchange. When both sides use the same language, you only write these types once and it’s impossible for the client and server implementations to differ.
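
A minimal sketch of that, assuming both sides can import the same Python module. The `Quote` type and its fields are invented for illustration, and JSON is just one possible wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical message type imported by both server and client."""
    symbol: str
    price_cents: int

    def to_json(self):
        """Serialize to the wire format."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload):
        """Rebuild the same type from the wire format."""
        return cls(**json.loads(payload))


wire = Quote("AAPL", 15025).to_json()   # what the server sends
received = Quote.from_json(wire)        # what the client reconstructs
print(received)
```

Because there is a single definition, adding or renaming a field changes both ends at once, as long as both are deployed from the same version of the module.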


To make sure the same bootcamp hacks can work on your backend and frontend, neither of which they fully understand.


Saving in developer salaries is the only tangible one. You could argue there is less context switching for developers between languages, but I’ve never actually heard that from someone who builds the apps.


Not having to learn 2 languages and duplicate efforts for starters...


I think it's less "the Rust crowd is excited about wasm" and more "there is a crowd which is excited about doing certain things with languages and software, and Rust and wasm are both exciting means for those ends." Perhaps a good approximation is, they're both tools for using high-level programming techniques with high performance for domains that were previously constrained in terms of what languages you could use.


Most of the popular languages are not making a serious effort to compile to WebAssembly, though. It's totally reasonable for Rust to focus on.


Does it have a GIL? If not, it's already better than CPython :-D


Another amateur project that will go nowhere. Look back at this project 5 years from now: nothing will run on it or be maintained.



