
Using Rust in Mercurial - oblio
https://www.mercurial-scm.org/wiki/OxidationPlan#
======
yeukhon
So my understanding is that the hg developers are planning to rewrite a large part
of hg in Rust. If so, this is an unfortunate blow to Python, because I often
hear Python developers (myself included) cite hg as one of the largest
Python-based applications (C extensions notwithstanding). I'd certainly be sad
if that is the case.

I studied hg in some depth for a semester as an undergraduate, when I was
implementing my own "bitbucket". To be really honest, the codebase was easy to
navigate, and function names were pretty consistent with the actual hg
commands and internal spec. Writing true unit tests for the code with mock is
probably hard (you'd have to monkeypatch like crazy), since the functions tend
to contain a lot of code, but overall the codebase quality was pretty good for
such complex software. I just had to learn the variable name abbreviations, get
used to them, and refer back to the Hg paper.

Anyhow, whatever the decision is, I'd learn a bit of Rust to help out :-)

P.S. I am still trying to find an answer to this: if FB uses Hg, then what
about their git code?

~~~
kibwen
_> If so, this is an unfortunate blow to Python_

As both a Rust and Python user, I don't necessarily agree. This isn't a
rewrite to purge Python; it's clear from the post that Mercurial values
Python's flexibility, and wants to bend over backwards to ensure as little of
that flexibility is lost in whatever transition may occur. And Python's
credentials would be secure with or without Mercurial being written in it.
Frankly, I think that swapping out critical bits for a low-level language is a
great way to scale a dynamic-language codebase without completely sacrificing
the usual ergonomics of dynamism.

~~~
yeukhon
Basically, Mercurial is migrating away from Python as its primary, core
programming language. You wouldn't say Git is written in Python even if Git
had some Python code (I actually don't know; I've never read Git's code),
would you?

The Python community agrees that, to address performance problems, Python
developers should write C extensions and/or compile code with Cython before
considering a language migration. You can no longer say hg is written in
Python; in any conversation you'd have to say it "was written in Python". This
migration shows that while Python's flexibility and dynamism are great,
especially for development velocity, Python is no longer hg's first-class
language of choice wherever possible. There is a limit to how much you can get
out of Python. That's a blow, a temporary crying moment. I recognize this is
cynical, or entirely an ego thing. To be clear, I am not criticizing the
decision to move to Rust, because I have equal respect for Rust, but this
announcement is nonetheless a sad moment: another Python project moving away.

Sometimes improvements to a programming language stem from the limitations
observed by a popular project (and equally by the large number of users).

P.S. On the other hand, moving the CPython development workflow from Hg to Git
was a blow to Hg.

~~~
wirrbel
> Basically, Mercurial is migrating away from Python as its primary and core
> programming language. You don't say Git is written in Python even if Git
> were to have a few Python code (I actually don't know, never read Git code),
> would you?

Not necessarily. It would show that you can prototype an application in Python
and, once you have a stable product, take a migration path to Rust for
optimization. That might be even more convincing than the prospect of
starting a new project in a non-GC language.

------
puddums
My initial reaction to the headline was that it sounded like (a) no more
Python, and (b) this is a decided future direction.

Instead, it sounds like this is a proof of concept for flipping the main 'hg'
command from being Python + C extensions to being a Rust binary with an
embedded Python interpreter. Part of the rationale appears to be performance,
but also smoothing out the cross-platform experience, especially on Windows.

Pulling out some related snippets:

\-----

 _While Python is still a critical component of Mercurial and will be for the
indefinite future, I'd like Mercurial to pivot away from being pictured as a
"Python application" and move towards being a "generic/system application." In
other words, Python is just an implementation detail._

\-----

 _Desired End State_

 _hg is a Rust binary that embeds and uses a Python interpreter when
appropriate (hg is a Python script today). Python code seamlessly calls out to
functionality implemented in Rust. Fully self-contained Mercurial
distributions are available (Python is an implementation detail / Mercurial
sufficiently independent from other Python presence on system)_

\-----

 _"Standalone Mercurial" is a generic term given to a distribution of
Mercurial that is standalone and has minimal dependencies on the host
(typically just the C runtime library). Instead, most of Mercurial's
dependencies are included in the distribution. This includes a Python
interpreter._

\-----

 _This patch should be considered early alpha and RFC quality._

~~~
bulldoa
What does embedding a Python interpreter mean? Are they actually writing a
Python interpreter in Rust so they can write Python code and compile it to Rust?

~~~
puddums
Python has the concepts of "extending" and "embedding". It looks like they
are looking at embedding[0], which lets you use the normal CPython
interpreter from within another program. (So no, they are not writing a new
Python interpreter in Rust.)

Sample snippet from the Python docs:

\-----

 _So if you are embedding Python, you are providing your own main program. One
of the things this main program has to do is initialize the Python
interpreter. At the very least, you have to call the function Py_Initialize().
There are optional calls to pass command line arguments to Python. Then later
you can call the interpreter from any part of the application._

 _There are several different ways to call the interpreter: you can pass a
string containing Python statements to PyRun_SimpleString(), <...etc..>_

\-----

[0]
[https://docs.python.org/3/extending/embedding.html](https://docs.python.org/3/extending/embedding.html)

~~~
puddums
If interested, you can see their work-in-progress main.rs in the related code
revision[0], which includes their Rust code calling down to the C function
Py_Initialize() to spin up the now-embedded CPython interpreter that is living
"inside" a Rust program:

    
    
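        unsafe {
            // Start the embedded CPython interpreter.
            Py_Initialize();
            // Expose the process arguments to Python as sys.argv.
            PySys_SetArgv(args.len() as c_int,
                          argv.as_ptr());
            // Create and acquire the GIL (needed before other threads
            // touch Python on Python 2 and on Python 3 before 3.7).
            PyEval_InitThreads();
            // Release the GIL, keeping this thread's state for later.
            let _thread_state = PyEval_SaveThread();
        }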
    

\----

[0] [https://phab.mercurial-scm.org/D1581#change-t24aVkGEJ5Xh](https://phab.mercurial-scm.org/D1581#change-t24aVkGEJ5Xh)

------
krschultz
Leaving aside the actual project at hand, this is a great example of a
well-thought-out project plan. There is a clear rationale, a clear end state, a
bunch of known problems to tackle, front-loading of risk, and incremental value
delivered along the way.

------
steveklabnik
I was wondering how serious this was. I don't know a lot about how Mercurial
is developed, but
[https://twitter.com/indygreg/status/937527180292014080](https://twitter.com/indygreg/status/937527180292014080)

[https://gregoryszorc.com/work.html](https://gregoryszorc.com/work.html)

> I am a significant contributor to the Mercurial open source version control
> system.

> I serve on the Mercurial Steering Committee, which is the governance group
> for the Mercurial Project. I also have reviewing privileges, allowing me to
> accept incoming patches for incorporation in the project.

So not sure, but it seems like at least one person on the team is into it?

~~~
ngoldbaum
This is a proposal not a plan. Right now it’s 100% vaporware.

~~~
steveklabnik
Yeah, totally; I was wondering if it's a proposal that came from the team
themselves, or some random person.

~~~
durin42
There's pretty broad consensus that we'd like to do Rust and not C for future
extension work, but the current plan is that a pure-Python hg will also be
something we support. The "rust binary that embeds Python" approach looks like
a straightforward win on Windows, where we have a native .exe that embeds
Python anyway. I'm not sure if that'll make sense on non-Windows, but we'll
see.

I've done some poking around with milksnake, which seems extremely promising
for writing native speedups.

~~~
tonfa
Do you know the current status of hg on PyPy? (I'd guess that, besides
portability, that was also a motivation for keeping the pure-Python bits
around.)

~~~
durin42
Last I knew it worked fine, but hg doesn't tend to run long enough for the JIT
to warm up enough to be an unambiguous win. Fijal complimented us on how good
our C was.

I think it does work if you run chg against a command server running on PyPy.

------
bla2
> The nice things we want to do in native code are complicated to implement in
> C because cross-platform C is hard. The standard library is inadequate
> compared to modern languages. While modern versions of C++ are nice, we
> still support Python 2.7 and thus need to build with MSVC 2008 on Windows.
> It doesn't have any of the nice features that modern versions of C++ have.
> Things like introducing a thread pool in our current C code would be hard.
> But with Rust, that support is in the standard library and "just works."
> Having Rust's standard library is a pretty compelling advantage over C/C++
> for any project, not just Mercurial.

Sounds like the main reason for rust is that Python has a weird dep on a fixed
MSVC version.
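The quoted proposal says a thread pool would be hard to add to their C code, while Rust's standard library support "just works." Strictly, std has no thread pool type; what it offers is threads, channels, and (since Rust 1.63) scoped threads. Below is a minimal sketch of fanning independent work out over a few threads with `std::thread::scope`. `checksum` is a hypothetical stand-in for real per-file work such as hashing, not anything from Mercurial.

```rust
use std::thread;

/// Hypothetical per-item work: a cheap rolling checksum over a byte buffer.
fn checksum(data: &[u8]) -> u64 {
    data.iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

/// Compute `checksum` for every buffer, spreading the buffers over
/// `n_threads` scoped threads. Results keep the input order.
fn parallel_checksums(buffers: &[Vec<u8>], n_threads: usize) -> Vec<u64> {
    let n_threads = n_threads.max(1);
    // Ceiling division so every buffer lands in some chunk; never 0.
    let chunk_size = ((buffers.len() + n_threads - 1) / n_threads).max(1);
    let mut results = vec![0u64; buffers.len()];
    thread::scope(|s| {
        // Pair each chunk of inputs with the matching chunk of outputs so
        // every thread writes to its own disjoint slice; no locks needed.
        for (inputs, outputs) in buffers
            .chunks(chunk_size)
            .zip(results.chunks_mut(chunk_size))
        {
            s.spawn(move || {
                for (buf, out) in inputs.iter().zip(outputs.iter_mut()) {
                    *out = checksum(buf);
                }
            });
        }
    }); // every spawned thread is joined before `scope` returns
    results
}
```

Because each thread gets a disjoint `chunks_mut` slice of the output, the compiler accepts the concurrent writes without a `Mutex`. Equivalent C would need the same partitioning, just without the compiler checking it.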

~~~
Niten
Which they mention they'd still have to implement workarounds for if they
adopt Rust, so I'm not sure I understand that selling point.

~~~
pornel
Switching to Rust removes a lot of papercuts from dealing with MSVC. These
things could be solved in many other ways, so it's not exactly _the_ reason,
just something you also get from adopting Rust.

For C programs Windows happens to be the odd one out for lots of things. It's
annoying with its unloved C compiler, missing unix-ish headers and tools,
string encoding pains, very different packaging story, etc.

So I think the motivation here is they'll solve just that one CRT problem now,
and will have fewer Windows problems to worry about later.

I've done that for pngquant. I can build things with Cargo rather than explain
to users that `make` may be `mingw32_make`. I can parse options with Rust's
getopts, rather than getting bugreports that Visual Studio can't find
`getopt.h`. These aren't hard problems. I could have solved all of them, but I
don't have to!

------
kyrra
For those that don't want to deal with the Python startup time, the Mercurial
team already has an attempt at fixing this with a tool called CHg[0]. It is a
C binary that interacts with the Mercurial CommandServer[1], which is just a
long-running version of the Mercurial CLI that you can interact with over a pipe.

Using Rust as the primary client would simplify this a lot, but is a lot more
work than what CHg accomplished.

[0] [https://www.mercurial-scm.org/wiki/CHg](https://www.mercurial-scm.org/wiki/CHg)

[1] [https://www.mercurial-scm.org/wiki/CommandServer](https://www.mercurial-scm.org/wiki/CommandServer)
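The CommandServer page describes the pipe protocol in more detail. Below is a minimal sketch of its framing in Rust. It assumes the layout as we understand it from that page and from python-hglib:

- Each server message is a 1-byte channel (`o` output, `e` error, `r` result, `d` debug; `I`/`L` are input requests), then a 4-byte big-endian length, then the data. Input requests carry no data.
- A `runcommand` request is the line `runcommand\n`, followed by a big-endian length and the NUL-separated arguments.

The function and type names are hypothetical, and nothing here spawns a real `hg serve --cmdserver pipe` process.

```rust
use std::io::{self, Read, Write};

#[derive(Debug, PartialEq)]
enum Message {
    /// Output-style channel ('o', 'e', 'r', 'd') with its payload.
    Data(u8, Vec<u8>),
    /// Input request ('I' or 'L'): the length is how much the server
    /// is willing to read; no payload follows.
    InputRequest(u8, u32),
}

/// Send a `runcommand` request with NUL-separated arguments.
fn write_runcommand<W: Write>(w: &mut W, args: &[&str]) -> io::Result<()> {
    let payload = args.join("\0");
    w.write_all(b"runcommand\n")?;
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload.as_bytes())
}

/// Read one framed message from the server.
fn read_message<R: Read>(r: &mut R) -> io::Result<Message> {
    let mut header = [0u8; 5];
    r.read_exact(&mut header)?;
    let channel = header[0];
    let len = u32::from_be_bytes([header[1], header[2], header[3], header[4]]);
    if channel == b'I' || channel == b'L' {
        return Ok(Message::InputRequest(channel, len));
    }
    let mut data = vec![0u8; len as usize];
    r.read_exact(&mut data)?;
    Ok(Message::Data(channel, data))
}

/// Decode an 'r' payload from `runcommand`: the exit code as a big-endian i32.
fn exit_code(data: &[u8]) -> Option<i32> {
    if data.len() != 4 {
        return None;
    }
    Some(i32::from_be_bytes([data[0], data[1], data[2], data[3]]))
}
```

A minimal client would send `runcommand`, then loop on `read_message`, forwarding `o`/`e` payloads to its own stdout/stderr until an `r` message arrives.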

~~~
ngrilly
This is already explained in the linked article.

~~~
kyrra
I noticed this after the fact. Thanks for pointing it out.

I had previously looked into understanding how mercurial worked and what
options there were for different frontends or backends. (And what it would
take to write my own backend just to understand a very basic flow of a clone
or pull). It's a lot of work for sure, and the Mercurial team (at least mpm a
few years back) preferred one true implementation.

------
ruke
This reminds me of Facebook's experiment on building a Mercurial server in Rust:
[https://github.com/facebookexperimental/mononoke](https://github.com/facebookexperimental/mononoke)

And yes... it's for extension modules (maybe just the beginning).

------
agentgt
Perhaps childish and trite but I do like the use of the word “oxidation” for
porting to Rust

~~~
nnethercote
There is precedent!
[https://wiki.mozilla.org/Oxidation](https://wiki.mozilla.org/Oxidation)

------
johnny_1010
"Rewrite everything in Rust, exactly the same way but in Rust it solve all
problem." - some Rust programmer

------
bedros
how would they deal with hg extensions written in Python?

~~~
masklinn
By embedding[0] the interpreter and loading the extensions in there.

[0]
[https://docs.python.org/3/extending/embedding.html](https://docs.python.org/3/extending/embedding.html)

------
the_mitsuhiko
It’s not being rewritten in Rust. They want to use rust instead of C for
extension modules.

~~~
jxcl
One of us misread the article. I understood it to say that they want to
rewrite the core of the Python code to Rust so that they don't have to wait
for the Python interpreter to start every time someone uses the hg command.
The Rust binary will be able to call Python code, which is basically a
complete reversal of what they have now.

~~~
the_mitsuhiko
> One of us misread the article. I understood it to say that they want to
> rewrite the core of the Python code to Rust so that they don't have to wait
> for the Python interpreter to start every time someone uses the hg command.

It would still use the Python interpreter internally but they want to skip it
for some functionality like the basic command line interface. So they would
embed the Python interpreter in their new distribution. However the Python
code still just calls out to other Rust code.
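Below is a minimal sketch of that split. It assumes a Rust front end that answers a small, hypothetical set of commands itself and forwards everything else to the embedded interpreter. `route`, `Route`, and `NATIVE_COMMANDS` are illustrative names, not Mercurial code, and the Python side is represented only by an enum variant.

```rust
/// Where a given `hg` invocation would be handled.
#[derive(Debug, PartialEq)]
enum Route {
    /// Handled entirely in Rust; the Python interpreter is never started.
    Native,
    /// Forwarded to the embedded Python interpreter (not shown here).
    Python,
}

/// Hypothetical set of commands that get a Rust fast path.
const NATIVE_COMMANDS: &[&str] = &["version", "root"];

/// Pick a route from the command-line arguments (program name excluded).
fn route(args: &[String]) -> Route {
    // Simplification: treat the first argument not starting with '-' as the
    // command name. Global options that take values (e.g. `-R repo`) would
    // need real option parsing.
    match args.iter().find(|a| !a.starts_with('-')) {
        Some(cmd) if NATIVE_COMMANDS.contains(&cmd.as_str()) => Route::Native,
        _ => Route::Python,
    }
}
```

Defaulting to Python is the conservative choice here. Extensions and config aliases can wrap or replace commands, so a real front end would have to rule those out before taking the native path.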

~~~
merb
> However the Python code still just calls out to other Rust code.

This will probably be hard. Calling back and forth is always harder than
calling in a single direction.
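Calling back the other way means the Rust side has to accept a function pointer plus an opaque context pointer from its caller and invoke it mid-loop. Below is a minimal sketch of that shape, with hypothetical names. In the Mercurial case the callback would be Python code exposed through something like cffi or ctypes, which also brings GIL handling into the picture. None of that is shown here.

```rust
use std::os::raw::{c_int, c_void};

/// Callback type the foreign side supplies. `ctx` is the caller's own
/// state, passed through untouched; a non-zero return asks us to stop.
pub type ItemCallback = extern "C" fn(ctx: *mut c_void, index: usize, value: u64) -> c_int;

/// Rust-side loop that calls back into foreign code once per item.
/// Returns how many items were delivered before the callback asked to stop.
/// Safety: `items` must point to `len` valid u64 values (or `len` is 0).
pub unsafe extern "C" fn for_each_item(
    items: *const u64,
    len: usize,
    cb: ItemCallback,
    ctx: *mut c_void,
) -> usize {
    if len == 0 {
        return 0;
    }
    let items = unsafe { std::slice::from_raw_parts(items, len) };
    for (i, &v) in items.iter().enumerate() {
        // Control leaves Rust here and may come back through other entry points.
        if cb(ctx, i, v) != 0 {
            return i + 1;
        }
    }
    len
}
```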

------
gcb0
I really wonder why not Go.

The original code was Python with lots of C. The same could be done with Go,
while keeping much of the same philosophy as Python.

This change will probably alienate most of the contributors, since Rust and
Python or C are worlds apart.

~~~
seba_dos1
I would wonder why Go if Rust is available.

Rust isn't hard. The only somewhat hard thing in Rust is working with the
borrow checker, and honestly, if you want to seriously write C, you really need
to go through that experience and understand it well enough to feel somewhat
comfortable with it.

And even if that were really a problem, giving up compiler checks in order to
allow more low-quality code to be contributed doesn't seem like a good
trade-off to me.

~~~
Rusky
Rust is certainly harder than Go. There's no need to dance around the issue,
that doesn't help anyone.

Further, Mercurial is already 100% on the train of "giving up compiler
checks," though "low quality code" is hardly a fair characterization of why.

~~~
vvanders
In isolation Go is probably easier to write than Rust. However in terms of
embedding/interop Rust is certainly much easier.

Last I looked there wasn't a way to pin a pointer in Go and they explicitly
forbid passing around opaque handles so you're already constrained in how you
can interop.
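For contrast, in Rust an opaque handle is just an owned heap pointer handed across the C ABI. Below is a minimal sketch with a hypothetical `Counter` type and hypothetical function names. A real shared library would also mark the functions `#[no_mangle]` so that C, ctypes, or cffi can find them by symbol name.

```rust
use std::os::raw::c_void;

/// Hypothetical state that the foreign caller treats as opaque.
struct Counter {
    total: u64,
}

/// Allocate a Counter and hand ownership to the caller as an opaque pointer.
pub extern "C" fn counter_new() -> *mut c_void {
    Box::into_raw(Box::new(Counter { total: 0 })) as *mut c_void
}

/// Add `n` to the counter behind `handle` and return the new total.
/// Safety: `handle` must come from `counter_new` and not yet be freed.
pub unsafe extern "C" fn counter_add(handle: *mut c_void, n: u64) -> u64 {
    let counter = unsafe { &mut *(handle as *mut Counter) };
    counter.total += n;
    counter.total
}

/// Take ownership back and drop the Counter. Passing null is a no-op.
/// Safety: `handle` must be null or come from `counter_new`, freed only once.
pub unsafe extern "C" fn counter_free(handle: *mut c_void) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle as *mut Counter) });
    }
}
```

The `Box` keeps the `Counter` at a fixed address until `counter_free` reclaims it, so the handle stays valid across calls. What the compiler cannot check is the caller's promise not to use the handle after freeing it.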

Second, you now have a whole other runtime to hoist up instead of a simple C
FFI. I hit this just recently with Python+Rust: I needed to optimize an inner
loop, so I just dropped down to a single Rust fn, wrote it, and was on my way.
No need to spin up a whole runtime just for that inner function.
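Below is a sketch of that "single Rust fn" approach: a function with a C signature takes a pointer and length from the caller (for example, a buffer passed in through ctypes or cffi) and runs the inner loop over it. `sum_squares` is a hypothetical example. Exporting it from a `cdylib` would also need `#[no_mangle]`, and the Python-side binding is not shown.

```rust
use std::slice;

/// Sum of squares over `len` doubles starting at `ptr`.
/// Safety: `ptr` must point to `len` valid, initialized f64 values;
/// it may be null only when `len` is 0.
pub unsafe extern "C" fn sum_squares(ptr: *const f64, len: usize) -> f64 {
    if len == 0 {
        return 0.0;
    }
    // View the caller's buffer as a Rust slice (no copy), then run an
    // ordinary iterator loop over it.
    let values = unsafe { slice::from_raw_parts(ptr, len) };
    values.iter().map(|v| v * v).sum()
}
```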

The parent also hit the nail on the head. If you're going to be writing C, you
already need to understand lifetimes deeply, Rust just codifies that in a way
that lets you catch it at compile time.

~~~
wott
> _If you're going to be writing C you already need to understand lifetimes
> deeply,_

Deeply? Unlike more complicated languages (GC languages, for example), C only
has 2 lifetimes: block scope for stack variables, and forever (until
explicitly free'd) for heap malloc'ed memory.

You don't have to twist your head to know in which of N possible ways the data
you get from a function was allocated, you don't have to twist your head to
know if you need to free it manually, or semi-manually, or hope that it gets
freed automatically at some point.

You don't have to care about stack variables, you just have to be organised
about heap variables, and that's it. No complicated concept, not many
different paradigms. You malloc() something to get some memory when you need
it, you free() it when you don't need it any more.

Also you don't have to twist your head to know whether you can access some
function argument, whether it was passed by copy, by reference, or by some
third means, whether that involves heavy data copying, what it means for
access, etc. Everything is passed by value (copy), and you can only pass simple
objects (plus structures); for anything else you pass the value of the address
of the object, end of story.

~~~
Rusky
> C only has 2 lifetimes: block scope for stack variables, forever (until
> explicitly free'd) for heap malloc'ed memory.

To the contrary, C has just as many lifetimes as Rust. They're just not
explicit. "free() it when you don't need it any more" is a complex problem and
C doesn't really help you solve it at all.
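One way to see the point: a C struct holding a `const char *` into someone else's buffer has a lifetime constraint that exists only in the programmer's head. Below is a minimal sketch of the Rust equivalent, with hypothetical names, where the `'a` parameter writes that constraint down.

```rust
/// A view of one field inside a larger buffer, like a `const char *` plus
/// a length in C. The `'a` parameter records that it borrows from `buf`.
struct Field<'a> {
    bytes: &'a [u8],
}

/// Return the first `sep`-delimited field of `buf` without copying.
fn first_field<'a>(buf: &'a [u8], sep: u8) -> Field<'a> {
    let end = buf.iter().position(|&b| b == sep).unwrap_or(buf.len());
    Field { bytes: &buf[..end] }
}
```

Suppose the buffer behind a `Field` were dropped while the `Field` was still in use. That program is rejected at compile time, whereas the C version would compile and leave a dangling pointer.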

> Everything is passed by value (copy) and you can only pass simple objects
> (plus structures), for anything else you pass the value of the address of
> the object, end of story.

That's how Rust works as well.
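Below is a small sketch of what that looks like, with hypothetical function names:

- Arrays of `Copy` types are copied when passed by value.
- `&mut` passes an address, much like a C pointer.
- The one real difference: passing a non-`Copy` value such as `String` by value moves it, and the caller can't use it afterwards.

```rust
/// Takes the array by value: `[u8; 4]` is Copy, so the callee works on its
/// own copy and the caller's array is unchanged.
fn zeroed_copy(mut a: [u8; 4]) -> [u8; 4] {
    a[0] = 0;
    a
}

/// Takes the address of the array (like `uint8_t *` in C); the write goes
/// to the caller's storage.
fn zero_in_place(a: &mut [u8; 4]) {
    a[0] = 0;
}

/// Takes a String by value: ownership moves into the callee, and the
/// compiler rejects any later use of the caller's variable.
fn consume(s: String) -> usize {
    s.len()
}
```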

