Hacker News
Oxidizing bmap-tools: rewriting a Python project in Rust (collabora.com)
70 points by glenngillen on March 4, 2023 | 33 comments



There's very little content to the article sadly, aside from links to the artefacts.

It also seems to have been done independently of the upstream, so it's not really an "oxidation" in the usual terms, more of a pseudo-fork of the specific `bmaptool copy` subcommand (though TBF it only has one other subcommand which is `create`, and the implementation in the upstream is about 1/3rd that of copy, so copy is clearly the "meat" of the project).


Wow I really like the terminology "oxidizing" for re-writing something in Rust.

Sorry for the unsubstantive comment.


> Usually a project is oxidised into Rust because of many reasons, the main usually being memory safety.

What about python’s memory model is unsafe?


The article felt kind of disjointed, I think that statement was just meant generally and not meant to suggest it applies here


Also how I interpreted it, though even there it's quite weird (e.g. better performance is also a common reason to convert things to Rust, especially when "easy binding" tools like pyo3, neon, or rustler are available and take care of the unsafe bits between the two).


On the other hand, performance isn't a unique thing Rust brings to the party, C or C++ both have attractive performance in the same ballpark. So if the question is "Why Rust specifically?" rather than e.g. "Couldn't I technically use an awk script?" then safety is a better argument than performance.

Rather than memory safety per se, I'd actually value Rust's type safety over Python's. Yes, strictly speaking Python is dynamically typed rather than untyped, but Rust will shift a lot of your mistakes "hard left" (i.e. report your goof during compilation, not during execution) compared to Python because it gets to do these checks up front, and it is also able to catch a lot more of them because it's much stricter about what types are.

In Python we can write "if foo" and it doesn't matter too much what type foo actually has, Python will try to decide if it's "truthy" at runtime. In Rust either foo is a boolean, in which case this is a reasonable thing to do, or it isn't, that's a type mismatch, your program doesn't compile. Not everyone loves this, but I certainly do.
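To make the `if foo` point concrete, here is a small Rust sketch (the function names are made up for illustration). Where Python accepts `if items:` or `if name:` and decides truthiness at runtime, Rust's `if` only takes a `bool`, so the check has to be spelled out; writing `if items { ... }` with a slice or `Vec` is rejected at compile time as a type mismatch.

```rust
// Hypothetical helpers illustrating explicit conditions in Rust.
// In Python, `if items:` and `if name:` rely on runtime truthiness.
// In Rust, `if items { ... }` on a slice does not compile: `if` needs a bool.

fn describe_items(items: &[u32]) -> &'static str {
    // The emptiness test must be written out explicitly.
    if !items.is_empty() {
        "has items"
    } else {
        "empty"
    }
}

fn greet(name: Option<&str>) -> String {
    // Python's `if name:` treats both None and "" as false;
    // in Rust each case is handled deliberately.
    match name {
        Some(n) if !n.is_empty() => format!("hello, {n}"),
        _ => String::from("hello, stranger"),
    }
}
```

The extra verbosity is the trade-off: the compiler forces you to say which notion of "false" you mean.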


Yeah… It’s becoming a pet peeve that the Rustafarians believe they have a monopoly on “memory safety” and need to point it out all the time.


It's worth pointing out because Rust has a monopoly on easy-to-write GC-less memory safety, with the alternatives being modern C++ or higher-level languages where you run into a garbage collector.


Even with garbage collection, check the fine print. Most of those languages aren't memory safe when multi-threaded, you can race a complicated data structure so that its state becomes inconsistent and now everything is on fire, so if the garbage collector comes along that just catches fire too.


Do you have any specific examples of this (or the fine print that implies it) for Rust? (Not sealioning[0], genuinely interested!)

[0] https://www.merriam-webster.com/words-at-play/sealioning-int....

Edit: Just realized you said "even with garbage collection", so you weren't talking about Rust. Never mind.


That's actually an entirely reasonable question, Rust's memory model is scarcely as well nailed down as the Java Memory Model (which is a GC model that does promise memory safety under concurrency and is worth reading about if you'd like to see how much thought is needed).

https://jcp.org/aboutJava/communityprocess/final/jsr133/inde...

(section 17.4 of the full specification)

However in a sense Rust is cheating. Java has to do a lot of work because they can't stop you writing a data race in Java, so they want that to be safe anyway if it happens. (Safe) Rust just won't emit the data race in the first place.

A data race requires (as mentioned below in the thread about Python concurrency) that there's mutation concurrently with access to the same memory. In Java that's a bad idea, but it does happen, usually by mistake. In (Safe) Rust mutation requires that nobody else has a shared (immutable) reference to the object, so the data race can't occur because if we can mutate something by definition nobody else has a reference to it - programs with this mistake won't compile.
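A minimal sketch of what that looks like in practice (the function name `parallel_count` is hypothetical): handing several threads a plain `&mut u64` would be rejected by the borrow checker, so shared mutable state has to go through something like `std::sync::Mutex`, which grants one thread exclusive access at a time.

```rust
use std::sync::Mutex;
use std::thread;

// Several threads bump a shared counter. Giving each thread a `&mut u64`
// would not compile (only one mutable borrow may exist at a time), so the
// value is wrapped in a Mutex and every mutation happens under the lock.
fn parallel_count(threads: usize, increments_per_thread: u64) -> u64 {
    let counter = Mutex::new(0u64);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..increments_per_thread {
                    // lock() returns a guard that gives exclusive &mut access.
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    }); // all scoped threads are joined before scope() returns
    counter.into_inner().unwrap()
}
```

Because the only path to `&mut` goes through the lock, the program with the unsynchronized version simply doesn't build, which is the "won't emit the data race in the first place" point above.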


Thank you!


Does Python do anything to enforce mutually exclusive access when mutating? If not, that’s a hole you could drive a truck through, isn’t it?


Isn't Python still run under a single global interpreter lock? Can't have simultaneous access while mutating if only one thing is running at a time...


For so long as the GIL persists, you are correct, and thus Python does not have data races and is able to achieve memory safety in this regard.

It is conceivable (but extremely unlikely, 'cos it was really, really hard) that after a GILectomy Python follows the Java path, in which data races are technically safe†. However it is most likely Python with a GILectomy will behave like Go or C# or numerous other languages and lose memory safety properties if a data race occurs.

† Data races can happen in Java, and astonishing things might happen, but objects always remain in some valid state, so there is no loss of memory safety whereas in most languages with data races you can e.g. race a hash table and mess up its internals and cause chaos.


Yeah, that’s something, at least. Wouldn’t the order that mutations happen still matter, even though they have to acquire a lock? Not a pythoneer, myself.


Matter? Sure, there can be race conditions.

But allow for memory unsafety? No, not if every ordering of the "critical sections" (chunks of code run as a unit while the interpreter is locked) is valid and upholds the invariants Python expects.


Not in the context of memory safety. You can still have race conditions up the ass, but not data races, unless you're using a native library which 1. releases the GIL and 2. is broken.
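The race-condition-versus-data-race distinction can be sketched with a Rust analogue (not Python code; each `lock()` here plays the role of a critical section run while the GIL is held, and all function names are hypothetical). Splitting a read-modify-write across two critical sections never produces a data race, yet updates can still be lost.

```rust
use std::sync::Mutex;
use std::thread;

// Two separate critical sections: every access is locked, so there is no
// data race, but another thread may write between the read and the write,
// and its update is then lost. This is a race condition.
fn split_increment(counter: &Mutex<u64>) {
    let current = *counter.lock().unwrap(); // critical section 1: read
    *counter.lock().unwrap() = current + 1; // critical section 2: write
}

// The whole update inside a single critical section keeps the invariant.
fn single_section_increment(counter: &Mutex<u64>) {
    *counter.lock().unwrap() += 1;
}

fn run(threads: usize, per_thread: u64, step: fn(&Mutex<u64>)) -> u64 {
    let counter = Mutex::new(0u64);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    step(&counter);
                }
            });
        }
    });
    counter.into_inner().unwrap()
}
```

`run(4, 10_000, single_section_increment)` always yields 40000; `run(4, 10_000, split_increment)` yields at most 40000 and often less, without ever corrupting memory.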


Interesting, I hadn’t realized how much the phrase “memory safety” understates what is desirable.


The thing is that races are good a lot of the time. If I have a set of tasks running in parallel that take an unknown/variable amount of time and I want to tell the users which ones are finished, my output needs to be based on a race between the tasks. If I'm scraping a website, I (may) want to have multiple connections going in parallel, and as soon as one of those connections spots a new link I (may) want to open a new connection to start scraping it, but I don't know which connection is going to spot a new link first, so there's a (benign) race condition.

Making a language that banned them outright would be making a language that couldn't do things that people wanted to do.
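A sketch of the "tell the user which tasks finished" case, assuming simulated work via `sleep` (the `run_tasks` helper is hypothetical): tasks report over a channel and the receiver handles them in whatever order they complete. The ordering is a deliberate, benign race; there is no data race because each message is moved through the channel.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Each task sleeps for a different time as a stand-in for real work, then
// sends its id. The receiver sees ids in completion order, which is
// nondeterministic by design.
fn run_tasks(durations_ms: &[u64]) -> Vec<usize> {
    let (tx, rx) = mpsc::channel();
    for (id, &ms) in durations_ms.iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(ms)); // simulated work
            tx.send(id).expect("receiver dropped");
        });
    }
    drop(tx); // drop our sender so the loop ends once every task has reported
    let mut finished = Vec::new();
    for id in rx {
        println!("task {id} finished");
        finished.push(id);
    }
    finished
}
```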


I figure that you could very easily mark which race conditions are good.


Which Rust-written Python tools are folks using?

I know of two big ones: ruff (linting) and pyflow (dependency management). The standard lib crypto module uses rust, too.

Are there other ones I should know about? Maybe replacements for mypy, pre-commit, tox/nox?



Does Pydantic use Rust? When I check the GitHub repo, it shows 100% Python.


The Rust re-write is being worked on in a separate repo: https://github.com/pydantic/pydantic-core


He's working on a rust rewrite, to be used in Pydantic 2.0


> The standard lib crypto module uses rust, too.

This couldn't matter less, but I think you're confusing it with the third-party Cryptography package, which uses Rust.


My bad, thanks for the correction!


I'm a huge fan of maturin, but that might be a bit meta. It's a Python build tool for wrapping Rust projects. Their GitHub page has a list of projects using it, and you might find more of what you are looking for there: https://github.com/PyO3/maturin#examples


I’m confused… you’re talking about avoiding a local copy of sparse regions… Linux already does that at the level of the inode. There’s also a seek operation to move past the next hole. Not sure why you would carry around metadata the filesystem is already tracking for you.


> Not sure why you would carry around metadata the filesystem is already tracking for you.

Because bmap files are independent of the filesystem and OS, and thus would probably like to work even with filesystems which don't support sparse files, and OS which don't expose holes?

For instance until NFS 4.2 in 2016 you could write sparse files to an NFS volume, but there was no way to detect holes when reading. exfat doesn't support sparse files at all. And according to their man pages, OpenBSD and NetBSD have yet to support SEEK_HOLE/SEEK_DATA (which are non-standard extensions of POSIX lseek(2)).

Plus according to its history the bmaptools project was created about a year after the release of kernel 3.1, which introduced support for SEEK_HOLE and SEEK_DATA. Doesn't take much of a leap to assume that the project's creator didn't consider that widespread enough to be reliable (Debian wouldn't release a 3.x-based version until the following year).


Seeking through holes also doesn't work very well for compressed images, since usually there is no way to tell apart an insignificant hole from a long sequence of zeroes or other filler data.


An example from my work:

We have a Yocto build that results in about 120MB worth of files that make up our app and Yocto. Originally we had a script that would write a bootloader, partition and format ext4 our target's eMMC, and decompress a 120MB tarball to that filesystem.

That worked well, but we wanted our script to become OS-independent, as our field team ran Windows laptops. It's quite difficult to get Windows to do an ext4 format, and I wanted our tool to have a minimal number of dependencies (e2fsprogs requirement? some proprietary thing from Paragon? no thanks)

So instead, we have Yocto produce an image containing the bootloader and all four pre-formatted ext4 filesystems. No operating system needs to do the format if the filesystem already exists within the image; it's just a raw block write. But now the image is 4GB, the size of our eMMC, and writing all of it would be painfully slow.

Thankfully Yocto also outputs a bmap file which maps the parts of that 4GB which are empty space -- blocks we don't need to write when commissioning our target device. So our commissioning tool was rewritten in Go, and I wrote a bmap implementation in Go to do the write. Flashing our target is as fast as it used to be, but now that tool can be easily made to work on multiple operating systems.
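The core of a bmap-guided write can be sketched in a few lines. This is a Rust illustration, not the commenter's Go tool or bmaptool's actual code, and it assumes: ranges have already been extracted from the bmap's `<Range>` elements as inclusive block numbers like `0-1` or `5`, the block size comes from the bmap, the image length is a multiple of that block size, and XML parsing and checksum verification are left out. `parse_range` and `copy_mapped` are hypothetical names.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Parse a range string such as "0-1" or "5" into an inclusive
/// (first_block, last_block) pair. Assumed format; no XML handling here.
fn parse_range(text: &str) -> Option<(u64, u64)> {
    let text = text.trim();
    match text.split_once('-') {
        Some((a, b)) => Some((a.trim().parse().ok()?, b.trim().parse().ok()?)),
        None => {
            let n = text.parse().ok()?;
            Some((n, n))
        }
    }
}

/// Copy only the mapped block ranges from `src` to `dst`. Unmapped blocks
/// are never read or written, which is where the time saving comes from.
/// Assumes every block is full size (image length is a block multiple).
fn copy_mapped<R: Read + Seek, W: Write + Seek>(
    src: &mut R,
    dst: &mut W,
    block_size: u64,
    ranges: &[(u64, u64)],
) -> io::Result<u64> {
    let mut buf = vec![0u8; block_size as usize];
    let mut written = 0;
    for &(first, last) in ranges {
        let offset = first * block_size;
        src.seek(SeekFrom::Start(offset))?;
        dst.seek(SeekFrom::Start(offset))?;
        for _ in first..=last {
            src.read_exact(&mut buf)?;
            dst.write_all(&buf)?;
            written += block_size;
        }
    }
    dst.flush()?;
    Ok(written)
}
```

Because the map is just data shipped alongside the image, the same logic works whether or not the host OS or filesystem can report holes, which is what makes it portable to Windows.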



