Debian Running on Rust Coreutils (ledru.info)
492 points by todsacerdoti 9 months ago | 330 comments

There's a great opportunity to not just reimplement the GNU coreutils but to actually rethink them. The way I install a Linux system these days and then immediately download ripgrep or fd reminds me of how the first thing I used to do on a Solaris system was install bash and the GNU coreutils. There's a rich history of doing this - BSD Unix started as a bunch of utilities and patches to the AT&T version, it only became a freestanding distro with its own kernel later on. And the GNU coreutils go way back before the Linux kernel - you just installed them on a commercial Unix to make it more usable. In both cases they went beyond just drop-in replacements and started adding lots of new features and flags. Things have been pretty static for a long time so it's good to see some innovations coming.

The GNU coreutils are the lowest common denominator of most GNU systems, and thus a lot of software targets them including the GNU extensions.

Replacements like rg or fd are often not even compatible with the Unix originals, let alone the GNU extensions. Yes, some of the GNU tools are badly in need of UI improvements, but you'd have to abandon compatibility with a great deal of scripts.

I think both a coreutils rewrite and end-user-facing software like rg have their place on modern Unix systems. I'm a very happy user of rg! But I'd like some more respect for tradition from some of those tools. For example, "fd" in a Unix setting refers to file descriptors. They should rename, IMO.

Another thing to mention: compatibility isn't only a matter of software (scripts), which to me is easy to maintain (just keep the old software alongside the new compatible one and you are done).

The most important thing is compatibility with humans. The main reason I tend to use a pretty standard setup is that I know these tools are the standard: when I need to ssh into a server to resolve some issue, I have a system that I'm familiar with.

If, for example, I had fancy tools on my system, I would stop using the classic UNIX tools, and then every time I had to connect to another system (quite often) I would type the wrong commands, because the tools there are different or absent entirely, or I'd have to install the fancy tools on that system (which is not always possible; on embedded devices with 8 MB of flash, anything more than busybox is out of the question).

To me the GNU tools have the same problem: they get you used to non-POSIX behaviors (like putting flags after positional arguments) that hit you when you have to do some work on systems without them. And yes, there still exist systems without the GNU tools: macOS, for example, or embedded devices, some old UNIX server that is still running, etc.
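A concrete illustration of the flags-after-operands point (this assumes GNU coreutils; a strictly POSIX `ls` would treat the trailing flag as a filename):

```shell
# GNU getopt permutes arguments, so a flag after an operand still parses:
mkdir -p /tmp/demo
ls -ld /tmp/demo     # works everywhere
ls /tmp/demo -ld     # works with GNU ls; a strict POSIX ls would
                     # look for a file literally named "-ld"
```

Muscle memory built on the second form is exactly what bites you on BusyBox or the BSDs.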

Last thing, if we need to look at the future... better change everything. Let's be honest: if I could choose to switch entirely to a new shell and tools, and magically have it installed on every system that I use, it would probably be PowerShell. For the simple reason that we are no longer in the '80s where everything was a stream of text, and a shell that can manipulate complex objects is fantastic.

> Last thing, if we need to look at the future... better change everything.

Absolutely. We had decades to work with a fairly stable set of tools and they are not going anywhere. Whoever needs them, they are there and likely will be for several more decades.

I am gradually migrating all my everyday DevOps workflows (I am a senior programmer and being good with tooling is practically mandatory for my work) to various Rust tools: ls => exa, find => fd, grep => rg, and more and more are joining each month. I am very happy about it! They are usually faster and work more predictably and (AFAICT) have no hidden surprises depending on the OS you use them in.

> we are no longer in the '80s where everything was a stream of text, and a shell that can manipulate complex objects is fantastic.

Absolutely (again). We need a modern cross-platform shell that does that. There are a few interesting projects out there, but so far it seems the community is unwilling to adopt them.

I am personally also guilty of this: I like zsh fine even though the scripting language is veeeeery far from what I'd want to use in a shell.

Not sure how that particular innovation will explode, but IMO something has to give at one point. Pretending everything is text and wasting trillions of CPU hours constantly parsing stuff is just irresponsible in so many ways (ecological included).

I sometimes think that the original Unix shell, when run on a PDP-11, was a paragon of waste, with CPU being so slow, and RAM so scarce. It was like running Ruby on an ESP 32. Still it was too useful to ignore.

I suspect that the endless parsing may even be economical compared to the complex dance needed e.g. to call a Python method. The former is at least cache-friendly and can more easily be pipelined.

Though it's OK to be slow for a tool which sees mostly interactive use, plus use as scripting glue. Flexibility and ergonomics trump considerations of computational efficiency. So I expect that the next shell will burn more cycles in a typical script. But it will tax the human less with the need to inventively apply grep, cut, head, tail, etc., with their options and escaping/quoting rules.

> I sometimes think that the original Unix shell, when run on a PDP-11, was a paragon of waste, with CPU being so slow, and RAM so scarce. It was like running Ruby on an ESP 32. Still it was too useful to ignore.

Sure, but it was a different time. People there were down for anything that actually worked and improved the situation. Like the first assembler written in machine code, then the first C compiler being written in assembly, etc. People needed to bootstrap the stack somehow.

Nowadays we have dozens, maybe even thousands of potential entry points, yet we stubbornly hold on to the same inefficient old stuff. How many of us REALLY need complete POSIX compliance for our everyday work? Yeah, the DevOps team might need that. But do most devs need that? Hell no. So why isn't everyone trying stuff like the `nu` or `oil` shells, etc.? They are actually a pleasure to work with. Answer: network effects, of course. Is that the final verdict? "Whatever worked in the 70s shall be used forever, with all of its imperfections, no matter how unproductive it makes the dev teams".

Is that the best humanity can do? I think not... yet we are scarcely moving forward.

> I suspect that the endless parsing may even be economical compared to the complex dance needed e.g. to call a Python method. The former is at least cache-friendly and can more easily be pipelined.

50/50. You do have a point, but a lot of modern tools written in Rust demonstrate how awfully inefficient some of these old tools are. `fd` in particular is many times faster than `find`. And those aren't even CPU-bound operations; just parallel I/O.
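For reference, a rough side-by-side of the two tools (fd flag names as of recent versions; note the comparison isn't exactly apples to apples, because fd filters by default):

```shell
# Find Rust source files under the current directory.

# find: single-threaded tree walk, no default filtering:
find . -type f -name '*.rs'

# fd: parallel traversal; respects .gitignore and skips hidden files
# by default. Disable that filtering to match find's behaviour:
fd --no-ignore --hidden --type f -e rs
```

Part of fd's speed on typical repos comes from that default filtering, and part from walking the tree on multiple threads.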

Another example closer to yours: I know people who replaced Python with OCaml and are extremely happy with their choice. Neither language has any (big) pretense of doing parallel work very well, so nothing much is lost by migrating away from Python [for various quick scripting needs]. OCaml however is strongly typed and MUCH MORE TERSE than Python, plus it compiles lightning-fast and runs faster than Golang (but a bit slower than Rust, although not by much).

> Though it's OK to be slow for a tool which sees mostly interactive use, plus use as scripting glue.

Maybe I am an idealist, but I say: why not be both fast and interactive? `fzf` is a good demonstration that the two can coexist.

> Flexibility and ergonomics trump the considerations of computational efficiency.

Agreed! The way I see it nowadays, though, many tools are BOTH computationally inefficient AND unergonomic.

But there are also reverse examples, like Git: very computationally efficient, but still a huge hole of WTFs for most devs (me included).

Many would argue that the modern Rust tools are computationally less efficient because they spawn at least N threads (N == CPU threads) but the productivity gain earned from that more aggressive use of machine resources is IMO worth it (another close example is the edit -> save -> recompile -> test -> edit... development cycle; the faster that is, the bigger the chances that the dev will follow their train of thought until the end and will get the job done quicker).



So, to sum up my view:

- We can do better

- We already are doing better but the new tools remain niche

- Old network effects are too strong and we must shake them off somehow

- We are holding on to old paradigms for reasons that scarcely have anything to do with the programming job itself

> For the simple reason that we are no longer in the '80s where everything was a stream of text, and a shell that can manipulate complex objects is fantastic.

Stream of bytes is the only sane thing to do. Nothing keeps you from having a flag in your cli programs to choose the input/output format. In fact many programs already have this and json seems pretty popular.

Having standard serialization is just gonna be boilerplate and unnecessary for many programs. Letting the user choose how to interpret the input/output is the best way.
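Some widely deployed tools already work this way: iproute2 and util-linux, for example, grew JSON output flags that pair naturally with jq (assuming reasonably recent versions of those tools are installed):

```shell
# iproute2's ip has -j/--json for machine-readable output:
ip -j addr show lo | jq -r '.[0].addr_info[0].local'   # typically 127.0.0.1

# util-linux's lsblk can do the same:
lsblk --json | jq -r '.blockdevices[].name'
```

The default output stays human-oriented text; the structured form is strictly opt-in.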

> Having standard serialization is just gonna be boilerplate and unnecessary for many programs.

And for a lot of programs having to keep generating and parsing strings is just a bunch of unnecessary boilerplate.

> Stream of bytes is the only sane thing to do. Nothing keeps you from having a flag in your cli programs to choose the input/output format. In fact many programs already have this and json seems pretty popular.

Having an API like this is a great idea... as a thin wrapper on top of a tool with a standardized serde protocol or binary file format. The different API's can be exposed as separate tools or as separate parts of the API of a single tool from a user's POV.

Furthermore: JSON is not just a stream of bytes nor text, neither is CSV, nor any other text format. Handling text as properly typed objects makes a lot of sense.

I have little experience with shells that can work with objects, like PowerShell. But I've seen screenshots of a new object-based shell developed in Rust some time ago; it was early days still, so I didn't actually try it out, but it looked downright great compared to the current stream-of-text-based shells in terms of ergonomics and capabilities.

Not being compatible is often the point of tools like ripgrep.

If a Rust rewrite of Coreutils is not backward compatible, it is not a replacement and the current GNU Coreutils would still be needed for backward compatibility.

There is a long history of newer, better tools supporting backward compatibility to be a replacement for their predecessors:

- bash and zsh support backward compatibility to be replacements for sh.

- vim has compatibility mode to be a replacement for vi.

If the Rust implementation is not backward compatible, it should not be called "Coreutils".

The Rust rewrite of Coreutils (this post, essentially) _is_ meant to be a drop-in replacement for GNU Coreutils. OTOH, ripgrep and fd are not "drop-in replacements". Rather, they are "replacements" that devs can use if they want to.

It's not the point of ripgrep. If ripgrep were only marginally better at common grepping tasks then it could hardly justify not being compatible. It aims to be much better though, so it can justify it well. But not being compatible is not the point or the goal of the project, I believe.

Yes, this is correct in some sense. I didn't set out to create an "incompatible grep." I set out to build a "better" grep, where "better" could include things like being incompatible.

Of course, ripgrep is still ultimately a lot more similar to grep than it is different. Where possible, I used the same flag names and kept similar behavior, if it made sense, because it's good to lean on existing experience. It makes it easier for folks to migrate.

So it's often a balancing act. But yes, POSIX compatibility is certainly a non-goal of ripgrep.

That said, thanks for making ripgrep so good. I've been using it hundreds of times a day for 3-4 years now... (and IIRC, it's the default search engine in VS Code as well).

Not having to specify "-rHInE" on every invocation is one killer reason to use ripgrep, over grep. It's a concrete advantage of you not having 100% compatibility as a goal.

(The other killer reason is the blazing speed.)
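For anyone wondering what that flag soup expands to (these are standard GNU grep flags; the `rg` line just shows its defaults covering the same ground):

```shell
# GNU grep needs the behaviour spelled out on every invocation:
#   -r recurse into directories     -H print the file name
#   -I skip binary files            -n print line numbers
#   -E use extended (ERE) regex syntax
grep -rHInE 'fn [a-z_]+' src/

# ripgrep does all of that by default (plus .gitignore filtering):
rg 'fn [a-z_]+' src/
```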

If that is the only incompatibility, it would be easy to make a patch that checks whether the binary was invoked as "grep" and defaults to "not -rHInE", so one could ship ripgrep as the default and still have backwards compatibility. BusyBox already does something like that, IIRC.

EDIT: So I've quickly looked into it and it seems nobody did an extensive comparison to the grep feature set or the POSIX specification. If I have some time later this week I might do this and check whether something like this would be viable.

It's not even close to the only incompatibility. :-) That's a nominal one. If that were really the only thing, then sure, I would provide a way to make ripgrep work in a POSIX compatible manner.

There are lots of incompatibilities. The regex engine itself is probably one of the hardest to fix. The incompatibilities range from surface level syntactic differences all the way down to how match regions themselves are determined, or even the feature sets themselves (BREs for example allow the use of backreferences).

Then of course there's locale support. ripgrep takes a more "modern" approach: it ignores locale support and instead just provides what level 1 of UTS#18 specifies. (Unicode aware case insensitive matches, Unicode aware character classes, lots of Unicode properties available via \p{..}, and so on.)

Pity! I did look; only "-E" and "-s" diverge from the POSIX standard parameter-wise. But making significant changes to the pattern engine is probably not worth it.

Thanks anyway, I enjoy rg quite a lot :)

It's worth noting that the implementation of ripgrep has been split up into a whole bunch of modular components. So it wouldn't be out of the question for someone to piece those together into a GNU-compatible grep implementation.

True, though is there any point? ripgrep's homegrown regex engine only supports true regular expressions.

To give backreference support, ripgrep can optionally use PCRE. But PCRE already comes with its own drop in grep replacement...

To the extent that you want to get a POSIX compatible regex engine working with ripgrep, you could patch it to use a POSIX compatible regex engine. The simplest way might be to implement the requisite interfaces by using, say, the regex engine that gets shipped with libc. This might end up being quite slow, but it is very doable.

But still, that only solves the incompatibilities with the regex engine. There are many others. The extent to which ripgrep is compatible with POSIX grep is that I used flag names similar to GNU grep where I could. I have never taken a fine-toothed comb over the POSIX grep spec and tried to emulate the parts that I thought were reasonable. Since POSIX wasn't and won't be a part of ripgrep's core design, it's likely there are many other things that are incompatible.

A POSIX grep can theoretically be built with a pretty small amount of code. Check out busybox's grep implementation, for example.

While building a POSIX grep in Rust sounds like fun, I do think you'd have a difficult time with adoption. GNU grep isn't a great source of critical CVEs, it works pretty well as-is and is actively maintained. So there just isn't a lot of reason to. Driving adoption is much easier when you can offer something new to users, and in order to do that, you either need to break with POSIX or make it completely opt-in. (I do think a good reason to build a POSIX grep in Rust is if you want to provide a complete user-land for an OS in Rust, perhaps if only for development purposes.)

Well, the reasons I see for being POSIX-compatible would be:

1. Distributions could adopt rg as default and ship with it only, adding features at nearly no cost

2. The performance advantage over "traditional" grep

Number 1 is basically how bash became the default; since it is a superset of sh (or close enough at least), distributions could offer the feature set at no disadvantage. Shipping it by default would allow scripts on that distribution to take advantage of rg and, arguably, improve the situation for most users at no cost.

If you build two programs into one with a switch, you're effectively shipping optional software in a single binary, which makes point 1 pretty moot. If you then also fall back on another engine, point 2 is moot as well. So the only case where this would actually be useful is if rg could become a good-enough superset of grep to provide sufficient advantages (most greps _already_ provide a larger superset of POSIX, though). Everything else would just add unnecessary complexity, in my opinion.

But it would have been nice :)

Ah I see. Yeah, that's a good point. But it's a very, very steep hill to climb. In theory it would be nice, though. There's just a ton of work to do to hit POSIX compatibility and simultaneously be as good as GNU grep at other things. For example, the simplest way to get the regex engine to be POSIX compatible would be to use an existing POSIX compatible regex engine, like the one found in libc. But that regex engine is generally regarded as quite slow AIUI, which is presumably why GNU grep bundles its own entire regex engine just to speed things up in a lot of common cases. So to climb this hill, you'd either need to follow in GNU grep's footsteps _or_ build a faster POSIX compatible regex engine. Either way, you're committing yourself to writing a regex engine.

I didn't look closely, but Oniguruma is pretty dang fast and has drop-in POSIX syntax + ABI compatibility as a compile-time option. Could maybe use that.

The regex engine I maintain includes benchmarks against onig. It's been a couple years since I looked closely, but last I checked, onig was not particularly fast. Compare https://github.com/rust-lang/regex/blob/master/bench/log/07/... vs https://github.com/rust-lang/regex/blob/master/bench/log/07/...

Ahh, very interesting, thanks for sharing! Do you have any thoughts on why that is? I presume it's due to Oniguruma supporting a much broader feature set, and something like fancy-regex's approach of mixing a backtracking VM with an NFA implementation for simple queries would be needed for better perf? (I am aware you played a role in that) [1]

I have been playing around with regex parsing by building parsers through parser combinators at runtime recently. No clue how it will perform in practice yet (structuring parser generators at runtime is challenging in general in low-level languages), but maybe that could pan out and lead to an interesting way to support broader sets of regex syntaxes, like POSIX, in a relatively straightforward and performant way.

[1] https://github.com/fancy-regex/fancy-regex#theory

No idea. I've never done an analysis of onig. Different "feature sets" tends to be what people jump to first, but it's rarely correct in my experience. For example, PCRE2 has a large feature set, but it is quite fast. Especially its JIT.

The regex crate does a lot of literal optimizations to speed up searches. More than most regex engines in my experience.

You can solve `-rHInE` with just an alias though.

Then you go back to incompatibility.

Aliases are only active in your interactive shell.

Exactly: if grep behaves differently in your scripts and your shell anyway, there's no point in not just using ripgrep (or ag, the Silver Searcher, or whatever strikes your fancy) directly.

True, but then you have the occasional headscratcher that the command behaves differently in your script than when you run it manually.
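A quick demonstration of that gotcha (assuming a bash-like interactive shell; ripgrep itself isn't needed to see the effect):

```shell
# In your interactive shell, the alias takes effect:
alias grep='rg'
type grep          # reports the alias

# But a script runs in a fresh non-interactive shell, which never reads
# your aliases, so it gets the real grep binary:
cat > /tmp/find-grep.sh <<'EOF'
#!/bin/sh
command -v grep    # prints the path to the real binary, not the alias
EOF
sh /tmp/find-grep.sh
```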

I would love it if --crlf was enabled by default with the Windows version. This would make using ^ and $ easier.

Make a config file for ripgrep and enable it there.
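For reference, ripgrep's config mechanism is an environment variable pointing at a flag file (this is documented ripgrep behaviour; the path here is just an example):

```shell
# One flag per line; lines starting with '#' are comments.
cat > "$HOME/.ripgreprc" <<'EOF'
# Treat CRLF as a line terminator so ^ and $ match as expected on
# files with Windows line endings.
--crlf
EOF

# ripgrep only reads the file if this variable is set:
export RIPGREP_CONFIG_PATH="$HOME/.ripgreprc"
```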


Slightly unrelated, but ripgrep is probably the best improvement to my day-to-day workflow in years. Thanks so much for making such a great tool!

tl;dr: ripgrep immediately strikes me as holding incredible medium-term potential, routed via the needs particular to OpenVMS; particularly complementing the dramatically improved economics of the x64 port, and the profound potential to attract highly experienced critical-systems developers to FOSS via Rust breathing air into the VMS world. Oh, and to burntsushi: I'm not worthy, dude. Excellent, and thank you for ripgrep, which I have only just encountered!

edit: within 30s of learning about ripgrep I learned that VS Code's search calls ripgrep, and this has just gifted me leverage to introduce Rust at my company... we're a VMS shop. Searching for Rust and VMS gets you the delightful statement that since Rust isn't an international standard, there's no guarantee it'll be around in... oh, about right now... The real reason any language doesn't last long on VMS is if the FFI implementation is shoddy. But that's another story that turns into a book the moment I try explaining how much value VMS FFI brings... my brain suddenly just now thought "slingshot the critical-systems devs out of VMS on a Rust hook and things will happen".

(longer explicative comment in profile - please do check it out if you'd like to know why this really made my day and some)

Thanks for the kind words. I don't know much about openVMS. I read your profile bio. I'm not sure I quite grok all the details, but I would like to caution you: ripgrep is still, roughly speaking, a "grep." That is, it exhaustively searches through corpora. While it has some "smart filtering," it's not going to do much better than other tools when you're searching corpora that don't fit into memory.

I do have plans to add an indexing mode to ripgrep[1], but it will probably be years before that lands. (I have a newborn at home, and my free-time coding has almost completely stopped.)

[1] - https://github.com/BurntSushi/ripgrep/issues/1497

Hi! Thanks for your reply, and notes gratefully taken! In fact it's avoiding touching memory that's important for VMS clusters, since the model is shared storage but nothing else. The language support hopefully leans towards implementing an initial subset as a proof of concept. There's a serious problem going forward for the VMS filesystem, with a major adoption and migration from ODS5 recently abandoned. Where my thinking is going is towards extending RMS, the record management system, which was extended to become what's now Oracle RDB. RDB still uses RMS and behaves cooperatively with fs tools and semantics. Without more concentrated contemplation I'm stretching to venture there's a text search tool in the making, but I can say that it is going to be carefully investigated, and I imagine quite quickly, because of the potential. Halving licence costs per CPU and going from 8 to 28 cores, with features separately very expensive in Oracle iXX.X, is behind a lot of new commitment. Because everything is obedient to the system TM, that'll require integration, but it would let you run the ripgrep function on the raw database from low-memory clients ~ so my tentative elevator pitch is going..

> you'd have to abandon compatibility with a great deal of scripts.

Having tried to reproducibly package various software, the C layer (and especially the coreutils layer) is the absolute worst, and I wouldn't shed a tear if we started afresh with something more holistically designed.

Debian has been building GNU coreutils and tons more C reproducibly for years.

And it takes a tremendous amount of effort, discipline, expertise, and deep, intimate familiarity with the entire dependency tree. This is the whole problem. We should be able to build software without needing to be intimately familiar with every package in the tree and how it wants us to build it, what its implicit dependencies are, etc.

For example, for the overwhelming majority of Rust and Go projects, there are explicit dependency trees in which all nodes are built roughly the same way such that a builder can do "cargo build" and get a binary. No need to understand each node's bespoke, cobbled-together build system or to figure out which undocumented dependencies are causing your build to fail (or where those dependencies need to be installed on the system or how to resolve conflicts with other parts of the dependency tree).

I'm a DD, my C packages (I'm also upstream) are built reproducibly. I have absolutely no idea how it's done.

To be clear, that means I have devoted zero effort to it, and certainly it didn't require discipline, I have absolutely no expertise in the process, and my knowledge of the dependency tree is "gcc -o x x.c" does the job.

You can rely on the maintainers of the Linux distribution to do the work for reproducibility. While also providing security updates. And vetting packages for licensing issues. That's what distributions are for.

You can for the packages they support, and at the specific versions that they support, but if you have other packages you're on your own. I certainly appreciate the package maintainers' toil, but it would be better for everyone if they weren't doing work that would be easy for a program to do.

Absolute worst compared to what? Have you tried packaging software on Windows?

If you have something better in mind, please implement it. It’d get used.

"Worst" compared to packaging software in other languages.

> If you have something better in mind, please implement it. It’d get used.

Nonsense. It's not a dearth of better options that causes C folks to clutch their autotools and bash scripts and punt altogether on dependency management; we've had better options for decades. Change will happen, but it will take decades, as individual packages become less popular in favor of newer alternatives with better features. Build hygiene will improve because these newer projects are much more likely to be built in Rust or "disciplined C" by developers who are increasingly of a younger generation, less wed to the Old Ways of building software.

> It's not a dearth of better options that causes C folks to clutch their autotools and bash scripts and punt altogether on dependency management; we've had better options for decades.

I build and package golang/Python/rust/C/C++ binaries using GNU make, bash, and Debian's tooling. I have dependency management, parallel builds, and things are reproducible. I do it that way, because I need a glue that will deploy to VMs, containers, bare metal, whatever. I don't use it because I'm scared of other tooling. I use it because I haven't seen a better environment for getting software I work on, into heterogeneous environments.

I'm not attached to the old way; I'm attached to ways that I can be productive with.

My point is that reproducible builds in Go and Rust are much, much easier than C and C++ (and Python since everything depends on C and C++). If your C and C++ programs are building easily (including their direct and transitive dependencies), then you're almost certainly not doing truly reproducible builds (or perhaps you're just very, very familiar with your particular dependency tree in a way that doesn't generalize to arbitrary trees).

To be clear, I'm not arguing that there are better tools for working around C/C++'s build ecosystem; I'm arguing that our lives will be better when we minimize our dependencies on those ecosystems.

Look I loathe autotools as a developer but it has the advantage of being extremely battle-tested and comes with lots of tooling to package it out of the box. In RPM spec files any autotools project can be packaged with more or less a single macro.

fd is better IMO, plus it is more cross-platform. Same flags on Linux and Mac

fd doesn't do the same thing at all. fd matches against filenames, rg matches against file contents.

I think both approaches are interesting and valuable. Yes ripgrep is great for interactive usage, but in scripts I'll still be using grep for the foreseeable future.

I think the situation with GNU coreutils on Solaris/BSDs is a bit different: a lot of the time the BSD/Solaris tools on one hand and the GNU tools on the other were incompatible in some ways. Some flags would be different (GNU ls vs. FreeBSD ls, for instance, have pretty significant flag differences), and then you have things like Make, whose flavours have pretty profound syntax differences.

As a result if you needed to write portable scripts you either had to go through a painful amount of testing and special casing for the various dialects, or you'd just mandate that people installed the GNU version of the tools on the system and use that. That's why you can pretty much assume that any BSD system in the wild these days has at the very least gmake and bash installed, just because it's used by third party packages.

So IMO people used GNU on non-GNU systems mainly for portability or because they came from GNU-world and they were used to them.

I know that there's a common idea that GNU tools were more fully featured or ergonomic than BSD equivalents, but I'm not entirely sold on that. I think it's mostly because people these days learn the GNU tools first, and they're likely to notice when a BSD equivalent doesn't support a particular feature while they're unlikely to notice the opposite because they simply won't use BSD-isms.

For instance, for a long time I was frustrated with GNU tar because it wouldn't automatically figure out which compression algorithm was used (bz, gz or xz in general) and automatically Do The Right Thing. FreeBSD tar, however, did it just fine.

Similarly I can get FreeBSD ls to do pretty much everything GNU ls can do, but you have to use different flags sometimes. If you don't take the time to learn the BSDisms you'll just think that the program is more limited or doesn't support all the features of the GNU version.

Another example off the top of my head is "fstat", which I find massively more usable than lsof, which is a lot clunkier with worse defaults IMO. It's mainly that lsof mixes disk and network resources by default, while fstat is dedicated to local resources, and you use other programs like netstat to monitor network resources. Since I rarely want to monitor both at the same time when I'm debugging something, I find that it makes more sense to split those apart.

You're right, there were a lot of compatibility problems in the past because tools reused the same names but didn't behave the same. Sometimes they were completely different, like ps: the BSD and System V versions are totally incompatible. There were attempts to unify some tools with something new, for example pax (which was supposed to replace the incompatible tar and cpio, but which basically nobody uses). These days most tools are getting new names (ripgrep, exa) where in the past you'd just call them the same thing. I think that decision has freed people from having to maintain backwards compatibility with the GNU tools, which weren't always backwards compatible with the tools of the same name that they replaced, as you point out. Once you lose the requirement to be compatible, you can actually rethink how the tool should be used and not worry about a decision that was made by someone in the 70s.

Ah yeah I forgot about `ps`, that's definitely one of the worst offenders.

I completely agree with your overall point btw, I just don't think it's an either/or type of situation. Rewriting the existing standard utils in Rust could provide some benefits, but it's also great to have utils like ripgrep that break compatibility for better ergonomics.

One of the standard things coreutils does right that many other implementations do wrong: after running a command with a filename argument, hit up, space, dash, some option you want to add, enter. coreutils handles the option you added. Many other implementations either complain or treat the option as another file.

That was the original feature that made me want coreutils on other systems.

That's a matter of taste. Argument permutation is evil, IMHO. It's also dangerous. If someone can't be bothered to order their arguments, they also can't be bothered to use the "--" option terminator, which means permutation is a correctness and security headache.

But it's the behavior on GNU systems, and it's even the behavior of most applications using getopt_long on other non-GNU, non-Linux systems (because getopt_long permuted by default, and getopt_long is a de facto standard now). So it should be supported.

If you're writing a script, perhaps.

I'm talking about interactive command-line usage, for which the ability to put an argument at the end provides more user-friendliness.

But the command doesn't know that, and in general best practice is (or was) for commands to not alter their behavior based on whether they're attached to a terminal or not.

I won't deny the convenience. (Technically jumping backwards across arguments in the line editor is trivial, but I admit I keep forgetting the command sequence.) But from a software programming standpoint, the benefit isn't worth the cost, IMO.

And there are more costs than meet the eye. Have you ever tried to implement argument permutation? You can throw together a compliant getopt or getopt_long in surprisingly few lines of code.[1] Toss in argument permutation and the complexity explodes, both in SLoC and asymptotic runtime cost (though you can trade the latter for the former to some extent).

[1] Example: https://github.com/wahern/lunix/blob/master/src/unix-getopt....

> But the command doesn't know that, and in general best practice is (or was) for commands to not alter their behavior based on whether they're attached to a terminal or not.

I completely agree; most commands should behave the same on the command-line and in scripts, because many scripts will start out of command-line experimentation. That's one of the good and bad things about shell scripting.

> Have you ever tried to implement argument permutation? You can throw together a compliant getopt or getopt_long in surprisingly few lines of code. Toss in argument permutation and the complexity explodes, both in SLoC and asymptotic runtime cost (though you can trade the latter for the former to some extent).

"surprisingly few lines of code" doesn't seem like a critical property for a library that needs implementing once and can then be reused many times. "No more complexity than necessary to implement the required features" seems like a more useful property.

I've used many command-line processors in various languages, all of which have supported passing flags after arguments. There are many libraries available for this. I don't think anyone should reimplement command-line processing in the course of building a normal command-line tool.

I personally don't think permutation (in the style of getopt and getopt_long, at least in their default mode) is the right approach. Don't rearrange the command line to look like all the arguments come first. Just parse the command line and process everything wherever it is. You can either parse it into a separate structure, or make two passes over the arguments; neither one is going to add substantial cost to a command-line tool.

So, this is only painful for someone who needs to reimplement a fully compatible implementation of getopt or getopt_long. And there are enough of those out there that it should be possible to reuse one of the existing ones rather than writing a new one.
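
The "process everything wherever it is" approach above can be sketched in a few lines of Rust. This is a hypothetical illustration of the idea (not any particular library's API): flags are recognized wherever they appear, nothing is rearranged, and `--` ends option processing:

```rust
use std::env;

/// Split raw arguments into flags and operands, accepting flags
/// anywhere on the line and honoring the `--` terminator.
fn scan(args: &[String]) -> (Vec<String>, Vec<String>) {
    let mut flags = Vec::new();
    let mut operands = Vec::new();
    let mut no_more_flags = false;
    for arg in args {
        if no_more_flags {
            operands.push(arg.clone());
        } else if arg.as_str() == "--" {
            no_more_flags = true; // everything after this is an operand
        } else if arg.starts_with('-') && arg.len() > 1 {
            flags.push(arg.clone());
        } else {
            // a bare "-" conventionally means stdin, so treat it as an operand
            operands.push(arg.clone());
        }
    }
    (flags, operands)
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    let (flags, operands) = scan(&args);
    println!("flags: {:?}\noperands: {:?}", flags, operands);
}
```

The operands keep their relative order, and the code that later acts on them never needs to know where the flags sat on the command line.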

> and in general best practice is (or was) for commands to not alter their behavior based on whether they're attached to a terminal or not.

Not so sure about that. ls has been changing its output format based on whether it is being used interactively or not for as long as I can remember at least. Both GNU and BSD versions.

Fair distinction. There's a kind of unwritten understanding of what programs should and shouldn't do based on isatty, and I've never seen it explicitly documented.

Things many programs do if attached to a TTY: add color, add progress bars and similar uses of erase-and-redisplay, add/modify whitespace characters for readability, refuse to print raw binary, etc.

Things some programs do, which can be problematic: prompt interactively when they're otherwise non-interactive.

Things no program does or should do: change command-line processing, semantic behavior, or similar.

> Things no program does or should do: change command-line processing, semantic behavior, or similar.

Arguably ripgrep breaks this rule. :-) Compare `echo foo | rg foo` and `rg foo`. The former will search stdin. The latter will search the current working directory.

In any case, I bring this up, because I've heard from folks that ripgrep changing its output format is "bad practice" and that it should "follow standard Unix conventions and not change the output format." And that's when I bring up `ls`. :-)

This is one of the things that I loved when starting to use rg :)

Technically it's based on whether the output is a tty or piped/redirected into something, not whether it's run from the shell's prompt or a script.

So for instance if you run a bare `ls` from a script that outputs straight into the terminal you'll get the multi-column "human readable" output. Conversely if you type `ls | cat` in the shell you'll get the single column output.

It can definitely be surprising if you don't know about it but technically it behaves the same in scripts and interactive environments.
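
A minimal sketch of that convention, assuming Rust's `std::io::IsTerminal` (stable since Rust 1.70): the tty check selects only the presentation, and the behavior is otherwise identical in scripts and at the prompt:

```rust
use std::io::{self, IsTerminal};

/// Columnar-ish output for a terminal, one entry per line otherwise,
/// mirroring ls's convention. The decision is a pure, testable function.
fn render(entries: &[&str], to_terminal: bool) -> String {
    if to_terminal {
        entries.join("  ")
    } else {
        entries.join("\n")
    }
}

fn main() {
    let entries = ["a.txt", "b.txt", "c.txt"];
    // Only the output format depends on the tty check; argument
    // handling and exit status are unaffected.
    println!("{}", render(&entries, io::stdout().is_terminal()));
}
```

Piping the program's output (`prog | cat`) flips `is_terminal()` to false, which is exactly why `ls | cat` gives single-column output even when typed interactively.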

That's exactly what I meant. I used "interactively" to mean "attached to a tty." Look at what I was responding to:

"for commands to not alter their behavior based on whether they're attached to a terminal or not"

ls is a clear counter-example of that.

I think the behavior is a good thing. I'm pushing back against this notion of what is "best practice" or not. It's more nuanced than "doesn't change its output format."

> BSD system in the wild these days has at the very least gmake and bash installed, just because it's used by third party packages.

No. Gmake, maybe. But not bash.

The goal of modernizing coreutils is great, and doing it in rust is even better. However, it makes me very sad that this is licensed under the MIT licence.

Being licensed under the GPL is an essential part of the GNU project and their philosophy. Of course everyone is free to do as they please. However, IMHO, if one appreciates the GNU project and the ideals it stands for, then, maybe, it could be preferable not to rewrite parts of it under weaker licences that go directly against their mission!

Your argument seems to imply that GNU was the progenitor of the utilities. A quick sample of the first 10 utilities from the first column of http://maizure.org/projects/decoded-gnu-coreutils/ on my machine showed 80% of the utilities were copied from existing sh utilities. Sure, GNU has added features, but there is nothing unseemly about creating new utilities using a different licence that are largely compatible - that is exactly how GNU coreutils came to life in the 90s.

arch - GNU

chgrp - Descended from chgrp introduced in Version 6 UNIX (1975)

comm - Descended from comm in Version 2 UNIX (1972)

dd - Descended from dd introduced in Version 5 UNIX (1974)

du - Descended from du introduced in Version 1 UNIX (1971)

factor - Descended from factor introduced in Version 4 UNIX (1973)

head - Descended from head included in System V (1985)

join - Descended from join introduced in Version 7 UNIX (1979)

ls - Spiritually linked with LISTF from CTSS (1963)

mktemp - GNU

Where does their argument imply that GNU was the progenitor of those utilities?

I think he/she commented on the wrong parent comment, it makes a lot more sense about the current top comment

>Being licensed under the GPL is an essential part of the GNU project and their philosophy.

Having "won the desktop from Windows" and "finished a next-gen OS (Hurd)" was also part of that plan, but it didn't really pan out, did it?

I'm glad for MIT software since I can use it in more places than I can use GPL (which companies often won't touch, in its later license versions).

Do the places you work forbid the use of Linux or the GNU Coreutils?

In what context are you using Rust Coreutils? Windows? MacOS? BSD?

>Do the places you work forbid the use of Linux or the GNU Coreutils?

They allow them. I had cases where clients forbade some GPL code (banks, etc) - worse with GPL3.

They'll stop doing that if most code is GPLv3. If most code is GPLv3, we're all better off – but while people are still prepared to work around GPL-rejecting companies, those companies can still reject the GPL.

It's the tragedy of the Commons.

The GPL's usage is falling, and even Linux is GPLv2 and rejects GPLv3. Some Linux code is dual-licensed GPLv2 and MIT.

ApacheV2, MPL, MIT, ISC and BSD licenses are gaining in popularity.

>They'll stop doing that if most code is GPLv3

No, they'll just find a MIT/BSD or probably proprietary alternative.

From a legal perspective, what is the advantage in a proprietary alternative?

E.g. they don't get to sue you for using it without open-sourcing your own software that links to it, since you can buy a license allowing you to do just that.

That's also why some GPL software is dual licensed. GPL for the masses, and a proprietary license allowing you to do whatever without needing to follow the GPL if you can afford it.

Ah! You mean a special kind of proprietary license for them. That makes sense.

I thought you meant they would just use copyrighted code that wasn't under a GPL (which would be just as illegal and probably more dangerous in terms of enforcement).

Do they forbid using the code, or incorporating the code into their own works? I hear of places forbidding the use of GPL code (even in-house) where they use proprietary software like Microsoft Word with no worries.

Because Microsoft doesn't demand you open source your plugins. GNU software might force you to.

Corporations choosing to not use GPL software mostly don't know enough about it, to know it's ok. So they ban it outright.

> GNU software might force you to.

No, they might not. Just like Microsoft, they might demand that you don't release your plugins. Additionally, GNU software gives you the alternative of releasing your plugins. But you don't have to, and they can't make you do it.

Yeah I really agree here. Especially when it comes to coreutil implementations. I've seen multiple implemented under MIT/BSD and it always makes me a little sad.

I have access to an incredibly high quality easy to use router firmware in the form of OpenWrt by virtue of GPLd coreutils. I wish FOSS developers weren't so afraid of it these days

Agreed. It would also be good of this Rust project to state that all of the utils are original works and no one has peeked at the C sources. Perhaps the maintainers of the C versions should do the same in reverse.

https://github.com/uutils/coreutils/search?q=gnu "This is the way it was implemented in GNU split." I dunno, maybe there are cases where they copy more than behavior... but it's interesting to look through.

There are more examples, here is one:

  // a few global constants as used in the GNU implementation

It's pretty easy to ask someone "Hey, can you go take a look at GNU split and see how they handle this case?" and then implement your own clean-room solution afterwards.

Alternately, to solve the problem yourself and mention it to someone and have them say "that's how GNU split does it, so why not?"

Isn't that, by definition, not a clean-room solution?

‘Clean-room reverse engineering’ refers to the practice of having one person look at disassembled source code, spec sheets, actual behaviour, etc. of some program; describe that behaviour to a second person; and having the second person implement the described behaviour.


Ah, that's interesting! I didn't know that was a sufficient insulation of IP.

Yeah, seems pretty easy then, right? Sounds like no source that is available for eyes to see is safe from 'clean-room reverse engineering'.

> Perhaps the maintainers of the C versions should do the same in reverse.

Nitpick: they don't need to, they can just attribute the Rust developers if needed. They can also just take MIT code and publish it under GPL (reverse is not true).

Taking MIT code and re-licensing it under the GPL will cause a huge uproar.


I'm not sure that is even legal.

I didn't mean re-licensing it - the parts that are under MIT stay under MIT, but the patches on top of it have a different license. That is completely legal and the license doesn't forbid it. (IANAL, yadda yadda...)

A similar thing happened to OpenOffice / LibreOffice: [0]

> OpenOffice uses the Apache License, whereas LibreOffice uses a dual LGPLv3/Mozilla Public license.

> For some legal reasons, then, anything OpenOffice does can be incorporated into LibreOffice, the terms of the license permit that. But if LibreOffice adds something, take font embedding, for example, OpenOffice can’t legally incorporate that code.

[0] https://hackaday.com/2020/11/02/openoffice-or-libreoffice-a-...

It's doable if you can get OK from every single copyright owner of the source files.

LLVM did it, switching from "University of Illinois/NCSA" to "Apache 2.0".

But not without some big discussions.

Richard Stallman and others who believe the GPL is generally the best license are not against ever using the MIT license. Stallman has been willing to be pragmatic and support tactical choices of other licenses besides the GPL.

So it is worth having some deeper thoughts about what the implications are for different licenses of coreutils. When does using coreutils create a derivative work that requires GPL?

Tactical considerations exist, but even so, a copyleft license might have better long term results in promoting software Freedom. The FSF supported a non-copyleft license for Ogg Vorbis, because at the time it seemed reasonable that this was the best way to fight the (at the time) patent-encumbered MP3 format. But in practice it had little benefit:

"However, we must recognize that this strategy did not succeed for Ogg Vorbis. Even after changing the copyright license to permit easy inclusion of that library code in proprietary applications, proprietary developers generally did not include it. The sacrifice made in the choice of license ultimately won us little."[0]

[0] https://www.gnu.org/licenses/license-recommendations.html

I don't know when that was written, but vorbis is used pervasively today, including in proprietary applications.

I think this is more of a reflection of mainstream OSS culture in general. Rust development currently is heavily tied to proprietary services (e.g. GitHub required for contributions, issues, crates.io login, CI, etc.; Discord and Slack communities rather than Matrix or Zulip) and likes to license things under MIT. I think the awareness of, and care for, the big picture of fighting for software freedom is simply not there, just as it is missing from mainstream OSS culture. Because convenience and network effect are king.

You do have a point but in case you are saddened by this phenomenon, let me just point out that we live in a VERY different age compared to when GNU coreutils were born. Nowadays you only get a few minutes -- or hours if you are lucky -- to answer a ton of fundamental questions like "where do we host code?" and "how do we communicate on this project?" or "how do we do CI/CD?" etc.

The people in the past had all the time in the world to tinker and invent. Maybe I am mistaken though, past is usually looked through rose-tinted glasses right?

But the fact remains: nowadays answering the above questions is beyond my pay grade: in fact it's beyond anyone's pay grade. Services like GitHub are deemed a commodity and questioning that status quo is a career danger.

I really do wish we start over on most of the items you enumerated. But I am not paid to do it. In fact I am paid to quickly select tools and never invent any -- except when they solve a pressing business need and are specific enough for the organization; in that case it's not only okay but a requirement.

Beyond anything else however, we practically have no choice. If I don't host a new company project on GitHub I'll eventually be fired and replaced with somebody who will.

I would also add onto your point and say that one reason why Rust is easy to get into is the convenience that these semi-proprietary platforms provide.

We here on HN have both the time and ability to set up things like CLI Git, and Matrix. But for a new language, forcing people onto esoteric (& superior) platforms makes them less likely to use them.

It would be nice if Matrix and self-hosted Git were the default, but when acquiring users/programmers is your goal, Rust doesn't have that luxury.

Agreed. Maybe they will tighten things up in the future but when you are after adoption and getting as much help as you can, it's indeed a luxury to be morally idealistic.

That's a little harsh.

Rust uses Github, but could easily switch to a self-hosted platform if Microsoft became opposed to Rust's goals; (and yes, Microsoft is a Rust sponsor, but not an essential one). Cargo has support for alternate registries built in.

Some of the community is on Reddit and Discord, but most technical discussion takes place on Discourse and Zulip. The official "user questions" forum is a Discourse instance. Most subcommunities forming around Rust projects use Zulip instances.

The Rust community uses proprietary services when convenient, but it's hardly dependent on them.

> easily switch

Disagree, just look at bors and the use of Azure CI. It would be a huge PIA to switch.

> hardly dependent on them

I wouldn't say hardly, I think it's more like kinda. GitHub's network effect is pretty strong. Compare the number of contributors to golang for example which is hosted on Google code.

> it could be preferable not to rewrite parts of it under weaker licences that go directly against their mission

+1. It's very sad to see user freedom being thrown away as a goal and replaced by software that becomes unpaid labor for FAANGs. Especially since the SaaS takeover.

GPL does not protect against SaaS provider creating derivative as there is no further distribution of binary or source code. Only AGPL (and other more strict licenses) addresses SaaS derivatives issue.

In the case of coreutils, there is really no issue with presence or absence of GPL. cp is just cp and nobody is going to hang a proprietary extension off the side.

I think people should be able to license under whatever they choose to license under. Diversity is good for the most part.

Off topic, but this might be of interest.


Wow, this is terrific work! It would have saved me hours of work if I had known about it before! Thanks for sharing!

Wow, that's great

I just wish someone would RIIR Gnu make.

I use it heavily and think it's extremely underappreciated, so instead of reinventing it, I would like to build on it. But - trying to extend the old C codebase is daunting. I'd even be happy with a reduced featureset that avoids the exotic stuff that is either outdated or no longer useful. The core syntax of a Makefile is just so close to perfect.

(I wrote about some of this in the remake repo: https://github.com/rocky/remake/issues/114 )

You had me until

> The core syntax of a Makefile is just so close to perfect.

I'd argue what is close to perfect is much of the underlying model/semantics, and what is terrible is very much the syntax. I've long wanted to make something similar to Make but simply with better syntax...

To me, the core of the syntax is:

    target: prereq | ooprereq
In my eyes, the most important thing when building something complex is the dependency graph, and it makes sense to make the syntax for defining the graph as simple as possible. I think the make syntax just nails it, and most of the other approaches I have seen so far add complexity without any benefit. In fact, most of the complexity they introduce seems to stem from the developer being unable to simplify what they're trying to express.

At the level of variable handling and so forth, make is slightly annoying but manageable.

Anything beyond that - yeah, I'm with you, a lot of that is terrible.
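
For anyone unfamiliar with the `|` in that rule syntax: prerequisites after the pipe are order-only, meaning they must exist before the recipe runs, but their timestamps never trigger rebuilds. A small sketch, where `out` is a hypothetical build directory:

```make
# out/ must exist before compiling, but touching the directory
# does not make foo.o out of date.
out/foo.o: foo.c | out
	gcc -c foo.c -o out/foo.o

out:
	mkdir -p out
```

Without the `|`, every file added to `out/` would update the directory's mtime and force a pointless recompile.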

The syntax has one major insurmountable problem - it uses space as a separator and so doesn't allow spaces in target names and therefore doesn't allow spaces in filenames.

Yes, there are workarounds for some situations, but none of them work in all cases and no-one ever applies them consistently.

That isn't a "major insurmountable" problem. It's a minor annoyance - just don't have spaces in your file names.

Or in your directory names, both in your project and in any parent folder. There's absolutely no reason why Windows shouldn't be able to call the folder "My Documents", but it broke a whole generation of builds that people tried to do in their home directory.

Spaces in filenames are a perfectly reasonable thing to want. They are supported on all three major platforms, but make just throws up its hands and doesn't care. Yes, you can work around it, but it would be nice if you didn't have to.

It's an annoyance with make overall, but if we're talking about the syntax specifically then yes I would call that a "major insurmountable problem".

One time, I had a work-issued laptop where they filled out my user account. They used spaces in my username. This meant that I had a space in every single path.

It caused so many issues. But mostly with developer tooling, like Make, that assumes these sorts of things. Most programs worked just fine.

> One time, I had a work-issued laptop [which I did not have root on].

Well there's your problem right there. That kind of shit needs to stop, and making companies that perpetrate it less able to develop software (and thus less profitable) is one of the only things people who don't directly interact with such a company can do to discourage it. (Not that that's in any way a good strategy, but to the tiny extent it has any bearing on makefile syntax in the first place, it's an argument against supporting spaces in filenames.)

I did have root on it. It was just set up for me by IT. Having root doesn’t change my account name.

You may be interested in [fac](http://sites.science.oregonstate.edu/~roundyd/fac/introducin...)

It's a build-system in Rust with some really cool features (and an even simpler syntax).

you might like ninja build then

I really like the idea of Ninja; I really want Ninja to be a single C(++) file that I can build with a trivial command line. I don't want a single C(++) file for "elegance" so much as because I want to ship Ninja as an inline dependency, but now I've got two problems: I have to build my code and I have to build Ninja. If Ninja were a single file: boom! Trivial build line.

Other than that, I think Ninja is perfect.

You could probably post-process samurai (a rewrite of ninja into C) into a single-file: https://github.com/michaelforney/samurai

Ninja precompiled is a self-contained 200 kB single-file binary.

Small enough to even check into git. Your compiler, cmake, and all the other tools you use are going to be a much bigger problem.

ninja is a normal part of Linux distros. So at least there you don't have to worry.

Ninja has as a goal to be simple and fast, but not necessarily convenient for humans to write. It's in the second paragraph of their home page.

While ninja is great for many uses, I would not recommend using it for hand-written rules. In fact, any simple dependency resolver like make or ninja will lack a lot of language context about includes and other transitive dependencies, so you always want some higher-level abstraction closer to the language as your porcelain.

ninja specifically handles includes automatically.

Anyway, not having any other automagical behavior and hidden rules and the fact that it can do incremental and correct job re-runs even when the rules change is exactly why I like to use ninja.

Here's one such use case, maybe not the cleanest:


I especially love it in projects involving many different compilers/architectures/sdks at once (like when doing low level embedded programming), where things like meson or autotools or arcane Makefile hacks become harder to stomach.

That's quite a stretch of what "handles" means. It supports integrating such use cases, but you still have to do the gcc -M dance with https://ninja-build.org/manual.html#ref_headers.

As opposed to say cmake, bazel or other higher level abstractions where you just ask it to take all c-files in this directory and solve the rest.

Take meson. It's at the same level as cmake, but what handles the C include dependencies for it is ninja. cmake also has a ninja backend. Not sure how it works exactly, because I don't use cmake, but I assume it will be ninja handling include deps in that case too.

Yes, ninja basically handles it for you, compared to what you have to go through when using Makefiles, to have autogenerated dependencies.
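
Concretely, this is ninja's `depfile`/`deps` mechanism (the ref_headers section linked above): the compiler emits a Makefile-style dependency list on the first build, and ninja reads it back on later runs. A sketch of a minimal `build.ninja`:

```ninja
rule cc
  command = gcc -MD -MF $out.d -c $in -o $out
  depfile = $out.d
  deps = gcc

build foo.o: cc foo.c
```

After the first build, editing any header that foo.c includes makes foo.o dirty without a single hand-written header rule.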

I will check it out, thank you for the suggestion - what I glanced at seems very interesting.

Shake would also be interesting to look at. It has much more sophisticated dependency graph handling.

Gnu Make is almost pathetic.

See eg http://simonmar.github.io/bib/papers/shake.pdf

That was a great read - thanks for linking.

I had trouble with the one class in uni that used Haskell and I've been working in a Python shop for some years now, so I kept expecting to encounter some impenetrable section that would make my eyes glaze over and my hand close the tab. I was probably closest around page 9!

But the writing was excellent, and clear, and satisfying, and little concepts I was so close to grokking kept catching my eye and pulling me back in until I understood them, then their neighbors, then the section, then I was done.

Guess I've picked up some things since college. Wish I could go back and take that class again. I think learning to use Rust's Option type in anger on a personal project helped me understand monads more than anything in that class.

Also, I'm happy to see the method described by the paper does seem to have become the official GHC build system [0].

0. https://gitlab.haskell.org/ghc/ghc/-/wikis/building/hadrian

This is their minimal example to compile a single C file:

    module Main(main) where
    import Development.Shake
    import System.FilePath

    main :: IO ()
    main = shake shakeOptions $ do
        want ["foo.o"]

        "*.o" %> \out -> do
            let src = out -<.> "c"
            need [src]
            cmd "gcc -c" src "-o" out

Maybe there is a need for super-sophisticated graph handling, but for most use cases the much more complicated syntax of Shake is not a worthwhile tradeoff.

https://shakebuild.com/ agrees with you:

> Large build systems written using Shake tend to be significantly simpler, while also running faster. If your project can use a canned build system (e.g. Visual Studio, cabal) do that; if your project is very simple use a Makefile; otherwise use Shake.

For what it's worth, if I remember right, Shake has some support for interpreting Makefiles, too.

> [...] the way more complicated syntax of Shake [...]

For context, Shake uses Haskell syntax, because your 'Shakefile' is just a normal Haskell program that happens to use Shake as a library and then compiles to a bespoke build system.


> The original motivation behind the creation of Shake was to allow rules to discover additional dependencies after running previous rules, allowing the build system to generate files and then examine them to determine their dependencies – something that cannot be expressed directly in most build systems. However, now Shake is a suitable build tool even if you do not require that feature.

> I've long wanted to make something similar to Make but simply with better syntax...

If you haven't, I'd suggest having a look at pmake (sometimes packaged as 'bmake', since it is the base 'make' implementation on BSDs) - the core 'make' portions are still the same (like gmake, it extends core 'make'), but the more script-like 'dynamic' parts are much nicer than gnumake's in my opinion. It also supports the notion of 'makefile libraries', which are directories containing make snippets that can be #include'd into other client projects.

freebsd examples:

manual: https://www.freebsd.org/cgi/man.cgi?query=make&apropos=0&sek...

makefile library used by the system to build itself (good examples): https://cgit.freebsd.org/src/tree/share/mk

Looks interesting, thanks for sharing!

I had a go at making an experimental tool with a better syntax, fixing a lot of the obvious problems (whitespace, $ having a double meaning, etc). https://rwmj.wordpress.com/2020/01/14/goals-an-experimental-...

Wow, cool! Thanks for sharing! I'll have to take a look when I get the chance.

So you want ninja?

Yeah, GNU Make is actually a functional programming language. When I discovered this, I almost reinvented parts of autoconf in GNU Make. It's like trying to write a complex project with C preprocessor macros! It was a lot of fun.

GNU Make is hurt by its minimalism. Doing anything interesting in pure GNU Make is a herculean effort. Even something simple like recursing into directories in order to build up a list of source files is extremely hard. Most people don't even try, they would rather keep their source code tree flat than deal with GNU Make. I wanted to support any directory structure so I implemented a simple version of find as a pure GNU Make function!

At the same time, a reduced feature set would be nice. GNU Make ships with a ton of old rules enabled by default. The no-builtin-rules and no-builtin-variables options let you disable this stuff. Makes it a lot easier to understand the print-data-base output.

> The core syntax of a Makefile is just so close to perfect.

It's a very simple syntax, but it has its pain points. Significant whitespace effectively rules out spaces in file names. It also makes it much harder to format custom functions.

Speaking of functions, why can't we call user-defined functions directly? We're forced to use the call function with the custom function's name as parameter. Things quickly get out of hand when you're building new functions on top of existing ones. I actually looked up the GNU Make source code, I remember comments and a discussion about this... It was possible to do it but they didn't want to because then they'd have to think about users when introducing new built-in functions. Oh well...
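
To illustrate the asymmetry: built-ins like `$(words ...)` are invoked directly by name, while a user-defined function (here a hypothetical `double`) can only be reached through `$(call ...)`:

```make
# $(1) is the first argument passed via $(call ...).
double = $(1) $(1)

# User function: must go through $(call ...).
greeting := $(call double,hello)

# Built-in: invoked directly by name; counts the words in greeting.
count := $(words $(greeting))

all:
	@echo $(greeting) $(count)
```

Here `$(greeting)` expands to "hello hello" and `$(count)` to 2; writing `$(double hello)` directly would silently expand to the empty string, which is exactly the wart being complained about.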

> something simple like recursing into directories in order to build up a list of source files is extremely hard

What's wrong with shelling out to find?

The find program might not be available, right? The developer might be building on Windows or something. Also, spawning new processes is an expensive operation so doing it from inside GNU Make might be faster.

Okay, I just wanted to see if I could do it in pure GNU Make.

  true := T
  not = $(if $(1),,$(true))

  directory? = $(if $(1),$(wildcard $(addsuffix /.,$(1))))
  file? = $(and $(wildcard $(1)),$(call not,$(call directory?,$(1))))

  glob = $(sort $(wildcard $(or $(1),*)))
  glob.directory = $(call glob,$(addsuffix /$(or $(2),*),$(or $(1),.)))

  recurse = $(foreach x,$(3),$(if $(call $(2),$(x)),$(x),$(x) $(call recurse,$(1),$(2),$(call $(1),$(x)))))

  file_system.traverse = $(call recurse,glob.directory,file?,$(or $(1),.))

  find = $(strip $(foreach entry,$(call file_system.traverse,$(1)),$(if $(call $(or $(2),true),$(entry)),$(entry))))

  sources := $(call find,src,file?)

Then I discovered GNU Make supports C extensions. It will even automatically build them due to the way the include keyword works. It might actually be easier to just make a plugin with all the functionality I want...

Have you seen Just? https://github.com/casey/just

It's make-like but supposedly improves on some of the arcane aspects of make (I can't judge how successful it is at that as I've never used make in anger).

We use it a bit for https://github.com/embedded-graphics/embedded-graphics and I have to say I love it. It's just the right balance between configuration variables/constants and "a folder full of bash scripts". I highly recommend giving Just a try.

One of the reasons I still rely on Make instead of Just is how widely available make is; pretty much any alternative (be it a new implementation of make in Rust, or Just) will have to deal with the fact that make is already a default on so many systems.

That makes make a great entry point for any project. Does anyone have a general suggestion for how to work around this? The goal being: sync a project and not need to install anything to get going. One thing I sometimes do is use make as the entry point, with an init target that installs the necessary tools to get going (depending on who I expect the target audience to be).

This sounds like what I do: Makefile with distinct steps: install, build, test, lint, deploy. Those steps call underlying tools, depending on the platform, framework, language.

E.g. in one project it will install rbenv, bundle, npm, yarn etc. Call Rake, npx, that odd docker wrapper, etc. Or deploy with git in one project and through capistrano or ansible in another.

As a dev, in daily mode, all you need is 'make lint && make test && make deploy'. All the underlying tools and their intricacies are only needed when it fails or when you want to change stuff.
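
A skeleton of that entry-point pattern, sketched with placeholder tool invocations (echoed here rather than actually run; the real targets would call bundle, npm, cap, etc.):

```shell
# An entry-point Makefile that just dispatches to per-project tooling.
dir=$(mktemp -d)
printf '%s\n' \
  '.PHONY: install build test lint deploy' \
  'install: ; @echo "bundle install && npm ci"' \
  'build:   ; @echo "npm run build"' \
  'test:    ; @echo "bundle exec rspec"' \
  'lint:    ; @echo "rubocop"' \
  'deploy:  ; @echo "cap production deploy"' \
  > "$dir/Makefile"
make --no-print-directory -C "$dir" lint test
```

The point is that the target names stay identical across projects even when the underlying tools differ.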

Yeah. Exactly. I do this when I think there will be users who don’t already have all the dev tools installed, and I don’t want them to have to download and install everything.

I love Just, but I love it because it's not a build system — I already have a build system, I just need a standard "entry point" to all my projects so each command a user could use to interact with the actual build system is in the same place for each project. I wouldn't build, say, a C project with Just and Just alone, where it tracks the dependencies between files and only rebuilds what's necessary.

So while it's replaced one use of Make for me, I can't rightly call it a Make replacement.

Would you build a C project with make? One might argue that C could do with a better build system like other languages have. I don't use C much, but I hear that these exist.

I won't use this, because the author sucks up all justfiles with no way for a user to opt out, and isn't fully clear about this. Use it on some private or company thing that accidentally goes public: oops, he has your file now in his repo.

If it was opt-in to his monitoring I might be more open. Maybe I should fork and change the filename to keep this from happening.

What are you talking about? Where does he "suck up" all Justfiles?

I see. Interesting.

I've seen it before but discarded it because my workflow is heavily file-based: I DO need make's capabilities as a build system, not just a task runner. Will check it out to see whether that has changed.

edit: Yeah, I can't see how I can make it work for a file-based approach.

There is kati from Android:


This takes your existing Makefile(s) and produces a ninja build file. Not sure if Android still uses it or if it's all Soong now.

>The core syntax of a Makefile is just so close to perfect.

That's about the last thing I would call it.

Allow me to suggest build2 (the build system part) as another alternative. You would probably be first interested in ad hoc recipes: https://build2.org/release/0.13.0.xhtml#adhoc-recipe

Very VERY interesting!

The one thing that I'm currently struggling to find information on is dynamic dependency handling (dyndep in ninja terms).

Is that something that build2 covers as well? Any resource you could point me to?

Dynamic dependencies are supported by build2 (this is, for example, how we handle auto-generated C/C++ headers) but it's not something that is exposed in ad hoc recipes that are written in Buildscript yet (but is exposed in those written in C++). Our plan is to start with the ability to "ingest" make dependency declarations since quite a few tools are able to produce those. Would something like this cover your use-cases?

Very likely, yes. I'm generating my own makefiles anyways, so generating something that build2 could consume should be possible, too.

Would it be appropriate to open a github issue for discussing this further? I would like to share some example for how my current setup is working and having the github syntax available would be helpful.

Yes, please, that would be very helpful: https://github.com/build2/build2/issues

"issue" filed. My apologies for the lack of brevity.


That looks very good! I wonder if I can make it fit my file-based approach.

edit: Yeah, doesn't seem like it.

Someone is working on this, and could use help: https://github.com/michaelmelanson/fab-rs

I would love to see this completed to the point of passing the GNU make testsuite. Having make as a modular library would be wildly useful.

There's always ninja.


In case anyone would attempt this: please make quoting right.

Funny to see this here, as I remember years ago an effort to rewrite the coreutils package in perl:


Back then it was a simple way to get some things running under Windows, and I guess it would have also been called "memory safe", albeit not in the way that Rust is!

I am so excited about Rust slowly sneaking into C territory. I am looking forward to the point in time when it makes it into the Linux kernel.

sooner than you might think: https://lwn.net/Articles/829858/

Yeah, Barriers to in-tree Rust [video] → https://www.youtube.com/watch?v=FFjV9f_Ub9o&t=2045s

The biggest barrier is probably going to be the gap between the architectures supported by Linux and those supported by Rust. It'll take several years for Linux to prune some of the lesser-used architectures [1], and also years for Rust to add support either in LLVM [2] or via a WIP GCC frontend [3].

So while most people involved seem to be supportive, including Torvalds and GKH, it will take time for any non-driver code to be written in Rust.

[1] - https://lore.kernel.org/lkml/CAK8P3a2VW8T+yYUG1pn1yR-5eU4jJX...

[2] - https://github.com/fishinabarrel/linux-kernel-module-rust/is...

[3] - https://rust-gcc.github.io

This is addressed in the questions in the aforementioned video[1] and it's actually not that much of a blocker at this point. There's plenty of arch-specific stuff in the kernel (drivers, but not only), so Rust can (and will) be used there. And there's no point waiting for complete architecture compatibility before starting to deploy some Rust in the kernel.

Of course it means that they aren't going to rewrite the core kernel logic in Rust tomorrow, but they wouldn't do it anyway even if GCC supported Rust.

[1]: https://youtu.be/FFjV9f_Ub9o?t=2930

LLVM just landed an experimental m68k backend, this week IIRC. That's some progress, anyways.

Another step forward to add Rust language support to the Linux kernel: https://github.com/Rust-for-Linux/linux

My opinion is that the Rust community should focus on a new modern kernel, and emulate Linux on top if needed (like Windows does it). In theory you would get a cleaner architecture and a cleaner code base, with no need to worry about regressions, and you might finish the task in 10 years. Open source applications could run directly on top of this kernel and Linux closed source apps could run on the emulation/translation layer.

How about the drivers though…

There's already a Rust OS[1] built on top of a brand new Rust kernel, but even if it gets all the traction possible, it won't have proper hardware support for several decades (see the current state of Linux, which is better than ever but still far from perfect).

[1]: https://www.redox-os.org/

You will get some drivers at most, so maybe you could architect these drivers in such a way that you could re-use part of the code on the alternative kernels.

What is missing is someone with big pockets to pay very competent engineers to work on such kernels full time and set up a decent test lab with different hardware. Hobbyists might take a long time to finish the job.

sed seriously needs a Perl-regex mode... it's stuck in the Basic/Extended Regex syntaxes, which feel prehistoric and don't support the syntax that you will learn in practically any modern regex tutorial.

Maybe they could take the opportunity to add that. Otherwise I end up needing to

    sed -i 's/sed -i/perl -pi -e/g' *

FWIW this is one reason I designed the Eggex sublanguage of Oil to translate to extended regexps: so you can use it with egrep, awk, and sed --regexp-extended.


It really annoys me to have to remember multiple syntaxes for regexes in shell scripts!

Also, even if you added Perl syntax to sed today, it wouldn't be available everywhere. Oil isn't available everywhere either but there are some advantages to having the syntax in the shell rather than in every tool.

(It's also easy to translate eggex to the BRE syntax or Perl syntax, but nobody has done so yet)

Although the typical first reaction might be that of XKCD 927 (I must confess it was mine, at least), I believe that regular expressions are territory that could see some innovation. If told to pick just one feature from eggex, I'd choose the ability to parse it statically!

FWIW I think this deserves a submission of its own, so here it is:


It looks like they did consider it: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=22801

Sed isn't part of the coreutils, though, so this would be outside of the scope of at least this project.

There is https://github.com/chmln/sd written in Rust, but it's far from a sed replacement – it's reducing it to search and replace for fixed strings, as far as I can tell.

Keep in mind that Perl regexen can not be recognized in linear time.

While true, you can get similar syntax without backtracking in linear time, which is what the Rust regex crate and ripgrep use.

Yes. You can take a subset of Perl's syntax.

This development is great news for automotive, since modern coreutils can't be used there because of the anti-tivoisation clause conflicting with government regulations, which forces vendors to use old unsupported versions.

This is a sad state of affairs. The world does not become a better place by rewriting working copylefted code under permissive licenses. Especially when the main goal of the rewrite is to ease corporate "theft" of this code. We are walking several steps backwards with these rewrites (as much as I love the concept of rewriting common tools in a safer language).

EDIT: I'm actually outraged by this. I want to love the rust rewrite of coreutils, but I don't understand why they had to taint this beautiful work with stupid license politics.

Are you also outraged by the existence of clang, as it has a permissive license and competes with GCC?

From where I'm standing it looks like you are the one introducing the stupid license politics. The Rust Coreutils developers have chosen a license that they like, and you are the one making arguments about how terrible this is for the world and how they are tainting the GNU Coreutils.

>Are you also outraged by the existence of clang, as it has a permissive license and competes with GCC?


Maybe you should be outraged by GCC's deliberate sandbagging and less deliberate stagnation that allowed LLVM to swoop in and be a legitimately more useful (if not "better") compiler.

Stallman himself was largely responsible for GCC not exposing intermediate representation for analysis.


There is a world in which rustc might have been built on GCC instead: a world where GCC was interested in being a platform rather than a Maginot line trying to prevent any possible theoretical use by proprietary software, even if it means preventing free software from doing the same.

> that allowed LLVM to swoop in and be a legitimately more useful (if not "better") compiler.

Interestingly enough, if Stallman hadn't missed the email[1], LLVM was intended by Lattner to be given to the FSF[2]. So yes, RMS did commit a management mistake, but it's not the one you think.

> The patch I'm working on is GPL licensed and copyright will be assigned to the FSF under the standard Apple copyright assignment. Initially, I intend to link the LLVM libraries in from the existing LLVM distribution, mainly to simplify my work. This code is licensed under a BSD-like license [8], and LLVM itself will not initially be assigned to the FSF. If people are seriously in favor of LLVM being a long-term part of GCC, I personally believe that the LLVM community would agree to assign the copyright of LLVM itself to the FSF and we can work through these details.

[1] https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00... [2] https://gcc.gnu.org/legacy-ml/gcc/2005-11/msg00888.html

Yes but didn't he later submit the patches and they were rejected anyways?


> Stallman himself was largely responsible for GCC not exposing intermediate representation for analysis.

This is wildly offtopic for this thread, but it is not clear that Stallman's stance on this issue was an error. The wide existence of LLVM-based compilers for proprietary architectures is, for many of us, a tragedy that Stallman correctly anticipated. His insightful and courageous steering of the GCC development was a crucial step in avoiding this problem for GCC.

I definitely believe that for the first N years of gcc the policy was absolutely necessary to drag vendors into actually contributing back.

Whether that's still true in terms of the trade-offs being worth it to maintain the policy in the current day, and if it's no longer true what value of N accurately describes when that changed, is something that I think people can reasonably disagree about.

(my extremely boring take being "there's so many counterfactuals here I'm really not sure")

At the expense of forcing people who would love to use GCC to produce open-source tools and research into using LLVM.

See the email thread I linked for several such examples.

Most compiler research and tooling development now happens in the LLVM ecosystem, precisely because they dragged their feet for so long.

Isn't it also precluding several non-proprietary compilers as well, though? In turn making LLVM more popular and even more appealing to proprietary compiler authors.

GCC is a great C compiler and most of us are happy to keep it that way. The constant drumbeat of 'different different different' is not something I'm interested in. That I can compile C code written before I was born is more than I can ask for. In the same period a dozen safe languages have come and gone with no one even thinking about them any more.

> Are you also outraged by the existence of clang

If clang was named "Rust GNU Compiler Collection rewrite" I would certainly be!

It's disappointing but it's not worth assuming any malice or continuing flamewars. There's been some discussion on the project's GitHub page in the past about licensing.

What's worthwhile is to share and talk about why the license selection may result in proprietary and bloated applications built upon the stack, and the resulting inefficiencies and security issues that could occur years in the future as a result. Perhaps it'd even be possible to speculate about what incentives create and support the conditions for those.

If enough people agree and understand the situation, perhaps we'll see change. If they don't, we'll continue on for a number of years and maybe those hypothetical problems will occur, or maybe they won't and the concerns will have been unjustified.

(and FWIW: I also think it's excellent to see core utilities rewritten in rust)

Coreutils is mostly UNIX/POSIX standard utilities which have been implemented under many closed and open source licenses for decades now. This is not where you build the GNU/Moat.

But GNU coreutils are ahead of all proprietary alternatives in both speed and features, are they not? I've personally not had the opportunity to work on AIX, or Solaris, or whatever (there are no machines like that where I live), but that's what I always read: "we got a new AIX/HP-UX/Solaris/whatever server and the first thing I did was to build GNU coreutils and replace the stock crap with it".

>...first thing I did...

How do you know if the platform tools are faster/slower, more featureful or less, if the first thing you did was replace them?

GNU coreutils won on portability of knowledge: no need to learn how each platform's tools work when I can just roll with GNU. Which is great when you are dealing with multiple varieties of Unix that are all slightly different. In general the base OS tools were like all software that competes in the same space: better at some things and worse at others.

People know that GNU tools were faster simply because the GNU versions' developers had famously optimized the hell out of them (many research papers written about this back in the day), while preexisting Unix tools didn't have that much optimization.

>I don't understand why they had to taint this beautiful work with stupid license politics

It's pretty simple reasoning: Rust evangelists want to maximize Rust usage. Period. Replacing GPL-licensed tools with alternatives written in Rust is an easy way to bootstrap this. Compare with: paperclip maximizer (https://www.lesswrong.com/tag/paperclip-maximizer)

I have to deal with government regulated certification in a different industry. In our case there's a hard requirement to make it impossible for the end-user, which is some business, to run modified software on the devices. Reason being, the devices carry out financial transactions and the possibility of running modified software on them would potentially enable fraud. Basically, the whole point of certification in this case is to verify that the software doesn't have provisions for doing anything fraudulent and then making sure it stays that way. So GPLv3 anti-tivoization requirements and the certification requirements are mutually exclusive. I imagine it's similar in the automotive industry for safety reasons.

> make it impossible for the end-user, which is some business, to run modified software on the devices.

Looks like that device might not be covered by the tivoization clause, since it might not be a "User Product" as defined in the GPL:

> A “User Product” is either (1) a “consumer product”, which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling.

If you sell to businesses, that should be alright.

Also, it is fine to make it impossible to modify, if you also don't have this possibility:

> this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product

Thanks! This is a very useful clarification. I wonder how the last paragraph relates to automotive. I guess they still want to be able to do updates.

Service technicians can and do update the firmware on automobiles to fix issues, and normally these are digitally signed by the manufacturer, so random people can't use this path to disable the emission controls on their car, for example. So the anti-DRM language is an issue.

> In our case there's a hard requirement to make it impossible for the end-user, which is some business, to run modified software on the devices.

I guess you're working on credit card terminals?

Anyway: why not have a secure hardware element / TPM that has a GPIO with a pull-up resistor that can be queried by the device? Then, have the bootloader check as part of the boot if the TPM attests with a digital signature that the GPIO is still high, and the application also regularly checking the TPM? Or an e-fuse similar to Samsung's Knox Guard?

That way, a user can remove the pullup resistor (and thus, as it's a hardware modification, has to break the seal of the device) to "unlock" custom firmware loading to fulfill the GPL requirement, but at the same time your application can be reasonably certain at run-time the device hasn't been tampered with.

In our case tampering with the sealed parts of the hardware by the user would be illegal. I'm not sure if providing an illegal method to load the firmware fulfills the GPLv3 requirements.

EDIT: So, I don't think this is actually a technical question. It's a legal question that boils down to the fact that a premise and its inverse can not both be true at the same time.

the hard part when enabling self-tinkering is the forensics and the burden of proof for accidents, especially lethal accidents.

how do you ensure beyond doubt that the vehicle was not "tampered with" by an "unauthorized unqualified third party"?

would I want to own a car where the oem can turn around burden of proof and can easily claim I have modified the vehicle software, and thus broke vehicle behaviour?

oem: you've patched coreutils, that's what killed your wife, your fault!

me: no. that's technically impossible. the drm disables third party modifications.

now if you provide means to do it anyhow, you'd need to make forensics crystal clear.

so you as a customer want the physical modification to be dead obvious even on a burnt vehicle. especially on a burnt vehicle. so you as a customer can show that you did in fact not unlock software modification.

and state regulations for that reason require drm from car makers, to make it impossible to evade responsibility with flakey claims of third party modifications.

Is it a hard requirement or is it about vendors being overzealous because it also fit their interest?

I know that the law requires some radio equipment to make it hard for users to modify so that they can't mess with some parts of the spectrum.

But for cars, the responsibility is typically the user's; that is, if you modify your car in a way that is not what the manufacturer certified, you can, but you have to do it by the rules or face the consequences, and the manufacturer is not responsible.

But by locking down software hard, it is a way for manufacturers to make sure that they can't be held responsible and the side effect of locking down features is a nice bonus.

I understand your point of view, but it's important to remember the end outcome:

Users will be forced to use outdated, buggy software, or vendors will write minimal closed implementations covering only what they need, without the benefit of the accumulated good work in foss (bugs, lack of features, etc)

These are edge cases. It's hard to tell how many people actually comply with the rules of the license who otherwise wouldn't.

I find it a bit strange that the rewrite is licensed under MIT. Are they doing a clean-room reimplementation, or are they rewriting the coreutils by looking at the source code?

In my experience looking at the source code of programs written in a different language is rarely helpful. It's only useful to glean some general ideas or algorithms. But rewriting with one eye on some C code? Rarely helpful and usually counterproductive.

> It's only useful to glean some general ideas or algorithms.

Sure, but they'll also have an easier time of ensuring they're "pretty close" to fully reproducing the original.

At least, in comparison to having to reverse something black box. :)

They're trying to be compliant to the interface of the original, not the implementation. Now whether the interface itself can be copyrighted ... the SCOTUS will weigh in on that soon (https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...)

If the SCOTUS says yes, would the original GNU coreutils even retain their GPL license given that they reimplemented pre-existing, non-GPL utilities?

Not necessarily. The non-GPL utilities the GNU coreutils are based on are presumably licensed as MIT, ISC, Public Domain, or similar. If that is the case, those licensing regimes don't forbid adding extra restrictions, so adding the restrictions of GPL would be permitted, and the new derived (again, assuming interfaces are copyrightable) work would operate as being essentially GPL licensed. Things get murky as there are proprietary utilities too that coreutils are probably based on, but then there are also specifications of behavior (SUS, Posix) which might or might not sever the "derivation path" of the work, or might introduce further licensing constraints.

That case has no bearing on this because they aren't deciding if "interfaces can be copyrighted"

Do you view APIs as something other than an interface? A ruling from the Supreme Court may not reach the ultimate issue, but if it does, what assumptions would you make concerning the difference between an API and a program’s interface with the user?

An Application Programming Interface is ... an interface. As this very link points out, there are many programs that use coreutils via this interface.

Rust coreutils replacing GNU coreutils in these programs by implementing the same interface is the same issue.

I’m not sure if it’s possible to spell it out clearer than this.

They don't need the source code; they have the standards specs, the man pages, and the test suite.

Wouldn't busybox (GPLv2) or toybox (BSD) be a better choice than a very incomplete Rust rewrite?

To start, both of them were written with embedded applications in mind. Busybox is feature-complete, since distros like Alpine and OpenWRT use it (not sure about Toybox, but I know Android ships it).

BTW, both of them cover way more tooling than coreutils too. Busybox, for example, has a primitive init system and a vi/vim clone. You can of course disable these at build time.

One motivation for using Rust is to eliminate whole classes of bugs. Certainly busybox is great for embedded applications; one executable serves so many functions. But the more you add to it, the more attack surface you have.

I am sure that a project that is 21 years old has far fewer bugs than a new project that had its first release one year ago, independently of the language.

Huh? Plenty of CVEs found in C/C++ codebases were introduced decades before they were discovered. In fact, it seems to me that the older a C/C++ codebase is (especially C++), the more bugs it has, simply because improved coding practices and "modern C++" features have a far greater impact than the amount of user feedback and experience. But best practices and modern C++ features are just a poor man's (semi-effective) borrow checker. ;-)

More seriously. I agree that you can't call rust-coreutils mature yet. But over time maintenance on it will probably be considerably easier and if it ever gets popular enough to be thoroughly battle-tested, it almost certainly will be more reliable and secure. In addition, odds are that even if still immature, it already has fewer serious bugs. Especially memory safety, but also in things like string-handling since C++ -- and especially C -- string handling is notoriously difficult to get right. Then again, maybe not, since coreutils and presumably also rust-coreutils take the stupid approach that strings are just streams of bytes and there's relatively little the Rust std and compiler can do to fix that mess. Oh, wait! Actually it can help a bit by allowing you to be "flexible" in your interface but also parse and validate at the interface, then use better abstractions internally.

Case in point: OpenSSL vs rustls. OpenSSL is a lot older than rustls, has been extensively battle-tested by a very expert community, has been audited, fuzzed, etc. to a significant degree, and we're still now starting to see security experts advising using rustls instead of OpenSSL, or projects deciding to switch to rustls, because rustls is likely (and according to a growing body of real-world experience and evidence) less buggy and more secure than OpenSSL. (And as a bonus it's also faster.) Not all of OpenSSL's issues are due to the language, and neither is rustls's choice of language the only reason for its reliability. But in both cases the implementation language does play a crucial role in their security and reliability.

Some governments require that one cannot install one's own software on automobiles, for safety reasons?

That is quite interesting, but since one can modify one's car to begin with to make it unsafe, it is also rather futile.

Rather, a sensible system would be that after such modifications, a car would have to pass inspection again to be deemed road-worthy. — one may change the software, but one must pay to have it certified again ere it be allowed on public roads, and if that not be an option, one can always simply drive it on private property only.

> That is quite interesting, but since one can modify one's car to begin with to make it unsafe, it is also rather futile.

In France, if you want to modify your car (in a significant way, not specified by the car manufacturer) you need to send your car to a specific administration where engineers will inspect the vehicle before you can get it a plate number. This is called the Passage aux Mines (https://fr.wikipedia.org/wiki/Passage_aux_Mines). You typically have to do it when you decide to repurpose a cargo van as a camping van.

It would shock me if you had to do the same after updating the embedded software of your car.

In Germany, it's called a TÜV-Inspektion, by the Technischer Überwachungsverein. Even unmodified cars have to get certified at regular intervals to ensure they're still street-worthy. Of course you may still tinker with your car, and do all you want with it on private land, and you can get quite a few modifications certified, but flashing some firmware image from GitHub onto safety-critical systems definitely wouldn't get through.

Same in Italy, but that is the government enforcing its rules; you can modify the software on your car if you want to, it's simply illegal.

You can disable airbags, ABS and every other electronic circuit in the car and it's all explained in the car manual.

There might be a good reason to do it (for example the airbag is malfunctioning)

It has nothing to do with the terms of GPLv3

> (for example the airbag is malfunctioning)

Then the car/vehicle is no longer road-worthy/road-safe and has to be repaired before going on public roads again. If the airbag fault isn't discovered in, e.g., a repair shop, during an inspection, or any other place where the car can be fixed without rejoining public roads, it has to be safely towed/transported to a place where it CAN be fixed.

And incidentally it's not a problem of "muh car == muh freedom".

If you want to drive a vehicle not safe by the standards everyone has to adhere to you're free to do that on private property.

If you want to drive a vehicle on public roads, where probably nobody knows about anything stupid you've done to it, you're piloting a 1-2+ tonne lump of metal, glass and plastic, probably with some sharp points/edges, moving at significant speed quite close to other people who have no protection whatsoever. It's just horrifying accidents waiting to happen.

Let's take the example of someone fucking around with the airbags in the car. If they're not known good, they might as well go off at any moment during the drive, possibly incapacitating the driver while the car is moving at normal road speeds, making the car veer around wildly and generally being an extreme hazard. There is a reason cars have to certified and adhere to standards of road-safety/road-worthiness.

P.S. When a topic like this comes up I always think back to my father when, e.g., a TV program about super-/hyper-cars came on. He almost always said "Dafür bräuchte man eigentlich einen Waffenschein." ("One SHOULD need a weapons license for that thing."), in the sense that in our country (Germany) prospective gun owners need a license which IIRC requires, amongst other things, a psychological examination/certificate to ensure that no irresponsible, mentally ill, or mentally challenged people get the license to own guns. Meaning: driving around on public roads in something that's essentially a road-going mix of a missile and a door wedge should probably require something like an idiot test. I mean, it's already standard practice to deny RENTALS of cars above a certain power threshold to people under something like 23 or 25.

FYI: you have the right to disable the airbags in your car, and in fact it's needed to put an infant seat in.

Sure, on the front PASSENGER seat and not permanently. Screwing around with software control etc. of airbags and other security features is (I hope) very much illegal and a reason for taking away the operating license of that particular car.

> is (I hope) very much illegal and a reason

"You can" doesn't mean it's legal.

But it's documented because you might need to: for example, when carrying a baby on the passenger seat, or if the airbag is malfunctioning and you need to drive to the repair shop, it might make sense to disable it.

> for taking away the operating license of that particular car.

why not the death penalty then? :)

The airbag arguably only saves the driver's life; disabling it has the same effect as smoking cigarettes, except cigarettes are vastly more dangerous.

They don't take away your license for smoking (I guess).

I agree with your view, but as I mentioned in another thread, bringing the change is very difficult, especially when you're not in a position with sufficient power.

What is the conflict between GPL3's anti-tivoisation clause and government regulations?

Oftentimes they require vendors to only allow certified software to be installed for safety or compliance reasons.

You should finish that thought: and the manufacturers won't invest their money to certify something that's open source and can be used by everyone, including their competitors.

If there was a will to do something about it, GPL3 wouldn't be in the way.

No, GPLv3 requires you let users replace the software -- in some cases there are safety checks in the software which could be turned off if the user installed an alternative they built themselves.

That isn't allowed -- you fail verification if you give users a documented way to disable required safety features.

No, manufacturers want to use free software, but they are forced to avoid GPLv3 for this reason alone.

Unfortunately, challenging the status quo is difficult when your customers are not the final users, but other companies: if you don't deliver them a free software firmware without GPLv3 components, someone else will, or someone else will deliver something that's completely proprietary.

Manufacturers want to use free-as-in-beer software, but want to avoid the expenses required to maintain free software, or to share those expenses with competitors.

As developer of free software, I don't care about those, who don't care about me.

I think you are both wrong. The purpose of a certification is to mitigate risk. If the licensor of the software has no money, there is no one to sue, and the risk is not mitigated. I’m not advocating this mentality.

> No, manufacturers want to use free software, but they are forced to avoid GPLv3 for this reason alone.


It means that GPLv3 works as intended.

EDIT: as intended by the software authors, who freely chose GPLv3 as their license; they were not forced to.

I'm quite sure they knew what they were doing.

If car manufacturers want to use GPLv3 software, they simply need to respect the license the author released their software under or rewrite the software.

If by that you mean preventing free software from being more widely used, I agree it does. However, I disagree with that aim.

What's the point of wishing for existing FOSS artifacts to (supposedly) be "more widely used" if that use stops giving the foundational freedoms to the user?

Because Stallman's specific conception of "foundational freedoms" doesn't actually necessarily align with what gives the best utility for users.

Free and open source software is not all about "best utility for users". It's about freedoms being valued extremely highly; people self-select for being or not being the consumers of it.

What is a "freedom" is highly subjective. For example the GPL doesn't enable the "freedom" to create proprietary derivative products or tivoized products. Why is that better than the "freedom" to flash their firmware? It is just a matter of opinion.

It stops being free software after being "more widely used" that way, because then it's not free software anymore.

> If by that you mean preventing free software from being more widely used

That's false.

It simply prevents tivoization.

GPLv3 was created exactly with the purpose of preventing free software from becoming a commodity.

There's a cost involved when you use free software:

- the software must stay free

- if you include software licensed under a FOSS license, you have to adhere to the license terms

simple as that.

If the authors of software X or Y chose the GPLv3 as license I imagine they were completely aware and agreed to the terms of the license they used, including the limitations it enforces.

> If the authors of software X or Y chose the GPLv3 as license I imagine they were completely aware and agreed to the terms of the license they used, including the limitations it enforces.

That is certainly not universally true. It is common, but not universally true.

And we should assume that the authors were NOT aware of real ecosystem implications of licenses - because nobody understands these in detail. I've been trying to understand them since the mid 1990s when I started contributing in a BSD environment, and I won't say that I properly understand them. I understand parts of them, but I don't understand all of them.

> challenging the status quo is difficult

And it always is. And people like you are part of the problem, since you want to enable them in their ways.

Why? Why can't we just enable companies to behave more ethically?

> Why? Why can't we just enable companies to behave more ethically?

Please read the rest of the sentence, it answers the question.

> people like you are part of the problem

Thank you for turning the licensing question into ad hominem.

... and how is GPLv3 incompatible with certification ? I don't get it.

Because the certification doesn't allow users to install software they have modified themselves, which is a right GPLv3 requires.

And then the user loses the certification. As you said in another reply[1], the "users" you're talking about are other companies, so I don't see why they would decide to break the certification they got from their supplier just for fun.

Edit, since the parent was in fact speaking about the end user, which I misunderstood: I don't see the problem either. The manufacturer has no obligation to prevent the end user from updating his car's software. There are no locks that prevent the car owner from just disabling his airbag[2] or removing the safety belt. It's illegal to do so in most countries, and if the user does so and injures himself or somebody else because of that modification, they are on their own. I don't think it should be any different for software, actually.

[1]: https://news.ycombinator.com/item?id=26397176 [2] Edit: in fact, this is a bad example, because you need to be able to disable the airbag to put an infant car seat next to the driver.

No, the user that matters for the GPL is the car owner. The point in the linked comment is that, given how the market works, there is little incentive to work towards changing regulation (or even just the interpretation of or belief about the regulation; I have no clue if there are actually countries where this is impossible, but I know for sure it's a widespread belief in the industry), because the company applying for certification is not the one getting annoyed by having to avoid GPLv3.

I obviously meant end users in this case.

The chain of software delivery often looks like this:

Small subcontractor delivers parts of system→ big company provides ready to use solution → hardware vendor uses the solution and gets their devices certified → end user uses the final product

In this case, the hardware vendor is mostly interested in having their devices work as intended. Everyone up the delivery chain has to meet their requirements in some way to basically get paid. That's not a position where it's easy to make demands regarding certifications, since the hardware vendor may just go to someone else.

If your car kills your kid, e.g. the airbag misfired or the lane assist went into oncoming traffic, then the car vendor will look for any opportunity to evade and refute responsibility, including claims of self-tampering with the car. Now, in the burnt wreck, forensics need to show beyond doubt that it was indeed the car vendor's shipment that killed the kid. How do you show, as DA, that the software was indeed untampered with?


PS: airbags not working is less of a software problem than airbags misfiring.

Not a lawyer, but I don't think that's true.

GPLv3 says that manufacturers have to release all the information needed to run modified software on the device, it doesn't mean that there is one (and one only) certified version that can legally run on the device for safety reasons.

GPLv3 in this case would force manufacturers to release the information so that the owner of the car could run modified software, but legally if you do it, you, the user, not the manufacturer, are violating the law.

It's the same thing that happens with electronics blueprints: you can modify the hardware, but it will void the warranty if you do.


Protecting Your Right to Tinker

Tivoization is a dangerous attempt to curtail users' freedom: the right to modify your software will become meaningless if none of your computers let you do it. GPLv3 stops tivoization by requiring the distributor to provide you with whatever information or data is necessary to install modified software on the device. This may be as simple as a set of instructions, or it may include special data such as cryptographic keys or information about how to bypass an integrity check in the hardware. It will depend on how the hardware was designed—but no matter what information you need, you must be able to get it.

This requirement is limited in scope. Distributors are still allowed to use cryptographic keys for any purpose, and they'll only be required to disclose a key if you need it to modify GPLed software on the device they gave you. The GNU Project itself uses GnuPG to prove the integrity of all the software on its FTP site, and measures like that are beneficial to users. GPLv3 does not stop people from using cryptography; we wouldn't want it to. It only stops people from taking away the rights that the license provides you—whether through patent law, technology, or any other means.


The key point is if certification is possible for a device that allows arbitrary software to be run or not. If it is, we have your scenario. If it isn't, it's not possible. I don't know if the former case is true for all countries, I certainly know that the market overall believes it's not, or not worth the hassle of arguing it with other companies and regulators.

As I understand it, the manufacturers need to release only the information; nowhere does GPLv3 say that the manufacturer should make the process of running custom software easy or economically viable, just that the information should be available.

But as I've said, I'm no law expert and I wouldn't swear to it.

I'm sympathetic to Stallman's goals, but for some applications this presents a huge problem: if anyone can modify their car's emissions-control software, everyone can play Volkswagen and make their car high-performance and dirty as hell. Or people can extend the range of their WiFi by exceeding the legal power levels for unlicensed bands, making a frequency band unusable by their whole neighborhood. Or modify someone's medical device's software as a very sneaky way of killing them. Now, there may be a way to solve these problems, but it would probably involve adding some unchangeable mechanism that limits the behavior of the device to keep it safe. But that's very difficult to do. Certifying that a fixed program has certain safety properties is difficult but possible; certifying a new kind of design that allows users more freedom to tinker, but not too much, is much harder.

Why certifying specific version of software would conflict with GPLv3?

What kind of obstacles to the certification process do GPL3 softs typically encounter?

GPLv3 banned tivoization: a practice whereby hardware certificates are used to prohibit the installation of modified software on hardware.

Some governments require that automotive manufacturers implement these kinds of certificates to prohibit the installation of custom software in automotive applications for safety reasons, as they do not wish that users could install their own, potentially buggy software, at the potential cost of human lives.

Older coreutils were licensed under GPLv2, which has no such restriction.

You as a user can't get your modified software certified, hence you cannot install it, your GPLv3 rights are violated.

I'm confused. Surely you can install the modified software? It's just that now the local jurisdiction may not allow you to use your car on public roads. That's a bummer, but a political problem that exists between the user and his government, not a software problem between the user and the car manufacturer?

Not if it is a certification requirement that the system prevents the installation of modified software that's not certified, which is what's claimed.

Ahh! Got it. That's sad. One would hope that the law could instead shift this burden onto whoever modifies the software :-/

If the firmware is burned into ROMs at the manufacturing plant, how do you propose the car owner installs their modified version? Do we need to force all parts manufacturers to put ROM-burner hardware and (replaceable) firmware into every part, OEMs to provide consolidated hardware interfaces to allow that, and car manufacturers to dedicate all the infrastructure required to allow access to all the OEMs, who allow access to all the parts, for all the various firmware updates?

Sure. Starting price for a car would be several years salary but at least you could abide by GPLv3 licensing should any part manufacturer choose to use it.

> If the firmware is burned into ROMs at the manufacturing plant,

then you do not need to make it modifiable under GPLv3. It explicitly states that.

Anyway, it's generally not; expect any Linux system in a car to run from flash memory.

Very little outside of the in-dash infotainment system runs on Linux in a car.

I deal with manufacturers and OEMs putting software in cars for a living. I am not speculating, I am just reporting the reasons they tell me they will not accept any GPLv3 software for anything that gets loaded into their target devices.


1. They are lying, and spreading FUD about regulations and licenses as an excuse to hide the real reason why they don't want to let users install their own software (e.g. because options cost extra, and if it were free software one could install it for free).

2. Or they are mistaken and don't understand the license. Maybe the cost of using an alternative is less than the cost of figuring it out.

3. They are right, and it is a sad reality that governments give more power to companies than to end users over things they own.

Since you mentioned something about the ROM which was clearly false, that could very well be option 1 or 2.

I mean, why would the government want to restrict users from updating the GPS software or the media player?

I think that's a problem of interpretation more than a hard obstacle.

In many projects there is a requirement to pin a fixed release for third-party dependencies, versions for which all tests have been checked to pass (this is what is done in Node.js with package.json). Sometimes there is even a requirement of reproducible builds (like the ongoing project to reach full reproducibility in Debian builds).
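To sketch what that pinning looks like in practice (the package names and versions here are purely illustrative), a package.json can demand exact versions instead of semver ranges like "^4.18.2":

```json
{
  "name": "example-app",
  "version": "1.0.0",
  "dependencies": {
    "left-pad": "1.3.0",
    "express": "4.18.2"
  }
}
```

Combined with a committed package-lock.json and installing via `npm ci`, every build then gets the same dependency tree, which is the same spirit as certifying one known software configuration.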

Wouldn't these fit the same thinking pattern as the requirements of certification of software for the industry?

I'd love to hear RMS on this subject; maybe he, too, would say that the solution exists inside GPLv3 rather than outside of it.

I'm sure there are ways to solve it, but they probably require both lawyering and developing technical solutions, which unfortunately the industry isn't much interested in doing, partly because there isn't enough pressure directed at them to change the course of things. Such pressure, in my understanding has to come from the regulators, but for that to happen they need to be convinced this is the right thing to do, and that isn't an easy task.

well, then i hope the new rust tools will do this too.
