Cross-platform Rust Rewrite of the GNU Coreutils (github.com)
299 points by doppp on June 12, 2014 | 208 comments

Note GNU coreutils has a good test suite which just calls out to the various tools from shell and perl scripts. It should be easy enough to run this implementation through it. More effort goes into the coreutils tests than the code.
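A black-box test of this kind only needs to spawn the tool and compare its output, so the same checks can be pointed at any implementation. A minimal sketch in present-day Rust, where `run_tool` is a hypothetical helper and `echo` merely stands in for the binary under test (this is not how the GNU suite itself is written):

```rust
use std::io;
use std::process::Command;

/// Runs `program` with `args` and returns (exit code, stdout as text).
/// The exit code is reported as -1 if the process ended without one
/// (for example, when it was killed by a signal).
fn run_tool(program: &str, args: &[&str]) -> io::Result<(i32, String)> {
    let output = Command::new(program).args(args).output()?;
    let code = output.status.code().unwrap_or(-1);
    let stdout = String::from_utf8_lossy(&output.stdout).into_owned();
    Ok((code, stdout))
}

fn main() {
    // `echo` is only a stand-in here; a real run would name the binary
    // under test instead.
    match run_tool("echo", &["hello"]) {
        Ok((code, stdout)) => {
            println!("exit code: {}", code);
            println!("stdout matches: {}", stdout == "hello\n");
        }
        Err(err) => println!("could not run tool: {}", err),
    }
}
```

Because the check only looks at exit status and output, it does not care which language the tool was written in.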

One of the hardest parts of coreutils is keeping it working everywhere, including handling that buggy version of function X in libc Y on distro Z. That's handled for GNU coreutils by gnulib, which currently has nearly 10K files, and so is a significant project in itself.

Some stats:

coreutils files, lines, commits: 1072, 239474, 27924

gnulib files, lines, commits: 9274, 302513, 17476

Actually that makes me think this project may be very useful. If gnulib is that needed for compatibility, then rust-coreutils should be a great test-case of system compatibility for rust itself. It could uncover many things that should be done inside rust rather than this coreutils implementation.

Curious as to the benefits of this. Is Rust performant and on par with the native C version of coreutils? Seems the readme makes a big deal about being easy to compile on Windows (which has never been a problem for me, I just use Cygwin or something).

Also, comments like this make it seem this is not "fully baked" yet and still needs some dev time:

> fn parse_date(str: &str) -> u64 {
> // This isn't actually compatible with GNU touch, but there doesn't seem to
> // be any simple specification for what format this parameter allows and I'm
> // not about to implement GNU parse_datetime.
> // http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob_plai...

From my understanding, Cygwin should almost be considered a different platform than Windows proper. Projects like MSYS do a much better job of being Windows-native.

Rust definitely needs more dev time, but if coreutils already has such an excellent test suite, this sounds like a great way to test Rust in action.

Rust is still behind C, but I feel that's just a maturity issue. C has had nearly 50 years to get to where it is in performance; Rust hasn't had 5.

I am a performance noob, but given that fortran can be faster than C due to increased information about aliasing, and given how much more aliasing information Rust has, I don't see why Rust couldn't eventually even beat C on certain kinds of things. (And obviously, aliasing is only a tiny part of performance, just something that comes to mind.)

That said, I am FAR more concerned about shipping a solid 1.0 than on a maximum performance one. It'll be a while, but we'll get there.

A nitpick: C has been able to achieve the same performance as Fortran since C99 introduced the "restrict" keyword.
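To make the aliasing point concrete, here is a small sketch in present-day Rust syntax (the function name `add_twice` is made up for illustration). In safe Rust a `&mut` reference is exclusive, so the two parameters below cannot point at the same integer. That is roughly the guarantee C code has to assert by hand with `restrict`. Whether a given compiler version actually exploits it is a separate question.

```rust
/// Adds `*src` to `*dst` twice.
/// `dst` is an exclusive reference and `src` a shared one, so in safe Rust
/// they cannot refer to the same integer.
fn add_twice(dst: &mut i32, src: &i32) {
    *dst += *src;
    *dst += *src;
}

fn main() {
    let mut total = 1;
    let step = 2;
    add_twice(&mut total, &step);
    println!("{}", total); // prints 5

    // The aliased call below is rejected by the borrow checker:
    // add_twice(&mut total, &total);
}
```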

I was reading the other day that Windows doesn't really support most of C99, is restrict part of that or not?

Visual Studio ships only with a C++ compiler. That doesn't mean you cannot use other C compilers on Windows.

Still, there seems to be something similar since VS2005. »__restrict is similar to restrict from the C99 spec, but __restrict can be used in C++ or C programs.«: http://msdn.microsoft.com/library/5ft82fed.aspx

PellesC has a new release candidate. LCC-WIN32 (compatible with what standard???) has a steady stream of releases. OpenWatcom v2 also has changes. VS is only good for VisualD.

Ugh, yes, I should have said "MSVC++," or "Microsoft." Thank you.

MS specifically implemented __restrict:

> __restrict is similar to restrict from the C99 spec, but __restrict can be used in C++ or C programs.


As of VS2013, MS supports almost the entire C99 spec.

Would using a GPL test suite with an MIT implementation make the whole GPL? You're not "linking" to it, but it'd worry me somewhat.

>Would using a GPL test suite with an MIT implementation make the whole GPL?

Absolutely not. MIT is a FSF-approved GPL-compatible license and you're free to use it with whatever GPL-licensed packages you wish[0].

However, if you package and distribute the Rust coreutils with the GPL test suite, then the users of the package are obliged to either use GPL, MIT, or some other FSF-approved OS license.

But it's simple enough to package the Rust coreutils without the test suite, which puts the Rust coreutils users under no GPL obligations.

GPL2 is all about the distributing of software, not how you use it.

I do agree that any use of a GPL package will make some folks using this package in commercial software nervous, given the very few (none?) actual court cases that have decided these issues.

0. https://www.gnu.org/licenses/

> However, if you package and distribute the Rust coreutils with the GPL test suite, then the users of the package are obliged to either use GPL, MIT, or some other FSF-approved OS license.

I don't think this is right. The programs in question are the test suite, and one can certainly distribute complete programs licensed under the GPL alongside programs under even non-free licenses (otherwise Mac OS X couldn't ship).

Since the test suite doesn't link against the individual utilities, it won't affect them, license-wise.

It is well understood that the GPL doesn't cross executable boundaries (again, otherwise Mac OS X is in trouble).

You're absolutely right. I was thinking of AGPL, which does cross network boundaries.

And that's why it's to be avoided like the plague. :(

Or, that's why to use it instead of the GPL.

IANAL, but AFAIK as long as the test suite is not distributed as part of the package, it does not affect the licensing of the package as a whole; whereas if the test suite is distributed with the package, then the whole package can only be distributed under GPL, but the original sources the package authors wrote need not be relicensed. So, the sources will be MIT, the test suite will be GPL, and the whole package, including both the test suite and the sources will have to be GPL.

Nor am I a lawyer, but I'll point out that the GPL is only based in copyright, so if there's no chance of copyright infringement then the GPL terms don't apply.

I believe the relevant part of the GPLv3, regarding "whole package" is:

> A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an “aggregate” if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

From my understanding, no. The tools are not linked by code.

I'm free to write a proprietary coreutils test suite and charge $100 for it, so long as it doesn't hook directly into the source of the tools. If it's only interacting with the product, so be it.

I guess there's no problem if people just use them together and obtain the results without redistributing both together.

I was already secretly imagining a future where Rust will completely replace C as the low level language for everything. Even the Linux kernel rewritten in it eventually. It'd be a much better world with no #ifdefs, strong type safety and memory safety depending on compilers instead of fallible programmers.

But I was expecting it would take a while before people had the ambition to start doing these things, even when it comes to the smaller ambition of rewriting the GNU coreutils. Rust is a great language already, but it's not a stable language yet.

I wonder if my imagination will become reality eventually. There's really good buzz around Rust now, and it's not even 'production ready'.

The best advantage of writing libraries in C is that it allows the library to be usable via FFIs in many other languages. Also, C is extensively standardised, widely used and thus greatly test-covered. This is a nice property of the language, as the programmers can be sure that as long as they follow the standards, they'll find a compiler to compile the programs they write. I have not used Rust for more than «Hello world!», but I know that I'd regret it sooner or later, if I wrote a multi-million line project in a language that is not somehow standardised.

Also, memory safety is hard to enforce, I reckon, while controlling the memory hardware, which is what an operating system like Linux* has to do. Rust can be used, but still, there are many unsafe things that OS has to do, like moving and assigning pages, managing which programs use which pages, swapping, etc.; which, I reckon, requires knowledge and usage of physical memory addresses and access to individual bytes.

Rust may be an appropriate and advantageous language to use for the non-core parts of a microkernel, though.

* I acknowledge that Linux is not an operating system per se.

Rust has memory safety by default. However, it can do memory unsafe things in special blocks[1]. So it can do everything low level that C can, including controlling memory hardware [2]. It does unsafe things only in special blocks that can be specifically audited. That's part of why I find it so promising.

C has no memory safety by default. So if you want to audit your C program for memory safety problems, those problems can be anywhere. Even if that C program/component doesn't actually need such low level access in the first place!

[1]: http://static.rust-lang.org/doc/master/rust.html#unsafety

[2]: Here's a list of (hobby) kernels written in Rust: https://github.com/mozilla/rust/wiki/Operating-system-develo...
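As a rough illustration of the "special blocks" idea, written in present-day Rust syntax (the function name is hypothetical): creating a raw pointer is ordinary safe code, and only the dereference has to sit inside `unsafe`, which is the part an auditor would look at.

```rust
/// Reads a value back through a raw pointer.
fn read_via_raw_pointer(value: &u32) -> u32 {
    // Creating a raw pointer is allowed in safe code.
    let ptr: *const u32 = value;
    // Dereferencing it is not, so that single operation sits in `unsafe`.
    // It is sound here because `ptr` was derived from a live reference.
    unsafe { *ptr }
}

fn main() {
    let x = 42;
    println!("{}", read_via_raw_pointer(&x)); // prints 42
}
```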

Every time I say something negative about Rust, the counterarguments push me to learn the language. I'll definitely check out the kernels page, thanks a lot for that.

With your comments I have to concur. I have also read a comment on name-mangling somewhere on this thread, probably one that you've written. If C can be replaced with Rust for most-if-not-all applications in future, it is a step forwards IMO.

In my opinion, your argument about only C being usable from FFI is a strong one against using Rust if you're making a library.

It is possible to expose Rust data and functions to Python, Ruby, etc, since Rust is ABI-compatible with C. Unfortunately, you lose most of the Rust standard library since the Rust runtime is not freestanding. And I'm not sure how -- or if -- Rust will fix this issue. It's certainly desired, but it will be a lot of work.

The only bright side to all of this is that Rust is such a modern and pleasant language to use that scripting language bindings are almost not needed. Almost. But it's still enough of a low-level language that bindings are highly desirable for quick prototypes, exploratory programming, and other scripting tasks.

  > I'm not sure how -- or if -- Rust will fix this issue.
We're addressing this as we speak! We've split out the stdlib into many constituent libraries that depend on each other, and then just made the stdlib a convenient facade for these libraries. After opting out of the stdlib, you can then opt into any of the "core" libraries as you like, a la carte. Many of these smaller libs don't rely on the runtime (they don't depend on librustrt) nor do they rely on allocation at all (liballoc), and can thus be used in a freestanding environment. See the other comment in this thread for more details:


Oh wow, that's great to hear. Any idea when/if basic, safe network/file IO will be possible without the runtime? Or is it already?

> The best advantage of writing libraries in C is that it allows the library to be usable via FFIs in many other languages.

Rust has allowed this for over a year at this point. For example, see the (old) blog post "Embedding Rust in Ruby": http://brson.github.io/2013/03/10/embedding-rust-in-ruby/

And the third production deployment of Rust is at Tilde for skylight, which uses a Ruby extension written as a Rust library with a thin C wrapper, due to Ruby's extensive C macros which would be annoying and bug-prone to port.

> Also, C is extensively standardised

Not really, at least not in the sense that you mean here: The generation of machine code in shared objects. There is a distinct lack of definition between C syntax and the resulting machine code.

> The best advantage of writing libraries in C is that it allows the library to be usable via FFIs in many other languages.

You can create C shared objects from Rust (or from a ton of other languages which generate native code). No big deal. It can do what you want.
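For the curious, a function with the C calling convention looks roughly like this in present-day Rust (`rust_add` is a made-up name). This sketch only shows the ABI side. Exporting the function from a shared library under a stable symbol name additionally needs the `no_mangle` attribute and a C-compatible crate type, which are left out here.

```rust
use std::os::raw::c_int;

/// A function that uses the C calling convention, so foreign code that
/// understands the C ABI could call it.
pub extern "C" fn rust_add(a: c_int, b: c_int) -> c_int {
    // Wrapping arithmetic avoids a panic on overflow in debug builds.
    a.wrapping_add(b)
}

fn main() {
    // It can still be called directly from Rust.
    println!("{}", rust_add(2, 3)); // prints 5

    // It also coerces to a C-style function pointer, which is the form a
    // foreign caller would ultimately use.
    let as_c_pointer: extern "C" fn(c_int, c_int) -> c_int = rust_add;
    println!("{}", as_c_pointer(40, 2)); // prints 42
}
```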

> There is a distinct lack of definition between C syntax and the resulting machine code.

And more importantly: there is a distinct lack of similarity in behavior in this respect between vendors, versions, architectures, language features, ambient temperatures, and times of day. Change any of those things and you can get different output from your C compiler.

But not C include files. I think Nimrod has something here. Vala? I do not know.

That's just because Rust is still in development, and tooling like that is developed incrementally. There's already an issue about automatically creating C headers representing a Rust library, https://github.com/mozilla/rust/issues/10530 (and Alex Crichton has even started working on a tool for it, see his comment).

Like creating servers or user space drivers. That is what IOMMU is for.

Total language displacement is a pretty rare event, and technical superiority isn't even a prerequisite.

And I happen to like having a preprocessor.

Yes, it's far too early to tell. The good thing to take away is that, at the moment, the answer to "is Rust a viable threat to C/C++?" does not seem to be a categorical no. The same cannot be said for other languages in the area, such as Pascal, D, Ada, Nimrod and even Go, although all of these languages have their use.

Any adoption of Rust in areas where people otherwise use C/C++ is an improvement I think. It doesn't have to be total displacement, and that will never happen.

D is a viable alternative to C++, period, not necessarily to C. I believe a rewrite of Xorg in D would be good. The only problem with all these languages is that you cannot e.g. create a dll in Ada and generate appropriate headers to use it in C, even more in Go. In this respect I feel that this is the real problem here. The biggest obstacle is the C++ legacy code.

D is absolutely a viable alternative to C++ and probably C. There's no argument there.

D is not a viable threat to C/C++ (this is what I said!), as no one is really thinking that D will actively displace the use of C/C++ in many projects in the future. The same goes for all the other languages I mention. In my perception there's a buzz around Rust that indicates it might be viewed differently.

Can you use D without GC? What would it look like? If very restricted, then I don't see it as a viable C++ alternative. I'm ok with GC on a web server, but coreutils are better without it.

It is possible to allocate objects in the stack (objects without polymorphism, defined with the struct keyword). No use of heap and GC in that case at all.

It's not that different to using C++ objects in the stack.

If by no GC you mean to use heap allocated objects with manually managed destructors, then I think the answer is no.

I'm looking at "http://dlang.org/garbage.html" section "D Operations That Involve the Garbage Collector", and there are some pretty interesting things on the list that require GC: all dynamic arrays, associative arrays, closures, asserts. Can you even read a file without these?

In college, a group of us (mostly my friends) wrote a kernel in D, with no GC: http://xomb.org

It's true that D's stdlib still uses GC, but the D team has been working on eliminating that.

You can write the non-GC part in C and call it from D. What is the percentage of an app that is going to get harmed by GC? I believe it is manageable in C or even FreeBasic or even Freepascal or even Assembly. I used to believe that C++ is the best. Not the last three years.

Any systems programming language can displace C.

You just need a successful OS vendor pushing the said language as only means to target their OS.

I think the reason many people are rooting for Rust is because it's got comprehensively better features than competing languages. I'm not talking just about memory safety, but also its adoption of functional language features. I'm of the opinion that immutability-by-default, pattern matching, first-class functions and real macros combined with no garbage collection are a Very Good Thing™ for this type of language (and I'm not the only one).
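A short sketch of three of those features in present-day Rust syntax, with all names (`Shape`, `area`, `apply_twice`) invented for the example: bindings are immutable unless marked `mut`, `match` destructures an enum, and a closure is passed around like any other value.

```rust
/// A small enum to destructure with `match`.
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

/// Pattern matching: each arm binds the fields of one variant.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

/// First-class functions: `f` is any callable from i32 to i32.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

fn main() {
    let n = 3; // immutable by default
    // n += 1; // would not compile: `n` is not declared `mut`
    let mut m = n; // mutability has to be requested explicitly
    m += 1;

    println!("{}", apply_twice(|v| v * 2, m)); // prints 16
    println!("{}", area(&Shape::Rect { width: 2.0, height: 3.0 }));
    println!("{}", area(&Shape::Circle { radius: 1.0 }));
}
```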

So yeah, I disagree that any systems programming language can displace C. Other system languages simply don't come with the same amount of improvements that Rust offers.

> So yeah, I disagree that any systems programming language can displace C. Other system languages simply don't come with the same amount of improvements that Rust offers.

So, given my premise, how do you target super-cool-everyone-wants-it-os that only offers new-cool-language as their system programming language in the OS SDK?

Are you going to write a C compiler in new-cool-language to be able to keep on using C?

> super-cool-everyone-wants-it-os that only offers new-cool-language

I'd suggest that these are fundamentally incompatible statements. "Only offers new-cool-language" is another way of saying "it's a pain in the ass to port to - you have to rewrite your whole app!" if you're coming from anywhere that's ever touched not-quite-as-cool-language.

Why not cross compile?

You could as well.

In that case the C runtime has to be implemented on top of the new-cool-language ABI, as it is the only way the vendor supports creating software for their system.

What I am describing is not that different from thirty years ago when C was a UNIX only language, and other OS had other languages as their systems programming language.

Having a preprocessor is nice. Having a real macro system is nicer.
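For readers who have not seen one, a Rust macro works on syntax trees, not raw text. A minimal sketch with a made-up macro, `max_of!`, which accepts one or more expressions and evaluates each of them exactly once:

```rust
/// Expands to the largest of one or more expressions.
/// Each argument is bound to a local first, so it is evaluated only once.
macro_rules! max_of {
    ($x:expr) => { $x };
    ($x:expr, $($rest:expr),+) => {{
        let a = $x;
        let b = max_of!($($rest),+);
        if a > b { a } else { b }
    }};
}

/// A small wrapper so the macro can be exercised like a function.
fn largest_of_three(x: i32, y: i32, z: i32) -> i32 {
    max_of!(x, y, z)
}

fn main() {
    println!("{}", max_of!(3, 9, 4)); // prints 9
    println!("{}", largest_of_three(7, 1, 2)); // prints 7
}
```

The textbook C equivalent, `#define MAX(a, b) ((a) > (b) ? (a) : (b))`, evaluates one of its arguments twice, which is the kind of trap a syntax-aware macro avoids.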

I consider them to serve different purposes. For stuff like the Linux Kernel, the preprocessor is great because it can be used with any number of different types of files the Linux Kernel has in its source. The Kernel configuration basically just becomes a bunch of #define's, and you can access that information from C, assembly, linker scripts, etc...

How is this nicer than a configuration file template?

the nice thing about cpp is that you can use it on any file.

... unless the file contents happen to clash with CPP syntax.

I hope that never happens, because I think such a world would be extremely dystopian. Strong safety guarantees sound very beneficial at first ("safe" anything in general tends to evoke that response), but as shocking as it may seem, a lot of the freedom we have today is a result of using "unsafe" languages like C: iOS jailbreaks, Android rooting, homebrew on game consoles, and many other examples of freeing one's devices from restrictions imposed by the original manufacturer and giving control back to the user depend on this. These bugs and vulnerabilities, although they can be exploited maliciously, are in some ways like an escape hatch from the tyranny of corporate control. If strongly safe languages become the norm, all these would disappear and our computing experience become even more locked-down and restricted. The thought of them being used for DRM is scary (I remember reading an academic paper that suggested this as one of their applications, and that's one of the things that provoked me into really starting to think about this.) Even open-source software where safer languages appear most beneficial can be used against the user.

Ultimately it's a security vs. freedom tradeoff, and I think we've sacrificed too much of the latter already.

I'm baffled by that opinion. The fact that memory-unsafe software keeps on being produced in C/C++ is a good thing? The capabilities of criminals to run local (or remote!) root exploits on many devices is offset by the fact that some hackers can tinker with them?

Freedom should be achieved in different ways. Please don't make freedom and security contradictions.

Practical experience shows that once you run into even routine systems programming elements, like writing device drivers and whatnot, you eventually have to concede some safety, or end up with an unprogrammable behemoth of such baroque complexity that you likely have more security holes, just of some other kind. There has been a lot of research into this back in the 1990s; most of the useable results produced security by architecture rather than tools.

It's almost as if what one would want is a language where you could shrink unsafe operations to as small a footprint as possible by confining them within a block with a special keyword, like this:

     unsafe { … }
Someone should invent such a thing.

Edit: …yeah, I was being tongue-in-cheek. This is exactly what Rust provides.


"...When a programmer has sufficient conviction that a sequence of potentially unsafe operations is actually safe, they can encapsulate that sequence (taken as a whole) within an unsafe block. The compiler will consider uses of such code safe, in the surrounding context…"

If I call a function that is running unsafe { } inside of it, do I know? Because I really want to know. And I want my function to be marked as unsafe as well (because it is) as well as any function calling my function, etc.

The way that you mark a function as unsafe is to stick a keyword in front of it:

  unsafe fn foo() { ... }
Any function marked as such is then allowed to call other unsafe functions:

  unsafe fn bar() { foo(); }
But there is a way to break the chain, which is to use an `unsafe` block without marking your function as unsafe:

  fn qux() {
      unsafe { foo(); }
  }
That said, it's incorrect to think that Rust is any more unsafe than any other language because of this; most languages simply defer this behavior to their FFI. By pulling it into the language itself, Rust is actually safer than e.g. calling C from Python, because Rust can do the low-level fiddling while still retaining at least some of the safety checks of normal Rust code. Even unsafe Rust is safer than C.

> Even unsafe Rust is safer than C.

This is an important point. `unsafe` blocks only let you do a few extra operations[1], not anything you want. A lot of safety checks still happen inside of unsafe blocks.

1: http://static.rust-lang.org/doc/master/rust.html#behavior-co...

Well, no, you can still theoretically do anything you want, you just need to be very, very explicit about it. :)

Some things are undefined behaviour[1]... so you really don't want to do them (i.e. you can do them inside `unsafe`, but the compiler optimises/reasons assuming they never happen: if they occur at all, you could have an arbitrarily broken program).

[1]: http://doc.rust-lang.org/master/rust.html#behavior-considere...

The point that I'm trying to make here is that you cannot make any assumptions about an unsafe block. Anything can happen, including really terrible undefined behavior. But the fact that anything can happen is why Rust is as powerful as C in this area.

My point is that while anything _can_ happen, it's not like Rust just turns off every single check. Yes, they can be gotten around, but it's not like the type system suddenly goes away.

You don't know, but you can use a compiler flag that will tell you if you are.

> I want my function to be marked as unsafe as well (because it is)

This misunderstands the nature of `unsafe`. `unsafe` means "I promise that this is actually safe, even though it can't be inferred." Take, for example, a type which has mutable, shared state, but enforces all access through a mutex[1]. This type presents a safe interface, but the compiler can't know that it's safe, so you have to use `unsafe` internally.

1: http://static.rust-lang.org/doc/master/std/cell/struct.RefCe...
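A much smaller example of the same pattern than a mutex, sketched in present-day Rust with a hypothetical function `first_or`: the unchecked access is only sound because of the emptiness check right before it, and callers see an ordinary safe function.

```rust
/// Returns the first byte of `bytes`, or `default` for an empty slice.
/// The signature is safe; the `unsafe` block is an implementation detail.
pub fn first_or(bytes: &[u8], default: u8) -> u8 {
    if bytes.is_empty() {
        default
    } else {
        // SAFETY: the slice was just checked to be non-empty, so index 0
        // is in bounds.
        unsafe { *bytes.get_unchecked(0) }
    }
}

fn main() {
    println!("{}", first_or(&[4, 5, 6], 9)); // prints 4
    println!("{}", first_or(&[], 9)); // prints 9
}
```

Marking `first_or` itself as `unsafe` would push an obligation onto callers that they have no way to violate, which is why the chain stops at the block.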

> If I call a function that is running unsafe { } inside of it, do I know? Because I really want to know.

Pretty much every function is going to be transitively running unsafe code, since the core libraries use it to implement the base primitives.

It should be noted that the unsafe code in the core libraries is kept to a minimum and (at least in theory) heavily scrutinized.

All code in mozilla/rust gets a code review before landing, so there's always at least 2 sets of eyes on it.

I think a lot of us are somewhat happy with freedom being maintained by the laws of the land. Granted, those do seem under attack, as well. Oddly, by a similar mantra as what leads type safety.

Consider, in a "type safe" law system, intent doesn't matter. Only what you did. Now, look at the laws that run on that principle.

Now, I personally think this is probably orthogonal to type safe programs. But I'm not completely convinced, yet. Type safe programming seems hamstrung by the fact that a type system only really protects that which is specified in the type system. Which seems to mean you can't easily have a variety of ways to do things, as ultimately the type system has to grow to encompass all of the system.

Now, I grant I'm probably just soured by some bad systems in the past where a change to one part required a rework of the entire system.

Most type safe systems have escape hatches that allow for conversion between types.

Also, IME, type safe languages don't hobble me as a programmer. However, the systems and things I work on (and am trying to get work on) have generally well considered specs associated with them. Be it radio protocols, file formats, or whatever. A type safe language (haskell, sml, ocaml, ada, rust (it seems, not explored it much yet), etc.) can really help with these projects. Instead of needing a dictionary translating what magic # 134 means if it's used in this context (god, these old fortran programs break me), we can actually specify things in code in sane, clear ways that map from spec/design to code cleanly. OO, in theory, can help us with this, but it also adds a lot of overhead to create a ton of classes when really we want integers that the compiler knows some are meant to be temperature and some are meant to be pressure. If you start adding temperature and pressure, it flags it. If there's some formula where that actually makes sense, you deliberately, intentionally, explicitly handle the conversion. This is a good thing. Slightly more verbose code, but such a boon for maintenance and v&v work.
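The temperature/pressure idea maps onto what Rust programmers usually call the newtype pattern. A minimal sketch in present-day Rust, with the type names and the conversion helper invented for illustration:

```rust
use std::ops::Add;

/// A temperature in degrees Celsius, distinct from a bare f64.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Celsius(f64);

/// A pressure in kilopascals, distinct from both f64 and Celsius.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Kilopascals(f64);

/// Only Celsius + Celsius is defined; no impl mixes the two units.
impl Add for Celsius {
    type Output = Celsius;
    fn add(self, other: Celsius) -> Celsius {
        Celsius(self.0 + other.0)
    }
}

/// The deliberate, explicit escape hatch for formulas that need a number.
fn pressure_as_number(p: Kilopascals) -> f64 {
    p.0
}

fn main() {
    let t = Celsius(20.0) + Celsius(1.5);
    let p = Kilopascals(101.3);
    // let bad = t + p; // compile error: mismatched types
    println!("{:?} {}", t, pressure_as_number(p));
}
```

The wrappers carry no extra fields, so the cost is a few lines of declaration, not a class hierarchy.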

Doesn't shock me too much. The experience I have had is when folks basically try to do their programming in the type system. This can work for some cases, maybe all if you "do it right." I've not been lucky enough to see that. Instead, I like it when the types are basically there to help, not force, you.

Yes, it's definitely controversial but I think a little bit of insecurity is a good thing but not too much -- the question is how much. Consider that had the NSA been far more secure in their processes, the whole Snowden thing may have never happened... which is somewhat ironic.

> Freedom should be achieved in different ways. Please don't make freedom and security contradictions.

In an ideal world, they wouldn't be. Unfortunately that's not how it is in reality.

So what do you think about the NSA's capabilities to easily rootkit your "free" exploitable device? I don't think this is a consistent worldview.

To play a devil's advocate, I want to answer this directly as answered.

Available information indicates that NSA used various coercion tactics to ensure cooperation from various companies and that it has the methods to ensure hardware modifications in its own interests.

So the damage to NSA from improving security too much is less than damage to my ability to use the computing device I have paid for as an actual computing device.

Of course we should also consider smaller threats.

But the better the platform is locked down, the fewer limitations there are for advertiser-friendly platform design, which makes people expect a lot of permissions requested by applications, so phishing and trojans become easier.

The problem here is about the effects of scale: it is hard to produce just a few thousand phones with good specs.

That freedom also means I have the freedom to modify my device after exploiting it in order to remove those exploits. On the other hand, a device that is completely unexploitable due to being written in a strongly safe language, would mean I no longer have that freedom - and while it may be unexploitable to everyone else, I'm pretty sure the NSA has its ways to get in anyway; if it does, then there's no escape.

So, you like languages that are prone to obscure bugs because they allow you to remain competitive with the NSA?

Freedom is not about consistency in everything.

I've never said that freedom is about consistency. Opinions should be consistent if they are to be taken seriously.

This guy wants to intentionally leave software exploitable, ready to be unknowingly hacked by criminals or governments, and call it freedom. And he gets upvoted for it!

What's not consistent?

Also, it seems that "criminals" are the new "terrorists".

This is a really frightening post. There is a real tradeoff between security and freedom, but memory safety is not where it manifests. It manifests in the fact that if you give the user power to mess up their system, sometimes they actually do it. However, the dichotomy between freedom and basic memory safety is false and needs to be fought tooth and nail. Your believing it is a minor victory for the corporations you excoriate.

When we build the next generation of free operating system, I want it to be as secure as humanly possible. Your device isn't free if the NSA or some criminal has owned it.

How is it a false dichotomy if things like jailbreaks and console homebrew essentially rely on the lack of memory safety to work?

> Your device isn't free if the NSA or some criminal has owned it.

It's not free if you can't own it either - and that's where the problem lies: how do we stop the secure systems we create from being used and secured against us.

It's a false dichotomy because there's a possible world where you don't have to resort to that crap to get control of the device you bought, where you're in charge at the start (looking back, I wasn't explicit about this, sorry). That's what we should be working on, not holding back the state of the art of programming languages. That's the wrong place to attack the problem.

That would be an ideal world, and I agree that safer languages definitely do make sense there, but the big question is how do we get there.

> That's the wrong place to attack the problem.

Then what do you think is the - currently practical - way to attack the problem?

I think there is a bit of misunderstanding I created in my original post; I'm not completely against security and hate buggy software as much as anyone, but only pointing out that we rely on insecurity for many good things and freedom we have today, and thus we should give more consideration to the implications of using more secure languages before they get to the point of becoming so popular that there is no turning back. They're just such a good fit to be used in trusted computing/DRM systems, and that's what scares me the most.

That fear is akin to governments being afraid of cryptography. The genie's out of the bottle, we know how to make better languages/development environments. The question becomes, do we deliberately hobble ourselves or do we accept the reality and move forward. The "enemy", if they're rational, will be moving towards these better tools. On the other side of the fence, we should as well. Otherwise we're leaving our own code and systems open for abuse.

Ultimately, the only way to correct this is the active encouragement of DRM-free content. The active development of FOSS software. The active development of open-specced hardware. Let's take these tools that let us develop things with fewer errors (or where errors are made blindingly obvious) and make this free world we want.

More freedom for mediocre or bad coders = more ways to fuck up. More freedom for good coders = more ways to make great things (and fuck up).

So basically it's a tradeoff. If people working on the Linux kernel were very careful and good hackers, writing in C would be much less of a risk than letting fresh-out-of-college programmers tinker with memory on the low-level. But even very careful and good hackers can screw up sometimes—see what happened with OpenSSL.

I don't see a reason not to use Rust except for performance, or when the unsafe features of C are required to get the job done. I can't say a lot here because I've only seen Rust's syntax and feature list and never programmed in it. (On the other side, I've written a fair amount of (functioning) code in C, and I feel that debugging race conditions coupled with memory management bugs has shaved a few years off of my expected lifespan already.)

Rust shouldn't be a performance problem. The language semantics allow for all the things that make C fast, as well as doing a better job at aliasing.

Idiomatic Rust will drop a little bit, to the level of idiomatic C++, but there's nothing stopping you from optimizing that to C levels in exactly the same way when necessary.

What about bounds checking and stuff? Sorry for my ignorance on the matter.

It appears the compiler optimizes out bounds checking if and only if it can tell at compile time that the index is correct [1]. Arrays of dynamic size or with dynamic references are bounds checked. In dynamic cases you should be writing manual bounds-checking guards in C anyway.

[1]: https://github.com/mozilla/rust/issues/9024

Not only do iterators help avoid them in most cases, I'm anxiously awaiting Intel MPX for more hardware support for bounds checks.

It seems you can use unsafe versions of vectors/arrays/strings to bypass bounds checking. Essentially, if you're very confident that your code is correct, then you can speed it up (is it worth it?) to C levels.

On the other hand, a lot of bounds checking is probably going to happen where you'd manually enter it into C/C++ code anyways (if you wanted to avoid certain errors).

Additionally, idiomatic Rust code will employ iterators rather than direct indexing where possible. Rust's iterators are memory safe and do not perform bounds checking.
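
To make the bounds-checking comments above concrete, here is a minimal sketch of the three styles being discussed: plain indexing, iterators, and the explicit unchecked accessor. It is written against current stable Rust rather than the 0.10-era compiler this thread is about, and the function names are made up for illustration.

```rust
/// Sums a slice by indexing. Each `values[i]` is a checked access unless
/// the optimizer can prove that `i` is in range.
fn sum_indexed(values: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

/// Sums the same slice with an iterator. The iterator walks the slice
/// directly, so there is no per-element index to check.
fn sum_iter(values: &[u64]) -> u64 {
    values.iter().sum()
}

/// Opts out of the check explicitly. The caller must guarantee that every
/// index is in range; here the loop bound does that.
fn sum_unchecked(values: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..values.len() {
        // SAFETY: `i < values.len()` by construction of the range.
        total += unsafe { *values.get_unchecked(i) };
    }
    total
}
```

All three return the same result; whether the unchecked version is measurably faster depends on what the optimizer already managed to prove for the indexed one.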

Good post. We need to remember that security is a transitive verb: we secure something against something or someone. As you say, DRM is "securing" a device against the user.

Fortunately Rust has unsafe { ... }

It's interesting to think of an insubordinate dev being tasked to write some horrible locked-down system in Rust and going, "yeah, this thing has to be in an unsafe block because xyz, oh, woops, I guess I introduced a jailbreak opportunity".

Not that jailbreaks aren't a good thing, because I'd hate to be forced to use a locked (aka dumb) device.

Agreed... I'm quite worried about unsafe code in libraries that I'd use.

I doubt that most jailbreaks have anything to do with buffer overflows or pointer bugs in C code.

The iOS jailbreaks used to (haven't followed them recently) involve exploits in C/C++/Objective-C code. They've taken advantage of various system applications to install the jailbreaks. Some used exploits in PDF handling, others in USB handling.

You don't achieve freedom through bad software. What the hell? Note that the NSA spying on you is also facilitated by all the complex, buggy, broken software we use.

I don't think it makes sense to rewrite the Linux kernel in Rust. As long as we're dreaming, let's talk about a new OS without all the legacy cruft. Be sure to make it hard to lock the user out, so userbinator is happy too, but (again) as long as we're dreaming that's on the agenda too.

Developing a lot of relatively small programs seems like a good way to get experience with the language, find issues with it, report them, etc.

"uutils is licensed under the MIT License - see the LICENSE file for details" - funny to see GNU software reimplemented in another open source license.

I do not understand the down votes. Can somebody explain?

I did not mean to hurt anyone's feelings. I was just surprised to see the license of a piece of software with "GNU" in the title. I naturally assumed that it would be under a GNU license. So it was a surprise to discover MIT.

Now I understand. I should have written "surprising" instead of "funny". Sometimes writing about a subject gives you the answer.

Sorry for those that I hurt.

Because there is a large anti-GPL contingent on HN.

Is it anti-gpl or anti-gpl3? It seems from a public relations perspective gpl3 really lost a lot of mindshare in the open source world, including me.

It's anti-GPL. Remember, a large target audience of HN are entrepreneurs, who want to use open source components in the closed-source applications that they're selling. The (A)GPL prevents them from doing this (not selling, but keeping it closed source, which many see as being important to selling), so they get pissed.

A large part of HN is developers who like to release good code and let anyone use it as they see fit.

GPL has neither of those goals and actively works against the latter.

> actively works against the latter.

For your definition of 'anyone.' BSD software often does not let the end user use it as they see fit, as they no longer have access to the source code.

How does BSD remove access to source code?

Someone else may add some modifications to my code and I may not be able to see those modifications, true. But nothing's been removed.

That's not true for me and I suspect for a lot of other people here.

If I'm going to release code under a license, I need to understand every line in that license. GPLv2 is pretty understandable, GPLv3 is ponderous in places and ambiguous in others (needs lawyers and lawsuits to understand), and AGPL is simply around the bend. Loopholes you could drive a truck through.

I don't want to use GPLv3 or AGPL until their ambiguous bits have been clarified, but their adoption has been pretty low so that may never happen. Whether that's a good thing or a bad thing depends on your point of view.

Also, the GPLv2 vs GPLv3 fragmentation has turned off a lot of people.

I want to spend my time coding, not worrying about license issues and lawsuits.

The GNU coreutils are of course themselves a rewrite of the original Bell Labs / System V tools, plus a few decades of minor added features.

Free software reimplementation projects go to great pains to avoid code written by people who have looked at proprietary source. I hope this rewrite extends the same courtesy.

I'm a big GNU fan, and actually prefer to use the AGPL, but I don't understand this opinion. It seems to contradict the whole concept of free software.

EDIT: Thanks for the informative responses! This sort of effort might ought to receive the occasional audit, which probably could be done in semi-automated fashion.

Looking at source for something implementing the same thing opens you to accusations of being a derivative work. If the licenses are compatible, then this is no downside. If the licenses are incompatible, this makes it harder to defend against a lawsuit (even if you probably, in principle, remain in the right).

The MIT license is compatible with the GPL; you can use MIT-licensed code in GPL projects.

Yes, but the reverse is not true; you can't use GPL-licensed code on your project and still distribute it all as MIT-licensed.

It's true that the GPL-licenced code will remain GPL licenced, the MIT-licenced code will remain available under MIT though.

Yes. Compatibility is not a commutative relationship.


That would have been more correct, yes.

If someone were to write MIT licensed code that was heavily "inspired"[1] by reading GPL licensed code, people could then take that MIT licensed code and make proprietary clones and forks, effectively bypassing the GPL restrictions. It'd be essentially "code laundering".

[1] in a copyright infringing way, as defined by the courts in your preferred jurisdiction

So if I replace coreutils with this on my Linux box, I won't have to call it GNU+Linux anymore?

That's kinda Stallman's position; however, he still insists in his talks on using GNU/Linux, even if you want to ask questions about the general Linux ecosystem, such as embedded development or Android, which have few to no GNU components.

How stable and usable is Rust's ABI? Can I easily call Rust code from C or other languages that can easily call C functions? I ask because I think this is one area in which C++ has really failed. C++ is a bad language for building frameworks and libraries for consumption from other languages because interfacing with C++ code is such a nightmare. I'd hope any language that aspires to displace C++ has a better story here.

Rust's internal ABI (i.e. used for a default `fn foo() { ... }`) is not at all stable/usable/specified.

However, you can easily define a function with a C ABI (and a non-mangled symbol name):

  #[no_mangle]
  pub extern "C" fn foo() { ... }

I'm not sure if you regard this as better than C++'s situation.
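
A slightly fuller sketch of the C-ABI point, assuming current stable Rust; `add_i32`, `CBinaryOp` and `call_via_c_abi` are hypothetical names. It only demonstrates the calling convention from within Rust. Exporting the symbol for a real C caller would additionally need the `no_mangle` attribute and a library crate type.

```rust
/// A function using the C calling convention. To also export it under the
/// unmangled symbol name `add_i32`, it would additionally need the
/// `no_mangle` attribute.
pub extern "C" fn add_i32(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// The C-ABI function pointer type a C caller would see, roughly
/// `int32_t (*)(int32_t, int32_t)`.
type CBinaryOp = extern "C" fn(i32, i32) -> i32;

/// Calls through the C-ABI pointer, as foreign code would.
fn call_via_c_abi(op: CBinaryOp, a: i32, b: i32) -> i32 {
    op(a, b)
}
```

The wrapping add is a deliberate choice here: a panic unwinding across a C boundary is a problem, so the sketch avoids overflow panics altogether.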

You might be interested in the Portable C++ ABI proposal: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n402...

Considering how quickly C++ compilers are iterating, it might even be available in GCC/Clang before Rust 1.0.

While they're at it, I hope they write complete man pages rather than pointers to info pages.

I see what you're saying, but then you'd have people complaining there was too much text in the man page. Note the info pages are available on the web so you could have a wrapper like:

    cinfo() { xdg-open "http://www.gnu.org/software/coreutils/manual/html_node/$1-invocation.html#$1-invocation"; }

Which you could use like:

    cinfo dd

While I know the basic differences between info and man pages, can you shed some more light on your preference here for the less well informed.

Man pages are more convenient for the typical unix hacker - command-line oriented, vi-using.

They are rendered in less, which supports vi-like searching.

Info, on the other hand, is one of those odd pre-www hypertext systems. Advanced for its time, it is now clumsy and nonstandard.

What asher said. Man pages are convenient and man works nicely with other Unix command line tools. Info behaves more like a browser and plays by its own rules.

It's not that I want to discourage anybody from using their preferred documentation resource. If a developer would rather spend their time writing info pages, that's okay. But at least dump it into a man page. Even an oversized man page like the one for mplayer is preferable to a pithy page telling me to use info. If I want something with hyperlinks I'll check the web.


info > man

> However, [other ports of coreutils to Windows] are either old, abandoned, hosted on CVS, written in platform-specific C, etc.

I don't get how on earth using CVS for version control can be an appropriate reason to consider a software project bad. Yes, CVS is old and centralised, but is it that big of a deal that its usage by a project per se marks the project as old and inactive?

I know from my own experience—when you want to start a project that has been done before you, it's easy to fall into the trap of faultily rationalizing why none of the other attempts have worked and why you're going to do it better.

However, I don't think you should ever avoid writing something you'd enjoy for the sole reason that there are prior attempts (successful or unsuccessful). If you aren't motivated by money, you won't lose anything if your project doesn't get a single user, but you will always gain enlightenment and great experience.

I understand your comment and support it -- I have, many times, thought of implementing a POSIX userland in Go, but the task has daunted me, so that has not been realised till now. What bothered me was the fact that they show usage of CVS as an alert for deprecation. I concur with your argument though: no one can judge anyone else for what they choose to create in their own time. But CVS is perfectly usable software (and I am saying this as someone who has spent a month trying to CVS-checkout the OpenBSD source tree, only because $CVS_RSH=rsh rather than ssh).

> What bothered me was the fact that they show usage of CVS as an alert for deprecation.

CVS still works just as well as it ever did, but it is super crusty at this point. If an open source project hasn't transitioned off of it, that's a sign to me that the maintainer either doesn't know any modern DVCSes, or doesn't care about the project enough to transition to them.

It's not a guarantee that the project is dead or outdated, but it's a smell.

Of course, I consider still being hosted on Sourceforge to also be a smell.

> CVS still works just as well as it ever did, but it is super crusty at this point. If an open source project hasn't transitioned off of it, that's a sign to me that the maintainer either doesn't know any modern DVCSes, or doesn't care about the project enough to transition to them.


Yes, I know. The *BSDs are the exception that proves the rule. I also said it was "a smell" and not an outright dismissal.

I agree with you completely. CVS is old, but that doesn't make it bad per se. If it could serve the needs of source control before, I doubt the needs have changed that much to require a different SCM tool. We just see new solutions that work better so we naturally migrate to them.

My point is that the argument for CVS is probably just a forced, faulty rationalization by the author so he/she can justify writing a new implementation. But I'm repeating again that there is nothing wrong in reimplementing something just for the sake of it—you have nothing to lose and a lot to gain :)

I would go further and say that none of these reasons is particularly compelling, especially without further qualification.

(psss, they were just trying to be funny)

This looks like it's handy for learning purposes, but is there actually any practical use for this? I don't think that the code size and the pace of development really require changing the core language.

Past re-implementations have focused either on the learning experience, code size (for embedding or Unix "purity" reasons) or the license (i.e. not GPL).

Most of these tools are small enough and simple enough that they sort of "get finished." Probably not interesting but to some purists.

There is something to be said for the nice clean implementations of some of these tools. There is a little to be said for the language itself.

Really, something bigger and better is needed. A new HTTP server, a new sendmail or something, a new DNS server. I'd love to see Go or Rust take on Bind and produce a safe, secure, high performance implementation. The Ada guys never stepped up and produced anything interesting to show their tools' superiority, there is a lot more interest and community in Rust and Go. What's the leakiest, buggiest part of the equation right now? Go make a better one.

Well, they're building a browser engine...

Nice. However, I feel there are two things that really need to be done here: move all that cat|du|... to some src|source|whatever directory and add a 'tests' directory with a somewhat more robust test layout than the current one. It would be really cool to replace (seriously, just for art's sake) coreutils with this one, but I wouldn't dare to do it if I'm unable to easily verify that they both work and work at least almost as fast as coreutils on my platform. It's important stuff, you know, otherwise it wouldn't be called "coreutils", really…

It will be interesting when they get down the to-do list to fmt, which contains the Knuth-Plass paragraph reflow algorithm -- the only C-language version I've been able to find, and about the only readable one, after Knuth's "literate" one.

[1] http://onlinelibrary.wiley.com/doi/10.1002/spe.4380111102/ab...

Why Rust? Is it better at this than, say, Go?

Go is sort of a cross between Python and C with a bunch of concurrency support and a fairly elegant design. It's small and low cognitive overhead, but at the expense of certain types of expressiveness. It requires its garbage collector, so it's hard to use it too far down the stack. It's gotten the most traction as a Python replacement, though I'd be happy to see it replacing Java in some cases too.

Rust is sort of like a cross between C++ and Haskell. It has better facilities for abstraction and enforced safety than Go, but at the cost of more cognitive overhead and being harder to learn. People who plan to use it generally use C++ now.

These languages are actually going after different use cases, and for my robot there are some parts of the codebase I'd write in Go if we were re-writing, and some in Rust.

Rust is also a bit like C (and not C++) by not having implicit calls to constructors, copy constructors etc, so its values are less magic.

Less magic is such a good value to have. You can always make magic from unmagic pieces, but it's not possible to de-magic-ify magic.

For example, I really dug the decision to remove GC from the Rust language and move it to the standard library.

Having a GC as a library type means many optimizations that need compiler help are not possible unless the library types are "magic" to the compiler.

Even if the GC were in the language, the optimizations that you speak of would have negative performance ramifications on all code that did not use GC. The reason that we took the plunge and moved GC into the stdlib was because we weren't willing to make that sacrifice.

It does mean that our GC will never be particularly optimal. And that's fine, because if you really need shared ownership you should be using our really great refcounted pointers instead. :)

I used to agree with Rust approach, assuming the compiler does dataflow analysis to minimize needless refcounting.

Now that the refcounting is also moved to a library, I am not sure.

Or are those types known to the compiler and the respective optimizations applied?

They fall out automatically from move semantics, a reference count only occurs on an explicit clone call:

  fn just_a_ref(x: &T) { ... }
  fn rc_by_val(x: Rc<T>) { ... }
  fn rc_by_ref(x: &Rc<T>) { ... }

  let some_rc_pointer: Rc<T> = ...;

  just_a_ref(&*some_rc_pointer); // no ref counting
  rc_by_ref(&some_rc_pointer); // no ref counting

  rc_by_val(some_rc_pointer.clone()); // ref count incremented

  // last use of a value (statically guaranteed that 
  // some_rc_pointer is never used again):
  rc_by_val(some_rc_pointer); // no ref counting
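
For anyone who wants to run the above, here is a compilable variant that makes the counts observable with `Rc::strong_count`. It assumes current stable Rust, uses `String` in place of the generic `T`, and `observe_counts` is a made-up helper.

```rust
use std::rc::Rc;

/// Borrows the inner value; the reference count is untouched.
fn just_a_ref(x: &String) -> usize {
    x.len()
}

/// Borrows the `Rc` itself; still no change to the count.
fn rc_by_ref(x: &Rc<String>) -> usize {
    Rc::strong_count(x)
}

/// Takes ownership of one `Rc` handle and reports the count while holding
/// it. The handle is dropped on return, which decrements the count.
fn rc_by_val(x: Rc<String>) -> usize {
    Rc::strong_count(&x)
}

/// Returns the strong counts observed at each step.
fn observe_counts() -> [usize; 4] {
    let p = Rc::new(String::from("hello"));
    just_a_ref(&p); // plain borrow, count stays at 1
    let by_ref = rc_by_ref(&p); // 1
    let by_clone = rc_by_val(p.clone()); // 2 while the clone is alive
    let after_clone = Rc::strong_count(&p); // back to 1
    let by_move = rc_by_val(p); // moved, not cloned: still 1
    [by_ref, by_clone, after_clone, by_move]
}
```

The only step that touches the count is the explicit `clone()`; the final call moves the handle instead.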

Thanks for the clarification.

Rust has a far better type system than Go, and does not use a garbage collector, instead using its own idea of "variable ownership" to protect against freeing memory prematurely.
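
A rough sketch of what "variable ownership" means in practice, assuming current stable Rust. `Buffer`, `consume` and `ownership_trace` are hypothetical names, and the log exists only to make the point at which each value is freed visible.

```rust
use std::cell::RefCell;

/// Hypothetical resource that records when it is freed.
struct Buffer<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Buffer<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("free {}", self.name));
    }
}

/// Takes ownership of `buf`; it is freed when this function returns.
fn consume(buf: Buffer<'_>) {
    buf.log.borrow_mut().push(format!("use {}", buf.name));
}

/// Returns the order in which values were used and freed.
fn ownership_trace() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let a = Buffer { name: "a", log: &log };
        let b = Buffer { name: "b", log: &log };
        consume(a);
        // `a` has been moved; touching it here would be a compile error,
        // which is how use-after-free is ruled out without a GC.
        log.borrow_mut().push(format!("still own {}", b.name));
    } // `b` goes out of scope and is freed here
    log.into_inner()
}
```

Each value is freed exactly once, at a point the compiler decides statically from who owns it.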

On the other hand, it's also in version 0.10, keeps changing all the time, and the stdlib doesn't contain nearly everything that Go's does, by design.

The Rust stdlib will not be as comprehensive as Go's, but it will still be much more "batteries included" than systems programmers have come to expect.

Whilst there is some crossover, the two languages occupy different niches, and have quite different philosophies driving their design.

Rust is far more extensible than Go - for example all of the concurrency primitives are built as libraries, and users can create their own depending on their unique use cases. Go's on the other hand are built in, and are difficult to extend.
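
As an illustration of concurrency primitives living in libraries, here is a minimal sketch using the channel and thread types from today's standard library (`std::sync::mpsc` and `std::thread`). The APIs at the time of this thread were different, and `sum_of_squares_concurrently` is a made-up name.

```rust
use std::sync::mpsc;
use std::thread;

/// Squares each input on its own thread and sums the results received
/// over a channel. Both the channel and the threads are ordinary library
/// types, not language built-ins.
fn sum_of_squares_concurrently(inputs: Vec<u64>) -> u64 {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for n in inputs {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            tx.send(n * n).expect("receiver is still alive");
        }));
    }
    // Drop the original sender so the receiver sees the channel close
    // once every worker has finished.
    drop(tx);
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    rx.iter().sum()
}
```

Because these are plain library types, a project with unusual needs can swap in its own channel or scheduler without touching the language.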

Rust aims for generic, zero cost abstractions and control over allocation without compromising on safety, but this means the type system is more complex than Go's. Go's static type system is quite simple, and easy to learn, but is hard to write generic abstractions over unless you want to resort to using `interface {}`. Go also uses a mandatory GC which makes low level programming and interfacing with C difficult, but abstracts away from up-front memory management.

Rust's compiler is slow, but generates very fast code because it performs a great deal of static optimizations via LLVM. Go on the other hand builds blazingly fast but performs little to no optimization at compile time, nor compensates for that with a JIT.

If you want raw performance, control over allocation, and an expressive type system, choose Rust. If you want simplicity without having to think about allocation, choose Go.

Go may have gotten a headstart, but Rust seems like a much more worthwhile language to work with.

If we're going to ditch C for something proper, let's do it properly.

Go just seems like another half-assed minimum effort Google project.

From the README:


Many GNU, Linux and other utils are pretty awesome, and obviously some effort has been spent in the past to port them to Windows. However, those projects are either old, abandoned, hosted on CVS, written in platform-specific C, etc.

Rust provides a good, platform-agnostic way of writing systems utils that are easy to compile anywhere, and this is as good a way as any to try and learn it.

Cygwin does a very good job of letting you use UNIX utils on windows. But because it's hosted on CVS, that makes it uncool and worthy of a rewrite?

I've nothing against rewriting coreutils in Rust as part of a project to produce a "Rust operating system" in the same way that UNIX is the "C operating system". I do find "project rejected due to not on github" to be silly and a slightly unpleasant trend.

I don't know how the rust runtime or this project achieves portability, but I don't think cygwin is a great example. It might appear to work well on the surface but it is a heavy layer, which comes at performance cost and is somewhat inelegant.

Maybe cygwin is OK if you are fine with the costs and want to maximize the amount of Unix-like functionality (and you don't hit the bugs - I have seen some very nasty ones in the cygwin dll - deadlocks, random crashes). But something like MinGW/msys does a much better job of compiling for Windows.

Let me rephrase it more accurately

"Cygwin does a very good job of trashing your HD on windows." It cost me 90 euros for two hard drives.

How on earth did it manage to do that? Two drives? Not recoverable by formatting? I didn't think it even installed in the driver layer.

Porting the core utils was never a language problem - it's always been a "the terminal works pretty fundamentally differently on Windows" problem.

The file system and security model differences are much more important than any terminal differences. Looking through the programs included in coreutils, I can see only two or three that care about the terminal at all (`stty`, `nohup`, ls output coloring).

I think "terminal" here also includes differences in the shell. For example if you type "command *", a Unix shell will expand the star before calling the command. On Windows the star is passed to the command.

not to mention the way disks, processes (unix forks and execs, windows spawns) and I/O work

And character sets. A lot of poorly-done ports of Unix software cannot cope with Unicode on Windows.

Well, Rust doesn't have garbage collection; it's designed to replace* the languages that coreutils are currently written in - C (I don't think any GNU coreutils are written in C++). Go is not designed to replace these languages.

*by replace I mean occupy some of their current marketshare.

Very few of the coreutils would be affected by gc. You could probably write them all to barely allocate memory other than at startup if you wanted, and most of the garbage would be produced by the parsing.
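
To illustrate the "barely allocate" point, here is a sketch of a `cat`-style copy loop that reuses one fixed stack buffer and never touches the heap itself. It assumes current stable Rust; `copy_stream` is a hypothetical name, not code from the project.

```rust
use std::io::{self, Read, Write};

/// Copies everything from `input` to `output` through a single fixed-size
/// stack buffer and returns the number of bytes copied.
fn copy_stream<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    let mut buf = [0u8; 8192]; // reused for every read
    let mut total = 0u64;
    loop {
        let n = match input.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        output.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

With a loop like this there is simply no garbage for a collector to find, which is the point being made above.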

Simply comparing the two based on GC is very superficial.

But if GC doesn't matter, then why Go and not something like Lua?

Not really. The day my `cat` pauses to GC is the day I leave this field...

Why? Do you think you would notice?

(I realize your post may have been tongue in cheek, but it can be hard to tell on the interwebs.)

interestingly, Go was originally envisioned to do exactly this - a systems development language

Well, they used the phrase "systems language" when they introduced it, and it still appears prominently in the FAQ, but what they had in mind vis-a-vis what-exactly-constitutes-a-system was very different. (They were thinking more along the lines of a cloud infrastructure language.)

If you watch this excellent panel discussion from this year, you can hear Rob Pike express how they later regretted how their poor choice of terminology created such confusion:


(approx 6 minutes 45 seconds into the talk)

Because it's fun. I don't think there is a very strong reason for preferring Rust over Go except that the authors like it more. Hackers don't need to justify their decisions. They do stuff for fun and sometimes they end up with useful things.

Go produces fat binaries, probably not what you expect from these tiny tools. I even believe some Linux distributions forbid packages that don't use shared libraries.

The last time I played around with rust (a few weeks ago), rustc compiled binaries included a statically linked runtime as well.

I've just built rust from the latest git and it still seems true (at least with the default compiler flags).

The "hello world" from the tutorial compiles to a 1MB executable (dynamically linked with the C library but statically linked with the rust library). The equivalent C program creates a 6.5KB binary with gcc.

It goes a lot smaller with `--codegen prefer-dynamic`, also around 6K for me.

But AFAIK there is a subset of Rust which doesn't require the runtime.

The language itself doesn't need the runtime: it's entirely implemented in libraries, and the libraries are slowly being rearranged to offer as much as possible to clients who wish to avoid the runtime (e.g. "libcore").

High-level description of the runtimes: http://doc.rust-lang.org/master/guide-runtime.html

Avoiding the standard library/runtime: http://doc.rust-lang.org/master/guide-unsafe.html#avoiding-t... (don't miss the "Using libcore" subsection)

Rust gives you the option of either static or dynamic linking, but prefers static linking by default.

Maybe they could design it like BusyBox where there is a single binary and a bunch of symlinks to it.
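
A rough sketch of how that BusyBox-style dispatch could look, assuming current stable Rust. The applet set and the names `applet_name` and `run_applet` are invented for illustration and are not taken from the project.

```rust
use std::path::Path;

/// Extracts the applet name from `argv[0]`, e.g. "/usr/bin/echo" -> "echo".
fn applet_name(argv0: &str) -> &str {
    Path::new(argv0)
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or(argv0)
}

/// Picks the utility to run from the name the binary was invoked under.
/// Returns the text the applet would print, or an error message.
fn run_applet(argv0: &str, args: &[&str]) -> Result<String, String> {
    match applet_name(argv0) {
        "echo" => Ok(args.join(" ")),
        "true" => Ok(String::new()),
        "basename" => args
            .first()
            .map(|path| applet_name(path).to_string())
            .ok_or_else(|| "basename: missing operand".to_string()),
        other => Err(format!("{}: applet not found", other)),
    }
}
```

In a real binary, `main` would pass `std::env::args()` into something like `run_applet`, and each symlink (`echo`, `true`, ...) would point at the same executable, so the statically linked runtime is paid for once.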

You can use gccgo for dynamic linking.

Not sure about the downvotes. It seems that Rust has better explicit resource management.

The downvotes: your original comment is very easy to mistake for trolling and/or Go fanboyism.

I gave you an upvote, because I gave you the benefit of the doubt and assumed that was an honest question.

Or, say, C?

Why not?

From what I've seen, by far the biggest thing that Rust and Go have in common is that when one of them is mentioned on HN, the other is very likely to be mentioned, too. Beyond that, I don't see much similarity.

It's not really cross platform - whoami for example depends on libc which afaik isn't native on Windows.

I wrote one of the first utilities for this when it was first opened up for collaboration, so I hope it succeeds :) I need to go back and write tests for my util.

There are many "libc"s; according to the docs, Rust's libc is a module that binds to the platform-specific libc implementation: http://static.rust-lang.org/doc/0.10/std/libc/index.html

I love this idea! When I get a few hours I'll have to try porting a small coreutil... Seems like a great way to learn Rust.

Any chance of Rust taking over embedded systems programming? That's still mostly done in C and quickly devolves into horror.

> Any chance of Rust taking over embedded systems programming?

I don't know about 'taking over' but there are some people who are using Rust for this use case.

That would be super cool. Anything publicly available to look at?

Freescale's Freedom boards are so cheap, so capable, and have such miserable tooling (mbed, DIY usb, Processor Expert, CodeWarrior, ugh)... I wonder if Rust could make them attractive.

I just know that people come into IRC, and there was some discussion about bitfields to help support embedded stuff. Not sure there's a lot written about it yet.

Interesting project. Why is there no grep in this, or on the to-do list?

grep is not part of the coreutils.
