Rust and the Future of Systems Programming [video] (hacks.mozilla.org)
475 points by philbo on Nov 16, 2016 | 486 comments



Now I have four services running in production, all written in Rust. If it compiles, it usually works. Of course you have these late-night sessions where you write that one unwrap() because, hey, this will never return an error, right? And bam...

I'm seriously waiting for the tokio train to stabilize and give us a unified way of writing async services without resorting to tricks with channels or writing lots of ugly callback code. Native TLS support is also coming, and then the openssl dependency hell will be gone forever.

If you need an HTTP server/client, I'd wait a moment for Hyper to get their tokio branch stable, and maybe to gain HTTP/2 support by migrating the Solicit library.


Speaking of HTTP clients, just yesterday the person behind Hyper announced their new high-level HTTP client crate: http://seanmonstar.com/post/153221119046/introducing-reqwest


Not sure what to think of that. Does everything have to be async I/O now? How often do you need massive numbers of client connections?


Asynchronous nonblocking interfaces are more general-purpose than synchronous blocking interfaces. I can't speak for this library or Rust specifically, but in my experience well-designed asynchronous libraries allow you to interact with them in a synchronous style as well, if you wish.

Netty is an asynchronous, event-driven network framework for Java, and it's perfectly possible to expose synchronous blocking abstractions on top of it. The mechanism is pretty simple: the asynchronous framework exposes a future representing the result of an operation, and to provide a synchronous interface you simply block on the completion of that future before returning. Client libraries can handle this for you, providing the interface of e.g. a regular blocking HTTP client on top of Netty async IO.
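
To make the mechanism concrete, here's a minimal sketch in Rust (assuming the futures 0.1-era API; fetch_async is a hypothetical async operation, not a real library call):

    extern crate futures;
    use futures::{future, Future};

    // Hypothetical async operation that yields its result as a future.
    fn fetch_async(url: &str) -> Box<Future<Item = String, Error = ()>> {
        Box::new(future::ok(format!("response from {}", url)))
    }

    // Synchronous facade: block on the future's completion before returning.
    fn fetch_blocking(url: &str) -> Result<String, ()> {
        fetch_async(url).wait()
    }

    fn main() {
        let body = fetch_blocking("http://example.com").unwrap();
        println!("{}", body);
    }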

This approach can be convenient, since it's possible for both synchronous and asynchronous style code to coexist easily in the same application. The application designer can incrementally change parts of the application into asynchronous style as performance needs dictate. For example, you might choose to serve typical small RPC requests using blocking workers in a thread pool, but when you need to stream the content of a large file across the network you could use a separate nonblocking worker pool that interacts with both the file system and network asynchronously.

The ability to interact in a blocking way via futures means that asynchronous facilities can serve both synchronous and asynchronous needs, making them the better choice for most frameworks today. While it used to be the case that async IO frameworks took a performance penalty compared to well-implemented sync IO ones, from what I understand that gap has been closed, and the highest performance frameworks are now all async IO. For example, check out the TechEmpower Web Framework Benchmarks. Most or all of the top performers use asynchronous approaches: https://www.techempower.com/benchmarks/#section=data-r13&hw=...


One is not more powerful than the other: they are "duals", and this has been proven in the literature (back in 1978, no less); most of the supposed downsides of threads come from people assuming a specific implementation of threads (many if not most of which suck).

Here are some papers that would normally be assigned reading in a graduate level Computer Science course in Operating Systems as background reference.

https://pdfs.semanticscholar.org/2948/a0d014852ba47dd115fcc7...

http://capriccio.cs.berkeley.edu/pubs/threads-hotos-2003.pdf

But it should be obvious: with a lightweight co-routine library you can convert anything synchronous into something asynchronous with no more (if not less) overhead than the context switches you're forced to incur by returning and calling a new function to implement event processing. This is no more onerous than using that same co-routine library to implement blocking on a future (to convert an asynchronous API into a synchronous one).


The fact that two styles or concepts are formally dual does not make them equally practical or useful in all circumstances.

Consider: in calling conventions, the continuation passing style is dual to the "direct" calling convention (i.e., call stack with return values); the call-by-name style is dual to call-by-value style; Lambda Calculus and Turing Machines are dual in their ability to compute all effectively calculable functions.

These dualities do not mean it's equally practical to build systems in both ways. Sometimes one approach ends up being more practically useful.

Most programmers prefer the direct calling convention and find heavy continuation passing style difficult to read and maintain. JavaScript programmers may be familiar with the pain of CPS from excessive use of callbacks (not strictly CPS, but with similar drawbacks). Similarly, writing code purely in call-by-name style can be confusing and can have difficult-to-predict performance impacts (e.g., Haskell's lazy evaluation semantics).

In their paper "On the Duality of Operating System Structures", Lauer and Needham present a similar conclusion [3]:

> "The principal conclusion we will draw from these observations is that the considerations for choosing which model to adopt in a given system [...] [are] a function of which set of primitive operations and mechanisms are easier to build or better suited to the constraints imposed by the machine architecture and hardware."

In that passage they are describing message passing vs. procedure call systems, and I interpret this to be their acknowledgment that, though the systems are dual, one architecture or another is more appropriate in certain circumstances.

Getting back to our original topic: this thread was about the decision of a Rust library to offer async or sync IO as its primary primitive. I think async is the better general-purpose choice, because it's clean, simple, and straightforward to expose a synchronous interface on top of an async interface with futures, while the other way around is messy and difficult.

Can you elaborate on the lightweight co-routine library that can be used to convert anything synchronous into async? I'm curious, because Rust previously had support for coroutines (green threads) and decided to remove them due to a number of problems [1]. Meanwhile, Rust developers were able to devise a zero-cost futures abstraction on top of asynchronous IO [2]. Unlike the problematic green-threads strategy, this approach doesn't impose complicated constraints on the systems that use it (FFI requirements), and doesn't add runtime overhead.

What co-routine library would you recommend that avoids the downsides in [1]?

[1] https://github.com/aturon/rfcs/blob/remove-runtime/active/00... describes some pretty tricky challenges.

[2] https://aturon.github.io/blog/2016/08/11/futures/

[3] https://pdfs.semanticscholar.org/2948/a0d014852ba47dd115fcc7...


Yes, I know, async I/O is the new cool thing. Here's an async I/O program from 1972.[1] John Walker wrote this. EXEC 8 had the IO$ system call, which, unlike IOW$, returned immediately. A "completion routine" was called when the I/O operation finished. Note how similar those libraries are to what's used today, now that people are reading Dijkstra again. The problem, of course, is that a callback system dominates the architecture of the entire program.

(When I moved from UNIVAC mainframes to UNIX, things seemed so sequential. No threads. No async I/O.)

[1] http://www.fourmilab.ch/documents/univac/fang/


Huh, so are you implying that hyper should stay synchronous so that it doesn't appear to just be copying things from 40 years ago?! This comment sounds like you think that it was good back then, but now you don't know what to think of a library that is aiming to switch to asynchronous IO and/or don't know why it's a good thing?

(It's also not like the comment you're replying to said that async IO is a recent invention, your low-effort sarcasm as a response is unfortunate.)


Where pretty much anything related to concurrency is concerned we've been busy reliving the 70's for most of the last decade. Locking, asynchronous I/O, you name it.

Hell even Microsoft had I/O Completion ports back in what, 2003 or so? Or am I wrong and it was a lot earlier? The coolest things in Javascript land were all done by Microsoft first and everyone (me especially) can't bring themselves to acknowledge that.


IO completion ports were introduced in NT 4.0 in 1996.


Isn't the NT async IO API just a front for a kernel-side thread pool, though, which may block depending on worker thread availability? They say[1]: "if you issue an asynchronous cached read, and the pages are not in memory, the file system driver assumes that you do not want your thread blocked and the request will be handled by a limited pool of worker threads"

Things may be different for socket IO, but there Unix had select() much earlier, around 4.2BSD (1983).

[1] https://support.microsoft.com/en-us/kb/156932


Web servers and GUI apps are both long-running, event driven programs that need to do IO or slow computations while staying responsive to new events. It's not surprising that they are both well supported by the same programming model. The Win32 API is ugly but the methods of app/OS interaction it supports are fundamentally sound for high performance interactive programs.


1. This introduces a primarily synchronous API for now. Async will come later. All those code samples are synchronous.

2. Async I/O is an extremely hot topic in Rust right now, so it's likely that Rust people do care about it.


I assume you're talking about why reqwest is necessary (i.e. why hyper is moving to async), rather than the single paragraph mentioning asynchronicity as a possible future direction for reqwest?

Async I/O is important for more than just multiplexing a million requests. Hyper is moving to it for that purpose, and also because it's a more natural way of working with in-flight I/O, e.g. cancelling requests or selecting over them.
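
As a rough sketch of what selecting over in-flight operations looks like with the futures 0.1-era combinators (the exact API hyper ends up with may differ): `select` races two futures and hands back the loser, which can simply be dropped, i.e. cancelled.

    extern crate futures;
    use futures::{future, Future};

    fn main() {
        let a = future::ok::<u32, ()>(1);
        let b = future::ok::<u32, ()>(2);
        // select yields whichever future finishes first, plus the other
        // (still pending) future; dropping the latter cancels it.
        let (value, _loser) = a.select(b).wait().ok().unwrap();
        println!("first finished: {}", value);
    }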


Simply put, IO as accessed by the OS is asynchronous by nature. The synchronicity you are used to is a convenient fiction the OS maintains to make simple programming easier. Underneath, the OS is doing everything asynchronously and just waiting until completion to return; otherwise we would be throwing away massive quantities of compute cycles waiting for IO to complete.

Why let the OS and the other programs running at the same time reap all that benefit? You too can program such that the cycles you would normally spend blocked waiting for IO, while some other program used the CPU, are put to work in your own code, accomplishing so much more.


I don't think everything has to be async, but if there's ever a place to use it then it's HTTP. There are too many systems that rely on an external web service and collapse in a ball of threads if that external web service ever gets a bit slow.


> I'm seriously waiting for the tokio train to stabilize and give us a unified way of writing async services without resorting to tricks with channels or writing lots of ugly callback code

Which means [1], for those who don't know.

[1] https://github.com/tokio-rs/tokio


Wasn't there some other effort in Rust to enable asynchronous I/O?


There have been a few; this one is built on top of mio, one of the most popular previous efforts.


I meant this one which I saw mentioned recently: https://github.com/alexcrichton/futures-rs


Ah yes. Basically, tokio is mio + futures.


tokio is not an alternative to futures, but rather a more high-level framework that builds on top of futures - both are being actively developed.


Isn't if let similar to Swift's if let, where it either unwraps safely or does something else? I really wish I could disable forced unwrapping, as it mostly leads to mistakes by less experienced or overconfident programmers, and the amount of extra code from using guard instead is negligible if you program smart.


Rust's if let is indeed inspired by that of Swift and behaves practically the same way.


Though Swift's `if let` is AFAIK hardcoded to their built-in optional type, whereas when Rust lifted the idea they made it work with any enum. I believe Swift recently gained `if case let` as an equivalent to how `if let` works in Rust.
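
A minimal sketch of the difference, assuming nothing beyond stock Rust: `if let` matches any enum pattern, not just the optional type.

    enum Shape {
        Circle(f64),
        Square(f64),
    }

    fn main() {
        // Works with Option, as in Swift...
        let opt: Option<i32> = Some(3);
        if let Some(n) = opt {
            println!("got {}", n);
        }

        // ...but also with any user-defined enum.
        let s = Shape::Square(2.0);
        if let Shape::Circle(radius) = s {
            println!("circle with radius {}", radius);
        } else {
            println!("not a circle");
        }
    }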


It was new in Swift 2, I think; I haven't used it yet. But it's basically a single case statement lifted from switch, nothing more. And enums and switch statements can be really complex beasts by themselves in Swift.

If Rust only allows it on enums that would be extremely weird.

In a switch statement I sometimes want to switch on the object, then see if I'm allowed to unwrap it to a certain class and immediately use it afterwards. Useful for parsing an array of mixed object types that should be processed differently.

But I honestly think Swift allows people to write elaborate illegible codegolf-y bullshit sometimes. Sometimes the Swift compiler still chokes on too complex expressions and you need to add some explicit types or separate out in several statements what you were trying to do in one statement. Usually a good warning that your code is hard to read even by humans.


Any unrefutable pattern, I believe.

EDIT: whoops! I got it backwards. It's refutable, right, duh. :)


Any refutable pattern; for irrefutable patterns the "if let"'s pattern would always match, so it would be kinda redundant. :-)


If-let can be used with refutable patterns (the refutation/counterexample to the pattern is when the else case runs). Irrefutable patterns can be used with regular let.
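
A quick sketch of the distinction:

    fn main() {
        let pair = (1, 2);
        let (a, b) = pair; // irrefutable: always matches, so plain `let` works
        println!("{} {}", a, b);

        let maybe: Option<i32> = None;
        if let Some(v) = maybe { // refutable: may not match, so `if let` is needed
            println!("{}", v);
        } else {
            println!("no value"); // the "refutation" case
        }
    }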


Would it make sense to have a cargo/rustc flag to disable unwrap and friends when building for production?


"Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?"

Unwrap implies "either this operation should succeed or we should panic". Doing anything other than panicking raises the question of what that behavior should even be, which is terribly difficult to determine. And I might've put the unwrap() there because that's really what I want to happen. For `mv`: let's not dare go ahead and do the unlink() if the link() failed!

Tangent: errors and exceptions aren't bad; they're how computers work. I've encountered folks who, when faced with runtime errors, pepper their code with "if (!NULL)" or truly evil things like "except:pass"/"catch (...) { }", which rarely make sense anywhere but the base of the stack, and even then don't usually. If you've ever asked yourself "but how did we even get here‽", it may be because someone dropped something totally incongruous like that in a related module.


> "Pray, Mr. Babbage ..." (for those who do not know the quote)

“On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ . . . I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.”

— Charles Babbage, Passages from the Life of a Philosopher (1864)


I think the idea would be the program would fail to compile, and you would need to go back and replace the unwrap with proper error handling. The goal would be to allow the use of unwrap during development, but require the final polish before the code goes into production.


unwrap is proper error handling. It says "Try to do this. If it fails, panic". It's like an assert on an invariant that the compiler requires.

If my script depends on a database connection, I might connect to a database and unwrap() it so the script errors out if the database isn't available. If I wrote that logic myself I would just be awkwardly rewriting unwrap.
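
A minimal sketch of that pattern, with a hypothetical connect() standing in for a real database driver:

    struct Connection;

    // Hypothetical: pretend this actually dials the database.
    fn connect(url: &str) -> Result<Connection, String> {
        Err(format!("could not reach {}", url))
    }

    fn main() {
        // If the database isn't available, panic right away with the error;
        // writing that logic by hand would just awkwardly re-implement unwrap.
        let _conn = connect("postgres://localhost").unwrap();
    }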


> If my script depends on a database connection, I might connect to a database and unwrap() it so the script errors out if the database isn't available.

I think that's a bad example, as it is one of those things that can really fail at runtime and which should be properly handled. Even if handling means printing an error message and stopping the process with an exit code - but not crashing.

I think unwrap is for things that really should not happen if everything is implemented correctly.


That's fair. I said script for that reason. A better example would be assert equivalents: if an API could return null (Option) under normal circumstances but you know it can never be null based on how you're using it, unwrap() makes sense. That contract could only be violated if there's a bug in the implementation. If that's the case, all guarantees are out the window and usually the best/only thing you can do is crash and allow the process to restart.

Also in those cases (in my experience), having a human-readable error message is rarely useful. When assertions are violated I almost always have to consult the code anyway. And 80% of my asserts are never hit, so I usually don't bother preemptively writing decent error messages. File name and line number is the right information, and panic provides that anyway.


> If thats the case all guarantees are out the window and usually the best / only thing you can do is to crash and allow the process to restart.

I've found that there are a lot of minor bugs in the implementation that, in something client-facing (e.g. not on a server somewhere that can simply be taken out of the load-balancing rotation until it restarts or whatever), probably shouldn't crash.

Report and log errors remotely - to be fixed - and skip some logic that relied on those guarantees, but not crash.

> Also in those cases (in my experience), having a human-readable error message is rarely useful.

I'll settle for developer-readable, then ;). Panic can format error messages, and itself provides context information (the file and line you mention) as a decent means of reporting fatal errors. Some assertions are obvious enough as to their reason and cause from context - as you say, they don't need a message.

But I've also found that taking the 10 seconds or so to think of a decentish error message pays off quite frequently, even when I'm pretty sure it's unnecessary. Sometimes it saves me only a minute of context switching by telling me exactly what the problem was (instead of roughly describing some assumption made for unknown reasons); sometimes the only way I can make progress is by adding more logging and messaging and reproducing the problem, because I couldn't suss out exactly what was happening, and deciding to do that can take a lot longer than a minute if I know the problem is hard to reproduce.


To be more accurate, unwrap can be a legitimate means of error handling when used in an application, as opposed to a library. But if you're writing a library, then unwrapping rather than using Result is a surefire way to make your users hate you. :)


Ah! I stand corrected. "library" is a pretty sane use case for barring unwrap(), one that cargo knows is the current goal. I still don't think it's general enough but maybe it's worth a warning.

Aside: CPython's gdbm support is provided by libgdbm, which calls exit() for you if it finds something it's not happy with (a corrupted database, e.g.). O.o


Sometimes you've proven some invariant in some other way, so you know that unwrapping is guaranteed to not panic. Although in those cases I prefer to use .expect("this will not fail because of blah") instead in the spirit of self-documenting code.


Agreed, I use `.expect("Infallible")` for such cases.


Even in a library there are valid use cases for an unwrap. If you can literally guarantee that the value is present it's fine.

In libraries one should be wary about it, and never use it if the unwrap might actually panic, but otherwise you're good to go.


Wouldn't it be more useful to fail with an error message, at least?


Sometimes unwrap is the proper error-handling behavior.


I think the idea is to fail to compile if you have certain kinds of panic?

I agree that making unwrap/expect silently ... not happen will just cause worse problems.


Sure, but in that case it would effectively elide it from the language spec! Who doesn't target "production" eventually?


Early exploration and tests amount to a lot of code, and the language needs to make those parts pleasant to write as well. I think .unwrap()s are especially common there.

I imagine `println!()` is another thing whose design is influenced by the needs of early exploratory code (and it's another example of a library function that handles errors with panics, though that's not the only thing that makes me think so).


> Sure, but in that case it would effectively elide it from the language spec!

No. Some unwraps are necessary. You needn't be absolutist here; you could still allow some carefully-labeled unwraps.

Also, not all production users need to care about panics that much.


Yes, I would envision an allow attribute of some sort.

This is simply a contract with the compiler that you didn't copy and paste some example code somewhere in your codebase that fails with an unwrap. It's about extending the "if it compiles, it runs" near-guarantee that we so love about Rust.

Ideally a program would fail fast and be restarted if it reached an unrecoverable state, with supervision trees like Erlang's. Also ideally, unwrap would be reserved for truly exceptional states, not just ones that are unlikely to fail until something goes wrong, like a port being closed or a file being unreadable or missing.


This is basically what https://github.com/Manishearth/rust-clippy/wiki#option_unwra... does

It doesn't work transitively (so if a crate you depend on unwraps you can't protect yourself), but https://github.com/llogiq/metacollect plans to fix that


To put this into perspective, this would also necessarily disable expressions of the form `xs[i]` where `xs` is a slice. Why? Because `xs[i]` is equivalent to `*xs.get(i).unwrap()`.
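
A quick sketch of that equivalence:

    fn main() {
        let xs = [10, 20, 30];
        let i = 1;
        // Same value on success, same panic on an out-of-bounds index.
        assert_eq!(xs[i], *xs.get(i).unwrap());
        // xs[99] would panic exactly as *xs.get(99).unwrap() would.
    }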

In other words, banning unwrap isn't really that productive because an unwrap, when properly used, is an expression of a runtime invariant.

The problem is that unwrap can be very easily misused as an error handling strategy in a library, and in that case, it's pretty much always wrong. But that doesn't mean using unwrap in a library is wrong all on its own, for example.


Sometimes I do wish I could disable the indexing syntax, though. :P At least in my own code, I find that I naturally reach for iterators rather than doing any manual indexing.


unwrap has legitimate use cases, and it's not clear what "disabling" it would even mean, as removing it changes the type of the thing it returns. You could write a lint to fail the build, if you want, I guess...


.unwrap() is only the right choice if you need to optimize for binary size and can't afford the cost of the precise error message you would pass to .expect(). There are situations where you can't possibly continue running the application if an error occurs, but you shouldn't rely on a backtrace (which you might not manage to capture, e.g. if RUST_BACKTRACE is unset or you don't have symbols) as your only method of communication with your future self.


This is not true. For example, consider this code:

    if foo.is_some() {
        let foo = foo.unwrap();
    } else {
        // other code
    }
Here, I _know_ that foo is some. The extra error message from expect will _never_ be seen.

Now, this is a contrived example, and would better be written with `if let` in today's Rust, but this is the _kind_ of situation in which unwrap is totally, 100% cool, but the compiler can't know.
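
For comparison, a sketch of the same shape written with `if let`:

    fn main() {
        let foo: Option<i32> = Some(1);
        if let Some(foo) = foo {
            println!("have {}", foo); // use the value directly; no unwrap
        } else {
            // other code
        }
    }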


I write Swift daily and I just don't force-unwrap anymore, ever. I don't think a hard crash is very usable in a production application. A lot of people disagree and want a hard crash while testing, but for those bugs that slip through, the user experience of "loading the first screen but my avatar isn't set" is so much better than "loading the first screen and the app kills itself", just because you force-unwrapped the avatar URL from a JSON response that had a slight problem in production.

Well perhaps we should have something that logs the error in production but keeps on trucking and crashes the application when the debug or test flag is set?


panics are explicitly for unrecoverable errors, so recovering from them and keeping on going means that you're not using the right kind of error handling. If that's the behavior you'd want, then you wouldn't want to use unwrap.


The error message in this case might be something like "foo became None after verifying it to be Some". This could happen, for example, if incorrect unsafe code in another thread concurrently mutates foo through a raw pointer. My point is that of course while writing them you don't think your unwraps will fail, but if they do, it's good to have a reminder of what's going on. Even if the expect never fails, the message provides additional documentation for those reading the code.


You'll like Rust, because concurrent mutation of a value is impossible if you hold a `&` or `&mut` to the value. So this can in fact be ruled out by the programmer.


It's only impossible in safe code. Unsafe code can violate those rules all day long. You can't guarantee that there's no unsafe code running concurrently.


Any use of `unsafe` that breaks unrelated safe code is broken and buggy; if that scenario would happen like you describe it, the code is breaking Rust's aliasing rules: that's possible using `unsafe` but invalid and leads to UB.


I'm not talking about 'uses of unsafe', I'm talking about code that is unsafe. Much of that code is not even written in Rust, so there's no 'unsafe' to use.


Ok, so code that is memory unsafe (broken!). One must still say "unsafe" to bring it into Rust (to use ffi, or make a safe wrapper); so there is still a clear location in the Rust code that is to blame.


Concurrently modifying aliased memory (`&` references and pointers) is undefined behavior. Not just in Rust, but in just about any language.

As an aside, alias unsafety in Rust is always UB, even without concurrency.


> Now, this is a contrived example, and would better be written with `if let` in today's Rust, but this is the _kind_ of situation in which unwrap is totally, 100% cool, but the compiler can't know.

If the author knows it's safe, they should be able to express how they know in a way that the compiler can understand. Certainly I think there's a large space of use cases where the extra guarantee provided by forbidding unwrap would be well worth the cost of outlawing some "legitimate" cases, especially if we're just talking about doing so on a project level. (Though maybe they're not the Rust target audience.)


What if I want to have a reference to the last element in a vector that I just pushed? Without a push method that immediately returns such a reference, this will always involve an unwrap. And this is not just some weirdly constructed example, I've needed to do this in real code a couple of times already.
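
A sketch of that pattern: `last()` returns an Option, so an unwrap shows up even though the vector is provably non-empty at that point.

    fn main() {
        let mut v = Vec::new();
        v.push(42);
        // Cannot fail: we pushed one line above, but the type system
        // doesn't know that, so last() still returns an Option.
        let last = v.last().unwrap();
        println!("{}", last);
    }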


Pushing an element to a vector could return a guaranteed-non-empty vector. Admittedly it's unpleasant to write non-empty collections in a language that lacks HKT, since you have to reimplement a lot of stuff, but I'd consider that an argument for HKT rather than an argument for unwrap.


I can't reply to steveklabnik for some reason. But I think if let would replace that unwrap he uses there.


If you try to reply on HN too quickly, it hides the reply button as to discourage quick back-and-forths.

I mention in the post that this specific code would be best written with if let, but that it's not about the specifics, it's about the general pattern.


This is pedantic. It seems clear from context that this conversation is about panicking on the None/Err cases, and unwrap is shorthand for "unwrap or expect or match with a panic branch."


Right, it would likely use clippy but it would essentially be a --production target or profile, intended for builds where the binary will be run in production. And by 'and friends' I mean calls that panic for the same reason as unwrap, such as expect or ok.

The goal is to stop code from reaching production inadvertently, not to prevent all sources of panics.


But if we accept the premise that "it's acceptable in some cases to have unwrap() in code that targets 'production'" then it wouldn't make sense to have a production profile that bars its use. The word "production" is in the global namespace and I think you want something more specific to your use case.

Rather, one could define a rust coding guide for themselves that deems unwrap() inappropriate for production use. (and in that case use the lint Steve suggests).


Seems like you'd want a lint rule where, if unwrap is used, it must have a comment preceding it (in some formal syntax) describing why it's necessary or appropriate. That way the build knows every use of unwrap is explicitly allowed, letting all the possible sites of panics be enumerated and known, which is a useful property to have.


In that case you may want "expect" over unwrap.


clippy has this already :)


For those not familiar with Clippy:

https://github.com/Manishearth/rust-clippy

It's a very useful tool; the main thing about it that annoys me is that it only works with nightly.


If by "and friends" you mean "calls that can panic", there are a few of those. Slice bounds checks, for one.


> GC pause ... sufficiently low power hw .. cheap phone

Yeah, but even high-powered hardware can take a "major" hit from a GC pause when your application is extremely latency sensitive.

IMO it would be great to get folks who write the enormous base of existing realtime apps driving critical devices everywhere to sit up and take notice of Rust.

EDIT: I mean to say that many of my colleagues who write realtime software dismiss new languages as including GC baggage by default (because so many do!). So, hey, good that the video calls this out.


> IMO it would be great to get folks who write the enormous base of existing realtime apps driving critical devices everywhere to sit up and take notice of Rust.

It would, and things definitely should move in the direction of languages safer than C.

The biggest problem I see is tooling and legacy. Tooling, because there's a ginormous amount of testing and design software that "works with C" (whatever that means in the context of the tool). Legacy, because everyone already has their 20 year old codebases and it's just not convenient to start focusing on two languages and switching the old code to Rust is just plain impossible economically.

A third problem is the compiler. LLVM (rustc is a frontend to it, no?) is a really good choice, but gcc has an enormous advantage in supporting so many small platforms, and that's very significant in this area of SW dev.

On the plus side, I've really gotten the impression that the Rust folks are truly trying to make adoption as smooth as possible. If that works, and Rust proves to be much better than C for these kinds of systems, I'd guess adopting Rust would start looking economically viable to companies. I mean, in the end that's what matters to them the most, and it's not easy to replace all your C ninjas with competent Rust writers.


Rather than a ground-up rewrite, I expect people will begin using Rust in the same way that Firefox has: identify individual components that would most benefit from Rust, segment those components off behind well-defined C interfaces, then write a compatible Rust lib using Rust's ability to expose C interfaces.
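
A minimal sketch of that approach (the function itself is hypothetical): a Rust function exported behind a C ABI so existing C/C++ code can link against it.

    // Exported with an unmangled name and the C calling convention, so a
    // C/C++ caller can declare it as: int32_t add_checked(int32_t, int32_t);
    #[no_mangle]
    pub extern "C" fn add_checked(a: i32, b: i32) -> i32 {
        // Saturate on overflow, just as an example behavior.
        a.checked_add(b).unwrap_or(i32::max_value())
    }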


If you're considering switching to Rust for code/memory safety reasons, SaferCPlusPlus[1] may be an easier/cheaper/lower-risk option. It allows you to add memory safety to your existing code base in a completely incremental way, with no dependency risk. (At the moment, standard library support is required, though.)

[1] https://github.com/duneroadrunner/SaferCPlusPlus


"Safer C++", like all C++ template libraries, is not memory safe.


Are you confusing SaferCPlusPlus with a different library? SaferCPlusPlus is a new library that makes it practical to stick to a memory safe subset of C++ (i.e. no native pointers, no native arrays, no std::array<>, no std::vector<>, etc.).

Using the SaferCPlusPlus library to replace all uses of C++'s unsafe elements does result in code that is as memory safe as Rust, or any other modern language. The main shortcoming at the moment is that it doesn't yet provide memory safe replacements for all of the standard library's unsafe elements, just the most commonly used ones.


Create a vector. Push an element onto it. Take a reference to that element with operator[]. Clear the vector. Call a method on that dangling reference.

Create an object on the stack. Return a reference to that object. Call a method on that reference.

Create a vector. Push an element onto it. Call a method on that element that clears the vector and then calls another virtual method on itself, via the this pointer.

Accidentally share a vector between threads. Race push_back() and remove().

Etc. etc. We didn't implement lifetimes for no reason.

Additionally, the pointer registration mechanism that that library uses has a runtime performance cost worse than a GC write barrier (because it incurs writes on reads).


> Create a vector. Push an element onto it. Take a reference to that element with operator[]. Clear the vector. Call a method on that dangling reference.

> Create an object on the stack. Return a reference to that object. Call a method on that reference.

References are one of the unsafe C++ elements that SaferCPlusPlus is intended to be used to replace [1].

> Create a vector. Push an element onto it. Call a method on that element that clears the vector and then calls another virtual method on itself, via the this pointer.

Yes, that series of operations is safe. A related example from the "msetl_example.cpp" file:

        typedef mse::mstd::vector<int> vint_type;
        mse::mstd::vector<vint_type> vvi;
        {
            vint_type vi;
            vi.push_back(5);
            vvi.push_back(vi);
        }
        auto vi_it = vvi[0].begin();
        vvi.clear();
        try {
            /* At this point, the vint_type object is cleared from vvi, but it has not been deallocated/destructed yet because it
            "knows" that there is an iterator, namely vi_it, that is still referencing it. At the moment, std::shared_ptrs are being
            used to achieve this. */
            auto value = (*vi_it); /* So this is actually ok. vi_it still points to a valid item. */
            assert(5 == value);
            vint_type vi2;
            vi_it = vi2.begin();
            /* The vint_type object that vi_it was originally pointing to is now deallocated/destructed, because vi_it no longer
            references it. */
        }
        catch (...) {
            /* At present, no exception will be thrown. We're still debating whether it'd be better to throw an exception though. */
        }
I agree with the gist though. This kind of thing should be prevented at compile time. Rust has an excellent static analyzer/enforcer built into its compiler. Arguably, it would be a service to the community to unbundle it from the Rust compiler and make it available for application to C++ code as well. Arguably.

> Accidentally share a vector between threads. Race push_back() and remove().

SaferCPlusPlus addresses the sharing of objects between asynchronous threads [2]. A particular shortcoming of C++ wrt object sharing is that it doesn't have a notion of "deep const/immutability".

> Additionally, the pointer registration mechanism that that library uses has a runtime performance cost worse than a GC write barrier (because it incurs writes on reads).

Um, yeah, modern code should try to avoid the use of general pointers (and generally does). Most modern languages don't provide general pointers. SaferCPlusPlus makes them safe and slow (and available for easy porting of legacy code). When writing new code you would instead, when required, use one of the faster pointer types available in the library.

Don't interpret SaferCPlusPlus as an assertion that C++ is a uniformly better language than Rust or other modern languages. It's more of a suggestion that C++ and existing C++ code bases can be salvaged to a greater degree than one might think.

[1] http://www.codeproject.com/Articles/1093894/How-To-Safely-Pa...

[2] http://www.codeproject.com/Articles/1106491/Sharing-Objects-...


> References are one of the unsafe C++ elements that SaferCPlusPlus is intended to be used to replace [1].

OK, so you can't use references. Then, as I said before, your pointer replacements have a runtime performance cost worse than GC write barriers.

> Yes, that series of operations is safe. A related example from the "msetl_example.cpp" file:

I don't think you understood me. I mean the this pointer. "this" is hardwired into C++ to be an unsafe pointer.

> I agree with the gist though. This kind of thing should be prevented at compile time. Rust has an excellent static analyzer/enforcer built into its compiler. Arguably, it would be a service to the community to unbundle it from the Rust compiler and make it available for application to C++ code as well. Arguably.

Not possible. It's totally incompatible with existing C++ designs.

> Um, yeah, modern code should try to avoid the use of general pointers (and generally does). Most modern languages don't provide general pointers.

I think you're getting lost in the weeds of what a "general pointer" is and is not. It doesn't matter.

The point is that if your references track their owners at runtime, then you are just creating a GC. If the overhead of doing that is worse than a traditional GC (which, if you are doing that much bookkeeping, it will be), then there's little purpose to it.


> OK, so you can't use references. Then, as I said before, your pointer replacements have a runtime performance cost worse than GC write barriers.

The library provides three types of pointers - "registered", "scope" and "refcounting". I believe you are referring to the registered pointers, that indeed have significant cost on construction, destruction and assignment. But registered pointers are really mostly intended to ease the task of initially porting legacy code. New or updated code would instead use either "scope" pointers, which point to objects that have (execution) scope lifetime, or "refcounting" pointers. Scope pointers have zero extra runtime overhead, but are (at the moment) lacking the needed "static enforcer" to ensure that scope objects are indeed allocated on the stack. (Their type definition does prevent a lot of potential inadvertent misuse, but not all. And Ironclad C++ does have such a static enforcer.)

> I don't think you understood me. I mean the this pointer. "this" is hardwired into C++ to be an unsafe pointer.

You're right, that's a good point. But really it's a practical issue rather than a technical one. I mean technically, use of the "this" pointer should be replaced with a safer pointer, just like any other native pointer.

For example this is technically one of the safe ways to implement it in SaferCPlusPlus:

    class CA { public:
        template<class safe_this_pointer_type, class safe_vector_pointer_type>
        void foo1(safe_this_pointer_type safe_this, safe_vector_pointer_type vec_ptr) {
            vec_ptr->clear();

            /* The next line will throw an exception (or whatever user-specified behavior). */
            safe_this->m_i += 1;
        }

        int m_i = 0;
    };

    int main() {
        mse::TXScopeObj<mse::mstd::vector<CA>> vec1;
        vec1.resize(1);
        auto iter = vec1.begin();
        iter->foo1(iter, &vec1);
    }
That is, technically, if you're going to use the "this" pointer, explicitly or implicitly, you should pass a safe version of it (in this case "iter"). But yeah, in practice I don't expect people to be so diligent. I wonder how often this type of scenario arises in practice?

So do I understand correctly that the Rust language allows for the same type of code, but the compiler won't build it unless it can statically deduce that it is safe?

> Not possible. It's totally incompatible with existing C++ designs.

Even if you prohibit the unsafe elements? Including (implicit and explicit) "this" pointers?


Hmm, a more practical approach might be to mirror the GC languages and only permit (not-null) refcounting pointers as elements of dynamic containers such as vectors. Ensuring that all references don't outlive their targets, thereby eliminating the implicit "this" pointer issue. I think. Is that how Rust does it?


> Is that how Rust does it?

No, safe Rust only has safe references, and that includes "this" ("self" in Rust). Because the lifetimes are part of the type, it does not require the runtime overhead of reference counting.


Rust's references behave like plain raw C/C++ pointers at runtime, with no bookkeeping code running at all.

The magic all lies in the compiletime borrow checker, which roughly works like this:

    - All data is accessed either through something on the stack or in static memory.
    - Accessing data, say by creating a reference to it, 
      causes the compiler to "borrow" the value for the scope in which the reference
      is alive.
    - A reference can be alive for any scope equal to or smaller than the one
      for which access to the data itself is valid.
    - References track the original scope for which they are alive as a
      template-parameter-like thing called a "lifetime parameter".
      Note that Rust's use of the word "lifetime" is thus a bit narrower than the
      one used in C++, since it just talks about stack scopes, and not the lifetime
      of the actual value as would be tracked by a GC or ref counting.
      Example:

      let x = true;
      let r = &x;

      Here, r's type would be inferred as something like `Reference<ScopeOfXVariable, bool>`.
      (The actual type in Rust would be a `&'a T` with
      'a = scope of x, and T = bool).
    - Because the scope is tracked as part of the reference type,
      it is possible to copy/move/transform/wrap references safely, since
      the compiler will always "know" about the original scope and thus can
      check that you never end up in a situation where you accidentally outlive the 
      thing you borrowed, say if you try to return a type that contains a reference 
      somewhere deep down.
    - The borrow itself acts as a compiletime read/write lock on the thing you referenced,
      so for the scope that the reference is alive for the compiler prevents
      you from changing or destroying the referenced thing. Example:

      // This errors:
      let mut a = 5;
      let b = &a;
      a = 10; // ERROR: a is borrowed
      println!("{}", *b);

      // This is fine:
      let mut c = 100;
      { 
          let d = &c;
          println!("{}", *d);
      }
      c = 50;

    - The above examples just use `&` for references, but Rust has two references types:
      - &'a T, called "shared reference", which cause "shared borrows".
      - &'a mut T, called "mutable references", which cause "mutable borrows".
    - Both behave the same in principle, but have different restrictions and guarantees:
      - A mutable borrow is exclusive, meaning no other borrow of the same data
        is allowed while the &mut T is alive, but allows you to freely change the T through 
        the reference.
      - A shared borrow may alias, so you can have multiple &T pointing
        to the same data at the same time, but you are not allowed to freely change T through 
        the reference.
      - (If those two cases are too rigid, there is also an escape hatch that
        a specific type may opt into to allow mutation of itself through a shared reference, with
        exclusivity checked through some other mechanism like runtime borrow counting.)
    - Through these two reference types, Rust libraries can build arbitrary APIs
      without losing the borrow-checker guarantees. E.g., the "reference to vector element"
      example boils down to this:

      let mut v = Vec::new();
      v.push(1);
      let r = &v[0]; // the reference in r now has a shared borrow on v.
      v.push(2);     // push tries to create a mutable borrow of v, which conflicts with the 
                        borrow kept alive by r, so you get a borrow error at compiletime.
      println!("{}", *r);
The important part is that all of this is there, by default, for all Rust code in existence, so you cannot accidentally ignore it like a library solution you might not know about, or like language features that don't know about the library solutions.


Great explanation. Thanks.


Correct me if I'm wrong, but it looks like this just provides some "safe" alternatives to unsafe C++ things. It's still up to the programmer's diligence to avoid those things, and nothing is getting statically verified.

By contrast, when I write Rust, memory safety (and type safety) are verified by the compiler.


That's right. SaferCPlusPlus is not complete and does not yet include a static verifier/checker.

Without a static verifier, memory safety is not guaranteed, just dramatically improved. And for many cases where there is a large investment in an existing code base, this might still be a more expedient solution. Even if only an interim one.

For example, I would estimate that, with concerted effort, it would take a matter of weeks to "port" the existing Firefox C++ code base to SaferCPlusPlus. Presumably this would dramatically reduce remote-code-execution and other memory bugs while we wait for the Rust implementation.

In cases where guaranteed memory safety is desired, you might think of it this way: In Rust, the static checker is built into the compiler. In C++, static checkers/analyzers are separate tools. You could choose to require that your C++ code must be verified to be safe by a static analyzer of your choosing. In C++, it can be difficult/inconvenient to write non-trivial code that fully appeases the static analyzer, just like in Rust. You can use SaferCPlusPlus to make it easier to fully appease the static analyzer (like the Rust language does).

I should also mention "Ironclad C++". It's similar in function to SaferCPlusPlus, but it uses garbage collection (where SaferCPlusPlus does not). It does include a static verifier/enforcer.

As a fan of "memory safety without using GC", I'm rooting for Rust. But I think the idea of achieving memory safety in C++ can be too quickly dismissed.


> In C++, static checkers/analyzers are separate tools. You could choose to require that your C++ code must be verified to be safe by a static analyzer of your choosing.

The problem is that, in C++, there is no such static checker in existence (except ones with GC).


Well, like I said in the other comment, you guys could fix that by unbundling the static checker in the Rust compiler and making it applicable to (a subset of) C++ code as well :)

So then would you agree with the notion that (a practical subset of) C++ combined with a static analyzer could be just as safe and fast as Rust if, hypothetically, there existed an enthusiastic community comparable to Rust's? Or are there intrinsic technical issues? Or syntax issues?

Also, let me throw this notion at you: Rather than disallow code that can't be verified to be (memory) safe, the compiler could instead inject runtime checks that would be optimized out using the same analysis that the static checker uses.

That is, instead of requiring that the code be fast and safe or it won't compile, it becomes: If your code is not clearly, intrinsically safe then it will have runtime checks that will slow it down. And the compiler could list any runtime checks that it wasn't able to optimize out.

The reason I suggest this is that memory safety is just the enforcement of certain invariants. There's no reason why we couldn't let the programmer define additional, application specific invariants and have the build process treat them the same way it treats memory access invariants.

So for example, when a user defines a class, it could have a standard member function called "assert_object_invariants()" or something, that the programmer can define. Then anytime a (non-const?) member function is called, the compiler can insert runtime asserts at the beginning and end of the member function call. And again the compiler can tell you when those runtime asserts aren't optimized out. Wouldn't that make sense? I haven't really thought it through.
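
A hedged sketch of that idea transposed to Rust, with assert_object_invariants written by hand rather than compiler-inserted; debug_assert! compiling out in release builds is a crude stand-in for the optimizer removing checks it can prove away:

    struct Account {
        balance: i64,
        limit: i64,
    }

    impl Account {
        // User-defined invariant over the whole object.
        fn assert_object_invariants(&self) {
            debug_assert!(self.balance >= -self.limit, "balance below credit limit");
        }

        fn withdraw(&mut self, amount: i64) {
            self.assert_object_invariants(); // entry check
            self.balance -= amount;
            self.assert_object_invariants(); // exit check
        }
    }

    fn main() {
        let mut a = Account { balance: 100, limit: 0 };
        a.withdraw(50);
        println!("{}", a.balance);
    }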


> Well, like I said in the other comment, you guys could fix that by unbundling the static checker in the Rust compiler and making it applicable to (a subset of) C++ code as well :)

The problem is that you still need extra annotations. Namely lifetime annotations (or something similar relating between borrows -- either that, or use a lot of elision which can be crippling). On top of that, the programming style Rust encourages is not the same as the ones you tend to see in C++ codebase, and programming in the C++ style will lead to code that doesn't compile.

> Rather than disallow code that can't be verified to be (memory) safe, the compiler could instead inject runtime checks that would be optimized out using the same analysis that the static checker uses.

This might be more tractable (and is an interesting idea). But that optimizer would be hard to write.

> So then would you agree with the notion that (a practical subset of) C++ combined with a static analyzer could be just as safe and fast as Rust

I think this is what the new ISOCPP core guidelines are trying to do? Though they don't go far enough in preventing memory unsafety IIRC (this may have changed).


> The problem is that you still need extra annotations. Namely lifetime annotations

Well, the idea is not to have the static analyzer verify typical C++ code. Just some practical subset. So for example I think it's quite practical to write C++ code that uses only "scope" pointers (basically pointers to objects on the stack) and (not-null) refcounting pointers, that intrinsically don't outlive their targets. Lifetimes would be implied by the types. So wait, what more does Rust's static analyzer give us again? Does it somehow remove the need for refcounting heap objects?

> the programming style Rust encourages is not the same as the ones you tend to see in C++ codebase, and programming in the C++ style will lead to code that doesn't compile.

I have no problem with that. I have no attachment to the "traditional" C++ programming style.

> This might be more tractable (and is an interesting idea). But that optimizer would be hard to write.

Why? The static analyzer has an opinion on whether or not a program is safe. The optimizer just wants to know if it still thinks it's safe when you remove a runtime check.

> I think this is what the new ISOCPP core guidelines are trying to do? Though they don't go far enough in preventing memory unsafety IIRC (this may have changed).

The ISOCPP core guidelines approach is to recommend the use of C++'s intrinsically dangerous elements in a way that is "usually safe", but not always, and rely on their static analyzer to catch bugs. So the question becomes, what do you do in the many cases where the static analyzer doesn't know if it's safe or not. You can try to redesign your code so the static analyzer can understand that it's safe. But that's often very inconvenient or has a performance cost. Often the most practical (safe) solution is to resort to something like SaferCPlusPlus.


> So wait, what more does Rust's static analyzer give us again? Does it somehow remove the need for refcounting heap objects?

Refcounting is rarely needed because most sharing is done via "borrows", which usually work via scope-tied "references" which may point to either the stack or the heap.

Implementing and enforcing local scope pointers in C++ via static analysis is not hard. Making it possible to thread borrows through APIs and annotate things with the borrowing semantics (which is what makes Rust avoid refcounting or even allocation costs) requires a bit more work.

> I have no attachment to the "traditional" C++ programming style.

Right, but at this point you have a very weird looking subset of C++ that can't seamlessly integrate with other libraries, and can't be translated to from regular C++ without significant human intervention -- why not just use Rust?

> Why? The static analyzer has an opinion on whether or not a program is safe. The optimizer just wants to know if it still thinks it's safe when you remove a runtime check.

I guess I misunderstood your proposal. This sounds doable. But, again, you'd be using a weird subset of C++ that doesn't seamlessly integrate, and you're just better off using Rust at this point.

Instead of trying to port Rust's guarantees to C++ it makes more sense to use the same principles to organically build on top of C++, in a different way. IMO this is sort of what ISOCPP is trying to do, but they're not quite there yet, and trying to find a compromise between making the language too different and making it safe is hard.

> So the question becomes, what do you do in the many cases where the static analyzer doesn't know if it's safe or not. You can try to redesign your code so the static analyzer can understand that it's safe.

This is always going to be a problem regardless of the static analyzer. You have to design it to reject these cases. Rust does this too; there are some edge cases where you need to design around the borrow checker (though usually this doesn't incur additional cost, and the most common of these are going to be addressed). If designing low level abstractions like vectors and stuff (or doing FFI), Rust gives you an escape hatch ("unsafe"), which has a couple of checks disabled and can be used to write the code you need (verifying safety of a program then just requires verifying that these blocks of code are sound and do not rely on any invariants that can be broken by code outside of them).
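
A small sketch of that escape hatch: a safe function whose body uses `unsafe`, with the soundness argument confined to that block.

    fn first_byte(bytes: &[u8]) -> Option<u8> {
        if bytes.is_empty() {
            None
        } else {
            // Sound because we just checked the slice is non-empty,
            // so index 0 is in bounds.
            Some(unsafe { *bytes.get_unchecked(0) })
        }
    }

    fn main() {
        assert_eq!(first_byte(b"hi"), Some(b'h'));
        assert_eq!(first_byte(b""), None);
    }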


> > Why? The static analyzer has an opinion on whether or not a program is safe. The optimizer just wants to know if it still thinks it's safe when you remove a runtime check.

> I guess I misunderstood your proposal. This sounds doable. But, again, you'd be using a weird subset of C++ that doesn't seamlessly integrate, and you're just better off using Rust at this point.

My proposal is sort of language independent. I'm just suggesting a better way to address the code safety/correctness issue might be with runtime asserts, because it's more general. Some of the runtime asserts (like the ones regarding memory safety) will be automatically generated by the compiler, and others would be user defined (but compiler placed). And the static analyzer (I guess "the borrow checker" in Rust) would be repurposed to strip out the unnecessary runtime checks. And the compiler/optimizer would tell you which runtime asserts it was unable to optimize out. (Presumably good Rust code would result in all the memory runtime asserts being optimized out.)

This allows for programs that are not just memory safe, but "application invariant" safe as well. Right? I mean it's not really a totally new concept, I guess it's kind of "design by contract" or whatever, but with a slight performance bent because the optimizer tells you what runtime checks it's having trouble getting rid of. And maybe there would be a way to indicate that you expect the optimizer to be able to get rid of certain runtime checks, and instruct it to generate a warning (or error) if it doesn't. I'm just sayin'...


I don't think it works. All of the "runtime asserts" require bookkeeping. That bookkeeping ends up being worse in terms of performance than what you have with a GC.

It's hard to beat a modern, tuned GC.


> Right, but at this point you have a very weird looking subset of C++

It's a little weird looking at first glance, but ultimately it's not really that weird. The main unfamiliar thing is that objects that are going to be the target of a (safe) pointer need to be declared as such. So

    {
        std::string s1;
        auto s1_ptr = &s1;
    }
becomes

    {
        mse::TXScopeObj<std::string> s2;
        auto s2_ptr = &s2;
    }
s2 acts just like a regular string. It's just wrapped in a (transparent) type that overloads the & (address of) operator so that s2_ptr is a safe pointer. (For example, in this case s2_ptr cannot be retargeted or set to null).

> that can't seamlessly integrate with other libraries,

Sure it can, that's the point. For example:

    {
        std::string s1 = "abc";
        mse::TXScopeObj<std::string> s2 = "def";
        auto s2_ptr = &s2;
        std::string s3 = s1 + s2; // s2 totally works where an std::string is expected
        s3 += *s2_ptr;
        *s2_ptr = s1; // and vice versa
    }
> and can't be translated to from regular C++ without significant human intervention --

Umm, it could be automated, but you would need a tool that can recognize object declarations. But modern C++ code is mostly safe already. I mean you're supposed to try to avoid pointers in favor of standard containers and iterators. So just replace your "std::vector"s with "mse::mstd::vector"s and your "std::array"s with "mse::mstd::array"s and you're mostly there.

> why not just use Rust?

My impression is that Rust has been evolving a lot. Is the language stable now? Is it time to jump in? Has it vanquished D as the successor to C++? Are we happy with Rust's solution for exceptions?

Even if Rust is the future, and the future is here, I'm still stuck with existing C++ projects. And I'd feel better if they were (at least mostly) memory safe. There must be others in the same boat.


> It's a little weird looking at first glance, but ultimately it's not really that weird.

Readability is important for maintainable code. And safe coding patterns tend to involve a lot of sum types (which you can model in C++ with the visitor pattern, but it's significant overhead in code length and possibly even at runtime), and a fair amount of generics (which are cumbersome in C++, and the error reporting is awful). If you're not going to get the existing tool/library infrastructure either way, and you're just evaluating them on their merits as languages, I don't think you'd ever want to pick C++ over Rust.
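
For comparison, a sum type plus an exhaustive match in Rust is short and checked by the compiler (an illustrative sketch):

    enum Shape {
        Circle { radius: f64 },
        Rect { w: f64, h: f64 },
    }

    fn area(s: &Shape) -> f64 {
        // The match must cover every variant; forgetting one is a
        // compile error, with no visitor boilerplate.
        match s {
            Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
            Shape::Rect { w, h } => w * h,
        }
    }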

> modern C++ code is mostly safe already.

I've been hearing that for about a decade now (and I suspect the only reason it isn't longer is that I wasn't programming before then). And yet we still see bugs, all the time. Not subtle bugs, but stupid, obvious bugs.

> Is the language stable now?

Yes, as of 1.0.

> Is it time to jump in? Has it vanquished D as the successor to C++? Are we happy with Rust's solution for exceptions?

Yes.

> Even if Rust is the future, and the future is here, I'm still stuck with existing C++ projects. And I'd feel better if they were (at least mostly) memory safe.

My belief is that no amount of whack-a-mole is going to make those projects memory-safe, and none of the linters/checkers/dialects is ever going to reach a point where it offers actual guarantees. If it were possible, it would have happened by now. The only way you're going to get to memory safety is by rewriting those projects, bottom to top (which is probably what you'd have to do to use one of these C++ dialects anyway). If you want to do the migration gradually (and you should!), Rust has pretty good interop.


> The main unfamiliar thing is that objects that are going to be the target of a (safe) pointer need to be declared as such.

Your proposal was to take Rust's static analysis and make it work with C++. It's clear you don't know Rust, so why are you so confident about what kind of effect that would have on the language? Rust is not "like C++ but with more static analysis"; it's a very different language. A lot of the safety that modern C++ gives you, Rust also gives you, just via different mechanisms.

> Sure it can, that's the point. For example:

This example seems to be a SaferCPlusPlus example? I'm talking specifically about your proposal to take Rust's static analysis and use it on C++. That isn't what SaferCPlusPlus seems to be doing. It seems like you might be talking about something else? The general applicability of safety based static analysis? I'm not arguing with that.

> My impression is that Rust has been evolving a lot. Is the language stable now?

Still evolving, just like C++ is, but is stable now. Has been for more than a year.

> Are we happy with Rust's solution for exceptions?

I am, and most folks in the Rust community are. At this point there are no missing pieces.

> Has it vanquished D as the successor to C++?

No, and that's subjective; your C++-with-Rust's-static-analysis would not be in a different boat.

> I'm still stuck with existing C++ projects. And I'd feel better if they were (at least mostly) memory safe.

That's my point. The amount of work to convert existing C++ code into something that satisfies a static analyzer using Rust's exact set of invariants is just as much as the work required to convert to Rust. You won't be able to just throw a new static analyser at C++ code and have stuff magically work. It will require significant refactoring and effort. Nor will your code be able to easily talk with other C++ libraries.

> Umm, it could be automated

No, "human intervention" I said. It can't be automated easily, because the style it enforces is significantly different. I've done quite a bit of jumping back and forth between C++ and Rust these days (in the same codebase, with FFI), and the fact that the structure and style of programs is different is very apparent.

There is work on translating C to Rust (and it might grow to C++ some day?), but IIRC you still need significant human intervention. For C at least there is no existing safety system to replace, so it's still easier, but translating from C++'s (largely incompatible) existing safety system will be tough.

Translating code would require the translator to figure out what the code is trying to do, basically. This isn't like Python2->Python3. Like I said, the style enforced is different. I don't mean syntax style, I mean how code is structured at a higher level.

> I mean you're supposed to try to avoid pointers in favor of standard containers and iterators

If you want to be 100% safe you need to solve iterator invalidation, and Rust's solution is something that is very hard to make work with C++'s usual style of coding. If you want to avoid all unnecessary allocations and refcounting you need a lifetime system. To use Rust's model, the mechanism of moving would have to be tweaked considerably.
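
To make the iterator invalidation point concrete, here's a sketch of the kind of program Rust rejects at compile time:

    fn main() {
        let mut v = vec![1, 2, 3];
        for x in &v {
            if *x == 2 {
                // error[E0502]: cannot borrow `v` as mutable because it
                // is also borrowed as immutable (by the loop iterator)
                v.push(4);
            }
        }
    }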

Again, these problems can probably be solved organically from C++ itself (which I guess is what SaferCPlusPlus is doing?), building a static analyser that tries to solve them building on the existing mechanisms in C++. But importing Rust's analysis will just get you a completely new language which has almost no use.


> It's clear you don't know Rust.

Oh yeah, didn't mean to give the impression otherwise. But I think I've gained some understanding since yesterday. I'm just learning, but tell me if I'm getting this at all:

- Rust only considers scope lifetimes (and "static" lifetime which is basically like the uber scope)?

- References can only target objects with a superset (scope) lifetime.

- You can only use one non-const reference to an object per scope. This solves the aliasing issue?

> This example seems to be a SaferCPlusPlus example? I'm talking specifically about your proposal to take Rust's static analysis and use it on C++.

Sorry, I misunderstood. I thought you'd switched context. Let me try again:

There are a couple of reasons for pursuing "Rustesque" programming in C++ as opposed to in Rust itself. First let me point out that there would have to be a mechanism for distinguishing between "statically enforced" safe blocks of C++ code and the rest of the code (just like Rust's "unsafe" blocks I guess).

So then the obvious advantage is a better interface to C++ code and libraries. Rust only supports plain C (FFI) interfaces? Is that right?

But another argument is that there are multiple strategies for achieving memory safety (and code safety in general). The two popular ones are the Rust strategy and the GC strategy. One is not uniformly superior to the other. Superior maybe, but not uniformly so. Presumably the Rust strategy will be more memory efficient, and maybe theoretically faster, whereas the GC strategy might facilitate higher productivity.

If you choose Rust, you're committed to one strategy. Now, I don't know if it'll turn out to be realistic, but I'm wondering if it's possible that C++ can support both strategies. (And maybe some other ones too.) Not just different strategies in different applications, but even in the same application. The Rust static analyzer would of course only work on indicated blocks of code.

Of course writing code in one strategy or another would be more clunky in C++ than a language specifically designed for it, but everything's a trade-off. The question is, is it worth it?

It's easy to say the clunkiness isn't worth it, but Rust probably has the weakest argument in that respect. Right? (I mean doesn't Rust have a reputation of being clunky anyway?)

Again, I barely know any Rust, but it seems to me that the main safety functionality that Rust provides over, say, SaferCPlusPlus, is the static enforcement of "one non-const reference to an object per scope" as an efficient, but restrictive, solution to the aliasing issue.

Hmm, obviously I have to find some time to learn Rust better, but intuitively, it seems like the simple Rust examples I've seen so far would have a corresponding C++ implementation, and it's not immediately obvious to me why a static analyzer couldn't work on the corresponding C++ code. Is there a simple example that demonstrates the problem? Am I just underestimating the difficulty of static analysis?


> You can only use one non-const reference to an object per scope. This solves the aliasing issue?

More accurately, if you have a mutable reference you cannot have any other references.
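
A minimal sketch of the rule in action (this does not compile):

    fn main() {
        let mut x = 0;
        let r1 = &x;     // shared borrow of x...
        let r2 = &mut x; // error[E0502]: cannot borrow `x` as mutable
                         // because it is also borrowed as immutable
        *r2 += 1;
        println!("{}", r1);
    }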

> Rust only supports plain C (FFI) interfaces? Is that right?

Yes, but with bindgen you have a decent C++ interface.

My contention is that the "better interface" is only slightly better, and probably not enough to justify basically creating a whole new language. Note that for your safe RustyCPP code, the regular-C++ code will be completely unsafe to use and you'll have to write some safety wrappers that encode the guarantees you need. I've been doing this in the Rust integration in Firefox, and I'm sure that a dialect of C++ that uses Rust's rules will need to do something similar. That's where the bulk of the integration cost comes from.

> If you choose Rust, you're committed to one strategy

I mean, you can just blindly use Rc<T> or Gc<T> in Rust (Gc<T> only exists as a POC right now but we plan to get a good one up some day).
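
For instance, a sketch of the Rc route, with RefCell moving the aliasing check to runtime:

    use std::cell::RefCell;
    use std::rc::Rc;

    fn main() {
        let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
        let alias = Rc::clone(&shared);  // bump the refcount
        alias.borrow_mut().push(4);      // aliasing checked at runtime
        assert_eq!(shared.borrow().len(), 4);
    }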

But yeah, magical pervasive GC would be hard to do in Rust.

> The question is, is it worth it?

You're arguing between choosing Rust vs CPP-with-static-analysis. I'm arguing between choosing Rust vs CPP-with-Rust-esque-static-analysis. I think the latter strongly points towards Rust, but the former has interesting tradeoffs.

> I mean doesn't Rust have a reputation of being clunky anyway?

Not ... really? It has a reputation for having a steep initial learning curve.

> it seems like the simple Rust examples I've seen so far would have a corresponding C++ implementation

Oh, this would work. But the reverse -- taking C++ code and making it work under the Rust rules -- is very hard. Not because of the aliasing rules, but because of how copy/move constructors are used in C++ (Rust's model strongly depends on initialization being necessary), the whole duck-typed-templates thing in C++, and similar things with respect to coding patterns that don't translate well.

Again, you could build a safety system on C++ that respects these patterns, but it would not be the same as taking Rust's rules and enforcing them on C++.


> Well, like I said in the other comment, you guys could fix that by unbundling the static checker in the Rust compiler and making it applicable to (a subset of) C++ code as well :)

No, we can't do that. It is incompatible with C++.

> Rather than disallow code that can't be verified to be (memory) safe, the compiler could instead inject runtime checks that would be optimized out using the same analysis that the static checker uses.

That is not possible. It would require massive bookkeeping, much like your library does. That would eliminate most of the benefits of Rust.


Sometimes I think Rust people lose the forest for the trees. The end goal isn't for the compiler to verify the safety, the end goal is for the software itself to be safe in a way that's cheaper.

It doesn't really matter if they both end up at the same place, which is safe software.


> Sometimes I think Rust people lose the forest for the trees. The end goal isn't for the compiler to verify the safety, the end goal is for the software itself to be safe in a way that's cheaper.

I don't care if the software is verified via libraries or compilers. The problem is that C++ verifiers don't work.


> It doesn't really matter if they both end up at the same place, which is safe software.

I think the contention is that, unless you're applying NASA style rigor, you don't end up in the same place without verifying the safety automatically, because in practice it's too expensive to verify the safety manually (without getting squeezed out of the space by your competitors.)

SaferCPlusPlus's goals are noble, but approaching the problem with a library-only solution is problematic. None of the huge swaths of legacy and third party code I'd like to sanitize uses it - and a large scale rewrite to 'fix' that may very well introduce more bugs than it fixes. A library cannot 'fix' fundamental language constructs either, short of telling you to please remember to perfectly avoid those language constructs even if you're very very used to them. Frankly, I'm skeptical of how useful I'd find SaferCPlusPlus even for new projects - especially when modern SC++L implementations already have a lot of error checking code built into them as well, at least for debug builds.

Meanwhile, I already credit these with saving me at least a month of debugging time: http://clang.llvm.org/docs/ThreadSafetyAnalysis.html

I'm interested in Rust because it takes the same approach to securing code from bugs as seems to help a lot when I apply it to C++: Static analysis and annotations, designs to make edge cases impossible to ignore, and where static analysis cannot perfectly find all problems, let it error out reliably at runtime instead of randomly corrupting memory unless I really really really mean it.


> I think the contention is that, unless you're applying NASA style rigor, you don't end up in the same place without verifying the safety automatically, because in practice it's too expensive to verify the safety manually (without getting squeezed out of the space by your competitors.)

That's a claim that has yet to be shown to be true. Maybe it is true, and maybe it isn't, but C++ compilers tend to give pretty good warnings that you can treat as errors, and coupled with good external tools it isn't clear that Rust is significantly safer than C++.

The scary part of it all is how many Rust users seem to think that it is a given, when even the Rust standard Vec container has unsafe code in it.

I personally think that if Rust is shown to statistically decrease the security/error rate on large projects, it's going to be with the use of 3rd party tools, not the specific semantics of the language. I'm of the opinion that the beauty of the unsafe block isn't in any inherent "safety", as much as it is giving more semantics for 3rd party tools to analyze.


> C++ compilers tend to give pretty good warnings that you can treat as errors

They miss far too many simple cases for this to possibly be a sensible claim, e.g. neither gcc -Wall nor clang -Weverything warn about the two massive problems in the following code:

  #include <vector>

  int &foo() {
    std::vector<int> v{ 0, 1, 2, 3 };

    int &x = v[0];
    v.clear();

    int y = x; // dereferencing dangling pointer!
    (void)y;

    return x; // escaping a dangling pointer!
  }
Rust is clearly a step up, since it actually catches these. The Rust compiler is the "third party" tool that helps you get better code; unlike in C and C++, the static analysis is built in.
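
For reference, here's a rough Rust translation of the snippet above; rustc rejects both problems at compile time:

    fn foo() -> &'static i32 {
        let mut v = vec![0, 1, 2, 3];
        let x = &v[0];
        v.clear();  // error[E0502]: cannot borrow `v` as mutable
                    // because it is also borrowed (by `x`)
        let y = *x; // the dangling read never compiles
        let _ = y;
        x           // error[E0597]: `v` does not live long enough
    }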


MSVC is continuing to improve their detection of invalidated pointers: https://youtu.be/hEx5DNLWGgA?t=3231

This is running the static analysis pass instead of the normal compile pass, but stuff is improving. Of course you're preaching to the choir as far as I'm concerned - this stuff is way late to the party, and speaking generally, has issues with false positives and failing to detect things.


I'll have to check when I get home, but I'm fairly certain you're purposefully suppressing the compiler's warnings here. That's not a good way to make your argument about the compiler not being able to warn you about problems.


Yes, I'm purposely suppressing the unused variable warning with the (void)y;, because presumably real code will actually do something with the value: I could've printed y or left out that line, whatever, the compilers still don't warn about the actual major problems.


Your argument as to why the compiler won't warn you about problems is to show an example where you purposefully suppress the warnings the compiler gives you.

I honestly don't think we should continue this conversation.


Suppressing the unused variable warning is:

- unrelated to the dangling pointers

- not suppressing a warning about a memory safety problem

- not affecting the lack of warnings for the memory safety problems: remove the `(void)y;` line and there are still no warnings about the dangling pointers.

Seriously, you are focusing on something irrelevant. Either pretend I didn't write that line, or pretend it was std::cout << y << std::endl;. The fundamental fact remains that the compilers do not warn about the major problem of handling dangling pointers, despite both of these being fairly trivial cases, just a tiny step up from pure stack allocation.

Yes, C++ compilers do have some warnings for some things, but the interesting warnings for this topic are for insidious memory safety bugs like dangling references, not for basic unused variables. Rust warns about both; C++ compilers catch only the latter. The code I wrote is wrong for two reasons, and neither of those reasons is the unused variable.

If you're going to tout the quality of C++ compiler's warnings, they better flag as many cases of problems like use after free (and use after move), dangling references and iterator invalidation as they can, but I've never had a C++ compiler warn about any of these (other than the most basic case of returning a reference to a local variable).


> The fundamental fact remains that the compilers do not warn about the major problem of handling dangling pointers

I'm going to quote myself, emphasis mine.

"The end goal isn't for the compiler to verify the safety, the end goal is for the software itself to be safe in a way that's cheaper."

[snip]

"C++ compilers tend to give pretty good warnings that you can treat as errors, and __coupled with good external tools__ it isn't clear that rust is significantly safer than C++."


quoting pcwalton up above:

"I don't care if the software is verified via libraries or compilers. The problem is that C++ verifiers don't work."


right, I too can make assertions with no evidence to back them up.

If that's really your bar, then we don't have much more to discuss.


With no `unsafe` blocks, show me how you could use the `Vec` type to break Rust's memory safety guarantees.


That's the point: there are plenty of things that flat-out cannot be done without using inherently unsafe operations, even in Rust.

This is why it has still yet to be shown that Rust is actually safer than C++.


That's not the point. Array types in Ruby and Python are implemented in C. No one goes around saying those languages are actually no more memory safe than C++ (or maybe you do?).


> No one goes around saying those languages are actually no more memory safe than C++ (or maybe you do?).

It's unfortunate that you've chosen to try and make the scope smaller by referring specifically to "memory safety".

As a result, this will be my last response to you, I just don't have the energy to go back and forth with someone who isn't willing to be honest in this discussion.

But to answer your question, those languages are no safer than C++. I can write a C plugin in both that contains memory leaks and various safety issues. And in fact, both projects have had their own security problems.


> But to answer your question, those languages are no safer than C++. I can write a C plugin in both that contains memory leaks and various safety issues. And in fact, both projects have had their own security problems.

This definition makes any comparison of the safety of different languages totally useless: according to it, all languages are equally unsafe. You're free to want to use that definition, but it's a tautology and thus doesn't actually allow distinguishing between anything nor serve any purpose.

It's true that all languages offer escape hatches, but it's also true that there's a major qualitative (at least) difference between the constrained rarely used escape hatches of Python, Java and Rust, and the "the whole language is an escape hatch" approach of C++ and C.

In mathematics and the verification of programs, proofs build from small proofs: first show that a function `foo` has a certain behaviour, then use this to show that `bar` (which calls `foo`) has another behaviour, etc. etc., until the whole program is proved correct. Languages like Python, Java and Rust are designed with this in mind: prove the unsafe code correct and the language guarantees the rest of the code is memory safe. C and C++ have no such property: a proof of memory safety requires touching every single line of code, not just the small number of lines that actually need to escape down a level.


> It's true that all languages offer escape hatches

And that all languages experience safety issues as a result of these escape hatches, and that all languages suffer security issues despite sequestering these escape hatches.

Which goes back to what I said before.

"That's a claim that has yet to be shown to be true. Maybe it is true, and maybe it isn't ..."

[snip]

"I personally think that if rust is shown to statistically decrease the security/error rate on large projects, it's going to be with the use of 3rd party tools, not the specific semantics of the language. I'm of the opinion that the beauty of the unsafe block isn't in any inherent "safety", as much as it is giving more semantics for 3rd party tools to analyze."

> In mathematics and the verification of programs, proofs will build from small proofs: first show that a function [snip]

This is a non-sequitur. You're trying to compare a deductive proof in a formal logic system whose only requirement is to be internally consistent with messy reality. Look at the difference in approach. I said we won't know if until we have enough experience and data to analyze to see if there's a significant statistical difference between the error rates of software written in C++ vs Rust. You basically said we already know because we can write small programs that are safe, therefore we can write large programs that are safe. It's a non-sequitur.

> a proof of memory safety requires touching every single line of code, not just the small number that actually need to escape down a level.

And the same can be said of Rust: the unsafe blocks give a false sense of security. No one really cares if it crashed in an unsafe block if the root cause is state manipulated in safe code somewhere away from the unsafe block. It takes a lot of discipline and scrutiny to make sure you don't accidentally put the state into a spot where the unsafe block can do bad things. This is the same sort of discipline required in C++.

That's the point you're not getting, and it's why I think 3rd party tools that can tell us more about the code affected by an unsafe block are going to be more useful in the long run. Imagine a tool, run on check-in or at specific intervals, that can immediately identify code changes that manipulate state an unsafe code block depends on. Developers could then examine the changes to make sure nothing bad happens.

Or you're in an IDE that changes the variable color to indicate that what you're working with affects an unsafe block, so you can be sure that you need to pay extra careful attention and definitely get a code review.

These same techniques work successfully in C++. People deal with it in the exact same manner: they put it behind an interface and use code reviews and external tools to identify potentially dangerous things that human beings then step in and examine much more closely.

The point is, there is nothing inherent in Rust that definitely makes it safer than C++. There are potentially aspects of it that enable better tooling that could eventually make it safer than C++, but it will take time and careful analysis before it's obvious that it's safer.

Modern C++ tends to sequester these things off the way Rust would.


> It's unfortunate that you've chosen to try and make the scope smaller by referring specifically to "memory safety".

Okay, back to the broader scope - what's an area that you think Rust might do worse than C++ at? I'd be very interested in fixing any blind spots I might have.


unsafe blocks play no role in any kind of security aside from memory safety. You are communicating with extreme disingenuity.


Hey, you're right, getting memory usage correct doesn't affect safety at all.

good day.


Reading an array out of bounds is definitely unlikely to be correct, and quite likely to be a security vulnerability. Memory safety is absolutely a prerequisite for any other sort of safety one might want.


We agree on that, my point is that C++ does it via libraries, Rust does it by hiding unsafe blocks behind interfaces (aka libraries).

Time will tell which approach is ultimately superior (if either one of them is actually better), but until then it isn't clear that the Rust approach is statistically better than the C++ approach.

Ultimately the advantage Rust has is the ability to possibly provide better 3rd party tooling that will enable developers to make the right decisions more often than C++ does. Consider a tool that runs on code check-in and spits out a report of all sites where code manipulating state that could affect an unsafe block was changed or written, so that developers could then hold a very focused peer review to ensure the safe code doesn't put that state into a spot where it causes problems.

I think in this way Rust may eventually be shown to be better than C++, but then again, maybe not.


> We agree on that, my point is that C++ does it via libraries, Rust does it by hiding unsafe blocks behind interfaces (aka libraries).

That's a false equivalency and completely ignores the fact that C++ is one giant `unsafe` block.

> but until the it isn't clear that the Rust approach is statistically better than the C++ approach

Could you please explain what kind of evidence would convince you?


> That's a false equivalency and completely ignores the fact that C++ is one giant `unsafe` block.

This is exactly what I meant when I said rust people miss the forest for the trees.

> Could you please explain what kind of evidence would convince you?

you quoted me explaining what I would need.


and not just you, but also show http://rust-lang.org/security.html


> None of the huge swaths of legacy and third party code I'd like to sanitize uses it - and a large scale rewrite to 'fix' that may very well introduce more bugs than it fixes.

SaferCPlusPlus is designed for compatible interaction with unsafe legacy code and library interfaces. Some may see this as a flaw. But it allows you to incrementally "improve" C++ code without requiring a total rewrite. It also means that members of a team can adopt it unilaterally. It's regular C++ code that won't interfere or impose on your co-programmers, even when you're working on the same code.

> A library cannot 'fix' fundamental language constructs either, short of telling you to please remember to perfectly avoid those language constructs even if you're very very used to them.

Right, but the "safe replacement" elements in the library are designed to behave just like their unsafe counterparts, perhaps making the transition easier. In terms of enforcement, I think it may be a "use it and they will build it" scenario. Once there is significant adoption of the SaferCPlusPlus library, it should take a relatively modest effort to implement a static enforcer. I mean, you just want to flag any uses of unsafe elements, not even do any analysis on them.

> Frankly, I'm skeptical of how useful I'd find SaferCPlusPlus even for new projects - especially when modern SC++L implementations already have a lot of error checking code built into them as well, at least for debug builds.

That's the beauty of SaferCPlusPlus. Let's say you're using std::vector<> somewhere in your program. You can just replace "std::vector<>" with "mse::mstd::vector<>" and now your vector is (optionally) safe. With a compiler directive you can choose to "disable" the safety features in any build (i.e. mse::mstd::vector<> will be automatically aliased back to std::vector<>). Compilers generally just do bounds checking (the "sanitizers" notwithstanding). SaferCPlusPlus checks for things like "use-after-free" as well.

And you don't need to link to any library. You just need to add a couple of header files to your project.

> Meanwhile, I already credit these to saving me at least a month of debugging time: http://clang.llvm.org/docs/ThreadSafetyAnalysis.html

The sanitizers are fantastic. But they're not quite a substitute for SaferCPlusPlus [1]. SaferCPlusPlus addresses the issue of safely accessing objects from asynchronous threads.

> Static analysis and annotations, designs to make edge cases impossible to ignore, and where static analysis cannot perfectly find all problems, let it error out reliably at runtime instead of randomly corrupting memory unless I really really really mean it.

SaferCPlusPlus is not a competitor to, or an excuse to neglect static analysis. SaferCPlusPlus exists because static analysis does not fully solve the problem.

[1] http://duneroadrunner.github.io/SaferCPlusPlus/#safercpluspl...


Sorry, I misread "ThreadSafetyAnalysis" as ThreadSanitizer [1]. Like I said, static analyzers are great. Some may feel that they sufficiently address the code safety issue in practice, some may not.

[1] http://clang.llvm.org/docs/ThreadSanitizer.html


You're 100% correct that the end goal is safe software in a way that's practical to achieve. However, having the computer check one's code is generally regarded as a great way (even the best way) to do this: NASA's JPL doesn't accidentally recommend[0] turning on all compiler warnings and using static analysis tools, and it seems a little unlikely that most major tech companies would be spending millions on static analysers and statically-typed languages if they didn't think it helped them write correct code.

[0]: https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...


Sure, but C++ also has tools to do this checking in an automated way.


Well... how do you know the code is safe otherwise? Exhaustive testing is unreasonable. How do you ensure that you have achieved the end goal of safe software?


> because everyone already has their 20 year old [C] codebases

It's comments like these that remind me how exclusionary the software world is. Your definition of "everybody" is such a tiny number of people. But that's who you have in mind when you are constructing the world around you each day.


Well, one might notice "everybody" has to be a rhetorical exaggeration, since there's absolutely no way anything remotely close to "everybody" has any codebase at all, let alone 20 year old legacy. Right?

Second, I was responding to a comment talking about "existing, critical, real time applications". Of which a huge number of cases do have existing, very old legacy codebases.

Third, I fail to see what you tried to bring to the conversation. If your only problem was with my rhetoric, see above.


I wouldn't hold your breath for C to die though. C sucks at many things, but it's pretty good in embedded, if you're not writing ASM.


Actually, the only real advantage that C has over Rust there is the availability of trimmed libcs.


Rust has libcore, which is, in many ways, more featureful than libc despite also having zero dependencies. From my perspective, the main advantage of C is the way in which chip manufacturers only provide (poorly supported/bug-ridden) C compilers, but this is likely to become less important as ARM takes over more and more of the world: it is only getting easier and cheaper to throw a full ARM chip into a device, due to economies of scale.
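
As a sketch of what libcore alone buys you: a freestanding formatting target, with no allocator and no OS underneath (crate setup details omitted):

    #![no_std]

    use core::fmt::{self, Write};

    // A fixed-size buffer implementing core::fmt::Write, so the
    // write!() machinery works on bare metal.
    pub struct Buf<'a> {
        data: &'a mut [u8],
        len: usize,
    }

    impl<'a> Write for Buf<'a> {
        fn write_str(&mut self, s: &str) -> fmt::Result {
            let bytes = s.as_bytes();
            let end = self.len + bytes.len();
            if end > self.data.len() {
                return Err(fmt::Error); // out of space, no panic
            }
            self.data[self.len..end].copy_from_slice(bytes);
            self.len = end;
            Ok(())
        }
    }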


C seems to be going through yet another renaissance.

C is a smaller language to learn.

There is tons of legacy code even for embedded systems.


> C is a smaller language to learn.

There are engineers with 20 years of C programming experience who will still make security errors while handling basic strings. "Small" does not mean "good", and "learning" a language doesn't mean you'll write good code with it.


No, small is good. But C isn't small. It's actually massive, and not terribly orthogonal. It's peppered with special cases, and things people think but aren't actually true (how would you check for an integer overflow in C?).

It's like comparing x86 to, say, m68k (or most things, really). One was designed. The other is an ungainly mess of hacks on top of hacks, which has a good, elegant design in there somewhere, desperately trying to get out. Guess which one is x86.

Now guess which one is C.

Worse really is better. Or at least, good enough.

C isn't a complete mess, and you can write good code in it if you're very careful, but it's not great.


Why would you use libc embedded? Most of libc kind of expects an OS to be there.


Uh? There are definitely trimmed libc's (not GNU libc!) that run on bare metal. Such as the one used by avr-gcc for example...


Yeah, but why? I've never used a libc on embedded, and I'm kind of confused as to why you'd want to. Am I just Doing It Wrong™?


At least for avr-gcc, (part of) the startup code and the vector table layout come from avr-libc.

I'm kinda surprised you got by without libc - no strxxx, memxxx, xxxprintf, or math.h functions, ever?


No. But my embedded projects weren't particularly impressive. I can at least see the need now.


A load more developers as well?


It's starting to go that way but that's a very hard space to push new things into. I talk with all my ex-gamedev contacts and they're hesitant to even use lambdas or other C++11 features that have been around for a while now.

I think Mozilla's plan of driving things forward with Servo and using that as an large-scale example of the gains that can be made is a good approach.


> IMO it would be great to get folks who write the enormous base of existing realtime apps driving critical devices everywhere to sit up and take notice of Rust.

Rust cannot make any latency guarantees either. Reference counting and Rust's lifetimes also have pathological cases; worst-case, an object can reference the entire heap, which will take time proportional to the number of dead objects to free.

Copying collection in this case takes literally zero time, but its pathological case is when all referenced data survives the current GC cycle, i.e. time proportional to live objects.

There's no free lunch!


> Rust cannot make any latency guarantees either. Reference counting and Rust's lifetimes also have pathological cases; worst-case, an object can reference the entire heap, which will take time proportional to the number of dead objects to free.

Rust doesn't use reference counting by default. Refcounting is very rare in Rust, much more rare than it is in C++. Most large C++ codebases I've worked with have thrown in the towel and started refcounting all the things. In Servo, for example, most of the refcounting is across threads (where you basically have no other option), plus a few interesting cases in the DOM, each with very good reasons for using refcounting.

Lifetimes are a concept at compile time and don't exist at runtime.

Edit: Oh, I see what you're talking about. A sufficiently large owned tree/graph in Rust will introduce latency. It's predictable latency though. I can make the same argument about for loops.

Unpredictably sized large trees in Rust are again pretty rare in general.


Trees don't have to be refcounted in Rust; single-ownership trees are possible, as long as they don't have backpointers. Backpointers are a problem under single ownership.


Right, I never said that trees have to be refcounted. A sufficiently large ownership tree will get deallocated all at once, which is the kind of latency the GP is talking about.


Linked data structures in Rust get complicated, though. See the "Too many lists" book.[1] Doubly linked lists, or trees with backlinks, are especially difficult. Either you have to use refcounts, or the forward pointer and backward pointer need to be updated as an unsafe unit operation. There might be an elegant way to do this with swapping, but I'm not sure yet.
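
For reference, the refcounted flavor has roughly this node shape (a sketch; the Weak back links avoid an ownership cycle):

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    // Safe doubly linked node: strong (owning) forward links, weak
    // back links, so the two pointers never form an ownership cycle.
    struct Node<T> {
        value: T,
        next: Option<Rc<RefCell<Node<T>>>>,
        prev: Option<Weak<RefCell<Node<T>>>>,
    }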

[1] http://cglab.ca/~abeinges/blah/too-many-lists/book/


Right, so you implement them with unsafe. While you can implement doubly linked lists safely with refcounting, you're perfectly free to implement them with unsafe code. This is what unsafe code is for, designing low level abstractions with clean API boundaries.

(Also I don't see how this is relevant at all)


> Right, so you implement them with unsafe.

If you need unsafe code for basic operations within the language, something is wrong with the language. This isn't about talking to hardware, or an external library. It's pure Rust code.

(Some pointer manipulations can be built from swap as a basic operation. That may work for doubly-linked lists. The other big problem is partially valid arrays, such as vectors with extra space reserved. There's no way to talk about that concept within the language. There could be, but this isn't the place to discuss it.)


> need unsafe code for basic operations within the language

Building custom back-referencing data structures is not a "basic operation" anywhere outside programming classes. Adding significant complexity to Rust to make a 2% case marginally safer would make the language worse. As long as the vast majority of code is not unsafe, it achieves its goal.


I have been writing Rust code for almost three years now.

I have helped design a low level data structure exactly twice. In both cases, this was a highly custom concurrent data structure, which would have been even harder to get right in C++ or some other language.

If you need a regular run-of-the-mill data structure it will exist in the stdlib or crates ecosystem. This is not a "basic task". Just because schools teach it early does not make it a "basic task". It's a task that needs to be done at some point, but doing it once and making it part of the stdlib or a crate is all that is necessary. It has become a "basic task" in C++ because it's easy enough to do that you don't need to reach for the stdlib, but that doesn't mean that it's necessary to have a bespoke implementation of a DLL that often in C++; usually the stdlib one will do.

The same "too many lists" book you linked to explains why DLLs are niche datastructures on the first page (singly linked lists can be implemented safely in Rust, though they can be somewhat niche too).


You will always have these pathological cases when you choose to use higher level memory management like simple reference counting or garbage collection no matter what language you use, whether it's Rust or assembler. The point of Rust is that you have complete control over what you use and pay for. If your concern is the overhead of lifetimes then you need to evaluate if you can afford heap allocation in the first place. Otherwise you can make de/allocation explicit in Rust just like in C, without losing the benefits of ownership checking.

Embedded hardware and software can only provide realtime guarantees because they are simpler: no complex pipelines, caches, branch predictors, or thread schedulers. If you want low latency embedded software you have to document the pathological cases, test whether they happen in real world use, and profile the code with each microarchitecture you're targeting anyway, let alone every product family. What language you use doesn't change that.


> You will always have these pathological cases when you choose to use higher level memory management like simple reference counting or garbage collection no matter what language you use

Not true, soft and hard realtime garbage collectors exist. Your runtime simply needs to bound the amount of reclamation work done at any given time.

For instance, the cascading free behaviour Rust is currently susceptible to can be broken up into a bounded series of free operations interleaved with ordinary program execution. Rust would then be realtime without truly changing its observable behaviour, except its timing in some programs.


> For instance, the cascading free behaviour Rust is currently susceptible to can be broken up into a bounded series of free operations interleaved with ordinary program execution.

You can probably make this work by plugging in a different allocator, if jemalloc doesn't do this already. The ability to batch up frees and mallocs isn't tied to GCs.

This won't reduce the perf impact of running a large tree of `Drop` impls, but it will reduce the free calls.


> You can probably make this work by plugging in a different allocator, if jemalloc doesn't do this already. The ability to batch up frees and mallocs isn't tied to GCs.

That gets tricky, because Rust people no doubt expect deterministic destruction on scope exit. But yes, my ultimate point is that low latency is a property of a runtime, not a language. C/C++ or Rust aren't going to automatically give you bounded latency, and adding tracing GC doesn't automatically take it away.


> Rust people no doubt expect deterministic destruction on scope exit.

Deterministic destruction, but not deterministic deallocation :)


But this expectation is transitive. If you have an array of file handles and you defer deallocating some of them but destruct them all upfront, you still have the latency issue we've been discussing. And if you defer destructing too, then you still have non-deterministic destruction and deallocation. I'm not sure there's a way around this tradeoff.


Right, I already mentioned this a couple of comments ago.

You can use various arena-like structures where you explicitly forgo the guarantee of deterministic drop for this.


> Not true, soft and hard realtime garbage collectors exist. Your runtime simply needs to bound the amount of reclamation work done at any given time.

That doesn't change anything! You're just choosing a garbage collector with a default deterministic pathological case, which is a guarantee you can make about almost any GC by carefully tailoring your memory usage to your scenario and choice of algorithm. That's what realtime embedded software development is all about: writing code that has predictable timing given your expected inputs and environment. If all you need to do is flip a bit once every 10 minutes with a precision of 1 second while reading 1 bps from a sensor, even a full blown Linux distribution on a modern Intel i7 running a Python or Ruby daemon can be considered "realtime". The language doesn't matter as long as you can predict how long everything is going to take in the worst case and your micro[controller/processor] is fast enough to react.

> For instance, the cascading free behaviour Rust is currently susceptible to can be broken up into a bounded series of free operations interleaved with ordinary program execution. Rust would then be realtime without truly changing its observable behaviour, except its timing in some programs.

You know that's what the Drop trait is for, right? All you have to do is add whatever memory management code you'd have (in your C program) into the trait implementation and your memory deallocation will behave exactly as it would in any other low level language. These low level facilities have been part of the Rust design from the start; they just don't require you to manually call free() by default. That doesn't mean anything in Rust is stopping you from doing so, and if you want to, you can opt out of the default behavior entirely by providing a blank Drop implementation. After that, literally anything you can do in C you can also do in a Rust unsafe block.
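
For anyone following along, a minimal Drop sketch:

    struct Handle(u32);

    impl Drop for Handle {
        fn drop(&mut self) {
            // Runs deterministically at scope exit; this is where a
            // custom deallocation strategy would hook in.
            println!("releasing handle {}", self.0);
        }
    }

    fn main() {
        let _a = Handle(1);
        {
            let _b = Handle(2);
        } // prints "releasing handle 2"
    }     // prints "releasing handle 1"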


> That doesn't change anything! You're just choosing a garbage collector with a default deterministic pathological case, which is a guarantee you can make about almost any GC by carefully tailoring your memory usage to your scenario and choice of algorithm.

The fact that you don't have to tailor anything is precisely the point. Latency is a property of a runtime, not a language. This has been my point all along. C/C++ or Rust don't guarantee low-latency realtime properties, and introducing tracing GC doesn't guarantee high-latency non-realtime properties.

> You know that's what the Drop trait is for, right? All you have to do is add whatever memory management code you'd have (in your C program) into the trait implementation and your memory deallocation will behave exactly as it would in any other low level language.

Great, but it doesn't guarantee any properties of code you haven't written, so it still can't achieve the global properties I've been talking about.


> C/C++ or Rust don't guarantee low-latency realtime properties, and introducing tracing GC doesn't guarantee high-latency non-realtime properties.

We completely agree.

> Great, but it doesn't guarantee any properties of code you haven't written, so it still can't achieve the global properties I've been talking about.

How is this any different from C/C++? They do not give you any guarantees that Rust takes away in this regard. Any library that uses Box::new or vec! is exactly the same as a C library that calls malloc/free internally, and you can implement the same heap-allocation-free algorithms in Rust as you can in C/C++.

I don't understand what global properties you expect a low level systems language to guarantee. They definitely can't guarantee that code you haven't written doesn't heap allocate; you have to check that it doesn't call malloc/free yourself.


> Your runtime simply needs to bound the amount of reclamation work done at any given time.

Wouldn't this transform the problem into a "no more predictable maximum memory usage" problem, since you can't really know if and when your GC will keep up with the amount of work to do?


Possibly, but maximum memory usage is rarely predictable anyway. I expect it might be even less predictable than maximum latency.

However, it may still be possible to conservatively bound your maximum memory usage too: as long as your reclamation-work phase keeps up with your program's allocation rate, you achieve a steady state.

Suppose some amount of reclamation is done on malloc(); a tunable parameter could measure the ratio of the running program's allocation speed to the amount of unreclaimed garbage. This ratio would control how much reclamation work to do before returning from malloc(), so you can fall into the steady state.


> Possibly, but maximum memory usage is rarely predictable anyway. I expect it might be even less predictable than maximum latency.

Well, if you don't need a bound on memory usage you can just never deallocate.

> Suppose some amount of reclamation is done on malloc(), a tunable parameter could measure the ratio of allocation speed of the running program and amount of unreclaimed garbage. This ratio would control how much reclamation work to do before returning from malloc() so you can fall into steady-state.

Sure, but that doesn't guarantee anything about what your maximum spikes are going to be. You can have a firm bound on memory consumption or a firm bound on latency, but you can't get both without doing some serious application-specific work.


I keep seeing this latency claim about GC, but it would be trivial to solve with free if it were actually a problem: just add freed objects to a list and incrementally free over time to achieve whatever latency guarantees you wish.
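
Sketched out with illustrative types (not a real allocator):

    use std::collections::VecDeque;

    // Deferred-free list: retiring an object is an O(1) push, and a
    // bounded number of real frees happen per tick.
    struct DeferredFree {
        queue: VecDeque<Vec<u8>>, // stand-in for arbitrary dead objects
        budget_per_tick: usize,
    }

    impl DeferredFree {
        fn retire(&mut self, dead: Vec<u8>) {
            self.queue.push_back(dead); // defer the actual reclamation
        }

        fn tick(&mut self) {
            // Free at most budget_per_tick objects, bounding pause time.
            for _ in 0..self.budget_per_tick {
                match self.queue.pop_front() {
                    Some(obj) => drop(obj), // memory released here
                    None => break,
                }
            }
        }
    }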

The reason why no malloc/free implementations that I'm aware of actually do this is that the latency of freeing isn't a problem in practice.


> The reason why no malloc/free implementations that I'm aware of actually do this is that the latency of freeing isn't a problem in practice.

Partly, and the other part is that it degrades allocation performance for the majority of non-problematic programs, which is what most people actually focus on.

But if we're being fair, latency of tracing GC isn't a problem for most programs either. So latency is largely a red herring, except when it's not, and you had better know when it's not, regardless of whether you're using C/C++/Rust or a runtime with tracing GC.


It's worth mentioning that there are several strategies for avoiding cascading deallocations, like arenas or arena-backed graph abstractions. For example:

https://crates.io/crates/typed-arena https://crates.io/crates/petgraph

Rust's generics make this fairly pleasant to work with, and lifetimes/borrowck ensure safety when managing your own object allocations.
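
For instance, a sketch with the typed-arena crate:

    use typed_arena::Arena;

    struct Node<'a> {
        value: u32,
        parent: Option<&'a Node<'a>>,
    }

    fn main() {
        let arena = Arena::new();
        let root: &Node = arena.alloc(Node { value: 0, parent: None });
        let child: &Node = arena.alloc(Node { value: 1, parent: Some(root) });
        assert_eq!(child.parent.unwrap().value, 0);
    } // every Node is freed here in one shot, when the arena drops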


Indeed, if you can live with the wasted memory use of objects outliving their conceptual lifetime, regions/arenas are a good solution.

Note however that you'd probably still have to run destructors when destroying an arena (to free file handles for instance), so you can still see high latency. With an arena you can perhaps schedule this better though.


> if you can live with the wasted memory use of objects outliving their conceptual lifetime, regions/arenas are a good solution.

If you can live with the wasted memory use of objects outliving their conceptual lifetime, garbage collectors can be a good solution too.

Not that that's a bad thing for many use cases, but your above comment implies a comparison between Rust and GC. I think the quoted critique falls down a bit when Rust lets you opt-in to generational-esque GC-ish behavior with a very similar downside to what you'd get from a GC.


Lifetimes are compile-time only and do not do any reference counting. So Rust has the same latency guarantees as C, for example.


> Lifetimes are compile-time only and do not do any reference counting.

I never said they did, I said lifetimes and reference counting both have this pathological case.

C also doesn't provide latency guarantees, as the same pathological programs can exist in C as well. It's a total myth that you need C in realtime domains due to "latency".

Maximum pause times are a property of a particular runtime, not a language.


"C also doesn't provide latency guarantees, as the same pathological programs can exist in C as well."

But you have to code them. They are predictable, or you let them in by allowing data structures to grow indefinitely.

With C you can make latency guarantees. You can write code that does not have such guarantees, but it is your choice.

With GC you are not in control so there are fewer choices. You will not be able to make guarantees.


> With GC you are not in control so there are fewer choices. You will not be able to make guarantees.

Not true. Hard and soft realtime GCs with sub-microsecond latencies exist. Latency is a property of a runtime, not of manual vs. automatic storage reclamation.


That is a misunderstanding of latency.

"Very fast" is still latency.


"No latency" is a fiction. The only question of any relevance is how much latency is tolerable for a given domain. And describing latency in worst-case timings is standard, so I understand latency just fine thanks.


What is the pathological case with lifetimes? You're saying it takes "time proportional to the number of dead objects to free", but as the parent said, lifetimes are a compile-time construct, so they have no runtime properties.

(I'm not saying that for sure there are none, I'm saying that it seems like you're talking about refcounting only, the lifetime bit is unclear to me.)


I suspect they're referring to graphs of Drop implementors, based on the sibling thread. If you for some reason have a linked-sea-of-nodes data structure that has to traverse itself on drop, that can behave similarly to dropping an Rc graph, though it still doesn't use lifetimes.


I guess that would make sense, but I'm not sure it's lifetime-specific though. C++ doesn't have lifetimes but would still have this problem.


Yes, C/C++ would also have this problem. The point I was trying to make is that incrementality/latency is a property of a runtime. If your program has deep ownership graphs, any kind of naive reclamation procedure is going to have high latency, even if it's written in C/C++.


Quite fair! Thanks for elaborating.


Regardless, there is no runtime overhead for lifetimes. They are a data flow analysis used solely to verify the correctness of the code.


I think they're referring to the calls to "free" that are automatically injected.


Rust itself can't, but you as the programmer can.

It's not quite C, but you can do a lot of reasoning about what the compiler will do with your code, and avoid pathological cases.


Absolutely, you have to be aware of the ownership graph depth in both C and Rust if you want to bound latency.

You don't have to do this with tracing GC though, you just need a runtime that implements latency bounds.


Wait, why would you have to do that? Most ownership is deterministically resolved at compile time, so you can know exactly when a resource will be freed. What you do have to know is about the rare refcounted variable, and what edge cases require ownership checking at runtime.


The latency problem isn't caused by determining what to free; the latency problem is caused by actually freeing. Imagine an array with 2^31 pointers, and now fill it with 2^31 distinct pointers into the remainder of a 32-bit address space. When that array goes out of scope, you can now enjoy 2^31 individual free operations, because reclamation for Rust lifetimes and reference counting is proportional to the number of dead objects (copying collection takes zero time in this case).
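
In miniature, the shape of the problem (scaled down to run):

    fn main() {
        // One vector owning a million boxed values: dropping `v` runs
        // a million individual deallocations at a single program point.
        let v: Vec<Box<u64>> = (0..1_000_000).map(Box::new).collect();
        assert_eq!(*v[0], 0);
        drop(v); // the latency spike happens here
    }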

If bounded latency is a goal, you have to bound the depth of your ownership graph if you're working on a platform that doesn't impose global latency properties. C/C++ and Rust do not do this.


This is only true for pure two-space copying collectors, which are rarely used in practice because of the absurd memory overhead. Once you introduce mark/sweep for some portion of the heap (like production GCs do), you reintroduce overhead proportional to the number of dead objects during the sweep phase.


Ah.

But C has this problem as well. If you've malloc(3)ed an array of 2^31 pointers, each pointing to an object, enjoy your 2^31 free(3)s, or prepare to start leaking RAM.

So what's your point?


Yes, C/C++ and Rust have the same problems, as I said elsewhere. My ultimate point is that low latency is a property of a runtime, not a language. Using C/C++ or Rust aren't going to automatically give you bounded latency, and adding tracing GC doesn't automatically take it away.


The language and runtime typically cooperate to provide the operational semantics that developers are looking for. The case of Objective-C is especially interesting in this regard: developers evolved a number of conventions around reference counting, because reference counting allowed for controllable, minimal latency and contributed to a snappy UI. The language gradually absorbed these conventions into the compiler, such that certain patterns of use are part of the language specification (certain method names, basically) and the ARC code is generated for developers.


Oh.

Well then actually, we're in complete agreement. Sorry.


I believe their point is that an object can own potentially many objects, and when it is dropped it could cause a cascade of dropping. Which may not be expected by someone.


Well, the good news for you then is that there is progress underway to implement Gc-as-a-library in Rust, to give you this option if you need it as well.


The latency of your Rust or C program is "static" in that you can infer it from the program text. This is not actually true of most garbage collected languages. (Erlang, with per thread heaps, is a notable exception.)


> even high-powered hardware can take a "major" hit from a GC pause when your application is extremely latency sensitive.

That's true, but high-powered, abundant-RAM realtime applications can use approaches that are cheaper than Rust's. See, e.g., the interesting work currently being done updating realtime Java[1]. The idea is that memory is composed of a few kinds of lifetimes: eternal, scoped and GCed-heap. Scoped memory is basically nested arenas, and GCed heap contains objects that are used by non-realtime portions of the app (which, even in realtime systems, may comprise the majority of code, especially when the system runs on large servers).

An approach like Rust's, however, is crucial when the application is RAM and/or energy constrained.

[1]: https://www.aicas.com/cms/en/rtsj


If I had infinite free time, I'd love to explore the problem space of implementing interpreters for GC-based languages on top of Rust. It's quite hard to get the concurrency right, and indeed we see a number of major languages that gave up on even trying.


What for? If you have the extra RAM and power for a GC, you don't need Rust for safety. HotSpot's next-gen (JIT) compiler is written in Java and is absolutely amazing.


> If you have the extra RAM and power for a GC, you don't need Rust for safety.

It isn't just about memory; Rust's safety guarantees in combination with RAII also mean that other resources such as mutex locks, open files, etc. also get closed in a deterministic fashion. (I'd argue that this is quite important for locks, and I've run into hard-to-debug bugs because files weren't being closed until a GC got to them.)

The way I've always viewed it is that RAII is general to all resources; GC only solves memory.
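
A minimal sketch of that determinism (the file name is made up):

    use std::fs::File;
    use std::io::Write;
    use std::sync::Mutex;

    fn main() {
        let counter = Mutex::new(0u32);

        {
            let mut guard = counter.lock().unwrap();
            *guard += 1;
        } // MutexGuard dropped here: the lock is released at a known point

        {
            let mut f = File::create("raii-demo.txt").expect("create failed");
            f.write_all(b"hello").expect("write failed");
        } // File dropped here: the OS handle is closed now, not when a GC runs
    }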

(I'm assuming the comment you're responding to is discussing getting a concurrent GC to work quite right, which isn't fully relevant w/ my reply; but I do think there is more to Rust's safety than just the memory management, which is what I got from your reply. I'd also argue that memory, in particular, is not abundant, both on mobile devices, but also out in the cloud, where it translates directly into cost both from more expensive VM instances, and from me needing to continually tweak the GC's params.)


I was referring to hinkley's idea for using Rust to write GCed VMs. As to other safety features, those are easily added to cheaper GCed languages. Memory is the hard bit, and if you can afford a GC, it is usually cheaper to just use one. As to RAM being costly, I think RAM is one of the few things that is getting very cheap relative to other resources, and GCs require less and less tuning; working hard to avoid a GC when you can afford one seems to me like the mother of all premature optimizations. But I see no point in debating the issue too much. Every company would make its own consideration about which approach is cheaper.

In any event, there are certainly very important use cases that simply cannot afford the power and RAM overhead required by a GC (again -- latency is not an issue; if you have the resources, there are cheap ways of getting extremely low latencies without doing away with a GC) and those use cases would benefit tremendously from a safe language.


I was in fact thinking of getting concurrent GC to work right. [edit] but also concurrency in general. Global Interpreter Locks when even my laptop has 8 cores?

I also agree that the free memory lunch is going to be over for a while. Java in particular is going to lose out in the container space. I don't think it's an accident that they've suddenly begun taking memory footprint very seriously. They have to.


What other programming languages do you know of that have an 'absolutely amazing' GC implementation? Wouldn't you like that answer to be 'lots'?

The Java team has worked a lot longer and a lot harder on this problem than pretty much everyone else, and even they hit a wall at 1GB, one that took a dreadfully long time to overcome (so long, in fact, that it contributed to me being an ex-Java developer).


Java has quite a few very good GCs, some in OpenJDK, some by Oracle, and one by Azul. Quite a few of them don't have a 1GB wall. They will become even better when Java finally gets value types and the GC no longer has to do work it doesn't need to (this is why Go has decent GC performance even though its GC isn't very sophisticated). In any event, I don't see how Rust can make the work any easier. Coming up with the algorithm is 98% of the job.

Lots of languages do have good GC because they run on the JVM. OTOH, we don't know how hard it is to write similar kinds of applications in Rust. Good things take a while to get right -- Java has taken a while, and Rust has, too. It will be a while yet until Java has a GC that everyone likes, and it will be a while yet until Rust is fully fleshed out and its strengths and weaknesses understood. My personal opinion is that the two approaches are complementary, each being superior in a different domain.


s/Oracle/IBM


Lack of extra RAM and power to run GC is not the problem. The problem is that GC makes code behavior not predictable.


That's not true in general. I've used realtime Java in a safety critical hard realtime application (running on a large server), with strong deadline guarantees (we're talking microsecond range). If you have the power and RAM to spare, the predictability issue is more cheaply solved with the approach I mentioned above (by cheaply I mean in terms of development costs; it is more costly than "plain" GC in terms of effort, but still cheaper than the Rust approach).

The only real cost of GC these days is RAM (and power).


Which VM did you use for that out of curiosity?


Sun's Java Real-Time System. It is no longer supported, AFAIK, and I don't know which RT JVM the project has switched to because I'm no longer there (my guess would be IBM's).


Real-time GCs exist. Look up Aonix's Java stuff for what embedded or predictable apps do. Or JamaicaVM below. For enterprise, Azul has some amazing GC tech plus Java CPUs (Vega).

http://www.ptc.com/developer-tools/perc

https://www.aicas.com/cms/en/JamaicaVM


However, that's not something that is automatically solved by manual memory management. Using malloc/free on a desktop OS does not provide predictable runtime behavior either, although unexpected pauses might be smaller than with most GCs.

The safest bet for predictable memory management and latency is the approach that is used by lots of embedded and realtime software: Don't allocate at all. Or at least don't do it in critical phases.
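
In Rust that discipline might look something like this (a sketch, with made-up buffer sizes):

    fn main() {
        // Allocate once, outside the critical phase.
        let mut buf: Vec<u8> = Vec::with_capacity(4096);

        for i in 0..1000u32 {
            buf.clear(); // keeps the capacity; no allocation
            // Refill in place; staying under the preallocated capacity
            // means this loop never touches the allocator.
            buf.extend_from_slice(&[i as u8; 16]);
            // ... process buf ...
        }
    }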


I think this is the point that gets obscured when discussing "manual memory management" vs "GC" languages by focusing only on the behaviour/life-cycle of an individual allocation: the former generally provide tools and features that make it easier/more natural to avoid allocations, whereas the latter assume that allocation is usually OK (which is, of course, a perfectly acceptable trade-off for the domains those languages target).


That's not true in general: only some GC algorithms make code behaviour unpredictable. Real-time tracing GCs exist. Reference counting is a form of GC too, and it is also unpredictable.


Will that next-gen JIT be enabled by default for Java 9 or 10?


Not by default in 9, but you will be able to run it with a stock version of Hotspot.


I keep trying to learn rust but fail miserably.

They do say on their website that there's a hump you have to climb over before everything falls into place, which is probably true of anything you learn, but sometimes I think that hump is too much of a hurdle.


How are you trying? What are you getting stuck on? I'd love to improve things.


I'd get excited and go through some tutorial on their website, probably the main tutorial, but then when I got to the lifetimes section it seemed to get really complicated instantly. I tried twice and think I hit the same problem. Maybe some more good examples would help?


Makes sense! I'm working on a second draft of the book right now, and it's making that stuff more clear. http://rust-lang.github.io/book/ is the draft; the lifetimes bit hasn't landed yet though.


Did you go through these screencasts?

http://intorust.com


Not specifically about your book, but I would love it if there were a quicker way to find methods in the docs.

Right now, if I want to find the methods on BTreeMap, I have to wade through a good amount of information just to find how to get the keys. I'm currently on mobile, where the issue is more prominent.


Are you missing the search bar at the top? Typing "keys" in shows BTreeMap's keys right away, even https://doc.rust-lang.org/stable/std/?search=keys

That said, yeah, I hear you. It can still be tough sometimes. At some point, I'd love to work with some sort of information design / UX person to totally re-do rustdoc's output. There's a surprising number of thorny problems there. But there's always so much to do...


If you click the "[-]" symbol at the top right of a docs page it collapses all the text and just shows the method signatures. Then you can click the "[+]" symbol next to any single method to get full details.


If I don't find what I'm looking for from the docs, I often use ripgrep on a copy of the rust repo locally to find answers.


> I often use ripgrep on a copy of the rust repo locally to find answers

I mean, that works, but it's a workaround and not a solution.


What are you getting stuck on? I'd love to improve things.

I feel I am getting over the hump of learning rust now and coding in rust is becoming less frustrating for me.

However, one thing that slows me down is the lack of indices in the documentation. For instance, if I want to know the return type of a vector len() I go here:

https://doc.rust-lang.org/std/vec/struct.Vec.html

.. and then I have to search the web page for all instances of "len". It would be good if there was an index similar to Javadoc, Godoc or Doxygen.

There might be a good reason for not having the index, but as a beginner it is lost on me.


Two things:

If you click the little [-] button, you'll get an index for that page.

If you use the search bar at the top, https://doc.rust-lang.org/std/vec/struct.Vec.html?search=vec... will let you go right to the method. (In this case, you have to know that it's slice::len though)

Does that help?

EDIT: UX is hard! Glad people are discovering this. It's the same symbol HN uses, incidentally...


Whoa, I have been working with Rust for a little over a year, and had no idea about the [-] button.

I would echo the suggestions to make that button much more visible. Or perhaps even have the top-level description expanded by default but method/trait descriptions hidden. I can't think of a case where you'd simultaneously want to see every method description.


Been using rust for months now and I never knew the [-] did that. I never even thought to click it.

Seems like there's some UX improvements that could happen there.


Thanks, that's very helpful. I did not know about the minus sign.

A polite suggestion: maybe the link marked [-], since it takes you to the index, could be labelled "index".


Well, it's not so much that it's an index, it's that when you have a page with only signatures, it feels like an index. There's no redirect, just some JavaScript :)


Maybe 'collapse' and 'expand' would be good substitutes for the '-' and '+' at the top?


Not OP, but as someone who theoretically would like Rust - I'll bite. Maybe my usecase is a common one.

I understand memory management in C. I understand it in modern C++ (destruction when going out of scope, smart pointers, etc.).

Basically, a description and discussion of borrow-checking for people who have already used system programming languages would be really helpful. I feel like the book is targeting people who have only used garbage collected languages.

Or is the memory management of rust so novel it can't be described in those terms? I find the concepts aren't very concrete to me.


It's trying to be accessible to those people, but not strictly for them. I don't think the issue here is that it's aimed at users of GC'd languages, but that it's trying to explain things from whole cloth, where you're looking for a direct comparison: "C does this, Rust does that."

I try generally to keep other languages out of Rust's docs for various reasons, but agree that these kinds of resources are useful; I wrote "Rust for Rubyists" after all!

I'm hoping that others will step in and fill this gap; the repo in my sibling is a great start.


Have you tried looking at https://github.com/nrc/r4cppp ?


That looks like exactly what I'm after actually. I'll read this next time I try rust.


I'm also in a similar boat, but my primary issue is that I learn by doing, and since it's branded as "systems programming" I immediately think of big projects like kernels and drivers. I wish there were some small projects I could do apart from Project Euler exercises. I even bought a Raspberry Pi to learn Rust, but don't quite know what to do with it and Rust.


There are a large number of projects on Cargo that maybe-perhaps do something useful, but don't get much love in the testing/documentation/polish department since the authors tend to move onto other projects. My personal wish list:

Varints: https://crates.io/crates/varint

Bloom filters: https://crates.io/crates/bloom

Iron: https://github.com/iron/iron

All the accessories for Iron: authentication middleware, integration with OAuth2, cookie-signing, integration with templating systems, etc. Also, an omakase framework (like Django or Rails) that pulls together a bunch of useful libraries into one crate that you can just use (with good docs) and not have to wire everything up.

Websockets: https://github.com/cyderize/rust-websocket or https://github.com/housleyjk/ws-rs

Server-sent events. I don't see a good alternative for these yet.

ElasticSearch: https://github.com/benashford/rs-es or https://github.com/erickt/rust-elasticsearch or https://github.com/KodrAus/elasticsearch-rs

An easy documentation/website generator for small libraries, that pulls examples, tests, and README files out of GitHub, runs RustDoc, and generates a professional-looking website that provides all the info that you need to get started with a library, with a minimum of extra effort for the library author. Basically, automate the job of going through libraries built to scratch someone's personal itch and "productizing" them.

Any of these could be a good project for a beginner, since there's already a lot of existing code to learn from, a small well-defined task, and an existing maintainer who has an incentive to help. Basically, just take a library, try to use it in a small test program (Linus's Law: "Never try to make a big important program. Always start with a small trivial program, and work on improving it"), and if anything is difficult or doesn't work right, figure out how to make it less difficult for the next person who runs across it and submit a pull request with that. As an added bonus, you can learn a lot of domain knowledge or basic CS data structures through digging through this, and that transfers to programming outside of Rust.


This might be too much like Project Euler but I started by solving some common interview problems using Rust.

Try solving the problems without looking at the solutions first: https://github.com/brianquinlan/learn-rust

Here is a rough rank of difficulty: https://github.com/brianquinlan/learn-rust#understandability


Makes sense! Have you seen http://www.chriskrycho.com/2016/using-rust-for-scripting.htm... ? Maybe something like that can be of inspiration.


Same feeling here. There are docs here and there, but they're not good.

I am waiting for a good book on Rust. Similar to Haskell, I really did not get much until the book http://learnyouahaskell.com/

Currently, "the book" is too dry.


Programming Rust by O'Reilly is in early release and I recommend it.


I am having the same inertia.

I have resorted to using Nim for now, and it is going real well. I would like to rewrite the Nim stuff in Rust to compare for my own sake when I have done something substantial in it.


Have you tried: http://rustbyexample.com/


Feel free to stop by the IRC and ask questions or seek help - lots of people there are willing to answer your questions.

Or the rust forums: users.rust-lang.org

Or the rust subreddit: reddit.com/r/rust


For me the hump was from the get-go.

Following the book, I installed Rust directly, but then I realized that I should've installed it via Rustup. Next, I wanted a good editing environment, so I installed VS Code and Racer, but then I found out that I can't use Clippy unless I use Nightly... and I'm not interested in using Nightly, so I'll wait.


Rustup has been in beta for a while, which is why we don't recommend it in the book. Soon though!


I just installed the nightly tarball and used my usual editor (emacs). I'm not sure what the editing environment has to do with a learning hump.

Now, coming from loose, dynamically typed languages or really terrible C or C++ code bases is another matter.


A good language should be like a good game: easy to learn, hard to master.

We don't care about expert cases; we only care about getting productive ASAP, which means having students hop into a language and learn it quickly. Solving the details is easy: just teach coding discipline, enforce good practice, do code reviews, and discourage throwaway code. Security is not only the job of a programmer; it has many, many facets.

In short: English is better than Latin or Esperanto. The more time and space you need to describe your language or parse it, the more characters or syntax it requires, and the longer it takes an individual to read a program and guess what it does, the worse it is.

I'm getting a little tired of the "novelty" languages these last few years. Maybe I'm more conservative and don't like the hype. To me only D is relevant and it has been there for a long time now.


Do you have a proposal to achieve Rust's goals in a way that's easier to learn?


Hi pcwalton, I've seen a couple of your videos and want to know if you've posted any tutorials on integrating Rust with Xcode and using the native tools, or if others have? Some of us are coming from IDE Javaland, and while Rust is quite usable from a text editor, print debugging only goes so far, and gdb is incomprehensible gobbledygook for us.


syntax


Be specific.


As someone getting into and loving Rust, I have a few:

1) Drop the semicolons and implicit returns in multiline functions (i.e., like Swift). Eliminates hard-to-understand errors around missing or present semicolons.

2) Allow silent lossless integer upcasts. Sprinkling "as usize" and friends everywhere is unergonomic (see the sketch after this list).

3) The ? operator for early returns is fine, but the line should still have one "try" at the beginning for legibility.

4) Allow full inference of generic type parameters. Would make it much easier to split code into helper functions.

5) Macros are nicer than in C but still hostile to comprehension.

There are others, but these are ones I've personally run into. Love the language and especially love Cargo but it has some newbie-hostile rough edges like this and others.
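
On (2), a small sketch of what's meant, with made-up values:

    fn main() {
        let small: u32 = 42;

        // Rust refuses the implicit version:
        // let big: u64 = small; // error: mismatched types

        // The (lossless) conversion must be spelled out, either with `as`...
        let big = small as u64;

        // ...or with From, which is only implemented for conversions
        // that can never lose information:
        let big2 = u64::from(small);

        println!("{} {}", big, big2);
    }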


Many of these are not possible due to backwards compatibility. Even if we could:

1. Most Rust users, including me, like the implicit returns. We would face a ton of pushback if we tried to drop them. Rust doesn't actually need that many semicolons: you can frequently leave them off (see the sketch below).

2. This is tricky, because it can have surprising semantics if not done right (e.g. right shifts). There are proposals to do something like this, though.

3. I disagree. That would eliminate much of the benefit of ? to begin with.

4. Not possible. This would complicate the already very complex typechecker too much.

5. Macros 2.0 is coming. We can't just remove macros: they're fundamental to basic things like #[derive] and string formatting.
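
On point 1, a quick illustrative sketch of the expression-orientation being defended here:

    // The final expression of a block is its value; no `return` needed.
    fn double(x: i32) -> i32 {
        x * 2 // no semicolon: this expression is the function's value
    }

    fn main() {
        // `if` is an expression too, so the same rule composes:
        let parity = if double(3) % 2 == 0 { "even" } else { "odd" };
        println!("{}", parity);
    }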


Hi pcwalton,

1) If you dropped semicolons like Swift, you could keep implicit returns.

I just don't see the advantage of having them in multiline functions at the cost of having to retain semicolons. IMO, a trailing expression at the end of a multiline function also just looks plain bizarre to newbies.

I agree it's probably too late for this, but you asked how Rust could be easier to learn and this is one of the ways.

2) It would certainly be much nicer to have this.

3) See Swift for an example of where this was done more nicely IMO than in Rust.

4) Unfortunate. Splitting out a helper function means we lose Rust's inference, which already exists when used in a let statement; it requires adding the library's dependent crates to the binary's Cargo.toml, importing them into the module, and hard-coding the types. The client shouldn't have to care about this.

5) Great!

I still enjoy the language just the same, but you asked how it can be easier to learn and this is how I see it.


LuaJIT + C have suited me just fine for systems programming. I don't care about security, so I find it hard to care about Rust. All I care about is lessening the burden on me as a programmer.

I think Rust may be useful in cases like ripgrep, where you basically rewrite an existing, established tool or service used by many to be as performant and secure as possible. But other than that niche use-case, I don't think Rust will catch on in the long-term.


As the author of ripgrep, I can assure you, the burden on me as the programmer was lifted quite a bit! I probably wouldn't have been able to build it otherwise. (Not because it's physically impossible, but because it would have taken too much time.)

It's not like I just rewrote grep. ripgrep is built on a large number of libraries that are reusable in other applications. You can see my progress on that goal here: https://github.com/BurntSushi/ripgrep/issues/162 (And those are only the ones I wrote, nevermind all of the crates I use that have been written by others!)


Oh wow, the creator in the "flesh"! Thanks for replying to me.

Hypothetical scenario: Let's say you're writing an experimental tool that doesn't exist anywhere else. You don't care about security, you don't care about speed. You just want it to exist so you can see what it does and possibly iterate on the idea if it ends up working out. Would Rust still be feasible?

From my impression of it, you would need to take care of a lot of corner cases and such (which don't exist in other languages) that may slow you down in the short run. I'd imagine those corner cases would be extremely helpful in the long run if you want to squeeze some extra performance out of it (or avoid technical debt). But from the perspective of "figuring out what's possible" I feel like Rust would get in the way a lot.


Good question! The inherent problem with asking me that is that I've been writing Rust continuously for over 2.5 years by now. I live and breathe it. It comes as naturally to me as Go or Python does at this point (which I've been writing continuously for even longer).

I will say that a comparison between Rust and C is much easier, because in the past, I've spent so much time debugging runtime errors in C with valgrind. Rust is an easy win there for me personally. I've never done much C++ so I can't provide a comparison point there.

After a bit of Rust coding you get quite familiar with the workings of the borrow checker, and it becomes pretty natural to work with it. There are plenty of things you can do in C that Rust's borrow checker will forbid because it isn't smart enough to prove them safe, but there are usually straightforward workarounds (sketch below). Sometimes the borrow checker might even help make the code a bit clearer. :-) Some of this is institutional knowledge though, so there's still a lot more documentation work left to be done!
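
One classic instance of that forbid-then-work-around pattern (a made-up toy, not from ripgrep):

    fn main() {
        let mut v = vec![1, 2, 3];

        // Rejected: `first` borrows `v`, so mutating `v` while that
        // borrow is still used later is forbidden.
        // let first = &v[0];
        // v.push(4); // error: cannot borrow `v` as mutable
        // println!("{}", first);

        // Straightforward workaround: remember the index, re-borrow later.
        let idx = 0;
        v.push(4);
        println!("{}", v[idx]);
    }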

To bring this back to earth: Rust won't replace those ~100 line Python scripts that I sometimes write for quick data munging.

The other important bit of context is that before I started with Rust, I had already had quite a bit of experience with C, Standard ML and Haskell. This meant that the only truly new thing I had to cope with in Rust was the borrow checker, so it might have been easier for me to digest everything than it might have been for most.


Thanks for the insight! I may give Rust another shot at some point. Interoperating it with Lua may be fun.


I learned Rust in a couple of weeks, and I'm writing a game in it right now. It's a pleasurable experience, and far outside that niche. "I can't learn it so nobody will use it" is such a silly thing to think.


It's worth it, imo. I was in your boat a few months back and hated every minute of it. With that said, I'm still not 100% there. I semi-regularly see syntax that makes me go "Wat is what!?", but then I sit for a moment and understand it. Rust introduces a lot of visual baggage, and that seems to cause me syntax blindness... not enjoyable.

Unfortunately though, I'm back on Go. I want to be on Rust, but I had to pick a language for work and I can't ask my team to go through what I did. Rust, despite the safety, is too unnatural for our larger codebase.

Luckily, I think Rust has seated itself as the language we will use if the need is truly there. Unfortunately though, not for everything... just the specific things that need it.
