The 'Tootsie Pop' Model for Unsafe Rust Code (smallcultfollowing.com)
65 points by GolDDranks on May 27, 2016 | 19 comments

There has been a lot of discussion (and critique, and confusion) about "unsafe" in the comments of HN threads concerning Rust. It is worth remembering that using "unsafe" is NOT inherently a bad thing: Rust builds on a model that allows doing tricks with unsafe code and then wrapping all the magic behind safe interfaces, abstracting the unsafety away.

However, it's true that there is still no consensus on what exactly is allowed with unsafe and what is not. A parallel would be the confusion over whether pointers of different types are allowed to alias in C: doing so is undefined behaviour, which many find counterintuitive.

But there's an ongoing effort to define and formalize the Rust memory model, which would put an end to this debate. Recently there has also been discussion that it should be not only defined, but also more intuitive than C's, for instance. Undefined behaviour shouldn't be allowed to take you by surprise too easily.
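To make the "safe interface around unsafe code" idea concrete, here is a minimal sketch. The function name `middle` is made up for illustration; the point is only that the one check up front is what justifies the unchecked access, and callers never see the `unsafe`.

```rust
/// Returns the middle element of a slice, or `None` if the slice is empty.
/// (Hypothetical helper, not taken from any library.)
pub fn middle<T>(items: &[T]) -> Option<&T> {
    if items.is_empty() {
        return None;
    }
    let idx = items.len() / 2;
    // SAFETY: `items` is non-empty, so `len / 2` is strictly less than `len`.
    Some(unsafe { items.get_unchecked(idx) })
}
```

Calling `middle(&[1, 2, 3])` from safe code needs no `unsafe` block; whether the function is sound depends entirely on the check inside it.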

From my point of view, it should be the same as in Ada, Modula-3, C#, Go, Swift and many other memory-safe native languages, some of which target systems programming, where unsafe relates to memory corruption and numeric errors.

Anything else, such as expressing the unsafety of algorithms or validations not expressible in Rust's type system, shouldn't be marked unsafe just because of it.

Then again I am just a vocal language geek and the Rust community should decide what is best.

I used to hold this position, but I heard an example (I forget the source, but I think it's on the web) that convinced me otherwise: a &str is no different from a &[u8] in terms of representation, except for the type-system guarantee that it contains valid UTF-8 sequences. (Hence the syntax as &str instead of &[str], since "str" refers to sequences of bytes.) In a valid UTF-8 string, if you see the first byte of a multibyte sequence, you can assume that there is at least one more byte. It would be nice if we could write decoders that used that property, without having to do bounds-checks: the type system promises us that &strs are, in fact, valid UTF-8. But changing a &str to be invalid UTF-8 isn't inherently a memory-unsafe operation.

So we're left with two options. The first is to say, despite the typesystem, a UTF-8 decoder for &str isn't permitted to do anything that would be invalid/undefined/wrong if done to an arbitrary &[u8]. (In other words, &str is merely a hint, and everyone must code defensively as if any &str could be an arbitrary byte string.) The second is to narrow "safe" down to "does not break typesystem invariants," even though the set of possible typesystem invariants is pretty large.

I think Rust actually has a good claim to being different from other languages here, given how much more of a typesystem it has, and given how much more it tries to do with newtype wrappers and zero-cost abstractions. The inability to use newtypes for optimization would be pretty unfortunate in a language that otherwise does so many excellent things with detailed typing. There are some languages where types are just hints (I think Objective-C basically works this way), but that's definitely not Rust's style.
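A rough sketch of the kind of decoder described above, assuming the second option (the `&str` invariant may be relied on). `first_char` is a hypothetical helper, not the standard library's decoder; it reads continuation bytes without bounds checks, which is only sound because a `&str` is guaranteed to hold valid UTF-8.

```rust
/// Decodes the first code point of `s`, or returns `None` for an empty string.
/// (Hypothetical helper for illustration only.)
pub fn first_char(s: &str) -> Option<char> {
    let bytes = s.as_bytes();
    let b0 = *bytes.first()?;
    // The lead byte tells us how many continuation bytes follow
    // and which of its own bits carry payload.
    let (extra, mut code) = match b0 {
        0x00..=0x7F => (0, b0 as u32),
        0xC0..=0xDF => (1, (b0 & 0x1F) as u32),
        0xE0..=0xEF => (2, (b0 & 0x0F) as u32),
        _ => (3, (b0 & 0x07) as u32),
    };
    for i in 1..=extra {
        // SAFETY: `s` is valid UTF-8, so every lead byte is followed by
        // exactly the number of continuation bytes it announces.
        let b = unsafe { *bytes.get_unchecked(i) };
        code = (code << 6) | (b & 0x3F) as u32;
    }
    std::char::from_u32(code)
}
```

If some other code were allowed to put arbitrary bytes into a `&str`, the `get_unchecked` call could read past the end of the buffer, which is exactly the tension between the two options.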

> But changing a &str to be invalid UTF-8 isn't inherently a memory-unsafe operation.

It kind of is related, though, since there are APIs that do unsafe things based on this assumption, IIRC.

The type has some guarantees, and while the guarantees aren't related to memory safety, breaking them will cause memory safety issues.
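As a small illustration of where that guarantee gets established: the standard library offers a checked `std::str::from_utf8` and an `unsafe` `std::str::from_utf8_unchecked`. The wrapper names below (`parse_label`, `parse_label_unchecked`) are invented for this sketch.

```rust
use std::str;

/// Validates the bytes once at the boundary; invalid UTF-8 yields `None`.
/// (Hypothetical helper.)
pub fn parse_label(raw: &[u8]) -> Option<&str> {
    str::from_utf8(raw).ok()
}

/// Skips validation. The caller must guarantee `raw` is valid UTF-8,
/// because downstream code is entitled to rely on that invariant.
/// (Hypothetical helper.)
pub unsafe fn parse_label_unchecked(raw: &[u8]) -> &str {
    // SAFETY: upheld by the caller, as documented above.
    unsafe { str::from_utf8_unchecked(raw) }
}
```

The unchecked variant is marked `unsafe` even though it never touches raw memory itself, since a `&str` holding invalid bytes can break other code that trusts the type.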

Yes, and I think most people think the same. "Unsafe" is not a general marker for "proceed with caution" – it's specifically for memory safety. However, the debate is about what is considered memory safe. That's important, because that affects what the compilers are allowed to optimize and what not.

In what way is Rust suitable as a systems language if we don't have a picture of what unsafe code is or is not allowed to do?

At least in C there is a consensus about what works and what doesn't -- i.e. pointer aliasing is technically undefined behavior but the modern compilers let you get away with it. In Rust there doesn't even seem to be a consensus about how the compiler should work.

> i.e. pointer aliasing is technically undefined behavior but the modern compilers let you get away with it. In Rust there doesn't even seem to be a consensus about how the compiler should work.

Careful, there has been much gnashing of teeth over the fact that higher optimization levels of gcc do very surprising things with "undefined" code.

C technically has consensus on what is and what isn't undefined behavior. You don't have sufficient tooling around that though, which means that while that consensus exists, it's hard to take advantage of it. Additionally you don't have consensus at all on what should and shouldn't be undefined behavior.

In practice that consensus has little to no meaningful impact.

In Rust on the other hand you have people that deeply care about problems like this, demonstrated not just by what they say but by what they do. I think you can therefore have a lot of confidence in them not just reaching consensus on this issue but creating meaningful solutions based on that consensus.

We do have a good-enough picture right now with which you can write code that works and will continue to work; we just don't have an exact picture (where you can knowingly cut corners and things still work). Niko's post is mostly about where these corners should be.

> pointer aliasing is technically undefined behavior but the modern compilers let you get away with it

This is exactly the kind of situation Rust wants to avoid.

Okay, and that's fine if Rust doesn't want to be used for making kernels, programming language runtimes, etc. You know, systems language stuff.

This is ill-informed and dogmatic bravado. It has no content. Please don't make comments like this in the future, they are an invariably negative contribution to the conversation. You do not understand the post you are commenting on, which assumed a level of understanding of Rust that made it a poor fit for this venue.

There are already kernels written in Rust.

C is also 44 years old, according to Wikipedia.

I don't think anyone outside the US knows what a tootsie pop is.

Sounds broadly similar to a chocolate eclair:


My first thought was a pop singer/model from the Tutsi ethnic group but that didn't sound right, so I googled it.

Apparently a "tootsie pop" is an American sweet (candy) which has a hard shell and a soft centre. The analogy is that putting a hard, fixed API around a dangerous/unsafe centre is the way to wrap unsafe code. Though why the article wasn't just titled "how to wrap unsafe code in Rust" is unclear to me.

This post is not about 'wrapping unsafe code' - in fact, it's not even about writing Rust at all. It's about trying to delimit the scopes in which the compiler will perform optimizations based on invariants that are guaranteed in safe Rust but can be violated in unsafe Rust. This is a post about the design of the language, not of code in the language.

The name was chosen because it was the metaphor the author thought of; as a bonus, you've learned something about another culture. Tootsie pops (the most common lollipop in the United States) have a privileged place in American pop culture because of this commercial: https://www.youtube.com/watch?v=O6rHeD5x2tI
