Rust Needs an Official Specification (tweedegolf.nl)
54 points by todsacerdoti 1 day ago | 47 comments





I don't think the C++ standard can be held up like that. Many compilers simply ignore it (or ignored it, they're getting better about these things). It's not because there's an omission, it's because writing a compiler that conforms to a thousand page document of rules is hard. And tedious. Some will fall through, there is no "one true C++" that follows the standard perfectly, most fall short. The ambiguity is still there, many times you can read the standard and know what the compiler will do, but many times you can't. Even if the standard is unambiguous.

So with Rust, the implementation is the specification. The author mentioned gccrs - hasn't that already had some positive standardization benefits without a true standard? I'm not super knowledgeable about the status there, though.

I feel like more standardization is certainly worthwhile, but writing a specification is an entirely different beast. I see that as years of painstaking work just to keep everything mostly the same and harder to iterate. Seems like a hard sell even if it would be great for systems programmers ;)

> For developing external code analysis tools such as Coverity

You can do that by just intercepting `rustc` and using the static analysis tool's common IR(s), right? I'm nitpicking here, though.


The major C++ compilers are perfectly conformant with the standard. Where they can lag is after standards come out. Modules have been a challenge, but other language features get implemented within a few years. At the time of writing, all the major compilers conform with C++ 17 and aside from modules, conform with C++ 20. There absolutely is "one true C++" as set by the standard, and compiler writers have a roadmap for how to conform to it.

"The implementation is the specification" is frankly a sophomoric position. It works fine for small projects and projects of limited complexity, but Rust is neither of these.


Looking over https://en.cppreference.com/w/cpp/compiler_support, it seems there is good conformance on C++11 and C++14 but there are gaps on C++17+ among gcc, msvc, and clang.

The gaps are on the standard library side: clang libc++ has fallen behind a bit on conformance, but you can use libstdc++ on clang as well (and it is the default for example on Linux). MSVC lags on C conformance, which is known not to be a primary concern for MS (but still miles better than what it used to be).

In any case these tables only help up to a point; even green cells often hide bugs and conformance divergences.

Still for most straightforward code compliance across compilers is very good and significantly better than the dark ages of two decades ago.


Completely agree. I'm only calling it out because of the claim "perfectly conformant with the standard".

The compiler is conformant. The STL is not necessarily.

The documentation for destructors actually does contain the answer to their motivating example [0]. It's difficult to read, but it's there, and it thoroughly explains what's happening here. Local variable bindings have the scope of the block they're in. Other expressions have a temporary scope. The _ wildcard pattern is documented [1], and is explicitly not an identifier pattern. The documentation specifically says that it does not copy, move, or borrow values. Therefore, expressions on the right hand side of a let that uses _ as the pattern do not bind to a value and therefore use a temporary scope.
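
A minimal sketch of the rule described above, not taken from the article: `Noisy` is a hypothetical type that records when it is dropped, and `drop_order` is an illustrative helper. It compares the wildcard pattern with an ordinary identifier that merely starts with an underscore.

```rust
use std::cell::RefCell;

// Hypothetical type that appends a line to a shared log when it is dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // Wildcard pattern: nothing is bound, so the value stays a temporary
        // and is dropped at the end of this `let` statement.
        let _ = Noisy { name: "wildcard", log: &log };
        log.borrow_mut().push("after wildcard let".to_string());

        // `_named` is an ordinary identifier, so the value is bound and
        // lives until the end of the enclosing block.
        let _named = Noisy { name: "named", log: &log };
        log.borrow_mut().push("after named let".to_string());
    }
    log.into_inner()
}

fn main() {
    // Prints: drop wildcard, after wildcard let, after named let, drop named
    for event in drop_order() {
        println!("{}", event);
    }
}
```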

Yes, the language here is difficult to understand, but that's true of specifications as well. Just like with these docs, you'll have to refer to multiple sections of the spec in order to piece together the expected behavior.

There may be other reasons for a spec (it puts different implementations on equal footing instead of having one official reference implementation), but in this case if the reference is hard to follow I don't see any reason to believe a spec would be better.

[0] https://doc.rust-lang.org/reference/destructors.html#tempora...

[1] https://doc.rust-lang.org/reference/patterns.html#wildcard-p...


I get the author’s point so this may be a useless nitpick but I am not surprised in Rust that drop is called immediately when a value is assigned to _.

That’s almost like calling delete in c++ and if you changed the C++ code to a class with a print in the destructor, then called “delete” you would get the same behavior.

The “just” underscore in rust I believe is special and basically makes that value inaccessible. So that value really does “destruct” there.


It is surprising for a c++ programmer; the equivalent:

   auto _ = Foo{};

would leave the object alive till the end of scope. This is true for all RAII classes. In C++26 the unnamed placeholder '_' is also special cased to allow redeclaration.

For a C++ programmer, the related "surprising" equivalent is to not name the variable. We had a regex linter in our code to ensure locks were named because of bugs from this.

You mean like `clippy::let_underscore_lock`? I'd like it if there was an annotation for arbitrary types to declare that they behave like a lock so that the lint would pick them up too.

For Rust, yes, we do have that and that'd be great to generalize it. That was a C++ code base using custom lock types that I used a regex linter for. Glad to not be touching it anymore for many reasons.
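
To make the lock bug concrete, here is a small sketch with a custom lock type. `FlagLock` and `FlagGuard` are hypothetical stand-ins for the kind of custom lock a lint keyed to the standard guard types would not recognize; a real lock would block or spin on contention rather than just set a flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical custom lock; it only tracks whether a guard is alive.
struct FlagLock {
    locked: AtomicBool,
}

// RAII guard: the lock is released when the guard is dropped.
struct FlagGuard<'a> {
    lock: &'a FlagLock,
}

impl FlagLock {
    fn new() -> Self {
        FlagLock { locked: AtomicBool::new(false) }
    }

    fn lock(&self) -> FlagGuard<'_> {
        self.locked.store(true, Ordering::SeqCst);
        FlagGuard { lock: self }
    }

    fn is_locked(&self) -> bool {
        self.locked.load(Ordering::SeqCst)
    }
}

impl<'a> Drop for FlagGuard<'a> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::SeqCst);
    }
}

// Reports whether the lock is still held on the line after each `let`.
fn held_after_let() -> (bool, bool) {
    let lock = FlagLock::new();

    let _ = lock.lock(); // guard is dropped at the end of this statement
    let with_wildcard = lock.is_locked();

    let _guard = lock.lock(); // guard lives until the end of the function
    let with_name = lock.is_locked();

    (with_wildcard, with_name)
}

fn main() {
    // Prints "(false, true)": only the named guard keeps the lock held.
    println!("{:?}", held_after_let());
}
```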

This quirk needs to be in a tutorial or a developer-oriented reference. Involving an ISO committee in this wouldn't produce a document that makes it any easier to understand it.

This is because from the language perspective, there's no such feature. It's just a surprising combination of a wildcard pattern that doesn't bind, and a drop scope of temporaries created in expressions. Each of these features is independent of the other, and they are documented separately.

The Ferrocene specification and the Rust reference are almost identical in this regard:

https://doc.rust-lang.org/reference/destructors.html#drop-sc... https://doc.rust-lang.org/reference/patterns.html#widcard-pa...

These features should be described more precisely, but because `let _ = temp` is not a special case, but a regular case of `let PATTERN = EXPR`, a formal spec wouldn't redundantly document that combination.

Understanding Drop order doesn't need a spec, but needs all of the relevant bits of the specification extracted and described in one place. For example, the spec is pretty clear that `if let` is translated to an equivalent `match` statement, which is sufficient to implement it. What it doesn't spell out for the readers, is that scope of temporaries in `match` covers the entire match statement, and this happens to include `_ => {}` fallback pattern for the `else` in `if let`, so temporaries from the condition are still alive in the else block. This is a surprising quirk, but the surprise doesn't come from lack of specification, just lack of end-user-friendliness of the spec format, and you shouldn't expect a formal spec to be a tutorial.
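
A sketch of the `match` half of that observation, using a standard `Mutex` so the lifetime of the temporary guard is observable. The function names are illustrative. The example sticks to `match` on purpose: as far as I know, the 2024 edition narrows the temporary scope for `if let` itself, so the `else` behavior may depend on the edition.

```rust
use std::sync::Mutex;

// The guard returned by `lock()` is a temporary in the scrutinee, so it is
// only dropped at the end of the whole `match` expression.
fn lock_state_in_fallback_arm(m: &Mutex<Option<i32>>) -> bool {
    match *m.lock().unwrap() {
        Some(_) => false,
        // Still inside the match: the temporary guard is alive here, so a
        // second, non-blocking lock attempt fails.
        _ => m.try_lock().is_err(),
    }
}

// Once the `match` statement has ended, the temporary guard is gone.
fn lock_state_after_match(m: &Mutex<Option<i32>>) -> bool {
    match *m.lock().unwrap() {
        Some(_) => {}
        _ => {}
    }
    m.try_lock().is_err()
}

fn main() {
    let m = Mutex::new(None);
    // Prints "true" then "false".
    println!("{}", lock_state_in_fallback_arm(&m));
    println!("{}", lock_state_after_match(&m));
}
```

Calling the blocking `lock()` instead of `try_lock()` in the fallback arm would be a self-deadlock (or a panic), which is the practical hazard behind this quirk.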


I have long been pro a specification, but also think that it’s fine so far. I think a lot of people over-weight the need. For example, Rust is being used in safety critical contexts today, a standard is not actually required, only a specification for the specific compiler.

Incidentally, the behavior here is well known, and there’s a subtlety they didn’t quite get to, though they almost did with the quote. Things that impl Drop must be dropped at the end of the lexical scope, but things that don’t can end earlier. So a borrow can end early, but something with custom Drop cannot.

This decision was made because subtly changing when something goes out of scope could make writing unsafe code significantly more difficult and brittle. At the time of the decision, there was already a lot of unsafe code out there, and not breaking it was a priority.
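
The distinction between plain borrows and values with a custom Drop can be sketched as follows. `Holder` is a hypothetical wrapper introduced only for illustration, and the commented-out line marks where the borrow checker would object.

```rust
// Hypothetical wrapper with a custom destructor that holds a borrow.
struct Holder<'a>(&'a Vec<i32>);

impl<'a> Drop for Holder<'a> {
    fn drop(&mut self) {}
}

fn borrow_ends_early() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // shared borrow starts
    let copy = *first; // last use of `first`: the borrow ends here
    v.push(copy); // accepted, although `first` is still in lexical scope
    v
}

fn explicit_drop_needed() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let h = Holder(&v);
    let copy = h.0[0];
    // v.push(copy); // rejected here: `h` is dropped at the end of the
    //               // scope, so its borrow of `v` is still live
    drop(h); // ending the borrow has to be explicit
    v.push(copy);
    v
}

fn main() {
    // Both print "[1, 2, 3, 1]".
    println!("{:?}", borrow_ends_early());
    println!("{:?}", explicit_drop_needed());
}
```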


One should mention the RustBelt project: https://plv.mpi-sws.org/rustbelt/ here. It was in place to develop a specification for Rust that is accessible for formal verification. I think that is the way to go, rather than semi-formal standardese language. I've some related background and would love to do it (given a good position/pay).

The blog post links to ongoing work on the Rust specification:

> In any case, this ball is rolling! [RFC3355][0] was accepted one year ago, and [a specification team][1] is [at work][2]. We are watching!

[0]: https://rust-lang.github.io/rfcs/3355-rust-spec.html

[1]: https://blog.rust-lang.org/inside-rust/2023/11/15/spec-visio...

[2]: https://rust-lang.zulipchat.com/#narrow/channel/399173-t-spe...


I feel like the intent of this article is ambiguous. It reads like there is no interest for a descriptive standard and is arguing for one. However, it ends by calling out the standard work in a single sentence. I was expecting to have to call it out as I read the article. I can assume others might not already know about it and miss the last sentence.

As for the motivating example, to me it feels weak from my own experience. In my ~20 years of C++ experience, I've never consulted the standard, in part because of lack of access but also because of the barrier of standardese. I've instead consulted reference documentation geared towards programmers. Even the Ferrocene block they quote, I can see it being hard to connect the dots.

That isn't to say no standard is needed but I doubt most developers will use it.


Why is a written specification better than the code that implements it? One might suppose that a written spec is less mutable, but why should that be? The example provided is not particularly good since the semantics are well defined and can be looked up in the code. A better example would be one where what happens is not well defined, but I'm not aware of any of those places in rust.

Essentially, any change to the code that changes the semantics is a breaking change and will be avoided, just as any change to the spec would be a breaking change and would be avoided.


The reason is that if the code is the specification, it is not obvious whether a certain behaviour is an intended feature or a bug.

Specifications can have bugs of course, but when the spec and (potentially multiple) implementations agree, that gives more confidence in the expected behaviour.

A conformance suite can be an alternative of course, but it is not necessarily always possible to infer general behaviour from a test case.


All the nuance is hidden at the end! And not signposted earlier. The example confusion, too thinly justified by common sense and analogy to C++, points to the benefit of understanding intermediate compilation stages & how the language understands itself descriptively.

The slippery slope, descriptive, from example confusion to general concerns shows more about the value of learning about design of the language locally relevant to your current problem, than a more indirect solution.

It would serve the case and conversation better to collect lists of people posting about these issues, instead of picking a bait title & then speaking to the issue not being relevant.

Despite the title, I agree with the author

> There's no need to change that...[not having a specification] yet. But in ten years, I might feel differently.

I hope he's gotten the engagement & attention he wants.

I'm curious what caused the C++ standard to evolve, what caused there to be so many competing compilers -- their cost, and lack of adequately free, open, and bleeding edge development?


I think this is a thorny issue.

On the one hand standardization stifles development and can even make the language worse going forward.

On the other I really think that before governments start pushing for Rust adoption it should be standardized, i.e. there should be an official document that explains the correct semantics.


I'm sorry, but even if this exact example does not appear, this case is technically specified by the reference.

The page on identifiers (https://doc.rust-lang.org/reference/identifiers.html) calls out that a "single underscore character is not an identifier". If you then follow the trail (Search for "Underscore"), you will find that it is instead a wildcard pattern (https://doc.rust-lang.org/reference/patterns.html#wildcard-p...). It says that "[u]nlike identifier patterns, it [the wildcard pattern] does not copy, move or borrow the value it matches."

From this, it follows that the right-hand side ("value expression", roughly equal to an rvalue) is not moved into a place called _, but is effectively just a value matched against a wildcard pattern. This does nothing (the value is not moved into any other place), so it gets dropped.
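
The "does not copy, move or borrow" wording has a flip side that a short sketch can show: when the right-hand side is an existing variable (a place expression) rather than a freshly created value, matching it against `_` leaves the variable untouched. The function name here is illustrative.

```rust
fn wildcard_does_not_move() -> String {
    let s = String::from("still here");

    // `s` is a place expression. Matching it against `_` neither moves nor
    // drops it, so `s` remains valid afterwards.
    let _ = s;

    // By contrast, `let _moved = s;` would move the String out of `s`,
    // and returning `s` below would then fail to compile.
    s
}

fn main() {
    // Prints "still here".
    println!("{}", wildcard_does_not_move());
}
```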


A more accurate title would be 'Marc Needs an Official Rust Specification.' It’s entirely possible that the Rust community’s needs don’t align with this.

It seems like a reasonable title, given that it's what he is arguing through his writing.

A spec would only help with this if you read the spec.

> if I simplify the name of the variable

Technically, this isn't what the author did, since _ isn't a variable name.


So I'm no Rust expert but when I saw:

    let _ = Foo;

my immediate thought was "this is going to get cleaned up immediately", which turns out to be the case. If you come from a language where braces define scopes and variables only get cleaned up when a variable goes out of scope, Rust's behavior may be surprising but it's consistent and correct. The object is immediately out of scope.

Three thoughts:

1. Up until a certain point, being able to develop the language without updating a specification is useful. When you reach a certain level of maturity, you need a spec. I think Rust is nearing that inflection point;

2. A spec will make it harder to make breaking changes. That's a good thing. I can't help but think of Python 3. The seal was broken and every point release (it seems) makes breaking changes. That's not a good thing; and

3. A spec doesn't prevent undefined behavior. C/C++ have lots of undefined behavior. Sometimes later spec revisions will address this. Sometimes not.


> If you come from a language where braces define scopes [...] Rust's behavior may be surprising but it's consistent and correct. The object is immediately out of scope.

As someone from a language background where braces define scopes the behavior currently demonstrated is actually perfectly in alignment with my expectation. The variable is never even entering the block scope defined by the function boundary. The variable was created within the temporary scope of the line and then is destroyed because no symbols exiting that line were binding it to a value. Aka - the variable never formally entered the symbol table for that block.

Maybe a standard of some kind encoding these changes would be helpful but I did live through the 20 billion different C compiler era and, tbh, if you weren't trying to be fancy you could usually get shit done - standards require a large upfront and continual maintenance cost and I'd prefer to see some actually dangerous ambiguous operations before the community invested in one.


`_` in a lot of languages is called a discard or wildcard. It comes from Rust's ML heritage, and that's essentially what it does in ML too. C# has something similar. It's not unexpected at all, especially given how the syntactic feature is described.

But in this case it is not.

   let _tmp = Foo;

The confusing bit is that renaming the object from _tmp to _ changes the rules. I assume that in rust '_' is not an actual name, but a pattern to be matched, but even then the actual rule applied is not obvious (does it match everything? Only the lifetime of objects matched with a name is extended?)

Correct, _ is a pattern that does not bind to any name. While the implications of that may not be simple, the description certainly is.

Names that start with underscores are still names, and work like any other name. The “unused variable” lint can be suppressed with a leading underscore like any other name, but that’s purely about the lint and not semantics.


I independently came up with a similar hack in TXR Lisp. When a variable is named by a gensym, unused warnings are suppressed. Gensyms are almost always symbols generated by macros. Machine generated code often has cases where variables are unused.

This design decision is a trade off. Any situation in which a variable is unused is potentially a bug, even if the variable is in code written by a macro and named by a generated symbol.

But it is a burden on macro writers to ensure that all possible cases of general code are free of unused variable warnings.


So the issue really is that the match-and-discard wildcard uses a spelling that would otherwise be a valid identifier name causing an ambiguity to the reader, if not to the compiler.

In retrospect using a different character (say '*') would have been better, clueing the reader that something different was happening; but I guess there is long history of this use of _ in ML languages.


I mean, it’s never a valid identifier, and that is spelled out in the reference too.

Of course it isn't; it acts as a keyword, but it looks like a valid identifier. As this difference doesn't matter 99% of the time, it can be easily overlooked, becoming a pitfall the 1% of the time that it matters. Of course it is too late to change it, but pointing it out is a must. I understand rust has a linter pass specifically to catch this sort of issue.

Those are two different things.

- A _ prefix tells the compiler not to warn you a variable isn't used.

- A _ variable name is a reserved symbol to say something cannot be used. Trying to do so will cause a compiler error.

So they're scoped differently.

Consider this Go snippet:

    package main

    import "fmt"

    func A() { fmt.Println("A") }
    func B() { fmt.Println("B") }

    func main() {
      var B = A // local variable B shadows the package-level function B
      B()
    }
What function gets called? It's A but you can reasonably make a case for B given the capitalization of Go function names and variables. My point is that scoping and resolution rules and many other aspects of a language can be viewed as intuitive (or not) based simply on what you're used to. Even then they can be arbitrary.

If B is called, and the language is otherwise reasonable, then it must mean that there's an A variable in the program which is not shown in the above code.

The language must have separate namespaces for functions and variables. But in that case var B = A must be resolving A in the variable namespace; it does not refer to the function A. The local B variable is initialized from some nonlocal A variable, and neither of those are related to the A or B functions, nor have any interaction with the B () call which ignores the variable space, looking up in the function space.


In a language where functions are in the same namespace as variables (like the majority of languages), I would expect A, in a lisp-2 I would expect B.

But I don't see how that's relevant; this is not about the scope of bindings, but about the lifetime of objects.

As Steve explained in a sibling comment, _ is in fact not a variable name, and doesn't extend the lifetime of the object. The semantic is perfectly reasonable and I'm sure it has a good reason to be that way, but the difference between two similarly looking syntaxes is surprising and violates the principle of least astonishment.

FWIW, C++ temporary lifetime extensions are also surprising in many cases.

[Note: I don't know if rust is defined in terms of temporary lifetime extensions, it is just a way to understand it for a c++ programmer like me]


The definitions are a bit different than in C++ but the overall effect is the same: https://doc.rust-lang.org/stable/reference/expressions.html#...

Mostly, the ANSI 89 C standard ensured portability, and long-term core functionality of ecosystems through bootstrap compilers etc. Arguably, the standards compliance was indirectly responsible for the gcc becoming popular.

The issue with modern C++, was users that began to mix core features with externally defined popular abstract Standard Template Library feature implementations. i.e. while an attempt was made to bring the compiler features into global harmony. The outcome was an ever spiraling complexity with conflicting use-cases, and multiple valid justifications in every case.

Rust will likely end up like NodeJS/PHP/C++ ecosystems... not simply because of llvm quality issues, but rather it is human nature to check-off all prerequisites for a "Second-system effect".

YC has shown some of the proponents are irrationally fanatical, and unfortunately all too predictable... one may append the standard AstroTurf opinions that deny reality below... =3


[flagged]



A summary can also be considered opinionated and a conversation starter.

[flagged]


not like Java is, agreed

but it does feel like Rust is designed for people learning about ALL of the modern compute stack, legacy compatibility included.


But then anyone could make a compiler and call it a rust compiler. Can't have that!

There are multiple Rust compilers, for example

- https://github.com/Rust-GCC/gccrs

- https://github.com/thepowersgang/mrustc


they are both incomplete

Does that matter? I was saying that rust doesn't want anyone else to use the word rust, even beyond what I would consider their right to.

This comment was showing that there are other projects using the word rust in their name & description.

The progress of those projects doesn't change the fact that they exist and call themselves rust compilers.



