Bastion of the Turbofish (2020) (github.com/rust-lang)
103 points by scott_s on Feb 5, 2021 | 56 comments



So, the Rust language developers made some syntax choices in early versions of Rust, and when those choices are extended to their natural conclusion, the result is that specifying type arguments is a bit more awkward than one might like in some circumstances. This topic is full of nuance, tradeoffs, history, and backwards compatibility; instead of simply lamenting the choice or encouraging endless debate about it, the author opted to build a temple around it. A piece of art that unapologetically explains the consequence yet still pays homage to the nuance.


Except that's not where it ends. The Bastion is only a demonstration that it takes a breaking change to get rid of the turbofish. But every new keyword is also a breaking change, and in practice new keywords cause more breakage than the proposed approach for getting rid of the turbofish. Rust editions can easily handle that.

In order, @varkor's attempts to eliminate the turbofish:

* RFC 2527

* Bastion of the Turbofish

* RFC 2544

If you'd like to know why Rust really still has a turbofish, the huge RFC 2544 thread has the full story: https://github.com/rust-lang/rfcs/pull/2544

Important points in that discussion (I'm only linking to the comments summarizing other comments):

* Sep 2018: Lang team rejects the change for 2018 edition due to lack of time: https://github.com/rust-lang/rfcs/pull/2544#issuecomment-423...

* Jan 2019: Lang team proposes to merge the RFC (accept the change): https://github.com/rust-lang/rfcs/pull/2544#issuecomment-453...

* A bunch of people objected; people couldn't agree on subjective points: https://github.com/rust-lang/rfcs/pull/2544#issuecomment-453...

* The discussion got derailed and closed as too heated: https://github.com/rust-lang/rfcs/pull/2544#issuecomment-453...

And that's where the project to eliminate the turbofish died. Nobody in the know wants to deal with that much drama, so AFAIK no one has tried again.


How is a new keyword a *major* breaking change? They can only be introduced at edition boundaries, which means they are opt in. And code can be automatically upgraded to a newer edition by replacing uses of `keyword` with `r#keyword`.
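A minimal sketch of the raw-identifier escape hatch this comment relies on. The function is hypothetical; `match` is used only because it is a keyword in every edition, so the `r#` prefix is required to use it as an ordinary name no matter which edition compiles the snippet.

```rust
// `match` is reserved in every Rust edition, so the only way to use it as a
// function name is the raw-identifier form `r#match`.
// The function itself is a hypothetical example.
fn r#match(pattern: &str, text: &str) -> bool {
    text.contains(pattern)
}

fn main() {
    // Call sites use the same `r#` prefix.
    println!("{}", r#match("fish", "turbofish"));
}
```

The same mechanism lets an edition migration rewrite uses of a newly reserved word as `r#word` without changing what the code means.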

New keywords are a relatively minor inconvenience so long as there aren't too many.


I didn't say a new keyword is a major breaking change.

New keywords are small breaking changes; I'm saying that the breakage when eliminating the turbofish is even smaller!

If you read RFC 2544, some team members were considering removing the turbofish without a new edition, because a crater run did not find any real-world code that would break. The "Bastion of the Turbofish" might literally be the only Rust code in existence that would be broken.


If you're wondering, "What is the turbofish?", it is the awkward syntax Rust requires when specifying generic arguments in an expression, such as:

  Vec::<u8>::new()
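A few other places where the turbofish commonly shows up, as a small sketch. These are ordinary standard-library calls whose generic parameter cannot be inferred from the arguments alone, so it has to be written out in the expression.

```rust
use std::mem::size_of;

fn main() {
    // Generic type, then an associated function: Type::<Args>::function().
    let bytes = Vec::<u8>::new();

    // Generic method whose return type depends on the type argument;
    // the turbofish supplies it inline.
    let n = "42".parse::<i32>().unwrap();

    // Generic function with no value arguments to infer from.
    let width = size_of::<u64>();

    // `collect` can build many collection types; `Vec<_>` picks the
    // container and leaves the element type to inference.
    let doubled = [1, 2, 3].iter().map(|x| x * 2).collect::<Vec<_>>();

    println!("{} {} {} {:?}", bytes.len(), n, width, doubled);
}
```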



Am I the only one not really bothered by the turbofish? I just think 'declarations, no colons. expressions, colons.' and that rule generally works fine for me.


I love it. I'm not here to debate whether it should be there, but I like it nonetheless. Mostly because of the cute name.

And I came from being a Python "kill all syntax!" zealot back in the day, hah. Funny how in both types and syntax I have migrated from minimalism to explicitness over the years. I was such a zealot on both fronts in those Python days.


It's a straight-forward rule that (I think) I have mostly internalized. But coming from C++, I was initially mystified.


C++ has a rule similar to the turbofish though:

    #include <Eigen/Dense> // a library where I know it's required

    using namespace Eigen;

    int main() {
        VectorXd zeros = VectorXd::Zero(5); // Eigen's factory is Zero(), not zeros()
        VectorXf zeros_float = zeros.template cast<float>();
        return 0;
    }


I'm sure you know better, but at first glance this seems wrong. Reason: `main` is not itself a template, so `cast` is not a dependent name.

C++ compilers used to be liberal about accepting unnecessary instances of the `typename` keyword so lots of people would just sprinkle them everywhere. Modern versions of both Clang and GCC call this out.

The actual rule: https://en.cppreference.com/w/cpp/language/dependent_name#Th...


You're right, since I'm frequently using templated functions I encounter this issue quite often, but here when I wanted to give an example I forgot I needed to be in a dependent context. I guess I'd better test my example next time I want to give one...


And I have never internalized it! This is me, but for `typename` and `template` in expressions: https://xkcd.com/1168/


The syntax gives me a warm feeling, reminding me in the moment that I'm not writing C++.


Specifically it's the ::<u8> part. It's only used when writing type parameters in an expression.

    let x: Vec<u8> = vec![]; // spelled Vec<u8> in "type context"
    let x = Vec::<u8>::new(); // spelled Vec::<u8> in "expression context"


Yes, and googling around for this rule is what landed me on this epic poem and test.


I find specifying type on the left side of assignment operator much easier to interpret.

> let x: Vec<u8> = Vec::new();


let x: Vec<u8> = vec![];
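Annotating the binding and using the turbofish give the compiler the same information, as this sketch shows. The turbofish earns its keep when there is no binding to annotate, for example in the middle of a method chain. The `shout` helper is hypothetical.

```rust
// Hypothetical helper: the collected Vec is consumed immediately by
// `.join`, so there is no `let` binding to annotate and the turbofish
// is the natural way to name the collection type.
fn shout(words: &[&str]) -> String {
    words
        .iter()
        .map(|w| w.to_uppercase())
        .collect::<Vec<_>>()
        .join("-")
}

fn main() {
    // Three equivalent ways to make an empty Vec<u8>.
    let a: Vec<u8> = Vec::new(); // annotate the binding (type context)
    let b: Vec<u8> = vec![]; // same annotation, using the macro
    let c = Vec::<u8>::new(); // turbofish (expression context)
    assert_eq!(a, b);
    assert_eq!(b, c);

    println!("{}", shout(&["turbo", "fish"]));
}
```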


It’s also never clear where this appears, like a::<b>(c) or a::<b>::c()


In both cases, `<b>` is a parameter of `a`, but in the former, `a` is a function with a generic parameter, and in the latter, `a` is a type with a generic parameter.


The generic parameter goes directly after the thing that has the parameter (which could be the type or the method). With one exception: for enum variants, it goes after the variant, like None::<i32>.
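A small sketch of those placements side by side. `identity` is a hypothetical function used only to show where a function's type argument goes; note that enum variants also accept the arguments on the enum type itself.

```rust
// Hypothetical generic function, used only to show where its type
// argument goes.
fn identity<T>(x: T) -> T {
    x
}

fn main() {
    // a::<b>(c): `a` is a generic function, so the type argument follows it.
    let x = identity::<u8>(7);

    // a::<b>::c(): `a` is a generic type; `c` is an associated function.
    let v = Vec::<u8>::with_capacity(4);

    // Enum variants: the arguments may follow the variant...
    let n = None::<i32>;
    // ...or the enum type; both spellings denote the same value.
    let m = Option::<i32>::None;

    assert_eq!(n, m);
    println!("{} {} {:?}", x, v.capacity() >= 4, n);
}
```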


It makes me wonder: if things had gone differently, and Rust had always required this syntax for generic parameters, would the consistency have made the syntax more palatable?


As in, if declaring generics looked like `fn foo::<T>()` rather than `fn foo<T>()`? I dunno, I think at that point you just stop worrying about adhering to the principle of making declaration resemble use.


I guess I was only thinking of types, where turbofish comes up. E.g. for declaring generics, like `struct Point::<T>`.


IMO no. I find even :: by itself unpalatable.


It is indeed unfortunate Rust settled on <> for generics. Looking back, [] is obviously a superior choice. But alas, that train passed.


Yet [] still has the same problem while you use [] for indexing:

  fn a[T](_: T) -> &'static str { "type" }
  struct b;
  fn ambiguous() -> &'static str {
    let a = [|_: b| "expression"];
    let b = 0;
    let c = b;
    a[b](c)  // "type" or "expression"?
  }

So you very probably want to unify indexing and function-calling, which becomes very difficult for Rust because of the whole lvalue/rvalue situation.

Because manual implementation of the Fn traits is unstable and no generic implementations exist, I do actually think that this could technically be done (a new edition could go to [] for generics sans turbofish), but it covers a lot of other hazardous ground too (e.g. adding maybe-uninitialised reference types), so it couldn’t be done for some years yet, and I doubt very much that it ever will be (because I doubt many people think it would be worthwhile even if it were free to do, let alone when there’s a major implementation cost).

I and a few others did try a bit to convince people to shift to square bracket generics in late 2013–mid‐2014, but failed. Certainly the language was in no shape to unify indexing and calling (the best I could come up with was to keep them separate for now, just using the same syntax with the compiler selecting between the mutually-exclusive implementations, with the potential for merging the two in future), so at that time it would probably have retained the turbofish, which didn’t help with convincing people of the superiority of square brackets (no compelling benefit—switching to something unfamiliar, and remember Rust’s weirdness budget was already about expended, without fixing what many felt was the biggest problem with angle brackets).


> Yet [] still has the same problem while you use [] for indexing

That’s not entirely true: with [] your parse is unambiguous, the node contains everything up to the matching ], doesn’t matter whether it’s an index, a literal, a generic, … the resolution of what’s what requires more work, but you can at least build your tree.

With <> however you can’t even parse without knowing whether < is a comparison operator because the knowledge of where its rhs ends depends on that information. This issue is compounded by the right-shift problem.


OK, perhaps not quite the same problem, because the hostility of angle bracket delimiters to the token tree parsing approach is definitely a real pain point, but it still has the problem of ambiguity, which is anathema to Rust even if it could be resolved by knowledge of the items in the current scope (which I don’t think it could be). Inside the square brackets, you don’t know whether you’re parsing an expression or a type, and that matters.

If anything, I’d say that this is much more trivially problematic than angle brackets, because on angle brackets you have to write quite convoluted code before the actual ambiguity arises; with square brackets, syntactic ambiguity arises much more easily and commonly—I can think immediately of a few examples of code that I’ve seen or written that would be ambiguous.


I guess you’re referring to rust having separate value and type namespaces, so you could fairly easily get into a situation which would be rather hard to disambiguate?


Rust very deliberately treats parsing as an isolated step in compilation. The ASTs of {function + type parameter + arguments} and {value + indexing + call} are very different. At parse time, you don’t know what types and values are in scope—they’re resolved later.

Some languages have been willing to compromise on this (C and C++ certainly do, and Perl is famously unparseable), but it always causes a great deal of pain, as you can’t build any sort of isolated tooling around the language any more, but must use an actual compiler to even try to parse the thing properly if you want to be robust; static analysis is thus made more difficult, especially as you must now look at entire projects and can’t properly look at individual files—and worse, you must be able to compile the project to be able to analyse it.

So Rust has deliberately gone the other way: you can parse the language easily and completely, so any form of static analysis doesn’t need a real compiler to back it and can act on files in isolation.

In short, Rust is philosophically obdurately opposed to needing to do disambiguation in the parsing phase.

So it’s actually not really about the actual ambiguity, which might not even be a thing—I can’t remember whether {function + type parameter} would be looking up in the type or value namespaces, so my example might actually be wrong there—but rather that you can devise supporting code that would seem to make `a[b](c)` mean two different things, and that’s untenable for Rust, because it refuses to let scope contents influence the parse.


I don't really understand why the ASTs need to be different. That is, I don't really see why it's so different from

  fn a<T>(_: T) -> i32 { 0 }
  fn main() {
      let a = |_: i32| 1;
      let c = 0;
      a(c);
  }

except that a[b](c) requires that an expression can be a type.


As I say, the particular example I used may not properly demonstrate the precise ambiguity after all (not certain), but the fact stands that Rust is unwilling to put something in its AST that might be a type or an expression. You must know at parse time whether it’s to be taken as a type or an expression.

And semantically they’re quite different, as is always the case with types versus values: one is a function call with generic type parameters, which are resolved at compile time, while the other is doing indexing at run time, and then calling it. For them to have the same AST, even were it possible, would be making the AST hollow.


Using `<` and `>` for generics is bad not only because of their primary meaning as comparison operators, but also because `>>` needs special handling in the lexer. It took until C++11 before the C++ standard supported expressions like `std::vector<std::vector<int>>`. For some reason, Bjarne Stroustrup managed to pick the only pair of bracket-like characters in C where two closing brackets in a row form a different lexeme (`>>`).
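For comparison, Rust accepts nested generics closed by `>>` without extra spaces; as far as I know, rustc handles this by splitting the `>>` token when the parser expects a closing `>`. A small sketch; the `nested` function is hypothetical.

```rust
// The `>>` in the return type closes two generic argument lists.
fn nested() -> Vec<Vec<u32>> {
    // Inside an expression, `>>` is an ordinary right shift: 8 >> 2 == 2.
    vec![vec![8 >> 2]]
}

fn main() {
    let v: Vec<Vec<u32>> = nested();
    println!("{} {}", v[0][0], v[0][0] >> 1);
}
```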


Clearly the superior choice would have been ⸨⸩, as in foo⸨a⸩(b).


OK. I've been coding a lot in Rust recently, and I can't make heads or tails of why that expression is legal and why the resulting type is (bool, bool). I assume that this is part of the point.

Anyone care to spell it out for me?


  fn main() {
    let (oh, woe, is, me) = ("the", "Turbofish", "remains", "undefeated");
    let _: (bool, bool) = (oh<woe, is>(me));
  }

So we have four string variables, named `oh`, `woe`, `is` and `me`, holding "the", "Turbofish", "remains" and "undefeated".

  (varA, varB)

is a tuple in Rust that contains two values, varA and varB.

  oh < woe

checks whether `oh` is less than `woe`.

  is > (me)

checks whether `is` is greater than `me` (ignore the extra parentheses).

So:

  (oh<woe, is>(me))

with some more spacing to make it easier to see, and with the extra parentheses removed, is:

  (oh < woe, is > me)

Basically, it creates a tuple holding the results of the two comparisons.
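To make that concrete, here is a small sketch that evaluates the bastion expression and the spaced-out version side by side (the two function names are made up). String slices compare byte by byte, so the uppercase `T` in "Turbofish" (0x54) sorts before the lowercase `t` in "the" (0x74), and `r` sorts before `u`; both comparisons come out false.

```rust
// Evaluates the bastion expression exactly as it appears in the test.
#[allow(unused_parens)]
fn bastion() -> (bool, bool) {
    let (oh, woe, is, me) = ("the", "Turbofish", "remains", "undefeated");
    let pair: (bool, bool) = (oh<woe, is>(me));
    pair
}

// The same tuple, spaced out and without the redundant parentheses.
fn spelled_out() -> (bool, bool) {
    let (oh, woe, is, me) = ("the", "Turbofish", "remains", "undefeated");
    (oh < woe, is > me)
}

fn main() {
    // "the" < "Turbofish" is false ('t' > 'T'), and
    // "remains" > "undefeated" is false ('r' < 'u').
    assert_eq!(bastion(), spelled_out());
    println!("{:?}", bastion());
}
```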


And for anyone (like me) that thought, "ok that's fair enough, what's the problem?", the issue is that it could also be interpreted as generic arguments.

A full explanation of the ambiguity is in the linked issue:

  let a = (b<c, d>(e));

  This can either mean a tuple

  let a = ((b < c), (d > e));

  Or a pair of generic arguments.

  let a = b::<c, d>(e);
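Both readings can be tried out with a hypothetical generic function `b` that takes two type parameters and one value argument; its body (summing type sizes) is an arbitrary choice for illustration. With the turbofish, the call is unambiguous. Without it, the same tokens applied to ordinary variables parse as the tuple of comparisons.

```rust
use std::mem::size_of;

// Hypothetical generic function standing in for `b` in the example:
// two type parameters and one value argument.
fn b<C, D>(e: usize) -> usize {
    size_of::<C>() + size_of::<D>() + e
}

// The generic reading: `b::<c, d>(e)` with concrete types.
fn generic_reading() -> usize {
    b::<u8, u16>(0)
}

// The tuple reading: no `::`, so `<` and `>` are comparisons.
// The local variables shadow the function `b` inside this body.
#[allow(unused_parens)]
fn tuple_reading() -> (bool, bool) {
    let (b, c, d, e) = (1, 2, 3, 4);
    let a = (b<c, d>(e));
    a
}

fn main() {
    println!("{} {:?}", generic_reading(), tuple_reading());
}
```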


And here's the answer: https://github.com/rust-lang/rfcs/pull/2527#issuecomment-414...

WOW that's certainly a `wat`

EDIT: ok - now that I look at it, I understand. My brain was very, very primed to see the <woe, is> as type parameters - and it's late


To some extent the problem exists in all languages that overload the less than and greater than symbols to also mean angle brackets. C++ infamously has to disambiguate using either the `template` or `typename` keyword in some cases; Rust on the other hand uses the shorter `::` sigil but also more consistently so the programmer doesn’t have to guess or memorize complex rules. Java uses the rather awkward `object.<Type>method()` syntax when you need to explicitly specify a type parameter in a method invocation (which luckily is not very often, and to be fair does mirror the method declaration syntax so there’s that).


I'm not sure if this is where it came from, but "wat" reminds me of this short, funny talk:

https://www.destroyallsoftware.com/talks/wat

Just some context.



That page should have a link to that talk by Gary Bernhardt, because it is the reason I reach for the term to describe "unexpected code behavior." And I'd wager that's true of many in the software community.


(2018), not (2020).


I return to my original thesis in re: Rust lang. It's a playground for schmarties. (The fact that it can be used to generate some decent software binaries is a side-effect, used mostly to fool people who might otherwise object to all these schmarties goofing off.)


The joke didn't land, sorry.

Just to elaborate (not troll!) I believe C with tools[1] makes more sense economically than Rust. Rust is popular not because it's groundbreaking but because it's fun.

[1] Such as: https://compcert.org/ https://frama-c.com/ https://www.cprover.org/cbmc/


The Rust documentation assumes you know what "snake case" and "camel case" mean. The compiler has opinions about what should be written in each form.

Then there's the problem of where to put the lifetime parameters.


> The Rust documentation assumes you know what "snake case" and "camel case" mean.

Those are really common terms that existed long before Rust, though. And if you don't know them already, they are easily googleable. The documentation also assumes the reader knows what the heap and the stack are, but surprisingly (not), that doesn't bother you.

> The compiler has opinions about what should be written in each form.

Those are mere lints, though. If you don't like the compiler's opinion on that subject, you can disable them (with `#[allow(non_camel_case_types)]`, for instance; see https://play.rust-lang.org/?version=stable&mode=debug&editio...)
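A minimal sketch of silencing the naming lints per item, assuming you really do want unconventional names. `non_camel_case_types` and `non_snake_case` are rustc's lint names; the struct and function are made up. Both lints only warn by default, so the attribute just quiets the warning.

```rust
// Without the attribute, rustc warns that type names should be CamelCase.
#[allow(non_camel_case_types)]
struct turbo_fish {
    fins: u32,
}

// Without the attribute, rustc warns that function names should be snake_case.
#[allow(non_snake_case)]
fn MakeFish() -> turbo_fish {
    turbo_fish { fins: 2 }
}

fn main() {
    println!("{}", MakeFish().fins);
}
```

Written as an inner attribute (`#![allow(...)]`) at the top of the crate root, the same setting applies crate-wide.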

> Then there's the problem of where to put the lifetime parameters.

Lifetimes are the one novel concept Rust introduced, so obviously they feel alien at first, because they are. But you end up getting used to them.
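On the "where do the lifetime parameters go" question: they share the angle brackets with type parameters and are declared before them. A minimal sketch; `Excerpt` and `larger` are hypothetical names.

```rust
// A struct that borrows part of a string: the lifetime is a generic parameter.
struct Excerpt<'a> {
    part: &'a str,
}

// The impl block declares the lifetime, then applies it to the type.
impl<'a> Excerpt<'a> {
    fn part(&self) -> &'a str {
        self.part
    }
}

// Lifetimes come before type parameters in the same brackets.
fn larger<'a, T: PartialOrd>(x: &'a T, y: &'a T) -> &'a T {
    if x >= y { x } else { y }
}

fn main() {
    let text = String::from("turbofish remains undefeated");
    let first = Excerpt { part: text.split(' ').next().unwrap() };
    println!("{} {}", first.part(), larger(&3, &7));
}
```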


> The documentation also assumes the reader know what heap and stack are but surprisingly (not) it doesn't bother you.

Not sure if it counts as documentation, but the Rust Book actually has a very good explanation of stack and heap. I know because that's how I learnt about these concepts myself.


Is Rust as first programming language a thing?


It is. There are regularly people new to programming posting questions in r/rust. Although even there most people encourage newbies to consider learning python first. Rust as a first low-level language is very common though. Lots of folks who are interested in getting closer to the metal but are put off by the idiosyncrasies and lack of guard rails of C and C++.


The book is not designed for folks who have never programmed before. However, many people in our audience have never dealt with the stack and the heap as concepts. You can go an entire career and only use languages that don’t have that distinction.


As someone who has been interested in Rust for a long time but has only in the past few weeks had the opportunity to use it in non-trivial ways, which of course then involves lots of documentation reading: Thanks, Steve!


It's certainly better than C++ as a starting language.

And, if you're just starting out, is Rust really more alien than C?

C, quite famously, needs you to understand at least double indirection of pointers (pointer to pointer of type) to be even mildly useful in the language. You can go a long way in Rust before needing to hit that level of complication.
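One common reason C code needs a pointer to a pointer is an out-parameter that replaces the caller's heap-allocated value. A minimal Rust sketch of the same idea, with a hypothetical function name: a mutable reference is enough, and the old value is freed automatically when it is overwritten.

```rust
// In C this would typically take a `char **` and free/reallocate through it.
// Here `&mut String` lets the function replace the caller's string; the old
// String is dropped (and its heap buffer freed) by the assignment.
fn replace_greeting(slot: &mut String) {
    *slot = String::from("hello, turbofish");
}

fn main() {
    let mut greeting = String::from("hi");
    replace_greeting(&mut greeting);
    println!("{}", greeting);
}
```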

I still wouldn't recommend C or Rust for a first programming language, but Rust really isn't that bad compared to the alternatives.


Yeah, there seems to be a huge influx of webdev people. They don't have much idea about the stack and the heap.


https://doc.rust-lang.org/stable/book/ch03-03-how-functions-...

> Rust code uses snake case as the conventional style for function and variable names. In snake case, all letters are lowercase and underscores separate words.

... how is that assuming? I will agree that the camel case explanation is much shorter: https://doc.rust-lang.org/stable/book/ch10-01-syntax.html#in...

> Rust’s type-naming convention is CamelCase.

By Chapter 10 you’ll have seen way more code already.
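A small sketch of the conventions those Book passages describe, with made-up names. The SCREAMING_SNAKE_CASE rule for constants goes slightly beyond the two quotes; it is the standard convention, and rustc's `non_upper_case_globals` lint warns about deviations.

```rust
// Types (structs, enums, traits): CamelCase.
struct TurboFish {
    // Fields: snake_case.
    fin_count: u32,
}

// Constants and statics: SCREAMING_SNAKE_CASE.
const MAX_FINS: u32 = 4;

// Functions, methods and local variables: snake_case.
fn add_fin(fish: &mut TurboFish) {
    if fish.fin_count < MAX_FINS {
        fish.fin_count += 1;
    }
}

fn main() {
    let mut small_fish = TurboFish { fin_count: 3 };
    add_fin(&mut small_fish);
    add_fin(&mut small_fish); // already at MAX_FINS, so no change
    println!("{}", small_fish.fin_count);
}
```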



