A half-hour to learn Rust (fasterthanli.me)
927 points by xrd on Feb 29, 2020 | 148 comments



This is the most useful introduction to a language I have ever read. Often language introductions produce a "Wall of Complexity", and my last two attempts at learning Rust failed because of that. This is just great.

Be warned, the "half hour" part is probably a bit like "99 cents" as a price tag. I've already spent more than that, but it is time well spent.

Thanks for writing this!


You might like this: https://learnxinyminutes.com


I also like Derek Banas videos (https://www.youtube.com/channel/UCwRXb5dUK4cvsHbx-rGzSgw) for this same quality. He doesn't do many "deep dives" into the languages, but he does a fantastic job getting your feet wet with very common tasks... much like these learnxinyminutes pages do.


I use this website as the most effective cheatsheet ever. Forget about PDFs and all these things people put together. Pretty much every programming language is on LearnXinYMinutes and it is like a standardized cheatsheet across all languages. Brilliant, so brilliant!


I wish this could be extended to "Learn X in Y minutes coming from already knowing Z". However this (X,Z) matrix is hard to maintain...


I agree, this is a great page.

I failed to make it through the Rust book in (I think) 2017, and kind of hated it; I made it through easily in 2019 and enjoyed it... the book has improved that much. Oh, and the compiler error messages are better, too, which helps enormously.


I agree. Reminded me of this quick start for pandas that got me quickly up-and-running!

https://pandas.pydata.org/pandas-docs/stable/getting_started...


> This is the most useful introduction to a language I have ever read. Often language introductions produce a "Wall of Complexity" and I failed in my last two attempts learning Rust failed because of that. This is just great.

> Be warned, the "half hour" part is probably a bit like "99 cents" as a price tag. I've already spent more than that, but it is time well spent.

> Thanks for writing this!


whoops, sorry.


Underscore is not exactly "throw away" but rather it is exactly "don't bind to variable". The difference becomes evident with the statement:

    let _ = x;

This is a no-op and the value remains owned by the variable x, and is not thrown away.


I debated which terminology to use and thought "throwing away" was more intuitive, especially if you've never heard of "binding" before.

It has its limits though - in your example I'd say you're throwing away the result of evaluating "x", just as if you did:

    x;


I don't agree, because we can try the two following programs, and one of them does not compile. It compiles with underscore.

1:

    let x = String::new();
    let _ = x;
    println!("{}", x);
2:

    let x = String::new();
    x;
    println!("{}", x);   // ERROR: Use of moved value x.

And this is why I said that let _ = x; is a no-op. :)
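For anyone who wants to try the two programs above, here is a minimal sketch of the same experiment. The function name is made up for illustration, and the rejected variant is left commented out so the snippet compiles.

```rust
// Returns the string's length after `let _ = s;`, showing that `s` was not moved.
fn len_after_underscore() -> usize {
    let s = String::from("hello");
    let _ = s; // `_` does not bind, so nothing is moved out of `s`
    s.len() // `s` is still usable here
}

// The other variant is rejected by the compiler, so it stays commented out:
//
// fn len_after_statement() -> usize {
//     let s = String::from("hello");
//     s;       // moves `s` into a temporary that is dropped right away
//     s.len()  // error[E0382]: borrow of moved value: `s`
// }
```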


Huh, TIL! Had no idea `x;` actually moved `x`.


But where is it moved to? Who owns it now? Or is it just lost?


yes, it is dropped


Ahh, I remember a while back reading a blog where the author talked about `std::mem::drop` being their favorite standard library function because they used this effect. It is literally defined as:

    pub fn drop<T>(_x: T) { }
https://doc.rust-lang.org/std/mem/fn.drop.html


Right, so drop is useless :) We can drop things fine without it. Since of course it's scope/ownership based and not something that's explicitly called.


Well,

  drop(x);
is a lot clearer than

  x;


So is drop like free, or is it something else?


Yes, it's effectively like free -- though C++'s "auto" is probably a closer idea.

The main difference from C's free is that you simply cannot double-free a value (without using unsafe). That's because a value is only dropped when it falls out of scope -- and since values only have one owner (and references must not outlive values) there cannot be any dangling references after it's dropped.

The documentation for std::mem::drop is explaining that dropping a value is a property of the language and scoping rules -- thus there is nothing for the function to do other than take a value and let it go out of scope.


It’s more like delete in c++ because you can implement the Drop trait for a type in order to run code when it is dropped in order to clean up resources etc.


Drop is just what runs when a value goes out of scope. Like destructors in C++.
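A small sketch of the destructor analogy, assuming a made-up `Noisy` type that records its name in a shared log when it is dropped. It shows both an explicit `drop(a)` and an implicit drop at the end of a scope.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type that records when it is dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    // Runs automatically when a `Noisy` goes out of scope or is passed to `drop`.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the order in which the two values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Noisy { name: "a", log: Rc::clone(&log) };
        let _b = Noisy { name: "b", log: Rc::clone(&log) };
        drop(a); // explicit: `a` is moved into `drop`, whose body ends immediately
        // `_b` is dropped implicitly at the closing brace below
    }
    let order = log.borrow().clone();
    order
}
```

Calling `drop_order()` should give `["a", "b"]`: the explicit drop happens first, the scope-based one at the end of the block.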


When you put it like that, I see that it's a rather minor detail.


I think no-op is a bit misleading, and throwing away is more precise, since Rust has strict evaluation.

    let _ = timeConsumingFunction();
Still needs to spend time performing timeConsumingFunction(). It might not move, but it still did the work and produced a result; you are just not using it (and hence do not need to move it).

If you had lazy evaluation I would agree with your argument for the no-op terminology.
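To make the strict-evaluation point concrete, here is a rough sketch. The closure `expensive` stands in for timeConsumingFunction() and simply counts how often it runs; both names are only for illustration.

```rust
use std::cell::Cell;

// Counts how many times the right-hand side of `let _ = ...` is evaluated.
fn calls_with_underscore() -> u32 {
    let calls = Cell::new(0);
    // Stand-in for an expensive function: bumps the counter, returns a value.
    let expensive = || {
        calls.set(calls.get() + 1);
        42
    };
    let _ = expensive(); // still evaluated; only the result is discarded
    calls.get()
}
```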


That statement is different, the rhs is not a variable's name, so I agree with you, all of that is not a no-op.


In C#, _ is called a "discard" variable. I found that to be an apt term to grasp the concept quickly.


Underscore can also be the equivalent of a wildcard in pattern matching.


I think that's the same thing as in let bindings; just like you can bind to a name in the wild card case, you can also choose not to bind to it. The main difference is that not binding in the wild card case is more common than in `let` bindings.


There's a difference between

    let _ = <expr>;
and:

    let _x = <expr>;
The first one drops the value right after the `let` statement, and the second drops it at the end of the scope that contains the `let`. It was mentioned in an open Rust GitHub issue, but I can't seem to find it.
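The timing difference can be observed with a value that logs its own drop. This is only a sketch: `Guard` and its labels are invented for the example.

```rust
use std::cell::RefCell;

// Hypothetical guard that appends its label to a shared log when dropped.
struct Guard<'a> {
    label: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

// Returns the log so the drop order can be inspected.
fn underscore_vs_named() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _named = Guard { label: "named", log: &log };
        let _ = Guard { label: "underscore", log: &log }; // dropped at the end of this statement
        log.borrow_mut().push("end of block");
        // `_named` is dropped here, at the closing brace
    }
    log.into_inner()
}
```

This matters in practice for things like lock guards: binding one to `_` releases it immediately, while `_guard` keeps it alive until the end of the scope.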


I did wonder about this, due to consumption of ... values / refs (terminology?) ... in Rust.

Thanks for adding this important detail.


I guess I always viewed that as dropping a reference which is a noop, but still dropping.


Nothing related to the tutorial itself, but I've seen many basic Rust tutorials recently, and this sorta reminds me of Haskell - especially its Monad.

The Haskell community once was flooded with Monad tutorials[1]. People kept trying to put meanings on this mathematical construct, and had come up with tons of different ways to describe it. The problem is, this simple thing can fit into so many different places, so people couldn't possibly run out of new ideas. This oversaturation only confused newcomers even more, so the community concluded not to write any more Monad tutorials.

While individual tutorials enrich the community in general, it's more crucial to have a few good pieces of documentation that kill the need for more tutorials (e.g. MDN + W3School), so that efforts can be redirected into more productive things. The Rustonomicon is one good piece of documentation, but it always falls a few steps short in practice. That's probably why more tutorials are being written, and why they get upvoted well on both HN and Reddit.

[1]: https://wiki.haskell.org/Monad_tutorials_timeline


The dust is settled and the fight is over. Which monad tutorial won "best monad tutorial?"


None of them. The concept is fundamentally flawed. It's as if everything you ever read about C was all about bitwise manipulation operations, on and on and on about bitwise manipulation, to the point not-C programmers think the language is primarily about bitwise manipulation and people start porting bizarre misunderstandings of bitwise manipulation into other languages and claiming they're just like C now, when it's just a part of the language that you'll pick up over time. Not a perfect metaphor but close enough.


However, the c community never felt like going on and on and on about bitwise manipulation... Anyone have any idea why people get so hung up about explaining monads?


When I was learning C 20 years ago, it felt like the C tutorials were really hung up on how scary pointers were, in a very similar way to Haskell's fixation on monads. It got to the point that I questioned my understanding of pointers because I didn't get why everyone spent so much effort on explaining them.


I've seen that some programming communities focus on writing software, while others focus on talking about software.

C and Haskell are two good examples of this dynamic, although C is used in so many different projects that "community" only loosely applies.


Interesting, maybe the Sapir Whorf hypothesis is much more relevant for programming languages than for natural languages; maybe the language shapes the things a human can comfortably express, and this shapes the communities

https://en.m.wikipedia.org/wiki/Linguistic_relativity


I think there's a good chunk of it that's just nerd-sniping.

Some of it was just a sort of stand-alone complex, too; people started writing monad tutorials because people writing monad tutorials was the thing to do. In the Go community, there was a run for a couple of years of people making stupidly overoptimized HTTP routers, even though there are very few websites where that's actually the problem and even fewer where the answer was to create a fancy router. Why? Because other people were making stupidly overoptimized routers. It was a smaller instance of the same thing and ultimately damaged the community less than I think the overfocus on monad tutorials did for Haskell, but it was not the best thing for Go.

Monad tutorials did have a particular problem, though, which is that rather a lot of them were wrong, too. I made my own list of issues here: http://www.jerf.org/iri/post/2928 That particular post is in the context of people trying to implement "monads" out of Haskell, but the misconceptions I list tended to come from the lower quality Haskell tutorials in the first place. Then, as is the way of things, the wrongness spread around the world before the correction had its boots on, as the saying goes.


You can write useful C programs without ever using bitwise manipulation. It's quite hard in Haskell without monads since the standard I/O library is a monad. You are limited only to evaluating values with REPL unless you use a monad. And, because the I/O is a monad, you cannot just say "use this function to read input and this other function to print output" so you need to explain the whole monad situation early into the language study.


For C it was about the pointers.


So how far can I go in Haskell without "getting" monads? In C that would be pretty far without bitwise operators.


Very. You just read the type signatures to see when you need to use the monad version of functions or syntax, and just get on with it. The whole system will naturally guide you anyhow. After some practical experience, you'll have enough understanding to use it perfectly fluidly. If you then feel like coming back around and reading one of the better explanations, you'll get the theoretical side better, and may be able to write your own if you are so inclined, but, you are reasonably likely to never want or need to.


I remember seeing someone write that they could tell whenever a new chapter of "Learn You a Haskell for Great Good"[0] came out, because questions on that topic would just disappear from the discussion lists!

[0]: http://learnyouahaskell.com


The way I like to think about the expression / statement divide in Rust is that everything is an expression, and the semicolon is an operator akin to `ignore` in F#; it takes any operands, disposes of them, and returns the unit type `()`.

Rust in general is just such a pleasant language. It can be a bit tedious to work with sometimes, but that's usually because the problem in question is tedious in and of itself, and you didn't even notice all of the little screw-ups that could occur until you saw how Rust dealt with it.
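As a tiny illustration of the "semicolon discards the value and yields `()`" view described above (the function names are made up for this sketch):

```rust
// Without a trailing semicolon, a block evaluates to its last expression.
fn with_value() -> i32 {
    let x = { 1 + 2 };
    x
}

// With a semicolon, the value is discarded and the block evaluates to `()`.
// (The compiler warns that the result of `1 + 2` is unused, which is the point.)
fn with_unit() {
    let u: () = { 1 + 2; };
    u
}
```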


Reminds me of Learn X in Y Minutes: https://learnxinyminutes.com/docs/rust/


Only, it’s more one hour and a half rather than half an hour. Excellent quality stuff nonetheless


It's only half an hour if you are familiar with these concepts (generics, traits, pattern matching, error propagation, etc...). Some concepts are exclusive to Rust (lifetimes/ownership), so I doubt you can get it in 30 minutes if you have never written code in Rust before.


I’ve seen this guy’s posts a few times lately, he needs to learn to edit things down. What he’s saying is good, but it is extremely digressive, and the digressions never seem to return to the original point. I’m not the audience for this so I don’t have that much time to spare reading it, but I can’t imagine that style working well for an introduction to a language.


I respectfully disagree but - can't make everybody happy.

There's plenty of introductory articles to Rust in the style you desire :)

edit:

To expand a little bit, the digressions aren't just about me "getting distracted" while writing - they're very much on purpose. I always try and write pieces that expand in many different directions, because there's so much to discover, always.

Some folks come to an article wondering why Rust has two string types and end up spending an hour on Wikipedia reading about legacy code pages - and I think that's great.

My way of getting people interested in something is never a top-down, present-the-bare-minimum way, it's always about showing how what we're discussing is connected to a lot of other things, which are also fascinating and that you should check out if you want!


I think I'm talking more about your Mr Golang piece, but now that I have your ear, I guess I may as well. I am less concerned with the existence of digressions than the way they're introduced. When I say needs editing, I mean that because I cannot navigate the piece properly, it takes longer to understand what you're getting at if I want to, so it seems like it's too long. Editing down is only one solution to that.

An example of this is when I was really confused reading about path extensions and non-UTF8 paths. I was promised by Cool Bear that you had a point to make using an example about a stat call, but you were no longer talking about stat, and yet you were still writing as if it was central to your stat discussion. So instead of expanding, I thought you just couldn't express what was wrong with the stat call. I nearly gave up waiting! And I was surprised that, in the end, you could (and put it quite well). Unfortunately, there are thousands of thinkpieces out there that never make the point they promise to, instead just dumping information at you and hoping it hits you like it hit them. So I am trained to close the tab when that is happening.

It might help to state why you're going to digress before you do, with a promise to return, so that people who are hooked can be confident you'll eventually make your point (and may skip back and forth to digest your argument again without interruption). If nothing else, you are making a promise to yourself to structure things in a way that is friendly to the reader.

Rather than making the piece less exploratory, adding signposting helps get the reader into the mindset you describe in the last sentence there. Otherwise, they are not sure whether to be interpreting what you're saying as a core argument or some side discussion. It hurts both the exploratory and the argumentative qualities of the writing if one is confused for the other. Without making it clear, the argument runs like my first para above: unnecessarily long-winded and questionably relevant, with no exploratory levity. With good signposting, you get to nail both. It's about two sentences' and two headings' difference, maybe a little shuffling around.

Hope that helps, looking forward to your next one.


The Golang piece is definitely subpar, it was quickly thrown together in half a day, didn't expect it to be spread so widely haha.

For other articles there's usually a few days of research, and 1.5 days of writing and editing, then a lot of touch-ups in the following weeks/months responding to feedback.

Thanks for the more detailed criticism though - I agree with the general sentiment, and haven't solved the navigation problem yet!


Has anyone struggled with the lack of HashMap literal?

https://github.com/rust-lang/rfcs/issues/542

I just can't bring myself to do something like this:

    use std::collections::HashMap;

    let m1: HashMap<&str, i32> = [("Sun", 10), ("Mon", 11)]
        .iter()
        .cloned()
        .collect();
https://doc.rust-lang.org/std/collections/struct.HashMap.htm...


Would a macro work ok for you, like the inbuilt vec! one? There's a crate for that named maplit: https://docs.rs/maplit

    let map = hashmap!{
        "a" => 1,
        "b" => 2,
    };


one of the fun things about Rust, is any time you think "man I wish there was some syntax to do this thing in Rust" you can make a macro to make it reality, usually one already exists.


You shouldn't be writing literal HashMaps in your code very often anyway; you should use a function that pattern matches on a string for the day-of-week-to-number example (returning Option of course). Most other cases where you might want literal HashMaps are well-served by either pattern matching functions or structs. The point of a HashMap is that it has keys that you don't know at compile-time; if you use a HashMap literal, that's basically saying that you do know the keys at compile time.
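A rough sketch of what that could look like for the day-of-week example, reusing the values from the snippet upthread (the function name is just for illustration):

```rust
// Lookup by pattern matching instead of a literal HashMap.
// Unknown keys yield `None` rather than panicking.
fn day_number(day: &str) -> Option<i32> {
    match day {
        "Sun" => Some(10),
        "Mon" => Some(11),
        _ => None,
    }
}
```

The keys live in the code, so there is no allocation and no hashing at runtime.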


I assume it must be a common finding. I too have wanted to initialize a map value with a literal, only to find that one has to work around their absence. Very weird thing to miss in such a great language, really.

I guess that the syntax proposal would get too complicated with all the different possibilities (like what happens if you want to store one ref, now you need to add lifetimes and such), but overall for the novice it just looks like a strange thing to not have.


you can so easily make a macro in rust to do exactly what you want, that it doesn't likely make sense to include it as part of the language. though perhaps the macro should be in the core libs.
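For the curious, a minimal sketch of such a macro. It is deliberately named `simple_map!` (a made-up name) so it isn't confused with maplit's `hashmap!`, and it makes no attempt to pre-size the map.

```rust
use std::collections::HashMap;

// Hypothetical map-literal macro: `simple_map! { key => value, ... }`.
macro_rules! simple_map {
    ($($key:expr => $value:expr),* $(,)?) => {{
        let mut map = HashMap::new();
        $( map.insert($key, $value); )*
        map
    }};
}

// Example use, with the values from the snippet upthread.
fn week_start() -> HashMap<&'static str, i32> {
    simple_map! {
        "Sun" => 10,
        "Mon" => 11,
    }
}
```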


Yeah, probably such a macro for maps should be provided in the core libs. It would be more consistent with providing the equivalent one for vecs. From a user (of the language) perspective, having one but not the other is just confusing.

Also writing macros is not the first thing one learns when getting to know Rust, so a learner won't know how to write her own.


This is very good but... Description of traits suddenly dumps references, mutable references, reference bindings, dereferencing with zero explanation:

--- start quote ---

Trait methods can also take self by reference or mutable reference:

  impl std::clone::Clone for Number {
      fn clone(&self) -> Self {
          Self { ..*self }
      }
  }
--- end quote ---

What? What does this even mean? What is <asterisk>self, and why?

Too many questions, not nearly enough answers.


If you have a struct of type T and an instance of that struct t, you can create a new instance using the syntax

  T { ..t }
which means "a new T with all the fields set to their values in t".

Inside the trait "Self" is an alias for "Number".

"&self" is shorthand for "self: &Number", i.e., a reference to Number.

To dereference a reference, prefix with

  *
So:

  self
has type "&Number"

and

  *self
has type "Number".

Thus,

  Self { ..*self }
is creating a new instance of Number with each of its fields set to the same value as in self.

Disclaimer! everything I know about rust I learned 2 days ago from this blog post, so I might be wrong :-)
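Putting the pieces of that explanation together in one compilable sketch. The shape of `Number` (an `odd: bool` and a `value: i32`) is assumed here, and the trick only works this simply because both fields are `Copy`.

```rust
// Assumed shape of the article's `Number`; both fields are `Copy`.
struct Number {
    odd: bool,
    value: i32,
}

impl Clone for Number {
    fn clone(&self) -> Self {
        // `self` is `&Number`; `*self` is the `Number` behind the reference.
        // `..*self` fills in every field not listed explicitly (here: all of them).
        Self { ..*self }
    }
}

// Clones a value and returns the clone's fields for inspection.
fn clone_keeps_fields() -> (bool, i32) {
    let n = Number { odd: true, value: 51 };
    let m = n.clone();
    (m.odd, m.value)
}
```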


What a nice approach to presenting the language in a clear cut way. We should have this for every language out there.


I've been perusing a bit of Rust code for the past 2 years, reading an article here and there... always heard it was a complex and low-level, next-generation C language, comparing it to Go etc... but never wanted to learn it... This article is _so_ simple in its delivery. It really got me excited about the 'beauty' presented by each simple chunk.


This is a great write-up. I wish every language had this on their official page. I'd personally love one for C++.


This is so good. I would love this kind of "learning" for every language. It's definitely not beginner friendly but for anyone that already knows a language this is a super introduction and quick start, maybe even tipping point to start using the language.

Only remark: If I had anchor tags at specific points creating a reference would be easy, this is referenceable material.

Great job though, I am amazed at the quality of this. Hell yeah!


I am really happy I found this blog post. If you like this one you should check out some of his other posts. He has some amazingly insightful other posts on Rust including some very in depth topics. Also, a great comparison of Rust and Go. This is also why I love Hacker News. I would have never found these posts if this one had not made it to page one.


I want off Mr. Golang's Wild Ride [1] is a great read. The part about how different OS file permissions are handled in Go vs Rust is great. Even though I don't know Rust, I thought the Rust approach looked easiest since it accurately represents the underlying systems. It was really surprising to see that Go's motivation in glossing over the complexity is to keep things simple. Is a half-working implementation really simpler?

1. https://fasterthanli.me/blog/2020/i-want-off-mr-golangs-wild...


Every young language must have tutorials like this.

Start with the utter basics but move quickly through the common issues.

More than anything, these kinds of tutorials make it possible to at least start grasping the rest of the documentation.

I find with so many funky languages, the little examples aren't so hard, but there's too much 'unknown' syntax in the way to make sense of it.

These are essential.


I want books written like this. I've had too many books where I die of boredom over and over again in the first 150 pages and then I never read them again. It's like there's some rule saying books must be super wordy.


That rule is usually called the publication contract, and it usually literally says how big the work needs to be, even if you can say it better in fewer words/pages/chapters/etc.


TIL. This seems to be a waste of paper/time for consumers. What is the incentive for publishers to prefer a wordy version over a terse one? Is it because they are afraid of the work being perceived as not substantial enough to be a book?


Holdover from another era. Pre-internet, we had more time to spend w/ books, and it was harder to gain context to make things make sense -- a little hand-holding and digression was more appropriate then.


I really enjoyed this, and it also inspired me to try a quick experiment: what would the same blog post look like for Go?

I tried it and wrote https://dmitri.shuralyov.com/blog/27. It was a fun exercise to go through.


Thanks for sharing. I enjoyed this. Could you elaborate on your closure example though? This was the only one I didn't understand. The explanation is a bit terse = "Closures are just variables with a function type with some captured context." Are these anonymous functions? What is calling the anonymous function if so in main? Cheers.


Thanks for feedback. I’ll try to clarify that section.


I did enjoy the article. It was quite clear but it made me wish for a bit more. Unlike the Rust article I didn't see any thing that suggested how golang might be special. I did see your suggestions for further reading though. I'll check them out.


Thanks for feedback. I think the “special” part for me was that there were large sections of the Rust blog that I could skip having to explain, because they aren’t a part of the Go language. To me, Go is a language that’s much easier to learn and be productive with. Maybe I should make it more clear what I skipped.


Numbers could also be differentiated in Rust. For example, to make sure your "16" is an i32 you could write

> let x: i32 = 16i32;

Otherwise, it's a very comprehensive article. The only missing parts are the async/await keywords and the sync primitives (like Mutexes/RwLocks).


C# has a similar feature, but it's less verbose, e.g. "16l"/"16L" is a long (64-bit integer), "16u" is an unsigned, 32-bit integer etc.

I find the rust syntax a little difficult to read. Take "16i32" for example - it's not immediately clear which part is the actual value.


The problem with "long" "unsigned" "short" "long long", etc. in the C world is that they all can mean different things depending on the architecture and compiler. i32, u64, f32, etc. make it very explicit what sized object you're working with. I'm not sure that's hugely applicable in C# land as there's only, what, two compilers and two? three? hardware architectures supported.

I agree that something like 16i32 is difficult to read which is why I typically use an underscore to separate the number from the type if I need to write a literal that way (e.g. sometimes I find 0_f64 easier to grok than 0.0).


Not a problem with C#, but I recognise it's an issue with C.

I didn't know you could use an underscore to separate the type - I think that definitely helps!


One neat thing Rust lets you do is put underscores in number literals. It's great for readability:

    0xFFFF_FFFF // easier to count than 8 Fs
And you can use that with type suffixes:

    let a = 255_u8;
Although in that case, you'd probably either omit u8 entirely, letting the compiler infer the type of 'a' from usage, or use a colon to give it an explicit type.

Type suffixes are especially useful for arguments to generic functions:

    write_le(12_u32);
    write_le(255_u8);
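Since `write_le` isn't shown here, a stand-in sketch: a made-up generic function `byte_width` that reports the size of whatever type the literal's suffix selects.

```rust
// Hypothetical generic function; reports how many bytes its argument's type occupies.
fn byte_width<T>(_value: T) -> usize {
    std::mem::size_of::<T>()
}

// The suffix on each literal decides which `T` the call is instantiated with.
fn widths() -> (usize, usize, usize) {
    (
        byte_width(255_u8),          // T = u8
        byte_width(12_u32),          // T = u32
        byte_width(0xFFFF_FFFF_u64), // T = u64
    )
}
```

Without the suffix, an integer literal passed to an unconstrained generic falls back to `i32`, which is usually not what you want when the byte width matters.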


You can do the same thing with C#, which is great for groupings, e.g. "2000000" becomes "2_000_000".

As a rust noob, I didn't realise rust supported the same syntax, or that you could use it to separate the type from the value (which in rust's case, I find really helps readability).


C# 7 (2017) and higher also allows underscores in number literals.


You don't need both of those "i32"s. Rust has type inference, so you can just write either `let x = 16i32` or `let x: i32 = 16`.


Also you can use underscores in numeric literals. So you can write

  let x = 65536_i32;
Or

  let x: i32 = 65_536;


I use it mostly if I'm passing arguments to another function. It serves as some sort of code documentation.

> my_function( 44i32 );

That makes it clear from the code that function accepts an i32 and not any number.


The type signature of the function will also enforce that.

You can also use generics to allow you to accept a wider variety of numbers, e.g.:

  fn takes_a_float_like_number<T>(number: T) where T: Into<f64> {
    // `number` will now be a 64-bit IEEE floating point value
    // for this scope
    let number: f64 = number.into();
  }
This will only compile if your type is small enough to fit into an f64. So takes_a_float_like_number(500_u32) will compile but takes_a_float_like_number(500_i64) will not. The error message will be a wee bit obtuse as it will be something like "trait Into<f64> not implemented for i64".


How would you improve that error?


"Literal {} is too large for type {}" (or thereabouts) is how this is handled in C/C++ compilers. I think it is enabled by -Wall or -Wpedantic in clang.

In other languages I've seen something like "type {} can't represent literal of value {}" which is a bit more generic and applies to things like floats/signed ints and values that are too small/negative.


Nice. What I don't understand is why the second Vec2 is used in this example. It introduces x and y, which are of type float, right, not Vec2?

    let v = Vec2 { x: 3.0, y: 6.0 };
    let Vec2 { x, y } = v;
    // `x` is now 3.0, `y` is now `6.0`


Because you're ---unpacking--- destructuring a Vec2.

The most common form of this idiom I've seen is:

    if let Some(x) = foo() {
        println!("{:?}", x);
    }
where fn foo() -> Option<_>, i.e. foo returns either Some(_) or None. (if let is used to account for the possibility of None; it's technically different syntax to let, but very similar.)

Here, x isn't a Some (a variant of Option). It is, however, representing a value inside a Some, which we want to get out. Likewise with your example; we want to destructure from a Vec2, so we specify a Vec2 (with identifiers instead of data) on the left hand side and the data on the right hand side, and it takes the data out and binds it to the identifiers.
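Both forms side by side, as a small sketch. `Vec2` is assumed to be a plain struct with two `f64` fields, and the helper names are invented for the example.

```rust
// Assumed definition of the article's `Vec2`.
struct Vec2 {
    x: f64,
    y: f64,
}

// `let` with a struct pattern: always matches, so plain `let` is enough.
fn sum_components(v: Vec2) -> f64 {
    let Vec2 { x, y } = v; // binds `x` and `y` to the fields of `v`
    x + y
}

// `if let` with an enum pattern: `Some(..)` may not match, so a fallback is needed.
fn first_or_zero(items: &[i32]) -> i32 {
    if let Some(first) = items.first() {
        *first
    } else {
        0
    }
}
```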


This is fantastic! Using it, I was finally able to write my favorite toy-problem in Rust (scoring a boggle board). Rust solution came out 25% faster than my (I thought) highly-optimized C++ solution, wow...


I've been writing Rust for about 2 years, I read this and learned all kinds of stuff I still did not know. Fantastic job on this! I am now a bit upset that everything isn't taught this well :o


I wish all languages had this kind of page as a requirement of the language manual. Author, I will hold your beer...


What does `type Output = Self;` for the `std::ops::Neg` trait mean? It wasn't explained on the page.


IIRC that's called "associated type" and is mostly used for shortening the function declaration inside the trait through using e.g. "Self::Output".

E.g. the definition of the real std::ops::Neg trait: https://doc.rust-lang.org/std/ops/trait.Neg.html


It’s not about shortening things, it’s actually about semantics! Choosing to make something have a type parameter vs an associated type depends on what you’re trying to accomplish.
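One way to see the semantic difference, sketched with a made-up `Meters` type: `Neg` has no type parameter, so a type can implement it only once and `Output` is fixed by that single impl, whereas `Add<Rhs>` takes a type parameter and can be implemented once per right-hand-side type.

```rust
use std::ops::{Add, Neg};

// Hypothetical newtype used only for this illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Meters(f64);

// Associated type: one `Neg` impl per type, so `Output` is determined by `Meters`.
impl Neg for Meters {
    type Output = Self;
    fn neg(self) -> Self::Output {
        Meters(-self.0)
    }
}

// Type parameter: `Add<Rhs>` can be implemented several times for the same type.
impl Add<Meters> for Meters {
    type Output = Meters;
    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

impl Add<f64> for Meters {
    type Output = Meters;
    fn add(self, rhs: f64) -> Meters {
        Meters(self.0 + rhs)
    }
}
```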


I found this extremely helpful. I’ve always been interested in Rust but never got around to it cause most of the source code I’ve read on GitHub was overwhelming. This is a nice walkthrough to get someone like me interested. Great work!


Loved the approach: no time wasted, and you can dig into things that are not clear to you.


Close to the beginning there is a phrase you'll want to review:

> If we we really we wanted to annotate the type of pair, we would write

Ok understood, we're us and we're all together in reading this! :-)


For a clickbaity title, it's surprisingly useful. Of course, the time to read it might take half an hour, but the time to really absorb it might take two weeks.


This is the style that let me get into Python, reading a Python book by New Riders within 3 days. I replaced a Perl-based module within a week of reading.

this: teaching material should give immediate confidence to apply learned material and - refine it later.

excellent write up


Typo: “If we we really we wanted”


Does anyone know of a similar tutorial for C, C++, Python, et al?


is there a poignant guide to rust ? something similar to ‘learn you some erlang for great good’ ?


Next up: Learn mandarin in 0.1^99999999999999999999 seconds.


Very nice! First impression is that it is very similar to Kotlin, which I love working in. Seems modern languages are converging on a common set of niceties.


Serious question; why aren't we all using Rust?


I think for a lot of use cases, the safety and security and whatnots you get from Rust just aren't required and people already have a large time investment in Python/Go/Java/etc.

Certainly I can do pretty much everything I need to with C/Python/Go because I accept the tradeoffs and know enough to work around the safety/security issues.

(But Rust is on my list of things to get around to in 2020.)


> let x: i32 = 42;

I'm sorry but this notation will always make me scream. C is so much simpler:

int x = 42;

no "let", no colon, and no ambiguous "i32 = 42"


I worked with C for nearly 20 years, and it is anything but simple. It is complicated, not "low level" as many think, full of weird edge cases, compilers will happily compile non-standard code, fragile (hard to refactor), full of implicit conversions you didn't expect...

Rust has a good design. It still has a few rough edges, but the syntax is great and improving.


I found in "real" rust code I very rarely declare any types other than in function signatures and I keep functions pure, short and to the point.

It's so rare to have hardcoded values anyway.


You mean, like when declaring an array:

    int[4] arr; // oops, doesn’t compile
Or a function pointer:

    int (*)(int) fptr; // oops, doesn’t compile
So much for the simplicity of C declaration syntax.


One can make the argument that the C declaration syntax is simple, because the rule to make a declaration is to simply follow a type name by an expression where the declared variable is used. The fact that you write "int[4] arr" shows that you don't know how it works (which is not a criticism; it's just not well known how it works).

The correct way is to write

    int arr[4];
and to interpret it as "arr[4] is an int" (which is only a slight lie because arr[4] is undefined if arr is a 4-element array).

How do you declare an array of pointers? Again, you write

    int *arr[4];  // array of 4 pointers to ints
because in C expression syntax, "* arr[4]" means to index into the array first and then to dereference. If that is an int, it means that arr[4] is a pointer to an int, and consequently arr is an array of pointers to ints.

If you want a pointer to an array of ints instead, do this

    int (*arr)[4];  // pointer to an array of 4 ints
again, because that's how regular C expressions work. Functions are (mostly) not an exception:

    int myfunc(int x);
    int (*myptr)(int x);
which is to say that "myfunc(x) is an int" and "(* myptr)(x) is an int", i.e. myptr is a pointer to a function that takes an int and returns an int.

Note that in the beginning (i.e. K&R C, pre-1989) the way to declare functions was consistent: Declarations had to be

    int myfunc(x);
i.e. there were no types in the argument lists. The types in the argument list appeared, I believe, after Stroustrup added them to C++ in order to improve type-safety.


> the rule to make a declaration is to simply follow a type name by an expression where the declared variable is used.

Not really. You can't just use _an_ expression, you have to use a specific expression. For example, * ppX is a perfectly valid expression for a pointer to a pointer named ppX, as in:

    int **ppX;
    if (*ppX == NULL)
So you need to use an expression where the declared variable is used that results in a non-pointer type. And then, there's actually more to it... only certain types of expressions are valid. E.g. this isn't a valid declaration for a pointer, even though it's valid as an expression:

    int pX->;
I find that basically anytime somebody tells me that C rules are to "simply [...]", they've inevitably ignored a whole bunch of cases. Your post is no exception.


> So you need to use an expression where the declared variable is used that results in a non-pointer type.

I think you are wrong here, you can't even make something other than what "results in a non-pointer type". Because by definition, you're making the type to the left with the expression. (That could still be a pointer type if it is a typedef'ed type; such as "typedef int * intptr; intptr x;" but I don't think you meant that by "pointer type" [0]).

And your first example is perfectly syntactically valid (other than missing the conditional statement that must follow the if-condition).

And no, "pX->" is not a valid expression. Was that a typo?

And yes, only a subset of expressions are valid. Basically, the expressions that you can form by applying subscripts, (x[3]), dereferences (* x), and function calls. Because, a declaration like "int x + 3;" or even "int x + y;" just doesn't make sense. I don't see a problem there.

Btw. I'm not saying that C as by the current standards is super straightforward and pure. It's definitely not, and C does actually have a lot of historical baggage that makes our lives a little harder. I'm just explaining the underlying unifying principle, which IMHO is actually nice. And honestly it seems you, too, are still confused because there is just a lack of clear explanations about C declarations. That principle should be much more well-known, and almost all problems that novices have with declarations are unnecessary frustration that they wouldn't have if someone would have told them the trick.

[0] By the way, typedef is another thing that seems to be super obscure, while it is extremely simple: It's just a keyword that modifies declarations to declare an alias for that type, instead of a (named) variable of that type.


> The correct way is to write
>
>     int arr[4];
>
> and to interpret it as "arr[4] is an int"

No. The "interpretation" is "arr is an array of ints", and that is its type.

The complexity of this declaration is evidenced by the fact that you spend another page of text "randomly" adding characters and delimiters around variable declarations to change its type:

  int arr[4] // array of ints
  int *arr[4] // array of pointers to int
  int (*arr)[4] // pointer to array of ints
These are all changes to type. Yet, instead of changing the type declaration, a bunch of stuff is added all around the variable. And you have to come up with ridiculous explanations like "arr[4] is an int which makes arr an array of ints".

That's exactly why most languages said: "if we're changing the type, we're going to reflect this in the type". In a better world the examples above would be something like

  int[4] arr; // array of ints
  *int[4] arr; // array of pointers to int
  *(int[]) arr; // pointer to an array of ints
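
For comparison, here is a rough sketch of how Rust spells those same shapes, with the whole type on one side of the binding. The names (`double`, `describe`, `ptrs`, `ptr_to_arr`, `fptr`) are made up for illustration, and references are used as the safe stand-in for C pointers (raw pointers such as `*const i32` also exist):

```rust
// Hypothetical helper so there is something to point a function pointer at.
fn double(x: i32) -> i32 {
    x * 2
}

// Declares each shape with an explicit type and reads one value through it.
fn describe() -> (i32, i32, i32, i32) {
    let arr: [i32; 4] = [1, 2, 3, 4]; // C: int arr[4];
    let ptrs: [&i32; 4] = [&arr[0], &arr[1], &arr[2], &arr[3]]; // C: int *arr[4];
    let ptr_to_arr: &[i32; 4] = &arr; // C: int (*arr)[4];
    let fptr: fn(i32) -> i32 = double; // C: int (*fptr)(int);
    (arr[3], *ptrs[1], ptr_to_arr[2], fptr(21))
}

fn main() {
    println!("{:?}", describe());
}
```

The annotations could all be dropped here and inferred; they are written out only to show where the type information lives.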


> > The correct way is to write int arr[4]; and to interpret it as "arr[4] is an int"
>
> No. The "interpretation" is "arr is an array of ints", and that is its type.

I would have been happier if I had found my explanation interpreted in a more generous way. But that's basically what I was saying (and literally what I was saying in another comment).

As to the rest, the advantage of the C approach to type declarations is that there is no type declaration syntax. Just expression syntax. And that it's very terse.

> That's exactly why most languages said: "if we're changing the type, we're going to reflect this in the type". In a better world the examples above would be something like

There's a problem in that your proposed syntax is not even properly parseable. How would a parser recognize that your lines start with types i.e. are variable declarations? For example the example "* (int[]) arr", it would start reading the asterisk and the opening parenthesis as an expression, and then suddenly find a type name (int), and could then not throw an error if it was one, but had to start all over again and try to parse the whole thing as a type declaration. That's not exactly nice - good syntax is parseable with a single token of lookahead. That not only makes parser implementations easier, but is also easier to read for humans and leads to better error detection.

Apart from that I think that your examples are about what D does, and this stuff is WORSE in my opinion. While the real problem with C declarations, which is the need to thread a symbol table through the lexer/parser, is still existent in D syntax (I believe), it introduces other problems:

How do you use an array that was declared as "int[5][10] arr"? Using it as "arr[4][9]" is an error: it must be "arr[9][4]". In other words, your approach to type declarations requires the programmer to constantly turn around declarations in his/her mind, which leads to lots of mistakes. It gets even harder when you add pointers / functions, for example "int* [5][10] arr" I believe you must access as "* arr[9][4]", or whatever the D dereference syntax is.

Java can afford to let you declare "int[][] arr = new int[5][10]" and let you access "arr[4][9]", at the cost of cheating. Java can "turn around" the dimensions because it doesn't actually have an "algebraic" type syntax, which it doesn't need because it doesn't have pointers / function pointers so there is no interaction there.

That's one of the reasons why most newer languages have the type to the right of the variable name, and types grow to the left (towards the variable name). For example, "let arr: [5][10]int" you can access as "arr[5][10]" which is easier, but that principled approach to syntactic construction of types also puts requirements on the expression syntax: For example, "let arr: [5][10]* int" would have to be accessed as "* arr[5][10]", which is weird - or the expression syntax must be changed to use a postfix dereference operator.

In short, it's not as easy as you thought, and the C syntax is in fact pretty smart. And from a practical standpoint I prefer the C way very much because it's so much terser and has less punctuation than all the alternatives. The only thing that annoys me is the lexer hack.


http://c-faq.com/decl/spiral.anderson.html is the rule I learned to understand C declarations, and while the rule there is described as simple, I think the examples, even without argument types, are actually fairly complex.

It also seems telling that no recent language has followed C’s example for declaration style, which is more implicit than explicit.


The "spiral rule" doesn't get at the heart of declarations. It's just by some guy that tried to figure it out on his own, and what he discovered was basically not declarations but the precedence rules of C expressions ;-)

> It also seems telling that no recent language has followed C’s example for declaration style, which is more implicit than explicit.

Actually most languages don't let the user do what C declarations let you do. For example, in Java (almost) everything is an object, and you can't just create a triple-indirected pointer. So, these languages can afford a declaration syntax that is less potent.

And then there are other more systems-oriented languages that chose to not copy C declarations. They come with their own gotchas. As examples I will pick D and Rust.

In D, you create a multi-dimension array like this: int[5][10] arr; Leading you to believe that you can use it as arr[4][9]; Wrong. That's an out-of-bounds error. You need to write arr[9][4]. Now, was that totally not confusing? The alternative is to expand these types systematically to the left, i.e. write [10][5]int, and maybe move the type to the right of the variable name, as in "let arr [10][5]int;". Honestly I don't like that either.
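
As a point of comparison, Rust's nested array types read from the inside out, so the same care about index order applies there. A small sketch (the function names `make_grid` and `lookup` are invented for this example) using `get` so that the wrong order shows up as `None` instead of a panic:

```rust
// `[[i32; 5]; 10]` is an array of 10 rows, each row an array of 5 ints.
fn make_grid() -> [[i32; 5]; 10] {
    let mut grid = [[0i32; 5]; 10];
    grid[9][4] = 7; // last row, last column
    grid
}

// Bounds-checked lookup: outer index first, then inner index.
fn lookup(grid: &[[i32; 5]; 10], outer: usize, inner: usize) -> Option<i32> {
    grid.get(outer).and_then(|row| row.get(inner)).copied()
}

fn main() {
    let grid = make_grid();
    println!("{:?}", lookup(&grid, 9, 4)); // in bounds
    println!("{:?}", lookup(&grid, 4, 9)); // inner index 9 exceeds a row of 5
}
```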

I've never really used Rust (either), but its downside, in my opinion, is that it has much more distracting syntax / punctuation.

I would love if there was a uniformly better way to declare things than the C way, but I still think C has the best tradeoffs for my practical work. The next time that I toy with language design I might try to simply go with C declarations, prefixed with a sigil or "let" or something, to remove the need for the lexer hack.


I've been writing code in C and C++ since I was 12 and I still sometimes forget where the array size and brackets are supposed to go.


The C approach makes the compiler much more complex, and introduces extra typing in other language constructs. (like parens around if statements) This is why many newer languages do something more like the Rust way. Overall it is simpler for programmer and compiler.


Pretty much. The mere existence of cdecl(1) says a lot about the simplicity of C's type declaration syntax.


It's unfortunate that the simple (and easy) underlying principles are not well known. See my other comment.

The bigger reason why recent languages have different declaration syntax is to avoid the need to carry a symbol table during parsing, and to avoid the need to parse all files serially instead of independently. Because to recognize a declaration the parser has to know which words correspond to types in the current scope.


I don't think parentheses around if-conditions are related to C variable declarations (if that's what you were saying), and I think it's fair to say that C's syntactical terseness is unmatched.

The parentheses are required to separate the condition from the following conditional statement.


How about `let x = 42` and let type inference do the work? I used to do a lot of C-style variable declarations in other languages, but I warmed up to Rust's really fast because most of the time I don't need to explicitly name the type.
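
A minimal sketch of what that inference does in practice. The function name `inferred_sizes` is made up, and `size_of_val` is used only as a way to observe which type was picked:

```rust
use std::mem::size_of_val;

// Returns the byte sizes of three inferred bindings, plus one value.
fn inferred_sizes() -> (usize, usize, usize, u64) {
    let a = 42; // nothing else constrains it, so it defaults to i32
    let b = 42; // inferred as u64 because of how it is used on the next line
    let c: u64 = b;
    let d = 1.5; // float literals default to f64
    (size_of_val(&a), size_of_val(&b), size_of_val(&d), c)
}

fn main() {
    println!("{:?}", inferred_sizes());
}
```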


"let" should be removed, and it doesn't change that the type should be placed before, not after with a colon.


Whether a type should come before or after the name is a fairly subjective matter. I believe that after is generally superior, especially when the type is optional.

But there’s a very practical reason for requiring the `let` token: it makes parsing very much easier. With `let`, you can keep a LL(1) grammar, because seeing `let` tells you to next parse a pattern, then if there’s a colon a type after that. But if you don’t put something in there, you get a genuinely intractable problem once type grammar is not trivial: sure, `int foo;` is simple and obvious, but what about `A<B, C> d;`? should that be parsed as an expression (respaced, `A < B, C > d;`). Some languages have not resolved this style of parsing ambiguity at all, and figure it out at runtime, based on what else they find (Perl is infamous for this). I think others only kind-of resolve it, by looking at what symbols are present at compile time, to decide what was meant. Others just declare that such ambiguities are parsed one way, and you can rewrite your code (e.g. add parentheses) if you want to mean the other. Still others have resolved it otherwise, by other more subtle syntactic means, so that even if you need arbitrary look-ahead while parsing, there’s not quite any overlap between the two syntaxes (e.g. don’t support commas in this way as a kind of alternative to semicolon within expressions; or use proper matched delimiters like [] or () for generics).

Rust chooses to make parsing simple, which benefits humans as well as machines, reducing cognitive requirements in reading code.
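
One place where this shows up in everyday Rust: inside an expression, generic arguments are written with `::<...>` (the "turbofish"), so a bare `<` is always a comparison. A short sketch, with `parse_number` and `is_between` as made-up helper names:

```rust
// Parses an integer; the `::<i32>` form names the target type inside an expression.
fn parse_number(text: &str) -> Option<i32> {
    text.trim().parse::<i32>().ok()
}

// Here `<` can only be the comparison operator.
fn is_between(a: i32, b: i32, c: i32) -> bool {
    a < b && b < c
}

fn main() {
    let empty = Vec::<i32>::new(); // same `::<...>` form on a type path
    println!("{:?} {} {}", parse_number(" 42 "), is_between(1, 2, 3), empty.len());
}
```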

Furthermore, in Rust what follows `let` is not an identifier or identifier list, but rather a pattern. Imagine the following contrived example:

  let x = [[0]];
  type x = [u32; 1];
  let a = 0;
  let [a]: x = [1];
That falls over completely if you put the type first: `x [a] = [1];`—does that define a new binding a with value 1, or does it set x[0] to [1]?

And finally, as I mentioned, the type is optional, and not commonly required, so you end up with something like C++’s `auto` keyword, which is basically `let` but spelled worse (and with worse semantics).

The end result is that for Rust specifically, what you desire is quite unsuitable, and what it has works very well—and that its reasons for doing things that way are well worth while considering.


then you have to introduce new keywords in other places to remove ambiguity.


lol okay, how about you just stick with C then since you like it so much.


C is the opposite of simple. If you want to create a binding in C, you need to learn multiple, unnecessarily complex rules. Sure, creating a binding to an int is easy:

int foo = 42;

but doing the same thing for pointers to arrays or function pointers is not:

    void (*foo)(int) = bar; // function pointer

    int (*foo)[N] = baz; // pointer to array

OTOH in Rust you just need to learn one rule: bindings are created with the grammar "PATTERN [: TYPE]" ([] means the ": TYPE" is optional).

That's the only rule you need to know: (1) it works consistently everywhere in the language (let, match, for, if-let, function arguments, while-let...), (2) it lets you create bindings, and (3) it gives you pattern matching and destructuring for free. For example,

    let x: i32 = 42;
    let y: fn(i32) = foo;
    let z: *const [i32] = bar;

but also:

    struct Entity { id: i32, vel: (f32, f32) }

    let e: Entity; // given

    let Entity { vel, .. } = e; // vel is bound to e.vel

    let Entity { vel: (v_x, ..), .. } = e; // v_x is bound to e.vel.0

    // Works in function arguments:

    fn foo(Entity { vel, .. }: Entity) -> (f32, f32) { vel }

    // Works in for loops:

    for Entity { id, .. } in entities() {
        // use the id of the entity
    }

    match entity {
        Entity { vel: (v_x, ..), .. } if v_x > 0.0 => {
            // do something if entity.vel.0 > 0
        }
        _ => { /* do something else otherwise */ }
    }

etc.

That's IMO the definition of efficiency: one simple rule, that has no exceptions, works everywhere, and lets you do a lot.

What C does, having multiple different incompatible rules, some for doing simple things like "int a = 42;" and some for doing complex things like "void (*foo)(int) = bar", is not simple. It's just a pain. It means that professional C programmers need to look up rules for things they don't use as often, which is why Stack Overflow is full of questions about "How do I assign a function pointer to a variable in C?", "How do I pass a function pointer as a function argument in C?", etc. Having to learn multiple rules to do the same thing just sucks, and is one of the main reasons I love using Rust: when I learn something, I learn it only once, and things I learn later just reinforce that I learned it in the right way.


But that's just not true. There is a single principled way for C declarations, and it's as easy as "a binding is a type name followed by an expression and a semicolon".

And while small inconsistencies have been introduced over time, it was entirely consistent when C was conceived. The problem is just that the simple rule for how to read declarations is not well-known (I don't understand why). See my other comments.


> "a binding is a type name followed by an expression and a semicolon".

I find this hard to grok. Do you have a link to the actual grammar and production rules that apply to all cases?

When I see:

  int a = 3;
I don't see a "type name followed by an expression and a semicolon", but rather the grammar "TYPE_NAME NAME = EXPR;". However, that's not correct, since TYPE_NAME cannot be any type (e.g. a function pointer won't work). When looking at

  void (*foo)(int) = expr;
  int (*foo)[N] = expr;
or at how the keywords "struct" and "union" are part of the type name in some contexts, but not others, I see quite different grammar rules.

I've tried to find literature about this, since I hack on a toy C parser every now and then, and those could simplify it, without much luck.

I've never thought of these as "CDECL = EXPR;" or similar, since that does not work either (e.g. a function declaration would be "void foo(int);" but that's not exactly the same as "void (*foo)(int) = expr;").


> Do you have a link to the actual grammar and production rules that apply to all cases ?

A google search returns that Annex A in the ISO standard has something like a grammar. Here is a link to an unofficial version of the standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf . However, this is not the right place to look if you just want to understand the simple principle behind declarations, because the purity of declaration syntax has been considerably diluted in the last decades. So, only look in the standard if you are in good psychological health, and need to implement a production-grade C compiler. Also, note that grammars are overrated and tend to make things more complex than they really are. They are often too theoretical of a construct to be applicable, and that is certainly true for a language like C.

Instead, I recommend you to read the idea from the horse's mouth, here is Dennis Ritchie talking about it: https://www.bell-labs.com/usr/dmr/www/chist.html

> I don't see a "type name followed by an expression and a semicolon", but rather the grammar "TYPE_NAME NAME = EXPR;".

Let's ignore the optional equal sign + initializer expression, and just focus on declarations without initializers. The syntax is (as I said) "TYPE_NAME EXPR;" where EXPR is an expression that makes use of the newly-declared variable.

> (e.g. a function pointer won't work)

Not sure what you mean by "function pointer", but I'm pretty sure it's not a type name in the way I mean it. Here is how to look at your examples:

    void (*foo)(int) = expr;
                     ^^^^^^ (optional) initializer
         ^^^^^^^^^^^ expression (originally it was (*foo)(x),
                                 but as I said nowadays there are
                                 type specifiers in the list which
                                 is a little inconsistent.)
    ^^^^ type name 

    int (*foo)[N] = expr;
                  ^^^^^^ (optional) initializer
         ^^^^^^^^ expression
    ^^^ type name
Basically, the first example says "(* foo)(int) is a void", so you can conclude that foo is a pointer to a function that takes an int and returns a void.

The second example says "(* foo)[N] is an int" so you can conclude that foo is a pointer to an array of N ints.


How do you feel about C's function pointer syntax?


To be pedantic, there is no (specialized) function pointer syntax. The syntax to declare function pointers is just general declaration syntax, which in turn is basically regular expression syntax.

How to declare a function pointer is hard to grok when not being introduced to declaring variables in a principled way. But it makes sense and is not too clunky if you're only declaring a function pointer every now and then.


Exactly! C is internally consistent in that, for example, the star in:

  int *a;
is part of the variable declaration, not the type. And the function pointer syntax derives from that. (I know you know, just providing context).

However, just because it's consistent doesn't mean it's easy to remember or use. Just repeating "it's easy!" doesn't make it true. Others have brought up that cdecl exists, which illustrates this pretty well.

It seems you've worked with this syntax for long enough, and it fits your way of thinking well enough, that it's not an issue at all for you! But there's ample evidence of a lot of people struggling with it, which should be sufficient to deem it "not easy".


I can understand why newer languages have moved away from this style of declaration syntax, and moving away has brought merits such as becoming more intuitive to understand to novices, as well as better support for tooling / IDEs, including parsing performance gains.

On the other hand, nothing is quite so easy to read and write for me as the terse C declaration syntax (granted I don't declare a lot of pointers-to-functions-returning-pointers-to-functions).


I just made a similar comment before I read yours, but about C# rather than C, where just like C you can do:

`int x = 42;`

But since int (32-bit integer) is the default integer type, you can also do:

`var x = 42;`

If you want to use another type, for example, ulong (unsigned, 64-bit integer), you can do:

`ulong x = 42;`

Or:

`var x = 42ul;`

C#'s syntax is not only terser, but seems a lot easier to read to me. With rust's syntax, it's not immediately clear what the value is, and what the type is.


It's only clear because you already know what "long" and "int" mean in C#. It's potentially quite confusing for someone coming from a C or C++ background as they mean different things there. i32 and u32 on the other hand are unambiguously 32 bits.

In C++ I've actually had code using a value like `42l` that works on Linux but not on Windows, because the sizes of those types aren't fixed.


> In C++ I've actually had code using a value like `42l` that works on Linux but not on Windows, because the sizes of those types aren't fixed.

if you want fixed size for literals you can use the "function macros for integer constants" :

    #include <cstdint>
    auto x = INT32_C(-456456435); // guaranteed at least 32-bit
    auto y = UINT64_C(4564565465432168576); // guaranteed at least 64-bit (the value must fit in 64 bits)

Those will enclose the constant in the proper literal suffix - ull on 32-bit windows and ul on linux for instance.


A valid point about knowing what the types mean, but even if you don't, it is at least immediately apparent which part is the type, and which part is the value.

I've only dabbled with rust, but I came across this very early on, and was baffled by the syntax. After further dabbling, I still can't see it and immediately know what the value is.


AFAIK, syntax is:

let variable_name [: type] = value[(i|u|f)bits];

In rust, 123l is written 123i64 or 123i32 (also 123_i32, or 1_2_3i32), depending on what 123l actually means.

I don't actually use rust, but it seems very clear and obvious to me (obvious once you know the syntax above).
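
A small sketch of those literal forms, in case it helps. The function name `suffix_demo` is invented, and `size_of_val` is just a way to observe which type each suffix selected:

```rust
use std::mem::size_of_val;

// Returns the byte sizes of three suffixed literals, and whether two spellings agree.
fn suffix_demo() -> (usize, usize, usize, bool) {
    let a = 123i64; // suffix directly after the digits
    let b = 123_i32; // underscore separating value from type
    let c = 1_2_3i32; // underscores are ignored inside the digits too
    let d = 1.5f32; // float suffixes work the same way
    (size_of_val(&a), size_of_val(&b), size_of_val(&d), b == c)
}

fn main() {
    println!("{:?}", suffix_demo());
}
```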


You're not wrong, but also not comprehensive: it's not just variable names in that position, but full-on patterns:

    let (a, b) = some_two_tuple;
will introduce two variables, a and b, as respective halves of the tuple.
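
To make that a bit more concrete, here is a sketch of a few other patterns that can sit in that position. The `Point` struct and `pattern_demo` function are made up for illustration:

```rust
// Hypothetical struct used only to show struct destructuring.
struct Point {
    x: i32,
    y: i32,
}

// Each `let` below binds its names through a pattern, not a single identifier.
fn pattern_demo() -> (i32, i32, i32, i32, i32) {
    let (a, b) = (1, 2); // tuple pattern
    let (c, (d, _)): (i32, (i32, &str)) = (3, (4, "ignored")); // nested, with a type
    let Point { x, y } = Point { x: 5, y: 6 }; // struct pattern
    let [first, ..] = [10, 20, 30]; // array pattern, rest ignored
    (a, b, c + d, x + y, first)
}

fn main() {
    println!("{:?}", pattern_demo());
}
```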


Meh, 123_i32 seems like a big improvement to me, but with the others, I just can't immediately grok it - it's having numeric digits as part of the type name that throws me.

I realise of course that not everyone will feel the same.


C99 has effectively identical types in the standard library (uint8_t ... uint64_t, and ditto for int8_t ... int64_t). There are very few modern C codebases where I haven't seen these used (personally I use them because it's much easier than remembering what is the minimum guaranteed size of unsigned long). The Rust ones have just slightly more terse names (which it's understandable to dislike, though I personally find the endless _t suffixes in C type names to be a bit annoying as well). And Rust's usize is basically uintptr_t.
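
A quick way to check this for yourself, as a rough sketch (`widths` and `usize_matches_pointer` are names invented for this example):

```rust
use std::mem::size_of;

// Byte widths of the fixed-size unsigned types, mirroring uint8_t..uint64_t.
fn widths() -> [usize; 4] {
    [size_of::<u8>(), size_of::<u16>(), size_of::<u32>(), size_of::<u64>()]
}

// usize is defined to be the same size as a pointer on the target platform.
fn usize_matches_pointer() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

fn main() {
    println!("{:?} {}", widths(), usize_matches_pointer());
}
```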


Sorry, I should have phrased that better; it's more like, having numeric digits as part of the type name that immediately follows the value throws me.

So something like this (rust):

`16i32`

Compared to something like this (C#):

`16i`



