This is the most useful introduction to a language I have ever read. Often language introductions produce a "Wall of Complexity", and my last two attempts at learning Rust failed because of that. This is just great.
Be warned, the "half hour" part is probably a bit like "99 cents" as a price tag. I've already spent more than that, but it is time well spent.
I also like Derek Banas videos (https://www.youtube.com/channel/UCwRXb5dUK4cvsHbx-rGzSgw) for this same quality. He doesn't do many "deep dives" into the languages, but he does a fantastic job getting your feet wet with very common tasks... much like these learnxinyminutes pages do.
I use this website as the most effective cheatsheet ever. Forget about PDFs and all these things people put together. Pretty much every programming language is on LearnXinYMinutes and it is like a standardized cheatsheet across all languages. Brilliant, so brilliant!
I failed to make it through the Rust book in (I think) 2017, and kind of hated it; I made it through easily in 2019 and enjoyed it... the book has improved that much. Oh, and the compiler error messages are better, too, which helps enormously.
Ahh, I remember a while back reading a blog where the author talked about `std::mem::drop` being their favorite standard library function because they used this effect. It is literally defined as:
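For reference, the standard library defines it as `pub fn drop<T>(_x: T) {}`. Below is a minimal sketch of why an empty body is enough. `my_drop` and `Noisy` are hypothetical names made up for the illustration; only the shape of `my_drop` mirrors the real function.

```rust
use std::cell::Cell;

// Hypothetical stand-in for std::mem::drop: it takes ownership and does nothing.
// The value is destroyed when `_x` goes out of scope at the closing brace.
fn my_drop<T>(_x: T) {}

// Hypothetical type that records when it has been dropped.
struct Noisy<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Returns the flag as seen before and after the call to `my_drop`.
fn demo() -> (bool, bool) {
    let flag = Cell::new(false);
    let noisy = Noisy { dropped: &flag };
    let before = flag.get();
    my_drop(noisy); // `noisy` is moved into the function and dropped there
    let after = flag.get();
    (before, after)
}

fn main() {
    let (before, after) = demo();
    println!("dropped before call: {}, after call: {}", before, after);
}
```

The function body never touches the value; moving it in is what ends its life.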
Right, so drop is useless :) We can drop things fine without it. Since of course it's scope/ownership based and not something that's explicitly called.
Yes, it's effectively like free -- though C++'s "auto" is probably a closer idea.
The main difference to C's free is that you simply cannot double-free a value (without using unsafe). That's because a value is only dropped when it falls out of scope -- and since values only have one owner (and references must not outlive values) there cannot be any dangling references after it's dropped.
The documentation for std::mem::drop is explaining that dropping a value is a property of the language and scoping rules -- thus there is nothing for the function to do other than take a value and let it go out of scope.
It’s more like delete in C++, because you can implement the Drop trait for a type to run code when it is dropped, for example to clean up resources.
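A small sketch of both points, implementing Drop to run cleanup code and not being able to drop twice. `Resource` and its log are hypothetical, invented for this example.

```rust
use std::cell::RefCell;

// Hypothetical resource that logs its own cleanup.
struct Resource<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Resource<'_> {
    fn drop(&mut self) {
        // Cleanup code runs here, whichever way the value ends its life.
        self.log.borrow_mut().push(format!("cleanup {}", self.name));
    }
}

fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _a = Resource { name: "a", log: &log };
        let b = Resource { name: "b", log: &log };
        drop(b); // explicit early drop: `b` is moved into std::mem::drop
        // drop(b); // would not compile: use of moved value `b`
        log.borrow_mut().push("end of scope".to_string());
    } // `_a` is dropped here, when it goes out of scope
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

The commented-out second `drop(b)` is rejected at compile time, which is the "no double free" guarantee described above.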
I think no-op is a bit misleading, and throwing away is more precise, since Rust has strict evaluation.
let _ = timeConsumingFunction();
Still needs to spend time performing timeConsumingFunction(). It might not move, but it still did the work and produced a result; you are just not using it (and hence there is no need to move it).
If you had lazy evaluation I would agree with your argument for the no-op terminology.
I think that's the same thing as in let bindings; just like you can bind to a name in the wild card case, you can also choose not to bind to it. The main difference is that not binding in the wild card case is more common than in `let` bindings.
The first one drops ownership over the variable right after `let`, and the second drops at the end of the scope where the `let` is contained. It was mentioned in an open Rust GitHub issue, but I can't seem to find it.
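The two forms being compared are not shown in the thread; a reasonable reading is `let _ = ...` versus `let _name = ...`. Under that assumption, here is a sketch that records the drop order. `Tracer` and `make` are hypothetical helpers.

```rust
use std::cell::RefCell;

// Hypothetical value that logs when it is dropped.
struct Tracer<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracer<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make<'a>(name: &'static str, log: &'a RefCell<Vec<String>>) -> Tracer<'a> {
    Tracer { name, log }
}

fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // `_` binds nothing, so the temporary is dropped at the end of this statement.
        let _ = make("wildcard", &log);
        // `_named` is a real binding, so the value lives until the end of the block.
        let _named = make("named", &log);
        log.borrow_mut().push("end of block".to_string());
    }
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

This matters in practice for guard values such as lock guards: binding one to `_` releases it immediately, while binding it to `_guard` holds it for the rest of the scope.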
Nothing related to the tutorial itself, but I've seen many basic Rust tutorials recently, and this sorta reminds me of Haskell - especially its Monad.
The Haskell community once was flooded with Monad tutorials[1]. People kept trying to put meanings on this mathematical construct, and had come up with tons of different ways to describe it. The problem is, this simple thing can fit into so many different places, so people couldn't possibly run out of new ideas. This oversaturation only confused newcomers even more, so the community concluded not to write any more Monad tutorials.
While individual tutorials enrich the community in general, it's more crucial to have a few good pieces of documentation that kill the need for more tutorials (e.g. MDN + W3Schools), so that efforts can be redirected into more productive things. The Rustonomicon is one good piece of documentation, but it always falls a few steps short in practice. That's probably why more tutorials are being written, and why they get upvoted well on both HN and Reddit.
None of them. The concept is fundamentally flawed. It's as if everything you ever read about C was all about bitwise manipulation operations, on and on and on about bitwise manipulation, to the point not-C programmers think the language is primarily about bitwise manipulation and people start porting bizarre misunderstandings of bitwise manipulation into other languages and claiming they're just like C now, when it's just a part of the language that you'll pick up over time. Not a perfect metaphor but close enough.
However, the c community never felt like going on and on and on about bitwise manipulation... Anyone have any idea why people get so hung up about explaining monads?
When I was learning C 20 years ago, it felt like the C tutorials were really hung up on how scary pointers were, in a very similar way to Haskell's fixation on monads. It got to the point that I questioned my understanding of pointers because I didn't get why everyone spent so much effort on explaining them.
Interesting, maybe the Sapir Whorf hypothesis is much more relevant for programming languages than for natural languages; maybe the language shapes the things a human can comfortably express, and this shapes the communities
I think there's a good chunk of it that's just nerd-sniping.
Some of it was just a sort of stand-alone complex, too; people started writing monad tutorials because people writing monad tutorials was the thing to do. In the Go community, there was a run for a couple of years of people making stupidly overoptimized HTTP routers, even though there are very few websites where that's actually the problem and even fewer where the answer was to create a fancy router. Why? Because other people were making stupidly overoptimized routers. It was a smaller instance of the same thing and ultimately damaged the community less than I think the overfocus on monad tutorials did for Haskell, but it was not the best thing for Go.
Monad tutorials did have a particular problem, though, which is that rather a lot of them were wrong, too. I made my own list of issues here: http://www.jerf.org/iri/post/2928 That particular post is in the context of people trying to implement "monads" out of Haskell, but the misconceptions I list tended to come from the lower quality Haskell tutorials in the first place. Then, as is the way of things, the wrongness spread around the world before the correction had its boots on, as the saying goes.
You can write useful C programs without ever using bitwise manipulation. It's quite hard in Haskell without monads since the standard I/O library is a monad. You are limited only to evaluating values with REPL unless you use a monad. And, because the I/O is a monad, you cannot just say "use this function to read input and this other function to print output" so you need to explain the whole monad situation early into the language study.
Very. You just read the type signatures to see when you need to use the monad version of functions or syntax, and just get on with it. The whole system will naturally guide you anyhow. After some practical experience, you'll have enough understanding to use it perfectly fluidly. If you then feel like coming back around and reading one of the better explanations, you'll get the theoretical side better, and may be able to write your own if you are so inclined, but, you are reasonably likely to never want or need to.
I remember seeing someone write that they could tell whenever a new chapter of "Learn You a Haskell for Great Good"[0] came out, because questions on that topic would just disappear from the discussion lists!
The way I like to think about the expression / statement divide in Rust is that everything is an expression, and the semicolon is an operator akin to `ignore` in F#; it takes any operands, disposes of them, and returns the unit type `()`.
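A short sketch of that mental model; `three`, `with_value` and `with_unit` are hypothetical names. A block whose last expression has no semicolon evaluates to that expression, while a trailing semicolon discards the value and leaves `()`.

```rust
fn three() -> i32 {
    3
}

fn with_value() -> i32 {
    // No semicolon after `three()`: the block evaluates to 3.
    let x = { three() };
    x
}

fn with_unit() -> () {
    // The semicolon throws the 3 away: the block evaluates to the unit type `()`.
    let y = { three(); };
    y
}

fn main() {
    println!("{:?} {:?}", with_value(), with_unit());
}
```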
Rust in general is just such a pleasant language. It can be a bit tedious to work with sometimes, but that's usually because the problem in question is tedious in and of itself, and you didn't even notice all of the little screw-ups that could occur until you saw how Rust dealt with it.
It's only half an hour if you are familiar with these concepts (generics, traits, pattern matching, error propagation, etc...). Some concepts are exclusive to Rust (lifetimes/ownership), so I doubt you can get it in 30 minutes if you have never written code in Rust before.
I’ve seen this guy’s posts a few times lately, he needs to learn to edit things down. What he’s saying is good, but it is extremely digressive, and the digressions never seem to return to the original point. I’m not the audience for this so I don’t have that much time to spare reading it, but I can’t imagine that style working well for an introduction to a language.
I respectfully disagree but - can't make everybody happy.
There's plenty of introductory articles to Rust in the style you desire :)
edit:
To expand a little bit, the digressions aren't just about me "getting distracted" while writing - they're very much on purpose. I always try and write pieces that expand in many different directions, because there's so much to discover, always.
Some folks come to an article wondering why Rust has two string types and end up spending an hour on Wikipedia reading about legacy code pages - and I think that's great.
My way of getting people interested in something is never a top-down, present-the-bare-minimum way, it's always about showing how what we're discussing is connected to a lot of other things, which are also fascinating and that you should check out if you want!
I think I'm talking more about your Mr Golang piece, but now that I have your ear, I guess may as well. I am less concerned with the existence of digressions than the way they're introduced. When I say needs editing, I mean that because I cannot navigate the piece properly, it takes longer to understand what you're getting at if I want to, so it seems like it's too long. Editing down is only one solution to that.
An example of this is when I was really confused reading about path extensions and non-UTF8 paths. I was promised by Cool Bear that you had a point to make using an example about a stat call, but you were no longer talking about stat, and yet you were still writing as if it was central to your stat discussion. So instead of expanding, I thought you just couldn't express what was wrong with the stat call. I nearly gave up waiting! And I was surprised that, in the end, you could (and put it quite well). Unfortunately, there are thousands of thinkpieces out there that never make the point they promise to, instead just dumping information at you and hoping it hits you like it hit them. So I am trained to close the tab when that is happening.
It might help to state why you're going to digress before you do, with a promise to return, so that people who are hooked can be confident you'll eventually make your point (and may skip back and forth to digest your argument again without interruption). If nothing else, you are making a promise to yourself to structure things in a way that is friendly to the reader.
Rather than making the piece less exploratory, adding signposting helps get the reader into the mindset you describe in the last sentence there. Otherwise, they are not sure whether to be interpreting what you're saying as a core argument or some side discussion. It hurts both the exploratory and the argumentative qualities of the writing if one is confused for the other. Without making it clear, the argument runs like my first para above: unnecessarily long-winded and questionably relevant, with no exploratory levity. With good signposting, you get to nail both. It's about two sentences' and two headings' difference, maybe a little shuffling around.
Hope that helps, looking forward to your next one.
The Golang piece is definitely subpar, it was quickly thrown together in half a day, didn't expect it to be spread so widely haha.
For other articles there's usually a few days of research, and 1.5 days of writing and editing, then a lot of touch-ups in the following weeks/months responding to feedback.
Thanks for the more detailed criticism though - I agree with the general sentiment, and haven't solved the navigation problem yet!
One of the fun things about Rust is that any time you think "man, I wish there was some syntax to do this thing in Rust", you can make a macro to make it a reality; usually one already exists.
You shouldn't be writing literal HashMaps in your code very often anyway; you should use a function that pattern matches on a string for the day-of-week-to-number example (returning Option of course). Most other cases where you might want literal HashMaps are well-served by either pattern matching functions or structs. The point of a HashMap is that it has keys that you don't know at compile-time; if you use a HashMap literal, that's basically saying that you do know the keys at compile time.
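A minimal sketch of the suggested alternative for the day-of-week example. The function name `day_number` and the Monday-is-1 numbering are assumptions made for illustration.

```rust
// Hypothetical lookup: the keys are known at compile time, so a `match` does the job.
fn day_number(day: &str) -> Option<u32> {
    match day {
        "Monday" => Some(1),
        "Tuesday" => Some(2),
        "Wednesday" => Some(3),
        "Thursday" => Some(4),
        "Friday" => Some(5),
        "Saturday" => Some(6),
        "Sunday" => Some(7),
        _ => None, // unknown input is reported instead of panicking
    }
}

fn main() {
    println!("{:?} {:?}", day_number("Friday"), day_number("Someday"));
}
```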
I assume it must be a common finding. I too have wanted to initialize a map value with a literal, only to find that one needs to juggle around their absence. Very weird thing to miss in such a great language, really.
I guess that the syntax proposal would get too complicated with all the different possibilities (like what happens if you want to store one ref, now you need to add lifetimes and such), but overall for the novice it just looks like a strange thing to not have.
you can so easily make a macro in rust to do exactly what you want, that it doesn't likely make sense to include it as part of the language. though perhaps the macro should be in the core libs.
Yeah, probably such a macro for maps should be provided in the core libs. It would be more consistent with providing the equivalent one for vecs. From a user (of the language) perspective, having one but not the other is just confusing.
Also writing macros is not the first thing one learns when getting to know Rust, so a learner won't know how to write her own.
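For readers who have not written a macro yet, here is roughly what such a map literal could look like. The macro name `hashmap!` and the `=>` separator are choices made for this sketch, not something the standard library provides.

```rust
use std::collections::HashMap;

// Hypothetical map-literal macro, in the spirit of `vec![]`.
macro_rules! hashmap {
    // Zero or more `key => value` pairs, with an optional trailing comma.
    ($($key:expr => $value:expr),* $(,)?) => {{
        let mut map = ::std::collections::HashMap::new();
        $( map.insert($key, $value); )*
        map
    }};
}

fn weekdays() -> HashMap<&'static str, u32> {
    hashmap! {
        "Monday" => 1,
        "Tuesday" => 2,
        "Wednesday" => 3,
    }
}

fn main() {
    println!("{:?}", weekdays().get("Tuesday"));
}
```

More recent versions of the standard library also accept `HashMap::from([("Monday", 1), ("Tuesday", 2)])`, which covers many of the same cases without a macro.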
I've been perusing a bit of Rust code for the past 2 years, reading an article here and there... always heard it was a complex, low-level, next-generation C language, comparing it to Go etc... but never wanted to learn it... This article is _so_ simple in its delivery. I really got excited by the 'beauty' presented in each simple chunk.
This is so good. I would love this kind of "learning" for every language. It's definitely not beginner friendly but for anyone that already knows a language this is a super introduction and quick start, maybe even tipping point to start using the language.
Only remark: If I had anchor tags at specific points creating a reference would be easy, this is referenceable material.
Great job though, I am amazed at the quality of this. Hell yeah!
I am really happy I found this blog post. If you like this one you should check out some of his other posts. He has some amazingly insightful other posts on Rust including some very in depth topics. Also, a great comparison of Rust and Go. This is also why I love Hacker News. I would have never found these posts if this one had not made it to page one.
I want off Mr. Golang's Wild Ride [1] is a great read. The part about how different OS file permissions are handled in Go vs Rust is great. Even though I don't know Rust, I thought the Rust approach looked easiest since it accurately represents the underlying systems. It was really surprising to see that Go's motivation in glossing over the complexity is to keep things simple. Is a half-working implementation really simpler?
I want books written like this. I've had too many books where I die of boredom over and over again in the first 150 pages and then I never read them again. It's like there's some rule saying books must be super wordy.
That rule is usually called the publication contract, and it usually literally says how big the work needs to be, even if you can say it better in fewer words/pages/chapters/etc.
TIL. This seems to be a waste of paper and time for consumers.
What incentive do publishers have to prefer the wordy version over the terse one? Is it because they are afraid it will be perceived as not substantial enough to be a book?
Holdover from another era. Pre-internet, we had more time to spend w/ books, and it was harder to gain context to make things make sense -- a little hand-holding and digression was more appropriate then.
Thanks for sharing. I enjoyed this. Could you elaborate on your closure example though? This was the only one I didn't understand. The explanation is a bit terse = "Closures are just variables with a function type with some captured context." Are these anonymous functions? What is calling the anonymous function if so in main? Cheers.
I did enjoy the article. It was quite clear but it made me wish for a bit more. Unlike the Rust article I didn't see any thing that suggested how golang might be special. I did see your suggestions for further reading though. I'll check them out.
Thanks for feedback. I think the “special” part for me was that there were large sections of the Rust blog that I could skip having to explain, because they aren’t a part of the Go language. To me, Go is a language that’s much easier to learn and be productive with. Maybe I should make it more clear what I skipped.
The problem with "long" "unsigned" "short" "long long", etc. in the C world is that they all can mean different things depending on the architecture and compiler. i32, u64, f32, etc. make it very explicit what sized object you're working with. I'm not sure that's hugely applicable in C# land as there's only, what, two compilers and two? three? hardware architectures supported.
I agree that something like 16i32 is difficult to read which is why I typically use an underscore to separate the number from the type if I need to write a literal that way (e.g. sometimes I find 0_f64 easier to grok than 0.0).
One neat thing Rust lets you do is put underscores in number literals. It's great for readability:
0xFFFF_FFFF // easier to count than 8 Fs
And you can use that with type suffixes:
let a = 255_u8;
Although in that case, you'd probably either omit u8 entirely, letting the compiler infer the type of 'a' from usage, or use a colon to give it an explicit type.
Type suffixes are especially useful for arguments to generic functions:
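One way to see this, using a hypothetical generic helper `byte_size`: with a generic parameter there is nothing else for the compiler to infer the literal's type from, so the suffix decides it, and an unsuffixed literal falls back to `i32` or `f64`.

```rust
use std::mem::size_of_val;

// Hypothetical generic function: T is determined only by the argument.
fn byte_size<T>(value: T) -> usize {
    size_of_val(&value)
}

fn main() {
    println!("{}", byte_size(255_u8));            // suffix selects u8
    println!("{}", byte_size(255));               // no suffix: integer literals default to i32
    println!("{}", byte_size(0xFFFF_FFFF_u64));   // underscores and a suffix together
    println!("{}", byte_size(1_f32));             // suffix selects f32
    println!("{}", byte_size(1.0));               // no suffix: float literals default to f64
}
```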
You can do the same thing with C#, which is great for groupings, e.g. "2000000" becomes "2_000_000".
As a rust noob, I didn't realise rust supported the same syntax, or that you could use it to separate the type from the value (which in rust's case, I find really helps readability).
The type signature of the function will also enforce that.
You can also use generics to allow you to accept a wider variety of numbers, e.g.:
fn takes_a_float_like_number<T>(number: T) where T: Into<f64> {
    // `number` is now a 64-bit IEEE 754 floating-point value
    // for the rest of this scope
    let number: f64 = number.into();
}
This will only compile if the argument's type converts losslessly into an f64. So takes_a_float_like_number(500_u32) will compile but takes_a_float_like_number(500_i64) will not. The error message will be a wee bit obtuse, as it will be something like "trait Into<f64> not implemented for i64".
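A runnable variant of the same idea, as a sketch. `as_f64` is a hypothetical name; it returns the converted value so the effect is visible.

```rust
// Hypothetical helper: accepts any type with a lossless conversion into f64.
fn as_f64<T>(number: T) -> f64
where
    T: Into<f64>,
{
    number.into()
}

fn main() {
    println!("{}", as_f64(500_u32)); // u32 always fits in an f64
    println!("{}", as_f64(1.5_f32)); // f32 widens to f64
    println!("{}", as_f64(-7_i16));  // small signed integers are fine too

    // as_f64(500_i64); // does not compile: there is no `From<i64>` impl for f64

    // When a possibly lossy conversion is acceptable, an explicit cast works instead.
    let lossy = 500_i64 as f64;
    println!("{}", lossy);
}
```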
"Literal {} is too large for type {}" (or thereabouts) is how this is handled in C/C++ compilers. I think it is enabled by -Wall or -Wpedantic in clang.
In other languages I've seen something like "type {} can't represent literal of value {}" which is a bit more generic and applies to things like floats/signed ints and values that are too small/negative.
Because you're ---unpacking--- destructuring a Vec2.
The most common form of this idiom I've seen is:
if let Some(x) = foo() {
    println!("{:?}", x);
}
where fn foo() -> Option<_>, i.e. foo returns either Some(_) or None. (if let is used to account for the possibility of None; it's technically different syntax to let, but very similar.)
Here, x isn't a Some (a variant of Option). It is, however, representing a value inside a Some, which we want to get out. Likewise with your example; we want to destructure from a Vec2, so we specify a Vec2 (with identifiers instead of data) on the left hand side and the data on the right hand side, and it takes the data out and binds it to the identifiers.
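Both patterns side by side, as a sketch. The `Vec2` definition here (two `f64` fields named `x` and `y`) is an assumption about the struct under discussion, and the function names are made up.

```rust
// Assumed shape of the struct being discussed.
struct Vec2 {
    x: f64,
    y: f64,
}

fn components(v: Vec2) -> (f64, f64) {
    // Pattern on the left, data on the right: `x` and `y` are bound to the fields of `v`.
    let Vec2 { x, y } = v;
    (x, y)
}

fn only_x(v: Vec2) -> f64 {
    // `..` ignores the remaining fields.
    let Vec2 { x, .. } = v;
    x
}

fn value_or_zero(maybe: Option<f64>) -> f64 {
    // Same idea with an enum variant; `if let` covers the case where the pattern does not match.
    if let Some(x) = maybe {
        x
    } else {
        0.0
    }
}

fn main() {
    println!("{:?}", components(Vec2 { x: 1.0, y: 2.0 }));
    println!("{}", only_x(Vec2 { x: 3.0, y: 4.0 }));
    println!("{}", value_or_zero(None));
}
```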
This is fantastic! Using it, I was finally able to write my favorite toy-problem in Rust (scoring a boggle board). Rust solution came out 25% faster than my (I thought) highly-optimized C++ solution, wow...
I've been writing Rust for about 2 years, I read this and learned all kinds of stuff I still did not know. Fantastic job on this! I am now a bit upset that everything isn't taught this well :o
It’s not about shortening things, it’s actually about semantics! Choosing to make something have a type parameter vs an associated type depends on what you’re trying to accomplish.
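A sketch of the semantic difference, with hypothetical traits and types. A trait with a type parameter can be implemented many times for the same type, once per parameter. A trait with an associated type can be implemented only once per type, and that single implementation fixes the associated type.

```rust
// Type parameter: `Meters` may implement this once for every `T` it supports.
trait ConvertFrom<T> {
    fn convert_from(value: T) -> Self;
}

struct Meters(f64);

impl ConvertFrom<f64> for Meters {
    fn convert_from(value: f64) -> Self {
        Meters(value)
    }
}

impl ConvertFrom<i32> for Meters {
    fn convert_from(value: i32) -> Self {
        Meters(f64::from(value))
    }
}

// Associated type: one implementation per type, and it decides what `Item` is.
trait Container {
    type Item;
    fn first(&self) -> Option<Self::Item>;
}

struct Words(Vec<String>);

impl Container for Words {
    type Item = String;
    fn first(&self) -> Option<String> {
        self.0.first().cloned()
    }
}

fn main() {
    let a = Meters::convert_from(1.5_f64);
    let b = Meters::convert_from(2_i32);
    let w = Words(vec!["hello".to_string()]);
    println!("{} {} {:?}", a.0, b.0, w.first());
}
```

The standard library follows the same split: `From<T>` takes a type parameter, while `Iterator` uses the associated type `Item`.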
I found this extremely helpful. I’ve always been interested in Rust but never got around to it cause most of the source code I’ve read on GitHub was overwhelming. This is a nice walkthrough to get someone like me interested. Great work!
For a clickbaity title, it's surprisingly useful. Of course, the time to read it might take half an hour, but the time to really absorb it might take two weeks.
Very nice! First impression is that it is very similar to Kotlin, which I love working in. Seems modern languages are converging on a common set of niceties.
I think for a lot of use cases, the safety and security and whatnots you get from Rust just aren't required and people already have a large time investment in Python/Go/Java/etc.
Certainly I can do pretty much everything I need to with C/Python/Go because I accept the tradeoffs and know enough to work around the safety/security issues.
(But Rust is on my list of things to get around to in 2020.)
I worked with C for nearly 20 years, and it is everything but simple. It is complicated, not "low level" as many think, full of weird edge cases, compilers will happily compile non-standard code, fragile (hard to refactor), full of implicit conversions you didn't expect...
Rust has a good design; it still has a few rough edges but the syntax is great and improving.
One can make the argument that the C declaration syntax is simple, because the rule to make a declaration is to simply follow a type name by an expression where the declared variable is used. The fact that you write "int[4] arr" shows that you don't know how it works (which is not a criticism; it's just not well known how it works).
The correct way is to write
int arr[4];
and to interpret it as "arr[4] is an int" (which is only a slight lie because arr[4] is undefined if arr is a 4-element array).
How do you declare an array of pointers? Again, you write
int *arr[4]; // array of 4 pointers to ints
because in C expression syntax, "* arr[4]" means to index into the array first and then to dereference. If that is an int, it means that arr[4] is a pointer to an int, and consequently arr is an array of pointers to ints.
If you want a pointer to an array of ints instead, do this
int (*arr)[4]; // pointer to an array of 4 ints
again, because that's how regular C expressions work. Functions are (mostly) not an exception:
int myfunc(int x);
int (*myptr)(int x);
which is to say that "myfunc(x) is an int" and "(* myptr)(x) is an int", i.e. myptr is a pointer to a function that takes an int and returns an int.
Note that in the beginning (i.e. K&R C, pre-1989) the way to declare functions was consistent: Declarations had to be
int myfunc(x);
i.e. there were no types in the argument lists. The types in the argument list appeared, I believe, after Stroustrup added them to C++ in order to improve type-safety.
> the rule to make a declaration is to simply follow a type name by an expression where the declared variable is used.
Not really. You can't just use _an_ expression, you have to use a specific expression. For example, * ppX is a perfectly valid expression for a pointer to a pointer named ppX, as in:
int **ppX;
if (*ppX == NULL)
So you need to use an expression where the declared variable is used that results in a non-pointer type. And then, there's actually more to it... only certain types of expressions are valid. E.g. this isn't a valid declaration for a pointer, even though it's valid as an expression:
int pX->;
I find that basically anytime somebody tells me that C rules are to "simply [...]", they've inevitably ignored a whole bunch of cases. Your post is no exception.
> So you need to use an expression where the declared variable is used that results in a non-pointer type.
I think you are wrong here, you can't even make something other than what "results in a non-pointer type". Because by definition, you're making the type to the left with the expression. (That could still be a pointer type if it is a typedef'ed type; such as "typedef int * intptr; intptr x;" but I don't think you meant that by "pointer type" [0]).
And your first example is perfectly syntactically valid (other than missing the conditional statement that must follow the if-condition).
And no, "pX->" is not a valid expression. Was that a typo?
And yes, only a subset of expressions are valid. Basically, the expressions that you can form by applying subscripts, (x[3]), dereferences (* x), and function calls. Because, a declaration like "int x + 3;" or even "int x + y;" just doesn't make sense. I don't see a problem there.
Btw. I'm not saying that C as by the current standards is super straightforward and pure. It's definitely not, and C does actually have a lot of historical baggage that makes our lives a little harder. I'm just explaining the underlying unifying principle, which IMHO is actually nice. And honestly it seems you, too, are still confused because there is just a lack of clear explanations about C declarations. That principle should be much more well-known, and almost all problems that novices have with declarations are unnecessary frustration that they wouldn't have if someone would have told them the trick.
[0] By the way, typedef is another thing that seems to be super obscure, while it is extremely simple: It's just a keyword that modifies declarations to declare an alias for that type, instead of a (named) variable of that type.
> The correct way is to write
> int arr[4];
> and to interpret it as "arr[4] is an int"
No. The "interpretation" is "arr is an array of ints", and that is its type.
The complexity of this declaration is evidenced by the fact that you spend another page of text "randomly" adding characters and delimiters around variable declarations to change its type:
int arr[4] // array of ints
int *arr[4] // array of pointers to int
int (*arr)[4] // pointer to array of ints
These are all changes to type. Yet, instead of changing the type declaration, a bunch of stuff is added all around the variable. And you have to come up with ridiculous explanations like "arr[4] is an int which makes arr an array of ints".
That's exactly why most languages said: "if we're changing the type, we're going to reflect this in the type". In a better world the examples above would be something like
int[4] arr; // array of ints
*int[4] arr; // array of pointers to int
*(int[]) arr; // pointer to an array of ints
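Since the thread is about Rust, here is how it spells the same three shapes, plus a function pointer, with everything carried by the type. This is only a sketch: it uses references where the C examples use raw pointers, and the names are made up.

```rust
fn double(x: i32) -> i32 {
    x * 2
}

fn demo() -> (i32, i32, i32, i32) {
    let a = 1;
    let b = 2;

    let arr: [i32; 4] = [10, 20, 30, 40]; // array of 4 ints
    let refs: [&i32; 2] = [&a, &b];       // array of 2 references to ints
    let arr_ref: &[i32; 4] = &arr;        // reference to an array of 4 ints
    let func: fn(i32) -> i32 = double;    // pointer to a function taking and returning an int

    // Indexing binds tighter than `*`, so `*refs[1]` indexes first and then dereferences.
    (arr[3], *refs[1], arr_ref[0], func(21))
}

fn main() {
    println!("{:?}", demo());
}
```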
> The correct way is to write
> int arr[4];
> and to interpret it as "arr[4] is an int"
>> No. The "interpretation" is "arr is an array of ints", and that is its type.
I would have been happier if I had found my explanation interpreted in a more generous way. But that's basically what I was saying (and literally what I was saying in another comment).
As to the rest, the advantage of the C approach to type declarations is that there is no type declaration syntax. Just expression syntax. And that it's very terse.
> That's exactly why most languages said: "if we're changing the type, we're going to reflect this in the type". In a better world the examples above would be something like
There's a problem in that your proposed syntax is not even properly parseable. How would a parser recognize that your lines start with types, i.e. are variable declarations? Take the example "* (int[]) arr": the parser would start reading the asterisk and the opening parenthesis as an expression, then suddenly find a type name (int). At that point it could not simply report an error; it would have to start all over again and try to parse the whole thing as a type declaration. That's not exactly nice - good syntax is parseable with a single token of lookahead. That not only makes parser implementations easier, but is also easier to read for humans and leads to better error detection.
Apart from that I think that your examples are about what D does, and this stuff is WORSE in my opinion. While the real problem with C declarations, which is the need to thread a symbol table through the lexer/parser, is still existent in D syntax (I believe), it introduces other problems:
How do you use an array that was declared as "int[5][10] arr"? Using it as "arr[4][9]" is an error: it must be "arr[9][4]". In other words, your approach to type declarations requires the programmer to constantly turn around declarations in his/her mind, which leads to lots of mistakes. It gets even harder when you add pointers / functions, for example "int* [5][10] arr" I believe you must access as "* arr[9][4]", or whatever the D dereference syntax is.
Java can afford to let you declare "int[][] arr = new int[5][10]" and let you access "arr[4][9]", at the cost of cheating. Java can "turn around" the dimensions because it doesn't actually have an "algebraic" type syntax, which it doesn't need because it doesn't have pointers / function pointers so there is no interaction there.
That's one of the reasons why most newer languages have the type to the right of the variable name, and types grow to the left (towards the variable name). For example, "let arr: [5][10]int" you can access as "arr[5][10]" which is easier, but that principled approach to syntactic construction of types also puts requirements on the expression syntax: For example, "let arr: [5][10]* int" would have to be accessed as "* arr[5][10]", which is weird - or the expression syntax must be changed to use a postfix dereference operator.
In short, it's not as easy as you thought, and the C syntax is in fact pretty smart. And from a practical standpoint I prefer the C way very much because it's so much terser and has less punctuation than all the alternatives. The only thing that annoys me is the lexer hack.
http://c-faq.com/decl/spiral.anderson.html Is the rule I learned to understand C declarations and while the rule there is described as simple I think the examples even without argument types are actually fairly complex.
It also seems telling that no recent language has followed C’s example for declaration style, which is more implicit than explicit.
The "spiral rule" doesn't get at the heart of declarations. It's just by some guy that tried to figure it out on his own, and what he discovered was basically not declarations but the precedence rules of C expressions ;-)
> It also seems telling that no recent language has followed C’s example for declaration style, which is more implicit than explicit.
Actually most languages don't let the user do what C declarations let you do. For example, in Java (almost) everything is an object, and you can't just create a triple-indirected pointer. So, these languages can afford a declaration syntax that is less potent.
And then there are other more systems-oriented languages that chose to not copy C declarations. They come with their own gotchas. As examples I will pick D and Rust.
In D, you create a multi-dimension array like this: int[5][10] arr; Leading you to believe that you can use it as arr[4][9]; Wrong. That's an out-of-bounds error. You need to write arr[9][4]. Now, was that totally not confusing? The alternative is to expand these types systematically to the left, i.e. write [10][5]int, and maybe move the type to the right of the variable name, as in "let arr [10][5]int;". Honestly I don't like that either.
I've never really used Rust (either), but its downside, in my opinion, is that it has much more distracting syntax / punctuation.
I would love if there was a uniformly better way to declare things than the C way, but I still think C has the best tradeoffs for my practical work. The next time that I toy with language design I might try to simply go with C declarations, prefixed with a sigil or "let" or something, to remove the need for the lexer hack.
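For what it's worth, Rust's nested array types appear to nest the same way as the D example above: the innermost element type is written first and the outer length last, so the indices come in the opposite order from the lengths. A minimal sketch (the function names `dims` and `last_cell` are made up for illustration):

```rust
// `[[i32; 5]; 10]` reads as "10 arrays, each holding 5 i32s".
fn dims() -> (usize, usize) {
    let arr = [[0i32; 5]; 10];
    (arr.len(), arr[0].len()) // outer length, inner length
}

fn last_cell() -> i32 {
    let mut arr: [[i32; 5]; 10] = [[0; 5]; 10];
    // The outer index (0..10) comes first, the inner index (0..5) second.
    arr[9][4] = 7;
    arr[9][4]
}

fn main() {
    println!("{:?}", dims()); // (10, 5)
    println!("{}", last_cell()); // 7
}
```

So the "which index goes first" gotcha is not unique to D; it follows from building the type outward from the element type.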
The C approach makes the compiler much more complex, and introduces extra typing in other language constructs (like parens around if statements). This is why many newer languages do something more like the Rust way. Overall it is simpler for programmer and compiler.
It's unfortunate that the simple (and easy) underlying principles are not well known. See my other comment.
The bigger reason why recent languages have different declaration syntax is to avoid the need to carry a symbol table during parsing, and to avoid the need to parse all files serially instead of independently. Because to recognize a declaration the parser has to know which words correspond to types in the current scope.
I don't think parentheses around if-conditions are related to C variable declarations (if that's what you were saying), and I think it's fair to say that C's syntactical terseness is unmatched.
The parentheses are required to separate the condition from the following conditional statement.
How about `let x = 42` and let type inference do the work? I used to do a lot of C-style variable declarations in other languages, but I warmed up to Rust's really fast because most of the time I don't need to explicitly name the type.
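A small sketch of what that looks like in practice, assuming nothing beyond the standard language (the function name `inferred_sum` is made up for illustration):

```rust
fn inferred_sum() -> u64 {
    let a = 42;     // no annotation: the type is inferred from how `a` is used below
    let b: u64 = 1; // explicit annotation, written after the name
    let c = 7u64;   // or a type suffix on the literal
    a + b + c       // adding `a` to a u64 makes the compiler infer `a: u64`
}

fn main() {
    println!("{}", inferred_sum()); // 50
}
```

If nothing constrains an integer literal at all, it falls back to i32.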
Whether a type should come before or after the name is a fairly subjective matter. I believe that after is generally superior, especially when the type is optional.
But there’s a very practical reason for requiring the `let` token: it makes parsing very much easier. With `let`, you can keep an LL(1) grammar, because seeing `let` tells you to next parse a pattern, then if there’s a colon a type after that. But if you don’t put something in there, you get a genuinely intractable problem once type grammar is not trivial: sure, `int foo;` is simple and obvious, but what about `A<B, C> d;`? Should that be parsed as a declaration, or as an expression (respaced, `A < B, C > d;`)? Some languages have not resolved this style of parsing ambiguity at all, and figure it out at runtime, based on what else they find (Perl is infamous for this). I think others only kind-of resolve it, by looking at what symbols are present at compile time, to decide what was meant. Others just declare that such ambiguities are parsed one way, and you can rewrite your code (e.g. add parentheses) if you want to mean the other. Still others have resolved it otherwise, by other more subtle syntactic means, so that even if you need arbitrary look-ahead while parsing, there’s not quite any overlap between the two syntaxes (e.g. don’t support commas in this way as a kind of alternative to semicolon within expressions; or use proper matched delimiters like [] or () for generics).
Rust chooses to make parsing simple, which benefits humans as well as machines, reducing cognitive requirements in reading code.
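To make the `A<B, C> d` point concrete, here is a sketch of how this plays out in Rust as far as I understand it: inside an expression `<` is always parsed as a comparison, and generic arguments have to be introduced with `::<...>` (the "turbofish"). The function name `turbofish_demo` is made up for illustration.

```rust
fn turbofish_demo() -> (usize, u32, (bool, bool)) {
    // In expression position, generic arguments need `::<...>`.
    let v = Vec::<u8>::new();
    let n = "42".parse::<u32>().unwrap();

    // Without `::`, `<` and `>` are plain comparison operators, so this is
    // simply a tuple of two booleans, never a generic instantiation.
    let (a, b, c, d) = (1, 2, 3, 4);
    let pair = (a < b, c > d);

    (v.len(), n, pair)
}

fn main() {
    println!("{:?}", turbofish_demo()); // (0, 42, (true, false))
}
```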
Furthermore, in Rust what follows `let` is not an identifier or identifier list, but rather a pattern. Imagine the following contrived example:
let x = [[0]];
type x = [u32; 1];
let a = 0;
let [a]: x = [1];
That falls over completely if you put the type first: `x [a] = [1];`—does that define a new binding a with value 1, or does it set x[0] to [1]?
And finally, as I mentioned, the type is optional, and not commonly required, so you end up with something like C++’s `auto` keyword, which is basically `let` but spelled worse (and with worse semantics).
The end result is that for Rust specifically, what you desire is quite unsuitable, and what it has works very well, and its reasons for doing things that way are well worth considering.
C is the opposite of simple. If you want to create a binding in C, you need to learn multiple, unnecessarily complex rules. Sure, creating a binding to an int is easy:
int foo = 42;
but doing the same thing for pointers to arrays or function pointers is not:
void (*foo)(int) = bar; // function pointer
int (*foo)[N] = baz; // pointer to array
OTOH in Rust you just need to learn one rule: bindings are created with the grammar "PATTERN [: TYPE]" ([] means the ": TYPE" is optional).
That's the only rule you need to know: (1) it works consistently everywhere in the language (let, match, for, if-let, function arguments, while-let...), (2) it lets you create bindings, and (3) it gives you pattern matching and destructuring for free. For example,
let x: i32 = 42;
let y: fn(i32) = foo;
let z: *const [i32] = bar;
but also:
struct Entity { id: i32, vel: (f32, f32) }
let e: Entity; // given
let Entity { vel, .. } = e; // vel is bound to e.vel
let Entity { vel: (v_x, ..), .. } = e; // v_x is bound to e.vel.0
match e {
    Entity { vel: (v_x, ..), .. } if v_x > 0.0 => {
        // do something if e.vel.0 > 0
    }
    _ => { /* do something else otherwise */ }
}
etc.
That's IMO the definition of efficiency: one simple rule, that has no exceptions, works everywhere, and lets you do a lot.
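To illustrate the "works everywhere" claim, here is a hedged sketch that uses the same pattern syntax in a function parameter, a `for` loop, an `if let` and a `while let`. The `Entity` struct is the one from the comment above; the helper names `speed_x` and `demo` and all the values are made up.

```rust
struct Entity { id: i32, vel: (f32, f32) }

// Pattern in a function parameter: destructure right in the signature.
fn speed_x(&Entity { vel: (v_x, _), .. }: &Entity) -> f32 {
    v_x
}

fn demo() -> (f32, i32, i32, i32) {
    let entities = vec![
        Entity { id: 1, vel: (1.5, 0.0) },
        Entity { id: 2, vel: (-2.0, 3.0) },
    ];

    // Pattern in a `for` loop.
    let mut id_sum = 0;
    for Entity { id, .. } in &entities {
        id_sum += *id;
    }

    // Pattern in `if let`.
    let mut first_id = -1;
    if let Some(Entity { id, .. }) = entities.first() {
        first_id = *id;
    }

    // Pattern in `while let`.
    let mut stack = vec![(1, 2), (3, 4)];
    let mut total = 0;
    while let Some((a, b)) = stack.pop() {
        total += a * b;
    }

    (speed_x(&entities[0]), id_sum, first_id, total)
}

fn main() {
    println!("{:?}", demo()); // (1.5, 3, 1, 14)
}
```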
What C does, having multiple different incompatible rules, some for doing simple things like "int a = 42;" and some for doing complex things like "void (*foo)(int) = bar", is not simple. It's just a pain. It means that professional C programmers need to look up rules for things they don't use as often, which is why Stack Overflow is full of questions about "How do I assign a function pointer to a variable in C?", "How do I pass a function pointer as a function argument in C?", etc. Having to learn multiple rules to do the same thing just sucks, and is one of the main reasons I love using Rust: when I learn something, I learn it only once, and things I learn later just reinforce that I learned it in the right way.
But that's just not true. There is a single principled way for C declarations, and it's as easy as "a binding is a type name followed by an expression and a semicolon".
And while small inconsistencies have been introduced over time, it was entirely consistent when C was conceived. The problem is just that the simple rule for how to read declarations is not well-known (I don't understand why). See my other comments.
> "a binding is a type name followed by an expression and a semicolon".
I find this hard to grok. Do you have a link to the actual grammar and production rules that apply to all cases?
When I see:
int a = 3;
I don't see a "type name followed by an expression and a semicolon", but rather the grammar "TYPE_NAME NAME = EXPR;". However, that's not correct, since TYPE_NAME cannot be any type (e.g. a function pointer won't work). When looking at
void (*foo)(int) = expr;
int (*foo)[N] = expr;
or at how the keywords "struct" and "union" are part of the type name in some contexts, but not others, I see quite different grammar rules.
I've tried to find literature about this, since I hack on a toy C parser every now and then, and those could simplify it, without much luck.
I've never thought of these as "CDECL = EXPR;" or similar, since that does not work either (e.g. a function declaration would be "void foo(int);" but that's not exactly the same as "void (*foo)(int) = expr;").
> Do you have a link to the actual grammar and production rules that apply to all cases ?
A Google search shows that Annex A in the ISO standard has something like a grammar. Here is a link to an unofficial version of the standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf . However, this is not the right place to look if you just want to understand the simple principle behind declarations, because the purity of declaration syntax has been considerably diluted in the last decades. So, only look in the standard if you are in good psychological health, and need to implement a production-grade C compiler. Also, note that grammars are overrated and tend to make things more complex than they really are. They are often too theoretical of a construct to be applicable, and that is certainly true for a language like C.
> I don't see a "type name followed by an expression and a semicolon", but rather the grammar "TYPE_NAME NAME = EXPR;".
Let's ignore the optional equal sign + initializer expression, and just focus on declarations without initializers. The syntax is (as I said) "TYPE_NAME EXPR;" where EXPR is an expression that makes use of the newly-declared variable.
> (e.g. a function pointer won't work)
Not sure what you mean by "function pointer", but I'm pretty sure it's not a type name in the way I mean it. Here is how to look at your examples:
void (*foo)(int) = expr;
                 ^^^^^^ (optional) initializer
     ^^^^^^^^^^^ expression (originally it was (*foo)(x),
                 but as I said nowadays there are
                 type specifiers in the list which
                 is a little inconsistent.)
^^^^ type name
int (*foo)[N] = expr;
              ^^^^^^ (optional) initializer
    ^^^^^^^^^ expression
^^^ type name
Basically, the first example says "(* foo)(int) is a void", so you can conclude that foo is a pointer to a function that takes an int and returns a void.
The second example says "(* foo)[N] is an int" so you can conclude that foo is a pointer to an array of N ints.
To be pedantic, there is no (specialized) function pointer syntax. The syntax to declare function pointers is just general declaration syntax, which in turn is basically just ordinary expression syntax.
How to declare a function pointer is hard to grok when not being introduced to declaring variables in a principled way. But it makes sense and is not too clunky if you're only declaring a function pointer every now and then.
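For contrast with the two C declarations dissected above, here is a rough sketch of how comparable bindings might be written in Rust, where the type is spelled out on its own after the colon instead of mirroring the use. The names `double`, `call_both` and `N` are made up, and a reference stands in for the C pointer-to-array.

```rust
const N: usize = 3;

fn double(x: i32) -> i32 {
    x * 2
}

fn call_both() -> (i32, i32) {
    // Roughly C's `int (*foo)(int) = double;`
    let foo: fn(i32) -> i32 = double;

    let arr = [10, 20, 30];
    // Roughly C's `int (*p)[N] = &arr;`
    let p: &[i32; N] = &arr;

    (foo(21), p[2])
}

fn main() {
    println!("{:?}", call_both()); // (42, 30)
}
```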
Exactly! C is internally consistent in that, for example, the star in:
int *a;
is part of the variable declaration, not the type. And the function pointer syntax derives from that. (I know you know, just providing context).
However, just because it's consistent doesn't mean it's easy to remember or use. Just repeating "it's easy!" doesn't make it true. Others have brought up that cdecl exists, which illustrates this pretty well.
It seems you've worked with this syntax for long enough, and it fits your way of thinking well enough, that it's not an issue at all for you! But there's ample evidence of a lot of people struggling with it, which should be sufficient to deem it "not easy".
I can understand why newer languages have moved away from this style of declaration syntax, and moving away has brought merits such as being more intuitive for novices, as well as better support for tooling / IDEs, including parsing performance gains.
On the other hand, nothing is quite so easy to read and write for me as the terse C declaration syntax (granted I don't declare a lot of pointers-to-functions-returning-pointers-to-functions).
I just made a similar comment before I read yours, but about C# rather than C, where just like C you can do:
`int x = 42;`
But since int (32-bit integer) is the default integer type, you can also do:
`var x = 42;`
If you want to use another type, for example, ulong (unsigned, 64-bit integer), you can do:
`ulong x = 42;`
Or:
`var x = 42ul;`
C#'s syntax is not only terser, but seems a lot easier to read to me. With rust's syntax, it's not immediately clear what the value is, and what the type is.
It's only clear because you already know what "long" and "int" mean in C#. It's potentially quite confusing for someone coming from a C or C++ background, as they mean different things there. i32 and u32, on the other hand, are unambiguously 32 bits.
In C++ I've actually had code using a value like `42l` that works on Linux but not on Windows, because the sizes of those types aren't fixed.
A valid point about knowing what the types mean, but even if you don't, it is at least immediately apparent which part is the type, and which part is the value.
I've only dabbled with rust, but I came across this very early on, and was baffled by the syntax. After further dabbling, I still can't see it and immediately know what the value is.
Meh, 123_i32 seems like a big improvement to me, but with the others, I just can't immediately grok it - it's having numeric digits as part of the type name that throws me.
I realise of course that not everyone will feel the same.
C99 has effectively identical types in the standard library (uint8_t ... uint64_t, and ditto for int8_t ... int64_t). There are very few modern C codebases where I haven't seen these used (personally I use them because it's much easier than remembering what is the minimum guaranteed size of unsigned long). The Rust ones have just slightly more terse names (which it's understandable to dislike, though I personally find the endless _t suffixes in C type names to be a bit annoying as well). And Rust's usize is basically uintptr_t.
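A quick way to check the fixed widths for yourself, as a minimal sketch (the function names `widths` and `usize_is_pointer_sized` are made up; `std::mem::size_of` is the standard library function):

```rust
use std::mem::size_of;

// Sizes in bytes; these are the same on every platform.
fn widths() -> [usize; 4] {
    [
        size_of::<u8>(),
        size_of::<i32>(),
        size_of::<u64>(),
        size_of::<i128>(),
    ]
}

// usize is the exception: it is as wide as a pointer on the target.
fn usize_is_pointer_sized() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

fn main() {
    println!("{:?}", widths()); // [1, 4, 8, 16]
    println!("{}", usize_is_pointer_sized()); // true
}
```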
Thanks for writing this!