
A Taste of Rust - cp9
http://www.evanmiller.org/a-taste-of-rust.html
======
ufo
The blog author mentions at one point that with algebraic data types the
compiler cannot determine pattern exhaustiveness for things like

    
    
        match 100 {
            y if y > 0 => "positive",
            y if y == 0 => "zero",
            y if y < 0 => "negative",
        };
    

and wonders if there is some kind of "algebraic data values" that would let
the compiler notice that the previous cases are exhaustive.

In Haskell they solve this particular problem with an "Ordering" ADT:

    
    
        data Ordering = LT | EQ | GT 
    
        case (compare 0 100) of
            LT -> "Positive"
            EQ -> "Equal"
            GT -> "Negative"
    

Using a richer datatype instead of boolean predicates solves most problems.
For some more advanced things you need extensions to the type system. For
example, to be able to say "this function, when applied to non-empty lists,
returns non-empty lists", you need Generalized Algebraic Data Types (GADTs).

~~~
pcwalton
Same in Rust: [http://doc.rust-lang.org/std/cmp/enum.Ordering.html](http://doc.rust-lang.org/std/cmp/enum.Ordering.html)

~~~
Veedrac

        match 100.cmp(&0) {
            Ordering::Greater => "positive",
            Ordering::Equal   => "zero",
            Ordering::Less    => "negative",
        }

------
pcwalton
> I’m not sure if the ownership rule is actually helpful in single-threaded
> contexts, but it at least makes sense in light of Rust’s green-threaded
> heritage.

It's necessary for prevention of use-after-free. Here's a simple example
(which can be translated into the equivalent C++):

    
    
        let mut vector = vec![
            "1".to_owned(),
            "2".to_owned(),
            "3".to_owned(),
        ];
        for element in vector.iter() {
            vector.clear();
            println!("{}", element);
        }
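For context, a minimal sketch of why the borrow checker rejects that loop, and what a compiling variant looks like (the helper name here is illustrative):

```rust
// A compiling variant of the loop above: the borrow checker rejects
// calling vector.clear() (a mutable borrow) while vector.iter()
// still holds a shared borrow, so the mutation is deferred.
fn print_then_clear(vector: &mut Vec<String>) {
    for element in vector.iter() {
        println!("{}", element);
    }
    vector.clear(); // the shared borrow from iter() has ended here
}

fn main() {
    let mut v = vec!["1".to_owned(), "2".to_owned(), "3".to_owned()];
    print_then_clear(&mut v);
    assert!(v.is_empty());
}
```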

------
kibwen
The author's observed speed discrepancy between iterators and while loops
makes me think that they forgot to compile with optimizations, as the
difference between those programs is almost negligible on my end.

~~~
Veedrac
To add to this, I found that when using i64 both generated exactly the same
code.

------
Veedrac
> but five function calls (from_ptr, from_utf8, to_bytes, unwrap, to_string)
> just to convert a C string to a Rust string seems like an excessive amount
> of ceremony

Well, only the first four are really needed. The last one turns it into a
local, growable copy. The rest should probably get a convenience wrapper,
though.
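A hedged sketch of what such a convenience wrapper might look like (the name `c_to_str` is made up; `CStr::to_str` folds the `to_bytes` + `from_utf8` + `unwrap` steps into one checked call):

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical convenience wrapper: borrow a &str straight from a
// NUL-terminated C string, panicking on invalid UTF-8.
unsafe fn c_to_str<'a>(ptr: *const c_char) -> &'a str {
    CStr::from_ptr(ptr).to_str().unwrap()
}

fn main() {
    let owned = CString::new("hello").unwrap(); // stand-in for a C string
    let s = unsafe { c_to_str(owned.as_ptr()) };
    assert_eq!(s, "hello");
}
```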

> the libxls API has an array-of-structs in a few places [...]; because Rust
> doesn’t believe in pointer arithmetic, I found myself manually writing the
> pointer arithmetic logic

Forgive me if I'm being stupid, but can't you just use slice::from_raw_parts?

[https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html)
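A sketch of that approach, assuming a C-style pointer-plus-length pair (the `Cell` struct is a made-up stand-in for a libxls type):

```rust
use std::slice;

// Made-up stand-in for a libxls array-of-structs element.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct Cell {
    row: u32,
    col: u32,
}

// View a C pointer-plus-length pair as a safe slice instead of
// writing the pointer arithmetic by hand.
unsafe fn cells<'a>(ptr: *const Cell, len: usize) -> &'a [Cell] {
    slice::from_raw_parts(ptr, len)
}

fn main() {
    let owned = vec![Cell { row: 0, col: 1 }, Cell { row: 0, col: 2 }];
    let view = unsafe { cells(owned.as_ptr(), owned.len()) };
    assert_eq!(view.len(), 2);
    assert_eq!(view[1], Cell { row: 0, col: 2 });
}
```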

> writing things in a pseudo-functional, lots-of-chained-method-calls style,
> for which Rust is not all that well-designed

If this is about speed, then see the other comments. But if this is for
another reason, what reason is it? Personally, I think this style of
programming suits Rust beautifully.

------
noelwelsh
From reading the article I came to the conclusion the author is rather
inexperienced with modern functional languages. Their claims about tension in
the design of Rust between functional and imperative features for me mostly
come down to them not understanding natural consequences of the modern
statically typed paradigm, or Rust's memory model.

For instance, take the discussion about if expressions. They claim an if
expression with a single arm returning unit is unintuitive. Firstly, we should
always recognise that claims about "intuitive" behaviour solely depend on
one's background. This behaviour is completely intuitive to me, coming from a
Racket background (which behaves the same way). It's also a natural
consequence of having to give a type to an if expression with one arm. You
have two choices: the else case either returns the bottom type (which
operationally means it raises some kind of error), or it returns no
interesting value -- which is exactly what unit is. Since a single arm if can
only be used for effect, unit is the best choice here.
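A minimal illustration of the point: both arms of a single-arm if agree on the unit type.

```rust
fn main() {
    let something = 1;
    // A single-arm if is an expression of type (): the visible arm
    // ends in a statement, and the implicit else arm is also ().
    let x: () = if something > 0 {
        println!("positive");
    };
    assert_eq!(x, ());
}
```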

Rust is certainly a very different language to OO-ish imperative languages
that most people are familiar with. I think the author made the common mistake
of expecting Rust to behave like one of these languages, and then blaming the
discrepancies between their mental model and actual behaviour on the language.

~~~
GolDDranks
On the other hand, an alternative design would be that the single-arm if
wrapped the return type R in "Some(R)", with None for the "phantom else" arm.
I haven't considered the ramifications of this, but I'd expect it to work well
if the Option<R> type is defined flexibly and composably enough.

------
steveklabnik
I am utterly thankful for new experience reports on Rust, especially for ones
this well-written. Generally speaking, inaccuracies in such things are our
fault, not the writers', due to a lack of documentation and/or good examples.

With that being said, a few notes:

> It runs about five times slower than the equivalent program

I'd be interested in hearing more about how these were benchmarked. On my
machine, they both run in roughly the same time, with a degree of variance
that makes them roughly equivalent. Some runs, the iterator version is faster.

It's common to forget to turn on optimizations, which _seriously_ impacts
Rust's runtimes; LLVM can do wonders here. Generally speaking, if iterators
are slower than a loop, that's a bug.

> Rust does not have tail-call optimization, or any facilities for marking
> functions as pure, so the compiler can’t do the sort of functional
> optimization that Haskell programmers have come to expect out of Scotland.

LLVM will sometimes turn on TCO, but messing with stack frames in a systems
language is generally a no-no. We've reserved the 'become' keyword for the
purpose of explicitly opting into TCO in the future, but we haven't been able
to implement it because historically, LLVM had issues on some platforms. In
the time since, it's gotten better, and the feature really just needs design
to work.

Purity isn't as big of a deal in Rust as it is in other languages. We used to
have it, but it wasn't very useful.

> But assignment in Rust is not a totally trivial topic.

Move semantics can be strange from a not-systems background, but they're
surprisingly important. We used to differ here: we required two operators for
move vs copy, but that wasn't very good. We also used to infer Copy, but that
ended up with surprising errors at a distance. Opting into copy semantics ends
up being the best option.

> how that could ever be more useful than returning the newly-assigned rvalue.

Returning the rvalue ends up in a universe of tricky errors; not returning the
rvalue here ends up being nicer. Furthermore, given something like "let (x, y)
= (1, 2)", what is that new rvalue? It's not as clear.

> I’ve always thought it should be up to the caller to say which functions
> they’d like inlined,

This is, in fact, the default. You can use the attributes to inform the
optimizer of your wishes, if you want more control.

> It’s a perfectly valid code,

In this case it is, but generally speaking, aliasing &muts leads to problems
like iterator invalidation, even in a single-threaded context.

> but the online documentation only lists the specific types at their five-
> layers-deep locations.

We have a bug open for this. Turns out, relevant search results are a Hard
Problem, in a sense, but also the kind of papercut you can clean up after the
language has stable semantics. Lots of work to do in this area, of course.

> Rust won’t read C header files, so you have to manually declare each
> function you want

The bindgen tool can help here.

> My initial belief was that a function that does something unsafe must,
> itself, be unsafe

This is true for unsafe functions, but not unsafe blocks. If unsafe were truly
infectious in this way, all Rust code would be unsafe, and so it wouldn't be a
useful feature. Unsafe blocks are intended to be safe to use, you're just
verifying the invariants manually, rather than letting the compiler do it.

> but until a few days ago, Cargo didn’t understand linker flags,

This is not actually true, see [http://doc.crates.io/build-script.html](http://doc.crates.io/build-script.html) for more.

> the designers got rid of it (@T) in the interest of simplifying the language

This is sort of true, and sort of not. @T and ~T were removed to simplify the
language, we didn't want language-support for these two types. @T's
replacement type, Gc<T>, was deemed not actually useful in practice, and so
was removed, like all non-useful features should be.

In the future, we may still end up with a garbage collected type, but Gc<T>
was not it.

> Rust’s memory is essentially reference-counted at compile-time, rather than
> run-time, with a constraint that the refcount cannot exceed 1.

This is not strictly true, though it's a pretty decent starting point.
Strictly speaking, at the language level, you may have either any number of
shared references OR one mutable reference at a given time. Library types
which use `unsafe` internally can provide more complex structures that give
you more options.
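A minimal sketch of the rule, plus one such library type (`RefCell`), which moves the check to run time:

```rust
use std::cell::RefCell;

fn main() {
    let x = 1;
    let r1 = &x;
    let r2 = &x; // any number of shared borrows at once: fine
    assert_eq!(*r1 + *r2, 2);

    // A library type built on unsafe internally: RefCell moves the
    // shared-XOR-mutable check from compile time to run time.
    let cell = RefCell::new(10);
    *cell.borrow_mut() += 5; // exclusive access, checked dynamically
    assert_eq!(*cell.borrow(), 15);
}
```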

That's at least my initial thoughts. Once again, these kinds of reports are
invaluable to us, as it helps us know how we can help people understand Rust
better.

~~~
masklinn
> This is, in fact, the default. You can use the attributes to inform the
> optimizer of your wishes, if you want more control.

Isn't the default that the optimiser will do whatever the hell it wants, and
the attributes simply skew the optimiser's factors in one direction or
another? I think what the author means here is that the caller function should
be able to define whether the callee should be inlined or not.

> The bindgen tool can help here.

Would be really useful to have an implicit bindgen thing. Maybe a compiler
plugin using e.g. Clang's C parser? That way there's no need to maintain the
binding. I'd say I'd like a header generator more than a reader though.

~~~
steveklabnik
Maybe I misunderstood what the parent wants, but you're right that the
optimizer can do as it pleases, and you can use annotations to help it make
the right decision.

An 'implicit' tool may in fact be cool. bindgen's output isn't perfect and
needs tweaking in many cases, so the current state is pretty good, but for
easier cases and/or when you don't care, I can see such a thing being useful.

~~~
masklinn
> Maybe I misunderstood what the parent wants, but you're right that the
> optimizer can do as it pleases, and you can use annotations to help it make
> the right decision.

I understand TFAA's request to be a callsite annotation, which currently does
not exist, e.g.

    
    
        inline foo()
    

to force inlining or

    
    
        noinline bar()
    

to prevent it, probably with the first one erroring out if the call is not
inlinable.

~~~
Jweb_Guru
I believe #[inline(always)] and #[inline(never)] both work like this.

~~~
kbenson
According to steveklabnik here[1], those are on the definition, not the
callsite, which is the distinction here. Although from other info here, it
sounds like having it on the definition is a prerequisite in some cases if you
wanted to somehow specify it for the callsite, as it needs to be serialized in
the crate metadata to be inline-able, and that's controlled somewhat by
whether it was defined as inline capable.

1:
[https://news.ycombinator.com/item?id=9548248](https://news.ycombinator.com/item?id=9548248)
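For reference, a sketch of what the definition-site attributes look like (the function names are illustrative):

```rust
// The attributes go on the definition, not the call site; they are
// hints that the optimizer is free to weigh, not commands.
#[inline(always)]
fn double(x: u32) -> u32 {
    x * 2
}

#[inline(never)]
fn triple(x: u32) -> u32 {
    x * 3
}

fn main() {
    assert_eq!(double(21), 42);
    assert_eq!(triple(14), 42);
}
```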

------
chaoky
the article brings up a good point with 'systems language'. What is a systems
language anyway? I guess C is, but what's the definition? Is Common Lisp a
'systems language'? After all, a good number of operating systems have been
written in Common Lisp, but is it too freewheeling and high-level to be
considered a 'systems language'? Is Java a systems language?

~~~
Retra
Implicitly, the most important feature of a 'systems language' is
predictability: how easily can you predict how much memory or time a program
will use when running?

~~~
dllthomas
If we take that seriously, I wonder if we should look at something that isn't
Turing complete.

~~~
Retra
That's a good thought, but you'd probably need to know the problem domain in
advance to do that. The reason we use Turing complete languages is because you
can't predict what solutions need to be expressed, so you err on the side of
allowing them all.

~~~
dllthomas
Except we don't allow them all; we allow that subset that will run within the
memory we make available to the process, in the time before we get fed up
waiting and kill the process... There is likely valuable design space between
that and current (or at least common) explicitly non-TC languages.

~~~
Retra
You can write a program that doesn't terminate, consumes ever increasing
amounts of memory, and can perform arbitrary arithmetical calculations in just
about any language. Any language that can do that is Turing complete.

About the only non-Turing complete languages in wide use today are basic
dialects of SQL, Regex, and some layout engines. Every non-Turing complete
language we've invented has a very specific domain of application, and none of
them are suitable for writing low-level systems.

~~~
dllthomas
I think you missed my point, which I admit is rather subtle. I mean that there
are already restrictions which we place on programs in practice; if we make
those explicitly part of the language rather than implicitly part of the
environment, the resulting language is not Turing complete but might still be
comparably useful. The trick is in doing this in a way that _practically_
(instead of just theoretically) improves our ability to reason about the
programs, and in building it into a language that's actually usable.

Edited to add: Note that I'm _not_ saying this is a trivial engineering
exercise.

------
saosebastiao
I too have found the assignment semantics to be a little baffling, and the
errors to be ungoogleable (which may have changed in the last 4 months since I
used it last). A pragma determining semantics seems quite brittle as well. I
wish that there were some sort of distinguishing operators for copy vs move,
much like how F# has different operators for initial assignment vs mutation.

~~~
steveklabnik
We used to have two operators, but it wasn't actually helpful.

The only difference between a move and a copy in Rust is that you can use a
copy value afterward.

Why does this matter? Okay, imagine this code:

    
    
        let v = vec![1, 2, 3];
        let v2 = v;
    

Since Vec does not implement Copy, it's a move. Moves memcpy the value on the
stack, which, in a Vec's case, is a triple: a pointer to the data, a length,
and the capacity. You haven't actually copied the data on the heap, just the
three words on the stack. If we let you use v after the assignment, there'd be
two pointers to the same data. No bueno.

Compare that with this code:

    
    
        let i = 5;
        let i2 = i;
    

In this case, 5 is an i32, which implements Copy. The same thing happens: a
memcpy. But now, the entire data was copied. There's nothing on the heap. In
this case, it's totally okay to keep using i.

Does that make sense?
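To round this out, a sketch of opting a custom type into copy semantics with a derive:

```rust
// Opting a custom type into copy semantics; this only compiles when
// every field is itself Copy.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = p; // a memcpy of the whole value
    assert_eq!(p, q); // p is still usable, because Point is Copy
}
```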

~~~
saosebastiao
It makes sense...but only because you explained that Vec doesn't implement
copy and i32 does. Without scouring the source/docs, or possibly having a
mythical IDE that can discover this for me, I have to rely on error messages.

I just checked, and the error messages are definitely better since January,
but I think it would be helpful to have an extra operator just for a visual
understanding of the code, rather than enhancing its procedural semantics.

~~~
pcwalton
We tried it once, and there was way too much noise. "let move x = move f(move
y)" was all over the place. (Yes, you do need it on pattern bindings too if
you want to be consistent.)

~~~
masklinn
Since copy is the rarer of the two (I'd assume most developers are not going
to opt into Copy unless they need to; even if it's technically feasible, _not_
marking a struct as Copy is more future-proof, as removing Copy is more likely
to break code than adding it), the operator could be `copy`, with _everything_
(including copy-able objects) moving by default, making the language much more
uniform.

Of course, there would be no point to Copy then; the language could just lose
it and require explicit cloning.

That would certainly be less convenient for basic numeric types.

~~~
dbaupp
I think copy may be rarer in terms of the number of types, but I suspect it's
not rarer in terms of actual use, since shared references (&) and integers are
used a _lot_.

~~~
masklinn
Good point wrt reference, I always have trouble thinking of them as "real"
types to which the rest of the language applies as usual, thanks for the
reminder.

------
imron
> But then, this won’t compile: let x = if something > 0 { 2 };

Which makes perfect sense. After all, what should the value of x be if
something is <= 0?

~~~
erkl
Now, I'm not sure this is useful at all, but I think it would make sense for
the value of an if-expression to be Option<T>.

~~~
Jweb_Guru
That might actually make sense... and I have to say that it would be useful,
too. Quite often I find myself doing if condition { Some(foo) } else { None };
being able to just write if condition { foo } could be neat syntactic sugar
for that (though it might also be confusing, since in Rust generally types
don't form magically like that). The solution I'd come up with was just to
give booleans a .then method (maybe they already have one that i missed).
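A sketch of the kind of helper the comment has in mind, written as an extension trait. The name `then_value` is made up to avoid implying a real std method (later Rust did in fact stabilize `bool::then` and `bool::then_some` with this behavior):

```rust
// Hypothetical extension trait giving bool a Some/None constructor.
trait BoolExt {
    fn then_value<T>(self, value: T) -> Option<T>;
}

impl BoolExt for bool {
    fn then_value<T>(self, value: T) -> Option<T> {
        if self {
            Some(value)
        } else {
            None
        }
    }
}

fn main() {
    assert_eq!(true.then_value(42), Some(42));
    assert_eq!(false.then_value(42), None);
}
```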

------
eridius
It's always interesting to see the experiences of new people to Rust. There's
a few curious misconceptions in here that I hadn't seen before:

> _Iterators are a great and reusable way to encapsulate your blah-blah-blah-
> blah-blah, but of course they’re impossible to optimize._

I'm very curious to know where you got this idea from. Iterators actually
optimize very well, in most cases being indistinguishable from a manual
imperative for-loop. If you forget to turn on compiler optimizations then
you'll see a significant performance difference, but that's true for a lot of
different things you might want to do. Compiler optimizations are important
whenever you're measuring performance.

> _pragma_

It's not a pragma. It uses the same # sigil that C compilers use to introduce
pragmas, but in Rust it actually denotes an attribute, which modifies the next
item (or the enclosing item, with the #! syntax). This is used for a number
of things. As you've seen, it can be used to automatically derive
implementations of traits, and it can be used to mark a function as a
candidate for inlining, among other things.

> _Rust rather inelegantly overloads the assignment operator to mean either
> binding or copying._

This is indeed a very curious misconception. Assignment actually isn't
overloaded like that at all. In your code example, when you say

    
    
      let y = x;
    

You're not binding y to the value of x, you're actually _moving_ the value of
x into y. There is no implicit bind-by-reference in Rust. If you want a
reference, you need to use the & sigil.

The confusion here stems from the fact that some values can be _copied_ and
some values cannot. When you move a value in Rust, if the value can be copied
(if it conforms to the Copy trait), then the original value is still usable
after the move. This means you can say

    
    
      let x = 1;
      let y = x;
      let z = x;
    

The line `let y = x` moves the value of x into y, but since the value is
copyable, it's really just moving a copy of the value. In this context,
"copyable" basically means memcpy() can be used to produce a valid copy.
There's a different trait called Clone which has an explicit .clone() method
that is used for values that require additional work beyond memcpy() to copy.
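A small illustration of the Clone side: String owns heap data, so it is Clone but not Copy, and copies must be requested explicitly.

```rust
fn main() {
    // String owns a heap buffer, so a valid copy needs more than
    // memcpy(): the copy must be requested with .clone().
    let s = String::from("hello");
    let t = s.clone(); // deep copy of the heap buffer
    assert_eq!(s, t); // s is still usable because we cloned, not moved
}
```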

In your code example, your struct is not Copy, so moving its value makes the
original value _inaccessible_. Basically, it's considered garbage memory and
cannot be read from again. This is why your code

    
    
      let x = MyValue::Digit(10);
      let y = x;
      let z = x;
    

throws an error. It isn't because `y` and `z` would be referring to the same
value, it's because after the line `let y = x;` the original value `x` is
garbage.

So really, any time you do assignment like that (or any time you pass a value
as an argument to a function) without taking an explicit reference (using &),
you're doing a move. Values that conform to Copy will move a copy of the
value, and all other values will leave the original value inaccessible. This
should actually be familiar to people coming from C++, where values that
aren't Copy are basically like std::unique_ptr, except that instead of leaving
the original value with a known-"zero" state, the compiler prevents you from
accessing the original value at all.

~~~
eridius
Further commentary:

> _When Rust was originally announced, the team had ambitions to pursue the
> multi-core MacGuffin with a green-threaded actor model, but they found out
> that it’s very hard to do green threads with native code compilation._

It's easy to do if you're willing to define certain synchronization points
(such as I/O). IIRC, Rust abandoned Green threads because of a few reasons:

1. It doesn't play well with FFI. This is particularly true with segmented
stacks, but Rust abandoned those before it abandoned green threading. With
regular OS-provided stacks, FFI is a problem because the FFI call can't yield
to the green scheduler.

2. It has an unavoidable performance problem even for code that doesn't use
it. The existence of green threading means the entire runtime needs to be
abstracted over a threading and I/O interface. This is a lot of complexity,
and requires the use of dynamic dispatch for a lot of things that would
otherwise be static dispatch. Rust originally switched to native as the
default threading model but kept green threading as an option for a while, but
this violates the concept of you-don't-pay-for-what-you-don't-use, especially
since most people weren't even using green threading anymore.

3. It also made implementing some things very awkward. For example, because of
the abstract notion of a runtime, you couldn't really implement selection over
multiple file descriptors very well. Things like that must be handled by the
runtime library or not at all, because for native threading you'd want to use
select() (or kqueue or epoll) but for green threading the green threading
library needs to deal with that. In Go, the standard solution there was to
spawn multiple green threads that each block on one fd and communicate using
channels (because the runtime can then collect all the blocked fds into a
single select/kqueue/epoll), but that's unwanted overhead, and is especially
not very good when you're not using the green threading.

4. Green threads encourage the use of lots of short-lived threads that have
very shallow stacks. Rust abandoned segmented stacks with the understanding
that the OS-provided virtual memory support was good enough that it didn't
matter if every thread got an 8MB stack, as it would allocate pages on demand.
But a 4k page for every green thread can be a lot of overhead if you're using
thousands of green threads. You can of course explicitly request a smaller
stack, but that's tricky to get right (if you request too little, you
explode). There was a lot of thought given to trying to statically determine
how much stack space a given function call required (which is not always
possible, and is especially tricky if there's any recursion) to deal with
this, but that's a very hard problem.

Ultimately, it was determined that green threads simply weren't worth all the
costs. The OS is pretty good about dealing with lots of threads, and arguably
it's actually better than the green threading library is about scheduling them
all. Green threading makes some sense if you want to spawn 10,000 simultaneous
threads, but in nearly all real-world workloads it's just not really worth it.

> _Rust has taken up a different challenge: eliminating crashes from
> otherwise traditional, “boring” multi-threaded programs. In the sense that
> Rust has abandoned its original vision in favor of pursuing more modest and
> achievable goals_

I find this an odd thing to say. Statically eliminating crashes (and data
races) without a garbage collector or any performance penalty is actually
_significantly more interesting_ than actors. Rust did not decide to pursue
the more modest "boring" goal, it redoubled its focus on pursuing the very
hard (but achievable) and very important goal of solving the problem of
memory/data safety in multithreaded programs. This aspect of Rust is what most
excited me when I first got involved 2 years ago and what still excites me to
this day and makes me wish I could use Rust in my day job.

> _For the record, the Nim language manages to get this right, and the Rust
> folks might look to it for inspiration._

I've heard of Nim before but haven't actually looked at it. Does it attempt to
solve memory/data race safety at compile-time like Rust does?

> _It is also worth noting that processing non-overlapping slices in parallel
> is destined to come into mortal conflict with The Iterator, which is by its
> nature sequential_

I don't think this is true. You can build an Iterator that yields non-
overlapping mutable slices, and then you can process them in parallel. The
iterator is sequential, yes, but you can call it multiple times and hand the
resulting references to your parallel computations (as opposed to handing the
iterator itself over, which doesn't work because you'd be sharing mutable
state without a lock and Rust does not allow you to write that code).

Although it's worth noting that at the moment you can't do fork-join
concurrency (so you can't share mutable slices with parallel computations like
that unless you cheat with `unsafe`). There's `thread::scoped()` but it's
marked as unstable (so you can't use it from Beta or from tomorrow's Stable
1.0) because it has a serious safety issue, and the proposed replacements are
still being worked on.
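A sketch of the non-overlapping-slices idea using `chunks_mut`. Note the scoped-threads API shown here, `std::thread::scope`, is the safe successor to the unstable `thread::scoped()` mentioned above, and was stabilized long after this discussion:

```rust
use std::thread;

fn main() {
    let mut data = vec![1, 2, 3, 4];
    // chunks_mut is itself a sequential iterator, but each item it
    // yields is an independent, non-overlapping &mut slice that can
    // be handed to a separate thread.
    thread::scope(|s| {
        for chunk in data.chunks_mut(2) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= 10;
                }
            });
        }
    });
    assert_eq!(data, vec![10, 20, 30, 40]);
}
```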

~~~
felixgallo
I find a lot of this sounds like justification rather than rationale.
Certainly, OS level threads are objectively terrible, as is the OS level
scheduler. A chance to have a compiled, memory-safe erlang has been wasted.

~~~
Jweb_Guru
OS level threads are absolutely not objectively terrible and what eridius
described _was_ the rationale. This is the post that killed off green threads:
[https://mail.mozilla.org/pipermail/rust-dev/2013-November/006550.html](https://mail.mozilla.org/pipermail/rust-dev/2013-November/006550.html)

I find it hard to believe you can read that and believe that Rust made this
decision for anything but sound technical reasons.

~~~
felixgallo
Eppur si muove.

~~~
Jweb_Guru
Which part of the thread I linked did you disagree with? Or are you just being
contrarian for its own sake?

------
Manishearth
This is a great post! :D

Some nits on various errors or misrepresentations:

> pragma

You have some complaints about attributes (what you call pragmas) -- I suspect
that many of them are due to you looking at them as if they were C++
preprocessor directives or pragmas. They aren't, even if the syntax may be
reminiscent :)

In some cases they're like decorators in python (but much more powerful), in
others, Java annotations. They're a different concept.

> Sadly, Rust is not a target for my favorite parser generator, and the lexers
> in Servo don’t look much better than C-style state machines, with lots of
> matching (switching) on character literals.

[https://github.com/servo/html5ever](https://github.com/servo/html5ever) is
the largest parsing library we use in Servo, and there are a bunch of Rust
tricks done there. HTML parsing is _hard_ (the spec is insanely complex), and
this library does it well with much less code.

> it would be nice if the Rust compiler got rid of the split_at_mut secret
> password and could reason sanely about slice literals and array indexes.

`split_at_mut` is just a library function that uses `unsafe` internally, much
like many other core data structures and methods. It has nothing to do with
the compiler.
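A small illustration of `split_at_mut` as an ordinary library function:

```rust
fn main() {
    let mut data = [1, 2, 3, 4];
    // split_at_mut hands back two disjoint &mut slices; the unsafe
    // code lives inside this library function, not in the compiler.
    let (left, right) = data.split_at_mut(2);
    left[0] += 10;
    right[0] += 100;
    assert_eq!(data, [11, 2, 103, 4]);
}
```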

> It is also worth noting that processing non-overlapping slices in parallel
> is destined to come into mortal conflict with The Iterator, which is by its
> nature sequential.

I believe the thread::scoped API can be used to process nonoverlapping slices
in parallel with a regular iterator. Not sure, but I recall seeing an example
where this was done.

> ...seems like an excessive amount of ceremony, at least for a language that
> keeps using the word “systems” on its website.

I don't see what verbosity has to do with systems programming. Most of those
are zero or low cost (so verbosity doesn't correspond to more steps in the
generated assembly); I believe there's a utf8 check at one point and that's
it. Rust is verbose wherever errors or footguns are possible.

> Rust won’t read C header files, so you have to manually declare each
> function you want to call by wrapping it in an extern block, like this:

[https://github.com/crabtw/rust-bindgen](https://github.com/crabtw/rust-bindgen)

> but until a few days ago, Cargo didn’t understand linker flags

You can specify -L and -l flags in the .cargo/config file under the rust-flags
key (or output the same in a build script); this has been allowed for a while
now. More complex linker args are now possible with the cargo rustc command.

~~~
general_failure
> [https://github.com/servo/html5ever](https://github.com/servo/html5ever) is
> the largest parsing library we use in Servo, and there are a bunch of Rust
> tricks done there. HTML parsing is hard (the spec is insanely complex), and
> this library does it well with much less code.

This might simply be because a) Servo has no legacy, b) Servo developers are
awesome, and c) Servo is not complete yet.

A large part of the complexity of HTML is simply quirks and compatibility.
Which Servo does not handle yet.

Don't mistake all this as language wins...

~~~
dragonwriter
> A large part of the complexity of HTML is simply quirks and compatibility.

Actually, the HTML spec _already addresses those_ (that's why it is incredibly
complex). Unlike HTML4 and previous, the WHATWG HTML Living Spec -- and
possibly the W3C HTML5 spec, though I can never keep straight what the W3C
kept from the WHATWG spec at the time and what it didn't -- contains a
complete specification of how compliant user agents should parse anything
purporting to be HTML, even if it actually isn't valid HTML. (IIRC, a
compliant parser _may_ throw an error on invalid HTML, but if it is tolerant
of errors, the spec specifies _how_ it is to be tolerant, specifically to
avoid the pre-HTML5 issue of different browsers parsing the same thing
different ways. Modern browsers either have converged or are converging --
some might still be lagging -- on that consistent model.)

------
Dewie3
> The pattern list looks pretty exhaustive to me, but Rust wouldn’t know it.
> I’m sure that someone who is versed in type theory will send me an email
> explaining how what I want is impossible unless P=NP, or something like that,
> but all I’m saying is, it’d be a nice feature to have.

I Am Not A Type Theorist, but that looks like it could be very hard for a
compiler to deduce in general. You might have to bust out a proof yourself for
things like this. In which case you might not feel it is worth it for a "nice
to have".

~~~
ufo
It boils down to Rice's Theorem (a generalization of the Halting Problem).
Boolean conditionals can contain arbitrary computations, so it's undecidable
how they will behave at runtime.

~~~
Dewie3
Right. Hence the proof obligation. :)

