
Announcing Rust 1.27 - steveklabnik
https://blog.rust-lang.org/2018/06/21/Rust-1.27.html
======
computerphage
SIMD on stable is very exciting news. SIMD unlocks the power of the GPU-esque
parallelism that is already inside your CPU. While compilers do try very hard
to take advantage of this automatically, it’s not always predictable and can
make performance fragile. This is what people are talking about when they
refer to “low level control”. Don’t expect that with Rust 1.27 you’ll need to
understand SIMD to get anything done. Do expect various libraries to just get
faster on stable. For example, BurntSushi, the author of the regex crate and
the ripgrep tool, has already enabled it if you’re on Rust 1.27 Stable. [1]
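
For a flavor of what this looks like on stable, here's a minimal sketch (the function names are my own illustration, not from the release notes): runtime feature detection guarding an AVX2 fast path, with a plain-Rust fallback:

    #[cfg(target_arch = "x86_64")]
    fn add_arrays(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
        // Runtime CPU feature detection; this macro was stabilized in 1.27.
        if is_x86_feature_detected!("avx2") {
            unsafe { add_arrays_avx2(a, b) }
        } else {
            add_arrays_scalar(a, b)
        }
    }

    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    unsafe fn add_arrays_avx2(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
        use std::arch::x86_64::*;
        // A single 256-bit instruction adds all eight lanes at once.
        let sum = _mm256_add_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
        let mut out = [0.0f32; 8];
        _mm256_storeu_ps(out.as_mut_ptr(), sum);
        out
    }

    fn add_arrays_scalar(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
        let mut out = [0.0f32; 8];
        for i in 0..8 {
            out[i] = a[i] + b[i];
        }
        out
    }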

SIMD is another notch towards never needing to use nightly Rust. What this
looks like in practice is that users will less frequently encounter a fast
mode and a slow mode for a crate depending on whether they're on nightly,
with all the unstable features enabled, or not.

[1] [https://github.com/rust-lang/regex/pull/490](https://github.com/rust-lang/regex/pull/490)

~~~
seanmcdirmid
How useful is SIMD on CPU these days given that most of the touted original
applications (back in the MMX SSE days) have been moved over to GPUs?

~~~
vvanders
Still really useful. We use NEON on Android all the time, since there's not a
great GPU compute API there and spinning up a GPU on a mobile device is not
something you want to do unless you absolutely have to.

~~~
flipgimble
I don't want to worry you, but on a mobile device with a screen the GPU is
always spinning. That is where the pixels come from, after all.

I get your point that there are energy efficiency tradeoffs to consider.
However, for massively data-parallel tasks like image operations, 3D graphics,
machine learning, vision, etc., I haven't seen SIMD implementations come close
to GPU compute in terms of efficiency. Perhaps short-lived text processing and
file compression, where data access is more random, could benefit from SIMD.

~~~
vvanders
Nope, I'm well aware. I've built/worked on two Android-based 3D graphics
stacks at two different companies, and I've worked directly with just about
every mobile GPU vendor aside from PowerVR (which is pretty similar to all the
other tiled GPUs out there).

Your GPU is actually running much less than you'd expect. Unless pixels are
changing, there's a 95% chance that everything on the system is sleeping and
just the display controller is being kept spun up. Battery requirements mean
that DVFS on any mobile chipset is going to be really aggressive.

Either way, GPU compute has a pretty large overhead both in terms of latency
and scheduling. Earlier GPUs didn't schedule nicely (hello, waiting 16.6ms for
your next compute request), and generally the places where you use SIMD (3D
transforms, audio processing, etc.) are so tightly coupled with other CPU
operations that even the act of moving data into the SIMD registers is
something you need to consider before diving in. A lot of the time, waiting
for some work queue to complete (or adding pipeline latency by waiting for the
next frame) just isn't feasible.

------
computerphage
The new syntax for dynamically-dispatched traits, “dyn Trait”, is an example
of Rust’s persistently excellent consideration of what should be explicit and
what should be implicit. Python’s mantra of “explicit is better than implicit”
mostly captures my general feeling, but you can’t make everything explicit.
Back before impl Trait existed, plain Box<Trait> seemed very clear. Now that
there’s an important thing to differentiate it from, it seems better to have
dyn Trait and impl Trait both prefixed with keywords.
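
As a rough illustration of the distinction the two keywords now surface (my own sketch, not from the post): impl Trait names one concrete type chosen at compile time, while dyn Trait defers the choice to runtime:

    use std::fmt::Display;

    // impl Trait: a single concrete type, resolved at compile time (static dispatch).
    fn static_greeting() -> impl Display {
        "hello"
    }

    // dyn Trait: the concrete type can differ per call (dynamic dispatch via a vtable).
    fn dynamic_greeting(fancy: bool) -> Box<dyn Display> {
        if fancy {
            Box::new("hello there")
        } else {
            Box::new(42)
        }
    }

    fn main() {
        println!("{}", static_greeting());
        println!("{}", dynamic_greeting(true));
        println!("{}", dynamic_greeting(false));
    }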

It also makes for _much_ easier googling. ;)

~~~
kbd
> Rust’s trait object syntax is one that we ultimately regret.

Would someone please explain the problem (and the solution) for someone who
doesn't know Rust yet?

~~~
algesten
A trait is like an interface. A struct implementing a trait means it takes on
the methods of that interface (but there's more to it, because a trait can
have default implementations of of the trait methods).

If you want to express something like "this is a variable that holds something
that fulfils this trait", without knowing the _actual_ type it is, that
variable effectively has an unknown runtime size. std::io::Read is an
interface for reading bytes of some source, like a file or a socket.

This matters because we're talking about a stack frame. So the size needs to
be known at compile time.

    
    
        let a: u64 = 42; // ok, because well known size.
    
        let b: Read = ...; // illegal, because unknown size.
    

A "trait object" places the object on the heap and has a pointer in its place.

    
    
        let b: Box<Read> = ... // legal, because pointer is a known size
    

However, it's a bit more complicated, because this syntax allows for dynamic
dispatch at runtime using a vtable. So there's quite a big difference between
Box<u64> (a 64-bit unsigned integer on the heap) and Box<Read> (a
runtime-dispatched lookup via a vtable).

This difference is not obvious at a glance, though. Hence the new syntax:
Box<dyn Read>.

(I think I got that right)
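
To make that concrete, here's a small runnable sketch of my own (using the new Box<dyn Read> spelling):

    use std::io::Read;

    // The concrete type behind `src` is only known at runtime, via the vtable.
    fn read_all(mut src: Box<dyn Read>) -> std::io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        src.read_to_end(&mut buf)?;
        Ok(buf)
    }

    fn main() -> std::io::Result<()> {
        // Two different concrete types behind the same trait-object type.
        let bytes: Box<dyn Read> = Box::new(&b"hello"[..]);
        let nothing: Box<dyn Read> = Box::new(std::io::empty());
        println!("{} bytes", read_all(bytes)?.len());   // 5 bytes
        println!("{} bytes", read_all(nothing)?.len()); // 0 bytes
        Ok(())
    }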

~~~
steveklabnik
You did, except that it’s not one pointer, it’s two. This is one way that rust
is different than C++; the vtable isn’t stored with the data, but separately,
with the “trait object” itself being two pointers to the two things.
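
A quick way to see the two-pointer layout (my sketch, not part of the original comment):

    use std::io::Read;
    use std::mem::size_of;

    fn main() {
        // A Box of a concrete type is a single (thin) pointer...
        assert_eq!(size_of::<Box<u64>>(), size_of::<usize>());
        // ...while a Box of a trait object is two words: data pointer plus vtable pointer.
        assert_eq!(size_of::<Box<dyn Read>>(), 2 * size_of::<usize>());
    }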

~~~
tzahola
>This is one way that rust is different than C++; the vtable isn’t stored with
the data

Which “data” is the vtable stored with in C++? An object contains only a
pointer to its vtable, not the vtable itself...

~~~
stormbrew
A C++ object with virtual methods and some members looks like this, if you put
it in terms of C:

    struct ClassVTable {
      void (*firstMethod)();
      void (*secondMethod)();
    };

    struct Class {
      struct ClassVTable *virt;
      int firstMember;
      int secondMember;
    };

and a pointer to it looks like:

    struct Class *objectRef;

A language that uses fat pointers, like rust, has this separated completely:

    struct Class {
      int firstMember;
      int secondMember;
    };

    struct FatPointerToClass {
      struct ClassVTable *virt;
      struct Class *data;
    };

    // note the lack of a pointer; the fat pointer itself is stored on the stack directly, e.g.:
    struct FatPointerToClass objectRef;

This means a much simpler object layout in exchange for passing around larger
pointers, and it's a good fit for trait-based typing since it doesn't require
you to know all possible interface subtypes of the object in order to describe
or use its layout.

(please note that I'm using C struct to be precise about the concept, and this
should not be taken as a perfectly verbatim description of the actual object
layouts in memory)

~~~
Coding_Cat
Oh, I didn't know that: this is also great for serialization code, where you
might have an array of `Class`; it would still be a POD type, to use C++
terminology.

------
eberkund
Yes!! I have been waiting for the new time helpers to arrive in stable. It may
be a small thing but is so much nicer and makes Rust feel more like a higher
level language when I can do `some_time.subsec_millis()` rather than
`some_time.subsec_nanos() / 1_000_000`.
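
For illustration, a minimal before/after of my own with the newly stabilized helper:

    use std::time::Instant;

    fn main() {
        let start = Instant::now();
        // ... do some work ...
        let elapsed = start.elapsed();
        let old_way = elapsed.subsec_nanos() / 1_000_000; // pre-1.27 boilerplate
        let new_way = elapsed.subsec_millis();            // stabilized in 1.27
        assert_eq!(old_way, new_way);
        println!("{}.{:03}s", elapsed.as_secs(), elapsed.subsec_millis());
    }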

~~~
oddity
I don't understand your comment. It feels like a higher level language because
you can use a different unit for measuring time?

~~~
pimeys
Having written enough boilerplate to get millisecond timestamps, or pulled in
the quite large chrono library just for a nicer interface to a few time
operations, I find these additions quite luxurious.

------
kibwen
SIMD! For anyone curious about the performance impact of this feature and a
real-world implementation, check out the PR adding support to the regex
library: [https://github.com/rust-lang/regex/pull/456](https://github.com/rust-lang/regex/pull/456)

~~~
burntsushi
And just because this kind of thing is fun, if you use the right kind of
pattern on a big enough file, SIMD can be quite noticeable:

    $ rg-with-simd --version
    ripgrep 0.8.1 (rev 223d7d9846)
    +SIMD +AVX
    $ rg-without-simd --version
    ripgrep 0.8.1
    -SIMD -AVX

    $ time cat OpenSubtitles2016.raw.en > /dev/null
    real    0m1.280s
    user    0m0.020s
    sys     0m1.257s
    $ time wc -l OpenSubtitles2016.raw.en
    336602465 OpenSubtitles2016.raw.en
    real    0m4.303s
    user    0m3.132s
    sys     0m1.167s
    $ time rg-with-simd -c 'Sherlock Holmes|John Watson|Professor Moriarty' OpenSubtitles2016.raw.en
    6033
    real    0m2.099s
    user    0m1.750s
    sys     0m0.347s
    $ time rg-without-simd -c 'Sherlock Holmes|John Watson|Professor Moriarty' OpenSubtitles2016.raw.en
    6033
    real    0m4.128s
    user    0m3.781s
    sys     0m0.343s
    $ time rg-with-simd -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2016.raw.en
    6731
    real    0m1.989s
    user    0m1.621s
    sys     0m0.366s
    $ time rg-without-simd -c 'Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty' OpenSubtitles2016.raw.en
    6731
    real    0m18.417s
    user    0m18.000s
    sys     0m0.403s

Looks like `cat` is still faster, so there's some room for improvement. ;-)
With a single pattern, we're almost there:

    $ time rg -c 'Sherlock Holmes' OpenSubtitles2016.raw.en
    5107
    real    0m1.333s
    user    0m0.974s
    sys     0m0.357s

This one is mostly thanks to glibc's memchr implementation (which uses SIMD,
of course) and the regex crate's frequency-based searcher.
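
For the curious, the trick is roughly this, sketched here with the memchr crate (the byte choice and wiring are my illustration, not the regex crate's actual code): scan for a rare byte of the literal with a SIMD-accelerated memchr, and only verify the full match at candidate positions.

    extern crate memchr; // memchr = "2" in Cargo.toml (assumed)

    fn main() {
        let haystack = b"it was John Watson, not Sherlock Holmes";
        let mut start = 0;
        // Find each occurrence of the rare byte 'S', then confirm the literal.
        while let Some(i) = memchr::memchr(b'S', &haystack[start..]) {
            let pos = start + i;
            if haystack[pos..].starts_with(b"Sherlock Holmes") {
                println!("match at byte {}", pos);
            }
            start = pos + 1;
        }
    }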

Of course, I'm presenting best cases here. Plenty of inputs can make ripgrep
run quite a bit more slowly than what's shown here!

The crazy thing is that we're still only barely scratching the surface. Check
out Intel's Hyperscan project for some truly next level SIMD use in regex
searching!

~~~
vvanders
Uf da, that's still some good speedups.

BTW as a happy daily user of rg thanks for all the work you put into it,
definitely shows.

------
harry8
Anyone happen to know when inline asm will hit stable?

~~~
roblabla
There are a _lot_ of inline-asm related bugs in the compiler. I'm hitting them
almost daily (I'm writing a kernel, so a fair bit of asm is going on here).
You can see them in the bug tracker[0].

Stabilizing inline asm in its current form would be a mistake. It's simply not
ready.

[0]: [https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Ais...](https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AA-inline-assembly)

~~~
harry8
"not ready, bugs." Is a perfectly good reason to hold it back. Thanks.

------
rl3
Any idea what SIMD in Rust via WebAssembly will look like?

~~~
steveklabnik
Wasm doesn’t support SIMD yet, so no way to tell.

~~~
rl3
Thanks, wasn't able to easily discern how far along the browser SIMD wasm
implementations were.

Do you see a future where wasm has its own std::arch module?

~~~
steveklabnik
Yeah, SIMD is on the wasm roadmap; we’ll see what form that takes for exact
details, but that’s what I’d expect.

