None of the unsafe code that Steve linked to had anything to do with searching a file line by line. It has to do with other parts of ripgrep, like determining whether a tty is available or communicating with a Windows console to do coloring. (Hell, ripgrep doesn't even require memory maps, but they are faster in some cases.)

Your benchmark proposal is interesting on the surface, but if done correctly, its speed will come down to highly optimized SIMD routines. For example, in Go, the code to search for a single byte on amd64 is written in Assembly (as it is in glibc too). This means it's probably not a good indicator of language performance overall.
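
For the curious, the analogous routine is available to Rust programs via the `memchr` crate. A minimal sketch, assuming the crate as a dependency:

    extern crate memchr;

    use memchr::memchr;

    // memchr scans with vectorized code, much like the hand-written
    // assembly in Go's runtime and in glibc.
    fn first_newline(haystack: &[u8]) -> Option<usize> {
        memchr(b'\n', haystack)
    }

    fn main() {
        assert_eq!(first_newline(b"abc\ndef"), Some(3));
    }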


The thing is, I don't really care if the Go implementation is highly optimized for amd64, just like memchr in C, which is also written in assembler and optimized for different platforms. What I care about is that simple code written by me is faster without my going into C/unsafe code myself. So it's correct, fast, and simple, and I do not pay with my time to figure out how to make it as fast in Rust. This is the point I am making. Of course this is only one example, but it's still proof that what OP wrote is not valid in all cases.


ripgrep is your proof. Go read the blog post Steve linked. If you still don't believe it, read the code. There's not that much of it.

Then compare it with similar tools written in Go, like sift and the platinum searcher.

The problem is, searching a file quickly is not as simple as you want to believe. If you show me a naive line-by-line approach, I'll show you an approach that is much faster but more complex. Top speed requires work, sometimes algorithmic work, regardless of the language you choose.


> If you show me a naive line-by-line approach, I'll show you an approach that is much faster but more complex.

Of course; I've done it myself many times. But that is NOT the point, as I've already written. The point is that I wrote a naive approach in both languages and it's a lot faster in Go. Which is a reply to what OP wrote (don't forget where this discussion started). In this case that is the fact, and I don't see any reason to fight with facts if we take bias out of the equation. In other cases? I don't know. What I would expect is that a naive search for one string in a haystack would be faster in a performance/zero-cost-abstraction-oriented language like Rust than in Go, but it's false in this case. That doesn't mean it's false in every case. And to be honest, writing "but they have a faster implementation in assembler" is not an excuse at all; you can also write your own, especially since it works so well for those languages that have custom asm for specific platforms. In the end the average Joe will not care that it's hand-written assembler; he will care that his naive solution, using the standard library without any magic, is just faster.


> The point is that I wrote naive approach in both languages and it's a lot faster in Go.

I tried your challenge, and the first data point I uncovered contradicts this. Here is the source code of both programs: https://gist.github.com/anonymous/f01fc324ba8cccd690551caa43...

The Rust program doesn't use unsafe, doesn't explicitly use C code, is shorter than the Go program, is faster in terms of CPU time, and uses less memory. I ran the following:

    $ /usr/bin/time -v ./lossolo-go /tmp/OpenSubtitles2016.raw.sample.en the
    $ /usr/bin/time -v ./target/release/lossolo-rust /tmp/OpenSubtitles2016.raw.sample.en the
Both runs report 6,123,710 matching lines (out of 32,722,372 total lines). The corpus is ~1GB and can be downloaded here (266 MB compressed): http://burntsushi.net/stuff/OpenSubtitles2016.raw.sample.en....

My /tmp is a ramdisk, so the file is in cache and I'm therefore not benchmarking disk reads. My CPU is an Intel i7-6900K.

The Go program takes ~6.5 seconds and has a maximum heap usage of 7.7 MB. The Rust program takes ~4.2 seconds and has a maximum heap usage of 6 MB. (As measured by GNU time using `time -v`.)
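
For reference, since the gist URLs are truncated here, this is a reconstruction of roughly what the naive Rust program looks like (the argument handling is illustrative):

    use std::env;
    use std::fs::File;
    use std::io::{BufRead, BufReader};

    fn main() {
        // Count the lines of <file> that contain <query>.
        let mut args = env::args().skip(1);
        let path = args.next().expect("usage: prog <file> <query>");
        let query = args.next().expect("usage: prog <file> <query>");
        let rdr = BufReader::new(File::open(path).unwrap());
        let mut count = 0u64;
        for line in rdr.lines() {
            // `lines()` allocates a fresh String per line and validates UTF-8.
            if line.unwrap().contains(&query) {
                count += 1;
            }
        }
        println!("{}", count);
    }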

---

IMO, both programs reflect "naive" solutions. The point of me doing this exercise is to show just how silly this is, because now we're going to optimize these programs, but we'll limit ourselves to smallish perturbations in order to put a reasonable bound on the task.

If I run the Go program through `perf record`, the top hotspot is runtime.mallocgc. Now, I happen to know from experience that Scanner.Text is going to allocate a new string while Scanner.Bytes will not. I also happen to know that the Go standard library `bytes` package recently got a nice optimization that makes bytes.Contains as fast as strings.Contains: https://github.com/golang/go/commit/44f1854c9dc82d8dba415ef1...

Since reading into a Go `string` doesn't actually do any UTF-8 validation, we don't lose anything by switching to using raw bytes.

Knowing this, we can tweak the Go program to great effect: https://gist.github.com/anonymous/c98dc8f6be6d414ae3e7aa6931...

Running the same command as above, we now get a time of ~2.3 seconds and a maximum heap usage of 1.6 MB. That's impressive.

Now let's see if we can tweak Rust, which is now roughly twice as slow as the Go program. Running perf, it looks like there's an even split between allocation, searching and UTF-8 validation, with a bit more towards searching. Like the Go program, let's attack allocation. In this case, I happen to know that the `lines` method returns an iterator that yields `String` values, which implies that it's allocating a fresh `String` for every line, just like our Go program was. Can we get rid of that? The BufReader API provides a `read_line` method, which permits the caller to control the `String` allocation. If we use that, our Rust program is tweaked to this: https://gist.github.com/anonymous/a6cf1aa51bf8e26e9dda4c50b0...

It's not quite as symmetrical a change as the one we made to the Go program, but it's pretty straightforward IMO. Running the same command as above, we now get a time of ~3.3 seconds and a maximum heap usage of 6 MB.
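
Sketched in full, that tweak looks roughly like this (variable names are mine; the gist is authoritative):

    use std::env;
    use std::fs::File;
    use std::io::{BufRead, BufReader};

    fn main() {
        let mut args = env::args().skip(1);
        let path = args.next().unwrap();
        let query = args.next().unwrap();
        let mut rdr = BufReader::new(File::open(path).unwrap());
        let mut line = String::new();
        let mut count = 0u64;
        loop {
            line.clear(); // reuse one String allocation across all lines
            if rdr.read_line(&mut line).unwrap() == 0 {
                break; // read_line returns Ok(0) at EOF
            }
            if line.contains(&query) {
                count += 1;
            }
        }
        println!("{}", count);
    }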

OK, so we're still slower than the Go program. Looking at the profile again, the time now seems split completely between searching and UTF-8 validation. The allocation doesn't show up at all any more.

Is this where you got stuck? The next step from here isn't straight-forward because getting rid of the UTF-8 validation isn't possible to do safely while still using the String/&str search APIs. Notably, Rust's standard library doesn't provide a way to search an `&[u8]` directly using optimized substring search routines. Even if you knew your input was valid UTF-8 before hand, there's no obvious place to insert an unsafe `from_utf8_unchecked` because the BufReader itself is in control of producing the string contents. (You could do this by switching to using `BufReader.read_until` and then transmuting the result into an &str, but that would require unsafe.)
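
To make that parenthetical concrete, here is roughly what the unsafe escape hatch would look like, shown only to illustrate why it requires `unsafe` (using `from_utf8_unchecked` rather than a literal transmute):

    use std::env;
    use std::fs::File;
    use std::io::{BufRead, BufReader};
    use std::str;

    fn main() {
        let mut args = env::args().skip(1);
        let path = args.next().unwrap();
        let query = args.next().unwrap();
        let mut rdr = BufReader::new(File::open(path).unwrap());
        let mut buf = Vec::new();
        let mut count = 0u64;
        loop {
            buf.clear();
            // read_until hands us raw bytes and skips UTF-8 validation.
            if rdr.read_until(b'\n', &mut buf).unwrap() == 0 {
                break;
            }
            // SAFETY: sound only if the input is known to be valid UTF-8.
            let line = unsafe { str::from_utf8_unchecked(&buf) };
            if line.contains(&query) {
                count += 1;
            }
        }
        println!("{}", count);
    }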

Let's take a leap. Rust's regex library has a little-known feature: it can actually search the contents of an &[u8] directly. Rust's regex library isn't part of the standard library, but it is maintained as an official crate by the Rust project. If you know all of this, then it's possible to tweak the Rust program just a bit more to regain the speed lost to UTF-8 checking: https://gist.github.com/anonymous/bfa42d4f86e03695f3c880aace...

Running the same command as above once again, we now get a time of ~2.1 seconds and a maximum heap usage of 6.5 MB.
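
The shape of that final program is roughly this (assuming the query is a literal, so it gets escaped before compiling; the gist is authoritative):

    extern crate regex;

    use std::env;
    use std::fs::File;
    use std::io::{BufRead, BufReader};

    use regex::bytes::Regex; // the `bytes` API searches &[u8] directly

    fn main() {
        let mut args = env::args().skip(1);
        let path = args.next().unwrap();
        let query = args.next().unwrap();
        // Escape the literal so regex metacharacters in the query are inert.
        let re = Regex::new(&regex::escape(&query)).unwrap();
        let mut rdr = BufReader::new(File::open(path).unwrap());
        let mut buf = Vec::new();
        let mut count = 0u64;
        loop {
            buf.clear();
            // Raw bytes in, no UTF-8 validation anywhere.
            if rdr.read_until(b'\n', &mut buf).unwrap() == 0 {
                break;
            }
            if re.is_match(&buf) {
                count += 1;
            }
        }
        println!("{}", count);
    }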

In sum, we've beaten Go in CPU time, but lost the battle for memory and the battle for obviousness. Beating Go required noticing the `read_until` API of BufReader and knowing that 1) Rust's regexes are fast and 2) they can search &[u8] directly. It's not entirely unreasonable, but to be fair, I've done this without explicitly using any unsafe or any C code.

None of this process was rocket science. Both the Go and Rust programs were initially significantly sub-optimal because of allocation, but after some light profiling, it was possible to speed up both programs quite a bit.

---

Compared to the naive solution, some of our search tools can be a lot faster. Performing the same query on the same corpus:

    ripgrep    1.13 seconds, 7.7 MB
    ripgrep    1.35 seconds, mmap
    GNU grep   1.73 seconds, 2.3 MB
    ag         1.80 seconds, mmap
    pt         6.41 seconds, mmap
    sift      50.21 seconds, 16.6 MB
The differences between real search tools and our naive solution actually aren't that big here. The reason is your initial requirement that the query match lots of lines: lots of matches result in a lot of overhead. If we change the query to a more common type of search that produces very few matches (e.g., `Sherlock Holmes`), then our best naive programs drop down to about ~1.4 seconds, but ripgrep drops to about 200 milliseconds.

From here, the next step would be to stop parsing lines and start searching the entire buffer directly. (I hope to make even this task very easy by moving some of the searching code inside of ripgrep into an easy-to-use library.)
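
A rough sketch of that next step, assuming the file fits in memory and leaning on the third-party `twoway` crate for the &[u8] substring search that std doesn't provide:

    extern crate twoway;

    use std::fs::File;
    use std::io::Read;

    use twoway::find_bytes;

    // Count matching lines by searching the whole buffer, snapping each
    // hit forward to the end of its line so a line is counted once.
    fn count_matching_lines(haystack: &[u8], needle: &[u8]) -> u64 {
        let (mut count, mut pos) = (0u64, 0usize);
        while let Some(i) = find_bytes(&haystack[pos..], needle) {
            let hit = pos + i;
            pos = haystack[hit..]
                .iter()
                .position(|&b| b == b'\n')
                .map(|j| hit + j + 1)
                .unwrap_or(haystack.len());
            count += 1;
        }
        count
    }

    fn main() {
        let mut data = Vec::new();
        File::open("/tmp/OpenSubtitles2016.raw.sample.en")
            .unwrap()
            .read_to_end(&mut data)
            .unwrap();
        println!("{}", count_matching_lines(&data, b"the"));
    }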

---

In sum, your litmus test essentially comes down to these trade-offs:

- Rust provides a rich API for its String/&str types, which are guaranteed to be valid UTF-8.

- Rust lacks a rich substring search API in the standard library for Vec<u8>/&[u8] types. Because of this, efficient substring search using only the standard library has an unavoidable UTF-8 validation cost in safe code.

- Go doesn't do any kind of UTF-8 checking and provides mirrored substring search APIs between its `bytes` and `strings` packages.

- The actual performance of searching in both programs probably boils down to optimized SIMD algorithms. Therefore, once you get past the ability to search each line of a file with minimal allocation, you've basically hit a wall that's probably the same in most mainstream languages.

In my opinion, these trade-offs strike me as terribly specific, and probably not usefully generalizable. More than that, in the naive case, Rust is doing you a good service by checking that your input is valid UTF-8, which is something that Go doesn't do. I think this could go either way, but I think it's uncontroversial that guaranteeing valid UTF-8 up front like this probably eliminates a few possibly subtle bugs. (I will say that my experience with text encoding in Go has been stellar, though.)

Most importantly, both languages at least have a path to writing a very fast program, which is often what most folks end up caring about at the end of the day.


... whoa that's a comprehensive comment.

Do you think you could refactor out bytestring-based string manipulation into its own library? Even better would be something that worked for all encodings (using https://github.com/servo/tendril or something)


> Do you think you could refactor out bytestring-based string manipulation into its own library?

IIRC, someone was working on making the Pattern trait work on &[u8], but I'm guessing that work is stalled.

To factor it out into a separate crate means copying the &str substring routines, since there's no way to safely use them on an &[u8] from the standard library. (bluss did that in the `twoway` crate, so we could just use that.)

It does seem like a plausible thing to do, at least until std gets better &[u8] support.
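
As a sketch, the factored-out crate could start as little more than a wrapper over `twoway`:

    extern crate twoway;

    /// Substring search on raw bytes; no UTF-8 validation required.
    pub fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
        twoway::find_bytes(haystack, needle).is_some()
    }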

> Even better would be something that worked for all encodings

I suspect the standard practice here is something like "transcode to UTF-8 and then search the UTF-8." (This is what I hope to do with ripgrep.)

> (using https://github.com/servo/tendril or something)

I don't think I know what problems tendril is solving, so it's not clear to me what its role is.


Whoa, amazing comment. I'm usually just a silent reader on Hacker News, but this comment urged me to create an account. I think lossolo just wants a flamewar. He already has an opinion which you cannot easily change, so any further discussion after this comment will be pointless.

EDIT: and how can you have time to write this? I usually just close the browser tab when this situation occurs...


Thanks! I like to think I usually close the browser tab, but text search is just kinda my thing. :-)


> I think lossolo just wants a flamewar. He already has an opinion which you cannot easily change, so any further discussion after this comment will be pointless.

My opinion in this case is empirically verified. I am not saying this to start a flame war; I am just sharing my observations from a particular example. I could also say that Rust's regex implementation beats Go's regex implementation by orders of magnitude (performance-wise), and that would also be true, and I would not be looking for a flame war in that case either. I am only sharing my experience, which I've backed up with proof (code + perf results) for that particular use case. This is a factual discussion; I don't agree it's pointless.


Best example of evidence-based rebuttal I've seen in a language argument on here in a long time. Great write-up!


Amazing! Thank you for your comment.


On my machine I get different times than you for both naive solutions.

It's 2.6 seconds for Go and 3.5 seconds for Rust, both perf results and code here: http://pastebin.com/WwhvHH6S


Your Rust program corresponds to my second Rust program.

Your Go program is not what I would expect. A bufio.Scanner is the idiomatic (and naive) way to read lines in Go. But this is immaterial. Your results are consistent with mine. They aren't different. I just included more analysis and more programs to provide proper context to this challenge of yours.


I gave you a naive solution; now let's see an amateurish solution (from someone totally new to both languages):

Rust 4.6s

Go 3.1s

http://pastebin.com/r6K22Dt2

EDIT:

Using grep 0.6s

Then I installed ripgrep and...

Using ripgrep 0.4s

Really nice, burntsushi. I am surprised by those (ripgrep) results compared to grep.


Seems I can't reply to your other comment, so I'll reply here. How can you say that my naive implementation is not naive? What is not naive about it? It's very naive. It's basically the same naive code that you were writing, but in actual idiomatic Rust with the linting issues fixed.

Using a `lines()` approach is naive because that allocates owned heap-allocated strings. An optimal, non-naive solution would not use heap allocation for buffering lines but use a stack-allocated array. That alone would bring significant speedup versus the line approach.
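
To sketch what I mean (illustrative only: this just counts newlines with a stack buffer; a real substring search would also have to handle matches that straddle chunk boundaries, and the corpus path is an example):

    use std::fs::File;
    use std::io::Read;

    fn main() {
        let mut f = File::open("/tmp/OpenSubtitles2016.raw.sample.en").unwrap();
        let mut buf = [0u8; 64 * 1024]; // lives on the stack; no heap traffic
        let mut newlines = 0u64;
        loop {
            let n = f.read(&mut buf).unwrap();
            if n == 0 {
                break; // EOF
            }
            newlines += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
        }
        println!("{} lines", newlines);
    }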

As for ripgrep, it's a pretty comprehensive application that makes use of SIMD/AVX so it's only natural that it's fast.


Your solution is not naive in my opinion, because you set the size of the buffer and use map/filter, but OK... let's check. Your solution is the slowest of all the solutions.

It took 4.6s, which only confirms what I wrote at the beginning when we started this discussion. Perf counters for your solution are here:

http://pastebin.com/Anak1ahe


Your Rust example is kind of odd. Did you not see the linting errors in your editor or from the compiler? You can basically boil it down to just two lines of Rust.

https://gist.github.com/mmstick/a8316ba0514f9d9ab33b18fa9b91...

As for timing, I'm doubtful that Go is any faster than Rust. I don't have this www.js file so I can't test it on my laptop, but I'm pretty sure you didn't even attempt to do things like enabling LTO, setting O3, disabling jemalloc, and using the musl target. All these things can make significant differences to the performance of the binary that's produced.
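
For reference, two of those knobs are just Cargo profile settings; something like this in Cargo.toml:

    [profile.release]
    lto = true       # link-time optimization across the whole crate graph
    opt-level = 3    # cargo's default for release builds, shown explicitly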


I don't know if you noticed, but we are discussing naive solutions, which I've mentioned a couple of times in previous posts. The code you linked is not a naive solution. The things you proposed are not naive either.


Here's the thing: it is a naive solution. You may not want to accept that because it contradicts your claim of the Go solution being faster, but at that point it becomes a he-said/she-said kind of scenario, because I can claim that your Go solution isn't naive as well and have exactly the same "validity" for such a claim as you do.


Anyway, this really doesn't matter, as his solution is the slowest of all, which confirms what I wrote and contradicts what he wrote.

For me it's not a naive solution. Do you have any proof that it is? Can you mathematically prove that it's a naive solution? I know I can't. A naive solution is something different for everyone; what I saw in burntsushi's reply and what I wrote myself is the closest to what I think naive solutions are.

> like enabling LTO, setting O3, disabling jemalloc, and using the musl target.

And these are for sure not part of a naive solution either.

> You may not want to accept it because that contradicts your claim of the Go solution being faster

This is not true. You can find perf numbers for his solution in my second reply to him. Or you can compare the solutions yourself.


> For me it's not naive solution. Do you have any proof it is? Can you mathematically prove that it's naive solution?

You can't prove a negative.


> but still a proof that what OP wrote is not valid in all cases.

You are not proving anything until you post your Go code, which you don't seem to have done (please correct me if I'm wrong, I am legitimately curious to see for myself how Go and Rust stack up against each other for this problem). Until then, all you're doing is making vague claims backed up by precisely zero evidence. Why should anyone take you seriously?


I have posted the code in reply to burntsushi. You can check it out yourself.


I've seen several of your comments insist on a "naive implementation". This seems strange to me. Why require a naive implementation and then be concerned over slight differences in performance?


Yes, and I was saying "even in a production-grade, world-class grep implementation, there is barely any unsafe." I also acknowledged that that was different from what you were asking about.

When did you ask on IRC? I'll take a look.


> I told you NO unsafe code. Use only standard library, no third party tools.

I'm not sure what such an example would prove, other than maybe "Go is better than a subset of Rust excluding some of the core features of the language and its greater ecosystem".


> no third party tools

This is a tricky requirement. The standard libraries of both languages use a crap ton of unsafe code. You might end up just asking for a comparison of which standard library is bigger.

But at the end of the day, both languages have perfectly standard buffered readers (https://doc.rust-lang.org/std/io/struct.BufReader.html); shouldn't a simple search like this compile to the exact same code?



