Hacker News new | past | comments | ask | show | jobs | submit login
Making Rust as Fast as Go (christianfscott.com)
304 points by chrfrasco 34 days ago | hide | past | web | favorite | 198 comments



Note that Rust and Go programs differ in a seemingly insignificant, but actually important detail.

Rust does this

    next_dist = std::cmp::min(
        dist_if_substitute,
        std::cmp::min(dist_if_insert, dist_if_delete),
    );
Go does this

    nextDist = min(
         distIfDelete, 
         min(distIfInsert, distIfSubstitute)
    )
The order of minimums is important for this dynamic programming loop. If I change Rust version to take minimums in the same order (swapping substitute and delete), runtime drops from 1.878696288 to 1.579639363.

I haven't investigated this, but I would guess that this is the same effect I've observed in

* https://matklad.github.io/2017/03/12/min-of-three.html

* https://matklad.github.io/2017/03/18/min-of-three-part-2.htm...

(reposting my comment from reddit, as it's a rather unexpected observation)


Those are two really well written articles -- taking a complicated topic and making it very accessible. Thanks! FWIW I think a companion article on how to effectively use perf for tasks like this would be a great addition, since it can be a bit novice-unfriendly.


I think they made the rust version same as Go because I cloned it just now and they are both the same. Also, Thank you soo much for the blog posts! :)


Given this information, and for general parsing functionalities, which one is faster, Go or Rust?


As always, it depends on what your goals are. String processing is usually not the long pole when you're building something that consumes the output of a parser. Based on micro and macro benchmarks, Rust is typically substantially faster than Go and pretty much always uses less RAM [1].

But again, depends what you're doing with the output, and if these deltas even matter in your context.

[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


You can write a parser as far as the limits of your imagination allow in Rust, or C, or any other baremetal language without a thick run-time.

In Go, beyond the limits of your imagination, you'll also hit other limits, like those of the garbage collector.


Very few things have ever been measured accurately to ten significant digits. How much did these numbers change run to run? How many measurements did you take? Were the caches warmed up similarly? Still, point taken.


The excessive "precision" is because I've just copy-pasted what the original bench printed.

As for "is this a reliable result", I believe I've performed diligence, appropriate for a HN comment, to make sure that this is not a completely random result. As I've said, I did not investigate this particular bit of code thoroughly. You are welcome to read the linked blog posts, which study the issue in depth.


Since min(a, min(b,c)) == min(b, min(a,c)), perhaps the compiler should be smart enough to swap the comparisons around if it makes it quicker?


I suspect that statement is not true for floats. Possibly you don’t get the same float from min(0,-0) as min(-0,0), and similarly with NaNs. Rust specifies that if one input is NaN then the other is returned but doesn’t say what happens if both are NaN.


> Rust specifies that if one input is NaN then the other is returned but doesn’t say what happens if both are NaN.

It does. If both are NaN a NaN is returned. Note, however, that when Rust says that a NaN is returned, this means that any NaN can be returned. So if you have min(NaN0, NaN0) the result isn't necessarily NaN0, it might be another NaN with a different payload.


Right. That’s what I was getting at but I didn’t know the NaN-return rule.


Hmm, not sure what the llvm semantics look like, but you're right about the assembly semantics, vminsd (the instruction used in the post) is unfortunately not symettric in it's handling on NaN.

https://www.felixcloutier.com/x86/minsd


NaN's are rarely required on the fast path of any code... The compiler ought to make a 2nd slow implementation for if any NaNs are seen.


Sure, but then it needs to add a branch to detect the inputs, and maybe keep around some state to switch between implementations at run-time.

Its far from clear that doing that is, in general, worth it.


Part of the problem may be they re-implemented `std::cmp::min` at the bottom of the file, I wonder if there's a more optimized version in the stdlib.


One of the most frustrating parts of rust (for me), is that `std::cmp::min` (and some other methods) require that their arguments are `Ord` (totally ordered), and floats are only partially order because of NaN, so you can't use std::cmp::min on floats.



I do believe that's quite intentional, due to the inexactness of floating-point values, it's not "right" to do that. There should be a `partial_min` function though, IMO, which has weaker guarantees.


It's absolutely intentional, and it's got being technically correct on its side, it's just frustrating.


Not really. That Rust forced total order and partial order into a sub-typing relation is a completely unnecessary, self-inflicted wound.


Many folks do consider this a mistake. Some do still think it was the right call. I'd say it's very unclear.


You can use the ordered_float crate. It has a newtype that cannot hold NaNs and thus impls Ord.


Depending on what behavior you want, there’s a bunch of wrapper crates.


Yeah, it makes no sense to be the default. Code that expects to treat NaNs is very rare.


Safety is the whole idea behind Rust, and if you draw the line here, that's neither in line with the Rust ethos, nor particularly valuable. After all, code that "expects to handle" null was pretty rare too ;)


NaNs have nothing to do with safety (same for nulls).

It is a common misconception to conflate safety with functional expectations. A program that only calls panic() is perfectly safe.


Ah sorry, I didn't mean safety in the UB sense, I meant in the traditional, "do what I expect, don't surprise me" sense.


Safety in Rust's context is UB-free, memory errors-free, data race-free.

Safety in software engineering is more about designing systems with some degree of assurance against certain failures, but not about surprises or expectations of a programmer.

The usage you call traditional is perhaps common, but not really rooted in anything in software engineering. I'd call it an informal meaning, maybe.

PS. No need for apologies!


Safety as in unsafe blocks means what you say. But safety as in why so many people use rust is actually what arcticbull says. I think this is something that gets lost on a lot of people who don't use rust. The first form of safety is properly called memory safety.

Rust as a language isn't just designed to avoid undefined behavior, it's designed to make you write correct code, where correct means it does what you want it to. Obviously rust doesn't always succeed at that broader goal, but it actually does a pretty good job all things considered.

Arcticbull's description of rust's ethos is spot on.


Hey all, as some keen-eyed commenters have pointed out, it looks like the rust program is not actually equivalent to the go program. The go program parses the string once, while the rust program parses it repeatedly inside every loop. It's quite late in Sydney as I write this so I'm not up for a fix right now, but this post is probably Fake News. The perf gains from jemalloc are real, but it's probably not the allocators fault. I've updated the post with this message as well.

The one-two combo of 1) better performance on linux & 2) jemalloc seeming to fix the issue lured me into believing that the allocator was to blame. I’m not sure what the lesson here is – perhaps more proof of Cunningham’s law? https://en.wikipedia.org/wiki/Ward_Cunningham#Cunningham's_L...


Thanks for following up. Just as an FYI, there's a few bugs in your implementation, the most obvious one is the use of ".len()" in a number of places interspersed with ".chars().count()". These two return different values. ".len()" returns then number of UTF-8 bytes in the input string, which for ASCII is the same as ".chars().count()" obviously, but if you do attempt any Unicode characters, your function won't work. ".chars()" provides Unicode Scalar Values (USVs) -- which is a subset of code points, excluding surrogate pairs [1]. Note also this is not the same as a Go rune, which is a code point including surrogate pairs.

Secondly, you re-implemented "std::cmp::min" at the bottom of the file, and I'm not sure if the stdlib version is more optimized.

Lastly, well, you caught the issue with repeated passes over the string.

I've fixed the issues if you're curious: https://gist.github.com/martinmroz/2ff91041416eeff1b81f624ea...

Unrelated, I hate the term "fake news" as it's an intentional attempt to destroy the world public's faith in news media. It's a cancer on civilized society. Somewhere your civics teacher is crying into some whiskey, even though of course you're joking.

[1] http://www.unicode.org/glossary/#unicode_scalar_value


This doesn't even begin to get into the question of what Levenshtein Distance even means in a Unicode context. What's the Levenshtein Distance of 3 emoji flags? I suppose we should be segmenting by grapheme clusters and utilizing a consistent normalization form when comparing, but Rust has no native support for processing grapheme clusters -- or for normalizations I believe. The UnicodeSegementation crate might help.

Based on some cursory research, the go version differs in a more subtle way too. A Rune is a Code Point, which is a superset of the Rust "char" type; it includes surrogate pairs.


Levenstein (edit) distance is fundamentally an information-theoretical concept defined on bitstreams, as insertions/deletions/swaps of individual bits within a stream. It has a lot in common with error-correcting codes, fountain codes, and compression, which all also operate on bitstreams.

Any higher-level abstract mention of Levenstein distances (e.g. of Unicode codepoints) is properly supposed to be taken to refer to the Levenstein distance of a conventional (or explicitly specified) binary encoding of the two strings.


Can you point to a source that defines Levenstein distance as only referring to bitstreams?

A translation of the original article [1] that introduced the concept notes in a footnote that "the definitions given below are also meaningful if the code is taken to mean an arbitrary set of words (possibly of different lengths) in some alphabet containing r letters (r >= 2)".

And if you wish to strictly stick to how it was originally defined, you'd need to only use strings of the same length.

More recent sources [2] say instead "over some alphabet", and even in the first footnote, describe results for "arbitrarily large alphabets"!

[1] https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf

[2] https://arxiv.org/pdf/1005.4033.pdf


And Unicode is the biggest alphabet haha.


The problem is there's not a "conventional" binary encoding of Unicode strings. You can create the same exact character many different ways, from a single scalar value to a composition of multiple scalar values. There's also multiple different ways of ordering different pieces of a composite character that yield the same value. Would we not want to utilize a consistent decomposition and consistent logical segmentation for the string? It's no longer enough to iterate over a string one byte at a time and derive any meaning. Is it right that the Levenstein distance between 2 equal characters, "a" and "a", might be 12, simply because the joiners were ordered differently?

It seems like segmentation by grapheme cluster and comparison using a consistent normalization would provide the same logical answer as a classic byte-wise Levenshtein distance on an ASCII string. [1]

Or are you suggesting that's too high level and we should just consider this to be operating on bit strings that happen to be grouped into bytes, and not worry about the logical implications. Therefore we'd just use a consistent normalization form on both input strings, and it's okay that the distance is up to like 10-15 for a single character difference in a composite character and 1 in an ASCII character. That sounds totally reasonable too, just different.

[1] http://unicode.org/reports/tr15/


> Any higher-level abstract mention of Levenstein distances (e.g. of Unicode codepoints) is properly supposed to be taken to refer to the Levenstein distance of a conventional (or explicitly specified) binary encoding of the two strings.

This doesn’t match any definition of Levenshtein distance that I’ve ever encountered. I’ve always seen it defined in terms of strings over some alphabet, and the binary case is just what happens when your alphabet only has two symbols in it.

Quite naturally the problem with Unicode strings is that there is are multiple ways to treat them as sequences. One obvious way is to treat them as a sequence of Unicode scalar values, but that’s by no means what you’d want—maybe a sequence of grapheme clusters may be more appropriate, and you also may wish to consider normalization.


(related to the unrelated part) what if the media is corrupt? I mean, independently on current events (I really don't want to enter that here) we do live in a world where very few amoral corporations own most of the media industry.

If we (correctly) rely on the media to bring to public attentions relevant facts (both criminal and non-criminal) and keep a watchful eye on the nation who then keeps a watchful eye on the media?

is the model entirely based on always being there enough good journalist to spot the bad ones? how is this affected by the very precarious economics of current internet ads-based venture-funded media enterprises?

I just blurted too many questions... what I am trying to say is that similarly with the police there is not as easy answer in shoud-trust should-not-trust (in the US a supreme Court judge advised to "not talk to the police").

in that case I guess part of the problem is that the job of the police can be miscontrued as "arresting people". in the same way the job of a journalist can be miscontrued as "getting clicks"

overall I don't think we can pass an a priori moral judgement on that term, as essentially represent a statement that the default safety measures have failed.

(I want to reiterate that here I try not to intermingle my point with whether I believe or not that the current use is warranted, I am just trying to say that as a concept it needs to be part of an healthy democracy, the same as some distrust in electoral promises)


"Fake News" entered the realm (not even recently, might I add) of popular misuse. Actually, is there a term for: language/grammar incorrectly used because society has developed a familiar "meme" use?

Common examples:

* Look at this dank "meme".

meme has come to mean "a picture shared on the internet that has words on it".

* Let's [have a] "cheers".

It's a toast. You say "cheers" when you toast.

* You missed Suzie and I's party last night.

It's Suzie and my party. This one is particularly annoying because it's made it way past editors and into writing, screenplay, etc.


If it's the kind of things people say, you want it in screenplays, otherwise Brooklyn 99 would sound like Shakespeare.


> Hey all, as some keen-eyed commenters have pointed out, it looks like the rust program is not actually equivalent to the go program. The go program parses the string once, while the rust program parses it repeatedly inside every loop. […] The perf gains from jemalloc are real, but it's probably not the allocators fault. I've updated the post with this message as well.

I don't know that it would be a gain: Rust is pretty good at decoding UTF8 quickly given how absolutely fundamental that operation is, and "caching" the decoded data would increase pressure on the allocator.

Unless you also changed the interface of the levenshtein to hand it preallocated cache, source and destination buffers (or the caller did that).

edit: to the downvoter, burntsushi did the legwork of actually looking at this[0] and found caching the decoding to have no effect at best, unless the buffers get lifted out of the function entirely, which matches my comment's expectations.

[0] https://news.ycombinator.com/item?id=23059753

> But yes, I did benchmark this, even after reusing allocations, and I can't tell a difference. The benchmark is fairly noisy.


> this post is probably Fake News

It's not Fake News. Fake News is the publication of intentionally false stories. This is just erroneous.

There's a yawning chasm between the two.


"fake news" is an attempt at permanently damaging the American public's faith in the fifth estate. Nobody should ever use that term in the current political climate, ever.


[flagged]


Please don't take HN threads into nationalistic flamewar, or any flamewar.

https://news.ycombinator.com/newsguidelines.html

We had to ask you the same thing once recently already: https://news.ycombinator.com/item?id=22775767. Doing this repeatedly will get you banned here, so please stop.


That's a fair criticism if clumsily put. For the record, I'm Canadian, British and Polish. I called out America as by far the biggest perpetrator at the moment.

So to address the second half of your post, I have indeed heard of "other countries" (just ask USCIS) although being a US resident at the moment I want to speak to what I know, not for anyone else.

Though if we're being super pedantic, there's only one human species on this planet -- species of course being defined as animals who can mate and produce viable offspring.


"That's a fair criticism if clumsily put" - I guess the original point was a shining example of elegance.

USCIS - the same organ that asks women "have you ever been a prostitute" on citizenship tests?

"Though if we're being super pedantic, there's only one human species on this planet -- species of course being defined as animals who can mate and produce viable offspring."

You got me here ;)


If I'm understanding you correctly you're referring to the eligibility criteria at the back of an Adjustment of Status petition, form I-485, which is part of the green card process.

I'm pretty sure the prostitution questions apply to men, too. The question is "Have you EVER engaged in prostitution or are you coming to the United States to engage in prostitution?" (Part 8, Question 35). Credit where due, I suppose, an (albeit small) win for gender equality.

As an aside, you should see what else they ask. That's among the least concerning question -- they ask everyone if they've committed war crimes or trafficked humans recently, or been a member of a communist or nazi party. Or committed any political assassinations (Part 8, Question 48.a).

[1] https://www.uscis.gov/i-485


Don't let the troll drag you into this. It's a pointless distraction.


Indeed, I shouldn't get too far off course here haha, I just happened to know exactly what they were referring to and find that whole section amusing.


Says who?

When news organizations take other news organizations word for it and the story is false, that's fake news. We called it something different back then, but fake news led to the invasion of Iraq. Negligence is sufficient for fake news, malice not required.


Fake News was a term invented around 2015 during the run-up to the 2016 election. It specifically referred to the phenomenon of literally fake news stories, such as rallies and completely false stories about politicians, being published by legitimate-sounding but nonexistent news organizations, using platforms like Facebook to disseminate themselves. Usually these were done from China and Russia.


You're as silly as those people who claim that male and female refer exclusively to sex but not gender. Terms change in meaning and fake news is widely used to refer to false news stories, regardless of reason. You have to accept the way everybody else is using the word.


So, what do you recommend calling what used to be called fake news? And how we ensure that the its replacement isn't co-opted for malicious purposes again?


And then trump redefined it to be about legitimate news sources exclusively. We’ve really been in upside down world for four years.


That's one of the reasons why I bring it up. If we don't stand against this dilution of a powerful term, it will be co-opted - which is exactly what people who want to destroy the trust of the press for their own personal and political ends want to do.


It’s a joke mate


Of course, but it chips away at the Overton window, normalizing it.


The Rust version uses `target.chars().count()` to initialise the cache, while the Go version counts up to `len(target)`. These are not equivalent: the Rust version counts Unicode code points, the Go version counts bytes.

I am confused by the implementations, although I have not spent any time testing them. Both versions contain a mix of code that counts bytes (`.len()` and `len(...)`) and Unicode code points (`chars()` and `[]rune(...)`). My guess is that the implementation might not work correctly for certain non-ASCII strings, but I have not verified this.

Of course, if only ASCII strings are valid as input for this implementation then both versions will be a lot faster if they exclusively operate on bytes instead.


Yep.

Here a Go playground example showing that the result is indeed wrong:

https://play.golang.org/p/vmctMFUevPc

It should output 3 but outputs 5 because each ö is two bytes, len("föö") = 5.

I would suggest using "range" to iterate over the unicode characters.


If you diff föö and f though, it correctly gives an edit distance of 2.

The code is weird because someone knew enough to convert the strings to slices of runes but not enough to use the rune slices consistently. :-/


Not to mention Rune slices are insufficient for things like Flag emoji and Family emoji, which is going to be a ton of separate runes put together. The latter of which, apparently deletes one family member at a time when you hit "backspace".


Oh fab, just when I thought I had a fairly solid understanding of how to handle Unicode strings I learn something else that increases the complexity.

I have nothing but respect and gratitude for people that write good unicode handling libraries, but even then the end developer has to learn a lot just to be aware of what to look out for when handling strings.

Somewhere on github I think, somebody has posted a file with evil Unicode strings.


In general, unicode requires you think differently about strings depending on context. Here's my rule of thumb.

1. If you are transporting a unicode string, reading/writing over the network or to a file, think in terms of UTF-8 bytes. Do not attempt to splice the string, treat it as an atomic unit.

2. If you are parsing a string, think in terms of code points (runes in Go, chars in Rust). A good example would be the Servo CSS parser. [1]

3. If you're comparing/searching/inspecting/sorting a string in code, segment by grapheme clusters and normalize, then do what you came to do. [2]

4. If you're displaying a string, think in terms of pixels. Do not attempt to limit a string by length in "characters" (nee grapheme clusters in the unicode world) but rather measure by what the renderer does with the string. Each character can be a thoroughly arbitrary width and height.

5. If you're building a WYSIWYG editor, there's more to it than I even know myself, but I suggest reading into what Xi did. It's going to be some combination of everything above. [3]

[1] https://github.com/servo/rust-cssparser/blob/master/src/toke...

[2] https://github.com/unicode-rs/unicode-segmentation

[3] https://github.com/xi-editor/xi-editor


> 2. If you are parsing a string, think in terms of code points (runes in Go, chars in Rust). A good example would be the Servo CSS parser. [1]

If all your syntactically meaningful characters are in ASCII you can also use UTF-8 bytes in your parser.

Even if they aren't, no UTF-8 encoding of a character is a substring of the encoding of any other character(s).


Looks like a small bug in the go code, corrected here. Original author should have used rune throughout. https://play.golang.org/p/mGZZMFtMgHH


Yes, especially because that changes it from constant time (strings know their length in bytes) to linear time (counting chars means a loop through the text.)

I was a bit suspicious of the conclusion, but didn’t dig in myself. I imagine this would be a much larger source of difference.


huh, yeah if I switch the rust version to target.len() the execution time drops by more than 10%

edit: and if I switch to source.bytes().enumerate() it drops by 20% more


Go's version and the Rust version differ in yet more subtle ways. It appears that Go's "rune" type is a Code Point, but Rusts's "char" type is a Unicode Scalar Value, a subset of Code Point that excludes surrogate pairs. Both versions will not work with complex Unicode input unless you perform both segmentation by Grapheme Cluster [1] and utilize a consistent Normalization [2] when comparing clusters.

Unicode is hard, fams, and it's rare that anything that looks easy is actually what you want.

[1] https://unicode.org/reports/tr29/

[2] http://unicode.org/reports/tr15/


> Both versions will not work with complex Unicode input unless you perform both segmentation by Grapheme Cluster [1] and utilize a consistent Normalization [2] when comparing clusters

Doing this is quite easy from Rust.


There's no standard library API in Rust for either grapheme cluster segmentation or for normalization. You'd need a third-party crate, at which point it's less "easy in Rust" and more "someone did the hard work for you" because, its really not easy anywhere haha.


No, and there probably never be a libstd implementation of it, but there is a single crate that everybody uses: unicode-segmentation [0]

> You'd need a third-party crate, at which point it's less "easy in Rust" and more "someone did the hard work for you"

"Someone did the work for you" is true of all code that you did not write yourself, independently of whether that code is easy or hard to write, or whether it is in the standard library or not.

unicode-segmentation is pretty much the only library of its kind in Rust, is super easy to discover (google "Rust grapheme cluster", "Rust unicode segmentation", etc.), and using it is as easy as just typing "cargo add unicode-segmentation".

The library is maintained by a Rust core team member, a Rust standard library team member, is used by servo and firefox, and is the only unicode segmentation library that people use.

Since many programs don't need to do any kind of unicode segmentation, making it part of the standard library sounds like a bad idea. In particular, given that unicode is a moving standard, it would mean that people stuck on old Rust toolchains (e.g. LTS linux distros) cannot create binaries that do proper unicode segmentation, which does not make sense.

The underlying problem is that many programmers do not know that they need to do unicode segmentation in the first place. Moving this into the standard library does not fix that problem either.

[0]: https://crates.io/crates/unicode-segmentation


> Since many programs don't need to do any kind of unicode segmentation, making it part of the standard library sounds like a bad idea. In particular, given that unicode is a moving standard, it would mean that people stuck on old Rust toolchains (e.g. LTS linux distros) cannot create binaries that do proper unicode segmentation, which does not make sense.

That has nothing to do with it. You could still have a library that has the very latest Unicode standard support for those that need the very latest, and keep updating the stdlib one.

It does not make sense either to expect someone to use bleeding edge libraries from cargo yet use an old rustc compiler. They can easily update it if needed.


> It does not make sense either to expect someone to use bleeding edge libraries from cargo yet use an old rustc compiler.

Of course it does. Many software users are stuck on multiple-year-old toolchains for various reasons, yet these systems still need to be able to handle unicode properly.

> They can easily update it if needed.

No, they cannot. Many users are stuck in older windows versions, linux versions, LTS linux versions, etc. because of their organization or their clients requirements.

Telling a client that you can't develop an app for them because their 2 year old Ubuntu is too old is often not a very successful business model.

> and keep updating the stdlib one.

These updates would only apply to newer Rust toolchains, that many users cannot use. Unless you are suggesting the release of patch versions for the soon to be 100 old Rust toolchains in existence every time the unicode standard is updated.

This is too much trouble and work for little gain, given that one can still use a Rust 1.0 compiler to compile the latest version of the unicode-segmentation crate without problems.


IMO it's not as clear-cut as you make it out to be. It's a pretty arbitrary line to exclude full Unicode support from the standard library. There's a ton of stuff in libstd that could be supported as third-party crates. I don't disagree with what the Rust team has done, and I think there could be a world in which the compiler team also releases first-party crates with "enhanced" functionality beyond just libstd. I consider proper Unicode support to be a "first party" thing, but I also don't think it has to be in libstd per se, necessarily.

For the record, I also disagree with your assertion that "easily done in rust" should be extended to include "...by importing a third-party framework." In that sense anything is easy to do in any language where a third-party framework exists. I'm confident it's just as easy in go.


> For the record, I also disagree with your assertion that "easily done in rust" should be extended to include "...by importing a third-party framework." In that sense anything is easy to do in any language where a third-party framework exists. I'm confident it's just as easy in go.

Have you tried doing that in C++? Doing that in a cross-platform way (or even in a single platform) is anything but easy, because you don't have a tool like cargo, you have to change your build system, do the dependency resolution manually, etc.

So no, such a library existing does not imply that using that is easy.

In Rust, you just need to write `cargo add unicode-segmentation` once in a project, and then you can directly use the library API. There is literally nothing else for you to do.

That's a pretty low barrier of entry, and something you will need to do 100s of times per project anyway, because the standard library is minimal by design.

If you prefer languages without a minimal standard library, then Rust isn't for you. Go try Python, where half of the standard library has a warning saying "deprecated: use this other better external dependency instead; adding this to the standard library for convenience was the worst idea ever and now we need to maintain all this code forever".


In C++ you do have cross-platform tools like cargo. For instance, vcpkg. > In Rust, you just need to write `cargo add unicode-segmentation` once in a project,

In C++ I can write `vcpkg install whatever`. Yet that does not mean my organization will allow the library.

So no, adding libraries is quite harder than installing them unless you are working in your own projects alone. And even then adding them is never a one line effort.

> If you prefer languages without a minimal standard library, then Rust isn't for you.

There is nothing minimal about Rust's std.


the point would be that the standard library should be forward compatible while crates should be backward compatible.

so that current crates work with old version of compilers/toolchains.

this applies here as each new Unicode standard requires an update of the Unicode crate. ideally the best case would be to make it so that in 20 years Rust 1.0 can still use the most updated version of Unicode fragmentation. similarly to how some C libraries insist on C89 compatibility to still work on older systems.

I guess Rust would like it if this never became indispensable but also should be possible


Today, if you write `cargo add unicode-segmentation` to a Rust 1.0 program, you can use the latest version of Unicode. You can also add an older version of the library and use an older version if you want.

Adding unicode segmentation to the standard library and making all Rust binaries, most of which don't do unicode segmentaiton, 20 Mb larger by unnecessarily bundling the unicode tables, makes no sense.

As you see in the other thread, the problem the parent poster has is that their organizaiton doesn't let them use crates from crates.io.

That's a stupid policy for a language like Rust, and the solution isn't "move all crates of crates.io into the standard library". The solution there is for them to write their own unicode-segmentation code (and async executor, and http stack, and linear algebra, and... the whole world), since that is what their organization wants them to do. That's a stupid policy, but it is their own stupid fault.

Most organizations either allow using crates from crates.io, or have a vetting policy, where you just submit a 1 liner email saying "I need to do unicode-segmentation and there is only one library for that that's used by Firefox here: ...". Somebody then checks licenses and stuff, and gives you approval in a couple of days. If their organization doesn't have such processes, then i'm sorry for them, but I don't see how this is in any way the standard library fault.

Whatever their reasons are "my org doesn't allow us to use cargo" isn't a good reason to move something into the standard library.


> Adding unicode segmentation to the standard library and making all Rust binaries, most of which don't do unicode segmentaiton, 20 Mb larger by unnecessarily bundling the unicode tables, makes no sense.

Isn't that what dead code stripping is for?


If you are shipping a binary library, like the standard library, you don't know which parts of your library users are going to use when they link it, so you need to ship binaries with all of it to all Rust users.

The only one that can strip is the end user compiling a final binary, and the compiler often cannot do this for you because that requires whole program optimization and full LTO, which is super super slow.

Also, just because the final binary doesn't use a symbol, doesn't imply that the symbol isn't used. You can ship a library with a main function that can be run as an executable, or linked as a library. The linker doesn't know.

You would really need to go out of your way to strip your binary for your particular application. This is possible, and not that hard.

But the point remains: why should 99% of Rust users have to go through the trouble just so that those who need this don't have to write `cargo add unicode-segmentation` ?

Rust philosophy is "don't pay for what you don't use", so if your organization doesn't support third-party dependencies, they need a language with "batteries included", and not a language like Rust that comes without batteries by design.

Proposing to make Rust come with batteries included is proposing to change one of Rust's core values. Go ahead and write an RFC for that. I'll get my popcorn.


> The only one that can strip is the end user compiling a final binary, and the compiler often cannot do this for you because that requires whole program optimization and full LTO, which is super super slow.

Whole program dead code elimination doesn't have to be slow. Nim does that by default (it's not possible to turn it off since a couple releases actually) and it still compiles quite fast.


> Proposing to make Rust come with batteries included is proposing to change one of Rust's core values. Go ahead and write an RFC for that. I'll get my popcorn.

Again, my recommendation wasn't that, it was that the core team consider releasing "core" language features like Unicode support as first-party crates when they don't make sense as part of 'stdlib' not that I think they will. Feels weird to me that I have to rely on the goodwill of third parties to provide core language functionality like complete string handling.


(We already do do this in some cases; these crates are authored by "The Rust Project Developers". For example, the regex crate is one of these.)


Yeah, but that makes them third-party already for many orgs' policies, even if they come from the same set of developers.

As soon as you have to add a crate, you are in for extra review and pain.


I don't understand, wasn't

> it was that the core team consider releasing "core" language features like Unicode support as first-party crates

what you were asking for?


I think that's what I was asking for, but I'm not the person you were replying to ^_^, they were taking a more hard-line stance that it should be part of libstd.

Side-note I really appreciate all the work you folks are doing. Rust has changed the way I write software even when I'm not writing Rust, which is about the highest praise I can offer.

I'd love to pitch in.


Whoops, how embarrassing for me!

Thanks :)

... how would you like to pitch in? I can point you to the right people. We're always happy to have more help.


You can definitely ship binary libraries and use only whatever is needed, and there is no need for LTO/WPO to achieve that.

In fact, LTO/WPO have nothing to do with the ability to link whatever is needed.


(as a maybe unneeded clarification, my previous comment agrees with you)

I agree that something like unicode-segmentation should not be in the bundled standard library. specifically because the std should be forward compatible.

In my opinion it is why foundational libraries should strive to be on Rust 2015 edition rather than the latest.


> Of course it does. Many software users are stuck on multiple-year-old toolchains for various reasons, yet these systems still need to be able to handle unicode properly.

So? Use the external library then. One thing does not preclude the other.

> No, they cannot. Many users are stuck in older windows versions, linux versions, LTS linux versions, etc. because of their organization or their clients requirements.

I work in such an organization and no, we cannot use third-party packages. The same way we cannot update our toolchain. So in most cases the point is moot.

> These updates would only apply to newer Rust toolchains, that many users cannot use. Unless you are suggesting the release of patch versions for the soon to be 100 old Rust toolchains in existence every time the unicode standard is updated.

You can provide standard Unicode handling that is good enough for 99% software out there. If you need to be on the bleeding edge, then use the bleeding edge library or rustc.

It is pretty simple, actually!


> So? Use the external library then. One thing does not preclude the other.

That's what everybody already does? You are proposing to, instead of doing that, move that library into the standard library where it cannot ever change.

> You can provide standard Unicode handling that is good enough for 99% software out there.

That's already in std? 99% of the code doesn't need to handle unicode grapheme clusters, because it doesn't deal with unicode at all.

You are suggesting moving something into standard that would make unicode software harder to update, and would make the standard library huge (>20mb larger) for all programs (the unicode tables take a lot of binary size), even those that don't use unicode, to try to solve a problem that does not exist.

> I work in such an organization and no, we cannot use third-party packages

If a Rust user cannot write `cargo add unicode-segmentation`, they have bigger problems than not being able to handle grapheme clusters. You can't run async code because you don't have an executor, you can't do http because the standard library doesn't support that, you can't solve partial differential equations, or do machine learning, or pretty much anything interesting with Rust.

That's bad for you, but the solution isn't to make Rust bad for everybody else instead.

If your organization doesn't let you use third-party packages, then write your own: that's what your organization wants you to do.

Some organizations want all code in CamelCase, they can't use the standard library at all. But the solution isn't to make Rust case insensitive, or to prove a 2nd standard library API for those organizations.


> That's already in std? 99% of the code doesn't need to handle unicode grapheme clusters, because it doesn't deal with unicode at all.

99% of the software does not use the entirety of the std. Something is good to be in the std if for that domain it solves the majority of problems, not if everyone uses it.

> You are suggesting moving something into standard that would make unicode software harder to update

It is equally hard to update.

When people say that std libraries are harder to update they refer to changes in interfaces, not incremental updates to tables etc.

> and would make the standard library huge (>20mb larger) for all programs (the unicode tables take a lot of binary size)

Including the tables in every executable even when not used is a broken implementation.

> If a Rust user cannot write `cargo add unicode-segmentation`, they have bigger problems than not being able to handle grapheme clusters.

It is not a "problem". In most commercial software, libraries and versions are vetted. Same applies for all languages. If something is in the std, then it is already in, that is why it is useful.

> That's bad for you, but the solution isn't to make Rust bad for everybody else instead.

I don't see why that makes Rust "bad". It sounds like the opposite to me!

> Some organizations want all code in CamelCase, they can't use the standard library at all.

You are going off-topic to support your point.


> I don't see why that makes Rust "bad".

And this is why people suggesting what you are suggesting never manage to achieve the change.

> 99% of the software does not use the entirety of the std

Most Rust software uses most of it.

> You are suggesting moving something into standard that would make unicode software harder to update

How do you update the unicode tables for those stuck with Rust 1.0 ? If you are going to make this claims, back them up.

> It is not a "problem". In most commercial software, libraries and versions are vetted. Same applies for all languages. If something is in the std, then it is already in, that is why it is useful.

So your organization does support third-party packages, you are just to lazy to ask for vetting ? That's not what you claimed above (you claimed that your organization does not support third-party packages at all).

The answer to this is simple, ask your organization to vet this library. If that's too complicated and takes too much effort, improve your organization's process.

Suggesting that only because you are too lazy to vet a library that library should be in standard is a laughable proposal. Think about the trade-offs, evaluate them, weight them, and if you still think doing so is worth it, write an RFC. The process for putting things into standard is open.

But if your only argument is "me,me,me,me" that's not going to go anywhere.


Nice catch, thanks for pointing this out. I've updated the cache initialization to use `len(targetChars)` rather than `len(target)`:

    cache := make([]int, len(targetChars)+1)
    for i := 0; i < len(targetChars)+1; i++ {
        cache[i] = i
    }
AFAIK this makes them equivalent (fingers crossed). It seems to not have made much of a difference (-0.03s)


They might be equivalent (perhaps apart from other issues pointed out in other comments), but both implementations are still either

1) incorrect if UTF-8-strings are supposed to be valid input, or

2) very inefficient if only ASCII-strings are supposed to be valid input.



I tried to benchmark Go/Rust versions as well.

I made 4 changes in Rust version.

1. Moved up the line that gets a value from cache[j+1] before any calls are made to cache[j]. This removes 1 bound check. (Improvement from 182,747ns down to 176,xyzns +-4800)

2. Moved from .chars().enumerate() to .as_bytes() and manually tracking current position with i/j variables. (Improvement from 176,xyz ns down to 140,xyz ns)

3. Moved to the standard benchmark suite from main + handrolled benchmark system.(File read + load + parse into lines was kept out of benchmark)

4. Replaced hand rolled min with std::cmp::min. (improvement from 140,xyz down to 139,xyz but the std deviation was about the same. So Could just be a fluke. Don't know)

In Go version, I made three changes.

1. Doing the same thing from #1 in Rust actually increased the runtime from 190,xyz to 232,xyz and quite consistently too. I ran it 10+ times to confirm)

2. Replaced []rune(source), []rune(target) to []byte(source), []byte(target). (Improvement from 214817ns to 190152 ns)

3. Replaced hand rolled bench mark system with a proper bench mark system in Go. (Again, File read + load + parse into lines was kept out of benchmark)

So, At the end of it, Rust version was about 50k ns faster than Go version.

Edit #1:

In rust version, I had also replaced the cache initialization to (0..=target.len()).collect() before doing anything els.. This also gave a good perf boost but I forgot to note down the exact value.


I'd be really surprised to hear that Go is supposed to be faster than Rust. Of course Rust is a bit newer but to me it always sounded like Go is fast because it's static but it doesn't have to be high-speed if that would sacrifice conciseness. Given that this is an artifical example, this here looks more realistic: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


Rust and Go are contemporaries. Rust started in 2006 at Mozilla, and the first Go public release from Google was in 2009, meaning it probably started at the same time.

Except of course all the Plan 9 garbage (like Go's hand-rolled assembler) brought in to underpin Go from the 80s ;)


Rust is also basically just LLVM in terms of optimization, which looks like it had it's initial release in 2003.


> Except of course all the Plan 9 garbage (like Go's hand-rolled assembler) brought in to underpin Go from the 80s

This is unfair criticism. If Go had used LLVM, it would affect its selling point (fast compile times) and authors knew the plan 9 toolchain well.

Go the language feels like it is from 80s. But its toolchain is not at all bad. LLVM monoculture is the last thing one would want. Obligatory reminder that LLVM has its flaws too..


Also when this LLVM stuff doesn't work, it's a major pain to troubleshoot because all this is a complexity monster. Go and its Plan 9 heritage is more like 80s retro future and things like cross-compiling are super easy


In my opinion, compile times are irrelevant. Developers can always get faster or larger machines if they need them. What matters is how the final product performs on customer machines. Performance and memory usage of the final product, plus your ability as an engineering team to avoid costly and difficult mistakes are basically the only thing that matters.


> Developers can always get faster or larger machines if they need them.

Those who can get top notch hardware are already on top notch hardware and those who can't are limited by management decisions.

Even with top notch machines, compile times matter. Because the difference are huge in compile times of C++ and Go in moderately complex projects with bad build systems that are the norm.

> What matters is how the final product performs on customer machines.

Apparently today's developers value fast iteration speed and that's fine. The problem is they don't value user resources because they have top notch machines and don't care about performance.


Note that using bytes is a fundamentally different implementation that will produce different results on non-ASCII input. Using codepoints (or "runes") will better approximate edit distance based on visual characters. (And grapheme clusters would be even better. Although one could put the text in composed normal form to get more mileage out of the rune based algorithm.)


I recently did some experiments with creating small static Rust binaries, custom linking, no_std et cetera. A lot of stuff around that kind of thing is unstable or unfinished, which might be somewhat expected. But I’ve also come to the conclusion that Rust relies on libc way too much. That might be fine on Linux, where GNU’s libc is well-maintained, is a bit questionable on MacOS (as seen in this article) and is a a complete distribution nightmare on Windows (in no small part due to a series of very questionable decisions by Microsoft).

My understanding is that Go doesn’t use the libc at all and makes system calls directly, which IMO is the correct decision in a modern systems programming language that doesn’t want to be limited by 40 years of cruft.


As far as I know, the official system interface on Windows and several Unix systems is via the standard library, not via direct syscalls. I don't know about the MacOS. But in general, you may be required to dynamically link the standard library on many platforms.

Linux guarantees syscalls are stable. And on Linux, you have the option of telling Rust to cross-compile using a statically-linked musl-libc. (If you also need to statically link OpenSSL or a few other common libraries, I maintain https://github.com/emk/rust-musl-builder, and there's at least one similar image out there.)


macOS requires you to make syscalls through libSystem if you want a stable interface. Go binaries used to make direct syscalls until 1.10. Since this caused major breakage on most new macOS releases, they have since switched to using libSystem as well in 1.11:

> On macOS and iOS, the runtime now uses libSystem.dylib instead of calling the kernel directly. This should make Go binaries more compatible with future versions of macOS and iOS. The syscall package still makes direct system calls; fixing this is planned for a future release.

Source: https://golang.org/doc/go1.11


> I don't know about the MacOS.

MacOS literally forbids statically linking to libSystem.

Go finally had to bow down and accept that they just could not perform raw syscalls on MacOS after gettimeofday (IIRC) changed ABI multiple times during the Sierra beta.


AFAIK on Windows, the hierarchy is:

C library => kernel32.dll => ntdll.dll => system calls

You don’t have to go via the C library - calling kernel32 directly is fine (I believe this is what Go does). However, it’s very rare to call ntdll or to make system calls directly.


To expand on that, on Windows it's best to use kernel32 (or WindowsApp) unless you really need the cross-platform convenience of the C lib. The exception being architecture dependent functions like memcmp, memcpy and memset which will most likely have efficient assembly implementations.

ntdll is lower level and technically unstable but core functions have been pretty stable for a long time. They could of course have breaking changes but it risks breaking things like cygwin. Microsoft tends to take compatibility seriously, although perhaps not as much as they used to.

Direct system calls are completely unstable. They can and do change a lot between releases. You'd need to use a lookup table for every build of Windows.


Basically yes. ntdll is an implementation detail that shouldn’t be relied upon. kernel32/user32 and friends are considered the “proper” interface to the system and have been stable for decades.


There are some ntdll calls that are officially documented and ok to use. Of course there are also a lot of calls you shouldn't use.

When necessary, it's fine to use even undocumented ones to support Windows 7 and older. It's not like those are going to change anymore.


> it’s very rare to call ntdll or to make system calls directly

Until you need to do something that's not possible through kernel32.dll. Sometimes I've called ntdll.dll directly to support older Windows versions.


I'm guessing the performance benefits are negligible anyway?


Not sure why I was downvoted for curiousity


That’s partly true, but the stable interface on Windows is not the libc, but the kernel32/user32 functions like VirtualAlloc, CreateFileW etc. Those are stable since the NT days. The libc functions like malloc and fopen are a compatibility layer above that and unfortunately switch places every few years. Currently they are delivered by a combination of a Windows update and a redistributable package, which makes it a nightmare to ship (on pre-Windows 10 even more so).


Do you really need to ship them? I thought libc (CRT?) in Windows was a given, and what used to be redistributed was only the C++ ones. Is not that the case?


Thanks for the correction, Microsoft calls it the C Runtime/CRT. It‘s unfortunately complicated, and I’ll completely ignore static linking, which is possible, but not supported in many scenarios involving DLLs.

It used to be the case that the CRT shipped with Windows (msvcrt.dll). That file is now considered legacy/deprecated and is no longer supported by current compilers. For several years after that, you always had to redistribute the CRT (msvcrtXXX.dll), even for pure C support.

The current state of affairs is that the CRT is split into several files, some of which come with Windows update (the so-called UCRT) and some of which are compiler-specific and have to be redistributed. C++ std support requires yet more files.

This document gives an overview: https://docs.microsoft.com/en-us/cpp/c-runtime-library/crt-l...


Thanks a lot! It indeed seems complicated...

I understand standards evolve and that they want to modularize stuff, but in the case of C the majority programs will only use the basics of the C library.


> But I’ve also come to the conclusion that Rust relies on libc way too much.

How did you come to this conclusion?

People using Rust rely on libc a lot.

For example, #![no_std] means "no standard-library", but it doesn't mean "no libc, no libunwind, etc.".

So a lot of people like to advertise their crates as "#![no_std]" compatible, because they compile with #![no_std], but then the first thing the crate does is linking against libc, against liballoc, against libm, .... or even the standard library itself...

So... if you are trying to build a binary that does not depend on libc or libstd, then there is no language feature to help you there.

#[no_std] binaries are not only unstable, but also probably not what you want, since that won't prevent you from linking any library that links libc, or the standard library, etc.

If you want to avoid libc, you have to enforce that, e.g., by checking in your build system that your binary doesn't contain any libc, libstd, etc. symbols (I just use nm for this). #![no_std] helps a bit here, but making sure you don't link any library that violates this is up to you.


To me, a #![no_std] library is useful for contexts where there isn't a platform libc (embedded systems, kernels, etc.) and you can't assume things like the existence of stdin or processes. On those systems, there may still be a dynamic allocator, because that's a super common feature in all but the most spartan environments. For those use cases, a #![no_std] library that links liballoc (and documents that it needs it) is totally okay, and often better than a more-limited library that only does static allocation - or conversely implementing all of libstd with runtime panics for most of it, just because your library needs to pass a Vec around. It's a balance.

The only case I think I've seen where a #![no_std] library ends up pulling in libc is if you haven't added a custom allocator and your platform's default allocator uses libc (and so you could switch to a libc-free allocator if you want). Are there other cases?


There are lots of tiny practical annoyances you notice, starting with having to vet dependencies and transitive dependency upgrades extremely carefully. A major one is that dev-dependencies are very broken in a no_std environment. It’s obviously useful to have the std in your build.rs script, but unfortunately, Cargo merges dependencies and devDependencies in a way that’s simply incorrect.

The good news is that every problem I found had an actively discussed GitHub issue and the community is active, so there will be progress.


> unfortunately, Cargo merges dependencies and devDependencies in a way that’s simply incorrect.

Yeah, I've run into that, but IIRC this got solved super recently/is being solved. Probably https://github.com/rust-lang/cargo/pull/7820 ?


> To me, a #![no_std] library is useful for contexts where there isn't a platform libc (embedded systems, kernels, etc.) and you can't assume things like the existence of stdin or processes.

"Could be useful for" (FTFY).

Unfortunately, for the reasons mentioned above, most #![no_std] libraries aren't useful for that, because they link libc and other libraries.

Most people don't do this intentionally. The compiler doesn't complain about:

    #![no_std]
    extern "C" { fn malloc(...) -> ...; }
so when somebody does it accidentally, everything works and they get no feedback.

When you then go and build a #![no_std] binary, and run it in a platform when libc is not available, only then you get an error. And at that point, good luck figuring out which of your 200 dependencies has the error.

In particular, if you are running `#[test]`s, doing that links standard, so if your program implicitly depends on libc somewhere, tests won't reveal that, because while testing, libc will be linked.


No, most #![no_std] libraries do not link libc in the way you describe (having a direct FFI dependency on malloc) - they link liballoc, which has a pluggable allocator interface. On some platforms and some configurations (including most normal userspace platforms), the default allocator used by liballoc uses malloc.

It's actually hard to go out of your way and call malloc directly, because FFI calls are unsafe. It's a lot easier to use Box/Vec/String/etc., all of which are defined in liballoc and use the pluggable allocator interface.

I know this because I've successfully used #![no_std] libraries in places where libc doesn't exist and no function named malloc exists, and they do work. If you're having a linker issue it's almost certainly because you haven't changed the default allocator - if you have an example of this I'd be happy to take a look at debugging it.


> and they do work.

Maybe you are just using different no_std libraries that I am using, but pretty much all of the no_std libraries that I use have `libc` as a direct dependency.

Not only for malloc, many of them just needs to write something to stdout or a file, generate random numbers, ..., and that's hard to do without libc. Why they advertise themselves as no_std escapes my comprehension.


Huh, the libc crate itself claims #![no_std] support. I agree that's confusing and counterintuitive... I see what it means in terms of semantics but I don't understand why that's useful.


The libc crate does correctly claim #![no_std] support, because #![no_std] means "Does not require the Rust standard library", and libc does not require it.

The two main issues I see with #![no_std] are:

* its a flag for "doesn't link the standard library" but the standard library is often too high-level for bare metal apps that want to be in precise control about everything that gets linked

* it isn't a contract of any kind, so you can't rely on it for anything. In fact, this code is correct and works as intended:

    #![no_std]
    extern crate std;
This is problematic, because many #![no_std] libraries end up linking libstd "by accident", even though that's probably not their intent.

So I agree with you that #![no_std] isn't a silver bullet. I think it is still useful in that it lets you avoid linking the standard library by default, which is necessary condition for embedded development. It is not a sufficient condition, in that in practice you actually want to forbid the standard library and other libraries from being linked, and not just "don't link them by default, but do so by accident".


That’s true. But going the no_std route is very hard (the ecosystem isn’t big, and relying cargo and crates.io you’re almost guaranteed to link in the std or liballoc by accident at some point). Even when using the std intentionally, I really wouldn’t have expected that basic std functions like println require the libc.


I would not necessarily expect it but I appreciate it in a "systems" language where "systems" is defined as compatible with the existing traditional systems software on a machine. For example, it's nice if the stdbuf(1) command works on Rust binaries on glibc systems, and it's nice if a Rust binary calling a C library that writes with printf (or vice versa) don't maintain two separate output buffers.

To me, Go is the systems programming language for a world untethered by existing platform compatibility (and so, for instance, writing Go libraries to be called from C is awkward, calling C libraries from Go incurs overhead, Go has its own concurrency model, etc.) and Rust is the systems programming language for use cases where you'd otherwise want to use C (really "the platform's native systems language," but that's C on all the major platforms) but you want a better language. I appreciate that they both exist and target these different use cases.


Is Go widely used compared to the others as a systems language at this point?

Over the years I’ve gathered that it’s more of a competitor for C# & Java rather than Rust & C(++).


Depends what you mean by "systems language" (many common definitions turn out to be equivalent to "drop-in replacement for the platform language," which makes the argument circular), but the choice of Go as an implementation language for Docker and Kubernetes puts it pretty firmly in the "systems language" space IMO. Container management requires more systems-level functionality than C# and Java are really geared towards (although you certainly could use them, perhaps with a tablespoon of FFI).


The original version of Docker was written in Python, and the original version of Kubernetes was written in Java, so your argument doesn't hold.


Would this be the part of my argument that doesn't mention Python (which I personally consider a systems language) at all? Or the part of my argument where I say, "you certainly could use them"?


It just means that golang isn't anything special when it comes to writing tools like these, and is in the same ballpark as python and Java (it's somewhere in between the two of them, better than python but worse than Java), and you'd have to use the same escape hatches in golang as them to implement those tools.


Well, writing to standard output requires interacting with an operating system, so it's not surprising that it requires `std`. You cannot write to standard output without knowing what operating system you are compiling for.

Worth noting however that `writeln` only needs `core`, and as long you can provide an implementation of writing to standard output (pretty much create a struct and implement `core::fmt::Write` trait for it), creating `println` macro is trivial.



> [...] relies on libc way too much. That might be fine on Linux [...]

> My understanding is that Go doesn’t use the libc at all and makes system calls directly

Actually, the only system on which it's fine to "not use the libc at all and make system calls directly" is Linux. On MacOS, Windows, and most non-Linux Unix-like systems, you must to go through the libc or its equivalent (which on Windows is kernel32.dll and/or ntdll.dll), since the system call interface is unstable (the libc or its equivalent is distributed together with the kernel, so it's updated whenever the system call interface changes).

AFAIK, Go tried for a while to use system calls directly on MacOS; after being broken several times by operating system updates, they gave up and now go through the shared libraries like everyone else. They still insist on using direct system calls on Linux, where it works mostly fine (except for things like the DNS resolver, in which AFAIK they try to guess whether they can do directly network DNS requests, or have to go through libc; and that one time in which Go's small stacks conflicted with a misoptimized kernel VDSO using too much stack).


The problem with this is that not every system makes their system call ABI stable. You have two choices here: use the interface that is stable (which is libc), or track changes and fix things up when they break.


The only stable interface on Windows are the documented functions from kernel32.dll, user32.dll etc. Libc is a compatibility layer above that, that Microsoft invents a new incompatible distribution mechanism for every 3-5 years. It’s pure DLL hell unfortunately.

Edit: Not even the DLL name of Microsoft’s libc is stable (msvcrt140.dll etc.), leading to all kinds of wild goose chases when trying to run old binaries.


Yes, I was being imprecise wrt Windows, thanks :)


Yeah, I saw your username too late or I would have known that you know that. ;) But it’s a common misunderstanding among many Unix programmers, so I feel it was good to clarify.


Nah I think it's helpful! Comments are read by a lot more folks than just the person you're replying to. I knew it wasn't exactly libc but I didn't know the full hierarchy involved.


Go’s pathological NIH syndrome does come with downsides. For example, there was an infamous memory corruption bug in the way they called into vDSO.


> Go’s pathological NIH syndrome does come with downsides.

Yes. And you can add the inability to use the glibc's nss modules under Linux.

Making it unable to use sssd properly and authenticate a posix user on a machine with LDAP authentication.

Getting completely independent from OS sys lib has consequences


This is not accurate. [1]

When compiled on Linux for Linux, Go will use libc and natively call NSS.

When cross-compiling to Linux from another system, Go requires (mostly) CGO to be disabled and a subset of NSS will be implemented in Go. Native NSS modules will not work.

[1] https://github.com/golang/go/issues/24083


I think Go tries to reduce its dependence on libc but, by default, it will still link to it.

For instance, this code:

  package main
  import "net"
  func main(){
    net.Dial("tcp", "golang.org:80")
  }
When compiled with go build main.go does link:

  linux-vdso.so.1 (0x00007ffe3d7f0000) 
  libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc7ac05a000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc7abc69000)
  /lib64/ld-linux-x86-64.so.2 (0x00007fc7ac279000)   
There are of course compiler options to truly statically compile.


This is simply false. Windows APIs are stable. It's one of the best platforms for back compatibility. The kernel32/user32 API has been stable for 20 years. Rust has some work to do.


That’s exactly my point: The problem is that Rust doesn’t use the stable kernel32/user32 functions (VirtualAlloc etc.), but the libc ones, which don’t have a stable location on Windows. Without looking it up: Which DLL do you have to redistribute this week to get “malloc”?


I just tell Visual Studio to create an installer, I select the set of target platforms, and it just works. See, for example: https://docs.microsoft.com/en-us/cpp/ide/walkthrough-deployi...


The main problem is that allocating a vector for each evaluation is completely wrong: instead, it needs to be allocated once by making the function a method on a struct containing a Vec; which makes the allocator moot.

The second problem is that at least the Rust code is decoding UTF-8 every iteration of the inner loop instead of decoding once and saving the result, or even better interning the characters and having versions of the inner loop for 32-bit chars and 8-bit and 16-bit interned indexes.

Furthermore the code rereads cache[j] instead of storing the previous value, and doesn't do anything to make sure that bound checks are elided in the inner loop (although perhaps the compiler can optimize that).

The code for computing the min seems to have been written mindlessly rather than putting serious thought towards whether to have branches or not and in what order (depending on an analysis of what the branch directions rates would be).

Implausible benchmark results are almost always an indicator of the incompetence of the person performing the benchmark.


> The second problem is that at least the Rust code is decoding UTF-8 every iteration of the inner loop instead of decoding once and saving the result

Indeed. This is a pretty damning difference. The `target` string is being repeatedly UTF-8 decoded where as the same is not true in the Go version. The Go version even goes out of its way to do UTF-8 decoding exactly once for each of `source` and `target`, but then doesn't do the same for the Rust program.

> Implausible benchmark results are almost always an indicator of the incompetence of the person performing the benchmark.

Come on. We can do better than this. Please don't make it personal. We all have to learn things at some point.


> Indeed. This is a pretty damning difference. The `target` string is being repeatedly UTF-8 decoded where as the same is not true in the Go version. The Go version even goes out of its way to do UTF-8 decoding exactly once for each of `source` and `target`, but then doesn't do the same for the Rust program.

I'm really not sure that's an issue, utf8 decoding is very, very cheap and it's iterating either way.

It would have to be benched, but I wouldn't be surprised if allocating the caches (at least one allocation per line of input) had way more overhead, especially given the inputs are so very short.

I'm not going to claim Rust's utf8 decoder is the fastest around, but it's very fast.


It's cheap, but not _that_ cheap. It shouldn't be as cheap as just iterating over a sequence of 32-bit integers.

But yes, I did benchmark this, even after reusing allocations, and I can't tell a difference. The benchmark is fairly noisy.

I agree with your conclusion, especially after looking at the input[1]. The strings are so small that the overhead of caching the UTF-8 decoding is probably comparable to the cost of doing UTF-8 decoding.

[1] - https://github.com/christianscott/levenshtein-distance-bench...


> It shouldn't be as cheap as just iterating over a sequence of 32-bit integers.

I wonder if there are any benchmarks about this? Specifically, it feels like in theory iterating utf8 could actually be faster if the data is mostly ascii, as that would require less memory bandwidth, and it seems like the computation is simple enough for memory to be the bottleneck (this is a wild guess, I have horrible intuition about speed of various hardware things). In this particular benchmark this reasoning doesn’t apply, as strings are short and should just fit in cache.


If all you need to do is validate UTF-8, then yes, mostly ASCII enables some nice fast paths[1].

I'm not a UTF-8 decoding specialist, but if you need to traverse rune-by-rune via an API as general as `str::chars`, then you need to do some kind of work to convert your bytes into runes. Usually this involves some kind of branching.

But no, I haven't benchmarked it. Just intuition. A better researched response to your comment would benchmark, and would probably at least do some research on whether Daniel Lemire's work[2] would be applicable. (Or, in general, whether SIMD could be used to batch the UTF-8 decoding process.)

[1] - https://github.com/BurntSushi/bstr/blob/91edb3fb3e1ef347b30e...

[2] - https://lemire.me/blog/2018/05/16/validating-utf-8-strings-u...


Did a quick benchmark: https://gist.github.com/matklad/ebc1acd2fab884f3ff9d96d4175f....

Seems like hypothesis is not wrong, but also is not interesting: the difference is pretty small, and it can be easily dwarfed by the big wins for utf32 if something like auto-vectorization or bounds-check elision kicks in. Also I am not entirely sure that the diff I observe is not due to something completely irrelevant, like one loop ending up on a lucky cacheline or what not.


> The second problem is that at least the Rust code is decoding UTF-8 every iteration of the inner loop instead of decoding once and saving the result, or even better interning the characters and having versions of the inner loop for 32-bit chars and 8-bit and 16-bit interned indexes.

I tried this. Pulling the .chars() call out of the loop & collecting into a Vec made the performance even worse – the following balloons the runtime from ~2.7s to ~5s:

    let target_chars: Vec<char> = target.chars().collect();
    for (i, source_char) in source.chars().enumerate() {
        let mut next_dist = i + 1;

        for (j, target_char) in target_chars.iter().enumerate()  {
> written mindlessly > incompetence of the person

No challenge there :P I am operating under the assumption that I don't need to understand how compilers work to get good performance from rust (where good is "similar enough to an equivalent go program")


Indeed, it looks like doing the UTF-8 decoding up-front is exacerbating the performance difference in the allocator.

I think this is where the GP's first suggestion comes into play. If one were writing this code _and_ cared about performance, then you'd usually find a way to reuse allocations. I submitted a PR to demonstrate this: https://github.com/christianscott/levenshtein-distance-bench...


Yeah, I can definitely see how that would be a more performant approach.

I suppose I wasn't so interested in figuring out how to make this algorithm as fast as possible as much I was interested in diving into why this particular implementation was slower.

I'm not totally convinced that this difference is down to the string being parsed over and over, though

> doing the UTF-8 decoding up-front is exacerbating the performance difference in the allocator

This seems to suggest that allocation might be dominating here. WDYT? Either way, I've added a disclaimer to the post.


> Implausible benchmark results are almost always an indicator of the incompetence of the person performing the benchmark.

An ad hominem attack surely isn't needed.


And that's really too bad because I think the rest of the comment is spot-on as the fundamental issue is that Rust simply delegates to the system allocator, and MacOS's allocator is pretty slow.

As a result, while Rust allows very explicitly and relatively easily removing allocations (compared to C or C++), getting the most performances out of your program also requires doing so, unless you use a non-system allocator with better support for the workload.


But you can use a custom allocator in Rust: https://doc.rust-lang.org/1.9.0/book/custom-allocators.html

You don’t have that option in Go


> But you can use a custom allocator in Rust: https://doc.rust-lang.org/1.9.0/book/custom-allocators.html

That is true — and was demonstrated by the article as its "fix" was to use jemalloc, but even a custom allocator will usually be less performant than a high-performance GC there, because the GC's allocator has more insight into the requirements and workloads.

It might be possible to give that insight to a custom allocator by using the allocator's custom APIs, but this requires a deeper integration between the program and the allocator.


(those are very old docs, you want to link to https://doc.rust-lang.org/stable/std/alloc/trait.GlobalAlloc...)


> You don’t have that option in Go

Sure you do. You can even build one from nothing but mmap in pure Go. It just won't be part of the garbage collection, so you get malloc/free or arenas etc, just like in C/Rust/whatever.


Missed chance to shorten title to "Making Rust Go Fast" :-)


Making Rust As Fast As It Can Go


Making the code rust as fast as it can go


Go Rust!


GO!!!!


Is it intentional that the benchmarks include not only running the program itself but also compiling it? e.g. in the linked source code, the bench.sh includes the compilation step which is known to be slow in rust:

    #!/usr/bin/env bash

    set -e
    run() {
      cargo build --release 2> /dev/null
      ./target/release/rust
    }

    run;
Sure, if you run it many times in succession the compiler won't do much but the benchmarking script (run.js) doesn't really indicate that and the blog post also doesn't mention that.

EDIT: I was just being stupid, don't mind me. The times were taken within each language and not externally.


run.js is not doing the benchmarking. If you look at the source for each of the programs being benchmarked, you'll see that the programs themselves are responsible for benchmarking


Could also try to use the smallvec crate in this case, which put small allocation on the stack https://docs.rs/smallvec/


There's a bunch of issues with the Rust implementation, not least that the initial condition where source or target lengths are zero, it returns the number of UTF-8 bytes of the other, but all other computations are performed in terms of Unicode chars -- except at the end: `cache[target.len()]` which will return the wrong value if any non-ASCII characters precede it.

Further, each time you call `.chars().count()` the entire string is re-enumerated at Unicode character boundaries, which is O(n) and hardly cheap, hence wrapping it in an iterator over char view.

Also, re-implementing std::cmp::min at the bottom there may well lead to a missed optimization.

Anyways, I cleaned it up here in case the author is curious: https://gist.github.com/martinmroz/2ff91041416eeff1b81f624ea...


I'm surprised that a naive implementation in Go can outperform a naive implementation in Rust.


I’m not. Hell, when I first started learning rust i frequently wrote code that ran slower than _python_.


I tried this on a spare time project[1]. Runtime in a quick test went down from 14.5 to 12.2 secs on macOS!

So a solid ~15% by changing the allocator to jemalloc.

However, I now have a segfault w/o a stack trace when the data gets written at the end of the process.

Possibly something fishy in some `unsafe{}` code of a dependent crate of mine that the different allocator exposed. :]

Still – no stack trace at all is very strange in Rust when one runs a debug build with RUST_BACKTRACE=full.

[1] https://github.com/virtualritz/rust-diffusion-limited-aggreg...


I have found that jemallocator is currently broken on macOS Catalina, so that might be the problem. If you can reproduce this issue reliably, I'd love to hear about it because I can't myself unless I use a very specific toolchain that produces -O3 binaries that are a real pain to work with.


It's 100% reproducible. Just check out the previous to last commit on master on the github repo I linked to and run the tool with any command that invokes the nsi crate.

Eg.:

   > rdla dump foo.nsi
should produce the segfault before exiting the process.

Is there a jemallocator ticked where to attach a report for this?


Thanks! I think you found the jemallocator bug, so I'll try your project and follow up there.


I think that bug may be fixed in jemalloc by now. The version of jemalloc that the jemalloc-sys crate tracks is from two years ago. I tried bumping jemalloc to latest master but that makes the jemalloc-sys build fail (trivially with a missing script but still).

Also there is this: https://github.com/gnzlbg/jemallocator/issues/136


Impressed to see four posts about Rust on the front page of HN simultaneously.


I’ve found that Microsoft’s mimalloc (available for Rust via https://crates.io/crates/mimalloc) typically provides even better performance than jemalloc. Of course, allocator performance can vary a lot depending on workload, which is why it’s good to have options.


This discussion seems to me like a microcosm of the differences in philosophies between Rust and Go.

With Rust, you have much more control, but you also need a deep understanding of the language to get the most out of it. With Go, the way you think it should work is usually is Good Enough™.


I wouldn't put it that way. Both languages are fast at nice straight-line code.

The main area I'd expect to see performance benefits for rust (though I don't have experience here) is larger rust programs. Rust's zero-cost abstractions have more benefits as the abstractions nest more deeply. For a small program, you don't really have a lot of abstractions, so Go will do just fine.

I think Go has a number of nice performance tricks up it's sleeve, though, so I wouldn't rule out Go on performance grounds too quickly.


It may be that the system allocator is making an excessive number of syscalls to do work, whereas most custom allocators will allocate in slabs to avoid this. You could try using dtruss or strace to compare the differences.


A few folks have commented that there were logic errors in the Go version. Specifically that

  len("föö") = 5
should instead have returned

  len("föö") = 3
I submitted a pull request, https://github.com/christianscott/levenshtein-distance-bench..., that fixes these issues in the Go implementation.

Interestingly enough, when I re-ran the benchmark, the Go version is roughly 19% faster than it was previously:

  old: 1.747889s
  new: 1.409262s (-19.3%)



FreeBSD's system allocator is jemalloc :-).


TLDR for people who didn't read:

The speed difference came from the allocator.

Rust switched from jemalloc to the system allocator per ticket #36963[0] for various reasons (like binary bloat, valgrind incompatibility, etc...).

Go uses a custom allocator[1] instead.

To make 'Rust Go fast' (pun intended), one can use the '#[global_allocator]' to use a custom allocator (in this case, with the jemallocator crate) to make allocations fast again.

[0]: https://github.com/rust-lang/rust/issues/36963

[1]: https://golang.org/src/runtime/malloc.go


The comments of Rust programmers here also suggest that the Rust implementation is, indeed, different from the Go implementation.


It was just a summary of the post contents - the post suggests that the biggest difference comes from the allocator.


Assuming this was run on a 64bit system, the Rust version seems to be allocating and zeroing twice as much memory as the Go version.

edit: this has been pointed out as incorrect, Go ints are 8 bytes on 64bit systems -- thanks for the correction!

  let mut cache: Vec<usize> = (0..=target.chars().count()).collect();
which can be simplified as

  let mut cache: Vec<usize> = vec![0; target.len()];
vs

  cache := make([]int, len(target)+1)
  for i := 0; i < len(target)+1; i++ {
    cache[i] = i
  }
Rust usize being 8 bytes and Go int being 4 bytes as I understand it.

So between doing more work and worse cache usage, it wouldn't be surprising if the Rust version was slower even with the faster allocator.


Go int is 8 bytes


It can be either depending on the system.

https://golang.org/ref/spec#Numeric_types




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: