
Update on Go 1.1 crypto performance - jgrahamc
http://blog.jgc.org/2013/04/update-on-go-11-crypto-performance.html
======
wereHamster
Not really relevant for hashes, but Go also has native implementations of
selected ciphers (e.g. AES). Are those implementations constant-time? If not,
they are vulnerable to timing attacks. People have put a lot of effort into
making OpenSSL ciphers constant-time. This effort was pushed by people working
at Google. This guy works at Google and regularly posts about crypto stuff
they are working on: <http://www.imperialviolet.org/>.

~~~
jgrahamc
The blog you link to belongs to Adam Langley who works on Go's crypto stuff
(he's user agl here). Go has a rather nice crypto/subtle package that contains
constant-time implementations of primitives.

<http://golang.org/pkg/crypto/subtle/>
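
As a rough illustration of what crypto/subtle is for, here is a minimal sketch that checks a MAC tag with `subtle.ConstantTimeCompare` instead of an early-exit byte comparison. The key, the messages, and the helper names `computeTag` and `tagsEqual` are made up for this example; it says nothing about whether Go's AES implementation itself is constant-time.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// computeTag returns an HMAC-SHA256 tag for msg under key.
// (Hypothetical helper, only here to produce something to compare.)
func computeTag(key, msg []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(msg)
	return mac.Sum(nil)
}

// tagsEqual compares two tags without stopping at the first mismatching byte.
// ConstantTimeCompare returns 1 when the slices are equal and 0 otherwise.
func tagsEqual(a, b []byte) bool {
	return subtle.ConstantTimeCompare(a, b) == 1
}

func main() {
	key := []byte("hypothetical key")
	good := computeTag(key, []byte("message"))
	bad := computeTag(key, []byte("tampered"))
	fmt.Println(tagsEqual(good, good), tagsEqual(good, bad))
}
```

For MAC checks specifically, `hmac.Equal` wraps the same idea.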

~~~
enneff
Adam doesn't just work on the Go crypto packages. He is almost solely
responsible for writing them.

~~~
jgrahamc
I was channelling Rob Pike when he described Ken Thompson as someone who had
"done some Unix stuff".

~~~
enneff
I'm sure Adam would appreciate the understatement, but IMO he deserves more
kudos than I can possibly give. :-)

------
dchest
This post should be called "Go 1.1 _old broken_ crypto performance" ;-) If
you're designing a new system, there's no sane reason to use MD5, SHA1, or
RC4. These performance improvements, though, are good for the native TLS
implementation.

There are some small improvements to SHA-2, though, in Go 1.1.

~~~
carbocation
> This post should be called "Go 1.1 old broken crypto performance" ;-) If
> you're designing a new system, there's no sane reason to use MD5, SHA1, or
> RC4.

I think your post is being downvoted because it seems to conflate
_cryptographic primitives_ with _tools that you can use without modification
to safely store a password_.

~~~
dchest
I ported scrypt to Go
([https://code.google.com/p/go/source/browse/scrypt/scrypt.go?...](https://code.google.com/p/go/source/browse/scrypt/scrypt.go?repo=crypto)),
so I know how to store passwords ;-)

This has nothing to do with passwords. You can read about brokenness of RC4,
MD5, or SHA1 in their respective Wikipedia articles.

~~~
carbocation
First, thanks for your contribution. Second, I didn't downvote you. Third, the
question was never whether you knew how to store a password. There are many
reasons to need to be able to use even 'broken' primitives (esp. for
compatibility with legacy apps or conversion), which makes them valuable (or
at least necessary) for some apps. This is why I was guessing that your post
was focused on passwords.

~~~
dchest
I understand. I also said that there's no reason to use them for _new_
systems.

BTW, suggested reading: <http://cr.yp.to/talks.html#2013.03.12> ("Failures of
secret-key cryptography").

~~~
carbocation
Gulp. You did; I missed it! My apologies.

And thanks for that link. Slide 313 is terrifying.

------
Jabbles
What did the Go team change to do this? Compiler optimizations or code
changes? SIMD instructions?

~~~
jgrahamc
AMD64 and 386 assembly implementations. My blog post links to the actual
changes.

~~~
drivebyacct2
Oh, I assumed some of it was from the GC/scheduler work, or mostly just the
asm impls?

~~~
dchest
While not related to these ASM implementations, the code generator for Go code
has also been improved: e.g. it can now generate rotation instructions,
which helps SHA-2 and some other crypto primitives.
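
To make the rotation point concrete, here is a small sketch of the shift-and-or idiom that a code generator can turn into a single rotate instruction. The helper names `rotr32` and `bigSigma0` are chosen for this example. Whether a given compiler version actually emits a rotate for this exact code is an assumption, not something verified here.

```go
package main

import "fmt"

// rotr32 rotates x right by k bits, written as the classic shift-and-or
// pattern. Intended for 0 < k < 32.
func rotr32(x uint32, k uint) uint32 {
	return x>>k | x<<(32-k)
}

// bigSigma0 is the "big sigma 0" function from SHA-256 (FIPS 180-4):
// three right rotations XORed together.
func bigSigma0(x uint32) uint32 {
	return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22)
}

func main() {
	fmt.Printf("%08x\n", rotr32(0x12345678, 8)) // prints 78123456
	fmt.Printf("%08x\n", bigSigma0(0x6a09e667))
}
```

SHA-256 runs several such rotations per round, so falling back to two shifts and an OR for each one adds up quickly.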

~~~
kzrdude
Yup that's absolutely essential for most block ciphers and similar crypto
(i.e. hash functions).

------
minopret
My understanding is that all three of these algorithms (MD5, SHA1, and RC4) are
inherently difficult to parallelize. The result cannot be computed by
splitting the input, processing the pieces separately, and then combining the
results. So they would not get the benefit of using "goroutines" rather than
threads, a benefit which is important to the performance of many Go programs.

~~~
enneff
Goroutines are not primarily about performance; they are for modelling
concurrent processes in a natural way. This can have a performance benefit in
some cases, but I wouldn't classify goroutines the way you have.

~~~
rubinelli
Excuse my ignorance, but could you use goroutines with different algorithms
and pick the one that finishes first? Perhaps not in the case of MD5/SHA1, but
for problems that have some algorithms that are very fast most of the time,
but have a pathological worst case?
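
For reference, the "race several algorithms" idea can be sketched as below. The `race` helper, the `result` type, and the two stand-in strategies `fast` and `slow` are all hypothetical. A goroutine cannot be killed from outside, so the losers only stop early if they check the `done` channel themselves; a CPU-bound loser that never checks keeps burning CPU until it finishes.

```go
package main

import (
	"fmt"
	"time"
)

// result carries an answer and the name of the strategy that produced it.
type result struct {
	name  string
	value int
}

// race runs every strategy in its own goroutine and returns the first result.
// Cancellation is cooperative: closing done is only a request to stop.
func race(strategies map[string]func(done <-chan struct{}) int) result {
	done := make(chan struct{})
	out := make(chan result, len(strategies)) // buffered so losers never block
	for name, f := range strategies {
		go func(name string, f func(<-chan struct{}) int) {
			out <- result{name, f(done)}
		}(name, f)
	}
	first := <-out
	close(done) // ask the remaining goroutines to give up
	return first
}

// fast stands in for the algorithm that is usually quick.
func fast(done <-chan struct{}) int { return 42 }

// slow stands in for a pathological case; it honours the done channel.
func slow(done <-chan struct{}) int {
	select {
	case <-done:
		return -1 // cancelled before finishing
	case <-time.After(2 * time.Second):
		return 42
	}
}

func main() {
	r := race(map[string]func(<-chan struct{}) int{"fast": fast, "slow": slow})
	fmt.Println(r.name, r.value)
}
```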

~~~
numbernein
Go has no way to abort a goroutine so if you tried multiple algorithms you
would use total CPU time for all algorithms. Unlike actual threads Go gives no
assurance that goroutines are preemptively scheduled, so doing so might take
worst case time anyway. Even worse, there is no guarantee that new goroutines
run next so doing so could introduce a scheduling delay.

So basically it would be a terrible idea to do this in Go. With C/pthreads it
might have some use in some really exceptional case.

These are all reasons of course why goroutines are a bad idea. They should
have just used threads, but apparently they wanted to support 32-bit
architectures or some archaic OSes where threads have a high unit cost.

~~~
enneff
Gccgo did use threads for each goroutine for a while, and it was significantly
slower than the gc "green thread" style implementation (used by both compilers
today). Goroutines weigh in at about 4k each. OS threads are 64k minimum. That
allows more than an order of magnitude more goroutines than threads, from a
memory perspective. From an execution standpoint, goroutines managed by the Go
runtime allow some nice scheduling optimisations, as the runtime is aware of
the communication between goroutines. (some of these optimisations are in Go
1.1)
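
A quick way to get a feel for the "many goroutines" claim is a sketch like the following, which keeps 100,000 goroutines alive at the same time before letting any of them finish. The function name `spawnAndSum` and the count are arbitrary choices for the example, and it does not measure per-goroutine memory, so it neither confirms nor refutes the 4k figure.

```go
package main

import (
	"fmt"
	"sync"
)

// spawnAndSum starts n goroutines, holds them all blocked until every one
// has been created, then releases them and returns the sum of their indices.
func spawnAndSum(n int) int64 {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int64
	)
	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			<-start // block until all n goroutines exist
			mu.Lock()
			total += int64(i)
			mu.Unlock()
		}(i)
	}
	close(start) // release everyone at once
	wg.Wait()
	return total
}

func main() {
	fmt.Println(spawnAndSum(100000)) // prints 4999950000
}
```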

Finally, there is no reason goroutines can't be preemptively scheduled. It is
just an implementation detail, and work has already begun on this front.

~~~
teenth
Gccgo didn't have a garbage collector for a while, around the same time it was
using threads. Was it even using a pool? Did it run goroutines for a short
time on the current thread before moving to its own thread? Certainly not.

Thread overhead in Linux is 8k, only slightly more than goroutine. It's 24k
for windows. You guys don't even know basic facts that should have guided the
design.

~~~
enneff
Really? Can a Linux process spawn 1m threads?

I didn't work on the design. If I made a mistake that's my fault. The people
that did design Go and implement the runtime know what they're doing. You
didn't respond to my other points, either.

~~~
teenth
Yes, a Linux process can spawn 1m threads, and testlimit64.exe spawned 250k
threads both with 64k user stack and 1 MiB user stack... because that is just
reserved address space if it isn't actually used. Only the kernel overhead
limits how many threads you can create.

I'm not sure what point you want addressed... scheduling optimization? It's
also a deoptimization in other ways, such as predictable latency and fairness.

I agree the designers knew what they were doing: designing a 'modern' language
for 32-bit computers. The question is why?

~~~
enneff
Even given that what you say is true, these are all implementation details and
have nothing to do with the design of the language.

~~~
teenth
An implementation is what you have to use, you can't use 'the language
design'. When you have to use a special build system to generate Go -and- C
stubs to call out, can't use a standard linker or partial compilation or
shared libraries, have a massively large exe, can't embed in other programs
(Go has to be 'main'), can't use with SELinux, has huge unpredictable latency
spikes, etc it'll be cold comfort that those problems are 'just implementation
details'.

It's too bad they had NIH or thought 32-bit was the future and didn't stop at
just creating a language.

~~~
cmccabe
I don't understand any of your criticisms. What else would you expect to use
to compile Go besides "a special build system"? A not-special build system?
It's a compiled language. You can, however, use gccgo to link Go code with
C/C++ code.

Both gccgo and cgo already use partial compilation. You can invoke the 6l, 8l,
etc. commands directly if you really want to. One of the design goals was
build speed. Shared libraries are on the roadmap. See issue
<https://code.google.com/p/go/issues/detail?id=256>.

Go does _not_ "have to be main". You can use gccgo to mix Go code into your
existing C or C++ binary. See <http://golang.org/doc/install/gccgo>

What the hell does "can't use with SELinux" mean? Are you talking about issue
871 that was fixed in 2010?

You keep repeating over and over that Go is optimized for 32-bit systems.
Repeating something doesn't make it true. In fact, exactly the opposite is
true. Go uses a lot of address space, which is great on 64-bit, not
so good on 32-bit. This was discussed a lot years ago:
<http://lwn.net/Articles/428100/> is a good place to start.

I've seen you repeat over and over in multiple discussion threads that having
lots of kernel threads is no big deal. What you don't seem to realize is that
in C/C++, thread overhead is somewhere between 1 MB and 2 MB a thread if you
want to use pthreads and glibc and avoid random data corruption resulting from
thread heap collisions (hint: you do). Linux also doesn't have an O(1)
scheduler any more in mainline; it uses the completely fair scheduler (CFS),
which has complexity O(log N). There are disadvantages to green threads, but
to even start discussing this requires a lot more background than you have.

Again, you can read about any of this on wikipedia, LWN.net, or even the
replies that inevitably get made to all of your posts on HN. So learn already.

