
How the Go runtime implements maps efficiently without generics - kjeetgill
https://dave.cheney.net/2018/05/29/how-the-go-runtime-implements-maps-efficiently-without-generics
======
kjeetgill
If you're familiar with how Go does interfaces, it's kind of reminiscent of
how they do maps. A variable of an interface type has two parts: a pointer
to the itable mapping the functions in the interface to their implementations
in the concrete type, and a pointer to the data of that instance of the
concrete type. The creation and assignment of that itable is invisible at the
user level and baked in at compile time (with reasonable but irritating
consequences like typed nils).

Similar to an itable, they're injecting a type descriptor (maptype) as a
function parameter, from which the runtime can get things like the size of a
key or value, its alignment, kind, etc.
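For reference, a rough sketch of the kind of information such a descriptor carries. The field names and layout here are illustrative only, not the runtime's actual definition (which has more fields and changes between Go versions):

```go
package main

import "fmt"

// rtype stands in for the runtime's per-type descriptor.
type rtype struct {
	size  uintptr
	align uint8
	kind  uint8
}

// maptype is a simplified sketch of the descriptor the compiler
// passes to the runtime's map functions.
type maptype struct {
	key       *rtype // type descriptor for the key
	elem      *rtype // type descriptor for the value
	keysize   uint8  // size of a key slot in a bucket
	valuesize uint8  // size of a value slot in a bucket
}

func main() {
	// A map[string]int might get a descriptor roughly like this
	// on a 64-bit platform:
	mt := maptype{
		key:       &rtype{size: 16, align: 8}, // string header: pointer + length
		elem:      &rtype{size: 8, align: 8},  // int
		keysize:   16,
		valuesize: 8,
	}
	fmt.Println(mt.keysize, mt.valuesize)
}
```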

I had two questions: A) Why can't the hmap struct keep a pointer to the
maptype? It shouldn't vary per callsite, right? B) Does this count as generics?

It's strange. Like half-erased, half-reified generics. It seems erased in
that the hmap itself doesn't carry the type information, but it's reified at
each callsite?

~~~
ainar-g
>B) does this count as generics?

I think "generic programming" is more about generic _algorithms_ than it is
about data structures. Because without generic functions, generic data
structures are less useful.

Can you create a new map type in Go? Yes. Can you create a _function_ that
works on _any_ map? Only with reflection, which you should consider your last
resort.
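To make that concrete, here's a minimal sketch of what such a reflection-based function looks like; `mapKeys` is a hypothetical helper, not anything from the stdlib:

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// mapKeys returns the keys of any map, formatted as strings, using
// reflection. It works on any map type, at the cost of reflection
// overhead and the loss of compile-time type safety.
func mapKeys(m interface{}) []string {
	v := reflect.ValueOf(m)
	if v.Kind() != reflect.Map {
		panic("mapKeys: not a map")
	}
	out := make([]string, 0, v.Len())
	for _, k := range v.MapKeys() {
		out = append(out, fmt.Sprint(k.Interface()))
	}
	sort.Strings(out) // map iteration order is random; sort for determinism
	return out
}

func main() {
	fmt.Println(mapKeys(map[string]int{"b": 2, "a": 1})) // [a b]
	fmt.Println(mapKeys(map[int]bool{3: true}))          // [3]
}
```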

~~~
jimmy1
> Can you create a function that works on any map? Only with reflection

Most experienced Go developers at this point would template the function and
generate it for any type they might need it for, before they reach for
reflection.

~~~
paulddraper
> template the function and generate it for any type they might need it for

How? This is C++ generics. I thought Go didn't have generics?

~~~
ainar-g
I think they meant literal generation of files based on templates. Basically

    
    
      sed 's/T/int/g' file@T.go > file@int.go
    

Some people do that, I guess.
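For illustration, here's a sketch of what the "generated" side of such a template amounts to. `MaxT` and the `T` alias are hypothetical; a real setup would substitute `T` textually (as in the sed line above) rather than use an alias, but the alias lets the sketch compile as `file@int.go` would:

```go
package main

import "fmt"

// In a real template, every occurrence of T is rewritten to a
// concrete type by the generation step. Aliasing T to int here
// simulates the result of generating file@int.go.
type T = int

// MaxT returns the larger of two T values.
func MaxT(a, b T) T {
	if a > b {
		return a
	}
	return b
}

func main() {
	fmt.Println(MaxT(3, 9)) // 9
}
```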

~~~
pjmlp
They do

[https://www.reddit.com/r/rust/comments/5penft/parallelizing_...](https://www.reddit.com/r/rust/comments/5penft/parallelizing_enjarify_in_go_and_rust/dcsgk7n/)

------
blixt
The article explains how you can get arbitrarily typed values out of the map
using unsafe pointers, but it never really explains how they're put in.

The insert example in the article uses a function that doesn't actually exist
in the runtime package, and it implies you can just give it a plain value as
input:

    
    
      m["key"] = 9001   → runtime.mapinsert(m, ”key", 9001)
    

I would assume that is only possible if the 9001 value can make its way
through as an actual value on the stack, not as a pointer. Here's what I would
expect to actually happen:

    
    
      ptr := runtime.mapallocate(m, "key")  // unsafe.Pointer
      *(*int)(ptr) = 9001  // Is there a non-typed way to do this?
    

While this is a good intro to hash tables in general, the Go specific part
feels a little bit lacking.

~~~
benmmurphy
my guess is it allocates the value onto the stack and then inside mapinsert it
knows to memcpy the value into the bucket because it knows the size of the
value.

like

    
    
        tmp = 9001
        runtime.mapinsert(m, "key", &tmp)

~~~
blixt
Yup, that looks simpler than what I put in my comment. Do you know where the
actual implementation is so we can confirm?

~~~
benmmurphy
Nah. I just decompiled m["foo"] = 5 and it is closer to what is in your
comment than mine.

so for this it does:

    
    
        push addr of string
        push $3
        call runtime.mapassign_faststr
        mov $5, 0($ax)
    
        test.go:4             0x104b4cb               488d05f9c20100          LEAQ go.string.*+91(SB), AX
        test.go:4             0x104b4d2               4889442410              MOVQ AX, 0x10(SP)
        test.go:4             0x104b4d7               48c744241803000000      MOVQ $0x3, 0x18(SP)
        test.go:4             0x104b4e0               e8cbc6fbff              CALL runtime.mapassign_faststr(SB)
        test.go:4             0x104b4e5               488b442420              MOVQ 0x20(SP), AX
        test.go:4             0x104b4ea               48c70005000000          MOVQ $0x5, 0(AX)
    

so it's kind of like:

    
    
        ptr = runtime.mapassign_faststr("foo")
        *ptr = 5
    
    

EDIT: oops confused 0x3 with the value the first time around

~~~
blixt
Thanks! That clarifies a few things. Too bad this isn't covered in the blog
post, which was specifically about how Go implements maps efficiently. Here
are two interesting things the post doesn't cover:

1) How the insert works and why it's this way instead of any other;

2) Apparently the runtime has specialized implementations to insert values for
some key types (string in this case), which implies it does in fact have
multiple implementations of the "same" code.

Edit: I'm also curious how plain Go code can assign a value directly through
an unsafe pointer without expressing the type. Or is that made unnecessary
because the compiler rewrites the `myMap["key"] = value` step and has free
rein to output unsafe, untyped code that isn't normally valid Go?

~~~
benmmurphy
yeah. my guess is the compiler just outputs its intermediate format when it
sees a map assignment and can do whatever it wants.

for reference you can play around with this by doing:

    
    
        go build test.go
        go tool objdump ./test
    

then search for test.go in the output and you should be able to find the Go
'assembly' for your functions. Also, if you use a big enough struct type, Go
will call into a memcpy()-style routine to do the assignment.

    
    
        test.go:32            0x108d791               488d15d7170300          LEAQ go.string.*+255(SB), DX
        test.go:32            0x108d798               4889542410              MOVQ DX, 0x10(SP)
        test.go:32            0x108d79d               48c744241803000000      MOVQ $0x3, 0x18(SP)
        test.go:32            0x108d7a6               e835c2f7ff              CALL runtime.mapassign_faststr(SB)
        test.go:32            0x108d7ab               488b7c2420              MOVQ 0x20(SP), DI
        test.go:32            0x108d7b0               488d742430              LEAQ 0x30(SP), SI
        test.go:32            0x108d7b5               48896c24f0              MOVQ BP, -0x10(SP)
        test.go:32            0x108d7ba               488d6c24f0              LEAQ -0x10(SP), BP
        test.go:32            0x108d7bf               e83807fcff              CALL 0x104defc
    

0x104defc for me looks like it's duffcopy from this file:
[https://golang.org/src/runtime/duff_amd64.s](https://golang.org/src/runtime/duff_amd64.s)

------
Animats
Go hashmaps _are_ generics, or at least parameterized types. You can't create
new parameterized types in Go, but you get maps and channels built in.

~~~
mseepgood
By that definition, C89 has generics too (arrays and pointers). The whole
point of generics is that they are user-definable. Everything else is
hair-splitting.
~~~
hoppelhase
C11 got user-defined generics, FYI.

[https://en.cppreference.com/w/c/language/generic](https://en.cppreference.com/w/c/language/generic)

~~~
leshow
This is something else. C++ has had parametric polymorphism in the form of
template metaprogramming for a long time (the article on hashmaps even covers
it).

This is one of the reasons I don't really like the name 'generics'; it's a
little ambiguous.

~~~
pjmlp
It was the term coined by CLU, Ada, and Modula-3, so it is already widely
used in academia.

------
royjacobs
How can the author claim that "Just like C++ and just like Java, Go’s hashmap
written in Go" (sic) when there is clearly a whole bunch of compiler magic
going on that goes beyond plain Go?

~~~
usrusr
The implementation is in Go, compiled from the source of the runtime library
when that was built, but the syntactic desugaring of your map-accessing code
is happening in the compiler.

~~~
royjacobs
That's still a bunch of magic that C++ and Java don't really get to use
though, isn't it?

Or would you consider this to be on the same level as adding opcodes to the
JVM?

~~~
jerf
I'd consider it fairly similar to the distinction between Java and the JVM,
yes, in that the JVM technically has more power than the Java language
exposes. There are multiple levels, with different powers.

You're not going to be able to draw a sharp line here where Go is somehow on
one side of it and most other languages aren't. In a sense the entire point of
a "language" in the first place is to provide various bits of "magic" to make
your life easier. For instance, you can't implement your own exceptions
system, because the compiler hooks in at a deeper level than you can get to.
(In both cases you can probably do it via what are technically platform-
specific hacks, but I wouldn't say that you've implemented it _in the
language_.) The only language with "no magic" is pure assembler, with either
no macros or at least no predefined macros.

~~~
pjmlp
Actually even Assembly has magic if the CPU is microcoded.

------
Sinidir
I really don't understand why Go doesn't have generics. Shouldn't it be even
easier because Go has no inheritance? So the problem of covariance and
contravariance goes away.

Seems really strange.

~~~
josefx
Rob Pike wrote a bit about it in "Less is exponentially more". He points out
that generics are used by languages that focus on types, and that Go focuses
on composition and uses interfaces (and language built-ins) instead of types
to solve problems.

~~~
vmchale
>Rob Pike wrote a bit about it in "less is exponentially more".

Which is wrong.

>Go focuses on composition

Clearly false.

~~~
majewsky
How is this false? I agree with this assertion after having used Go
extensively for some time. Where Go really shines is when the stdlib offers a
nice abstraction (io.Reader, io.Writer, http.Handler) and you can compose your
programs from building blocks that use these abstractions. For example, from
the http.Handler interface a middleware interface follows directly:

    
    
      type Middleware interface {
        Wrap(http.Handler) http.Handler
      }
    

without requiring any generic types. For comparison, the same thing in Rust
would rely on type polymorphism:

    
    
      //assuming that Handler is a trait
      trait Middleware<T: Handler> {
        type Result: Handler;
        fn Wrap(handler: T) -> Self::Result;
      }
      //NOTE: Not using impl-trait here to make it clear that
      //two type variables are involved (one type parameter
      //and one associated type).
    

You could mimic the Go behavior if you box all your traits, which Rust does
not like to do because it's not zero-cost:

    
    
      trait Middleware {
        fn Wrap(handler: Box<dyn Handler>) -> Box<dyn Handler>;
      }

------
di4na
So basically they invented a runtime type system. Again.

~~~
lifthrasiir
Yeah, a callback approach like this is actually one of common strategies for
data structures implemented in C. Others include a requirement that both the
key and value should be byte sequences or equivalent (you can always `memcpy`
primitives from/to byte sequences).
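That second strategy can be sketched in Go, too. The `byteMap` helper here is hypothetical, and a real C implementation would memcpy raw bytes rather than go through encoding/binary; this just shows the shape of a structure that only ever deals in byte sequences:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// byteMap stores values only as byte sequences; the caller is
// responsible for serializing primitives in and out.
type byteMap map[string][]byte

func (m byteMap) put(key string, val uint64) {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, val) // "memcpy" the primitive into bytes
	m[key] = buf
}

func (m byteMap) get(key string) (uint64, bool) {
	b, ok := m[key]
	if !ok {
		return 0, false
	}
	return binary.LittleEndian.Uint64(b), true
}

func main() {
	m := byteMap{}
	m.put("answer", 42)
	v, _ := m.get("answer")
	fmt.Println(v) // 42
}
```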

------
Matthias247
I think there's some contradiction between those statements:

> Rather than having, as C++ has, a complete map implementation for each
> unique map declaration, the Go compiler creates a maptype during compilation
> and uses that value when calling into the runtime’s map functions.

> Does the compiler use code generation?

> No, there is only one copy of the map implementation in a Go binary

The compiler certainly does code generation, as outlined in the article. It
just minimizes the amount of generated code and uses explicit non-generic
functions at the higher layers. One can do the same in a generic C++ map
implementation, by delegating some of the logic to non-generic functions; it's
commonly done in manually size-optimized code. One could also hope that the
compiler automatically detects duplicated code between template instances and
deduplicates it.

------
comesee
The nice thing about C++ and templates is that you can implement a scheme
exactly like this without special help from the compiler :)

------
skybrian
The claim is that Go maps are "very fast" but that's relative to what you're
trying to do. If you're writing an interpreter loop, you want to avoid maps as
much as you can and use arrays instead.

~~~
sacado2
Indeed. Go maps are slow as hell. Every time I need them in a tight loop, I
have to make an ad-hoc equivalent with slices, if at all possible.

~~~
_old_dude_
I believe it's because map accesses are not inlined in Go.

Swift uses a similar strategy to Go to implement dictionaries (a runtime data
structure that describes data access is passed as a supplementary parameter),
but the Swift backend is able to inline most of the calls. So at some point in
the future, Go maps may be faster.

------
alpb
(This is a bit off-topic, but I couldn't help it.)

I noticed the article refers to "hashmap" as a data structure. Such as this
statement:

> Go maps are hashmaps

I think instead of hashmap, this article could use "hash table", which is the
actual data structure. The term hashmap sounds almost Java-specific to me (but
I understand it helps distinguish when you have both TreeMap and HashMap in a
language, like Java).

This is a cool article regardless, giving a sneak peek into the internals of
Go's map type.

~~~
kaoD
Didn't sound that Java-specific to me:

[https://doc.rust-lang.org/std/collections/struct.HashMap.htm...](https://doc.rust-lang.org/std/collections/struct.HashMap.html)

[http://hackage.haskell.org/package/hashmap-1.3.3/docs/Data-H...](http://hackage.haskell.org/package/hashmap-1.3.3/docs/Data-HashMap.html)

[https://stdlib.ponylang.org/collections-HashMap/](https://stdlib.ponylang.org/collections-HashMap/)

[https://api.dartlang.org/stable/1.24.3/dart-collection/HashM...](https://api.dartlang.org/stable/1.24.3/dart-collection/HashMap-class.html)

[https://pocoproject.org/docs/Poco.HashMap.html](https://pocoproject.org/docs/Poco.HashMap.html)

For me, HashMap is an abstract data type that adheres to the Map interface and
is implemented with (i.e. has the performance characteristics of) a hash
table. It's a subset of Map, hence the "go's maps are hashmaps" remark.

HashSet is often implemented with hash tables too. I guess the difference is
ADT vs data structure (and I see your point there), not Java-specificity.

------
beagle3
The dict/hash implementation in K (since 1993 or so) is much simpler and
cleaner, takes much less memory, and is often faster, though speed depends on
the actual use case more than anything.

The idea is that instead of the hash map being a list of <key,value> items
indexed by key, there's a vector of keys, a vector of values (with matching
indices), and a search index (usually a hash, but could be a tree) that maps a
key to the index; note that the search index does not need a copy of the key
even if it is inexact because one can always refer to the keys vector.

The cons:

* You can't pass a single <key,value> object without building one

* You may incur two or three cache loads (one for the index, one for the key, and, if it matches, one for the value), whereas in a conventional <key, value> list they are likely to be on the same cache line.

* (edit: added based on aidenn0's comment; see discussion below): Removal is generally O(n) unless you are willing to give up one of the pros; but even if you give some up, it's still usually more featureful than commonly used implementations.

The pros:

* Insertion order is preserved - OrderedDict for free!

* Getting the list of keys or the list of values (e.g. Python's keys() and values() methods) is O(1) and allocation-free

* Applying a function to each item needs to allocate a new value vector, but can reuse the key vector -- and if the function doesn't need the key, you don't actually have to fetch it - e.g., to normalize a map so values sum to 1, you can do:
    
    
        whole = sum(mymap.values())
        normalized = apply_values(mymap, lambda x: x/whole) 
    

And get a normalized map with the same keys and literally the minimal number
of allocations and operations possible for a concrete datatype

* Specialization is by the key type, regardless of the value type - which means that a few efficient implementations (for ints, longs, floats, strings, symbols) manage to often provide C++ speeds for an interpreted language.

* Memory packing is about as tight as it can be. Your key is a double, and your value is one byte? You pay exactly 9 bytes per record + index size (which, e.g. for a 30,000 record map, takes two bytes per hash slot, or ~128KB for a 64K slot table). This is true whether you are running on a 16-bit, 32-bit, 64-bit or 128-bit processor.

* It is straightforward and natural to implement with almost no pointers involved, meaning that it is trivial to share memory among processes (at least for read only accesses) and to use with memory-mapped storage -- to the point that you can have an 8GB dictionary which is instantly mmap()/MapViewOfFile()d into existence, and is no different in any way than any other one - you pay disk access for the records you use, but enjoy OS cache/page management, with cache shared between processes and between subsequent runs -- no need to special case this dictionary just because it's big.
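The layout described above can be sketched in Go. This uses a built-in map as the search index, which gives up the memory-packing and pointer-free benefits, but it shows the structure (parallel key/value vectors plus an index from key to position):

```go
package main

import "fmt"

// orderedDict sketches the K-style layout: a vector of keys, a
// vector of values with matching indices, and a search index that
// maps a key to its position in those vectors.
type orderedDict struct {
	keys  []string
	vals  []float64
	index map[string]int // the "search index"; a hash here, could be a tree
}

func newOrderedDict() *orderedDict {
	return &orderedDict{index: make(map[string]int)}
}

func (d *orderedDict) set(k string, v float64) {
	if i, ok := d.index[k]; ok {
		d.vals[i] = v // update in place; insertion order preserved
		return
	}
	d.index[k] = len(d.keys)
	d.keys = append(d.keys, k)
	d.vals = append(d.vals, v)
}

func (d *orderedDict) get(k string) (float64, bool) {
	i, ok := d.index[k]
	if !ok {
		return 0, false
	}
	return d.vals[i], true
}

func main() {
	d := newOrderedDict()
	d.set("b", 2)
	d.set("a", 1)
	d.set("b", 20)
	fmt.Println(d.keys) // [b a] — keys() is just the slice, O(1)
	v, _ := d.get("b")
	fmt.Println(v) // 20
}
```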

The Python 3.6 implementation switched from a hash-indexed <hash,key,value>
vector to a <hash -> order> index plus an ordered <key,value> vector, and saw
speed improvements and memory use reductions. Due to Python's legacy and
structure, they could not have gone farther along the K way, except for CPU
cache effects.

Having used APL and K for over 15 years now, I find it weird seeing the same
Smalltalk-inspired hash table structure come up in every single new runtime
and language implementation.

~~~
yiyus
There is a lot to learn from K, but there are two problems:

1) It is not open source but a commercial product. Although some source code
can be found online, there is no license, so I would not dare use any of it
in a free-software project.

2) Whitney's code is worth studying. I have spent a lot of time "deciphering"
the b interpreter in kparc, and it has been a great exercise, but it is not
for everybody. Most people will find it unnecessarily obfuscated. It certainly
is not obvious to a casual observer.

I wish (1) changed; then we could see annotated versions of the code that
would make it more palatable, solving (2) as well. But from what I have seen,
this is not going to happen.

~~~
beagle3
Totally agree. In the context of learning lessons from K about implementing
dict/hashmap, though, neither of these matter.

~~~
yiyus
Totally agree too :)

In particular, your explanation about K maps would be a great addition to
Cheney's post.

------
saagarjha
> Each different map are different types. For N map types in your source, you
> will have N copies of the map code in your binary.

Is this necessarily true? IIRC the compiler will roll identical functions into
one if it detects that they have the same body (instruction-wise).

~~~
initium
Won't the values read from and written to the data structure depend on
different constants (data sizes) in the different functions, making them
unfoldable?

~~~
saagarjha
Ok, here's a weaker question: does instantiating std::unordered_map<int, int*>
and std::unordered_map<int, long*> cause two copies of the template to appear?
Yes, I know that both functions must appear so that they can be called, but do
the function bodies get duplicated (as opposed to one "function" just falling
through into the shared implementation)?

~~~
iainmerrick
I haven’t checked but I wouldn’t be surprised if they are duplicated.

 _(Edit to add:_ if this is done, it'll be part of the link-time optimization
step, which I think is relatively new in all the major C++ compilers.)

If you include debugging symbols, there will probably be multiple copies of
the duplicated code just because it has different names.

You certainly should be able to merge duplicated functions after stripping
symbols but I don’t know if that’s a standard thing. I’ve definitely been
disappointed in the past by the performance of “dead code stripping”
optimization. It’s not quite as trivial a problem as it sounds because when
you have groups of functions that call each other, you need to unify two
graphs rather than just look for individual identical blobs. The obvious
implementation (multiple passes looking for identical leaf functions) is slow
and will miss some cases.

~~~
jcelerier
> which I think is relatively new in all the major C++ compilers.

LTO has been available in GCC since 2009, and even longer in Clang AFAIK. MSVC
has had LTO for... I don't know, but googling a bit shows that Visual Studio
2005 supports /GL.

~~~
iainmerrick
Yeah, sorry, I should have clarified “relatively recent” as meaning the last
10 years or so. That’s recent for C++! :)

This stuff is being actively worked on so I don’t know if merging identical
template instantiations is a thing yet. Any idea?

~~~
jcelerier
MSVC has /OPT:ICF which does this. GCC has -fipa-icf, and the GNU Gold linker
has --icf={safe,all} options (since it's operating on ASM it needn't be at the
compiler level).

In both cases it works not only with templates but with any kind of function.
However, with --icf=all some subtle bugs can appear, since function pointers
that would compare unequal in a default build now compare equal. I have never
been bitten by those.

~~~
iainmerrick
Cool! I’ll give those a try.

------
yani
Every time I see an article from Dave Cheney, I wonder why it is so popular.

