
Go Slices Are Fat Pointers - ingve
https://nullprogram.com/blog/2019/06/30/
======
WalterBright
D slices are fat pointers as well:

[https://dlang.org/spec/arrays.html#dynamic-
arrays](https://dlang.org/spec/arrays.html#dynamic-arrays)

My suggestion for C to adopt fat pointers:

[https://www.digitalmars.com/articles/b44.html](https://www.digitalmars.com/articles/b44.html)

~~~
masklinn
> D slices are fat pointers as well:

So are Rust's.

Possibly more interestingly, so are its _trait objects_ , the trait object
pointer is a (vtable _, instance_ ) rather than (instance*, length).

Also of note: that applies to "owned pointers" (`Box`) not just "references",
despite the former looking very struct-like: [https://play.rust-
lang.org/?version=stable&mode=debug&editio...](https://play.rust-
lang.org/?version=stable&mode=debug&edition=2018&gist=a0d4e151acba3b8f273666946613f789)

In a way, C++ and Rust (and possibly D?) have pretty much an infinity of fat
pointers as anything can be made to deref' to something else, and be more than
one pointer wide (e.g. C++'s std::string and std::vector and Rust's String and
Vec)

~~~
ainar-g

      > Possibly more interestingly, so are its trait objects, the trait object pointer is a (vtable,
      > instance) rather than (instance*, length).
    

The Go interfaces are the same. See
[https://research.swtch.com/interfaces](https://research.swtch.com/interfaces).

------
Stratoscope
A word of caution to anyone writing C macros like the examples in the
article... There is a potential error in this macro:

    
    
      #define COUNTOF(array) \
          (sizeof(array) / sizeof(array[0]))
    

Generally, when you write a C macro, you should parenthesize each use of any
argument inside the macro. The exceptions would be cases like token pasting
where you want the literal argument text and only the text.

I don't have a specific failure example in mind for the COUNTOF macro (maybe
someone can post one?) - but here is a macro where it is easy to see the
problem:

    
    
      #define TIMESTWO( value )  ( value * 2 )
    
      int ShouldBeEight = TIMESTWO( 2 + 2 );  // Oops, it's 6!
    

This happens because the macro call expands to:

    
    
      2 + 2 * 2
    

Parentheses prevent this problem:

    
    
      #define TIMESTWO( value )  ( (value) * 2 )
    

The macro also illustrates a common misconception: sizeof is not a macro or
function call, it's an _operator_. So wrapping the argument to sizeof in
parens isn't necessary unless they are needed for other reasons. (They don't
hurt anything, they just aren't needed.)

So when I've written a COUNTOF macro (and it is a very handy macro in C/C++
code if your compiler doesn't already provide a similar one like __countof in
MSVC), I write it like this:

    
    
      #define COUNTOF( array )  \
          ( sizeof (array) / sizeof (array)[0] )
    

The space after each sizeof is my attempt to indicate that the following
parens are not function/macro call parens, but are there specifically to wrap
the array argument safely.

~~~
hedora
You also shouldn’t repeat the argument: “while(COUNTOF(ptr++) > 0) { }” would
do horrible things (if it compiles; didn’t check) for example.

There are various preprocessor tricks to avoid this stuff (eg “do{ x =
(array); ... } while (0)”), but, honestly, if you want this sort of memory
safety, C++ is your friend.

~~~
firethief
Even plain `COUNTOF(ptr)` is wrong, unless `ptr` is an array.

~~~
MaulingMonkey

        -Werror-sizeof-pointer-div
    

Or in C++:

    
    
        #include <type_traits>
    
        template < typename T, std::size_t N >
        constexpr std::size_t count_of(const T (&)[N]) { return N; }
    
        #define COUNTOF(a) std::integral_constant<std::size_t, count_of(a)>::value
    

(C++03 friendly versions are entirely possible, but left as an exercise to the
reader)

------
kccqzy
Rust slices are also fat pointers. The std::string_view type is also a fat
pointer. The Haskell ByteString type is also a fat pointer.

------
stochastic_monk
This is analogous to spans and views in C++. I've rather enjoyed using span-
lite [0], an implementation compatible with earlier C++ standards. (Roughly,
all the syntactic niceness of an STL container without requiring ownership.)

[0] [https://github.com/martinmoene/span-
lite](https://github.com/martinmoene/span-lite)

------
anandology
Any slice implementation that references to the original data has to be some
kind of "fat pointer". Is there any other way that can be done?

String.substring in Java does exactly that. I'm sure numpy arrays in Python
are implemented like that, perhaps slightly more fatter.

~~~
masklinn
> Any slice implementation that references to the original data has to be some
> kind of "fat pointer". Is there any other way that can be done?

Yes, add one more level of indirection.

For instance traditional "oo" languages don't usually use fat pointers for
dynamic dispatch, you have a single pointer to an instance which holds a
pointer to its vtable.

In Rust however, the "object pointer" is a fat pointer of (vtable _, instance_
). Go's interface pointers are the same (which combined with nils leads to the
dreaded "typed nil" issue).

~~~
weberc2
I don’t understand people’s beef with “typed nil”. If an interface is just a
fat pointer and the data is itself a pointer, then it stands to reason that
the interface can be not nil while the data can be nil. If you understand the
concept of pointers (or nillable pointers, anyway), surely this should not be
surprising to anyone? It would have the same states as a asterisk-asterisk-int
(HN formatting prohibits asterisk literals), would it not?

~~~
jerf
I think it's a combination of several things: First, you've got a good chunk
of Go programmers who haven't dug down into the details of the internal
representation of interfaces, because for the most part, they don't need to,
and they either don't generally dig in to that level, or just haven't gotten
around to it yet. (There's nothing wrong with this and this isn't a criticism.
It's just that not everybody is an expert in everything they do or use.)

Secondly, I think that for a lot of people, when they have an interface that
is itself nil, they mentally tag it with the label "nil", and when they have
an interface that contains a typed value that is nil, they mentally tag this
with "nil", and from the there the confusion is obvious. Mix in some concepts
from other languages like C where nil instances are always invalid, so that
you don't mentally have a model for a "legitimate instance of a type whose
pointer is nil" [1], and it just gets worse.

Third, I think it's an action-at-a-distance problem. I'd say the point at
which you have, say, an "io.Reader" value, and you put an invalid nil struct
pointer in there that doesn't work, the problem is there. Values that are
invalid in that way should never be created at all, so "interfaceVal == nil"
should be all the check that is necessary. So you have a problem where the
code that is blowing up trying to use the "invalid interface" is actually
caused by an arbitrary-distant bit of code that created the value that
shouldn't have been created in the first place, and that pattern generally
causes problems with proper attribution of "fault" in code; the temptation to
blame the place that crashed is very strong, and, I mean, that's generally a
perfectly sensible default presumption so it's not like that's a crazy idea or
anything.

[1]: For those who don't know, in Go, you can have a legitimate value of a
pointer which is nil, and implements methods, because the "nil" still has a
type the compiler/runtime tracks. I have a memory pool-type thing, for
instance (different use case than sync.Pool, and predates it) which if it has
an instance does its pooling thing, but if it is nil, falls back to just using
make and letting the GC pick up the pieces.

~~~
inlined
This. Nil is the one case where you can seriously screw your self up in Go
because it’s an untyped literal that gets typed on assignment. Interfaces are
fat pointers and a “true” nil pointer is nil interface, nil implementation. If
your function is going to return a known subtype and you store that nil in a
temporary you now return a known interface and nil implementation. Then your
== nil check on the interface pointer will fail.

I royally screwed up a deploy once because my functions were returning nil
error implementations that were != nil and triggered failure paths.

~~~
weberc2
I agree that nil is error prone, but I don’t think interfaces add any
additional confusion so long as you think of them as a reference type and
reason accordingly. Also, I think you’re mistaken about “true nil”. A “true
nil” only requires the interface to be nil. The value doesn’t matter as there
necessarily is no value (moreover interfaces can have values which are value-
types that cannot be nil!).

------
rwmj
Another way to implement fat pointers is to keep the pointer representation
the same but have another structure like an interval tree that stores a map of
pointers to the extra information on bounds. That's how I implemented Bounds
Checking GCC back in the day:
[https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html](https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html)
It has the advantage that pointers remain compatible with code that wasn't
specially compiled to know about fat pointers (such as system libraries).

~~~
rocqua
Seems like that would hurt memory locality, with a commensurate increase in
cache-misses.

~~~
wahern
That's basically how Intel MPX works,

    
    
      https://en.wikipedia.org/wiki/Intel_MPX
    

as well as TinyCC's Bounds Check,

    
    
      https://bellard.org/tcc/tcc-doc.html#Bounds
    

which cites the aforementioned GCC patch.

I think AddressSanitizer uses a shadow mapping structure more similar to
Valgrind.

Although your point about performance still stands. Intel MPX is dead--support
removed from GCC, IIRC, because of hints Intel was abandoning it. We'll never
know how well it could have been optimized in practice. I think ultimately we
simply need proper fat pointers, but those might not become ubiquitous (in the
sense of becoming a de facto standard part of shared, C-based platform ABIs)
until we see 128-bit native pointers.

------
tangus
(In the same vein of other comments...)

Algol 68 slices are also fat pointers. Only they also carry a `stride` field,
since in Algol 68 you can slice multidimensional arrays in any direction.

~~~
oconnor663
A couple questions about that design. Is it worth paying the cost of an extra
word in every slice, if strided slicing is rare in most programs? And when you
wrap OS APIs that don't understand striding, do you have to use a different
pointer type?

~~~
tangus
It's a subjective question, but given that A68's slices are already burdened
with carrying their lower bound wherever they go, not much is lost with adding
another piece of data to them. They are already quite fat, and more with each
dimension. I think the ability of taking a column and just apply any sequence
function to it is well worth it.

I don't think you could use OS APIs without boxing and unboxing data, but I
think that's the case with every language except C.

------
benreesman
Obviously there are all kind of cute games that can be played with pointers if
you know arch-specific details (IIRC v8 was using alignment guarantees to
steal low bits at one point, might still be). And tagged pointers of various
kinds go at least back to Symbolics if not further.

One that I thought was especially nifty was [0].

[0]
[https://github.com/facebook/folly/blob/master/folly/PackedSy...](https://github.com/facebook/folly/blob/master/folly/PackedSyncPtr.h)

~~~
drfuchs
Lisp 1.5 was using tagged pointers by 1962, and I don’t think it was
innovative in this regard at the time.

~~~
db48x
Only because the computer it was on already had a tag field in every word.
Other implementations of Lisp did similar things though:
[https://www.snellman.net/blog/archive/2017-09-04-lisp-
number...](https://www.snellman.net/blog/archive/2017-09-04-lisp-numbers/)

------
throwaway34241
Go interfaces are also fat pointers. Basically, the pointer to the type
info/method set and the actual data are stored separately. [1]

As a result of this you can use pointers as interface types without boxing
them. For example you can take an interior pointer to an array, or a C
pointer, and use it as an interface type without boxing since the type info
doesn't need to be stored with the data.

[1]
[https://research.swtch.com/interfaces](https://research.swtch.com/interfaces)

------
baby
The “uint8* a, usize a_len“ pattern is one of the ugliest aspect of C. It’s
probably the source of most bugs. To its defense, it was invented at a time
where we were trying to save each byte of data and security was not really a
thing. Unfortunately C is still quite popular.

~~~
jasonhansel
IMHO if C had generics then so many problems could be fixed (e.g. you could
easily have a struct Slice<T> for a fat pointer). Existential generics would
even give you OOP (and, in a roundabout way, closures) for free. Hmm...

~~~
bshanks
What are existential generics? I searched Google but couldn't find the meaning
of the term.

~~~
jasonhansel
[https://wiki.haskell.org/Existential_type#Dynamic_dispatch_m...](https://wiki.haskell.org/Existential_type#Dynamic_dispatch_mechanism_of_OOP)

~~~
bshanks
thanks!

------
fulafel
Interesting that this post has been in the #1 place for so long. I don't use
Go but I knew this already by having randomly read some things about the
language. And slices are central language concepts after all.

Is it that the Go documentation doesn't cover the underlying implementation at
the machine word level?

~~~
_Chief
Docs explain this, not as detailed as the post, but it does touch on slices
referencing/wrapping an array. However, there is a blog post dedicated to
slice internals [https://blog.golang.org/go-slices-usage-and-
internals](https://blog.golang.org/go-slices-usage-and-internals) that dives
further in

------
metroholografix
Some errors I found:

    
    
        uintptr addr = (uintptr_t)buf & 0xffffffffffff;
        uintptr pack = (sizeof(buf) << 48) | p;
    

should be “| addr”

    
    
        #define ARRAYPTR(array) \
            ADDROF(array, COUNTOF(array))
    

should be FATPTR(array, COUNTOF(array))

------
millstone
Fat pointers + threads permit torn reads/writes, leading to memory unsafety.
It seems weird that Go allowed memory unsafety for this narrow use case -
AFAIK only slices and interfaces are fat pointers.

~~~
the8472
If you need thread-safety you must at least use atomics anyway Either your
platform has transactional memory, double-wide cas or you need to fall back to
locks.

~~~
gpderetta
Using double wide cas for atomic pointer store and loads is a big
pessimization though.

------
nilsocket
I regret learning go as my first language.

Every other language just looks awful.

~~~
db48x
People say that about Lisp as well.

~~~
weberc2
I think it’s for different reasons. Go as a language is just fine, but Go as a
framework for building software is just fantastic. The tooling is
comprehensive and easy to use, testing is built in, and the happy path is
almost always highlighted for you. No need to figure out which documentation
generator, style guideline/formatter, test framework, build system, package
manager, etc to use and these are easy to use (with the notable exception of
package management, which has been rocky but is improving). Lisp is a neat and
powerful little language but the tooling and documentation are often lacking
and it’s often difficult to figure out even which lisp is right for your
project.

~~~
db48x
Except that most of the tools that come with Go are pretty mediocre. Go's
debugger is worse than mediocre.

~~~
weberc2
The debugger isn’t great. Most things just work, and they just work well for
the most popular use cases. Documentation is just comments—no special syntax
to memorize and no need to maintain infrastructure to parse docstrings and
publish to a web server. It also runs the examples to make sure they work. The
build system doesn’t require you to learn a special (typically imperative)
project configuration language or Makefile generator. It also outputs a
statically-linked binary by default—no VMs or interpreters or libraries to
shop separately. Although it doesn’t have a comprehensive assertion suite, Go
ships with a unit test framework that just works, which is more than can be
said for most languages. And all of these things are standard so you can
pretty much jump into any project and build it, run its test suite, and begin
contributing with little more than a text editor and the Go toolchain. So
yeah, Go’s tooling is simply fantastic and no other mainstream language except
Rust comes close. This isn’t a fanboy statement; there are lots of other
factors to consider, but Go’s tooling is top notch.

------
tsegratis
Here's my own attempt at implementing these ideas in c

Basically feature complete, with unitests, and interop with c arrays

[https://tse.gratis/aArray/](https://tse.gratis/aArray/)

A few changes for a later version:

* Pure public domain license (it is that already, but the wording could be better)

* Remove c++ support... Not sure about this

* Let higher order map/filter functions take a void* payload when using plain c functions over clang blocks (or lambdas)

------
floatboth
> For example, currently on x86-64, only the lower 48 bits of a pointer are
> actually used. The other 16 bits could carefully be used for other
> information

Well, you can't dereference these fat pointers directly on amd64, you have to
remove the tag.

aarch64 actually has a Top Byte Ignore flag though :)

~~~
rocqua
I'd also expect that bit-fiddling on pointers, and then de referencing them,
is undefined behavior.

~~~
kragen
In C, yes, but not in amd64 or i386 (or most instruction sets.)

------
blt
...yes, and? How else could you possibly implement slices?

I don't disagree with any of this article at a factual level, I'm just trying
to understand the author's message.

~~~
gouggoug
For people like me whose knowledge of C is close to 0, and had no idea what
fat pointers were, that post was pretty interesting.

I've now learned 2 things: what fat pointers are, and that go slices are fat
pointer.

------
inlined
I’ve seen this technique used in old codebases for strings. A typedef was made
for a char* whose -1 address was a count and there wasn’t a zero terminator.
This was useful since string length checks were now O(1) and string
manipulation can take a surprising amount of time in CPU bound programs.

It was an interesting idea but you needed a whole new set of string functions
and bridge functions. It takes a runtime (or codebase) wide commitment to make
it work.

~~~
firethief
That sounds like Pascal strings. Arguably saner than C strings, but lacking
the main advantage of slices: multiple slices of different lengths and/or
different starting points can refer to parts of the same string, so you can
"slice" things without copying.

------
nerdbaggy
One of the things that was a gotcha for me when I started Go. Couldn’t figure
out why my slices weren’t what I was expecting

------
kazinator
You can easily obtain convenient fat pointers with like the the 1990 dialect
of C++ described here:

[https://www.amazon.ca/Annotated-C-Reference-
Manual/dp/020151...](https://www.amazon.ca/Annotated-C-Reference-
Manual/dp/0201514591)

------
ryanworl
Another useful unsafe slice operation is casting a byte slice to a string so
you can e.g. use it as a map key. Watch your lifetimes though!

You probably also wouldn’t want to use any string manipulation functions on
it, but I’ve never had a reason to do that anyway.

~~~
BarkMore
The compiler optimizes conversion from []byte to string on map lookup.
[https://github.com/golang/go/commit/f5f5a8b6209f84961687d993...](https://github.com/golang/go/commit/f5f5a8b6209f84961687d993b93ea0d397f5d5bf)

~~~
ryanworl
That’s awesome! I’ll have to benchmark that and see if I can delete some code
now. Thanks.

------
octonion
Swift arrays are the only programming language arrays that have ever baffled
me. I still have no idea how to manually reconstruct them given the array
struct.

~~~
eridius
Swift arrays are interesting beasts because they need to support bridging into
Obj-C, which makes them rather complicated. Look at ContiguousArray instead if
you want the simple version that skips bridging support.

------
codedokode
What surprises me in C programmers is that bugs like buffer overflow have been
known for several decades but they still use raw pointers for strings and
unsafe pointer arithmetics. At least in open source projects I often see this.
C is really a language that doesn't move forward.

Also, no proper package manager.

------
svd4anything
My favorite is Smart Fat Pointers. They are sexy!

