Understand Go pointers (cheney.net)
189 points by spacey 12 months ago | 131 comments



I am still not sure why the language creators of Go decided to make pointers explicit (syntactically) instead of making references the default like most other languages. That is, you still have pointers, but you don't have to put "*" all over the place. I suppose it is because the language designers came from C, or maybe they wanted to be that explicit?

I understand the value of having pointers even with a GC, but I'm actually more concerned with the resource I'm pointing to and its lifecycle than with the pointer itself. That is, there should be different types of pointers depending on where the data is stored and how it is reclaimed (something Rust does nicely with generics).

I'm not trying to bash Go; rather, I must be missing something (I don't know the language that well).


Having explicit pointers gives you more control over allocation. As you mention, Go is a GC language, and there are no guarantees of whether allocation will occur on the stack or the heap. But generally, non-pointer values will be stack-allocated. This reduces pressure on the GC, since the values are just discarded along with the stack frame on return.

For this reason, it's usually considered good form to pass values to functions instead of pointers, even if the values are larger. This allows the compiler to prove through escape analysis that the arguments don't need to be heap-allocated. Pointers can also be proven not to escape, but only in a limited set of circumstances.
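
For the curious, here is a minimal sketch of my own (the file layout and function names are invented) of how you can watch that decision being made: building with `go build -gcflags=-m` asks the compiler to print its escape-analysis results, though the exact wording of the output varies by compiler version.

    package main

    type point struct{ x, y int }

    // Returning the struct by value: p can stay on the stack.
    //go:noinline
    func byValue() point {
    	p := point{1, 2}
    	return p
    }

    // Returning a pointer to a local: &p escapes to the heap.
    //go:noinline
    func byPointer() *point {
    	p := point{1, 2}
    	return &p
    }

    func main() {
    	_ = byValue()
    	_ = byPointer()
    }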

Having explicit pointers also resolves type system awkwardness between simple and complex data types. Consider Java, which is also a pass-by-value language, but has "reference" types, except for int, float, etc. So everything is a reference except this bag of simple data types, effectively creating a 2-tiered type system. Java has dealt with this by implementing boxing and unboxing rules for int/Integer and friends, which is fine, but adds complexity.

Finally, one of my personal favourite features about having a complex data type that is a value instead of a pointer or reference is that you can use structs and arrays as map keys.
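
For example (a small sketch of my own, not from the article), a struct or array key works directly, with no extra hashing machinery:

    package main

    import "fmt"

    // Any comparable value type can be a map key.
    type cell struct{ x, y int }

    func main() {
    	grid := map[cell]string{
    		cell{0, 0}: "origin",
    		cell{1, 2}: "treasure",
    	}
    	fmt.Println(grid[cell{1, 2}]) // treasure

    	// Arrays are comparable values too, so they also work as keys.
    	distances := map[[2]int]float64{
    		{3, 4}: 5.0,
    	}
    	fmt.Println(distances[[2]int{3, 4}]) // 5
    }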


As I mentioned in another similar comment, I think this is the key point for why it makes sense to me:

> Having explicit pointers also resolves type system awkwardness between simple and complex data types.

That is, it appears to be worth it to have explicit pointers for data structures (and the manipulation of those data structures), probably more so in a concurrent language. When programmers see things that don't have pointers, they can make some general assumptions (the struct or whatever is generally immutable).


So how big does my struct need to be before it's more efficient to pass a pointer to it to a function?


I've often wondered this myself. Perhaps the right answer would be to benchmark and/or use the profiler.

On the other hand, I have never really had a situation where I had to pass a struct that was really big (more than 1 MB).

Most of the time, big data structures are actually just slices, which use pointers to point to the memory address of the underlying array (as far as I understand).

So having such a big struct is rather unusual. Maybe it is possible with strings, though it would depend on how strings are implemented in Go (I would guess they are just slices as well).

Furthermore, having big strings (1 MB < x < 50 MB) is rather unusual, unless the strings can also become really big (1 GB), in which case you should probably pass by reference or, even better, buffer the string, because arbitrary-length strings can easily cause RAM problems.


In Go, strings are internally a header that contains a length and a pointer. The bytes of the string itself are allocated elsewhere (typically on the heap).
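
You can see that header indirectly (a sketch of my own): the size of a string value itself stays constant no matter how long the text is, because only the header is copied around.

    package main

    import (
    	"fmt"
    	"unsafe"
    )

    func main() {
    	short := "hi"
    	long := "a considerably longer string, but the same size header"
    	// On a 64-bit platform both print 16: an 8-byte pointer plus an
    	// 8-byte length. Copying or passing a string copies only this.
    	fmt.Println(unsafe.Sizeof(short), unsafe.Sizeof(long))
    	fmt.Println(len(short), len(long))
    }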


Well, it depends on whether the time spent copying the bytes for the function invocation is less than the time spent garbage-collecting objects that are passed as pointers, and then following those pointers. So as you can imagine, it's highly variable depending on the lifecycle of those objects, how frequently the function is being called, and probably a bunch of other factors like how big your CPU cache is. And then there is the cost of pointer indirection, which can cause cache misses.

I don't think there's really a rule of thumb here. Writing benchmarks, and observing the results is about the best you can do.


I think others answered the question of pointers vs not-pointers reasonably well, but your question implies you already understood that. The interesting thing to me is that they _did_ make some pointer types implicit: maps, chans, etc. Moreover, you can easily hide a pointer in a struct, such that the use site gives no hint as to whether or not you're going to have "spooky action at a distance" for mutations.

Given this, a more interesting question is "Why can't I define types that are implicitly always pointers?" I don't know the answer to that. My guess would be path-dependent design. As I understand it, even maps and chans were explicitly pointers for a long while in the early history of Go.

I've found that I can classify all structs in to two groups:

1) simple record values with all public fields; rare methods for convenience or specific polymorphism.

2) encapsulated machines with all private fields; only accessed through rich method interface.

Any blending of public and private fields has turned out to be a mistake in my experience. For record types, the implicit copying is valuable for passing data around. With encapsulated machines, I almost always want all usages to be a pointer. I wish I could define types like this:

    type MyMachine *struct {
        ...private fields...
    }
Which would basically compile like this:

    type machineState struct {
        ...private fields...
    }

    type MyMachine struct {
        *machineState
    }
One problem with this approximate macro-expansion is that you can't use "nil" for MyMachine, instead you need to use (MyMachine{}) as a zero value. I'd change that across the language too and just make nil a proper polymorphic literal zero value for any type.


I love Go and it is my preferred language. But the pointer thing can be very confusing when it comes to slices. Idiomatic go is to generally use slices, not arrays. Slices are passed by copy, but contain a reference to the underlying array.

In other words, the default is pass by copy, but the fundamental idiomatic data structure is effectively pass by reference. This makes it way too easy to accidentally pass parameters by reference when you didn't intend to or, even worse, try to manipulate the same array from multiple goroutines without proper locking.
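
A minimal sketch (my own code, not from the parent) of that gotcha: the slice header is copied at the call, but both headers still point at the same backing array.

    package main

    import "fmt"

    // The parameter s is a copy of the slice header, but it shares the
    // caller's backing array, so the write is visible to the caller.
    func zeroFirst(s []int) {
    	s[0] = 0
    }

    func main() {
    	nums := []int{1, 2, 3}
    	zeroFirst(nums)
    	fmt.Println(nums) // [0 2 3]
    }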

This bit me hard in an early system I wrote in Go. The situation is marginally improved with the panic-on-multiple-thread-manipulating-slice feature. The only way to deal with it is to be paranoid about any code that passes slices, which weakens one of my favorite things about Go: you can look at the code, instantly understand what it is doing, and trust there is no magic.

Edit: to be clear, I think pointers should be included in Go. I would just prefer that slices be handled differently - perhaps by treating them as a special case, where the compiler passes the underlying array by copy unless explicitly told to pass by reference.


> This makes it way too easy to accidentally pass parameters by reference when you didn't intend to or, even worse, try to manipulate the same array from multiple goroutines without proper locking.

That's the biggest hole in Go's "share by communicating" story. In practice, Go channels are just a built-in queue object, with the same concurrency problems as queue objects in other languages. Rust has a borrow checker, so if you pass a reference through a queue, the compiler notices.

Go strings are read-only, so slices of strings, although sharing data, are not sharing mutable data. That tends to make the problem at least manageable.


That's a common mistake; don't look at channels as queues. Being buffered is one thing, and working as a queue is another. Buffered channels should actually be avoided unless you really need a buffer.


Huh?


Channels can be made with no buffering at all. In this case trying to send without a receiver is blocking, so the channel is not really acting as a queue anymore.

Buffering is only an option to improve performance. Programs should (if they are correct in a sense) be able to work with channels without buffering.
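
A tiny sketch (mine) of that rendezvous behaviour: nothing is ever stored in the channel, the send simply blocks until the receive happens.

    package main

    import "fmt"

    func main() {
    	ch := make(chan string) // unbuffered: no queue at all
    	go func() {
    		ch <- "hello" // blocks until main is ready to receive
    	}()
    	fmt.Println(<-ch)

    	// By contrast, make(chan string, 8) would let up to eight sends
    	// complete before any receiver arrives, which is queue-like.
    }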


> Channels can be made with no buffering at all. In this case trying to send without a receiver is blocking, so the channel is not really acting as a queue anymore.

It depends on how you define a queue, that is, in this case, whether there is blocking or not. In Java this distinction is made clear with BlockingQueues and plain Queues.

A non-buffering blocking queue in Java is a SynchronousQueue, casually known as a hand-off queue or rendezvous queue. Unfortunately, unlike in Go, these queues are painfully expensive in Java because Java doesn't have green threads, and I believe the queues use expensive locks (then again, locks can at times be more efficient than spinning/sleeping the CPU, as with ConcurrentLinkedQueue, but in general they are expensive).

Going back to the ambiguity of what a queue is, it gets even more complicated when talking about distributed queues/streams where you have acknowledgements, prefetching, round-robin distribution, etc. (e.g. AMQP).


A pointer tells you that the underlying data can be modified in place. Pass by value ensures the original value stays unmodified. The exceptions to the syntax are the collections created with `make`: slices, maps, and channels.


Yes, I understand what the pointer means; I'm just surprised they would default to that or generally encourage it (again, I don't know the language that well, but it appears pass by value is not used often, I guess for performance reasons). I probably just haven't looked at the right libraries or programming practices in Go, but it appears most APIs are very much mutable and thus use pointers.

With Java (and let's assume, for argument's sake, that Java now has value types), the default is immutable references, but the language could easily provide other reference types (and in fact sort of does) that would allow pointer-like behavior and value-like behavior. And with immutable references the runtime may actually copy anyway for some types (I vaguely remember the JVM doing this somewhere).

The reason I don't mind pointers in Rust or C/C++ is that I expect to manage the memory, because those languages don't do the memory management for me.


The language doesn't really have a preference for either. When you see it being done mostly with pointers that's usually the author's preference. Also, using a pointer for a receiver is more convenient than pass by value since you can always derive the value by dereferencing a pointer, but you cannot derive the original pointer after a value has been passed by value (since the address has changed during the copy operation).
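
A short sketch (my own, with invented type names) of that distinction between pointer and value receivers:

    package main

    import "fmt"

    type counter struct{ n int }

    // Pointer receiver: the method can reach back and modify the original.
    func (c *counter) inc() { c.n++ }

    // Value receiver: the method only ever sees a copy.
    func (c counter) incCopy() { c.n++ }

    func main() {
    	var c counter
    	c.inc()          // the compiler takes &c for you
    	c.incCopy()      // increments a copy that is then thrown away
    	fmt.Println(c.n) // 1
    }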


I didn't know the answer, but I just did a bit of Googling to determine that one of the reasons that pass-by-value can be preferred is because it makes life easier for the garbage collector.

See golang escape analysis: http://www.agardner.me/golang/garbage/collection/gc/escape/a...


I think there's a good case to be made for being explicit in Go, although it may feel like a step backwards given that Go has a GC.

In C#, where pointers are implicit, if you need to pass function parameters by reference you have to prefix them with the `ref` or `out` keywords, depending on whether they are supposed to be read-write or write-only. In Java, you need to wrap them in objects.

Then come the arrays. In C#, they are references passed by value by default, not references passed by reference, which still needs the `ref` keyword. So array content changes will be seen by the caller by default, but not full reassignments.

I think it can easily get confusing when you start to think about the details of what the language does under the hood, and all of that only to steer clear of an innocent * character here and there. A character that also serves to inform and remind readers of the code what is going on, what the intent of the code is. And again, it's not as if other similar GC'ed languages can avoid this either; there always comes a point where the compiler must know. In Go's case I think the designers just chose to second-guess less and leave it to the user, for less compiler magic.
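
In Go the equivalent of `ref`/`out` is simply a pointer parameter, and the `&` at the call site is the reminder mentioned above (a sketch of my own):

    package main

    import "fmt"

    // The *int in the signature and the &x at the call site both signal
    // that the callee may modify the caller's variable.
    func double(n *int) {
    	*n *= 2
    }

    func main() {
    	x := 21
    	double(&x)
    	fmt.Println(x) // 42
    }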


This might be off subject, but to me this language always tries to make me understand the fundamentals of being a programmer. I always look at the third-party support for versioning as a way for Go to tell me: "this is one way of doing it. Read it, copy what you need and make it your own".

+1 for the javascript breakouts every now and then.


> instead of making references the default like most other languages

My primary language (at the moment) doesn't make references the default and actually doesn't standardize allocation practice. This can be very confusing and will often lead to unexpected results unless you are aware enough of the nature of the language to avoid common pitfalls.

To me, this adds to the learning curve and is an unnecessary headache.

I'm not saying this means that explicit pointers are the "right" way of doing things...but just providing an example where a language without explicit pointers can result in odd behavior.

Also, I am not saying my primary language is bad...it is, after all, my primary language so I continue to use it quite extensively. Just trying to give some food for thought since you posed a good question.


I always assumed it was because passing by copy is useful in highly concurrent environments. It's essentially pseudo-immutability, if you will.


Rust does not use references by default either, so I wonder why you don't mind it there but mind it in Go.


Because Rust is not a GC language. I should perhaps have explained that better. My roundabout point in mentioning Rust was that instead of putting "*" all over the place, you could just have types that represent pointers, or the opposite (i.e. value types), depending on what the language defaults to. I mean, a pointer is in some sense effectively an abstract Pointer<SomeType>. However, I suppose this is difficult without generics.

That is, I don't mind the explicitness of Rust because I guess I expect it there, just like in C/C++, but with Go I probably have the wrong expectation. This is mostly because I come from a JVM background, where I expect the VM to figure out what is more optimal, and consequently I also have very little experience using in/out parameters.


It's not so much about what's optimal for performance as about which piece of state changes and which one doesn't.

Implicit pointers mean you often have to think extra when shuffling data structures around.


> Implicit pointers mean you often have to think extra when shuffling data structures around.

This basically answered my question. Thank you. I don't know why I didn't make that connection.


I clicked the article thinking that Go pointers are something special... but this seems targeted at the "programmers" who only know javascript...


Since you sort of ask: Go pointers themselves are not that special, because in a GC'ed language with no pointer arithmetic they're really just a map/territory-type distinction, but there are some syntax affordances that can make them briefly more confusing than they appear.

For method calls and field accesses, Go fuzzes whether you have an object or a pointer by making the "." operator work on both, rather than C's "." vs "->" distinction. Since it's never ambiguous, it's just pointless overhead to make the programmer worry about that. It can also be slightly confusing that the method itself can control whether it receives a pointer, so you can call object.Method on a concrete object, but Method will receive &object. As a professional programmer I appreciate not being bothered with this unambiguous detail that the language can easily handle for me, but it can confuse people in the early going, which is a legitimate criticism. (I still come down in favor of doing what it is doing, but it is a legitimate drawback.)

The other thing I see that fools people with experience from other languages is that the "nil" pointer is not the same thing as a NULL pointer in C. In C, the NULL pointer is simply a zero with no type information connected to it. In Go, pointers are actually (type, address) tuples, so a nil pointer to a custom struct is actually (pointer CustomStruct [1], nil), which means that the runtime is capable of correctly resolving methods on nil pointers and that you can therefore write methods that work on nil pointers. Nil pointers are still bad because they are a value added to all pointer types by the act of taking a pointer that you can't control, you get it whether you like it or not, but they are not bad in the C sense that any attempt to touch one is a segfault.[2]
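
A minimal sketch (my own, with an invented type) of calling a method on a nil pointer without crashing:

    package main

    import "fmt"

    type node struct {
    	next *node
    }

    // Len is safe to call on a nil *node because it checks the receiver
    // before dereferencing it.
    func (n *node) Len() int {
    	if n == nil {
    		return 0
    	}
    	return 1 + n.next.Len()
    }

    func main() {
    	var head *node          // nil pointer
    	fmt.Println(head.Len()) // 0, no panic
    }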

That's about all that matters in normal Go programming.

[1]: Asterisk is confusing HN's italicization there.

[2]: Which is I think an important aspect of understanding the "billion dollar mistake", by the way; it is easy to deconstruct that mistake as being two mistakes rolled in to one, which may perhaps explain why it was such a big mistake.


> The other thing I see that fools people with experience from other languages is that the "nil" pointer is not the same thing as a NULL pointer in C. In C, the NULL pointer is simply a zero with no type information connected to it. In Go, pointers are actually (type, address) tuples, so a nil pointer to a custom struct is actually (pointer CustomStruct [1], nil), which means that the runtime is capable of correctly resolving methods on nil pointers and that you can therefore write methods that work on nil pointers.

This paragraph seems, at best, misleading. C NULLs are not untyped either (using the same asterisk-avoidance as you): `T pointer x = NULL` means x is a NULL pointer with type T. Similarly, a Go `var x pointer T = nil` is a plain pointer-sized object with all bytes zero, just like C.

If you're talking about interfaces, then, there's actually two sorts of nil interfaces: interfaces where the data pointer is nil, and interfaces that are completely nil (both data and type/interface-info pointers).

  package main

  import "fmt"

  func main() {
  	var x *int
  	var y interface{} = x
  	var z interface{}
  	fmt.Printf("%v, %v, %T, %v, %T, %v", x, y, y, z, z, y == z)
  	// <nil>, <nil>, *int, <nil>, <nil>, false
  }
In any case, for a comparison to C, Go is fairly similar: you can call functions with nil pointers in either, just fine, and dereferencing one is bad (the badness is far more controlled in Go, but see below). For a more reasonable comparison about methods, C++ could make it legal to call methods on nil/NULL pointers, and, like a completely-nil interface, crash when calling a virtual method (i.e. one that requires dereferencing a nil pointer to get the function pointer), but optimisations win out.

> they are not bad in the C sense that any attempt to touch one is a segfault

Note that this has nothing to do with nil pointers storing runtime types (even if they stored them) or anything like that: the Go language just (sensibly) decided to not have undefined behaviour with nil pointers, requiring that implementations handle them in a reliable/reproducible way.


> This paragraph seems, at best, misleading. C NULLs are not untyped either (using the same asterisk-avoidance as you): `T pointer x = NULL` means x is a NULL pointer with type T.

The point, I think, is that this type information is carried only at compile time on the binding, not at runtime on the value itself. If you have multiple aliases to the same null value (e.g. via references or pointers to pointers), the behavior will change depending on which alias is used. Not so in Go.

Of course, this is also true for non-null values in C++, when methods are non-virtual, so...


The rest of my comment fairly specifically covered that: Go nil pointers don't have runtime types (like C), Go nil interfaces sometimes do.

In any case, pointers to pointers with different types seem likely to be a strict aliasing violation in C.


> In any case, pointers to pointers with different types seem likely to be a strict aliasing violation in C.

Surprisingly, no, at least not in similar context. For example, it's perfectly legal to cast a pointer to a struct to a pointer to its first member - it's specifically guaranteed that this works as you'd expect. This is often used when emulating single-inheritance OOP in C - your "base class" is then the first member of struct type, and this lets you upcast and downcast with impunity.


Casting pointer-to-struct to pointer-to-field is fine, yes, but is it legal to cast pointer-to-pointer-to-struct to pointer-to-pointer-to-field? (I genuinely don't know the answer to this.) The former results in a new (temporary or otherwise) value with a new static type, independent of the original, while the latter does not, and so is the interesting one. In any case, note that Go also has something a little similar, with its anonymous-fields approach to composition.

However, that's still not the main point at all: there's actually not that much difference between how C and Go behave with compile-time/runtime types.


To the best of my language lawyering knowledge, no, such a cast wouldn't be legal, and would indeed be a strict aliasing violation.


> Which is I think an important aspect of understanding the "billion dollar mistake", by the way; it is easy to deconstruct that mistake as being two mistakes rolled in to one, which may perhaps explain why it was such a big mistake.

If it's two mistakes, it's one big one (having null pointers at all) and one tiny one (the fact that calling methods on null pointers is undefined behavior in C++). The latter is yet another C++ gotcha, but not nearly as pernicious as the former, which causes nearly all the null pointer-related bugs in the wild.


Yes, I agree that adding values that you can not avoid having into the set of valid values for a type is the worse one by quite a bit.

However, I'm still not a big fan of languages where values exist that automatically crash if you try to touch them in some bad way. Why are they there, then? My mental image is one of the anthropomorphized language just tossing caltrops around willy-nilly and blaming people who get stabbed for not being careful where they walk. I think it's important to call this idea out separately, because programmers should learn this principle as they learn how to use strong type systems: do your best to exclude, at the type level, meaningless states whose only recourse is to crash when you see one. The C-style NULL value is a degenerate case of this general principle.

Yes, this absolutely includes Go. I consider it a mistake in any language. In particular, I'd highlight the distinction between reading from a nil map (legal, gets the zero value of the map's value type) and writing to one ("kablooie") as particularly bothersome and asymmetric, especially in light of the way nil slices can often legally be "written" to (via append, which in practice is much more common than writing via index access). There are lots of asymmetries around what actually blows up vs. what "works" (but quite likely isn't doing what you want) in Go here. I'm not really in love with getting zero values out of a map when the key doesn't exist anyhow, and I tend to pretend that only the value-plus-existence form of the lookup exists in my own code (because I almost always care about existence too), but I especially don't like that behavior from nil maps.
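
A short sketch (mine) of the asymmetries being described:

    package main

    import "fmt"

    func main() {
    	var m map[string]int      // nil map
    	fmt.Println(m["missing"]) // 0: reading a nil map is legal
    	// m["boom"] = 1          // writing to a nil map panics

    	var s []int            // nil slice
    	s = append(s, 1, 2, 3) // appending to a nil slice is fine
    	fmt.Println(s)         // [1 2 3]

    	// Index-and-check distinguishes "missing" from "zero":
    	v, ok := m["missing"]
    	fmt.Println(v, ok) // 0 false
    }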


Undefined behavior is what makes C efficient. And NULL pointers are just a consequence of the ability of pointers to take numerical values. Naturally, this behavior was kept in C++, which is just a superset of C (with a few minor exceptions).

NULL pointers and undefined behavior make a lot of sense in C++. The mistake is replicating this behavior in languages where it doesn't.


> And NULL pointers are just a consequence of the ability of pointers to take numerical values.

Not at all. The C standard doesn't allow pointers to take arbitrary numerical values; while you can cast back and forth, the only guarantee is that casting a pointer to a numeric value and back to a pointer still points to the original object (and even then only if the numeric type used is large enough, which couldn't be portably ensured prior to C99, and is only conditionally supported even now).

In particular, one thing that the standard does not guarantee - and there have been implementations that did not do so - is that casting a null pointer to int will produce zero, or that casting int zero to a pointer will produce null. Nor does it guarantee that a null pointer is an all-bits-zero value.

The fact that you can write "p = 0" in C is not because pointers can take arbitrary numerical values, but because the language syntax and semantics allow you to do so, with 0 being treated specially when assigned to an lvalue of a pointer type. But you can't write "p = 1", for example, because there's no such special treatment for any other integer literal.


> In Go, pointers are actually (type, address) tuples, so a nil pointer to a custom struct is actually (pointer CustomStruct [1], nil)

I could be wrong, but I think that's not right. The key difference is dynamic vs. static dispatch. What you're describing, (type, address), is how Go represents interfaces. So Go can resolve a.Read() on an io.Reader if the value is (os.File, nil). A Go empty interface is not like a C void pointer for that reason. Type info is preserved and can be safely recovered.

But Go also has non-virtual method calls, unlike Java (afaik); it doesn't implicitly do dynamic dispatch. This is a consequence of the fact that it doesn't have inheritance. If you write var foo os.File; foo.Close(); there's no runtime (type, address) tuple. That .Close() call is essentially compiled directly to something like CALL os.File.Close(nil). This can potentially even be inlined. If your method treats nil specially, that's up to you.


"it's just pointless overhead to make the programmer worry about that"

You are incorrect: removing this distinction actually adds the overhead of keeping a type in mind when reading and working with code. And overall, the selector operator in Go is pretty bad, no need to defend it. It forces you to always keep some things in mind, like not colliding with a namespace (because there is no :: operator for namespaces), and to always look around for where a thing is defined, because it's not obvious whether it's a namespace, an interface, a struct, or a pointer receiver.


Go has pointer arithmetic via the unsafe package, it just makes it very explicit, as it should be.


Programmers who only know javascript are, controversially, still programmers.


I'm a programmer who spends all day in Go working on backend systems, but came to it from Javascript, my first language. Pointers were one of the concepts that took me the longest to understand, even having dug into them previously in the context of V8's object implementation. So I for one am grateful for this article, and just passed it on to a recent bootcamp grad at my company. Hopefully it will accelerate his learning curve.


The first language I learned was QBasic, so when I started to learn C++, the concept of pointers was bizarre and took longer to learn than it should have, but after it finally clicked, it gave me a special love for pointers, the simplicity of C, and low-level programming in general.


While QBasic didn't have pointer types, it did have pointers in disguise (or rather, pointers without disguise, which are just memory addresses) - VARPTR, PEEK and POKE. As I recall, PEEK'ing and POKE'ing things was actually necessary to do many important tricks in QB, especially once you started playing with CALL INTERRUPT.


Regardless of whether they can be called programmers, we can not deny that they exist.


Even if you're a JS programmer, having no knowledge of pointers just shows you've never dug very deep into how your language works. Does JS pass by value? by reference? does it pass references by value? These are all things a JS programmer should know, and requires some knowledge of pointers.


Don't understand the downvote. Understanding javascript's closure (which is a core js feature even on the client) properly without understanding pointers seems like an impossible task to me.


The idea of reference is necessary, but pointers are an implementation detail. If you don't come from C, you can understand everything in JavaScript without knowing what pointers are. Many languages have references to mutable state without the peculiar details that make pointers what they are.


I'm really tired of people bashing Javascript to death. It's here, it's being used, whether you like it or not. Stop whining about it, please.


The constructive question here is: is there any way to make JavaScript stop being used (especially for tasks not related to webpages)?


Nope. Especially since many of us don't have piles of irrational hatred for it.


That is not a constructive question.


Or Python or Java or C# or any other language without pointers... These languages command a huge marketshare.

This article is clearly targeting folks coming to Go from those languages.


C# actually has (unsafe) pointers, although they're not used (and are not intended to be used) very frequently.


Yes, there are programmers who only know JS.

There's nothing "wrong" with that. If you only work in Ruby, Python, JS, or even functional languages it's very possible you've never run across pointers (at least not explicitly).

It's actually something people misunderstand about C, it's fundamentally a memory-oriented language in a way functional or OO languages are not and there are some real advantages to working that way. Ironically, this actually shows through in JS with the constructs you use to manipulate arrays (cause it borrowed from C).

No need to look down on other programmers; we all gotta start somewhere.


We all do... but IMO it's better to start with BASIC than Javascript...


BASIC is an ill-defined term, but if you mean the most modern versions of BASIC (as I seriously hope you don't mean 8-bit era BASIC), they aren't significantly different from Javascript circa 2005 in terms of quirkiness and silliness (and that's assuming you've got one with some sort of closure support, which I don't recall if one ever existed; without that BASIC is a clear loser behind Javascript-2005), and Javascript is certainly moving to exceed any BASIC I know about lately.

Javascript does have a lot of flaws, but unlike BASIC which is pretty much dead, the main structure of Javascript is alive and well, as dynamic scripting languages are still very much a going concern.


> BASIC which is pretty much dead

Tell that to the enterprise employees writing Office macros or doing VB.NET applications.


This looks the same as C pointers. Maybe the title would be clearer as "Understand pointers, using Go". Then for a C programmer it's clear from the title that "pointers" is the same concept, and that Go doesn't have a different kind of "Go pointers".

Or instead of starting with:

"This post is for programmers coming to Go who are unfamiliar with the idea of pointers or a pointer type in Go."

It could start with:

"This post is for programmers coming to Go from a language without pointers or a pointer type."


> Then for a C programmer

Maybe that's not the target audience?

Tutorials targeted at C newbies have been about "C pointers" for eons; why would Go tutorials refer to a different programming language?

What that program is doing would not be well-defined in C; you cannot increment a pointer from the address of one local variable so that it points to another without leaving the ISO C standard dialect behind. I haven't seen any compiler-specific document which "blesses" the practice.

If that is well-defined in Go, that would be an excellent reason why the article really is specifically about Go pointers and not C pointers.


They're not the same. Go doesn't have pointer arithmetic, and Go functions can safely return pointers to values that aren't created via new().

https://play.golang.org/p/m3OdaXH98_
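
For instance (a sketch of my own, not the linked playground code), returning the address of a local is perfectly safe in Go; escape analysis moves it to the heap as needed:

    package main

    import "fmt"

    // Returning &n is fine: escape analysis heap-allocates n when required.
    // The equivalent C code would return a dangling pointer.
    func newCounter() *int {
    	n := 0
    	return &n
    }

    func main() {
    	c := newCounter()
    	*c++
    	fmt.Println(*c) // 1
    }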


Yes it has, via the unsafe package.

Which ANSI C example should I write for you in Go?


Package unsafe is in the spec, but converting unsafe.Pointer to uintptr (which is how I'm supposing you'd do your pointer arithmetic) is implementation-defined.

This means I could create a perfectly legal implementation of Go where such things result in complete nonsense. I don't think package unsafe "counts".


If package unsafe doesn't "count", then anything that C compilers allow beyond what ANSI C specifies doesn't count as C, and thus many of its systems programming features just vanish.


I don't see how that relates to Go pointers.


The article referenced 'cuneiform', which I had to Google. It's neat to see that languages evolved the same way math was developed, and similar to how software programs grow, as a series of expanding abstractions:

> Emerging in Sumer in the late fourth millennium BC (the Uruk IV period), cuneiform writing began as a system of pictograms. In the third millennium, the pictorial representations became simplified and more abstract as the number of characters in use grew smaller (Hittite cuneiform).

Software is very much a natural extension of the brain and how we process the world around us.


My favourite bit of ancient writing is the Kushim Tablet. It's a clay document, written in pre-cuneiform archaic Sumerian. It was written in Uruk, about halfway between Baghdad and Basra in modern-day Iraq, in the 31st century BC, five thousand years ago. On it are the oldest examples of two things fundamental to human civilisation: a person's name, and an industrial process. It's a receipt for ingredients for a brewery:

http://www.schoyencollection.com/24-smaller-collections/wine...


Are there actually programmers who need to be explained the following: "...memory is just a series of numbered cells, and variables are just nicknames for a memory location assigned by the compiler."

What kind of programming can one do without even that level of mental model of what a computer does?


Most (keyword: most) self-taught programmers / hobbyists start with high-level languages like Python, Ruby, PHP, JS, etc. and work their way "down" out of interest and intellectual stimulation (like myself). It might be counter-intuitive to those with CS backgrounds, but the truth is that for most things you don't really need to know the implementation details of variable assignment if you're only interested in scraping the NYT.

Believe it or not, pointers can be difficult to grasp as a concept for people who aren't used to this type of mental model.

If you didn't struggle with it, good for you. But there's no need to look down on others who are trying to learn. I should also add that software engineering and programming aren't necessarily synonyms. Some programmers aren't SWEs, and that's OK.

If you're a "fake programmer" like the parent is trying to imply, don't lose hope. Continue to learn at your own pace and you'll eventually catch up.


> If you're a "fake programmer" like the parent is trying to imply...

You probably shouldn't put words in other people's mouths.

I don't think he was calling the people fake, but merely asking how effective they could be at programming without an understanding of memory.


I admit it was a bit harsh but the parent comment sounded a bit too condescending.


I can see that pointers are conceptually difficult (the passage I quoted was not about pointers). I'm not even looking down on anyone. I'm just honestly surprised that programmers might not know what RAM is.


Just people who are new to it, mostly; this must be aimed at people who haven't done any low-level development at all. Some people seriously struggle with it, though. I don't think I had a truly full grasp until I started reverse engineering and saw the memory first-hand and how it was accessed with different instructions.


Actually you don't need to know about numbered memory cells for most modern languages that aren't C, C++, or assembly.

It's a good thing. Most of the time for most of the people, it's better to work closer to your problem domain than to the CPU.


> It's a good thing. Most of the time for most of the people, it's better to work closer to your problem

Having a rough, high-level grasp of what memory is ("numbered byte-size cells" isn't very in-depth, after all) doesn't preclude one from working "closer to one's problem domain"; likewise, lacking such a grasp doesn't bring one any closer to one's problem domain. What am I missing?


It's not that such an understanding precludes a person from working close to the problem domain, it's just that that knowledge is not necessary most of the time.

In the book, "Mythical Man Month", Fred Brooks talks about the two kinds of complexity in solving problems with computers. There's the "essential complexity" that's there because the problem you are solving is actually complex, and then there's the "accidental complexity" that's not actually required to solve the problem, but to make the computer happy.

Let's compare C arrays and Python/C# Lists. For my pretend problem, I need an ordered, index-able set of things.

With the C array, I need to know how big to make it before I create it. I need to allocate the memory before I use it, and I must deallocate that memory when I'm done with it. I might underflow or overflow the array and hand Eastern European hackers control over my server and data. Even if I check for overflows and underflows in all the proper places, I also have to add code in every one of those places to handle each possible error.

Whereas in C# and Python, I just make a new list and stick things into it. When it goes out of scope, it disappears automatically.

Python and C# move me closer to the actual problem being solved by removing this "accidental complexity". When complexity removal is done well, it also removes the requirement for the programmer to know about numbered cells of memory, because numbered cells of memory are "accidental complexity" for the problem domain.
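
Go sits somewhere in between (my own sketch, not the parent's example): a slice grows on demand and is garbage-collected, so the sizing and freeing bookkeeping disappears even though the language still exposes pointers.

    package main

    import "fmt"

    func main() {
    	var things []string // no size decided up front
    	for i := 0; i < 5; i++ {
    		things = append(things, fmt.Sprintf("thing %d", i))
    	}
    	fmt.Println(things)
    	// When things becomes unreachable, the GC reclaims the backing array.
    }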

Now, when I do know how the computer is actually working, it brings benefits. Today, in fact, I'm writing C code for an embedded ARM chip. A bit of knowledge about memory addresses is required here. But for most software developers, knowing how memory works is not a requirement for working software. Even memory-as-a-numbered-set-of-cells is still just a huge abstraction over how the memory is actually being handled by the hardware.


> With the C array, I need to know how big to make it before I create it ..

But that is essential complexity when you need to reason carefully about memory use.


The parent just said that it's a good thing that these languages don't require an understanding of memory, not that such an understanding is not valuable.

If you need to understand memory to use a language, then it's not abstracting well enough.


Yeah, that's why we have chat programs taking 5-10% of the CPU when idle and in the background. The developers worked "closer to the problem domain".


Is it their fault that we still haven't been able to find an elegant solution to this problem?


There are many elegant solutions; they're not JavaScript on Electron.


> What kind of programming can one do without even that level of mental model of what a computer does?

JavaScript? Python?


I'm pretty sure to program in Javascript you need to understand that primitives are passed as values, while objects are essentially passed by pointers, and if you modify one inside the callee, the caller will see the modifications.

Understanding this fact does not strictly imply having a basic understanding of memory, but it gets pretty close to it. I would personally be wary of someone who calls themselves a programmer or software engineer and couldn't write this article themselves.


Primitives being passed by pointer or value....what is the difference?

The only relevant parts are that (1) they are immutable and (2) they are compared by value.


There is an argument to be made that that is true for any of the top 10 programming languages, with the exception of perhaps C and C++.


Surely you need to have some kind of idea of what memory is just to write:

  var a = 1
  var b = 2
You have to know that these values are stored in working memory, not on hard drive or Google's servers or Martian stone tablets.


No? All you need to understand to program are the assignment semantics.

Understanding the implementation details is valuable, but I'm not sure why you think it's necessary to program?


I just don't see what kind of useful programming you can do without knowing anything about the machine it runs on.

If "a = b + c" takes a second to execute, you have to take that into account. The programs that people wrote for 1950s computers were very different from today, even though the language semantics might be essentially the same.

On the other hand, this discussion is helping me understand why the web development world is full of weird database-backed Rube Goldberg machines that can spend milliseconds to access a few bytes of data that were already in RAM.


>If "a = b + c" takes a second to execute, you have to take that into account.

But it doesn't, so you don't.

Sure understanding performance is necessary to build more complex or higher usage systems, but not understanding it does not preclude "useful programming".


>If "a = b + c" takes a second to execute, you have to take that into account.

> But it doesn't, so you don't.

How do you know it doesn't? What if assigning to 'a' writes to a remote database and waits for the write to be confirmed?

More to the point, how does a new programmer know that is not the case? The average person's expectation of computation speed is shaped by the experience of using server-side web apps: you click on a button, a new page is loaded from the server taking multiple seconds to finish.

To understand that an assignment doesn't run that slow, you need to be told that. Surely explaining local memory would be part of that.


> To understand that an assignment doesn't run that slow, you need to be told that.

Not really. You just try it, and notice it ran instantly. You don't really have to care how it happens to run behind the scenes, as long as you have a working model of the semantics. That's the entire point of most abstractions, to make it so that you don't have to think about all the details when you don't want/need to.


Did you never go through the beginning phase where you didn't understand pointers? That's usually a breakthrough moment, not something you start out with.


No, because the languages I had available to me were Timex 2068 Basic and Z80 Assembly.

Maybe the first couple of hours when I still hadn't read the DIM, DATA, READ, POKE and PEEK manual pages.

Just like on Dave Cheney's post, seeing a few box examples was enough to get it.


So you benefitted from exactly the sort of explanation you now seem to be questioning?


No, because I never had a "breakthrough moment".

It just felt natural how a computer was working, typing example after example, to the 10-year-old version of myself.


There's a certain distinction to be had between understanding memory addresses and understanding pointers as part of a type system. I remember the former was very straightforward in various BASIC dialects running on DOS, and it never really confused me. C pointer types, on the other hand, did. Sure, there's an obvious mapping between the two... in retrospect. But it wasn't obvious at first.


Depends where you start from, really. If you come from the "bare metal" side of things (electronic work, microcontrollers etc...) then work your way "up" then the memory model is all you think about. I learned pretty much that way so I never really had much of an issue understanding these concepts (conversely, it took me a while to get used to things like dynamic typing).

That being said nowadays I'm sure many more coders start with something like Javascript instead of 8bit controllers, so I'm sure these types of articles are very valuable to many.


Sure. I'm not questioning the need for pointers to be explained.

I'm just surprised that an explanation geared for programmers would need to also explain that variables are stored in local memory, or what RAM is. (Again, there is nothing wrong with explaining that.)


You can get pretty far without knowing that. A big chunk of Go's audience seems to be Python programmers who want something that isn't quite so slow, and you can probably do a whole career in Python without ever knowing what a pointer is.


I'm so glad I learned how to program on a C=64. Even using Commodore BASIC one was introduced to how memory worked so that one could use PEEK and POKE commands. This made learning C and pointers so much easier because the mental model was there already.


(I upvoted you; I think your question is fine, FWIW.)

Sadly, yes. Universities have moved away from teaching C. In ancient times, when I went to school, you'd start with an intro class in Pascal, then you'd take a data structures class in Pascal, and then you'd take some harder class in C.

About half of the people in the C class would drop out of Computer Science when they hit pointers. As someone who gets pointers pretty much instinctively, I didn't get it; the concept seemed really intuitive to me. But apparently that's not true for everyone; some people really struggle with it.

I think it really doesn't help that CS has moved away from C as a teaching language. C can be viewed as a pleasant, portable, assembly language. As such, it lets you "feel" the bare metal, much more so than a scripting language like Python or Javascript.


Oddly, I find pointers intuitive and easy in Pascal and assembler, but arcane and complicated in C. The Java memory model is also rather difficult to internalize IMNHO, probably because of the complication of "bare" types (e.g. int), boxing, pass by reference, and some rather muddy default/usual containers/objects/arrays etc.

Other than for teaching "an industry language" I don't immediately see much reason to teach C and Pascal - but I suppose the "modern" equivalent would be Python, Cython, C and assembler, followed by haskell/(oca)ml, Prolog and/or a lisp/scheme - with the benefit of showing C interop along the way.


I can't help but wonder if the reason Pascal pointers feel less complicated is that they're more constrained, or the syntax, or both.

On the constrained side of things, in Standard Pascal there's no "address of" operator at all - a pointer can only point at a dynamically allocated memory block, not at a global or local variable, or at some memory location inside another block. Consequently, there's also no pointer arithmetic. This is sufficient to make linked lists and other similar data structures, but it also makes the concept much more opaque compared to C (i.e. it's less obvious that it's really a numeric address).

On the syntax part, I think the big problem with C is that the moment you start dealing with pointers, the arcane rules for declarator precedence come into play. Pascal, OTOH, has a very regular type syntax, which in the case of pointers is also easier to read: you just read ^ as "pointer to", and so ^integer is "pointer to integer". Same thing for dereferencing; again it's ^, but there you read ^ as "points to", and so a dereference like x^ becomes "what x points to". And there's no confusion with any other operator, since ^ is reserved for pointers alone, and it has a fairly obvious mnemonic to it.


> Oddly I find pointers intuitive and easy in Pascal and assembler

Sure, sixty-seven lines of assembly handily beat arcane crap like:

   obj->ops.tbl[BLAH]->func(obj, &obj->x, arg);
Want to take an integer in a register, then add another register's integer multiplied by 8 or 4, and then a fixed offset?

Just use LEA (load effective address) and don't comment anything.

Let the reader figure out that no pointers are involved.


I guess you're being sarcastic but I honestly can't tell.

[edit: 99% sure is sarcastic]


> C can be viewed as a pleasant, portable, assembly language.

Do I have to buy what you're on from some guy on the street or is it available as a prescription?

> As such, it lets you "feel" the bare metal, much more so than a scripting language like Python or Javascript.

C hasn't been bare metal since the PDP-11. There are like eight layers of abstraction between

  char *foo = "bar";
and the "bare metal"; I don't know why people are desperately clinging to this "C is basically portable assembly nonsense".


Eight layers? That made me curious about how many there really are, though. My understanding is rather vague at this level, though hopefully good enough to know when I need to look deeper.

So the compiler is obviously one layer. Then there's the assembler and the linker. Does the C runtime count too?

You think you have a "string", but it's actually just an address to a (hopefully) nul-terminated chunk of "contiguous" virtual memory.

If you wanted to read the first byte of that array, it would first go through the OS's virtual memory system, so that's one rather large abstraction. (I'm lumping in the hardware's virtual memory support here, too)

Then when you actually access a piece of "real" memory, there are a number of caches between your data and the request to fetch it. And what about the DRAM itself? Can it access only a single byte of memory at a time, or is that too an abstraction?

Instruction decoding is one or two layers at least, since chances are the processor doesn't actually execute x86 opcodes directly.

And when you run out of software abstractions, how many levels of abstraction are there in the actual hardware?

I only have vague ideas of what actually happens at this level, and whenever I stop to think about it, it's pretty amazing our software stacks work at all...


https://godbolt.org/g/q01z7n

I'm not sure what you mean by layers of abstraction (type checking? optimization?) but C code often does have a pretty straightforward translation to assembly.

Perhaps you had a bad experience and can clarify what you mean?


We had pointers already in Pascal.


New people are born, every day...


Huh? You need a mental model of your language and runtime.

If your language treats things in terms of numbered cells and nicknames, then you need a mental model of that.

If your language treats things in terms of referentially transparent values, then you need a mental model of that.


That's the programming equivalent of giving someone an axe and telling them to go chop some trees.

For non-early-apprentice-level programming, one could also use the mental model of what's below the language runtime.


Well, it's really all quantum wavefunctions.


Back in my uni days I met a few people who found this confusing. It's one of the basic concepts of programming that, believe it or not, some people just aren't mentally equipped to grasp.


It seems that some languages provide a pretty good abstraction, memory-wise ;)


Thank you! This is extremely useful!

I've always wanted to break into Go, but pointers scare me after my experience with C in university. While I got their usefulness and their function (and usage), they were not something I felt comfortable with.

This definitely makes it easier!


The word "Go" is not essential in the title.


If it didn't include Go, this would barely be scratching the surface (pointer arithmetic)

I see Go's pointer as closer to C#'s ref/out than anything


I personally (emphasis intentional) consider the core distinction between "references" and "pointers" to be whether you can do pointer arithmetic on them. Pointers without pointer arithmetic are hardly scary at all, especially in languages where there is no way to deallocate the underlying value but leave the pointer behind such that it may point to the wrong thing later, be that thanks to something like Rust or to GC as in Go.

So personally, I think of Go as having references, but not pointers (outside of unsafe).

Of course every language community uses those terms its own way, but that seems to me to be the most broadly useful way of looking at it.


I've spent a lot of time on this, working on Rust's docs, and the way I see it is that "pointer" is the most generic concept, with "reference" being a more restricted form. So all references are pointers but not all pointers are references.

Words are hard.


I agree, Go in the title is irrelevant. Otherwise, it is a decent overview of the notion of a pointer, although there are many such overviews around.

My other objection is to using code like * b++ in a text that aims to be crystal clear about a single concept (pointers). That can bring in totally irrelevant questions about operator precedence and left/right associativity. It would be better to say * b = 201. My 2c.


Your asterisk got interpreted as italicization and screwed up your post.


Thank you!


If it was about pointers in general, I would have expected the article to talk about pointer arithmetic, which, from other comments in the thread, seems to not be possible in Go.

What is the use case for pointers then? Are parameters passed by value instead of by reference by default?


Pointer arithmetic is possible in Go via the unsafe package.

Here is an example.

    buff := (*[25][80][2]byte)(unsafe.Pointer(uintptr(addr) + 25))


"Understand Go pointers in less than 800 words or your money back"

There are multiple pictures each of which is worth 1000 words. Where is my money?


This is a good explanation of what happens on the machine.

However, most languages (C, C++, etc) have different definitions of what pointers are, and many operations that seem reasonable on the machine model are in fact undefined behavior.

In C, for example, you cannot reference one object from a pointer to another object (there is one exception to this rule).


> In C, for example, you cannot reference one object from a pointer to another object (there is one exception to this rule).

I'm not sure what you mean by this, but if it's a reference to strict aliasing rule, then it's about types, not object identity; and there's more than one exception to it.


> In C, for example, you cannot reference one object from a pointer to another object (there is one exception to this rule).

I'm confused about what this means--can you explain?


You can't both cast a double* to an int* and then dereference the int*, expecting to read an int-sized chunk of the double. It's called strict aliasing, and there are defined scenarios where it is allowed (one is that char* is an alias for all pointers).



