

Go Data Structures (2009) - zachorr
http://research.swtch.com/godata

======
fauigerzigerk
_> As an aside, there is a well-known gotcha in Java and other languages that
when you slice a string to save a small piece, the reference to the original
keeps the entire original string in memory even though only a small amount is
still needed. Go has this gotcha too._

That is no longer the case in Java. String.substring() now makes a copy. I
think it doesn't matter much which of the two approaches a language takes as
long as everybody knows it. This needs to be in the language spec and can't be
an implementation issue.

~~~
aaronblohowiak
For the historical record, this change was made in May 2012, in Java 7u6.

~~~
krakensden
That's kind of a big deal for a point release...

~~~
mitchty
There is a good reason for it, however:
[http://www.javaadvent.com/2012/12/changes-to-stringsubstring...](http://www.javaadvent.com/2012/12/changes-to-stringsubstring-in-java-7.html)

~~~
twotwotwo
Here's more nitty-gritty from Oracle:
[http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-Ma...](http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010257.html)

If I'm following, a Java string used to be a char[] plus offset/count ints, and
this change let them drop those ints. You saved RAM if you had a lot of little
strings, but paid for extra copying if you took lots of substrings.

Go slices/strings don't have a pointer to the "original" backing array, just a
pointer to the first byte in this (sub)string. It doesn't need extra fields to
do substrings by reference.

I think part of the technical reason for the different string headers is that
the Java designers didn't want their GC to have to handle "internal pointers"
into strings/objects (maybe for performance reasons?), whereas the Go
designers decided to support 'em (maybe to support more C-like code in Go?).

~~~
fauigerzigerk
Go does not support internal pointers into strings. You have to use slicing
for that.

~~~
twotwotwo
Sorry, I mean that there's an internal pointer in Go's in-memory
representation of the string, not that there's a naked byte pointer directly
visible to the programmer.

Go's GC's support for internal pointers means it can use a pointer-and-length
representation for substring references. Java's lack of support for them means
its string representation needs a pointer to the start of the char array and a
separate offset and count in order to do the same substring-reference trick.
(And, I'm saying, that helps explain why Java and Go now do substrings
differently.)

There are other places where Go's ability to use internal pointers is exposed
more directly to the programmer: for example, Go lets you take the address of
an array element or struct field and pass around the resulting pointer.
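A small sketch of both cases, using a hypothetical `Point` type: each function returns a pointer into the middle of a larger object, and Go's GC keeps the whole enclosing array or struct alive for as long as that pointer is reachable.

```go
package main

import "fmt"

// Point is a hypothetical example type.
type Point struct{ X, Y int }

// elementPtr returns a pointer to one element of an array. Because the
// pointer escapes, the array lives on the heap, and this interior pointer
// keeps the entire 1024-element array reachable.
func elementPtr() *int {
	arr := new([1024]int)
	arr[512] = 42
	return &arr[512]
}

// fieldPtr returns a pointer to the Y field inside *p.
func fieldPtr(p *Point) *int {
	return &p.Y
}

func main() {
	e := elementPtr()
	fmt.Println(*e) // 42

	p := &Point{X: 1, Y: 2}
	y := fieldPtr(p)
	*y = 7           // writes through the interior pointer into the struct
	fmt.Println(p.Y) // 7
}
```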

~~~
fauigerzigerk
_> Go's GC's support for internal pointers means it can use a pointer-and-
length representation for substring references_

Only if the String class is implemented in pure Java, which it currently is.
But it doesn't have to be that way. Oracle could go around the Java language
features and implement the String class in native code just as Go does with
several builtin types. You may be right that it would be more difficult to do
than in Go because of garbage collector specifics.

But I guess the real issue is a philosophical one. Is it a good idea to let
the standard library use features that are not available to users of the
language?

~~~
twotwotwo
I'm saying I think it would take GC rearchitecting for Java to be able to use a
pointer into the middle of the string in its internal String representation,
because Java's GC, unlike Go's, is currently not built such that a pointer
into the middle of anything keeps that thing "alive" for GC purposes; you have
to have a pointer to the beginning. Sun made that choice in the hope that they
could write a faster GC, I suspect.

Given that GC design, Go's two-word substring references (pointer into middle
of string + count) wouldn't work; even if String were a builtin, with the no-
internal-pointers GC design it would need to be at least three words (pointer
to start of string, offset, count).

tl;dr of my larger point: I think Java needed a few extra bytes per String to
support substrings by reference because of how its GC works differently from
Go's, and I think that explains why Java decided to remove its substring-by-
reference trick while Go didn't. (And I'm not trying to say either way is
worse, just trying to really grok why they're different.)

------
iand
Note that this is from 2009. Although the main details have not changed, the
int type is now more commonly 64 bits, since 64-bit architectures are much
more common.

~~~
codezero
Do you know what version that happens in? I tested on my 32- and 64-bit
platforms with golang 1.1, and a static definition of an integer results in
type int (which is explicitly 32-bit)

    
    
      package main
      import "fmt"
      import "reflect"
      func main() {
          i := 3
          z := reflect.ValueOf(i)
          fmt.Printf("%s\n", z.Kind()) 
      }
      // $./test
      // int
      // $
    

It's my understanding that this is intentional and won't change; only explicit
declarations of int64 are 64-bit.

~~~
enneff
The size of int on 64-bit systems was increased to 64 bits as of Go 1.1:
[http://golang.org/doc/go1.1#int](http://golang.org/doc/go1.1#int)

~~~
codezero
Cool, thanks for the clarification, this makes sense!

------
codeflo
I wish there were a way to create custom data structures without casting to
and from interface{} all the time. Heck, it would already help if there were a
shorthand for interface{}, like "any" or something.

~~~
krakensden
The usual pattern is to define an interface that requires whatever you pass in
to have the methods your data structure uses, like sort.Interface[1]. It's
faster, safer, and better than using interface{}.

As for shorthand, behold!

    
    
        type any interface{}
    

[1]:
[http://golang.org/pkg/sort/#Interface](http://golang.org/pkg/sort/#Interface)

~~~
codeflo
That introduces a new named type though, i.e., the "any" in your package is
different from the "any" in mine, which is not what I want.

(Unless I'm mistaken here, which might very well be the case.)

~~~
krakensden
Check it out:
[http://play.golang.org/p/l9yn0PRbrd](http://play.golang.org/p/l9yn0PRbrd)

Anyway, that's the point of Go's implicit interface satisfaction: if a value
implements the methods an interface requires, it counts as that interface type.
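The distinction codeflo raises can be shown directly. A plain `type X interface{}` declaration creates a new defined type; every value still satisfies it, but types built from it, such as slices, are not the same as the corresponding types built from `interface{}`. An alias declaration (Go 1.9 and later) is identical to `interface{}` everywhere, and since Go 1.18 `any` is predeclared as exactly such an alias. The names `anyDef`, `anyAlias`, and `count` below are hypothetical.

```go
package main

import "fmt"

// anyDef is a defined type: distinct from interface{}, though every value
// still implements it.
type anyDef interface{}

// anyAlias is an alias: the very same type as interface{}.
type anyAlias = interface{}

// count accepts only []interface{}.
func count(xs []interface{}) int { return len(xs) }

func main() {
	var v anyDef = 42 // fine: every type satisfies the empty interface
	fmt.Println(v)

	fmt.Println(count([]anyAlias{1, "two"})) // compiles: same type as []interface{}
	fmt.Println(count([]any{3.0}))           // Go 1.18+: any is a predeclared alias

	defs := []anyDef{1, "two"}
	// count(defs) would not compile: []anyDef is not []interface{}.
	fmt.Println(len(defs))
}
```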

------
grannyg00se
make(*Point) seems much better than having a separate _new_ keyword.
Surprised to hear that was changed after just a few days.

~~~
oofabz
The "new" keyword is practically unused in modern Go development, but is kept
for backwards compatibility. The usual way to make a point is "p := &Point{}",
without using any keyword.
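For comparison, both forms below yield a pointer to a zeroed `Point` (a hypothetical two-field struct). Note that `new` is a predeclared builtin function rather than a keyword, and of the two it is the only one that also works for non-composite types such as `int`, since `&int{}` is not valid Go.

```go
package main

import "fmt"

// Point is a hypothetical example type.
type Point struct{ X, Y int }

// zeroViaNew uses the builtin new: it allocates a zeroed Point, returns *Point.
func zeroViaNew() *Point { return new(Point) }

// zeroViaLiteral takes the address of a composite literal: same result.
func zeroViaLiteral() *Point { return &Point{} }

func main() {
	fmt.Println(*zeroViaNew() == *zeroViaLiteral()) // true

	p := &Point{X: 1, Y: 2} // a literal can also set fields inline

	n := new(int) // &int{} would not compile: int is not a composite type
	*n = 5

	fmt.Println(*p, *n) // {1 2} 5
}
```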

~~~
rsc
Not true. I count "new" being used about half as often as "&Point{}" in the Go
standard library. That's not "practically unused".

    
    
      g% cg -c -f 'g/go/src/pkg.*\.go' '\bnew\(' | total 2
      1485
      g% cg -c -f 'g/go/src/pkg.*\.go' '\&[A-Za-z0-9_.]+\{' | total 2
      3051
      g% cg -c -f 'g/go/src/pkg.*\.go' . | total 2
      430482
      g%
    

So 430,482 non-blank lines of code, 1485 lines with new, 3051 lines that look
like a struct pointer literal.

~~~
trentmb
If I had to guess, I think they meant that anyone writing new code will avoid
using 'new'.

