
Go: Severe memory problems on 32bit Linux - kristianp
http://groups.google.com/group/golang-nuts/browse_thread/thread/ab1971bb9459025d#
======
wtallis
Wow. That third message, with suggestions for avoiding the bug, reads like a
twisted joke. Highlights:

"avoid struct types which contain both integer and pointer fields"

"avoid data structures which form densely interconnected graphs at run-time"

"avoid integer values which may alias at run-time to an address; make sure
most integer values are fairly low (such as: below 10000)"

I understand that this isn't a completely brain-dead garbage collector, but
warnings like that really scream "I'm just a toy language". It doesn't seem
wise to call such a fragile programming tool production-ready or 1.0; the
32-bit implementation should be tagged as experimental, if only to lessen the
damage to Go's reputation.

~~~
BarkMore
Go 1 defines the language and the backwards-compatibility guarantees one can
expect as the language matures. It's more a statement about the language
specification than it is about the implementations.

I know from experience that the gc 64-bit version is production ready. It's
unfortunate that the limitations of the 32-bit version are not called out
clearly on the website.

~~~
batista
_> I know from experience that the gc 64-bit version is production ready._

Is it really production ready though, or is it the same half-arsed
implementation as the 32-bit one, just taking advantage of the larger
availability of virtual memory on a 64 bit system?

~~~
ghusbands
On 32-bit systems, the problem is that there's a very high chance of
non-pointer data looking like pointers to existing data, which, in turn, might
have its own pointer-like data. This means that you can end up keeping an
excessive amount of unreferenced data.

However, a 64-bit address space is so much larger that you can't really suffer
the same issue. Unlike on 32-bit systems, neither high entropy data nor text
will look like valid pointers.

Therefore, the technique can be validly described as production-ready for
64-bit systems.
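
A rough way to put numbers on this argument is sketched below. It assumes, unrealistically, that non-pointer words are uniformly random and that the live heap is 256 MiB; `falsePointerChance` is a made-up helper for illustration, not part of any Go runtime.

```go
package main

import (
	"fmt"
	"math"
)

// falsePointerChance estimates the probability that one uniformly random
// word of the given width lands inside a live heap of heapBytes bytes.
// Hypothetical helper: real program data is not uniformly random.
func falsePointerChance(heapBytes float64, addressBits int) float64 {
	addressSpace := math.Ldexp(1, addressBits) // 2^addressBits
	return heapBytes / addressSpace
}

func main() {
	heap := float64(256 << 20) // assumed live heap of 256 MiB
	for _, bits := range []int{32, 48, 64} {
		fmt.Printf("%d-bit address space: %.3g\n", bits, falsePointerChance(heap, bits))
	}
}
```

Under these assumptions a random 32-bit word hits the heap one time in sixteen, while a random 64-bit word essentially never does.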

~~~
ArbitraryLimits
Can someone help me out here with a TL;DR? This is my first exposure ever to
any technical aspect of Go, and this discussion makes it sound like it doesn't
use any tag bits in its pointers, but has its garbage collector run heuristics
on the data it examines to see whether it "looks like text" to decide whether
to collect it? I know that can't possibly be correct.

~~~
Arelius
It's much simpler than that. It's called a conservative collector: it looks at
every piece of data and just pretends it is a pointer. If anything points to a
valid allocated address, that address is retained. Otherwise, just like every
collection when there are no references, the object is collected.
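
The idea can be sketched as a toy simulation. This is not Go's actual collector: the `block` type, the addresses, and the root values are all invented for illustration, and a real collector would also scan the contents of every retained block in turn.

```go
package main

import "fmt"

// block is a hypothetical record of one heap allocation: [start, start+size).
type block struct {
	start, size uintptr
	marked      bool
}

// findBlock returns the block containing addr, or nil if addr is not inside
// any allocation.
func findBlock(heap []*block, addr uintptr) *block {
	for _, b := range heap {
		if addr >= b.start && addr < b.start+b.size {
			return b
		}
	}
	return nil
}

// markConservatively treats every word as a possible pointer and marks any
// block that a word happens to point into.
func markConservatively(heap []*block, words []uintptr) {
	for _, w := range words {
		if b := findBlock(heap, w); b != nil {
			b.marked = true
		}
	}
}

func main() {
	heap := []*block{
		{start: 0x1000, size: 64},
		{start: 0x2000, size: 64},
		{start: 0x3000, size: 64},
	}
	// Roots: a genuine pointer to the first block, a small integer, and an
	// integer that merely happens to equal an address inside the second block.
	roots := []uintptr{0x1000, 42, 0x2010}
	markConservatively(heap, roots)
	for _, b := range heap {
		fmt.Printf("block at %#x retained: %v\n", b.start, b.marked)
	}
}
```

The second block survives only because an unrelated integer aliases its address, which is the leak being discussed; the third block has no aliases and is collected normally.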

------
jedbrown
_However, it does not address the base issue, which is that Go uses a
conservative garbage collector, and more values look like pointers in a 32-bit
world.

The only real fix would be to improve the garbage collector's understanding of
which values are pointers and which are something else (e.g., floating point
numbers that happen to look like pointers). And that is not an easy fix._

How does this GC work? Is it literally just marching through the heap looking
for pointer-sized values in the range that has been mapped to the process?

~~~
smanek
Yep, that's basically it.

It sounds crazy (and it is!) - but it often works reasonably well in practice.
SBCL (one of the most performant Common Lisp implementations) has an
'imprecise gc' that works the same way - and I've seen reasonably heavily
stressed processes with uptimes in the weeks/months.

Remember, even a few years ago (before fastthread, etc) a reasonably loaded
Ruby on Rails app couldn't stay up for more than ~10 minutes w/o memory leaks
forcing a restart (DHH said 37Signals was doing ~400 restarts/day per process,
IIRC) because the runtime was such a piece of crap. Yet, _many_ people still
used it to solve real problems and make real money.

At least Go's memory leaks are much slower than Ruby's ;-)

~~~
rand_r
Would that gc scheme potentially cause the following (pseudo) code to break?

    
    
      x = malloc(1); // allocates block 'a' in memory
      int i = (int) x;
      x = 0;
      i = i - 1;
      // gc runs here and frees 'a'
      *( (int*)(i + 1) ) = 123; // failure

~~~
ianlancetaylor
When Go gets a precise collector, simple implementations of this will work.
Doing this kind of thing in Go requires importing the "unsafe" package, and
any memory allocations done by code importing "unsafe" could be marked as
possibly a pointer.

However, it would probably be possible to write code involving two packages,
one of which does not import "unsafe", to lead to dangling pointers and
eventual crashes. That is why you should be careful about code that imports
"unsafe".

~~~
pcwalton
Actually, the code sample in the grandparent comment obfuscates the memory
address by subtracting 1. Even a conservative GC will be confused in this
case...
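
To make that concrete, here is a small simulation of the grandparent's snippet using plain integers instead of real pointers (real Go code would need "unsafe" for this). The block address and the `pointsInto` check are assumptions for illustration only.

```go
package main

import "fmt"

// pointsInto reports whether word lies inside the block [start, start+size),
// which is the test a conservative collector is assumed to apply here.
func pointsInto(word, start, size uintptr) bool {
	return word >= start && word < start+size
}

func main() {
	const start, size uintptr = 0x1000, 1 // block 'a' from malloc(1), address made up

	x := start // the only real pointer to 'a'
	i := x     // integer copy of the address
	x = 0      // the pointer is cleared
	i = i - 1  // the copy now refers to the byte before 'a'

	fmt.Println("x keeps 'a' alive:", pointsInto(x, start, size))
	fmt.Println("i keeps 'a' alive:", pointsInto(i, start, size))
}
```

Neither remaining word falls inside the block, so even a collector that honors every pointer-like value would consider 'a' unreachable.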

~~~
ianlancetaylor
Ah, yes, missed that. Nothing a GC can do about code like that. At least it
remains true that this can only happen in Go if you explicitly import
"unsafe".

------
_delirium
This is the relevant bug: <http://code.google.com/p/go/issues/detail?id=909>

As the discussion there and in the thread says, the root problem is that Go
uses a conservative garbage collector, and on 32-bit a lot more values look
like pointers than on 64-bit, so many more things don't get freed in long-
running processes. Seems not to be easy to fix.

~~~
papaf
Can anyone knowledgeable about compilers/GCs say why Go went with a
conservative garbage collector? From any Go source code it's trivial to pick
out pointers from values. Is this information hard to preserve at runtime?

~~~
riffraff
I recall reading something along the lines of it being hard to use a precise
GC due to the "unsafe" package, but I am not knowledgeable at all.

~~~
_delirium
Here's an early LtU thread where someone predicted it'd probably have to use a
conservative GC due to some of the addressing features:
<http://lambda-the-ultimate.org/node/3676#comment-52560>

------
blinkingled
I suspect the issue exists on 64-bit platforms as well; it just doesn't bite as
easily. In theory it is possible (even common) to have a 64-bit machine without
much physical memory (a VPS, say), and a 64-bit Go program that triggers this
bug there would suffer a similar impact to the 32-bit version.

Why yes - Issue #909 essentially confirms this - running 64-bit doesn't
fundamentally change anything - it just buys more time for impact because of
larger address space and hope of more physical memory. Which is sad on many
levels - just mind blowing that the language designers did not think of this
upfront! (Oh, and Go's built-in packages trigger this problem too: per #909,
commenting out the unicode package makes the program run!)

~~~
4ad
Physical memory does not have anything to do with it, the virtual address
space is all that matters.

It might be useful to consider that 2^64 is 2^32 * 2^32. This means the
problem becomes important on 64 bit only if an application will use _10 orders
of magnitude_ more memory. By considering the historic growth in memory
capacity and usage, this will only happen around 2060.
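
For reference, the "orders of magnitude" figure can be checked with a couple of lines. This is only the arithmetic, nothing Go-specific; the 48-bit case is included because x86-64 commonly implements 48 bits of virtual address space.

```go
package main

import (
	"fmt"
	"math"
)

// ordersOfMagnitude returns log10(2^extraBits), i.e. how many decimal orders
// of magnitude an address space grows when extraBits address bits are added.
func ordersOfMagnitude(extraBits int) float64 {
	return float64(extraBits) * math.Log10(2)
}

func main() {
	fmt.Printf("32 -> 64 bits: about %.1f orders of magnitude\n", ordersOfMagnitude(32))
	fmt.Printf("32 -> 48 bits: about %.1f orders of magnitude\n", ordersOfMagnitude(16))
}
```

So "10 orders of magnitude" is a fair rounding for the full 64 bits, and it is closer to five if only 48 bits are usable.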

~~~
blinkingled
Check out <http://code.google.com/p/go/issues/detail?id=909#c32> and the
following comment that agrees with it.

This isn't just an address space leak - it is a real memory leak. On 64-bit the
GC may not be so easily fooled as on 32-bit but it can still be fooled and
that is a fundamental problem that will result in memory leaks - if I have a 2GB
RAM VPS - it doesn't help to have 2^64 bytes of address space (actually it is
more like 2^48:
[http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_de...](http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details)).
If the GC leaks memory, sooner or later my process will be killed by the
OS.

~~~
4ad
Again, this has nothing to do with physical memory. It only has to do with the
virtual address space.

Of course if you bump into this, it's a real leak, who said otherwise? And
yes, it's possible to artificially generate the collision on 64 bit as it's
the same mechanism as with 32 bit. It's about whether it happens frequently
enough under normal usage patterns to be a concern. Youtube, and everybody who
tried Go in production say it isn't, and that's because of reasons outlined in
my first reply to you.

~~~
blinkingled
Care to explain why? You keep insisting without explaining. Have you checked
#C32 and how it clearly says it is a memory leak on both 32 and 64 bit
platforms?

What is your explanation as to why this is not a memory leak and only an
address space leak?

[EDIT POST YOUR UPDATE] Ok - so we are on the same page. I wasn't arguing
about the likelihood at all - just the fact that it is possible troubled me as
a bad GC design. Sure people use lots of crappy software on servers - doesn't
mean it's a sound idea :)

~~~
4ad
But it is a real leak, it's just an artificially created leak. These might be
interesting to investigate for DOS potential, but they don't happen under
regular usage because you are searching for a needle in a haystack.

[edit after your update]

Each GC strategy has its drawbacks, for example the one used by Go has the
least overhead in extra memory usage, and it's also simple to understand and
implement. Mono got its precise GC only last year, it survived 8 years with a
conservative GC. Go is only two years old.

------
luriel
gccgo is better on 32bit systems:
[https://groups.google.com/d/msg/golang-nuts/qxlxu5RZAl0/NS71...](https://groups.google.com/d/msg/golang-nuts/qxlxu5RZAl0/NS718AK297cJ)

Note that work on the garbage collector is ongoing post-Go1; a faster parallel
GC is in the process of being merged: <http://codereview.appspot.com/5279048/>

But ultimately 32bit systems don't seem to be a big issue for Google or anyone
else using Go in production (and there are quite a few big organizations using
it: <http://go-lang.cat-v.org/organizations-using-go> ).

Most people moved to 64bits a while ago so the amount of attention the 32bit
port gets will never be the same.

~~~
bad_user
You know, that kind of sucks ... there are many machines around that are still
on 32 bits.

I keep my own Ubuntu laptop (dual-boots to Windows) on 32-bit builds (both
Linux and Windows), simply because I have fewer problems that way (mostly with
hardware drivers, but also with software). I keep the Amazon EC2 instances I
maintain on 32-bit images, simply because they are cheaper. My Android phone
is also 32-bit and will be so for a long time. My other phone, an older iPhone
3GS, is also 32-bit. My servers, prior to Amazon EC2, built with ARM
processors, were also 32-bit.

And when I was playing with MongoDB, do you know what I did when I discovered
that the 32-bit build was basically unusable? I ditched it and never looked
back.

~~~
4ad
I've been exclusively using 64 bit computers and operating systems, both
Windows and Linux, for about 7 years now, never had the reported driver
problems, well, never had any issue, really.

For me at least, 32 bit is only important for ARM. On the other hand the issue
is very much blown out of proportion, most people haven't seen it, even if
they run 32 bit servers. Most usual servers written in Go, like web servers,
use very little memory. I process 4k requests per second using 7MB of resident
memory. There are many memory intensive applications, but you usually don't
run those on 32 bit.

~~~
sixbrx
I do wish the desktop world would just move wholesale to 64 bit, but do note
that Ubuntu still marks the 32 bit install as the "recommended" one, so I would
guess there must be some problems lingering in the 64 bit versions. Maybe just
flash support or something like that?

~~~
el_muchacho
I don't think there are problems with 64 bit versions, but many computers are
still 32 bits. The PC I use right now is 32 bits and I don't feel the urge to
buy a new one. So I suppose 32 bit machines will stay around for another
decade or so.

------
stiff
The problem itself seems quite serious and I would seriously reconsider using
Go if I was interested in it in the first place, but after seeing the way this
is treated by the Go "community", I am pretty sure I will never ever even
think about using Go for anything.

~~~
dchest
_the way this is treated by the Go "community"_

What do you mean?

~~~
stiff
Well, since the people behind Go did not up front say anything about the
language/compiler being specifically targeted at 64-bit platforms, I would
expect someone more mature from the Go team to step up and say something like
"We are sorry, we didn't foresee the consequences of some design decisions and
hence screwed it up." and either "We will fix it ASAP" or "We cannot fix it
because XYZ". They might not take money for their work directly from the
users, but there is still some moral obligation if you create something,
release it to the world, praise its virtues and persuade people to use it. As
it can be seen in the thread, lots of people already invested lots of time
into building things with Go and now they're in serious trouble. Instead, many
people in this thread try to somehow downplay the problem, advocate changing
hardware (that's something quite new in the programming language world) or
following some pretty absurd guidelines. This might or might not be
representative of the whole community around Go, but it surely leaves a bad
taste, hence the slight irony.

~~~
dchest
There is an "official" response: the post references a bug # where Russ Cox
said "the rest of the issue will have to wait until after Go 1".

Note that Go 1 is a "language freeze", not an implementation freeze.

Now that it's known there's a bug that will be worked on later, people
proposed possible workarounds, the easiest of which is to switch to a 64-bit
platform. I agree that downplaying the issue is wrong, but only one person did
that.

There is occasional rudeness, mostly caused by strong opinions, but overall I
think the Go community is pretty good.

~~~
stiff
The very next sentence in this response from Russ Cox is: "Or maybe all the
32-bit systems will be replaced by 64-bit ones.". So, they do not admit this
is a serious problem that needs attention, it is not clear whether they will
fix it, when they are going to fix it and whether anyone cares about fixing it at
all.

I couldn't find adequate words to express this so far, but the reason I find
this situation worth commenting on at all is that it resembles a
very common pattern of denial I observe among many professionals in various
professions in cases where a problem appears that is very hard to tackle or
even to analyse in the first place. Often a doctor who has troubles
identifying a disease will tell you it's probably just something in your head,
a programmer who has trouble reproducing a difficult bug will tell you it's
you who probably did something wrong at some time, even a guy who I called to
repair my washing machine that was stopping the washing at random told me to
"keep it under observation" when he wasn't able to tell what's wrong. Many
people simply do not want to put in the work needed to solve an unexpected and
difficult problem, and thus, perhaps even subconsciously, try to handle it by
pretending it doesn't exist. If you want to be a real professional and a
leader in what you do, you can not behave like that, you can not repress a
problem when someone reports one to you, you have to have the patience to
examine the issue, the experience necessary to know when you can be certain
that you have the complete picture of it, then sometimes the courage to admit
there really is a problem and then finally you have to solve it, or people
will not respect you.

~~~
agentS
I don't think Russ denied it was a problem, so I don't think this is a
"pattern of denial". I think you're incorrectly interpreting that statement as
his "fix" for this bug.

The fix is well-known (a precise GC), just implementing it hasn't happened
yet.

------
ezyang
Funnily enough, I understood why they went the conservative GC route. It has
to do with the overall Go philosophy, which is that they really do not want
features to affect data representation. This has meant no boxing (and no easy
polymorphism), and a decision like that has logical consequences for GC too.

Here's to hoping they find a cool solution! It's been a problem for GC's since
forever, and if they find a general way of handling the problem I'm sure it
will be picked up by many other runtimes.

~~~
pcwalton
In Rust we're working on a solution for this problem. Essentially, the plan is
to have RTTI on garbage-collected data (this is already completed and is used
in the cycle collector) and precise stack information for every root on the
stack. The latter is in a fork of LLVM: <https://github.com/pcwalton/llvm>

But aside from the shameless plug, C# and D (I believe), have had precise
garbage collectors for quite a while now, and they have similar memory
management to Go. It's well-known how to implement it (but that doesn't make
it any less hard — I can totally understand why Google opted for conservative
GC in the first version).

~~~
ezyang
I know how to do precise GC if you allow me to add a (pointer-size) header to
all data living in the heap; i.e. to maintain the RTTI. I don't know how you
do that if you're not allowed a header. Do C# and D have headers?

~~~
pcwalton
I'm sure they do. There are a few things to note here:

(1) In order for malloc to work, you need a header anyway (at least, unless
your allocation fits in one of the fixed-size bins).

(2) You can get around the header to some extent by sorting the fields of your
objects so that pointers come first, and then all you need to do is to store
the number of pointers (or a sentinel value). This is what Haskell does. Of
course, this prevents low-level control over data representation.

(3) You can tag (or NaN box) all your values. This is what most MLs do, as
well as JS, many Lisps, etc.

(4) You can use a map on the side from pointer to type info to avoid a header.
This is what Rust in its early days did. It's worse than a header for memory
consumption though, so it doesn't really buy much.
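
As an illustration of option (2), here is a minimal sketch of a pointers-first layout. The `object` type and its field values are invented for the example and do not reflect how Haskell, or any other runtime, actually lays out memory.

```go
package main

import "fmt"

// object is a hypothetical heap object whose fields are sorted so that all
// pointers come first; numPointers is the only metadata the scanner needs.
type object struct {
	numPointers int
	fields      []uintptr // pointers first, then non-pointer data
}

// pointerFields returns only the words a precise collector should follow.
func pointerFields(o object) []uintptr {
	return o.fields[:o.numPointers]
}

func main() {
	o := object{
		numPointers: 2,
		// Two made-up pointers, then an integer that happens to look like an address.
		fields: []uintptr{0x1000, 0x2000, 0x3000},
	}
	fmt.Printf("words to trace: %#x\n", pointerFields(o))
}
```

The third field is never examined, so it cannot be mistaken for a pointer no matter what value it holds. The price, as noted in (2), is that the runtime rather than the programmer decides field order.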

~~~
ezyang
So, the thing that always gets Haskell folks when dealing with an
implementation (2) is that you can't get uniform data representation when
dealing with things like arrays. It means you have to unbox things. Arguably,
the situation is not much better in malloc land; if you malloc a large
multiple of your object size, you're explicitly saying, "I want this to be
unboxed", but by this point you've wandered into generics land.

(3) is annoying. Who likes 31-bit integers? Not I!

~~~
pcwalton
Yeah, I hate 31-bit integers too. It's not the only tagging scheme though; I
prefer NaN boxing (used in SpiderMonkey among others). NaN boxing allows
unboxed doubles and 32-bit ints, at the cost of increased register pressure
and memory usage on 32-bit systems.
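
For readers who haven't seen NaN boxing, a minimal sketch follows. The tag value and the `value` type are assumptions chosen for the example; this is not SpiderMonkey's actual encoding, and a real scheme would also reserve tags for pointers and other types.

```go
package main

import (
	"fmt"
	"math"
)

// value is a hypothetical 64-bit NaN-boxed slot: either the raw bits of a
// float64, or a 32-bit integer hidden in the payload of a NaN.
type value uint64

const (
	intTag       value = 0xFFF9 << 48       // top 16 bits reserved for boxed ints (assumed layout)
	canonicalNaN value = 0x7FF8000000000000 // the single NaN pattern doubles are normalized to
)

// boxFloat stores a double unchanged, except that every NaN is normalized
// so that it can never collide with the integer tag.
func boxFloat(f float64) value {
	if f != f {
		return canonicalNaN
	}
	return value(math.Float64bits(f))
}

// boxInt stores a full 32-bit integer in the low half of a tagged NaN.
func boxInt(i int32) value { return intTag | value(uint32(i)) }

// isInt reports whether v carries the integer tag.
func isInt(v value) bool { return v>>48 == intTag>>48 }

func unboxInt(v value) int32 { return int32(uint32(v)) }

func unboxFloat(v value) float64 { return math.Float64frombits(uint64(v)) }

func main() {
	a, b := boxInt(-7), boxFloat(1.5)
	fmt.Println(isInt(a), unboxInt(a))
	fmt.Println(isInt(b), unboxFloat(b))
}
```

Every slot is 64 bits wide regardless of what it holds, which is where the extra memory use and register pressure on 32-bit systems come from.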

------
zvrba
Interesting piece of information below in the thread: in reply to "Go being
advertised as a systems PL", David Symonds replies with: "It's not. It used to
be, but it's not any more."

~~~
el_muchacho
What is the definition of a system PL ?

~~~
zvrba
The ability to access the raw memory underlying any object.

------
ungerik
After switching to 64 bit, we haven't had a single crash. So I would describe
the 64 bit version as production ready. But the problems on 32 bit systems
should have been documented in the release notes.

Besides this hiccup (and a week without sleep with the website on life
support), Go has been a real joy to work with.

------
ww520
Are most Androids 32-bit? Doesn't this issue impact the GO Android SDK
adoption?

~~~
masklinn
> Are most Androids 32-bit?

By virtue of there being no AArch64 implementation announced so far, _all_
androids are 32-bit because there is no 64b ARM core.

> Doesn't this issue impact the GO Android SDK adoption?

Any Go application should be able to run into this issue if it has a similar
usage pattern.

------
0xABADC0DA
It's amazing how nonchalant they are about this 'oh sure we'll fix it in a
year or so' when the garbage collector is basically all there needs to be in a
Google Go runtime

Some comments were asking why early Java's GC wasn't this bad even though it
was conservative. The reason is that Java can't take references to fields of
an object, so the data mistaken as a pointer has to actually point to an
object header. In Google Go you can take a reference to a field, locking the
whole object, so the faux pointer can point to any field as well (or in this
probably any location in the object). Not exactly the wisest choice in
semantics, as they are seeing now that it complicates the GC.
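
The difference can be counted directly. The sketch below takes the description above at face value (only the start address retains an object under the Java-like rule, any interior address does under the Go-like rule); the object address and size are made up.

```go
package main

import "fmt"

// retains reports whether word would keep the object [start, start+size)
// alive. With interior false only the object's start address counts; with
// interior true any address inside the object counts.
func retains(word, start, size uintptr, interior bool) bool {
	if interior {
		return word >= start && word < start+size
	}
	return word == start
}

func main() {
	const start, size uintptr = 0x1000, 4096 // hypothetical 4 KiB object

	var headerOnly, anyField int
	for w := start - 16; w < start+size+16; w++ {
		if retains(w, start, size, false) {
			headerOnly++
		}
		if retains(w, start, size, true) {
			anyField++
		}
	}
	fmt.Println("values that retain the object, header only:", headerOnly)
	fmt.Println("values that retain the object, interior allowed:", anyField)
}
```

Under these assumptions a 4 KiB object is 4096 times easier to pin by accident once interior pointers count.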

~~~
4ad
It's not _Google_ Go, it's simply Go, there's not a single Google reference on
the web page and that's intentional. Looking at your previous posts here and
on Reddit I see you are a notorious anti-Google troll that adds "Google" to
"Go" so that your negative posts will be associated with both Google and Go.

It's absurd to call it _Google_ Go when there's a first class GNU
implementation that has been in development from the beginning, there's at
least one closed source implementation and another distinct BSD-licensed
implementation in its infancy.

It makes even less sense than saying _Apple_ LLVM, _Juniper_ FreeBSD, or
_AT&T_ C.

~~~
rand_r
Any recommendation on how we should google for Go related pages?

~~~
4ad
Golang or Go programming works fine. If searching for Google Go produces
meaningful results, by all means, use it. I was merely complaining about
referring to the project as such.

There's also this: <http://go-lang.cat-v.org/go-search>

------
sunyc
To go back to 32 bit hardware, you will be running very old machines that you
can buy off eBay for <$200, and the power cost offsets the new hardware cost in
several months.

~~~
wtallis
Running a 32-bit operating system on 64-bit hardware is still pretty common.
And this problem afflicts ARM as well.

~~~
jergason
This is really disappointing. I was hoping to do some stuff with Go on ARM,
but it looks like they are not too concerned about this bug.

