"avoid struct types which contain both integer and pointer fields"
"avoid data structures which form densely interconnected graphs at run-time"
"avoid integer values which may alias at run-time to an address; make sure
most integer values are fairly low (such as: below 10000)"
I understand that this isn't a completely brain-dead garbage collector, but warnings like that really scream "I'm just a toy language". It doesn't seem wise to call such a fragile programming tool production-ready or 1.0; the 32-bit implementation should be tagged as experimental, if only to lessen the damage to Go's reputation.
I wouldn't bother with the 32 bit hiccups. Go hits the sweet spot pretty well. You have chosen well. Hang in there.
Comparing the standard libraries, Phobos doesn't look far behind Go in scope. There are big holes though, like crypto, which is entirely missing, and a complete SQL driver (one was in development, but we haven't heard about it for a while now), although there is a binding for SQLite3 and several drivers for major RDBMSs (not in the standard lib, though). Logging will be included soon.
Most of the rest is included (networking uses libCurl), and Phobos quality is continuously improving, with some parts being excellent both in terms of functionality and performance, like the new regex library. In some other areas, like containers, Phobos seems much more advanced than Go. OTOH, there seem to be more third-party libraries for Go than for D, but we can't comment on their quality. And of course, both languages allow binding to C libraries.
How so? I think it's got pretty much what belongs in the standard library.
This is a rather strange assessment. The history of "successful" languages has been a mixture of "cool jump onboard" and "who can stay alive the longest to get a community." D is in the latter camp.
It seems unreasonable to me to expect a language to come out of the gate with guns blazing. People expect nukes now!
One of the main problems was that he was almost the sole compiler developer, and he could hardly keep up with maintaining two parallel branches and developing new ideas at the same time. People complained that they couldn't get involved as much as they wanted. It's understandable that many people thought that D didn't have a solid future with such uncertainties.
Nowadays, these problems are mostly overcome with a much better organization: there are several committers for the compiler, and several committers for the standard library. Phobos is the standard library, it's maturing, D2 has shown its strengths over D1, and the community is united again, because not only is it deeply involved with the design of the language and standard lib (through the mailing list), it is also involved with the implementation of essential parts of it. 2011 has been a very good year for D, and I think that more than ever, the whole project feels like it's going in the right direction.
edit: I guess another reason D isn't gaining as much traction as it could is that it has been removed from the Alioth computer language shootout. For a language which is aimed at raw speed (and was brilliant at that when it was still on the shootout), it's a severe blow.
I know from experience that the gc 64-bit version is production ready. It's unfortunate that the limitations of the 32-bit version are not called out clearly on the website.
No need to guess, look at golang-dev https://groups.google.com/forum/?fromgroups#!forum/golang-de...
Is it really production ready though, or is it the same half-arsed implementation as the 32-bit one, just taking advantage of the larger virtual address space on a 64-bit system?
However, a 64-bit address space is so much larger that you can't really suffer the same issue. Unlike on 32-bit systems, neither high entropy data nor text will look like valid pointers.
Therefore, the technique can be validly described as production-ready for 64-bit systems.
In order to pull off an attack, you'd need to know what address range the program in question has been allocated, then figure out the smaller range that the runtime is actively using, then give it data with integers in that range. This is impractical.
If you think you can pull it off, play.golang.org lets you upload text to a Go program on Appengine, which then compiles that program and runs it. This gives you two programs to attack: the playground binary, and the one compiled from your source. If you can do it, you'll have a way to kill machines inside Google.
Even if there's a smaller chance of it happening, any language that has ANY chance of killing your system when running as expected is a language I'll never bother to learn.
Is there any sort of analysis tool they might be able to put into the compiler to tell you if a data structure you created has a high chance of looking like a pointer?
The only real fix would be to improve the garbage collector's
understanding of which values are pointers and which are something else
(e.g., floating point numbers that happen to look like pointers). And
that is not an easy fix.
How does this GC work? Is it literally just marching through the heap looking for pointer-sized values in the range that has been mapped to the process?
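As a rough illustration of that kind of scan, here is a toy sketch in Go; the heap layout and names are invented for illustration and are not how Go's runtime actually represents anything:

    package main

    import "fmt"

    // object is a stand-in for an allocated block: an address range plus a mark bit.
    type object struct {
        start, size uintptr
        marked      bool
    }

    // conservativeMark treats every root word that falls inside some object's
    // address range as a pointer, whether or not it really is one.
    func conservativeMark(roots []uintptr, heap []*object) {
        for _, w := range roots {
            for _, obj := range heap {
                if w >= obj.start && w < obj.start+obj.size {
                    obj.marked = true // retained, even if w was just an integer
                }
            }
        }
    }

    func main() {
        heap := []*object{
            {start: 0x08001000, size: 64},
            {start: 0x08002000, size: 64},
        }
        roots := []uintptr{42, 0x08002010} // the second value may be an int or a pointer
        conservativeMark(roots, heap)
        for _, obj := range heap {
            fmt.Printf("object at %#x marked=%v\n", obj.start, obj.marked)
        }
    }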
It sounds crazy (and it is!) - but it often works reasonably well in practice. SBCL (one of the most performant Common Lisp implementations) has an 'imprecise gc' that works the same way - and I've seen reasonably heavily stressed processes with uptimes in the weeks/months.
Remember, even a few years ago (before fastthread, etc.) a reasonably loaded Ruby on Rails app couldn't stay up for more than ~10 minutes w/o memory leaks forcing a restart (DHH said 37Signals was doing ~400 restarts/day per process, IIRC) because the runtime was such a piece of crap. Yet, many people still used it to solve real problems and make real money.
At least Go's memory leaks are much slower than Ruby's ;-)
I can't say for sure how the Go GC works, but I would assume it likewise isn't fully conservative. E.g. the "avoid struct types with both integer and pointer fields" advice would be pointless for a fully conservative GC, but does make sense if structs containing no pointers are allocated in a separate memory region that's not scanned for potential pointers.
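If that guess is right, it would also explain the first warning quoted at the top. Here is a hedged sketch of the difference; the no-scan behaviour is an assumption about the allocator, not something this code demonstrates by itself:

    package main

    // noPointers contains only integers; an allocator that segregates
    // pointer-free types can put it in memory the collector never scans,
    // so its values can never masquerade as pointers.
    type noPointers struct {
        id, checksum uint64
    }

    // mixed interleaves integers and pointers, so the whole object must be
    // scanned, and a large id that happens to fall inside the heap's address
    // range can conservatively pin some unrelated object.
    type mixed struct {
        id   uint64
        next *mixed
    }

    func main() {
        _ = noPointers{id: 0x0804a000}        // a value that looks like a 32-bit heap address
        _ = &mixed{id: 0x0804a000, next: nil} // here the same value sits next to a real pointer
    }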
You pay for quick development turn-around with machines.
Go is not in that kind of space as far as I know, since, as I understand it, Go is sold as a low-ish level language competing with C++ and C for systems development. For something that would replace those languages, unexplained, systematic leaks would be bad (I mean, my C++ programs might leak, but it's reasonably easy to discover why. That matters).
About this issue: yeah, it's unfortunate, but it's a property of the class of garbage collector that Go uses atm. My servers are all 64-bit, so it doesn't really affect me. But I do feel bad for those who are trying to run Go on ARM, or on 32-bit servers.
This "magic" constant should be non-valid value in 32-bit single precision float point format, far away from usual mmap()'ped address ranges, and far away from small integers.
Maybe this "magic" value should be randomized to prevent DoS attacks on Go runtime library.
#include <stdint.h>
#include <stdlib.h>

int main(void) {
    void *x = malloc(1);       /* allocates block 'a' in memory */
    intptr_t i = (intptr_t) x; /* stash the address in an integer */
    x = 0;                     /* drop the only real pointer */
    i = i - 1;                 /* disguise it; no word in memory now looks like a pointer to 'a' */
    /* gc runs here and frees 'a' */
    *((int *)(i + 1)) = 123;   /* failure: write through a dangling "pointer" */
}
But note that this isn't really fair to conservative GC, as no GC, precise or not, can cope with hidden pointers like that. So your example isn't really a strike against Go.
However, it would probably be possible to write code involving two packages, one of which does not import "unsafe", to lead to dangling pointers and eventual crashes. That is why you should be careful about code that imports "unsafe".
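For example, something along these lines (a sketch, not code from any real package) is the Go analogue of the C snippet above: once the address only survives as a uintptr, no collector, precise or conservative, is obliged to keep the block alive.

    package main

    import "unsafe"

    func main() {
        p := new([64]byte)
        addr := uintptr(unsafe.Pointer(p)) // the GC no longer sees a reference here
        p = nil
        // If a collection runs at this point, the array may be reclaimed.
        q := (*[64]byte)(unsafe.Pointer(addr)) // potentially dangling
        q[0] = 123                             // may scribble on freed memory
    }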
The problem with an imprecise GC is that an integer that looks like a pointer can prevent an object from being freed.
- Shaw's opening volley: http://web.archive.org/web/20080103072111/http://www.zedshaw... [lots of stupid personal flames - but his technical points are consistent with what I remember from the time]
- DHH's response: http://david.heinemeierhansson.com/posts/31-myth-2-rails-is-...
- I can't find a copy of shaw's actual response, but here's the relevant HN thread: http://news.ycombinator.com/item?id=364659
The salient point (quoting Zed):
Now, DHH tells me that he’s got 400 restarts a mother fucking day. That’s 1 restart about ever 4 minutes bitches. These restarts went away after I exposed bugs in the GC and Threads which Mentalguy fixed with fastthread (like a Ninja, Mentalguy is awesome).
If anyone had known Rails was that unstable they would have laughed in his face. Think about it further, this means that the creator of Rails in his flagship products could not keep them running for longer than 4 minutes on average.
Repeat that to yourself. “He couldn’t keep his own servers running for longer than 4 minutes on average.”
I've never been much of a Ruby/Rails guy, but my understanding is that everyone had to restart Rails a few times an hour because the memory leaks were so bad in those days.
Zed's rant/flame goes into so much personal detail that he doesn't really articulate the point. Rails was receiving this enormous amount of hype, but there wasn't even a working application server yet.
If you think you can still do it, all of *.golang.org and golang.org are running Go on Appengine, with the source code being freely available. This is your opportunity to get a back door into Google's servers.
I won't comment on the DoS concern until I've investigated further.
As the discussion there and in the thread says, the root problem is that Go uses a conservative garbage collector, and on 32-bit a lot more values look like pointers than on 64-bit, so many more things don't get freed in long-running processes. Seems not to be easy to fix.
Unfortunately, Boehm has drawbacks: because it's getting no help from the compiler, it can't tell integers from pointers. So it has to treat everything that looks like it might be a pointer as a pointer, even if it's an integer (or floating point number). Which means that it's possible for garbage to not be collected, because there is an integer that happens to have the same value as the address of the garbage object. And, of course, once you can't collect that object, you can't collect any of the objects it refers to (including via false pointers), and so on.
The odds of this happening are a function of what percentage of the virtual address space is in use: once some critical threshold is reached, the amount of garbage that can't be collected due to false pointers just explodes. On 32-bit platforms, I've seen this happen with heap sizes of only a few hundred megabytes. And the advice for working around it is exactly what the responder said: use less memory, don't use large ints (which are more likely to be mistaken for pointers), etc. Also, the problem goes away (for the time being) on 64 bits, because the percentage of the address space used drops. A terabyte of memory on a 64-bit system is roughly the same fraction of the total address space as a kilobyte of memory is on a 32-bit system.
(Precise GCs are possible in C-like languages, but trickier to implement. Here's a recent paper on one: http://www.cs.utah.edu/~regehr/papers/ismm15-rafkind.pdf)
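As a hedged sketch of that false-pointer failure mode in Go terms (only meaningful with a conservative collector on a 32-bit build; the addresses and sizes below are made up), long-lived integers that fall inside the heap's address range can keep otherwise dead allocations pinned:

    package main

    import (
        "fmt"
        "runtime"
    )

    // noise holds long-lived integers chosen to look like plausible 32-bit heap
    // addresses; a conservative collector scanning them may treat them as pointers.
    var noise [1 << 18]uintptr

    func main() {
        for i := range noise {
            noise[i] = 0x08000000 + uintptr(i)*64
        }
        var ms runtime.MemStats
        for i := 0; i < 100; i++ {
            _ = make([]byte, 1<<20) // garbage; on 32-bit some of it may be falsely retained
            runtime.GC()
        }
        runtime.ReadMemStats(&ms)
        fmt.Println("heap in use after GC:", ms.HeapAlloc) // grows if garbage is pinned
    }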
I doodle all the time when working. Pristine set of diagrams emerges from the chaos. I wish my computer would play along with this regime ..
Why yes - Issue #909 essentially confirms this - running 64-bit doesn't fundamentally change anything - it just buys more time before impact because of the larger address space and the hope of more physical memory. Which is sad on many levels - just mind blowing that the language designers did not think of this upfront! (Oh, and Go's built-in packages trigger this problem too - per #909, commenting out the unicode package makes the program run!)
It might be useful to consider that 2^64 is 2^32 * 2^32. This means the problem becomes important on 64-bit only if applications use about ten orders of magnitude more memory than they do today. Considering the historic growth in memory capacity and usage, this will only happen around 2060.
This isn't just an address space leak - it is a real memory leak. On 64-bit the GC may not be so easily fooled as on 32-bit, but it can still be fooled, and that is a fundamental problem that will result in memory leaks. If I have a 2GB RAM VPS, it doesn't help to have 2^64 bytes of address space (actually it is more like 2^48: http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_de... ) - if the GC leaks memory, sooner or later my process will be killed by the OS.
Of course if you bump into this, it's a real leak, who said otherwise? And yes, it's possible to artificially generate the collision on 64-bit, as it's the same mechanism as with 32-bit. It's about whether it happens frequently enough under normal usage patterns to be a concern. YouTube, and everybody who has tried Go in production, says it isn't, and that's because of the reasons outlined in my first reply to you.
What is your explanation as to why this is not a memory leak and only an address space leak?
[EDIT POST YOUR UPDATE] Ok - so we are on the same page. I wasn't arguing about the likelihood at all - just the fact that it is possible troubled me as a bad GC design. Sure people use lots of crappy software on servers - doesn't mean it's a sound idea :)
[edit after your update]
Each GC strategy has its drawbacks; for example, the one used by Go has the least overhead in extra memory usage, and it's also simple to understand and implement. Mono got its precise GC only last year; it survived 8 years with a conservative GC. Go is only two years old.
In practice the Go GC never leaks memory on 64bit systems and for most programs never on 32bit systems either.
Note that work on the garbage collector is ongoing post-Go 1; a faster parallel GC is in the process of being merged: http://codereview.appspot.com/5279048/
But ultimately 32bit systems don't seem to be a big issue for Google or anyone else using Go in production (and there are quite a few big organizations using it: http://go-lang.cat-v.org/organizations-using-go ).
Most people moved to 64bits a while ago so the amount of attention the 32bit port gets will never be the same.
I keep my own Ubuntu laptop (dual-boots to Windows) on 32-bit builds (both Linux and Windows), simply because I have fewer problems that way (mostly with hardware drivers, but also with software). I keep the Amazon EC2 instances I maintain on 32-bit images, simply because they are cheaper. My Android phone is also 32-bit and will be so for a long time. My other phone, an older iPhone 3GS, is also 32-bit. My servers, prior to Amazon EC2, built with ARM processors, were also 32-bit.
And when I was playing with MongoDB, do you know what I did when I discovered that the 32-bit build was basically unusable? I ditched it and never looked back.
For me at least, 32 bit is only important for ARM. On the other hand the issue is very much blown out of proportion, most people haven't seen it, even if they run 32 bit servers. Most usual servers written in Go, like web servers, use very little memory. I process 4k requests per second using 7MB of resident memory. There are many memory intensive applications, but you usually don't run those on 32 bit.
What do you mean?
Note that Go 1 is a "language freeze", not an implementation freeze.
Now that it's known there's a bug that will be worked on later, people proposed possible workarounds, the easiest of which is to switch to 64-bit platform. I agree that downplaying the issue is wrong, but only one person did that.
There is occasional rudeness, mostly caused by strong opinions, but overall I think the Go community is pretty good.
I couldn't find the adequate words to express this so far, but the reason I find this situation worth commenting on at all is that it resembles a very common pattern of denial I observe among professionals in various fields when a problem appears that is very hard to tackle, or even to analyse in the first place.

Often a doctor who has trouble identifying a disease will tell you it's probably just something in your head, a programmer who has trouble reproducing a difficult bug will tell you it's you who probably did something wrong at some point, and even the guy I called to repair my washing machine, which was stopping the wash at random, told me to "keep it under observation" when he wasn't able to tell what was wrong. Many people simply do not want to put in the work needed to solve an unexpected and difficult problem, and thus, perhaps even subconsciously, try to handle it by pretending it doesn't exist.

If you want to be a real professional and a leader in what you do, you can not behave like that. You can not repress a problem when someone reports one to you; you have to have the patience to examine the issue, the experience necessary to know when you can be certain that you have the complete picture of it, then sometimes the courage to admit there really is a problem, and then finally you have to solve it, or people will not respect you.
The fix is well-known (a precise GC), just implementing it hasn't happened yet.
Here's to hoping they find a cool solution! It's been a problem for GCs since forever, and if they find a general way of handling it I'm sure it will be picked up by many other runtimes.
But aside from the shameless plug, C# and D (I believe) have had precise garbage collectors for quite a while now, and they have similar memory management to Go. It's well-known how to implement one (but that doesn't make it any less hard - I can totally understand why Google opted for conservative GC in the first version).
(1) In order for malloc to work, you need a header anyway (at least, unless your allocation fits in one of the fixed-size bins).
(2) You can get around the header to some extent by sorting the fields of your objects so that pointers come first, and then all you need to do is to store the number of pointers (or a sentinel value). This is what Haskell does. Of course, this prevents low-level control over data representation.
(3) You can tag (or NaN box) all your values. This is what most MLs do, as well as JS, many Lisps, etc.
(4) You can use a map on the side from pointer to type info to avoid a header. This is what Rust in its early days did. It's worse than a header for memory consumption though, so it doesn't really buy much.
(3) is annoying. Who likes 31-bit integers? Not I!
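To make (3) concrete, here is a toy Go sketch of tagging, assuming word-aligned pointers so the low bit is free to use as a tag; a collector would simply skip any word with the tag bit set, and sacrificing that bit is exactly where the 31-bit integers come from. The names are illustrative only.

    package main

    import "fmt"

    type word uintptr

    // boxInt stores a small integer with the low tag bit set; pointers are
    // word-aligned, so their low bit is always clear.
    func boxInt(n int32) word   { return word(uintptr(n)<<1 | 1) } // one payload bit is lost
    func isInt(w word) bool     { return w&1 == 1 }
    func unboxInt(w word) int32 { return int32(uintptr(w) >> 1) }

    func main() {
        w := boxInt(12345)
        if isInt(w) { // a collector would skip this word instead of tracing it
            fmt.Println(unboxInt(w)) // 12345
        }
    }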
"Go is a general-purpose language designed with systems programming in mind."
Besides this hiccup (and a week without sleep with the website on life support), Go has been a real joy to work with.
Since no AArch64 implementation has been announced so far, all Androids are 32-bit; there is no 64-bit ARM core yet.
> Doesn't this issue impact the GO Android SDK adoption?
Any Go application should be able to run into this issue if it has a similar usage pattern.
Some comments were asking why early Java's GC wasn't this bad even though it was conservative. The reason is that Java can't take references to fields of an object, so data mistaken for a pointer has to actually point to an object header. In Google Go you can take a reference to a field, keeping the whole object alive, so the faux pointer can point to any field as well (or in this case probably any location in the object). Not exactly the wisest choice in semantics, as they are seeing now that it complicates the GC.
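A small example of what "a reference to a field" means here (illustrative only); the point is that the interior pointer alone keeps the whole object reachable, so a stray integer only has to match some address inside the object to pin all of it:

    package main

    import "fmt"

    type record struct {
        payload [64]byte
        count   int
    }

    func main() {
        r := &record{}
        c := &r.count // an interior pointer into the middle of the object
        r = nil       // the object stays alive: c still refers into it
        *c = 7
        fmt.Println(*c) // 7
    }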
It's absurd to call it Google Go when there's a first-class GNU implementation that has been in development from the beginning, there's at least one closed-source implementation, and another distinct BSD-licensed implementation is in its infancy.
It makes even less sense than saying Apple LLVM, Juniper FreeBSD, or AT&T C.
There's also this: http://go-lang.cat-v.org/go-search
I only use that construction with ambiguous or unsearchable names (for instance I would also say Google Maps). I don't think I am alone in using this form and I feel it is appropriate to refer to Google Go this way.
Frankly I fail to understand why this bothers you so much, as it clearly does. I would expect Google Go advocates to be delighted to have Google's good reputation for engineering imparted onto this language.
"Frankly I fail to understand why this bothers you so much, as it clearly does."
It is annoying because it is incorrect, because you have been corrected multiple times, and because what you are attempting to do is transparent as hell. I, and I imagine many others, do not have a strong appreciation for botched attempts at subtlety.
> It is annoying because it is incorrect
Maybe some English professor, technical writer, or journalist reading this can chime in and explain how so. It seems to be the standard practice and I intend to use the best grammar and construction that I am capable of, as bad as that may be. This isn't Twitter.
FWIW, the Windows, Plan 9, OpenBSD and NetBSD ports were done entirely by the community.
AT&T C? That's even less specific than Go. What about gccgo? Google/GNU Go? What about the commercial implementations?
When referring to a friend named "Edward" in a text to another friend planning Edward's surprise birthday party, you can probably refer to him as "Ed" or "Edward". You definitely don't need to refer to him as "Edward (Parent's SSN:12345...)" or "Edward (Philip's Son)". While these latter forms are less ambiguous (and more searchable, to boot), the context is more than sufficient to disambiguate.
Yes, and then we see:
Give me a fucking break. You are clearly not worth engaging in discussion; I'm done.
1) it mainly affects long-running applications on 32-bit machines. Fact is, most production servers are (or should be) on 64 bits
2) most people (me for example) have never observed such a phenomenon (and all my dev computers are 32-bit)
So, that's an important problem, but far from a showstopper.
Sadly, it isn't here yet.
(To forestall the obvious objection: No, there's no way Torvalds will allow this to re-introduce the 2038 Problem into a new ABI. None.)
Then I realized there are many small VPS/EC2 instances with <1GB memory (I have two right now running on 64bit) and there are people who try to squeeze every last bit out of them by going 32bit...