
Memory matters, even in Erlang (2010) - ColinWright
http://www.lshift.net/blog/2010/02/28/memory-matters-even-in-erlang/?repost=HN
======
dgreensp
I'm kind of speechless that "popping a message from the queue" requires
knowing the length of the queue, which requires traversing the entire queue, a
linked list of as many as 90,000 elements. Even the "fast" time of about half
a millisecond is crazy slow for dequeuing an item.

If a linked list is a built-in Erlang data structure, why doesn't it keep a
length count? If it's common knowledge that this operation is O(N), why does
RabbitMQ call it on dequeue, and why doesn't it keep its own length count (and
why isn't that the obvious fix)?
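The "obvious fix" being asked about can be sketched in a few lines (Python here, purely for illustration; the names are mine, not RabbitMQ's): wrap the queue and maintain a counter alongside it, so asking for the length never traverses the elements.

```python
# A minimal sketch of the "keep your own length count" fix: maintain a
# counter next to the underlying structure so len is O(1). (The Python
# list used as backing store is just a stand-in; its own pop(0) cost
# isn't the point here -- the counter is.)

class CountedQueue:
    def __init__(self):
        self._items = []   # stand-in for the underlying queue
        self._count = 0    # maintained incrementally, never recomputed

    def push(self, item):
        self._items.append(item)
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("pop from empty queue")
        self._count -= 1
        return self._items.pop(0)

    def __len__(self):
        return self._count  # O(1): no traversal of the elements
```

With 90,000 elements queued, `len(q)` stays a single field read instead of 90,000 pointer dereferences.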

I can't think of any JavaScript built-in (the language where I spend my days)
with a name like "length" that internally dereferences N pointers, but I would
not be shocked to discover that dereferencing tens of thousands of pointers
was taking a few milliseconds, and I would not write my data structure that
way.

~~~
Betelgeuse90
Well, you do need to know that there exists an item in the queue. Knowing that
length > 0 is one way to go about it, but certainly it's enough to establish
the existence of one item and then move on.

If one is unaware of the complexity of the .length() method, one might use it
to find out if a dequeue is possible.

~~~
rtpg
This seems like a pretty fundamental issue though. I'd feel like any systems
programmer worth their salt would know that you can build an O(1) pop on a
queue.

~~~
felixgallo
oh, I suspect they do. Which is why the manual page for erlang:queue talks
about the topic in detail and explains the rationale, gives a workaround, and
links to the literature.

[http://erlang.org/doc/man/queue.html](http://erlang.org/doc/man/queue.html)

~~~
rtpg
This talks about the O(n)-ness of length, but I don't see the rationale for
the O(n)-ness of pop. Maybe I'm missing something?

~~~
felixgallo
You fell victim to the confusion in the original post and this thread between
what the original poster describes as popping, and what they were apparently
actually doing (which involved taking the length, which is O(N) for the
reasons explained here and in the documentation).

erlang's queue's pop ("out") is amortized O(1), worst case O(N), just like
every other immutable double-ended queue. Here's the source code:

[https://github.com/erlang/otp/blob/172e812c491680fbb175f56f7...](https://github.com/erlang/otp/blob/172e812c491680fbb175f56f7604d4098cdc9de4/lib/stdlib/src/queue.erl#L137)
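For anyone who doesn't want to read the Erlang source: the design is the classic two-list functional queue. Here's a rough Python rendering of the idea (an illustration of the technique, not a translation of the OTP code): push onto a rear list, pop from a front list, and reverse the rear into the front only when the front runs dry.

```python
# Sketch of the two-list (Okasaki-style) persistent queue. "push" conses
# onto the rear; "pop" takes from the front; when the front is empty the
# rear is reversed in one O(N) step, which amortizes to O(1) per
# operation. (Note: Python's [item] + rest copies the list, unlike
# Erlang's O(1) cons -- this is illustrative only.)

class TwoListQueue:
    def __init__(self, front=None, rear=None):
        self.front = front or []
        self.rear = rear or []

    def push(self, item):
        # Erlang's "in": returns a new queue, sharing the front
        return TwoListQueue(self.front, [item] + self.rear)

    def pop(self):
        # Erlang's "out": amortized O(1), worst case O(N)
        if self.front:
            return self.front[0], TwoListQueue(self.front[1:], self.rear)
        if self.rear:
            flipped = list(reversed(self.rear))  # the occasional O(N) step
            return flipped[0], TwoListQueue(flipped[1:], [])
        raise IndexError("pop from empty queue")
```

Note that nothing in this structure knows the total length cheaply, which is exactly why `len` on it is O(N) unless you track a counter separately.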

------
Kenji
"They said high level languages relieved us from thinking about memory issues"
nobody said that. I'd even claim the opposite. For example, to develop a game
on Android (Java) or in the browser (JavaScript) you need to work around the
garbage collector the best you can with techniques like object pooling, or
else your entire program stops every 3 seconds. It's just nasty. Know thine
enemy, know thine garbage collector.
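Object pooling, for those who haven't run into it, looks roughly like this (a Python sketch of the general technique; in a real Android or browser game you'd do the same thing in Java or JavaScript, and `Bullet`/`BulletPool` are made-up names):

```python
# Object pooling: pre-allocate objects once and recycle them, so a game
# loop produces almost no short-lived garbage for the collector to chase.

class Bullet:
    def __init__(self):
        self.x = self.y = 0.0
        self.alive = False

class BulletPool:
    def __init__(self, size):
        self._free = [Bullet() for _ in range(size)]  # allocated up front

    def acquire(self, x, y):
        # Reuse a pooled object rather than allocating a fresh one
        bullet = self._free.pop() if self._free else Bullet()
        bullet.x, bullet.y, bullet.alive = x, y, True
        return bullet

    def release(self, bullet):
        bullet.alive = False
        self._free.append(bullet)  # returned to the pool, not to the GC
```

The per-frame cost becomes a couple of field writes instead of an allocation, and released objects never become garbage at all.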

~~~
jasode
>nobody said that.

What's your definition of "nobody"? :-)

Actually, many people said it all the time -- especially during 1995/1996 when
Java hype was gearing up. It was touted as a selling point in Java books,
JavaWorld articles, usenet postings, etc.

Examples:

--"Java also implements automatic garbage collection, so you don't have to
worry about memory management issues. All of this frees you from having to
worry about dangling pointers, invalid pointer references, and memory leaks,
so you can spend your time developing the functionality of your programs."[1]

--"Programmers can be relatively fearless about dealing with memory because
they don't have to worry about it getting messed up."[2]

--"Java also implements automatic garbage collection, so the programmer does
not have to worry about memory management issues."[3]

--"The entire interaction with memory is worry-free in Java."[4]

The issue of "GC pauses" or "extra memory footprint required for performance"
is not mentioned in any of those sources. It's not Java's fault that it's
presented this way because every technology is hyped by only listing the
advantages and avoiding the talk about disadvantages.

[1] 1996 edition of David Flanagan's "Java in a Nutshell" --
[https://books.google.com/books?ei=ZDHOVOztD4z8sASIsoKYAg&id=...](https://books.google.com/books?ei=ZDHOVOztD4z8sASIsoKYAg&id=ibCWvFAnov8C&dq=1565921836&focus=searchwithinvolume&q=worry)

(he removed the phrase "_don't have to worry about memory management
issues_" from newer editions of the book:
[https://books.google.com/books?id=mvzgNSmHEUAC&printsec=fron...](https://books.google.com/books?id=mvzgNSmHEUAC&printsec=frontcover&dq=java+nutshell+worry+memory&hl=en&sa=X&ei=JyzOVMzsJIi1sQTOroHoBQ&ved=0CB8Q6AEwAA#v=snippet&q=memory&f=false))

[2]James Gosling -
[http://www.cs.dartmouth.edu/~mckeeman/cs118/references/Origi...](http://www.cs.dartmouth.edu/~mckeeman/cs118/references/OriginalJavaWhitepaper.pdf)

[3][http://www.trailstone.com/softdeve/java/overview.html](http://www.trailstone.com/softdeve/java/overview.html)

[4][http://wellscs.com/robert/java/productivity.htm](http://wellscs.com/robert/java/productivity.htm)

~~~
JoeAltmaier
Few have said it since those foolish comments back in 1995. That pretty much
is 'nobody'.

~~~
tbrownaw
No, people still say those things. Try saying something in favor of C++,
you'll see what I mean. :)

------
tel
It's an interesting phenomenon in "slowish" languages that people feel this
way:

> This function [length] is implemented in the VM so it’s expected to be very,
> very fast.

Even being aware of the data structure and algorithmic bounds forced upon such
a function, being "in the VM" indicates speed and a safe choice in using it.

Of course, as this post notes, that's sort of insane. O(N) is a big
computation on a big list and it means that there's a chance of memory
competition. High level language or otherwise, you can't ignore _algorithmic
complexity_. That's just absurd.

It just feels like a weird quirk of Erlang's queue module. If you're mutable,
then getting down to raw memory is probably a good idea. If you're doing this
all with linked lists, then _dear god_ go read Okasaki and at least have an
amortized-constant queue pop!

~~~
felixgallo
or at least read the erlang:queue() documentation which specifically
references this case, outlines the rationale for why it is the way that it is,
provides an Okasaki-style API, and references his book.

[http://erlang.org/doc/man/queue.html](http://erlang.org/doc/man/queue.html)

------
jacquesm
Bad title. Interesting bug, and it shows you just how much being aware of
implementation details can pay off. I'm kind of surprised that Erlang doesn't
keep a 'length' field under the hood for lists, but that's nothing the OP
couldn't fix by keeping one for himself.

The Erlang VM fix isn't really a fix from a user point of view; installing a
package like this should not require patching the VM. I do think the Erlang
VM maintainers should be made aware of this issue so they can decide whether
to roll the patch, or a more serious version of it, into the distribution. It
would seem to me that they are not the only people who have been hit by this
bug.

As for the bad title: the higher level your language, the more places there
are for bugs to hide, and if you actually believed that higher-level languages
would isolate you from the machine sufficiently that you could ignore reality,
then you had it coming.

~~~
rasz_pl
>Interesting bug

It's not a bug, it's a hidden optimization during GC (hibernation). You would
want that optimization if you intend to touch the values. It's something the
Erlang devs decided upon after analysing use cases.

~~~
jacquesm
A 9:1 performance drop after hibernation is a bug in my book, but of course
one man's bug is another man's feature.

Especially since hibernation is something you don't control (it kicks in by
default in OTP when a process is idle for a while).

[http://www.erlang.org/doc/man/gen_server.html](http://www.erlang.org/doc/man/gen_server.html)

Does not contain any caveats about performance being substantially worse for
some use cases after waking up a hibernated process.

~~~
asabil
No it does not, hibernation is very explicit.

------
robmccoll
I know nothing about erlang, but if performance matters and you won't be
deleting from the middle, an array-based implementation will usually be better
than a linked structure. Better alignment, less pointer chasing, fewer
allocations and reallocations. Also the synchronization cost can be lower in a
multithreaded context. The cost might be more memory allocated in some cases
(but less in others, and less fragmentation in your memory allocator's pool).

~~~
JoeAltmaier
An array would work, most of the time. The rest of the time, it'd be
reallocating to extend the array, then copying. Which also has undesirable
performance at unexpected times. Don't see it as any better.

~~~
robmccoll
That's almost true under the assumption that you are using the most elementary
implementation possible with only a single array (insert at end of array,
reallocate as needed to increase array length, copy elements down during
dequeue, etc.) - you would still be avoiding the garbage collector relocating
your array elements and generally get better performance due to nice alignment
within cache lines in memory. With a more advanced implementation using
multiple arrays, you can do even better (basically a blocked linked list), and
you can cache the blocks as you dequeue and free up blocks to avoid hitting
the memory allocator more than necessary. If you want to get really crazy with
it, you can start doing blocks of multiple sizes that grow and shrink
dynamically with the usage behavior of the list, but at that point what you
have done is implement a memory allocator.
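The simplest single-array variant described above can be sketched as a ring buffer (Python for illustration, with a list standing in for a raw array; the class name is mine):

```python
# Array-backed ring-buffer queue: enqueue and dequeue are O(1) with no
# copying of elements; the only O(N) moment is the occasional
# grow-and-copy when the buffer fills, which amortizes away because
# capacity doubles each time.

class RingQueue:
    def __init__(self, capacity=8):
        self._buf = [None] * capacity
        self._head = 0   # index of the next item to dequeue
        self._size = 0

    def enqueue(self, item):
        if self._size == len(self._buf):
            self._grow()
        tail = (self._head + self._size) % len(self._buf)
        self._buf[tail] = item
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("dequeue from empty queue")
        item = self._buf[self._head]
        self._buf[self._head] = None  # drop the reference
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return item

    def _grow(self):
        # The reallocate-and-copy step: unroll the ring into a buffer of
        # double the capacity, starting from the current head.
        old, n = self._buf, len(self._buf)
        self._buf = [old[(self._head + i) % n] for i in range(n)] + [None] * n
        self._head = 0
```

The blocked-linked-list refinement mentioned above replaces the single grow step with chaining fixed-size blocks, trading the copy for slightly more pointer chasing at block boundaries.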

------
ZanyProgrammer
Until recently De Anza College had _C_ as their introductory programming
language (and even now, their C++ courses appear to be nothing more than
warmed-over C with slightly different syntax). My opinion is that that was a
horrible choice: students spend waaaaay too much time bogged down in pointers
and memory allocation, and less time thinking about algorithms and just
getting used to writing code.

C/C++ are excellent languages to learn, but dear god, at an upper division
level.

------
Morgawr
From the HN posting guidelines:

 _please use the original title, unless it is misleading or linkbait._

I'd say this title should be changed to the original, as it is now it smells
like clickbait. Or at least its purpose seems to draw unnecessary attention.

------
rtpg
This is about an implementation issue, not really a language issue. Like
saying a bug in GHC proves we need to think about types.

That said, it's an interesting breakdown of the bug, though I wonder how they
arrived at it (wouldn't have minded more steps in the discovery phase).

~~~
jblow
It is only an implementation issue if it is possible to solve the issue.

Nobody has ever built a garbage collector that does not slow your program down
or cause it to use vastly more resources than it would otherwise. (Claims to
the contrary are always implicitly caveated).

Given that this is the case, it really does start looking like a language
issue. Yes you can rearchitect the GC to care more about locality, but you are
just pushing the dust around on the floor: you will find a different problem.

~~~
rtpg
Your comment isn't wrong in itself. For any GC you can write some program that
exploits its weaknesses

But you could deal with this bug in various different ways. For example, here
you could implement Erlang queues so that the length is stored instead of
recomputed. You could implement hibernation differently. You could implement
popping differently. There are probably better ideas from people who know
more about Erlang than I do.

Anyways, I don't think this particular issue was really an issue with the
concept of a GC'd language, more of a specific bug issue.

~~~
jblow
My point is that you can solve this one symptom but your program will have
many other problems due to GC (provided it does a lot of work). It is like
whack-a-mole in that there are always more moles.

------
steventhedev
Horrible link title: "They said high level languages relieved us from thinking
about memory issues"

High-level languages do relieve us from thinking about memory. The linker
lets you forget about program layout in memory and in the executable; C and
other "systems" languages free you from managing byte alignment for your
structures, byte order, and more; and very high-level languages even manage
your memory use with automatic garbage collection. But locality is a concern
in all of these languages, and there's no abstraction for that (yet).

The simple solution to the issue presented in the article itself is any of the
following:

1. Remove the call to len(). Handle the empty case from the drop/get/peek
call, which would appear to be the idiomatic style for Erlang.

2. Track the size of the queue as is suggested in the docs, to improve the
efficiency of this specific call.

Or, you could tackle the underlying issue, which is a lack of locality in the
list data type following a GC cycle. This would probably involve allocating a
chunk of nodes at once so it's the size of a cache line on the platform that's
running the VM. Seems to me like that would make a patch worthy of being
merged upstream rather than prefetching random chunks of memory.

EDIT: formatting

EDIT: After looking up the offending function:
[https://github.com/rabbitmq/rabbitmq-server/blob/master/src/...](https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_amqqueue_process.erl#L982)
it appears that they can't just remove the call to len, since it is part of
the API. So please ignore my first "solution".

~~~
lostcolony
Yeah, RabbitMQ opted for the second option because of that. BQ in that case is
whatever queue implementation was selected; if you look at both lqueue.erl and
priority_queue.erl, you see that they implemented len() by pulling a value
straight out of a tuple, with no further operations. As such, I'd assume
every queue type they support maintains its own internal length.

------
perdunov
Frankly, the problem described is not a problem at all, as it would be weird
to rely on cache hits to ensure performance of linked lists. Cache misses are
an expected drawback of linked lists, as they have other advantages.

~~~
ColinWright
I would think that if you were to ask people about a sudden and unexpected
9-fold drop in performance, they would say that it was a problem.

Certainly to me, never knowing when your system might slow down by three
orders of magnitude (base 2, nearly one order of magnitude base 10) _is_ a
problem.

~~~
perdunov
Yes, I agree that this is a peculiar case and it is worth reading and keeping
in mind that such things can happen.

I am just saying that I think it would be more correct to consider this as a
9-fold performance _increase_ gained from caching, as no one should rely on
caching when dealing with linked lists.

------
JoeAltmaier
Nobody ever said that. Memory and time are especially important for high-level
languages, because they spend them in an uncontrollable way inside said
languages.

For instance, the common approach of garbage collection is a huge topic in
every high-level language. Because it takes a relatively simple operation
(delete one item) and batches it up. So when it happens, it takes thousands or
millions of times as long AND inevitably happens when you can least afford it.

You never stop thinking about memory (and cpu time) in any environment.

~~~
davidw
> the common approach of garbage collection

Erlang actually doesn't take the common approach, which is part of the reason
it usually does "soft" real time pretty well. This is the first thing I found
in Google when searching for a description of how it works:

[http://prog21.dadgum.com/16.html](http://prog21.dadgum.com/16.html)

~~~
JoeAltmaier
Don't understand. That's a link to a description of the Erlang garbage
collector.

~~~
asabil
The Erlang garbage collector was designed explicitly to avoid unpredictable
long pauses.

------
SideburnsOfDoom
In my experience, if you use a List supplied by the standard library of your
language, it isn't a _linked_ list. In fact it's usually an "ArrayList" style.
(1)

This is so that length() and item = list[index] operations are not O(n), they
are O(1). You do have to be aware of the tradeoffs, e.g. that inserts into the
middle of the list can be slower (and YMMV with adding at the end. It depends
on if a realloc is triggered).

It's worth checking which implementation you actually are using, but linked
lists are the exception, for good reasons.

And when you have no choice but to walk the list, it's well known that instead
of writing this (in c# functional style pseudocode)

    
    
        if (someList.Count(item => someCond(item)) > 0)  doSomething(); 
    

... it's better to write

    
    
        if (someList.Any(item => someCond(item))) doSomething();
    

This is for much the same performance reasons - you don't need the exact count
when all you want to know is if there are any, and the count can become
expensive if the list is long.

1)

Java:
[http://docs.oracle.com/javase/7/docs/api/java/util/ArrayList...](http://docs.oracle.com/javase/7/docs/api/java/util/ArrayList.html)

c#: [https://msdn.microsoft.com/en-us/library/6sh2ey19%28v=vs.110...](https://msdn.microsoft.com/en-us/library/6sh2ey19%28v=vs.110%29.aspx)

python: [http://stackoverflow.com/questions/3917574/how-is-pythons-li...](http://stackoverflow.com/questions/3917574/how-is-pythons-list-implemented)

~~~
theseoafs
Erlang is a functional language; functional languages are different. They
really need proper linked lists as a built-in default "list" data structure.

~~~
SideburnsOfDoom
Right. Is that because functional languages prefer immutable data structures?

e.g. cdr/ tail / list.skip(1) are easier and simpler if you can just return
the first element's next pointer rather than copying an array.

~~~
theseoafs
Yes, that's the idea. ArrayLists don't work well at all if you want persistent
data structures as your language's default. Also arrays are less sensible for
pattern-matching. (Note that most functional languages offer normal stateful
arrays as an option, if they're necessary; interestingly, Erlang does not.)

------
toast0
I'm not sure why they're hibernating when they have a big queue. Usually one
would hibernate when idle: you've done all the work (the queue is empty), and
you haven't seen a request in a while.

But certainly, don't do length(List) if you need to see if it's empty; pattern
match against ([]) and ([Head | Tail]).
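For readers who don't write Erlang, the same habit translates directly to other languages: test emptiness structurally instead of computing a count. A Python analog (illustrative only; `drain` is a made-up name):

```python
# Analog of pattern matching ([]) vs ([Head | Tail]): test emptiness
# directly rather than asking for a length first.

def drain(queue):
    processed = []
    while queue:              # emptiness test, not len(queue) > 0
        head, *tail = queue   # roughly [Head | Tail]
        processed.append(head)
        queue = tail
    return processed
```

For Python's built-in lists both checks happen to be O(1), but the habit keeps you safe on structures, like Erlang's, where length is not.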

------
dschiptsov
But not of thinking about "access patterns". Most list operations are still
O(n), while array or table operations are O(1) and the like.

Btw, it is always suspicious when someone promises you less thinking. You
will end up with crap like UCS4 or std::list, etc.

------
kungfooguru
Another fun Erlang memory investigation:
[https://blog.heroku.com/archives/2013/11/7/logplex-down-the-...](https://blog.heroku.com/archives/2013/11/7/logplex-down-the-rabbit-hole)

------
jkot
They don't. In Java we fairly often dive into memory layouts and how bytecode
is JITed into machine code. But to be fair, Java is a low-level language at
some levels.

------
tbrownaw
This is why your basic data structures should come with complexity guarantees
as part of the API documentation.

------
MichaelGG
It's rather rich to say that length should be fast, then describe a pointer
chasing loop.

