Hacker News new | past | comments | ask | show | jobs | submit login
Stop Using Linked Lists (highscalability.com)
128 points by aespinoza on May 22, 2013 | hide | past | favorite | 137 comments

On a related note, cache alignment is one of the reasons Clojure's vectors (log32 insertion, access, and deletion) and linked lists (o(index) insertion, access, and deletion) have unintuitive performance characteristics. For instance, constructing lists of either type takes roughly the same amount of time for big lists:

  user=> (with-progress-reporting (bench (into '() (range 100000))))
               Execution time mean : 9.293932 ms
      Execution time std-deviation : 269.771284 µs
  user=> (with-progress-reporting (bench (into [] (range 100000))))
               Execution time mean : 9.882163 ms
      Execution time std-deviation : 359.744662 µs
And with smaller lists, building vectors is ~30-50% slower.

  user=> (with-progress-reporting (bench (into '() (range 24))))
               Execution time mean : 2.483705 µs
      Execution time std-deviation : 71.302962 ns

  user=> (with-progress-reporting (bench (into [] (range 24))))
               Execution time mean : 3.349080 µs
      Execution time std-deviation : 114.007930 ns
However, traversal is significantly faster for vectors, because you can pack 32 references into a cache line at a time. Here's a decent-sized list:

  user=> (let [x (apply list (range 100000))] (bench (reduce + x)))
               Execution time mean : 8.586510 ms
      Execution time std-deviation : 80.923357 µs
Vs a comparable vector:

  user=> (let [x (vec (range 100000))] (bench (reduce + x)))
               Execution time mean : 4.564553 ms
      Execution time std-deviation : 135.795328 µs
Traversing small lists:

  user=> (let [x (apply list (range 24))] (bench (reduce + x)))
               Execution time mean : 2.041794 µs
      Execution time std-deviation : 18.752533 ns
Vs small vectors:

  user=> (let [x (vec (range 24))] (bench (reduce + x)))
               Execution time mean : 1.051182 µs
      Execution time std-deviation : 10.413211 ns


Benchmarking on the JVM is tricky, beyond pure algorithmic efficiency (assuming all memory accesses are equal), there's cache effects, JVM heap and GC effects and the usual warmup/inlining concerns of benchmarking. On page 551 of the excellent scala staircase book (2nd edition[1]) there's a collection efficiency table w.r.t. head/tail/indexing/insert, but that's a jumping off point (sorry, I can't find an online version of the table).

[1] http://www.artima.com/shop/programming_in_scala_2ed

These benchmarks were taken with Criterium, which performs JIT warming, GC purge, final GC, retains results for each run, isolates outliers, etc.

I don't understand all the hate about the post, one of the referenced articles is actually very good and gives a quite compelling case (maybe it should have been submitted instead):


It shows that linked lists are slower than vectors for real-world-like scenarios even for those cases where the asymptotic complexity for linked lists is lower. Seems that that modern CPU architectures have changed so much that our theoretical models diverge further and further from reality, I think this is pretty interesting.

That's a much better, much more careful article. For example:

> Just in case my Mea Culpa and Introduction did not cover it. Let me be clear: This article is not disqualifying the linked-list for other types of usages, such as when it contains types that have pointers to other types, large and expensive-to-copy types (and yes, POD can also be too large).

The OP displays no such nuanced understanding.

I do not think our theoretical models diverge from reality; the theoretical models are based on how the performance relates to the size of the input, and remain valid even if you have a very fast computer. What modern architectures have done is to change the constant factors a bit; that just raises the threshold for problem sizes, but it does not eliminate it.

Sure, there are cases where asymptotic analysis can be dismissed. Matrix multiplication comes to mind: the fastest algorithms have impractically large constant factors. Usually, though, asymptotic analysis does matter, because it is often the case that input sizes will grow unexpectedly.

This has nothing to do with simply having a "fast computer", it would be all fine if we just used 486s clocked at 10Ghz, but we don't, the changes in the architecture of CPUs do indeed make the theoretical model more diverged from reality, because it presupposes a lot of things, like that every instruction takes the same time to execute, also implying that instruction execution times are independent, that one instruction gets executed at a time, etc. If those assumptions do not hold you cannot assume the factor is really constant, and even if it were so, you cannot dismiss this difference indefinitely on this ground, if on datasets of the size commonly encountered in practice the algorithms with a larger asymptotic complexity start outperforming the ones with a smaller one. For matrix multiplication this was known for years, for linked lists it is a relatively new development that came with larger and faster CPU caches etc.

Except that the theoretical model is not affected by any of the things you mentioned. The existence of caches, branch prediction, instruction reordering, parallel execution, etc. does not chance as problem sizes change. If your inputs become large enough, you will reach the upper limit of your CPU's ability to speed up execution with those features. In the limit, the asymptotics still matter, and experience has shown that the limit is not at all far-fetched in most cases.

What makes matrix multiplication an exceptional case is that the constant factor on the best known algorithm is so large that we do not know how to build a computer with enough memory to store a problem large enough to overcome that constant. That is not the case with this analysis of linked lists; all one can say is that the data sets chosen in that particular article (possibly representative of the most common data sets) are not big enough. One can certainly store a large enough data set to overcome the advantages arrays have, and so the only real question is, "Is it possible that the inputs will be so large?"

Maybe the answer is truly, "No, that is unlikely." I am skeptical, though, as there are not many cases where such statements can be made. Even software written for embedded systems that target specific products is likely to be repurposed for new systems with different inputs. Even researchers, who write software for the particular datasets sitting on their hard drives, often re-use their code in future work. There are "border" cases, like integer multiplication, but typically libraries will just select the best algorithm for a given input size (e.g. for multiplication, you'll probably only see FFT methods applied above a particular threshold). Perhaps linked lists are now a "border" case, but all that would mean is that we need to use abstract "sequence" operations that dynamically choose particular implementations to use as their sizes change.

I don't think he means big-O notation as a theoretical model... Processors are not the same as they were when they were single core machines. The entirety of computing and thinking about problem solving has become different than it was before 2005 because of multicore processor architecture. Projects like parallela make this obvious. Programmers in general don't understand how much this makes things that weren't a big deal 10 years ago a huge deal now. And stuff with threading is not even the tip of it. Threading is trivial compared to real parallelism. Our model of computation is different than it is now, and will continue to diverge until everybody starts thinking with parallel as a first priority. That's the most convincing argument the OP makes about why linked lists are bad. In parallel computing, they are definitely hard/bad.


Sometimes vague or implicit assumptions can make it unclear whether an algorithm is asymptotically optimal. For example, a lower bound theorem might assume a particular abstract machine model, as in the case of comparison sorts, or a particular organization of memory. By violating these assumptions, a new algorithm could potentially asymptotically outperform the lower bound and the "asymptotically optimal" algorithms.

See also:





As far as I know all asymptotic analysis has to be done using some abstract machine model.

Sure, but how do modern architectures not fit into the RASP machine model? You can view the cache contents as being part of the state (rather than the memory); you can similarly view instructions as being part of sequences, so that a single instruction can mean different things depending on the state from which it is fetched. Other modern features can be similarly approached (with the exception of CPUs whose behavior depends on environmental factors like temperature, but that is an edge case).

Really, if you doubt that the RASP model is appropriate for modern architectures, you can test it (a typical exercise in an algorithms course) -- see if, as the input size grows, the timing follows the asymptotic analysis. That is basically what the article you linked to does, and the results are not all that surprising -- where things are linear time in theory, they are linear time in practice; where they are quadratic time in theory, they are quadratic time in practice. It is worth pointing out that in all but the last example, the list and vector operations had the same complexity (because of the linear search), so it was really a comparison between constant factors.

Asymptotically, sure, you're right. Constant factors are often important in practice, and simple cost models (e.g. ones that don't model cache locality) will no longer give you a decent estimate of constant-factor differences in performance between algorithms.

I think the issue here is that, in the past, with shallower cache hierarchies, models that assumed a constant cost per memory access would maybe be off by smallish factor (I don't know, maybe 50%).

However, now memory access is frequently the limiting factor for an algorithm, and there can easily be an order of magnitude in variation between the average memory access latency for different algorithms (i.e. cache-smart versus cache-dumb).

It seems that often articles like this overstate their case to the point that it really detracts from the message.

There is a valid point, that a naive analysis of linked list vs. static array based on intro CS course descriptions of their properties isn't a good model for what is going on in a modern system.

The real lesson is: if you want to achieve high performance you simply must understand the impact of things like cache localization, vectorization, hardware prefetch, pipelining etc., and how your data structure operation will interact with them.

"Never use a linked list" is a silly lesson to take from this though. "In these situations, linked lists might not perform as well as you expect" is more like it.

"Use the right data structure for the job" is still as good advice as it ever was.

Right. Who is more likely to be able to write the more performant program:

* A developer who knows that an L3 cache miss and random main memory access typically takes about 100 ns, or

* A developer who learns from highscalability.com that he should "Stop Using Linked-Lists"

Exactly. Also, pointers are just an implementation detail of linked lists. You can implement an array based linked list and avoid "scribbling all over memory", which really is a valid concern raised in the article.

What exactly would an array based linked list be but either an array or an array deque? These things have names for a reason.

Instead of pointers, you would store an index in the array of nodes:

  struct node {
     void * data;
     uint32_t idx_next;
You are still dereferencing pointers, of course, but you have better locality, even after doing a lot of insertions and deletions.

It's even better with intrusive containers. As an extreme case, let's say we need a collection of 24 bit RGB triplets:

    struct node {
        uint8_t r, g, b;
        uint8_t idx_next;
Of course, a contiguous array uint8_t[256][3] might still be faster.

This is a great optimization technique to have up your sleeve. The downside of course is that if you're dynamically allocating and freeing nodes you can end up having to write your own memory allocator.

If you know in advance, say, that you will never have more than 2^(multiple of 8) items in the list, you can use a index smaller than the pointer size.

traditional linked list:

{ l * next; x * data; } or { l * next; x data; }

array-based linked list would be an array where each element also has the index (or pointer) of the next element.

{ int next; x data; } [1000];

Now all your elements are packed together in a memory space that you can control.

I think of (array) deque as something that allows me to operate on the front and back, not necessarily traverse and insert/delete in the middle.

This strikes me as heap management more than a 'linked list implemented as an array'. The heap itself is, after all, equivalent to a giant array.

But it doesn't get you the same locality as an array at any rate. The idea is that, assuming your elements are smaller than a cache line and your array as a whole is larger, what you're avoiding with an array is N/M cache misses while you traverse rather than potentially (or even probably, depending on various other factors) having a cache miss on every nextItem(). You'll also miss more often because of the expansion of the elements.

What stupid advice. Linked lists (and trees formed from lists) are a fundamental functional data structure. They have tremendous expressive advantages in functional code where operating on the head and the rest of the list leads to clear expression of an algorithm, and easy sharing of sub-parts makes certain other algorithms much more elegant.

An example is maintaining the lexical environment as you're compiling a programming language. The first response might be to use something like a hash table. But then how do you handle shadowing (where an inner block shadows a variable of the same name in an other block)? A much cleaner way is to use an association list that's implicitly maintained as you recurse over the AST.

E.g. in psuedo-Python:

    def parse_let(form, environment):
        (name, value) = parse_declaration(declaration_part(form))
        return parse_body(body_part(form), 
            acons(name, value, environment))
For those unfamiliar with Lisp, "let" introduces a new name bound to the value of an initializing expression, which is only in scope for the body of the "let" construct. Assume here that we're generating code from an AST and that the return value of a parse function is the register number where you can find the result of the expression, and the lexical environment maintains a mapping from a variable name to the register where that variable can be found. Or we could be creating an intermediate representation and the lexical environment maintains a mapping from a name to an IR node. Or whatever.

Here, "environment" is the association list. Assume that "acons" is a Python function that adds a pair of (key, value) to the front of a linked list. Note how the list is never explicitly mutated, it is maintained implicitly by passing a new value for "environment" as you recurse down. The beauty of this is that entries are removed from the lexical environment implicitly as parsing functions return, and also that "environment" is a purely functional data structure. You can stash away "environment" in say an IR node and it will always refer to a snapshot of the lexical environment for a given AST node, even after other nodes are parsed or the current function returns. This is non-trivial with say a hash table. Storing a copy at each IR node would eat memory. With a linked list, sharing of sub parts falls out for free.

Also, look at the Linux kernel sometime. It uses linked lists all over the place. Your malloc() implementation likely keeps a set of segregated free lists maintained as linked lists. They do this because they avoid ever traversing the list completely. They just operate on the head/tail.

Consider that the source is High Scalability, a blog that focuses on how orthodox methods are inadequate for 1-10M concurrent connections and how different methods are being employed to reach these lofty goals, I think the advice is pretty spot-on.

It's important to consider that if your objective is scalability and performance, then the article's advice is appropriate. The article's title ought to read "Stop Using Linked Lists if you care about High Scalability", but I attach the dependent clause on many titles I read from that site.

I'm sure the Linux kernel developers care about scalability and performance. Count how many linked lists you see in the kernel code. See: http://lxr.free-electrons.com/ident?i=INIT_LIST_HEAD.

I'm sure you will find strlen and linear searches too. It's more likely that absolute performance wasn't necessary for those cases, or memory usage with linked lists is better than a more complex data structure, or maybe even that linked lists were easiest for C developers.

Linked lists are used pervasively in performance critical parts of the code (e.g. the scheduler, the VM, etc). Linked lists just happen to have very suitable performance characteristics for the kind of tasks that happen often in a kernel. E.g. say you keep queue of IO buffers that have pending operations on them. You get an interrupt and the driver gives you back a pointer to the IO buffer it just filled. You want to copy that data out, and them remove the IO buffer from the pending queue and add it to a free queue. In this case, you'd almost certainly rather use an (intrusive) linked list rather than maintaining these queues as arrays.

They're pretty good at using the right tool for the right job. There are plenty of places where they use arrays or other data structures too.

In a lot of cases they really want to be able to do O(1) inserts/deletes at the beginning/end of the list, which is where linked lists do have a major advantage.

Short linked lists aren't too bad, using them for storing bulk data is mostly a bad idea unless you can't avoid it.

Citing kernel code is not a good argument without actually understanding why they're using linked lists, when they are used and how many of them are used at a time, etc, etc. Most of those linked lists are not really relevant. Because (1) they are far too few of them (2) they do not occur in areas where performance is significantly impacted. Besides which.. the kernel is a single instance program. Its not like there are 1,000 different kernels running simultaneously.

So.. if you want to write a web server/service that can serve tens of thousands of requests - Write it both ways and see for yourself. Its obvious that linked lists are a performance drag. Although they have much in common with other forced indirection penalties (virtual functions, etc)

Edit: ah, found a nice post that has analyzed just this.


Look at the performance profile here. Using linked lists is the right choice because of the way they are used...

As I've said elsewhere, there are linked lists buried in key operations of the kernel. E.g. the O(1) scheduler's core data structure was an array of linked lists (www.cis.ksu.edu/~gud/docs/ppt/scheduler.pdf‎). There was one list per priority level, and tasks were queued/removed from the lists at each scheduler tick. The current scheduler (CFS) uses an RB-tree. Virtual memory areas are tracked using vm_area_struct structures organized in a sorted linked list (http://lxr.free-electrons.com/source/include/linux/mm_types....). They're used in the futex implementation (http://lxr.free-electrons.com/source/kernel/futex.c#L467, the kernel's internal memory allocators (http://lxr.free-electrons.com/source/mm/slab.c), and extensively in the network stack.

Those links are of little value to the current discussion. If you took the performance profile, you'd find that the primary reason linked lists are used in those scenarios is because the performance worst case for linked lists occurs rarely. Which means the choice of datastructure is irrelavant w.r.t performance in those cases.

First of all, there's the issue of asymptotic complexity.

Usage of various data structures determines the asymptotic complexity of whatever you're doing. However, the thing that some uneducated people don't understand about asymptotic complexity is that this metric doesn't measure performance, but rather growth.

Usage of a linked list may mean that searches in it will always be O(n). However, depending on the case, this may be worse than logarithmic or O(1) complexity only for large-enough values of N, because for small values the constant factor plays a role too. Say for instance that you're searching for some value in a linked list. If you're talking about 100 items tops that's being traversed, that's probably going to be faster than a recursive function without TCO searching in a tree.

Priceless is the moment you realise that thinking in terms of asymptotic complexity is the most important thing you could ever do for performance.

Because it's a rather stupid thing to worry about things like cache locality if you don't first optimize the algorithms used. Because, for example, a quick-sort is going to be more efficient for most cases than a bubble sort and a bubble sort is going to be more efficient than a quick-sort for nearly sorted lists, with all the branch predictions or cache locality you could ever pull.

For a real world example, think of databases like MySQL. Performance on inserts in most databases, such as MySQL, deteriorates at an exponential rate, even though most of them are written in hard-core C with all CPU optimizations thrown at it that you can think of. This means that at scale, in one moment your database server is running fine, but in the next moment your server is gone. By comparisson, with a database where inserts degrade linearly, you can notice problems with months in advance.

All one can accomplish with CPU or GPU optimizations is improving the constant factor. This constant factor can be significant indeed, but at large scale it pales in comparison with the speed benefits you get from proper algorithms.

Going further, after you get your algorithms right, which is much easier to do with clean code that uses the right data-structures, you can then easily optimize the underlying implementation of those data-structures. For lists, for the interface of "push()" and "pop()" or of "queue()" and "dequeue()", you can use arrays instead, or linked lists where the items are arrays, or balanced binary search trees, or freaking Patricia tries, or whatever floats your boat, as long as you can maintain the FIFO or LIFO contract.

So that's why the advice is stupid. Because it's not putting things into context.

In fact, I would tell people - try not to use linked lists, because the notion of head and tail is an imperative concept that leaked into the functional world. Which really means it's not a future-proof concept and you'll have problems optimizing it, because you're still thinking in terms of how, versus what.

>In fact, I would tell people - try not to use linked lists, because the notion of head and tail is an imperative concept that leaked into the functional world.

Citation needed. Haskell's most basic data structure is a singly-linked list. The concept of a linked list seems to fit quite well with functional programming, as adding an additional element is simply a matter of making an element that points back to the unchanged previous element. And, linked lists work very well with recursion, as you can process the head, then move on to the rest of the list.

Are you aware of a functional datatype that can store an arbitrary amount of information and is simpler than a linked-list?

I got this idea from Guy Steele's keynote at Scala Days, named "What Fortress and Scala can learn from each other". For reference, Guy Steele worked on multiple programming languages, including Scheme and he can explain it better than me:


The gist of the matter is this - lists preserve insertion order (instead of the elements themselves having a natural order) and can only be accessed sequentially. You cannot easily split lists in two, or multiple chunks, which means we'll have a hard time doing automatic parallelizing of computations performed on linked lists.

Think of this simple expression:

    someList.filter(_ % 2 == 0).map(_ + 10)
If the underlying data-structure would be a balanced binary tree, you could easily split the work required in multiple threads. Because you're specifying the "what" instead of "how" and you can let the platform choose the best solution, like splitting it in threads or offloading that to a GPU, right? Well, linked list have the "how" encoded in them.

Maybe it helps to think of data-structures for what they really are: frozen algorithms.

Guy Steele is working on some pretty cool stuff in Fortress. Check that video out.

That makes sense. Still, I don't think its fair to say that linked lists are an imperative concept. While they lack one of the most mentioned advantages of functional programming, they still feel 'natural' in functional programming, not like they are borrowed from imperative.

Furthermore, I am not aware of another functional data structure which has O(1) append time, which allows us to construct lists element by element easily.

You can have amortized O(1) with trees for example. Appending elements to the end of Vectors in Scala or Clojure is done in amortized O(1), being a tree with a high branching factor. And I'm not sure how the immutable Queue is implemented in Scala, but it's also O(1) for queue() and dequeue() and it can't be a list.

It's true though, Lists are easy to implement, easy to reason about and speaking of frozen algorithms, you can view a linked list as being a frozen tail-recursion :-)

Linked lists are a good fit for functional languages, and arrays aren't a great fit: this can be a major obstacle to writing Haskell code that is both idiomatic and efficient.

I agree about lists being much more expressive in a lot of functional code, and sometimes compilers can optimize away some of the overhead. On the other hand, take a language like Haskell: It has Data.ByteString and Data.Text, which are array-backed string types that exist (almost) solely because String (i.e. [Char]) is inefficient unless you're doing a lot of manipulation such as insertions and deletions in the middle of the string (in which case you might want to use some form of Builder instead, anyway.)

The advice should probably be "use the right tool for the job." I wouldn't dream of writing functional programs without lists, but I also don't use e.g. the String type if I don't actually need it to be a list. Lists and arrays have very different algorithmic complexities for different operations. Figure out which operations you need, and pick the data structure that best suits what you intend to do.

Yeah. There’s a finer point too: in Haskell, linked lists are useful less as data structures but more as iterators, because of their relatively large size overhead per element. A large proportion of list-based algorithms can be made streaming thanks to laziness and the immense cleverness of GHC, in which case only a constant number of allocations actually occur—often zero.

> String (i.e. [Char]) is inefficient unless you're doing a lot of manipulation such as insertions and deletions in the middle of the string

Is String actually efficient for this use case? If you insert in the middle of a String, wouldn't it have to copy all of the preceding characters? Only the tail could be shared, I would think.

Yeah, it's not very efficient. I mainly use lists as stacks, or when I'm filtering some collection and I just need to traverse it.

In general I've found that people who have spent their whole careers writing just one kind of code tend to overgeneralize the lessons from that experience. If you spend all your time doing numerical physics simulations, indeed you probably don't see many occasions when a linked list would be a good choice. A compiler, though, is, by its nature, going to be crawling with them (as rayiner points out). It wouldn't make any sense to run a compiler on a SIMD GPU anyway -- it's far too full of conditional branches.

I appreciate your post, but I wish you would've addressed his examples of when linked lists are less efficient instead of responding to the link bait title and calling it stupid advice. Your post was informative in its own right, though, so thanks for that.

The title wasn't link bait. The article said: never used linked lists ever. That's stupid advice.

His examples aren't bad. But they're focused on traversal. Of course use an array when you're mostly traversing linearly. But linked lists are very versatile. Deep down in your OS, the kernel is probably not representing IO buffers as linked lists. But, it's almost certainly storing free IO buffers in a linked list, or using lists to track threads waiting on a lock, etc.

Also, the implication you should ignore algorithmic complexity is questionable. I'd rather be slower by a constant factor than sometimes suffer catastrophic performance when a workload causes the asymptotic complexity to dominate the constant factors. I remember debugging an algorithm that worked fine on some developer's test machine with a few nodes, but exploded on a load with many nodes. It had factorial complexity...

The article should be viewed in the context of High Scalability. I think the author is more saying that you should test which works faster, rather than solely relying on algorithmic complexity.

Maybe I have a different idea about what's important for highly scalable architectures, because in my mind ignoring algorithmic complexity theory and adhering to mantra's like 'stop using linked lists' is one of the best ways to end up with a solution that doesn't scale at all.

Err, no, he's not saying that.

There's nothing to address. Your introductory data structures and algorithms class taught you when lists are good and when they are bad. That isn't controversial. The problem is the claim that you should never use lists, which is absurd.

The advice is not against lists, it's against linked lists which are a particular implementation. You can have dynamic-sized lists that are backed by arrays, yet have O(1) amortized time for operations that require the list to grow or allow it to return freed space (add/remove at either end). Linked lists will only win when you need to add or remove elements in random positions (AND don't need random access, so this is usually very niche scenarios where you iterate the list end-to-end but make some updates, e.g. to insert new elements in order). Trees made up of list nodes are another obvious exception, but then we're not arguing about trees, just pure lists. (And even then... denser kinds of trees such as heaps or b-trees can often wipe the floor with traditional trees made of tiny, linked nodes.)

> You can have dynamic-sized lists that are backed by arrays

Lists backed by arrays have the same problem as arrays: pointers to elements are not stable. To take the example in my post, say you want to save the environment of every AST node so you can generate debug info mapping from variable names to register numbers at each line. When the list nodes are stable, it's trivial to just save a pointer to the head of the list. When addition/removal operations on the list can cause reallocation, that becomes much harder.

> Linked lists will only win when you need to add or remove elements in random positions (AND don't need random access, so this is usually very niche scenarios where you iterate the list end-to-end but make some updates, e.g. to insert new elements in order).

This is not a niche use. There are a huge number of operations where you don't traverse the list but need to add/remove from the middle. Consider something like the task scheduler in a kernel. When a process makes a call that blocks, the scheduler gets a pointer to the task structure and needs to remove the process from the run queue until the data it is waiting for arrives. If you store the runque as an array, you need to search it and remove the relevant task. If you store it as a doubly linked list, you can remove it with just a couple of pointer modifications. Indeed, inside a kernel or memory allocator, if you're iterating over any potentially large sequences, you've already lost the battle.

> pointers to elements are not stable

This is only a problem if the list is constituted exclusively by its backing store; which is a common implementation for linked lists in some langs/libraries, but rarely for lists backed by arrays. In the latter case, the "list object" typically has a pointer to the backing array, and also other fields like indexes of first/last element in use. This means one extra level of indirection for any use of the list, but in practice compiler optimizations easily hoist or constant-propagate this overhead away in any code where it matters.

> There are a huge number of operations where you don't traverse the list but need to add/remove from the middle

Admittedly, my use of "niche" is context-dependent. In languages like Java where List is a kind of catch-all data structure -- it's the collection that people use when they don't have a very good reason for any other option -- the huge majority of uses do not involve updates in non-tail position (or even any updates after the initial population; most of the time a fixed-size array would work just right... except that it's against modern Java religi, er, style, to ever use its primitive arrays).

Actually lists of arrays are still linked-lists, with the definition being: a data-structure where nodes are grouped together, with each node being made of a datum and a reference to the next item, where you can push() and pop() from one or both ends in O(1) and that can only be accessed sequentially in O(n).

It really doesn't matter what the reference actually is, all that matters is that the information for accessing the next node is contained within the current node.

Having optimizations like XOR-ing the back/forth pointers for encoding a doubly-linked list with a single word per node, instead of two, or linking together arrays of fixed size, that's just an implementation detail.

What I meant is a list where all elements are stored in a single, dense array, which grows automatically if necessary for adding new elements. See C++'s std::vector, and Java's java.util.ArrayList.

Your first sentence was enough for me to up-vote you, without reading the rest of the comment.

Also, your whole explanation is unnecessary, when this advice is analogous to ... stop using stacks!

I would add queues here too, though people might mistake that for priority queues, which can be modelled as trees.

I remember being asked once in a job interview what I thought about heaps. I confessed that I didn't recall the specifics of all the different variants of that data structure as I hadn't had the occasion to use one before. https://en.wikipedia.org/wiki/Heap_(data_structure)#Variants

It turned out he was asking about dynamic memory allocation, i.e., malloc/free and new/delete.

(I didn't get the job)

Hehe, it happens.

Speaking of memory allocation, the stack is named that way because it's an actual stack and most function calls receive parameters through it and return their results through it. This is why it's cheaper to store things on the stack, because if you want to use a value from memory, it's going to end up on that stack anyway (e.g. boxing / unboxing), not to mention that items get allocated and deallocated in LIFO order, so you've got no issues with searching for available space or fragmentation.

There's been some discussion on HN in the last few days about negativity. If there's one thing that inspires negativity, it's hearing categorical statements like "ALWAYS do X," or "Y is NEVER true" when the listener has specific experiences that contradict this.

We all come at computing from different perspectives. The perspective of a JS developer is very different from the perspective of an OS developer. "Rules of thumb" that make sense in one scenario may be completely wrong in another. Different programmers are faced with different constraints, different performance profiles, and different relative costs (which can lead to different tradeoffs).

If you're tempted to make a categorical statement, maybe it's better to first consider whether your statement is as universal as you think it is.

This is not good advice if you need your list to have unbounded size and want it to be lock-free. Growing an array and copying all the elements over is very expensive in a thread-safe scenario. If you use a linked list, however, you can implement all the operations without using any locks at all.

(Linked lists are what we use for our channels in Rust, and as a result they're extremely fast: in the new scheduler they're totally lock-free except if the task is sleeping, which we can optimize to be lock-free later. They have unlimited size, which helps prevent deadlocks.)

Very cool. What kind of lockfree linked lists? Did you work from one of the papers on the subject?

(Searching "rust lockfree" reveals a feature request for a lockfree malloc. I happen to have one of those, but the reality is that synchronization will probably not be your bottleneck.)


Use linked-lists in situations where you need fast insertion and lookup time isn't as important. Don't use linked-lists when lookup time is important. Don't make fallacious claims supported with misguided and incomplete examples.


I don't think it's like this, it's true that the article is a bit harsh, but it contains a lot of references that support the claim.

Additionally I also came to the conclusion that plain linked list are virtually always slower in practice because insertions in vector is amortized constant time, or because you can use better more local structures like deques, or because hash tables are always an option, and so on. Also check Soustrup's vector vs list slide in the presentation by linked by chmike's in this thread it's pretty demonstrative.

Stop Using Linked-Lists (in areas where linked-lists are the wrong structure to use)

Edit: I should note this article's advice is really good advice if your focus is performance.

More: "Linked lists are much, much slower than you think they are and should not be used in performance-sensitive code." That fact isn't remotely obvious to most hackers, and simply brushing it aside as an area where linked lists are "wrong" is missing important details.

Even in performance sensitive code, linked lists might be the right way to go. If you need to store an unknown amount of data, a resizable array probably does amoratize to a better performance than linked lists. But, all of the 'slowness' happens at the same time, so it might be worth slowing down the average case to avoid the worst case. The most notable examples I can think of are video games where FPS is king, and kernels, where you always want to exit quickly.

Linked lists can also work better in limited memory environments because, with the overhead of 1 pointer per element, you can make use of fragmented memory.

Or something between the two: "Unrolled linked list": In computer programming, an unrolled linked list is a variation on the linked list which stores multiple elements in each node. https://en.wikipedia.org/wiki/Unrolled_linked_list

What the article misses is that asymptotic analysis does matter. Yes, you can find specific cases where arrays are faster even when you are doing a lot of insertions and deletions. The problem size often grows, and often grows without proportionate increases in cache sizes and the other factors mentioned in the article. I would much rather have an algorithm that scales well than one that performs well for the specific problem size I was thinking about when I wrote the code.

But perhaps you should be counting the horrifically expensive operations (cache misses) in your asymptotic analysis, not the virtually-free operations (writes).

Only if those operations become more expensive as the problem size grows. The point of asymptotic analysis is the relationship between problem size and expense, not the expense of individual operations.

To put it another way, imagine an strange implementation of quicksort that caused a cache miss on every comparison and every swap, and an implementation of bubble sort that hardly ever caused a cache miss. Which would you use in your code?

Hm, glib answers aside, you're right. I think the real point is that counting writes makes no sense because it is both not the most expensive and not the most frequent operation involved. Really, the whole thing comes from looking at lists wrong. Find-and-delete is not O(1), and converting half of an O(n) operation to O(1) doesn't actually make an asymptotic difference. If you are genuinely just performing inserts and removals of known elements things will look different.

In a real-world app that was actually subject to performance requirements? If the bubble sort meets the goal where quick sort doesn't, hell yes that's what'll get picked.

Not all optimization is premature optimization. Some apps care. And the point of this article is that some operations which people are trained to think of as "fast" (like the "O(1)" pointer operations in a list traversal) actually aren't.

I suspect that the real-world app using bubble sort would be fine...until the day it was not fine. By that point, you'll have a bunch of people who depend on the software, while you scramble to figure out why things became so slow.

Constant factors should not be ignored, but neither should scalability. That is the reason that production-grade sorting algorithms switch from quicksort to bubblesort/selection sort when the sequence is short enough. That is also the reason we typically use quicksort rather than heapsort -- quicksort usually has a lower constant factor on modern architectures, which is what matters when the asymptotics are the same.

The problem with focusing on the performance of your system for particular input sizes is that you are usually not guaranteed that the input size will not increase. It is not premature optimization if you have a good reason to believe that the input size will not grow, or that it will not grow enough to matter. Such is the case with matrix multiplication. If you have no evidence of that, though, you are almost certainly better off choosing the asymptotically better algorithm first, and coming back to tune that / switch to difference algorithms on smaller inputs / etc. later on.

Mmm...how do you represent your free memory block lists in your memory allocators for easy coalescing and whatnot?

Just curious.

Good grief. Why the sarcasm? Again I'm seeing a deliberate attempt to interpret an insane absolutist stance from what is actually (if you read it and not the headling) a well-considered argument. Linked list traversal can be slow, and they can be slow in ways that (again, if you read the article) you probably find surprising. Does that mean that some special applications (like malloc) can't find a user for them? Of course not.

But to respond to the technical point: the malloc case is very special. Free memory is by definition unused and probably cache-cold, so there's little value in improving the locality of reference. And it's "already memory", so there is no value in trying to put it into a "container" when you can just drop the link pointers right into the buffer. So your'e right, but in a specious way that is still missing the point of the linked article.

This! Linked lists / trees are great in applicable use cases, I think the author or the article just being an attention whore... to be fair I haven't actually needed to use binary trees or linked lists in ages...

The link to the article about "Starcraft crashes because it uses linked lists" seems underhanded. That article talks about intrusive lists and why they are useful, and what the down sides are. The actual crash seems to be related not to the use of a linked list at all, but just lack of shared data synchonisation which can happen with many other data structures.

True story. I was working for a property/casualty insurance company in South Carolina. There was a local community college that had programming, and most of the "programmer analysts" at this place went there and took the intro to C course, which included writing a doubly linked list. It also had the notion that not writing your own code was "cheating."

The application I was working on had something like 500 separate implementations of a doubly linked list, each of which was used to support exactly one collection, and each of which involved hours of coding and debugging. The company didn't care, as client companies were billed by the hour. One of the client companies had programmers who were horrified at this and their programmers introduced an adaptable linked-list library called "SuperLink." It was accepted as a "modification," incorporated into just the one implementation, then forgotten.

At page 45 of this presentation of Soustrup, [http://ecn.channel9.msdn.com/events/GoingNative12/GN12Cpp11S...], he shows benhmark comparison between linked list, preallocated list and array for insertion and deletion of small element values in list up to 500 000 elemnts.

Arrays are beating linked list.

That example involves inserting (and deleting) elements at a specific index, which means you would have to traverse (on average) half of the linked-list each time.

Generally, when I work with linked-lists (excluding functional programming) I only delete elements after I already have a pointer to them for something else. Similarly, I generally do not care about the order of elements, so I can either insert a new element at whatever index my cursor happens to be at, or append them to the end.

These are specific use case. Indeed in some cases lists are more efficient than arrays. The advice to use arrays instead of linked list is a rule thumb. Not all programmers are still sucking their thumb. ;)

I think a lot of people are missing the point, and that's probably at least in part because of the bold claim of the title. The point is that some of the (very most common) conventional wisdom about the appropriateness of linked list has become outdated and wrong in the age of very fast local cache.

It really does mean that in a lot of cases that, say, Knuth's books would have suggested you use a linked list, you really probably shouldn't any more, even if it doesn't really mean you never should.

From the perspective of computer architecture, the author provides what look like -- on their face -- good arguments. They quote from Aeter Suleman, who says things like: "they throw off hardware prefetching", or "they reduce the benefit of out-of-order execution".

These may have been true on machines of the past, but these are no longer true on modern systems. With the advent of trace caches and runahead execution [1], linked lists are really no longer as painful as they once were. (Indeed, even back in the day, Alpha had low-cost "explicit" runahead-like semantics, where the programmer could specify other work to do while waiting for DRAM; this was usable to accelerate linked-list traversal.)

[1] http://users.ece.cmu.edu/~omutlu/pub/mutlu_hpca03.pdf (disclosure: I worked closely with Onur at one point; his Ph.D thesis introduced runahead. I may be excessively biased in favor of that technique :-))

The correct way to do linked lists is to store the list traversal fields next to the data. In C, this would mean storing next/prev pointers inside of the structs which will be placed on lists.

In light of this, the things in the OP are often non-issues because you'll need the data in cache immediately after the list operation anyway (or during the traversal, for O(n) operations like list_find). In fact, vectors of pointers are worse for the hardware because you'll need to load in more cache lines than with lists, in order to traverse the array.

Lists aren't clearly the better option when the data will need to live exactly as long as the data exists in the container. In this case, you can store the data itself in a vector's backing array (and so the data will be invalid as soon as it's removed from the vector).

Who the heck puts the payload into a separate allocation?

I'm pretty sure I've never seen such an implementation. Not once. I could imagine seeing something like that in textbooks where they use graphical diagrams to illustrate how a linked list works but who would actually implement it like that -- I don't know.

The canonical way is to do:

  struct listnode {
    struct listnode *next;
    struct listnode *prev;

  struct your_own_data_node {
    struct listnode node;
    int x, y, z;
which ensures that you can have a set of functions that operate on struct listnode * and you can use them for all of your lists.

I wouldn't call that canonical. You can take a further step, as in the Linux version that someone posted below:


Anecdotal evidence to back up your claim: Mergesorting an intrusive linked list can outperform quick or heap sorting an array of pointers.

That's a very interesting refinement. Dangit, now I've got to go and code it up and profile it. Unless you've done this legwork?

What he describes is how the Linux kernel implements linked lists: http://kernelnewbies.org/FAQ/LinkedLists

In performance-critical code, this is well known. Also, since the 64-bit days, storing linked lists also has a significant overhead over arrays due to the additional storage requirements of the pointers, which means you fit less in cache which compounds the problem even more.

The article( and its referenced article) recommends against using linked lists because they are inefficient on modern processors (due to cache misses).

However, there is a better solution than throwing away a powerful and expressive data structure. Rather than linking individual data elements, instead link blocks of consecutive elements.

This hybrid approach takes full advantage of cache locality and it minimizes the memory overhead of storing both the links and the data.

This is the approach used in Python's implementation of deques: http://hg.python.org/cpython/file/85c04fdaa404/Modules/_coll...

This is awful advice which is based on a complete misunderstanding of the use cases of linked lists. Linked lists apply in situations when you can't place things in arrays (because they are too large to copy, or because there will be pointers maintained to them).

The author makes much ado about locality of reference and cpu friendly layout without understanding that those things are irrelevant because this is a data structure to use when indirection is required.

It always amazes me that such simple data structures can be so poorly understood.

I feel like this could have perhaps mentioned the cases where one should use linked lists.

Is a simple queue still implemented as a linked list? That is all I ever seem to use them for.

A queue would be an excellent time to use a linked list.

I don't understand the point of making a claim like this about a data-structure. The most fundamental thing in data-structures is that there is a time and a place for using each one. No data-structure is inherently 'better' than the others.

Based on the disruptor paper (http://disruptor.googlecode.com/files/Disruptor-1.0.pdf) linked lists make poor queues also. Ring buffers backed by arrays are much better.

Very interesting, but I feel there is a difference of need here.

You're right...there a Ring buffer with an array is better, but for average programmers a linked-list makes a damn good (maintainable, easy, understandable, and pretty quick) queue.

Mmmmmeehhh. Profile it first.

This is pretty reasonable advice. There was a time when linked-lists didn't have such a massive performance disadvantage compared with more contiguous data structures, but that time has passed and I'm not sure that the programming community is fully aware of it (certainly you wouldn't be explicitly told it in most CS programs). Memory efficiency is also often terrible on 64-bit machines, especially for doubly-linked lists.

Sometimes they're the right data structure, but I've definitely come across programmers who want to use a linked list for everything, even in code where performance is important.

Edit: the general advice that you should avoid linked lists for performance reason is good. The idea that you should never use them I just took as additional trolling for page views.

I was recently working some data structures in C++ where I needed to have tree-based structures and sustained performance. The compromise that I hit on was to allocate all of the nodes of a tree out of an std::vector. This allowed me the flexibility of tree-based structures together with all of the locality of reference that I needed. As the tree grew, the vector would resize and move, but the amortized average cost of that is directly proportional to the size of my data structure.

I offer this in case a similar compromise might be useful for someone else. (My guess is that it is probably standard for people who need to know this sort of stuff. I'm just not usually someone who needs to know this sort of stuff.)

This is used a lot for things like acceleration structures in 3D graphics - you also get the benefit of being able to store an uint32_t offset instead of a 64-bit pointer, fitting a bigger tree in memory/cache.

I used that trick as well. :-)

How big of a tree was it and how big were the nodes? I don't see how you get any locality of reference for a large enough tree, and you certainly don't get any of the hardware prefetch. To get any benefit out of this I think you need multiple nodes to fit on a cache line and for tree traversal to hit adjacent nodes in the vector (but how could you guarantee that?)

I had several related data structures, and the details varied.

But for the most part 5-1000 nodes per tree, with a node size of 12-32 bytes. And in my use case I'd be walking through a vector with thousands to hundreds of thousands of things, having to do operations involving lookups into 2-3 of these trees at a time.

The trick would not make sense for large trees. But it worked out very well for me.

I've been working on an application for a card game and I have been considering the implications of using a linked list for the deck structure. I just don't see how using an array would make sense for a deck of cards. I need the ability to constantly grow and shrink the deck (this is for Yugioh so cards would get put back into the deck often). I did consider using an array but it seemed to be more trouble than its worth considering the need to remove cards from random locations in the deck. My question is then what kind of data structure is optimal for this then? Are arrays still the better choice or is there another data structure I don't know about that is optimal for card decks?

"optimal for card decks"

When n is human-typical card deck size, the optimal solution is whatever minimizes some balance of development time and debugging time. CPU and coding efficiency will never enter as a limitation.

The absolute dirt simplest way to test your numerous card manipulation algorithms might be two arrays (or plain text files?) and your algos copy from one array into the other.

In Yugioh isn't there some inherent (however ridiculously large) limit to the possible number of cards in a deck?

Just use linked lists. If it turns out to be a performance problem (my guess: it won't) you can change the data structure later.

First implement this using a linked list; then an array; and then, if you have some time, see if a random access list improves on performance:


If you use a List in .NET, you get a wrapper for an array that will expand as needed (doubling in size every so often). The reasoning is quite similar to the reasons given in this article.

System.Collections.Generic.List<T> is similar to std::vector, but oddly named. System.Collections.Generic.LinkedList<T> is the implementation of the canonical double linked list ADT.

What is that about "premature optimization" and "the root of all evil"?

O(n) arguments are all very well as long as you keep in mind that, ehem, there are constants all around and they usually tend to be pretty big.

So a more honest title would be "linked lists may harm your efficiency in high-speed environments". Notice the 'may' and the context.

The title implies straightaway "stop using LISP", which to my taste is a rather bold statement.

So, exist a list of wich structures are better for the main use cases? ie, a simple "cheat list" or something like that? Because the norm, I think, for the naive developer is use the main one offered by the language (ie: in python, list, dicts).

I'm almost certain my tests I did a few years ago were valid, when attempting to push, pop and sort array vs linked list. Would be nice if there were so numbers, examples, etc.

The article links to another article that gives a ton of numbers and examples: http://kjellkod.wordpress.com/2012/02/25/why-you-should-neve...

They're useful where you need to grow a collection in constant time. Re-sizing arrays is potentially very expensive.

You might want to take a look at the graph (scroll down a little) in this article that is linked in the blog post:


It's not just growing the collection (it does read and insert), but it may be a bit surprising.

That's why you use a growth factor, say doubling the backing array when you need to grow. That makes the amortized cost of adding an item constant.

The only case where a LL may be preferable are when you care about performance of inserting/deleting in the middle of a list.

If really needed you can have it both ways, you could roll a data structure that is a linked list of arrays. Then you have constant time growth and good caching/prefetching.

Toy example:

  struct ListSegment {
    T[64] items
    int nextItem = 0
    ListSegment* nextSeg, prevSeg

If you're using C++ you don't need to roll your own because the standard library provides it. It is called deque.

Who used linked lists in practice anyway?

Kernels, embeded, games come to mind.

For all of them, a major advantage of linked-lists is that they really are constant time. For example, if you are programming a game, you gennerally do not want to use a vector. Even though vectors amoratize to O(1), and may be, on average, faster, they occasionally take a long time. Switching to a linked list could be the difference from going from 100FPS with an occasional 1FPS, to going at a constant 80FPS.

Also, linked lists can (counterintuitivly) often make better use of a low memory environment. This is because they do not require a continuous block of ram, so they can fill in free memory that has become fragmented.

Also, in many cases linked-lists produce very clear and understandable code.

And functional programming.

LIBXML uses them for in-memory representation of DOM parsed XML.

They are also very effective in Trie node structures (http://en.wikipedia.org/wiki/Trie). They provide super-fast searching of large texts.

I prefer arrays. I experimented with linked lists in my early C days, and this code turned about to be the most bug-prone and hard to maintain. They have their place, but given a choice, arrays are simpler, and faster to code for.

>I experimented with linked lists in my early C days, and this code turned about to be the most bug-prone and hard to maintain.

I had almost the exact opposite experience. Arrays always felt like I was shuffling indexes around, and needed to do extra bookkeeping to keep track of where I was. Linked-lists seemed more explicit.

Granted, I have had times where arrays produced easier code. These tended to be when I need random access (or to backtrack n-elements or such). Furtuantly, these cases also (normally) coincide with the cases where arrays are the more (asymptotically) performant data-structure.

Anyone who needs an unbounded list of stuff and who cannot afford the memory to over allocate.

At least that is my biased opinion coming from embedded!

(No STL, everything done from scratch)

The Linux kernel, among other things.

It's heavily used in the Linux kernel, for example:


   [fbsd /home/chris/linux-tegra (master)]
   $ git grep LIST_HEAD|wc
      5939   12872  372136

Maybe we should use Linked Arrays :)

Back in highschool we'd come up with the idea of using arrays of pointers, those getting the speed of iteration of arrays with the smaller memory footprint of linked-lists.

At the time, we thought we'd discovered something magical.

This is akin to saying "Don't use text! Binary data is always more efficient!" Or "Never use uncompressed data storage!"

The reality is linked lists are a tool. They can be used well or they can be used poorly. Just because there are disadvantages doesn't mean they should never be used.

Very link-baity article.

They're bad in the same way arrays are bad and "should never be used": appends often require reallocating the entire array, and are thus O(n) (although the reallocs are usually better on average.)

Right, but if you overallocate a bit you can ensure amortized constant time. (A comment in Python's list resize code claims that it does this, at least. You can look at http://hg.python.org/cpython/file/d047928ae3f6/Objects/listo... , function 'list_resize' for their sizing algorithm.)

.Net's System.Text.StringBuilder does the same... and you can choose the starting provision... When it runs out of space underneath, it will double the allocation for text. Which works pretty well.

I wrote a utf-8 character encoder for use with PostgreSQL early in the C#/.Net 1.0 days, and allocated a StringBuilder for the output at 3x the original string size (up to 8K), which worked very well.

Its worth noting that for some use cases, amortized constant time isn't good enough.

Typically it's considered to be amortized constant time since reallocations happen infrequently if implemented properly (the array has a growth factor and not just resize by one every append).


I thought generally dynamic arrays were allocated more memory than the last allocation every time an appending happens, so that future appends are faster. Might've been the vector class from C++ I heard that about.

Indeed, and most implementations use exponential scaling when allocating a new array. So although your worst-case insertion time is O(n) from copying the entire array, over a sequence of n operations, this only happens log n times, so you have O(log n) amortized time.

Edit: All the references to "constant amortized time" lead me to reevaluate this claim, and it's not correct. Let's say you start with an initial capacity of 2, and double the capacity when necessary. Then the cost of adding n elements to the list will be n + the sum of 2^i from i=1 to i=log(n), which is n + 2(n - 1) or O(n). Over n operations, that's O(1) amortized.

I don't think any advice you can get from highscalability.com can be any good, because their whole mission statement is crap.

They talk about getting 10 million concurrent clients only; they don't talk about scaling in any other dimension. And you'll never have 10 million clients of one server in practice, because how much network bandwidth does your super server have? 10 gbps? So that means you'll be strangling each client down to only a few kbps - way to go for performance. The 1980s called, they want their network apps back.

In real life, you'd offload the task from the server onto middle-tier machines that talk to the clients. There'll be at most thousands of these, and each one would have at most thousands of clients. And this will let you provide several mbps to each client if you get the networking right.

In fact, it means that even highscalability.com's original mission statement of 10 thousand clients was moot in the first place.

Funny this should pop up on a day where I'm writing some flocking.... switching away from a home built linked list. Mostly to use foundation classes - just saw the talk form the developer of Braid about not wheel-inventing or optimizing data structures.

Just a bogus claim by a troll. Nothing to see here move on.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact