
Stop Using Linked Lists - aespinoza
http://highscalability.com/blog/2013/5/22/strategy-stop-using-linked-lists.html?utm_source=feedly
======
aphyr
On a related note, cache alignment is one of the reasons Clojure's vectors
(O(log32 n) insertion, access, and deletion) and linked lists (O(index)
insertion, access, and deletion) have unintuitive performance characteristics. For
instance, constructing lists of either type takes roughly the same amount of
time for big lists:

    
    
      user=> (with-progress-reporting (bench (into '() (range 100000))))
                   Execution time mean : 9.293932 ms
          Execution time std-deviation : 269.771284 µs
      
      user=> (with-progress-reporting (bench (into [] (range 100000))))
                   Execution time mean : 9.882163 ms
          Execution time std-deviation : 359.744662 µs
    

And with smaller lists, building vectors is ~30-50% slower.

    
    
      user=> (with-progress-reporting (bench (into '() (range 24))))
                   Execution time mean : 2.483705 µs
          Execution time std-deviation : 71.302962 ns
    
      user=> (with-progress-reporting (bench (into [] (range 24))))
                   Execution time mean : 3.349080 µs
          Execution time std-deviation : 114.007930 ns
    

However, _traversal_ is significantly faster for vectors, because you can pack
32 references into a cache line at a time. Here's a decent-sized list:

    
    
      user=> (let [x (apply list (range 100000))] (bench (reduce + x)))
                   Execution time mean : 8.586510 ms
          Execution time std-deviation : 80.923357 µs
    

Vs a comparable vector:

    
    
      user=> (let [x (vec (range 100000))] (bench (reduce + x)))
                   Execution time mean : 4.564553 ms
          Execution time std-deviation : 135.795328 µs
    

Traversing small lists:

    
    
      user=> (let [x (apply list (range 24))] (bench (reduce + x)))
                   Execution time mean : 2.041794 µs
          Execution time std-deviation : 18.752533 ns
    

Vs small vectors:

    
    
      user=> (let [x (vec (range 24))] (bench (reduce + x)))
                   Execution time mean : 1.051182 µs
          Execution time std-deviation : 10.413211 ns
    
    

<http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation/>

~~~
gtani
Benchmarking on the JVM is tricky: beyond pure algorithmic efficiency
(assuming all memory accesses are equal), there are cache effects, JVM heap and
GC effects, and the usual warmup/inlining concerns of benchmarking. On page 551
of the excellent Scala "staircase" book (2nd edition [1]) there's a collection
efficiency table w.r.t. head/tail/indexing/insert, but that's only a jumping-off
point (sorry, I can't find an online version of the table).

[1] <http://www.artima.com/shop/programming_in_scala_2ed>

~~~
aphyr
These benchmarks were taken with Criterium, which performs JIT warming, GC
purge, final GC, retains results for each run, isolates outliers, etc.

------
stiff
I don't understand all the hate for the post; one of the referenced articles
is actually very good and makes a quite compelling case (maybe it should have
been submitted instead):

<http://kjellkod.wordpress.com/2012/02/25/why-you-should-never-ever-ever-use-linked-list-in-your-code-again/>

It shows that linked lists are slower than vectors in real-world-like
scenarios, even in cases where the asymptotic complexity for linked lists is
lower. It seems that modern CPU architectures have changed so much that our
theoretical models diverge further and further from reality; I think this is
pretty interesting.

~~~
betterunix
I do not think our theoretical models diverge from reality; the theoretical
models are based on how the performance relates to the size of the input, and
remain valid even if you have a very fast computer. What modern architectures
have done is to change the constant factors a bit; that just raises the
threshold for problem sizes, but it does not eliminate it.

Sure, there are cases where asymptotic analysis can be dismissed. Matrix
multiplication comes to mind: the fastest algorithms have impractically large
constant factors. Usually, though, asymptotic analysis does matter, because it
is often the case that input sizes will grow unexpectedly.

~~~
stiff
This has nothing to do with simply having a "fast computer"; it would all be
fine if we just used 486s clocked at 10 GHz, but we don't. The changes in the
architecture of CPUs do indeed make the theoretical model diverge further from
reality, because it presupposes a lot of things: that every instruction takes
the same time to execute (which also implies that instruction execution times
are independent), that one instruction gets executed at a time, etc. If those
assumptions do not hold, you cannot assume the factor is really constant. And
even if it were, you cannot dismiss the difference indefinitely on that ground
when, on datasets of the size commonly encountered in practice, algorithms
with a larger asymptotic complexity start outperforming ones with a smaller
one. For matrix multiplication this has been known for years; for linked lists
it is a relatively new development that came with larger and faster CPU caches
etc.

~~~
betterunix
Except that the theoretical model is not affected by any of the things you
mentioned. The existence of caches, branch prediction, instruction reordering,
parallel execution, etc. does not change as problem sizes change. If your
inputs become large enough, you will reach the upper limit of your CPU's
ability to speed up execution with those features. In the limit, the
asymptotics still matter, and experience has shown that the limit is not at
all far-fetched in most cases.

What makes matrix multiplication an exceptional case is that the constant
factor on the best known algorithm is so large that we do not know how to
build a computer with enough memory to store a problem large enough to
overcome that constant. That is not the case with this analysis of linked
lists; all one can say is that the data sets chosen in that _particular_
article (possibly representative of the most common data sets) are not big
enough. One can certainly store a large enough data set to overcome the
advantages arrays have, and so the only real question is, "Is it possible that
the inputs will be so large?"

Maybe the answer is truly, "No, that is unlikely." I am skeptical, though, as
there are not many cases where such statements can be made. Even software
written for embedded systems that target specific products is likely to be
repurposed for new systems with different inputs. Even researchers, who write
software for the particular datasets sitting on their hard drives, often re-
use their code in future work. There are "border" cases, like integer
multiplication, but typically libraries will just select the best algorithm
for a given input size (e.g. for multiplication, you'll probably only see FFT
methods applied above a particular threshold). Perhaps linked lists are now a
"border" case, but all that would mean is that we need to use abstract
"sequence" operations that dynamically choose particular implementations to
use as their sizes change.

~~~
stiff
[http://en.wikipedia.org/wiki/Asymptotically_optimal_algorith...](http://en.wikipedia.org/wiki/Asymptotically_optimal_algorithm#Formal_definitions)

 _Sometimes vague or implicit assumptions can make it unclear whether an
algorithm is asymptotically optimal. For example, a lower bound theorem might
assume a particular abstract machine model, as in the case of comparison
sorts, or a particular organization of memory. By violating these assumptions,
a new algorithm could potentially asymptotically outperform the lower bound
and the "asymptotically optimal" algorithms._

See also:

<http://en.wikipedia.org/wiki/Abstract_machine>

<http://en.wikipedia.org/wiki/Random-access_stored-program_machine>

<http://en.wikipedia.org/wiki/Cache-oblivious_model>

etc.

As far as I know all asymptotic analysis has to be done using some abstract
machine model.

~~~
betterunix
Sure, but how do modern architectures not fit into the RASP machine model? You
can view the cache contents as being _part_ of the state (rather than the
memory); you can similarly view instructions as being part of sequences, so
that a single instruction can mean different things depending on the state
from which it is fetched. Other modern features can be similarly approached
(with the exception of CPUs whose behavior depends on environmental factors
like temperature, but that is an edge case).

Really, if you doubt that the RASP model is appropriate for modern
architectures, you can test it (a typical exercise in an algorithms course) --
see if, as the input size grows, the timing follows the asymptotic analysis.
That is basically what the article you linked to does, and the results are not
all that surprising -- where things are linear time in theory, they are linear
time in practice; where they are quadratic time in theory, they are quadratic
time in practice. It is worth pointing out that in all but the last example,
the list and vector operations had the _same_ complexity (because of the
linear search), so it was really a comparison between constant factors.

~~~
yyqux
Asymptotically, sure, you're right. Constant factors are often important in
practice, and simple cost models (e.g. ones that don't model cache locality)
will no longer give you a decent estimate of constant-factor differences in
performance between algorithms.

I think the issue here is that, in the past, with shallower cache hierarchies,
models that assumed a constant cost per memory access would maybe be off by a
smallish factor (I don't know, maybe 50%).

However, now memory access is frequently the limiting factor for an algorithm,
and there can easily be an order of magnitude in variation between the average
memory access latency for different algorithms (i.e. cache-smart versus cache-
dumb).

------
ska
It seems that often articles like this overstate their case to the point that
it really detracts from the message.

There is a valid point: a naive analysis of linked lists vs. static arrays,
based on intro CS course descriptions of their properties, isn't a good model
for what is going on in a modern system.

The real lesson is: if you want to achieve high performance, you simply must
understand the impact of things like cache locality, vectorization, hardware
prefetch, pipelining, etc., and how your data structure operations will
interact with them.

"Never use a linked list" is a silly lesson to take from this though. "In
these situations, linked lists might not perform as well as you expect" is
more like it.

"Use the right data structure for the job" is still as good advice as it ever
was.

~~~
zwieback
Exactly. Also, pointers are just an implementation detail of linked lists. You
can implement an array based linked list and avoid "scribbling all over
memory", which really is a valid concern raised in the article.

~~~
stormbrew
What exactly would an array based linked list be but either an array or an
array deque? These things have names for a reason.

~~~
betterunix
Instead of pointers, you would store an index in the array of nodes:

    
    
      struct node {
         void * data;
         uint32_t idx_next;
      };
    

You are still dereferencing pointers, of course, but you have better locality,
even after doing a lot of insertions and deletions.
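
To make that concrete, here's a rough C++ sketch (my own illustration, not from
the parent comment; names like `IndexList` and `push_front` are invented) of a
list whose nodes all live in one contiguous pool and link to each other by
index:

      #include <cstdint>
      #include <vector>

      constexpr uint32_t NIL = 0xFFFFFFFF;   // "no next node" marker

      // All nodes live in one contiguous pool, so traversal touches memory
      // that stays close together even after many insertions and deletions.
      struct Node {
          int      data;
          uint32_t idx_next;                 // index of the next node, or NIL
      };

      struct IndexList {
          std::vector<Node> pool;            // backing storage for every node
          uint32_t head = NIL;

          void push_front(int value) {       // O(1): append to the pool, relink head
              pool.push_back({value, head});
              head = static_cast<uint32_t>(pool.size() - 1);
          }

          int sum() const {                  // traversal follows indexes, not raw pointers
              int total = 0;
              for (uint32_t i = head; i != NIL; i = pool[i].idx_next)
                  total += pool[i].data;
              return total;
          }
      };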

~~~
marshray
It's even better with intrusive containers. As an extreme case, let's say we
need a collection of 24 bit RGB triplets:

    
    
        struct node {
            uint8_t r, g, b;
            uint8_t idx_next;
        };
    

Of course, a contiguous array uint8_t[256][3] might still be faster.

------
rayiner
What stupid advice. Linked lists (and trees formed from lists) are a
fundamental functional data structure. They have tremendous expressive
advantages in functional code where operating on the head and the rest of the
list leads to clear expression of an algorithm, and easy sharing of sub-parts
makes certain other algorithms much more elegant.

An example is maintaining the lexical environment as you're compiling a
programming language. The first response might be to use something like a hash
table. But then how do you handle shadowing (where an inner block shadows a
variable of the same name in an outer block)? A much cleaner way is to use an
association list that's implicitly maintained as you recurse over the AST.

E.g. in pseudo-Python:

    
    
        def parse_let(form, environment):
            (name, value) = parse_declaration(declaration_part(form))
            return parse_body(body_part(form), 
                acons(name, value, environment))
    

For those unfamiliar with Lisp, "let" introduces a new name bound to the value
of an initializing expression, which is only in scope for the body of the
"let" construct. Assume here that we're generating code from an AST and that
the return value of a parse function is the register number where you can find
the result of the expression, and the lexical environment maintains a mapping
from a variable name to the register where that variable can be found. Or we
could be creating an intermediate representation and the lexical environment
maintains a mapping from a name to an IR node. Or whatever.

Here, "environment" is the association list. Assume that "acons" is a Python
function that adds a pair of (key, value) to the front of a linked list. Note
how the list is never explicitly mutated, it is maintained implicitly by
passing a new value for "environment" as you recurse down. The beauty of this
is that entries are removed from the lexical environment implicitly as parsing
functions return, and also that "environment" is a purely functional data
structure. You can stash away "environment" in say an IR node and it will
always refer to a snapshot of the lexical environment for a given AST node,
even after other nodes are parsed or the current function returns. This is
non-trivial with say a hash table. Storing a copy at each IR node would eat
memory. With a linked list, sharing of sub parts falls out for free.
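
For readers more comfortable with C++ than Lisp, here's a rough sketch of the
same idea (my own illustration; the names and the use of shared_ptr are
assumptions, not rayiner's code): a persistent association list where extending
the environment shares the old tail, so a snapshot is just one pointer copy.

      #include <memory>
      #include <string>

      // Minimal persistent association list: each acons allocates one cell and
      // shares the rest of the list, so older environments remain valid snapshots.
      struct Cell {
          std::string key;
          int value;                              // e.g. a register number
          std::shared_ptr<const Cell> next;
      };
      using Env = std::shared_ptr<const Cell>;

      // Non-destructive extension: the old Env is left untouched.
      Env acons(std::string key, int value, Env tail) {
          return Env(new Cell{std::move(key), value, std::move(tail)});
      }

      // The innermost binding is found first, which gives shadowing for free.
      const Cell* lookup(const Env& env, const std::string& key) {
          for (const Cell* c = env.get(); c; c = c->next.get())
              if (c->key == key) return c;
          return nullptr;
      }

Stashing the environment in an IR node is then just copying one Env; the shared
tail keeps the enclosing scopes alive for as long as anything references them.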

Also, look at the Linux kernel sometime. It uses linked lists all over the
place. Your malloc() implementation likely keeps a set of segregated free
lists maintained as linked lists. They do this because they avoid ever
traversing the list completely. They just operate on the head/tail.

~~~
christopheraden
Considering that the source is High Scalability, a blog that focuses on how
orthodox methods are inadequate for 1-10M concurrent connections and how
different methods are being employed to reach these lofty goals, I think the
advice is pretty spot-on.

It's important to consider that if your objective is scalability and
performance, then the article's advice is appropriate. The article's title
ought to read "Stop Using Linked Lists if you care about High Scalability",
but I mentally attach that dependent clause to many titles I read from that site.

~~~
rayiner
I'm sure the Linux kernel developers care about scalability and performance.
Count how many linked lists you see in the kernel code. See:
<http://lxr.free-electrons.com/ident?i=INIT_LIST_HEAD>.

~~~
alayne
I'm sure you will find strlen and linear searches too. It's more likely that
absolute performance wasn't necessary for those cases, or memory usage with
linked lists is better than a more complex data structure, or maybe even that
linked lists were easiest for C developers.

~~~
rayiner
Linked lists are used pervasively in performance-critical parts of the code
(e.g. the scheduler, the VM, etc.). Linked lists just happen to have very
suitable performance characteristics for the kinds of tasks that happen often
in a kernel. E.g. say you keep a queue of IO buffers that have pending
operations on them. You get an interrupt and the driver gives you back a
pointer to the IO buffer it just filled. You want to copy that data out, and
then remove the IO buffer from the pending queue and add it to a free queue.
In this case, you'd almost certainly rather use an (intrusive) linked list
than maintain these queues as arrays.
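
To illustrate why that's the natural choice (a hypothetical C++ sketch, not
actual kernel code; all the names are invented): with an intrusive
doubly-linked node embedded in the buffer, moving a buffer from the pending
queue to the free queue is a handful of pointer writes, with no searching, no
shifting of elements, and no allocation.

      // Intrusive doubly-linked node embedded directly in the object; lists are
      // circular with a sentinel head, as in many kernel-style implementations.
      struct ListNode {
          ListNode *prev, *next;
      };

      struct IoBuffer {
          ListNode link;                // lives on exactly one queue at a time
          char     data[4096];
      };

      // Unlink a node from whatever list it is on: O(1), no traversal.
      void list_remove(ListNode* n) {
          n->prev->next = n->next;
          n->next->prev = n->prev;
      }

      // Insert a node right after a list head (e.g. the free queue's sentinel).
      void list_insert_after(ListNode* head, ListNode* n) {
          n->prev = head;
          n->next = head->next;
          head->next->prev = n;
          head->next = n;
      }

      // Interrupt path: the driver hands us a pointer to the buffer, so we can
      // requeue it without touching any other element of either queue.
      void complete_io(IoBuffer* buf, ListNode* free_queue) {
          list_remove(&buf->link);
          list_insert_after(free_queue, &buf->link);
      }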

------
haberman
There's been some discussion on HN in the last few days about negativity. If
there's one thing that inspires negativity, it's hearing categorical
statements like "ALWAYS do X," or "Y is NEVER true" when the listener has
specific experiences that contradict this.

We all come at computing from different perspectives. The perspective of a JS
developer is very different from the perspective of an OS developer. "Rules of
thumb" that make sense in one scenario may be completely wrong in another.
Different programmers are faced with different constraints, different
performance profiles, and different relative costs (which can lead to
different tradeoffs).

If you're tempted to make a categorical statement, maybe it's better to first
consider whether your statement is as universal as you think it is.

------
pcwalton
This is not good advice if you need your list to have unbounded size and want
it to be lock-free. Growing an array and copying all the elements over is very
expensive in a thread-safe scenario. If you use a linked list, however, you
can implement all the operations without using any locks at all.

(Linked lists are what we use for our channels in Rust, and as a result
they're extremely fast: in the new scheduler they're totally lock-free except
if the task is sleeping, which we can optimize to be lock-free later. They
have unlimited size, which helps prevent deadlocks.)
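
For a feel of what "no locks at all" looks like, here is a minimal, generic C++
sketch (this is not Rust's channel implementation): a Treiber-style stack
pushes by compare-and-swapping the head pointer, so it grows node by node
without ever relocating existing elements.

      #include <atomic>

      // Minimal Treiber-stack push: lock-free growth, node by node, with no
      // relocation of existing elements. A production pop would also need an
      // ABA/reclamation scheme (hazard pointers, epochs, ...), omitted here.
      struct Node {
          int   value;
          Node* next;
      };

      std::atomic<Node*> head{nullptr};

      void push(int value) {
          Node* n = new Node{value, head.load(std::memory_order_relaxed)};
          // Retry until head is swung from the value we last saw to our new node;
          // on failure, compare_exchange_weak reloads the current head into n->next.
          while (!head.compare_exchange_weak(n->next, n,
                                             std::memory_order_release,
                                             std::memory_order_relaxed)) {
          }
      }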

~~~
gosu
Very cool. What kind of lockfree linked lists? Did you work from one of the
papers on the subject?

(Searching "rust lockfree" reveals a feature request for a lockfree malloc. I
happen to have one of those, but the reality is that synchronization will
probably not be your bottleneck.)

------
mosqutip
I'M ARGUING AGAINST CONVENTIONAL WISDOM! PLEASE GIVE ME ATTENTION!

Use linked-lists in situations where you need fast insertion and lookup time
isn't as important. Don't use linked-lists when lookup time is important.
Don't make fallacious claims supported with misguided and incomplete examples.

~~~
bnegreve
> _I'M ARGUING AGAINST CONVENTIONAL WISDOM! PLEASE GIVE ME ATTENTION!_

I don't think it's like this; it's true that the article is a bit harsh, but
it contains a lot of references that support the claim.

Additionally, I also came to the conclusion that plain linked lists are
virtually always slower in practice, because insertion into a vector is
amortized constant time, or because you can use better, more local structures
like deques, or because hash tables are always an option, and so on. Also
check Stroustrup's vector vs. list slide in the presentation linked by chmike
in this thread; it's pretty illustrative.

------
jvanenk
Stop Using Linked-Lists (in areas where linked-lists are the wrong structure
to use)

Edit: I should note this article's advice is really good advice if your focus
is performance.

~~~
ajross
More: "Linked lists are much, much slower than you think they are and should
not be used in performance-sensitive code." That fact isn't remotely obvious
to most hackers, and simply brushing it aside as an area where linked lists
are "wrong" is missing important details.

~~~
gizmo686
Even in performance-sensitive code, linked lists might be the right way to go.
If you need to store an unknown amount of data, a resizable array probably
does amortize to better performance than a linked list. But all of the
'slowness' happens at the same time, so it might be worth slowing down the
average case to avoid the worst case. The most notable examples I can think of
are video games, where FPS is king, and kernels, where you always want to exit
quickly.

Linked lists can also work better in limited memory environments because, with
the overhead of 1 pointer per element, you can make use of fragmented memory.

~~~
olivier1664
Or something between the two: "Unrolled linked list": In computer programming,
an unrolled linked list is a variation on the linked list which stores
multiple elements in each node.
<https://en.wikipedia.org/wiki/Unrolled_linked_list>

------
justinhj
The link to the article about "Starcraft crashes because it uses linked lists"
seems underhanded. That article talks about intrusive lists, why they are
useful, and what the downsides are. The actual crash seems to be related not
to the use of a linked list at all, but to a lack of shared-data
synchronisation, which can happen with many other data structures.

------
stcredzero
True story. I was working for a property/casualty insurance company in South
Carolina. There was a local community college that taught programming, and most
of the "programmer analysts" at this place went there and took the intro to C
course, which included writing a doubly linked list. The course also instilled
the notion that not writing your own code was "cheating."

The application I was working on had something like 500 separate
implementations of a doubly linked list, each of which was used to support
exactly one collection, and each of which involved hours of coding and
debugging. The company didn't care, as client companies were billed by the
hour. One of the client companies had programmers who were horrified at this
and introduced an adaptable linked-list library called "SuperLink." It was
accepted as a "modification," incorporated into just the one implementation,
then forgotten.

------
chmike
At page 45 of this presentation of Soustrup,
[[http://ecn.channel9.msdn.com/events/GoingNative12/GN12Cpp11S...](http://ecn.channel9.msdn.com/events/GoingNative12/GN12Cpp11Style.pdf)],
he shows benhmark comparison between linked list, preallocated list and array
for insertion and deletion of small element values in list up to 500 000
elemnts.

Arrays are beating linked list.

~~~
gizmo686
That example involves inserting (and deleting) elements at a specific index,
which means you would have to traverse (on average) half of the linked-list
each time.

Generally, when I work with linked-lists (excluding functional programming) I
only delete elements after I already have a pointer to them for something
else. Similarly, I generally do not care about the order of elements, so I can
either insert a new element at whatever index my cursor happens to be at, or
append them to the end.

~~~
chmike
These are specific use cases. Indeed, in some cases lists are more efficient
than arrays. The advice to use arrays instead of linked lists is a rule of
thumb. Not all programmers are still sucking their thumb. ;)

------
stormbrew
I think a lot of people are missing the point, and that's probably at least in
part because of the bold claim of the title. The point is that some of the most
common conventional wisdom about the appropriateness of linked lists has become
outdated and wrong in the age of very fast local cache.

It really does mean that in a lot of cases where, say, Knuth's books would have
suggested you use a linked list, you probably shouldn't any more, even if it
doesn't mean you never should.

------
jwise0
From the perspective of computer architecture, the author provides what look
like -- on their face -- good arguments. They quote from Aater Suleman, who
says things like "they throw off hardware prefetching" or "they reduce the
benefit of out-of-order execution".

These may have been true on machines of the past, but these are no longer true
on modern systems. With the advent of trace caches and runahead execution [1],
linked lists are _really_ no longer as painful as they once were. (Indeed,
even back in the day, Alpha had low-cost "explicit" runahead-like semantics,
where the programmer could specify other work to do while waiting for DRAM;
this was usable to accelerate linked-list traversal.)

[1] <http://users.ece.cmu.edu/~omutlu/pub/mutlu_hpca03.pdf> (disclosure: I
worked closely with Onur at one point; his Ph.D thesis introduced runahead. I
may be excessively biased in favor of that technique :-))

------
gosu
The correct way to do linked lists is to store the list traversal fields next
to the data. In C, this would mean storing next/prev pointers inside of the
structs which will be placed on lists.

In light of this, the things in the OP are often non-issues because you'll
need the data in cache immediately after the list operation anyway (or during
the traversal, for O(n) operations like list_find). In fact, vectors of
pointers are worse for the hardware because you'll need to load in _more_
cache lines than with lists, in order to traverse the array.

Lists aren't clearly the better option when the data will need to live exactly
as long as the data exists in the container. In this case, you can store the
data itself in a vector's backing array (and so the data will be invalid as
soon as it's removed from the vector).

~~~
yason
Who the heck puts the payload into a separate allocation?

I'm pretty sure I've never seen such an implementation. Not once. I could
imagine seeing something like that in textbooks where they use graphical
diagrams to illustrate how a linked list works but who would actually
implement it like that -- I don't know.

The canonical way is to do:

    
    
      struct listnode {
        struct listnode *next;
        struct listnode *prev;
      };
    
      struct your_own_data_node {
        struct listnode node;
        int x, y, z;
      };
    

which ensures that you can have a set of functions that operate on struct
listnode * and you can use them for all of your lists.
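
The piece that makes those generic functions useful is getting from a `struct
listnode *` back to the enclosing struct. A common way (this mirrors the
`container_of` idiom the Linux kernel uses; the helper `data_of` here is
hypothetical, written against the struct definitions above) is an
offsetof-based macro:

      #include <stddef.h>   /* offsetof */

      /* Recover the enclosing struct from a pointer to its embedded node. */
      #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

      /* Generic list code hands back a struct listnode *; this recovers the
         containing data node in O(1), using the definitions above. */
      struct your_own_data_node *data_of(struct listnode *n) {
          return container_of(n, struct your_own_data_node, node);
      }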

~~~
gosu
I wouldn't call that canonical. You can take a further step, as in the Linux
version that someone posted below:

<http://kernelnewbies.org/FAQ/LinkedLists>

------
berkut
In performance-critical code, this is well known. Also, since the 64-bit days,
storing linked lists has a significant space overhead over arrays due to the
additional storage requirements of the pointers, which means you fit less in
cache, which compounds the problem even more.

------
raymondh
The article (and its referenced article) recommends against using linked lists
because they are inefficient on modern processors (due to cache misses).

However, there is a better solution than throwing away a powerful and
expressive data structure. Rather than linking individual data elements,
instead link blocks of consecutive elements.

This hybrid approach takes full advantage of cache locality and it minimizes
the memory overhead of storing both the links and the data.

This is the approach used in Python's implementation of deques:
[http://hg.python.org/cpython/file/85c04fdaa404/Modules/_coll...](http://hg.python.org/cpython/file/85c04fdaa404/Modules/_collectionsmodule.c#l4)

------
gsg
This is awful advice which is based on a complete misunderstanding of the use
cases of linked lists. Linked lists apply in situations when you can't place
things in arrays (because they are too large to copy, or because there will be
pointers maintained to them).

The author makes much ado about locality of reference and cpu friendly layout
without understanding that those things are irrelevant because this is a data
structure to use when indirection is _required_.

It always amazes me that such simple data structures can be so poorly
understood.

------
gamegoblin
I feel like this could have perhaps mentioned the cases where one should use
linked lists.

Is a simple queue still implemented as a linked list? That is all I ever seem
to use them for.

~~~
theboss
A queue would be an excellent time to use a linked list.

I don't understand the point of making a claim like this about a data-
structure. The most fundamental thing in data-structures is that there is a
time and a place for using each one. No data-structure is inherently 'better'
than the others.

~~~
jhawk28
Based on the disruptor paper
(<http://disruptor.googlecode.com/files/Disruptor-1.0.pdf>) linked lists make
poor queues also. Ring buffers backed by arrays are much better.

~~~
theboss
Very interesting, but I feel there is a difference of need here.

You're right... a ring buffer backed by an array is better there, but for
average programmers a linked list makes a damn good (maintainable, easy,
understandable, and pretty quick) queue.

------
jamesaguilar
Mmmmmeehhh. Profile it first.

------
yyqux
This is pretty reasonable advice. There was a time when linked-lists didn't
have such a massive performance disadvantage compared with more contiguous
data structures, but that time has passed and I'm not sure that the
programming community is fully aware of it (certainly you wouldn't be
explicitly told it in most CS programs). Memory efficiency is also often
terrible on 64-bit machines, especially for doubly-linked lists.

Sometimes they're the right data structure, but I've definitely come across
programmers who want to use a linked list for everything, even in code where
performance is important.

Edit: the general advice that you should avoid linked lists for performance
reasons is good. The idea that you should _never_ use them I just took as
additional trolling for page views.

------
btilly
I was recently working some data structures in C++ where I needed to have
tree-based structures and sustained performance. The compromise that I hit on
was to allocate all of the nodes of a tree out of an std::vector. This allowed
me the flexibility of tree-based structures together with all of the locality
of reference that I needed. As the tree grew, the vector would resize and
move, but the amortized average cost of that is directly proportional to the
size of my data structure.

I offer this in case a similar compromise might be useful for someone else.
(My guess is that it is probably standard for people who need to know this
sort of stuff. I'm just not usually someone who needs to know this sort of
stuff.)
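
A minimal sketch of that compromise (illustrative only; the names are mine, not
btilly's code): tree nodes live in a std::vector and refer to their children by
index, so all links stay valid when the vector reallocates and moves.

      #include <cstdint>
      #include <vector>

      constexpr uint32_t NO_CHILD = 0xFFFFFFFF;

      // Nodes are stored contiguously; children are referenced by index into the
      // same vector, so growth and relocation never invalidate the links.
      struct TreeNode {
          int      key;
          uint32_t left;
          uint32_t right;
      };

      struct Tree {
          std::vector<TreeNode> nodes;

          // Append a new leaf and return its index (stable across resizes).
          uint32_t add(int key) {
              nodes.push_back(TreeNode{key, NO_CHILD, NO_CHILD});
              return static_cast<uint32_t>(nodes.size() - 1);
          }
      };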

~~~
berkut
This is used a lot for things like acceleration structures in 3D graphics -
you also get the benefit of being able to store an uint32_t offset instead of
a 64-bit pointer, fitting a bigger tree in memory/cache.

~~~
btilly
I used that trick as well. :-)

------
b0b_d0e
I've been working on an application for a card game and I have been
considering the implications of using a linked list for the deck structure. I
just don't see how using an array would make sense for a deck of cards. I need
the ability to constantly grow and shrink the deck (this is for Yugioh so
cards would get put back into the deck often). I did consider using an array
but it seemed to be more trouble than it's worth, considering the need to remove
cards from random locations in the deck. My question, then, is what kind of data
structure is optimal for this? Are arrays still the better choice, or is
there another data structure I don't know about that is optimal for card
decks?

~~~
VLM
"optimal for card decks"

When n is human-typical card deck size, the optimal solution is whatever
minimizes some balance of development time and debugging time. CPU and coding
efficiency will never enter as a limitation.

The absolute dirt simplest way to test your numerous card manipulation
algorithms might be two arrays (or plain text files?) and your algos copy from
one array into the other.

In Yugioh isn't there some inherent (however ridiculously large) limit to the
possible number of cards in a deck?

------
themstheones
If you use a List in .NET, you get a wrapper for an array that will expand as
needed (doubling in size every so often). The reasoning is quite similar to
the reasons given in this article.

~~~
thomasz
System.Collections.Generic.List<T> is similar to std::vector, but oddly named.
System.Collections.Generic.LinkedList<T> is the implementation of the
canonical doubly linked list ADT.

------
pfortuny
What is that about "premature optimization" and "the root of all evil"?

O(n) arguments are all very well as long as you keep in mind that, ehem, there
are constants all around and they usually tend to be pretty big.

So a more honest title would be "linked lists may harm your efficiency in
high-speed environments". Notice the 'may' and the context.

The title implies straightaway "stop using LISP", which to my taste is a
rather bold statement.

------
mamcx
So, does there exist a list of which structures are better for the main use
cases? I.e., a simple "cheat sheet" or something like that? Because the norm, I
think, for the naive developer is to use the main ones offered by the language
(e.g. in Python, lists and dicts).

------
francispelland
I'm almost certain the tests I did a few years ago, attempting to push, pop
and sort arrays vs. linked lists, were valid. It would be nice if there were
some numbers, examples, etc.

~~~
Bill_Dimm
The article links to another article that gives a ton of numbers and examples:
<http://kjellkod.wordpress.com/2012/02/25/why-you-should-never-ever-ever-use-linked-list-in-your-code-again/>

------
justin_hancock
They're useful where you need to grow a collection in constant time. Re-sizing
arrays is potentially very expensive.

~~~
voidlogic
If you really need to, you can have it both ways: roll a data structure that
is a linked list of arrays. Then you have constant-time growth and good
caching/prefetching.

Toy example (C++):

    
    
      template <typename T>
      struct ListSegment {
        T items[64];                      // contiguous block of elements
        int nextItem = 0;                 // number of slots used in this block
        ListSegment *nextSeg, *prevSeg;   // links to neighbouring segments
      };

~~~
Bill_Dimm
If you're using C++ you don't need to roll your own because the standard
library provides it. It is called deque.

------
strictfp
Who used linked lists in practice anyway?

~~~
GnarfGnarf
LIBXML uses them for in-memory representation of DOM parsed XML.

They are also very effective in Trie node structures
(<http://en.wikipedia.org/wiki/Trie>). They provide super-fast searching of
large texts.

I prefer arrays. I experimented with linked lists in my early C days, and that
code turned out to be the most bug-prone and hard to maintain. They have their
place, but given a choice, arrays are simpler and faster to code for.

~~~
gizmo686
>I experimented with linked lists in my early C days, and that code turned
out to be the most bug-prone and hard to maintain.

I had almost the exact opposite experience. Arrays always felt like I was
shuffling indexes around and doing extra bookkeeping to keep track of where I
was. Linked lists seemed more explicit.

Granted, I have had times where arrays produced easier code. These tended to
be when I needed random access (or to backtrack n elements or such).
Fortunately, these cases also (normally) coincide with the cases where arrays
are the more (asymptotically) performant data structure.

------
Qantourisc
Maybe we should use Linked Arrays :)

~~~
angersock
Back in high school we came up with the idea of using arrays of pointers,
thus getting the speed of iteration of arrays with the smaller memory
footprint of linked lists.

At the time, we thought we'd discovered something magical.

------
ebbv
This is akin to saying "Don't use text! Binary data is always more efficient!"
Or "Never use uncompressed data storage!"

The reality is linked lists are a tool. They can be used well or they can be
used poorly. Just because there are disadvantages doesn't mean they should
never be used.

Very link-baity article.

~~~
thirsteh
They're bad in the same way arrays are bad and "should never be used": appends
often require reallocating the entire array, and are thus O(n) (although the
reallocs are usually better on average.)

~~~
koenigdavidmj
Right, but if you overallocate a bit you can ensure amortized constant time.
(A comment in Python's list resize code claims that it does this, at least.
You can look at
<http://hg.python.org/cpython/file/d047928ae3f6/Objects/listobject.c>,
function 'list_resize', for their sizing algorithm.)
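
As a toy illustration of the amortization (a sketch of the general technique,
not CPython's actual growth formula, which is in the file linked above): with
geometric growth, each element is copied only a constant number of times on
average, so appends are amortized O(1).

      #include <cstddef>
      #include <cstdlib>

      // Toy growable array: capacity doubles when full, so n appends cost O(n)
      // total element copies -- amortized O(1) per append.
      struct IntVec {
          int*        data = nullptr;
          std::size_t size = 0, cap = 0;

          void append(int x) {
              if (size == cap) {
                  cap = cap ? cap * 2 : 4;   // geometric growth
                  data = static_cast<int*>(std::realloc(data, cap * sizeof(int)));
              }
              data[size++] = x;
          }
      };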

~~~
tracker1
.Net's System.Text.StringBuilder does the same... and you can choose the
starting provision... When it runs out of space underneath, it will double the
allocation for text. Which works pretty well.

I wrote a utf-8 character encoder for use with PostgreSQL early in the C#/.Net
1.0 days, and allocated a StringBuilder for the output at 3x the original
string size (up to 8K), which worked very well.

------
ucee054
I don't think any advice you can get from highscalability.com can be any good,
because their whole mission statement is crap.

They talk about getting 10 million concurrent clients only; they don't talk
about scaling in any other dimension. And you'll _never_ have 10 million
clients of one server in practice, because how much network bandwidth does
your super server have? 10 gbps? So that means you'll be strangling each
client down to only a few kbps - way to go for performance. The 1980s called,
they want their network apps back.

In real life, you'd offload the task from the server onto middle-tier machines
that talk to the clients. There'll be at most _thousands_ of these, and each
one would have at most _thousands_ of clients. And this will let you provide
several mbps to each client if you get the networking right.

In fact, it means that even highscalability.com's original mission statement
of 10 thousand clients was _moot in the first place_.

------
seivan
Funny this should pop up on a day when I'm writing some flocking...
switching away from a home-built linked list, mostly to use foundation classes
-- I just saw the talk from the developer of Braid about not reinventing the
wheel or optimizing data structures.

------
dudus
Just a bogus claim by a troll. Nothing to see here move on.

