
Some Performance Experiments for Simple Data Structures (2011) [pdf] - vmorgulis
http://lsub.org/ls/export/adtperf.pdf
======
pjscott
If I'm reading this right, their array resizing was linear rather than
geometric -- definitely a radical difference from the usual implementations!
That is, each time they called realloc, they added a constant amount of
capacity instead of doing the usual thing, which is multiplying it by some
constant (e.g. doubling it). Consider what happens in the most naive
implementation, where realloc calls malloc+memcpy+free, taking time
proportional to the size of the buffer to be copied: appending n times to such
an array would take O(n^2) time, rather than O(n) as you'd get from a linked
list or from an array that grows geometrically.
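
For concreteness, here is a minimal sketch of the usual geometric policy (hypothetical code, not taken from the paper): doubling the capacity means each element gets copied only a constant number of times on average, so n appends cost O(n) amortized.

    /* Hypothetical dynamic array with geometric (doubling) growth. */
    #include <stdlib.h>

    typedef struct {
        int *data;
        size_t len;   /* elements in use */
        size_t cap;   /* allocated slots */
    } Vec;

    /* Append one element; returns 0 on success, -1 on allocation failure. */
    int vec_push(Vec *v, int x)
    {
        if (v->len == v->cap) {
            size_t newcap = v->cap ? v->cap * 2 : 8;   /* grow geometrically */
            int *p = realloc(v->data, newcap * sizeof *v->data);
            if (p == NULL)
                return -1;
            v->data = p;
            v->cap = newcap;
        }
        v->data[v->len++] = x;
        return 0;
    }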

Of course they can get away with it on most systems, because most memory
allocators don't have that kind of naive implementation. They may do a
geometric growth policy automatically; they may switch to grabbing memory from
the OS's virtual memory system for allocations above a certain size; they can
do all sorts of things that have the convenient side-effect of making the
constant-growth algorithm here more efficient. But it shouldn't come as any
great surprise that things can get weird on some systems with different memory
management, when they've taken such an unusual approach to their array
resizing.

------
nickpsecurity
Nice data supporting why I was on about an evidence-based approach to data
structure or algorithm selection with realistic measurements. There was a lot
of work on automating this with special compilers in the '90s and early-to-mid
2000s. Not sure what they're doing now.

Another thing I used to bring up was using in-order execution and compiler-
directed scratchpads (with an option for manual control) to get more
predictable performance, especially for real-time work or preventing covert
channels.

~~~
vmorgulis
I found it here:

https://listserv.tamu.edu/cgi-bin/wa?A2=ind1105&L=PIVOT&F=&S=&P=691

"From: Bjarne Stroustrup..."

------
vvanders
Key quote from the article:

    
    
      That is, in practice, results from complexity theory seems to be totally neglected.
    
      In our opinion, what happens is that software is so complex, and there are so many
      layers of software underneath the application code, that it is not even clear
      which data structures are better; at least for the simple case we describe here.
    

As always, trust but verify (profile).

~~~
crpatino
I have mixed feelings about this.

In principle, I applaud the initiative to put the sacred theories given from
above to the test. We need more of this and much earlier. Every CS 101 course
should be presenting something similar.

On the other hand, what were these guys expecting? They are testing a
LinkedList(TM), for goodness' sake! Even if you assume hardware that follows
the abstract machine of the C standard to the letter, all the toying around
with pointers should be a good predictor that things are going to be _slow_
regardless of what complexity theory says. _And_, speaking of complexity
theory, with the sample sizes they were using, Big O analysis does not apply:
the asymptotic limits are irrelevant, and the one with the lowest fixed cost
wins!

So I find all that mumbo-jumbo about how the so-called complexity of software,
the processor caches, etc., make analysis impossible extremely annoying. It is
not that difficult: the guy who does not overthink things and packs everything
close enough to cause the fewest cache misses wins (and the idiot who
needlessly spreads data around and uses indirection everywhere loses).

A follow-up article that compares slight variants of the same data structure
to take advantage of modern hardware would provide great insight!

~~~
meric
I spent the entire last weekend trying to figure out why a linked list is slow
no matter the operation or data type, compared to a flat array.

In the end I put my Big O knowledge in the drawer and stripped linked lists
from the program I was working on, and got the 100x improvement in overall
performance I was looking for.

~~~
adrianN
I've spent the last year wondering why compiling and linking took so long. Now
I've been looking at the source of our toolchain and see that not only does it
use linked lists for everything, it also appends everything at the end --
without storing a pointer to the last element in the head. Fixing the O(n)
append improves performance a lot.
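
A sketch of that fix, assuming a plain singly linked list (hypothetical code, not the actual toolchain): keeping a tail pointer in the list header turns append into O(1) instead of walking the whole list on every call.

    /* Hypothetical singly linked list with a tail pointer for O(1) append. */
    #include <stdlib.h>

    typedef struct Node {
        int value;
        struct Node *next;
    } Node;

    typedef struct {
        Node *head;
        Node *tail;   /* remembering the last node avoids re-walking the list */
    } List;

    /* Append at the end; returns 0 on success, -1 on allocation failure. */
    int list_append(List *l, int value)
    {
        Node *n = malloc(sizeof *n);
        if (n == NULL)
            return -1;
        n->value = value;
        n->next = NULL;
        if (l->tail != NULL)
            l->tail->next = n;   /* O(1): no traversal from the head */
        else
            l->head = n;         /* list was empty */
        l->tail = n;
        return 0;
    }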

~~~
meric
I was using a linked list to append things to the _head_, which was supposed
to be its optimal use case. I refactored the function it was calling to take
an _n_ argument, which means leaving the first _n_ holes in the returned
vector for the calling function to fill.

I had also used a linked list of characters to represent strings, iterating
through them one by one. This was very slow compared to iterating through an
array.

Linked lists are supposed to be O(1) for inserting at the head and O(n) for
iterating through. However, that's no match for an array's O(1) assignment and
O(n) iteration.
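
To make that concrete, here is a sketch of the two traversal loops (hypothetical code, not the program in question): both are O(n), but the array walk reads contiguous memory while the list walk chases one pointer per character, which is where the constant factor blows up.

    /* Hypothetical comparison: iterating a char array vs. a linked list of chars. */
    #include <stddef.h>

    typedef struct CharNode {
        char c;
        struct CharNode *next;
    } CharNode;

    /* Contiguous walk: cache lines and the hardware prefetcher do most of the work. */
    size_t count_array(const char *s, size_t n)
    {
        size_t hits = 0;
        for (size_t i = 0; i < n; i++)
            if (s[i] == 'a')
                hits++;
        return hits;
    }

    /* Pointer-chasing walk: every step is a dependent, possibly uncached load. */
    size_t count_list(const CharNode *node)
    {
        size_t hits = 0;
        for (; node != NULL; node = node->next)
            if (node->c == 'a')
                hits++;
        return hits;
    }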

~~~
adrianN
Big-O doesn't care about constant factors, but in practice you do. Linked
lists are one of the shittiest data structures on modern computers because
random memory access is one of the most expensive operations you can do. In
fact, on modern machines a random access is not an O(1) operation, it's O(log
n), due to the TLB.

------
rixed
    if((a->naels % incr) == 0){
        a->naels += incr;
Seriously?

~~~
shultays
That should grow the array by incr on every insert, right? I am assuming it is
not the actual code.

edit: there is full code at the end and it is the same there too.

