
When Big O Fools You - matthewwarren
https://jackmott.github.io/programming/2016/08/20/when-bigo-foolsya.html
======
justusw
This article gets a few things right, but it also skips over some important
parts.

Namely, the amortised time complexity of dynamic lists.

Amortised analysis treats operations not as single events but looks at the
time complexity over the span of many operations (through something called
"Accounting"). An initial "investment" of array over-allocation will be
amortised by inserting, but only over time. Inserting into an ArrayList will
be in O(1) given enough insert operations.

Essentially, you over-allocate in order to save time. You're trading extra
space for better time complexity.

A really good explanation on amortised analysis can be found here on
Wikipedia, which explicitly treats ArrayList:

[https://en.wikipedia.org/wiki/Amortized_analysis#Dynamic_Arr...](https://en.wikipedia.org/wiki/Amortized_analysis#Dynamic_Array)

As always, a look into the source code of your language of choice helps. In
Python, a list object over-allocates using a very peculiar, but finely tuned
formula:
[https://github.com/python/cpython/blob/09dc3ec1713c677f71ba7...](https://github.com/python/cpython/blob/09dc3ec1713c677f71ba7c536140ce12801a5036/Objects/listobject.c#L42)
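
For illustration, a minimal sketch of the same idea in C++ (a plain doubling
growth factor, not CPython's tuned formula or any particular runtime's exact
policy): the occasional O(n) copy is paid for by many cheap appends, which is
what makes append amortised O(1).

    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    // Minimal dynamic array of ints that doubles its capacity when full.
    // Across a long run of appends each element is copied O(1) times on
    // average, which is what "amortised O(1) append" means here.
    struct IntVec {
        int*        data = nullptr;
        std::size_t size = 0;
        std::size_t cap  = 0;

        void push_back(int x) {
            if (size == cap) {                            // out of room: grow
                std::size_t newCap = cap ? cap * 2 : 4;   // over-allocate
                int* newData = (int*)std::malloc(newCap * sizeof(int));
                if (size) std::memcpy(newData, data, size * sizeof(int)); // O(n), but rare
                std::free(data);
                data = newData;
                cap  = newCap;
            }
            data[size++] = x;                             // the common O(1) path
        }

        ~IntVec() { std::free(data); }
    };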

~~~
bfstein
The nuance you're missing here is that the author is always inserting at the
front of the arraylist. So even though the list grows dynamically, it still
needs to move every value one spot over.

e.g. arraylist: [0][1][2][3][ ]

Even though there is space left in the array, we still need to move all the
values one index to the right to be able to insert at the front, which takes
O(n) time. In contrast, inserting at the front of a linked list is simply a
matter of moving pointers and therefore constant time.
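
In code, roughly (a sketch of the mechanics, not the actual List&lt;T&gt;
implementation): even with spare capacity, a front insert has to shift every
existing element one slot to the right first, which is where the O(n) comes
from.

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Inserting at the front of a contiguous array: every existing element
    // must move one slot to the right before the new value can go in slot 0,
    // so the insert is O(n) even when spare capacity is already available.
    void insert_front(std::vector<int>& v, int x) {
        v.push_back(0);                           // make room (may reallocate)
        for (std::size_t i = v.size() - 1; i > 0; --i)
            v[i] = v[i - 1];                      // shift everything right: O(n)
        v[0] = x;
    }

    int main() {
        std::vector<int> v = {0, 1, 2, 3};
        insert_front(v, 42);                      // v becomes {42, 0, 1, 2, 3}
        for (int x : v) std::printf("%d ", x);
    }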

~~~
gens
Can't you just imagine it is reversed, where the_array[num_things-1] is the
first?

~~~
DougBTX
Yes, then it would be a discussion about append performance instead of insert
performance. Might be handy to do if you know that all the changes to an array
will be inserts at the front, then you can just write the array in reverse and
also read in reverse.
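
A small sketch of that trick: keep the logical front at the physical end of
the array, so a logical push-front becomes an amortised O(1) push-back, and
reads simply walk the array backwards.

    #include <cstdio>
    #include <vector>

    int main() {
        // Physical element 0 is the *logical last* element.
        std::vector<int> rev;
        for (int x : {3, 2, 1, 0})
            rev.push_back(x);                     // logical push-front, amortised O(1)

        // Iterate in logical order (front to back) by reading in reverse.
        for (auto it = rev.rbegin(); it != rev.rend(); ++it)
            std::printf("%d ", *it);              // prints 0 1 2 3
    }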

But it is helpful to keep the terminology clear, because an insert in the
middle of a linked list is still O(1), but inserting into the middle of an
array will require moving some fraction of the data.

The OP is discussing inserts at the start just because it is the worst-case
scenario for arrays; a positive result will then still hold if some of the
inserts are in the middle or even near the end.

~~~
thaumasiotes
> But it is helpful to keep the terminology clear, because an insert in the
> middle of a linked list is still O(1), but inserting into the middle of an
> array will require moving some fraction of the data.

This is true if you already have a reference to the middle of the list. If you
don't (say, because you want to insert while preserving the fact that the list
is sorted), then inserting into a linked list is O(n) just like the array is.

~~~
wolf550e
Memory reads and memory writes are not the same. Linked list insert in the
middle needs to do only O(1) writes. But because linked lists preclude memory
prefetch, they should not be used these days.

------
ncw33
In not so many words, the article is pointing out that O(n) + O(1) is not
necessarily quicker than O(n) + O(n): both add to O(n) (since only the
highest-order factor matters), and the constant factor is unsurprisingly
better for array lists.

He's not benchmarking an O(1) operation against an O(n), or anything
surprising, or even pointing out a "hidden" O(n) operation, it's simply a
demonstration that O(n) + O(1) = O(n).

~~~
JohnLeTigre
Of course this was a single example.

The main point is that big O notation ignores cache misses.

For example:

- try writing a quicksort as a template to avoid the comparison function calls

- in the sorting loop, switch to an insertion sort when buckets have <= 32 elements

You will see speeds 2 to 3 times faster than a vanilla implementation of
quicksort.

Although this approach is slightly more complex on the abstract level, it's
faster because it reduces a lot of cache misses.
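
Something along these lines (a rough sketch, not a tuned library
implementation; the 32-element cutoff is just the figure mentioned above): the
comparator is a template parameter so the calls can be inlined, and small
partitions fall back to insertion sort, which is friendlier to the cache and
the branch predictor.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Sorts a[lo..hi] (inclusive) with a templated comparator.
    template <typename T, typename Less>
    void insertion_sort(T* a, int lo, int hi, Less less) {
        for (int i = lo + 1; i <= hi; ++i) {
            T key = a[i];
            int j = i - 1;
            while (j >= lo && less(key, a[j])) { a[j + 1] = a[j]; --j; }
            a[j + 1] = key;
        }
    }

    template <typename T, typename Less>
    void hybrid_qsort(T* a, int lo, int hi, Less less) {
        while (hi - lo >= 32) {                   // large range: quicksort step
            T pivot = a[lo + (hi - lo) / 2];
            int i = lo, j = hi;
            while (i <= j) {                      // Hoare-style partition
                while (less(a[i], pivot)) ++i;
                while (less(pivot, a[j])) --j;
                if (i <= j) std::swap(a[i++], a[j--]);
            }
            hybrid_qsort(a, lo, j, less);         // recurse on the left part
            lo = i;                               // loop on the right part
        }
        insertion_sort(a, lo, hi, less);          // small range: insertion sort
    }

    int main() {
        std::vector<int> v = {9, 3, 7, 1, 8, 2, 6, 0, 5, 4};
        hybrid_qsort(v.data(), 0, (int)v.size() - 1,
                     [](int a, int b) { return a < b; });
        for (int x : v) std::printf("%d ", x);    // 0 1 2 3 4 5 6 7 8 9
    }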

~~~
tillytal
Surely the _main_ point is that

    
    
    O(n) + O(n) = O(2n) = O(n) = O(n + 1) = O(n) + O(1)

and the constant factors left out of O() often dominate execution time.

Cache performance is just one example of a constant factor, right?

~~~
JohnLeTigre
Cache performance is not constant; if it were, we wouldn't even consider it
when optimizing.

The point is to use data that was recently fetched into the (faster) cache
memory as much as possible, instead of incurring the penalty of a cache miss.

Cache performance really depends on memory access patterns.

Anyways :) I'm being pedantic here, I should probably go back to work.

~~~
hcs
Memory access time is a constant multiplier, whether it hits or misses the
cache. There can be a big difference in that constant depending on the access
pattern, but it is still considered a constant factor.

Yes, random access (missing the cache every time) may be 100x slower than
sequential (hitting the cache almost always), but if you're iterating through
an array twice as large, it will still be 100x slower.

~~~
JohnLeTigre
I agree: on the same machine, the slow-downs are constant for each level of
cache and for RAM access.

------
readams
A cache miss can be a slowdown of ~200x on modern CPU architectures. Let's say
you have a choice of an O(n) algorithm that always misses the cache and an
O(n lg n) algorithm that never misses the cache.

The crossover point where it makes sense to use the O(n) algorithm will occur
when:

200 * n = n lg n

which occurs for n > 2^200. Most of us are not dealing with problems larger
than can be represented in our universe, so cache efficiency matters, kids.

Note that this changes for other comparisons. Going from n maximally cache
inefficient vs n^2 cache efficient is only worth it for n up to 200, and n lg
n --> n^2 up to n about 2223.

Of course, in practice your algorithm won't be missing the cache on every
access, so reality is somewhere between these values.
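
A quick back-of-the-envelope check of those crossover points (a sketch that
assumes the same flat 200x penalty per access):

    #include <cmath>
    #include <cstdio>

    int main() {
        // 200*n = n*lg(n)  =>  lg(n) = 200  =>  n = 2^200 (beyond any real input).
        std::printf("n log n crossover: n = 2^200\n");

        // 200*n = n^2  =>  n = 200.
        std::printf("n vs n^2 crossover: n = 200\n");

        // 200*n*lg(n) = n^2  =>  n = 200*lg(n); solve by fixed-point iteration.
        double n = 2.0;
        for (int i = 0; i < 100; ++i)
            n = 200.0 * std::log2(n);
        std::printf("n log n vs n^2 crossover: n ~= %.0f\n", n);   // ~2223
    }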

------
stcredzero
_Make Sure the Abstraction is Worth It_

Yes! All abstraction has a price. This is what people really mean when they
say "all abstractions leak." Basically, all code and runtime features have a cost in
terms of developer resources, cpu, memory, etc. The "inner game" of
development isn't being able to muster huge amounts of "cleverness" power. The
"inner game" is being able to put whatever resources you have to the best
possible use.

~~~
johncolanduoni
This isn't true at all. What's the cost in developer resources, cpu, _or_
memory of making a complex number class in C++ a template over floats and
doubles? How about the cost of using Rust's generics to make a parser that can
parse from any linear source of bytes (e.g. both files and in memory byte
arrays)?

There's plenty of abstractions which don't actually cost anything, and there's
room for even more.
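
For instance, a sketch of that complex-number case: one template, two
instantiations, and after monomorphisation the generated code is essentially
what you would have written by hand for each type anyway.

    #include <cstdio>

    template <typename T>
    struct Complex {
        T re, im;
        Complex operator+(Complex o) const { return {re + o.re, im + o.im}; }
        Complex operator*(Complex o) const {
            return {re * o.re - im * o.im, re * o.im + im * o.re};
        }
    };

    int main() {
        Complex<float>  a{1.0f, 2.0f}, b{3.0f, 4.0f};
        Complex<double> c{1.0, 2.0},   d{3.0, 4.0};
        auto p = a * b;   // instantiates Complex<float>::operator*
        auto q = c * d;   // instantiates Complex<double>::operator*
        std::printf("%f %f / %f %f\n", p.re, p.im, q.re, q.im);   // -5 10 / -5 10
    }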

~~~
Retra
What's the cost in making a template more generic? Greater compile time. The
need for a compiler that can handle generics. Greater number of functions in
binaries, and thus more unique places for bugs to occur. Dynamic library
bloat. Harder to understand code. Quirky edge cases. Harder to learn
languages. Harder to _parse_ languages. And of course, developer time is spent
dealing with all of this.

Abstractions leak because they are _models_ , and thus are necessarily
different from the concrete things that they model. These differences can and
will cause problems. Generic programming is an abstraction that pretends you
aren't working on low-level un-typed system. And you pay for that abstraction
any time you need to deal with low-level demands through that abstraction.

~~~
johncolanduoni
> Greater compile time.

Not really with newer generics implementations (.NET, Rust, etc.). I don't
think anyone thinks C++ templates are ideal, and the others show this is at
least a non-essential cost.

> Greater number of functions in binaries, and thus more unique places for
> bugs to occur.

Less human written code, so this reduces the surface area to the compiler;
which is an "issue" no matter how low level your code is.

> Dynamic library bloat.

Your alternatives are A. duplicate the classes yourself B. let the compiler
duplicate them. Where's the bloat? C++ _requires_ you to explicitly
instantiate templates if you want them in a library, and usually they're just
left to the header. The others I listed won't generate anything unless you
actually use a particular instantiation.

> developer time is spent dealing with all of this

It's pretty hard to argue that more developer time is wasted by the compiler
writer than that of the time saved for developers who use the language for
such a broadly used feature.

> Abstractions leak because they are models, and thus are necessarily
> different from the concrete things that they model.

What concrete thing do generics model? Tediously copy-pasted code? These will
only be different if you mess up when you do the copy-pasting.

> Generic programming is an abstraction that pretends you aren't working on
> low-level un-typed system.

What? It's an abstraction over an _already_ typed system that obviates the
need to duplicate code, which after the generics are unwound produces the same
code you would have had anyway. You're just as far or close to the low-level
un-typed system as you were before.

"All abstractions leak" is a nice platitude, but handwaving about "models" and
"concreteness" doesn't prove anything.

~~~
Retra
You seem to be misunderstanding the generality at which I'm speaking. Take
this:

>It's pretty hard to argue that more developer time is wasted

That's NOT what I'm arguing. I'm saying that it takes a _different_ amount of
time. Hence the concept of "trade-offs", AKA "cost."

With that said...

> Your alternatives are A. duplicate the classes yourself B. let the compiler
> duplicate them. Where's the bloat?

C. Monomorphise at runtime.

> Less human written code, so this reduces the surface area to the compiler;
> which is an "issue" no matter how low level your code is.

Yes, but it's a different issue. Again, there are trade-offs.

>What concrete thing do generics model? Tediously copy-pasted code?

No. They model the behavior of the algorithm as it physically exists in a
machine.

> What? It's an abstraction over an already typed system that obviates the need
> to duplicate code, which after the generics are unwound produces the same code
> you would have had anyway. You're just as far or close to the low-level
> un-typed system as you were before.

Generics are part of your language. They do not abstract over the language.
They abstract over the behavior of your program. It's just a feature of the
language that does so with greater generality than the rest of the language.

And the code you "would have written anyway" is _machine code_. Which you
_are_ abstracting over. Generics allow you to pretend your algorithm doesn't
need a fixed machine representation. The fact that you can produce two
different machine representations for the same template code is an example of
what I'm talking about, not a counter example. The abstraction leaks. You
write one algorithm, and it has to be compiled into two different
representations. That's the cost of the abstraction. You get increased
generality at the cost of memory and time.

~~~
johncolanduoni
> Generics are part of your language. They do not abstract over the language.

Most generics systems allow the code to be unfolded into generic-free code;
this is how templates are usually compiled in C++. So in a material sense the
compiler treats templates as syntactic sugar added over C++ without templates
(which is a valid language, which the compiler uses internally).

> C. Monomorphise at runtime.

A lot of generics (like my complex number example) can't be monomorphized at
runtime since floats and doubles don't have dynamic dispatch in these
languages. Not to mention the loss of type safety even if you implemented it
this way. So that is not equivalent.

> And the code you "would have written anyway" is machine code.

That's not analyzing the cost of generics, that's analyzing the cost of the
whole language, and then putting it on generics. If that's fair, why wouldn't
we add the cost of designing and operating computers instead of doing it on
paper? There's nothing special about machine code here.

> The abstraction leaks. You write one algorithm, and it has to be compiled
> into two different representations. That's the cost of the abstraction.

If (as in my example before) you need complex numbers with floating point
numbers, and complex numbers with doubles, the compilation will likely be
_faster_ because the compiler doesn't need to take two parallel
implementations through the whole parsing process, and can instead generate
the necessary ASG itself. If you use only one, then maybe there is a
detectable differential cost.

Also, how is that even a leak? That's the abstraction working absolutely
perfectly and giving you exactly what you intended.

------
sn41
I agree with the fundamental point in the article, but isn't it well known
that when you consider multiple levels of the memory hierarchy as a whole, big
O notation needs to be modified to take into account the relative cost of
access?

I think the problem is not so much the big-O notation, but that the underlying
assumption that the data access is from an "all main memory" model with no
cache or secondary storage is often forgotten.

For example, in database algorithms, all memory operations are often
considered to be unit cost, since they are an order of magnitude cheaper than
accessing disk storage.

~~~
geophile
Big O notation is typically counting comparisons, and ignores constant
factors. It isn't that memory hierarchy is ignored so much as the equating of
comparison growth (in n) with performance.

The field of cache oblivious algorithms is focused explicitly on memory
hierarchy, and accounts for cache misses.

~~~
SamReidHughes
It's typically not counting comparisons; there are none when inserting
elements. But sometimes it pretends that comparisons are O(1).

------
CountHackulus
Big O isn't fooling you, it's giving you a one-function explanation of how the
algorithm runs on an idealized system as the size of the problem increases.
There's other systems you can simulate for big O, but the math is much harder
and you wouldn't generally use it unless you're doing something complex like a
cache-oblivious algorithm.

If you're using solely big O to decide on an algorithm, you're fooling
yourself. If performance is an issue, profile, benchmark, study, don't guess.

------
lziest
Big O notation is asymptotic. Using only 5 insertions to understand Big O is
definitely the wrong way. For example, insertion sort works better when the
array is small, but insertion sort is definitely O(n^2), worse than qsort.

Don't let Big O notation fool you, don't misunderstand Big O.

~~~
Guvante
"Inserts" in that chart is the ratio of inserts to reads.

5 means "write five times, then read once per iteration", not just "write five
times".

------
caf
_The result is that when you iterate over contiguous memory, you can access it
about as fast as the CPU can operate, because you will be streaming chunks of
data into the L1 cache._

This is not true at all. You'll be able to access it about as fast as the
_memory_ can operate - it'll still be much faster than randomly chasing
pointers all over the place - but the maximum processing speed of the CPU is
an order of magnitude greater than that again.

------
altendo
A lot of the focus in the HN comments is on ArrayLists, but I'm curious why
the author chose to give linked lists an O(1) insert time. In some
implementations (doubly linked list, inserting at the head of the list) I can
see the O(1) time, but when appending to the end it also is O(n) because one
has to traverse the entire list of elements before updating the link at the
end of the list. Might just be nitpicking here but that might also be
affecting the author's results.

~~~
tbirdz
Just keep a pointer to the head and tail of the list. You can then append to
either end of the list in O(1). When you put a new node at the tail end, use
the tail pointer to get to the tail node instead of walking the whole list. Of
course, you need to make sure the head/tail pointers are always updated to
point to the current head/tail of the list.
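
A minimal sketch of that (singly linked, so both push-front and push-back are
O(1); popping from the tail would still need a walk or a doubly linked list):

    #include <cstdio>

    struct Node { int value; Node* next; };

    struct List {
        Node* head = nullptr;
        Node* tail = nullptr;

        void push_front(int x) {
            head = new Node{x, head};
            if (!tail) tail = head;               // first element is also the tail
        }
        void push_back(int x) {                   // O(1): jump straight to the tail
            Node* n = new Node{x, nullptr};
            if (tail) tail->next = n; else head = n;
            tail = n;
        }
        ~List() { while (head) { Node* n = head->next; delete head; head = n; } }
    };

    int main() {
        List l;
        l.push_back(1); l.push_back(2); l.push_front(0);
        for (Node* n = l.head; n; n = n->next) std::printf("%d ", n->value);  // 0 1 2
    }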

------
Rhapso
The question I always ask before replacing an n^2 process with an nlg(n) one
is "Am I in that window where n^2 is actually faster?"

------
Const-me
Ported to C++, added two ATL collections, also added 100M elements data point:

[https://github.com/Const-me/CollectionMicrobench](https://github.com/Const-me/CollectionMicrobench)

Arrays are still generally faster than lists.

The funny thing is Microsoft’s linked lists are faster than C++ standard
vectors.

~~~
ric129
> The funny thing is Microsoft’s linked lists are faster than C++ standard
> vectors.

If I had to guess, it's because the std::vector is more conservative in memory
use and it causes more malloc/array copy calls.

~~~
Const-me
I think the main reason is that the CAtlList class encapsulates its own memory
pool. It allocates RAM in batches. The default batch size for CAtlList is 10
elements/batch, user-adjustable in the constructor, but I kept the default
value of 10.

The elements are created directly adjacent to each other. This makes iteration
faster because of RAM locality, despite the pointer-based data structure.
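
A toy sketch of the idea (not CAtlList's actual code): nodes are carved out of
fixed-size blocks, so consecutively inserted nodes end up adjacent in memory
and iteration stays reasonably cache friendly despite the pointer links.

    #include <cstdio>
    #include <vector>

    struct Node { int value; Node* next; };

    struct PooledList {
        static const int kBlock = 10;             // elements per allocation
        std::vector<Node*> blocks;                // owned blocks of nodes
        int used = kBlock;                        // slots used in the last block
        Node* head = nullptr;
        Node* tail = nullptr;

        Node* allocate() {
            if (used == kBlock) {                 // current block exhausted
                blocks.push_back(new Node[kBlock]);
                used = 0;
            }
            return &blocks.back()[used++];
        }
        void push_back(int x) {
            Node* n = allocate();
            n->value = x; n->next = nullptr;
            if (tail) tail->next = n; else head = n;
            tail = n;
        }
        ~PooledList() { for (Node* b : blocks) delete[] b; }
    };

    int main() {
        PooledList l;
        for (int i = 0; i < 25; ++i) l.push_back(i);   // 3 blocks, contiguous runs
        for (Node* n = l.head; n; n = n->next) std::printf("%d ", n->value);
    }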

------
AStellersSeaCow
There's a simpler and broader point: don't use Big O as the sole means of
analysis of a high level language's data structures. The theoretical
time/space complexity of a data structure may or may not accurately reflect
how it's actually implemented in that language.

------
jontro
I just read this nice piece on linked lists and why they have almost no place
in programming nowadays

[http://cglab.ca/~abeinges/blah/too-many-lists/book/](http://cglab.ca/~abeinges/blah/too-many-lists/book/)

------
nayuki
Arrays are fast for getting random elements and inserting/deleting at the end
(amortized), but slow for inserting/deleting in the middle.

Linked lists are fast for inserting/deleting/getting at the beginning and end,
but slow for random access for anything in the middle.

So far, that's what the article covered. Now if you want fast random
insertions/deletions and fast random access, this is possible with balanced
trees. A tree-based list can support all single-element operations in O(log n)
time. Sample code: [https://www.nayuki.io/page/avl-tree-list](https://www.nayuki.io/page/avl-tree-list)

~~~
edejong
That's not what the article is about, nayuki. The author of the article makes
and proves a claim that arrays are faster than linked lists regardless of the
insertion point. The metrics presented clearly show this.

The reason for this abnormality is cache locality (there are other reasons,
which I will not go into right here).

Balanced trees can be pretty slow in fact. After some operations, a tree
structure can become quite fragmented in memory, leading to many cache misses.
In my experience, it is often faster to work with arrays instead of trees when
processing in memory. However, with external storage, a B-tree often brings
quite a performance gain.

------
qwertyuiop924
Here's a question: How often does this _matter_? No, not that big O can fool
you. That always matters.

How often does it matter that non-contiguous memory access is slow? Really.
How much do those few microseconds really matter? In most apps, I would guess
that a CPU cache miss isn't noticeable by humans.

Yes, non-contiguous structures are significantly slower, but if you don't need
to be as fast as possible, eliminating them for that reason only (assuming
that there is any non-perf reason to stick with them) is a premature
optimization.

But if you ARE optimizing, yeah, you need to think about how often worst-case
occurs. Because big O only tells you about worst-case. NOT the average.

~~~
corysama
_A_ cache miss isn't noticeable by a human. Code that cache misses a lot runs
10-100x slower than code that takes into consideration that it's running on a
physical machine and not an abstraction. That is very noticeable by humans.
Even when your data structures and algos are designed with a nice O(logN),
it's very noticeable when one program bogs down with 1/6 the data compared to
another.

I work in games, so the story I tell new kids is: The PlayStation2 ran at
300Mhz and a cache miss would cost you 50 cycles. The PlayStation3 ran at
3200Mhz and a cache miss would cost you 500 cycles. So, if you are cache
missing a lot, your PS3 game will run as if it were on a PS2.

In other words, not paying attention to cache makes your computer run like it's
10 years older than it is. You paid for a modern machine, but you are getting
the results of a Craigslist junker. This is true outside of games. It's the
reason 4x2Ghz cellphones struggle to run seemingly simple apps. It's a big
part of the reason people struggle to orchestrate armies of servers
(distributed computing is easier when it's 90% less distributed).

Is it really harder to work with the cache system instead of ignoring it?
Yeah, it requires a tiny bit of study and a little bit of planning. In
contrast, the theme I see a lot online is to completely dismiss physical
reality in favor of theory. And, the theme I see almost universally in the
students (and many senior engineers) I interview is complete ignorance of the
very existence of cache and its effects on the code they write. It's very
concerning...

~~~
qwertyuiop924
No, I don't deny it's important to know that cache misses exist, and what they
can do: to the contrary, it's vital.

However, in 90% of applications, it's not going to matter, because those
applications are spending hundreds of cycles waiting anyways: Disk or network
IO, user input, all that stuff is way slower than a cache miss. If you're
writing a video game, or a database, or other software with very high soft-
realtime speed requirements, or heavy data access, by all means, optimize to
avoid cache miss.

But if you're writing a company-internal Rails app, nobody's going to notice,
even if you're getting cache miss after cache miss. Which you probably won't.

Actually, if your language isn't compiled, a cache miss is the least of your
worries, perf-wise.

And now I've got to see if I can optimize my code to avoid cache misses. But
the code's in Scheme, so unless the initial access cost is amortized, I'm
already doomed...

~~~
Koromix
You're looking at this problem backward. For example, you mention user input.
Users may need a second to click or touch a button, but when they do the
software should react _instantly_ , and that does not leave you that many
cycles. My smartphone's lock screen is my go-to example: most times it fails
to follow my finger, and I barely have anything running on it.

Most of the dynamic languages are data and instruction cache-miss machines.
They chase objects and pointers all around the memory.

~~~
qwertyuiop924
> My smartphone's lock screen is my go-to example: most times it fails to
> follow my finger, and I barely have anything running on it.

...That doesn't sound like a cache miss. Knowing Android, a cache miss is
probably the least of your worries.

> Users may need a second to click or touch a button, but when they do the
> software should react instantly, and that does not leave you that many cycles.

You raise a good point...

> Most of the dynamic languages are data and instruction cache-miss machines.
> They chase objects and pointers all around the memory.

...and this is part of my point. If you look at a problem and think, "A high-
level language is fast enough," then you are implicitly saying that the
latency of a cache miss is acceptable. And IME, in most cases that's true. I
mean, heck, I'm using Scheme, so while I may have pointer chases like the
Amazon has trees, I CAN optimize them into array lookups, and my code is
compiled: not great, but better than most HLLs.

It's the same argument as always: perf vs. development speed. You can be in
the C and FP loop, or the Lisp and JS loop.

~~~
Koromix
> That doesn't sound like a cache miss. Knowing Android, A cache miss is
> probably the least of your worries.

My example was meant to illustrate the user input problem. From what I know
about Android, the abysmal performance is very much a case of "death from a
thousand cuts".

> It's the same argument as always: perf vs. development speed. You can be in
> the C and FP loop, or the Lisp and JS loop.

The fast(er) languages we have are old and full of warts, and that makes them
slow to develop in. The heavily used HLLs such as Python and Ruby were made by
people who did not care much (at all?) about performance, and it shows in many
design decisions. But here's the thing: we could have both at the same time. I
don't buy this dichotomy.

~~~
qwertyuiop924
> But here's the thing: we could have both at the same time. I don't buy this
> dichotomy.

That's actually not true: OO, dynamism, late binding, and a lot of the other
things that HLLs have to offer require a lot of pointer chasing and
non-contiguous data structures. I'm mostly a Schemer, and Scheme and Lisp have had
decades of research put into making them compile and run fast. Most dynamic
languages aren't so lucky. But the required pointer chasing and garbage
collection mean they'll never be as fast as C.

Functional programming languages, however, are rarely late-binding, and don't
expose as much about their implementation, so some of the pointer chasing can
be avoided.

Rust doesn't need a GC, and is fairly C-like - or rather, ALGOL-like and
BLISS-like - with added memory safety. So with a programmer who knows what
they're doing, it can be pretty fast. But here's the rub: the faster a
language is, _the closer it has to be to the metal,_ and the less it can do
with high-level features.

So yes, you can make HLLs faster, but you can't take the cache misses out of
an HLL, and you can't make a systems language wearing an HLL's clothing -
although Rust is making an admirable attempt.

~~~
Koromix
> OO, dynamism, late binding

None of these are required for ease of development. At least the first two
often result in precisely the opposite.

HLLs are not required to focus on slow abstractions. For example, homogeneous
arrays of tagged unions can replace inheritance most of the time. And they
avoid breaking your code in 10 files and 20 classes (though for some reason
this metric is seen as a good thing way too often).
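
For example, a sketch of that style in C++ (std::variant standing in for a
hand-rolled tagged union): the shapes live by value in one contiguous vector
instead of being separately heap-allocated behind base-class pointers.

    #include <cstdio>
    #include <type_traits>
    #include <variant>
    #include <vector>

    struct Circle { double radius; };
    struct Rect   { double w, h; };

    using Shape = std::variant<Circle, Rect>;     // the tagged union

    double area(const Shape& s) {
        return std::visit([](const auto& v) -> double {
            if constexpr (std::is_same_v<std::decay_t<decltype(v)>, Circle>)
                return 3.14159265 * v.radius * v.radius;
            else
                return v.w * v.h;
        }, s);
    }

    int main() {
        std::vector<Shape> shapes = {Circle{1.0}, Rect{2.0, 3.0}, Circle{0.5}};
        double total = 0;
        for (const Shape& s : shapes) total += area(s);   // one contiguous pass
        std::printf("total area = %f\n", total);
    }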

~~~
qwertyuiop924
OO, perhaps, but without dynamism and late binding, metaprogramming is
difficult, as is any number of techniques. Like, for instance, class
generation, and extending methods.

And while I don't set much store by inheritance, I set considerably more
store by duck-typing and polymorphism.

> None of these are required for ease of development. At least the first two
> often result in precisely the opposite.

So are you talking about Java-style OO? Because I was talking about Smalltalk
OO, which is pretty different.

 _Required,_ no, but they're tools, and they come in handy. They certainly make
development a lot more comfortable.

And some of the time, they make development a lot easier.

~~~
Koromix
> Like, for instance, class generation, and extending methods.

This is precisely the kind of thing that leads to unmaintainable "magic" code,
even though it can still be useful but with _extreme_ moderation. So I don't
see the point of making that a core feature of any language.

If you have any example to the contrary, I'd love a link to a good open-source
project that uses these things extensively.

~~~
qwertyuiop924
RSpec? A lot of ruby uses metaprogramming in some respect.

As for not seeing the point of making it a feature of your language, how about
Lisp? And I'm not just talking macros. Lambdas, late binding, and in Scheme,
the ability to rebind anything lead to a lot of cool tricks and capabilities.

And late binding is extraordinarily important!

~~~
Koromix
I don't consider the Ruby ecosystem to be a good example of much. Idiomatic
Ruby code is much slower and usually not more maintainable than C++. Actually,
it may even be worse thanks to dynamic typing, which makes refactoring much
more painful than it already is in large code bases.

Well I guess it's good at making CRUD web sites. Hardly rocket science.

~~~
qwertyuiop924
Okay then: Lisp.

Tinyclos is a fairly sophisticated implementation of OO and the MOP, written
in Scheme.

Its descendants, COOPS, GOOPS, and others, are in most schemes today. Many of
them are written in their respective dialect of scheme, with little or no
specific compiler support.

SXML allows for writing XML in native scheme syntax.

The anaphoric macros (aif, acond, etc.) are all, well, macros, and thus use
metaprogramming principles.

tclOO, [incr tcl], and other OO TCL systems are usually implemented in regular
TCL.

Give or take, any large Lisp or Smalltalk codebase takes advantage of dynamic
typing, late binding, and some form of metaprogramming.

However, you've made it clear that you hate Ruby, Dynamic Typing, and other
such things, and given that much of metaprogramming requires this sort of
flexibility, I very much doubt anything I say will convince you that dynamic
languages are in any way useful.

~~~
Koromix
I don't hate them. I've used python more than once, and will continue to do
so. And I think it's great teaching material. It's a good _scripting_
language. But I think its drawbacks far outweigh its advantages for large
projects.

All your examples are programming gimmicks, and I've yet to see stuff that
solves actual hard problems. I'm not interested in programming for
programming's sake. I want to use it to make my computer do _useful_ stuff.

~~~
qwertyuiop924
Gimmicks? Some of them, yes, but CLOS and its descendants are used in Real
applications, as are the TCL OO systems. But if you want Real World, I'll give
you real world.

Maxima is a descendant of the original MACSYMA, developed at MIT. It is still
a usable and viable system, even if it has aged a bit.

Emacs is a popular programmer's text editor written in C and a dialect of
lisp.

Both of the above programs are large, useful, and written in an HLL - one
particularly amenable to pointer chasing, I might add - and they make use of
the variety of abstractions which that HLL provides.

If those aren't modern enough for you, check out some Clojure applications.

------
billconan
I think the overemphasis of big O, especially during job interviews, is a sad
thing.

I think multi-threading is an equally important skill, that gets less
attention.

~~~
SamReidHughes
It's not. The reason is, a lot of stuff isn't multithreaded or is just a bunch
of threads talking to a database. (I've been asked questions about
multithreaded stuff, in interviews at companies that specifically did
multithreaded stuff. Companies that talked to databases would be better off
asking SQL stuff.)

------
jwatte
One thing I didn't see in the article: in addition to cache misses, the list
stores 3 words per word; the array between 1 and 2 (up to 3 only when
reallocating), so the array touches less RAM, even when inserting.
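
Roughly, assuming a typical 64-bit doubly linked node layout (two pointers
plus the value, before any per-allocation overhead from the heap):

    #include <cstdio>

    // What a std::list<long>-style node roughly keeps per element.
    struct Node { Node* prev; Node* next; long value; };

    int main() {
        std::printf("array element: %zu bytes\n", sizeof(long));   // 8  (1 word)
        std::printf("list node:     %zu bytes\n", sizeof(Node));   // 24 (3 words)
    }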

------
yandrypozo
The author is doing fewer than 10 insertions per benchmark; that's practically
a constant, O(1). Even if you have to copy the entire array 5 or 10 times, it
doesn't affect the Big O analysis.

~~~
Moto7451
But he's inserting at the front of an array, which requires the rest of the
elements to be shifted over (or copied into a new array when it has to grow).
DotNetPerls has a clearer example:
[http://www.dotnetperls.com/list-insert](http://www.dotnetperls.com/list-insert)

------
akandiah
Technically speaking, he ought to be using Big Theta (Θ) to describe his
bounds. Throwing Big O around to describe everything is foolish.

------
kazinator
The benefits of the list abstraction are small---because it's a clumsy, blub-
like list abstraction with a container object and iterators.

Look at the code; termination of the loop is even based on an integer count
pulled from the container.

A True Scotsman's linked list has no such thing. It's either an empty
indicator (like NIL in Lisp) or a binary cell consisting of an item and a
pointer to the next one.

The benefit of that abstraction is that you can recurse over it directly
without clumsy additional parameters having to be passed.

Another benefit is substructure sharing. We can insert at the front of a list
in O(1) time, which is great. But perhaps more importantly, existing copies of
the list before the insertion _do not change_. And it is the same way if we
delete from the front: we just move the local head pointer forward to the next
node, which doesn't affect anyone else who is still holding on to the pointer
to the original front node.

These lists also allow lock-free operation, unlike "heavy weight" containers.
Suppose we have a list that acts purely as a container, but is shared by
multiple threads. We can insert into it by consing a new node onto the head of
the current snapshot of the list, and then doing an atomic-compare-swap to
install that head in place of the old head. If it fails, it means the list
changed behind our back; we cons the node onto the new list (or rewrite the
cons to point to the new one as its "rest" pointer) and try again.
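
A sketch of that lock-free push in C++ (essentially a Treiber stack; memory
reclamation and the ABA problem are ignored here to keep it short):

    #include <atomic>
    #include <cstdio>

    struct Node { int value; Node* next; };

    struct LockFreeList {
        std::atomic<Node*> head{nullptr};

        void push_front(int x) {
            Node* n = new Node{x, head.load()};   // cons onto a snapshot of the head
            // On failure compare_exchange_weak reloads the current head into
            // n->next, so the loop body is simply "retry".
            while (!head.compare_exchange_weak(n->next, n)) {
            }
        }
    };

    int main() {
        LockFreeList l;
        l.push_front(1); l.push_front(2);         // single-threaded demo of the CAS loop
        for (Node* n = l.head.load(); n; n = n->next)
            std::printf("%d ", n->value);         // prints 2 1
    }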

Some of the caching benefits of the array will disappear if the array holds
only references/pointers to items. In this example, the containers are typed.
The List<int> actually can allocate the int objects in an array that packs
them together in memory, whereas the LinkedList<int> has individual separately
allocated nodes which hold the int.

Suppose the List and LinkedList hold pointers to heaped objects. Then the
impact of caching is softened. It's still the case that multiple pointers in
the array can be cached together; but these have to be traversed to access the
items themselves. In the case of the LinkedList, we have to traverse a pointer
to get to the next node and traverse a pointer to get to the heaped object.
But two pointer traversals versus one is not as bad as one traversal versus
zero.

If the objects being traversed are fairly complex, and have pointers to
additional objects that have to be examined during the traversal, it matters
even less what kind of list they are accessed from. If I have a list of File
objects, for each of which a stream is opened and a regex scan performed, why
would I care whether it's an array list or a linked list?

The results shown in this test case tell me that the performances of the two
containers are not that far off! Here they are subject to a test case that is
designed to highlight the difference between them by making the container
content, and its processing, almost as trivial as possible. That 38 seconds
versus 51 difference is almost purely in the container-related operations.
That is as bad as it gets: from there, the more actual real work you do per
container node, the smaller the actual difference. (What is 51 versus 38 in
"orders of magnitude"? Why 0.12 orders. It's 0.42 "binary orders of magnitude"
(where 1 binary order is a doubling; terminology mine). So in terms of classic
Moore's Law (speed doubling every 18 months), that's a 7.6 month advance.) "My
arrays are 7.6 months ahead of your linked list container, in Moore's Law, in
a pure benchmark; eat my dust!"

------
happytrails
Oprah fooled me once!

------
ctvo
If the author sees this: please consider changing your color choices. It'd
make reading the content you put so much work into producing easier for
everyone. I couldn't finish the post. The 90s hacker's lair colors were that
offensive.

~~~
T0T0R0
The process of weeding out. Sometimes it's better to select against sensitive
people.

~~~
acbabis
Why would you select against people with poor vision?

~~~
k__
The green seems a bit harsh, yes.

But all in all it is a good page for people with poor vision.

No low-contrast and no white background, which are both considered bad on
screens.

~~~
acbabis
Yes, the white on black is fine. I checked the #8a7ae2 purple
([http://webaim.org/resources/contrastchecker/](http://webaim.org/resources/contrastchecker/))
and, surprisingly, it passes WCAG AAA guidelines. I say "surprisingly" because
it hurts my eyes. I think jumping between the alternating white and purple is
what does it.

------
Kenji
This is a great point and I'll probably have to do some performance
measurements and change parts of my code now. I should be more careful with
sacrificing contiguous storage for O(1) insertions.

It is also worth noting that some manuals say that appending to an array list
is O(1) amortized. Which is true, if you make an amortized analysis (which
essentially distributes the workload of copying the array into a larger block
over all the inserts). That's something to keep in mind for systems that have
to be realtime or at least produce stable framerates. The worst-case is important
and amortized analysis generously glosses over it.

EDIT: Not inserting, appending

~~~
ncw33
Insertion into an array list at a uniformly-distributed location is always
O(n): you can't avoid moving half the list. Appending is amortized O(1).

~~~
rifung
Not the person you're responding to but perhaps that person meant assuming
your insertions are uniformly distributed? Then I think insertion is O(n)
amortized..?

At an insertion you'd have

(n + (n-1) + .. + 1)/n = O(n^2) / n = O(n)

The first part comes from summing over the possible insertion positions, each
occurring with probability 1/n. You might have to expand the list, but that'd
also be O(n) amortized.
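
A quick empirical check of that (a toy measurement: count how many elements
get moved when inserting at uniformly random positions into a std::vector; the
vector grows from 0 to N during the run, so the expected shift per insert is
about N/4, i.e. linear in the size):

    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    int main() {
        std::srand(12345);
        std::vector<int> v;
        long long shifted = 0;
        const int N = 20000;
        for (int i = 0; i < N; ++i) {
            std::size_t pos = std::rand() % (v.size() + 1);   // uniform position
            shifted += (long long)(v.size() - pos);   // elements moved right by this insert
            v.insert(v.begin() + pos, i);             // O(n - pos) shift inside the vector
        }
        std::printf("average elements shifted per insert: %lld (expected ~N/4 = %d)\n",
                    shifted / N, N / 4);
    }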

