
The Myth of RAM (2014) - ddlatham
http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-part-i
======
Tojot
It so happens that a large part of my PhD was on this very subject. The result
I got was N log(N); this is more visible when you get to larger RAM (I had 0.5
TB of RAM at the time). We have an empirical result, a justification and a
rigorous predictive model.

The reason has to do with hashing, but a different type: TLB.

I posted more details as
[https://news.ycombinator.com/item?id=12385458](https://news.ycombinator.com/item?id=12385458)

~~~
lorenzhs
Thanks, I came across your research before and thought it was quite cool. In
Section 8.5, when discussing whether hash tables would be suitable for
handling TLB misses, wouldn't denial of service attacks also be a concern? If
an attacker knows the hash function and can control the addresses being looked
up, they might be able to trigger worst-case behaviour on every lookup,
couldn't they?

~~~
Veedrac
That's addressed in that very section:

> Moreover, an adversary can try to increase a number of necessary rehashes.

It seems to me that the section is a bit too dismissive, though, as there are
hash tables and hash functions that mitigate these concerns. In particular,
collisions can be replaced with trees, like in Java, limiting the worst case
to O(log n) again.

~~~
lorenzhs
Rehashes and bad probing behaviour aren't quite the same thing, but I'll let
it count and admit I may have replied too quickly ;)

------
aaronbwebber
The problem with this analysis is that in the graph in the very first part he
shows that memory access IS O(1) for pretty substantial scaling factors, and
then when you hit some limit (e.g. size of cache, size of RAM) access times
increase very rapidly. Sure, if you draw a line across 6 orders of magnitude,
it ends up looking like O(n^1/2), but how often do you scale something through
6 orders of magnitude?

The "memory access is O(1)" approximation is pretty good, certainly good
enough for almost all everyday use. The median size of a hash table I
allocate definitely fits in L1 cache, so why shouldn't I think of it as O(1)?
If you are reading off of disk, the O(1) approximation holds as long as your
dataset stays between 1 MB and 1 GB. That's quite a bit of room to play around
in.

Yes, you need to be aware of access times and the changes in them if you are
really scaling something way up. But I'm not convinced that I shouldn't just
keep thinking of "hash access is O(1)" as a convenient, generally accurate
shortcut.

~~~
xenadu02
It is trivial to exceed the L1 cache size and not that uncommon to exceed L2.
That brings us to a 100x delta which is worth thinking about, no? Even
exceeding L3 isn't horribly rare for average desktop CPUs (let alone mobile
devices).

There are other dimensions to this too like prefetching, streaming,
pipelining, vectorization, etc. In some cases using an array and doing a
linear search is faster than any hashmap on a modern CPU.

I think the takeaway from this article is not to blindly trust the theoretical
big-O numbers for data structures and actually test them with realistic
datasets on your target hardware.
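
A rough sketch of how to see those deltas for yourself (sizes, the LCG
constants and the iteration count are arbitrary; since the index computation
doesn't depend on the loaded value, this measures something closer to
random-access throughput than pure latency, but the cache steps still show up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Time random reads over working sets of increasing size.  Expect
       roughly flat ns/read within a cache level and a jump when the
       working set spills into the next level. */
    int main(void)
    {
        const size_t max_bytes = 64 * 1024 * 1024;   /* 64 MiB, > typical L3 */
        size_t *buf = malloc(max_bytes);
        if (!buf) return 1;

        for (size_t bytes = 16 * 1024; bytes <= max_bytes; bytes *= 2) {
            size_t n = bytes / sizeof(size_t);
            for (size_t i = 0; i < n; i++) buf[i] = i;

            const long reads = 10 * 1000 * 1000;
            size_t idx = 0, sink = 0;
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (long r = 0; r < reads; r++) {
                idx = (idx * 1103515245 + 12345) % n;  /* cheap pseudo-random index */
                sink += buf[idx];
            }
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
            printf("%8zu KiB: %.2f ns/read (sink=%zu)\n",
                   bytes / 1024, ns / reads, sink);
        }
        free(buf);
        return 0;
    }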

~~~
aaronbwebber
This is a very good point. You should run actual performance tests. The
results may surprise you!

In my actual work, optimizing in-memory searches is not worthwhile because
saving half a dozen microseconds on a search is not really valuable when you
are about to spend half a dozen milliseconds making a database call (which is
technically O(log N), since there's a B-tree index back there). But I do spend
quite a bit of time figuring out which of those queries should be moved to an
in-memory O(1) cache (redis or memcache, depending).

And I actually never think about the perf tradeoff between the DB and the
cache being O(log N) vs O(1) - I think of it in terms of the median and 99th
%tile times that I know from monitoring our actual production servers. So as
xenadu points out, I guess there is some truth in the article, but it's just
sort of lost in this somewhat academic discussion of scaling things to
infinity and beyond.

~~~
pjc50
_In my actual work, optimizing in-memory searches is not worthwhile_

Funnily enough, I had to do this last week. We started from the question "why
does it take 160ms to scan 60,000 objects in memory?" and reached the
disappointing conclusion that, on this particular platform (iMX6/800MHz/DDR3)
a cache miss costs you an astonishing _233ns_. I got as far as double-checking
all the RAM initialisation parameters before giving up.

There is now an index on the in-memory search.

------
scott_s
> For the purpose of this series of articles I'll be using the O(f(N)) to mean
> that f(N) is an upper bound (worst case) of the time it takes to accomplish
> a task accessing N bytes of memory (or, equivalently, N number of equally
> sized elements).

That's not really valid; it's not how algorithmic analysis works. The author's
conclusion for what is happening and why is correct, but I believe he is
confused about how to get there.

Simply, when doing complexity analysis on an algorithm, one must always count
an operation. It's not okay to point to the time taken for an implementation
and say "That's our function." It is _a_ function, but it's a function of
time, not a count of how many operations are performed at given sizes of N.

However, he _is_ correct that naive analysis of arrays and linked lists will
result in this odd behavior: arrays will tend to outperform lists on real
systems. The problem with the naive analysis is in what it counts. For
example, on an insert, a naive analysis will count the number of elements
accessed in the structure. That's naive because it assumes all accesses are the
same - which is what he's getting at with the "myth of RAM". Because of the
memory hierarchy, they are _not_ all equal.

But the correct response is not to give up counting operations and look at
time, the correct response is to find the right thing to count. And the right
thing to count is basically going to be last level cache misses - the
operations that force one to go to memory. If you do that, then you will find
that the operations you are counting will correlate much better to the actual
time spent.
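
To make "find the right thing to count" concrete, here is a toy sketch (the
parameters are entirely made up: a single direct-mapped cache with 64-byte
lines) that counts modelled misses for a sequential scan versus a scattered
traversal of the same elements. Real caches are associative and multi-level,
so the numbers only illustrate what the counted operation would be:

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy model: one direct-mapped cache with 64-byte lines; we count misses
       only, i.e. "accesses that go out to memory" rather than "elements
       touched". */
    #define LINE   64
    #define NLINES 512                       /* 32 KiB toy cache */

    static long misses;
    static unsigned long long tag[NLINES];

    static void touch(const void *p)
    {
        unsigned long long line = (unsigned long long)(size_t)p / LINE;
        unsigned long long set  = line % NLINES;
        if (tag[set] != line) { tag[set] = line; misses++; }
    }

    struct node { long v; long pad; };       /* 16-byte elements */

    int main(void)
    {
        const long n = 1L << 20;             /* ~1M elements, 16 MiB total */
        struct node *nodes = calloc(n, sizeof *nodes);
        if (!nodes) return 1;

        /* Sequential scan: about n * sizeof(node) / LINE misses. */
        misses = 0;
        for (long i = 0; i < n; i++) touch(&nodes[i]);
        printf("sequential scan:     %ld misses / %ld accesses\n", misses, n);

        /* Scattered order (a stand-in for chasing pointers through nodes
           spread over the heap): close to one miss per access. */
        const long stride = 600007;          /* odd => visits every index once */
        misses = 0;
        for (long k = 0, idx = 0; k < n; k++, idx = (idx + stride) % n)
            touch(&nodes[idx]);
        printf("scattered traversal: %ld misses / %ld accesses\n", misses, n);

        free(nodes);
        return 0;
    }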

In some places, the author gets this mostly correct: "You can also use Big-O
to analyze the time it takes to access a piece of memory as a function of the
amount of memory you are regularly accessing." That's fine, as you're counting
memory accesses.

In other places, it's not correct: "That I use Big O to analyze time and not
operations is important." You can't count time, only operations. You want to
count the operations that correlate with your actual running time, but the
entire _point_ of good analysis is to find those operations. You can't just
shortcut it, only measure time, and then call it algorithmic analysis.

The author gets a lot right, but despite the lengthy discussion, I think he
still has some confusions about algorithm complexity analysis.

For the record, these lessons should be familiar to anyone who has done
serious performance analysis of computer systems, either on their own, or in
the context of a course that focused on systems or architecture.

~~~
_delirium
Fwiw theoretical computer scientists do this kind of analysis as well, while
the post makes it sound like they don't. The equal-cost model of RAM
operations is only one model used in asymptotic algorithm analysis, and there
are other cost models with other properties. The introduction of this paper
gives a decent concise overview of some of them:
[http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf](http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf)

------
ChuckMcM
Since it is a topic I'm interested in, I took the time to read all 4 parts. The
author manages to summarize it in a paragraph which would have been helpful at
the beginning:

 _When somebody says “Iterating through a linked list is a O(N) operation”
what they mean to say is “The number of instructions needed to be executed
grows linearly with the size of the list.”. That is a correct statement. The
argument I’m trying to make is that it would be a mistake to also assume that
the amount of time needed would grow linearly with the size of the list as
well. This is an important distinction. If you only care about the number of
instructions executed that’s fine, you can use Big-O for that! If you care
about the time taken, that’s fine too, and you can use Big-O for that too!_

Sadly, he doesn't take this knowledge to its conclusion. Let's introduce the
notation Oi() for the Big-O notation in instructions, and Ot() for the Big-O
notation for time.

Lemma: For all f(N), if Oi(f(N)) > Oi(g(N)), then Ot(f(N)) will be > Ot(g(N)).

Or put another way, it's important not to confuse complexity scaling with time
scaling, but the more complex the computation, the longer it will take.

~~~
DanWaterworth
> if Oi(f(N)) > Oi(g(N)), then Ot(f(N)) will be > Ot(g(N)).

That doesn't hold. Some instructions take longer than others.

~~~
jacobparker
If all instructions complete within some constant deadline independent of
N/the input then it does hold even if they take different amounts of time.

~~~
DanWaterworth
It doesn't hold for binary tree lookups vs B-tree lookups.

~~~
malisper
The hypothesis of the statement doesn't hold. O(n log n) isn't greater than
O(n log n).

~~~
DanWaterworth
You have to evaluate it in a model with an explicit cache block size b; then
O(log_b(n)) < O(log(n)).

~~~
SamReidHughes
O(log_b(n)) is the same as O(log n). With O(sqrt i) memory accesses you still
have a constant factor separation between the absolute time performance of the
two.

~~~
DanWaterworth
> O(log_b(n)) is the same as O(log n)

Not in every abstract model. See other reply.

~~~
SamReidHughes
It has nothing to do with the model and is just a question of whether b is a
variable in the big O notation.

And in this case of O(sqrt i) memory access times, a binary tree and b-tree
stay within a constant factor even as you vary b. (The reason is, the binary
tree accesses that a single b-sized access replaces get exponentially more
"local.")

~~~
DanWaterworth
> It has nothing to do with the model and is just a question of whether b is a
> variable in the big O notation.

There are models, like the cache oblivious model, where b is assumed to be a
variable, so the model matters.

> a binary tree and b-tree stay within a constant factor even as you vary b.

it's not a constant factor if b is a variable. It's a factor of log b.

~~~
SamReidHughes
> it's not a constant factor if b is a variable. It's a factor of log b.

It's between 1/(sqrt(2)-1) and sqrt(2)/(sqrt(2)-1), depending on how full the
b-tree nodes are.

~~~
DanWaterworth
I don't know where you got those numbers from.

~~~
SamReidHughes
> The reason is, the binary tree accesses that a single b-sized access
> replaces get exponentially more "local."

sqrt(n)+sqrt(n/b)+sqrt(n/b^2)+... versus sqrt(n)+sqrt(n/2)+sqrt(n/4)+...

And since b-tree nodes can be half empty, there's a sqrt(2) uncertainty. (And
of course there is no other memory overhead at all, none whatsoever.)
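
A quick numeric check of those constants, under the same assumption (cost
sqrt(m) to touch a level of a tree that still spans m elements; half-empty
nodes modelled simply as the structure occupying twice the space, ignoring the
reduced fanout). The ratios approach the two quoted limits as b grows; n is an
arbitrary choice:

    #include <stdio.h>
    #include <math.h>

    /* Sum the per-level access costs of a root-to-leaf walk, assuming that
       touching a level which still spans m elements costs sqrt(m). */
    static double path_cost(double n, double branch)
    {
        double cost = 0.0;
        for (double m = n; m >= 1.0; m /= branch)
            cost += sqrt(m);
        return cost;
    }

    int main(void)
    {
        const double n = 1e12;                     /* arbitrary element count */
        double binary = path_cost(n, 2.0);

        for (double b = 16; b <= 4096; b *= 4) {
            double full = path_cost(n, b);         /* fully packed b-tree nodes  */
            double half = path_cost(2.0 * n, b);   /* half-empty nodes: 2x space */
            printf("b = %4.0f   binary/full = %.3f   binary/half-empty = %.3f\n",
                   b, binary / full, binary / half);
        }
        printf("quoted limits: sqrt(2)/(sqrt(2)-1) = %.3f, 1/(sqrt(2)-1) = %.3f\n",
               sqrt(2.0) / (sqrt(2.0) - 1.0), 1.0 / (sqrt(2.0) - 1.0));
        return 0;
    }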

~~~
DanWaterworth
Thanks, I see. Are you assuming ephemeral usage of nodes to make that claim?

~~~
SamReidHughes
I don't know what that means. I'm assuming a random element of the tree is
picked, that parent nodes are in a smaller or equal cache level than children,
that all but one of the cache levels are used completely, that each cache level has
O(sqrt n) access time, and that there is an upper bound on the ratio between
successive cache sizes.

Or less generally: it takes sqrt(j) nanoseconds to dereference the pointer
with value j, and parent nodes are at smaller addresses than their children.

~~~
DanWaterworth
Right, you're assuming there's one tree being maintained.

~~~
SamReidHughes
Or any fixed number of them, or any curve where the number is polynomially
smaller than the total size...

------
reikonomusha
I think there's some good info in this article, mixed with varying degrees of
misinformation. For some reason, the article starts off with a totally
wrong definition of big-O, and proceeds to draw conclusions from this wrong
definition. Let me provide the accurate definition:

The statement "f is O(g)" means there exists some input, call it t, such that
for every x >= t, it only takes some constant multiplier M (i.e., constant in
x) to _always_ have g absolutely no smaller than f. In notation:

|f(x)| <= M * |g(x)|, where x is at least t.

This bit about "x is at least t" is very important and notifies us that this
is "asymptotic behavior".

It does not make a difference how wacky or weird f is compared to g below t.
It can contain all these crazy memory hierarchy artifacts, it could contain a
short burst of exponential slowdown, it could contain anything. For example,
f(x) = 100x is O(x²) even though f exceeds x² everywhere below x = 100.

Furthermore, according to the above definition, big-O has nothing to do with
any tangible quantity whatsoever. It's a method for comparing functions. The
functions may represent whatever is of tangible or intangible interest:
memory, time, money, instructions, ...

Big-O analysis usually posits that the details below t aren't the details that
matter. (Of course, there are situations where they do, but in such you would
not use big-O.) If you want to have some analysis that is global, you don't
need asymptotic analysis (though it might help as a start). You can just talk
about functions that are strictly greater than or less than your function of
interest everywhere. But these analyses are difficult because a much higher
level of understanding of your function of interest is required.

~~~
zeroer
The author gets the definition of big-O completely backwards. Every single
time the author says "Big-O" he means "little-o".

Here are the definitions:

[https://cathyatseneca.gitbooks.io/data-structures-and-
algori...](https://cathyatseneca.gitbooks.io/data-structures-and-
algorithms/content/analysis/notations.html)

Basically, big-O is a claim that your process is at least -this- fast. Small-o
is a claim that your process can't be faster than -this-. He means the latter.

> Still not convinced? Think I'm misunderstanding Big-O?

Yea, I do.

~~~
jo909
Please see
[https://en.wikipedia.org/wiki/Best,_worst_and_average_case](https://en.wikipedia.org/wiki/Best,_worst_and_average_case)

Big-O is valid notation for _many aspects_ of an algorithm. The author applied
it to a very practical one and explained very well exactly what he is describing.

~~~
reikonomusha
I think it's a massive and needless abuse of standard computer science
terminology to re-define it as something completely different.

~~~
MrManatee
I'm not entirely sure what you are referring to. You might be referring to the
fact that the author's definition of big-O doesn't say anything about constant
factors or asymptotics. This makes the definition incorrect, or at least
sloppy. But judging by usage, it seems that he actually knows and is using the
standard definition. The error is just in that one sentence, and it doesn't
affect the rest of the argument.

You might also be objecting to the fact that he makes a distinction between
time and instruction count, and is using big-O notation for both. I don't
think there's anything nonstandard about making this distinction when it needs
to be made. Take, for example, Karmarkar's algorithm:

[https://en.wikipedia.org/wiki/Karmarkar%27s_algorithm](https://en.wikipedia.org/wiki/Karmarkar%27s_algorithm)

------
wscott
Great series of articles, and the lessons are very important to someone writing
high-performance systems programs.

Here is another chart I like to show people:
[https://dl.dropboxusercontent.com/u/4893/mem_lat3.jpg](https://dl.dropboxusercontent.com/u/4893/mem_lat3.jpg)

This is a circular linked list walk where the elements of the list are in
order in memory. So in C the list walk looks like this: while (1) p = *p;

Then the time per access was measured as the total length of the array was
increased and the stride across that array was increased. The linked-list walk
prevents out of order processors from getting ahead. (BTW another huge reason
why vectors are better than lists)

(This is from an old processor that didn't have a memory prefetcher with
stride detection in the memory controller. A modern x86 will magically go
fast.)

From that chart you can read off the L1 size, L2 size, cache line size, cache
associativity, page size, and TLB size. (It also exposed an internal port
scheduling bug on the L2. A 16-byte stride should have been faster than a
32-byte stride.)
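
A rough sketch of that kind of measurement, for anyone who wants to reproduce
the chart themselves (buffer sizes, strides and the iteration count are
arbitrary; as noted above, a modern stride prefetcher will hide much of the
stride effect, and the final printf of p is only there to keep the compiler
from deleting the loop):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Build a circular chain of pointers laid out in order with a given
       stride, then chase it.  Each load depends on the previous one, so an
       out-of-order core can't run ahead. */
    int main(void)
    {
        for (size_t bytes = 1 << 14; bytes <= 1 << 26; bytes <<= 2) {
            for (size_t stride = sizeof(void *); stride <= 512; stride <<= 2) {
                size_t slots = bytes / stride;
                if (slots < 2) continue;

                char *buf = malloc(bytes);
                if (!buf) return 1;
                /* slot i points to slot i+1; the last slot points back to slot 0 */
                for (size_t i = 0; i < slots; i++) {
                    char *next = buf + ((i + 1) % slots) * stride;
                    *(char **)(buf + i * stride) = next;
                }

                const long iters = 20 * 1000 * 1000;
                void **p = (void **)buf;
                struct timespec t0, t1;
                clock_gettime(CLOCK_MONOTONIC, &t0);
                for (long i = 0; i < iters; i++)
                    p = (void **)*p;              /* the while(1) p = *p walk */
                clock_gettime(CLOCK_MONOTONIC, &t1);

                double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
                printf("size %8zu KiB stride %4zu: %6.2f ns/load  (p=%p)\n",
                       bytes / 1024, stride, ns / iters, (void *)p);
                free(buf);
            }
        }
        return 0;
    }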

------
jcoffland
Math is pure and not constrained by the real world. Big O analysis begins with
the assumption that you have unlimited uniform memory. The author points out
that memory is not uniform in the real world. It's equally untrue that we have
infinite memory at our disposal. The limits of the real world are good to
remember but that does not invalidate Big O analysis.

~~~
claudius
For the analysis in the article, it is enough to assume an unlimited amount of
space into which you can put your memory as well as an unlimited amount of
time. For your own analysis, you can of course make any assumptions you wish,
but the author argues (and I agree with him there) that the spatial scaling
O(sqrt(N)) for random access is both relevant in practice (as seen by the
caching graph) and cannot be improved on when considering the laws of physics.

> The limits of the real world are good to remember but that does not
> invalidate Big O analysis.

No, it just means that you failed to take one specific bit into account and
that your analysis may hence be less relevant to the real world. Much the same
way we can say that integer addition is O(1), because usually we deal with
fixed-size integers, we can say that memory access is O(1), because usually we
deal with a fixed maximal memory size. Of course, integer addition is
O(log(N)) and, according to the arguments made in the article, memory access
is O(sqrt(N)).

~~~
jcoffland
The author assumes throughout the article uniformly random memory access. This
is the worst case for a cache. If you're going to get down to the nuts and
bolts of your implementation and come out of the high world of mathematics and
Big O then you cannot just consider one element, namely, in this case caching.
You should also consider your access pattern, which very likely is not
uniformly random and therefore probably does not fit the author's analysis. In
fact, the only reason caching works is because the author's premise is
generally wrong.

~~~
claudius
The author does address this point in part three of the series where he
compares access times to a small area of a larger memory region. Even with
absolutely sequential access to an array of size K, you first have to find
this particular array in your larger memory region of size N, giving you total
cost to iterate over the full array as O(sqrt(N) + K).

I am not sure how costly it is to iterate through your entire memory (assuming
a no-op), but I would argue that eventually you reach some bottleneck and end
up at O(sqrt(N)+N), too.

------
michaf
Interesting read. Researchers in the HPC community have developed a number of
performance models to predict real-world performance in more detail than is
possible through simple Big-Oh counts of the number of operations, e.g. while OP
concentrates on latency, the Roofline model (
[https://en.wikipedia.org/wiki/Roofline_model](https://en.wikipedia.org/wiki/Roofline_model)
) mainly considers limited memory bandwidth.

------
corysama
A lot of people are pointing out that BigO is a purely theoretical,
mathematical model that should be understood and used properly without regard
to silly details like physics.

That is theoretically correct. But, the difference between theory and practice
is that in practice there exists a large percentage of programmers writing
code for the real world without understanding and using BigO properly. Their
mental model of performance begins and ends with BigO. As far as they are
aware, its model is reality.

Source: I've been giving a large number of programmer job interviews lately.
It's a rare day when I encounter an engineer (even a senior one) who is aware
of any of the issues brought up in this series. And, I work in games!

------
MaulingMonkey
The article is still wrong - iterating through a linked list is O(N log(N)
sqrt(N)). You can't have infinite nodes in a 16-bit, 32-bit, or even a 64-bit
address space - to deal truly with N, one must consider the more generic case
of a variable address encoding, which has a variable size (log(N)) and
associated lookup etc. costs as the number of nodes grows.

This is the motivation behind e.g. the "x32 ABI" in Linux: All the power of
x86-64 instructions, with none of the additional cache pressure/overhead of
64-bit pointers - log(32) being cheaper than log(64).

...ahh, being this explicit in your Big-O notation is probably not that
useful, usually, although I've seen it occasionally in papers (where they're
quite explicit about also counting the number of bits involved). Maybe they're
dealing with BigNums, which would make it a practical concern? The key
takeaway is this:

> That I use Big O to analyze time and not operations is important.

Time depends on compiler settings, allocation strategy, and a whole host of
other factors that are outside the purview of your algorithm. Operation counts
are a lot easier to contrast and compare between different algorithms, the meat
of what you're trying to do most of the time. Both are valid choices, just know
which one you're dealing with.

The time factors are good to be aware of, to be sure - the performance
pitfalls of (potentially) highly fragmented, hard-to-prefetch linked lists
over unfragmented flat arrays should be well known to anyone charged with
optimizing code - but it's probably easier to think of them as some nebulous
large time constant (as even array iteration is going to hit the same worse-
than-O(N) behavior, although with proper prefetching the bottleneck may become
memory bandwidth rather than memory latency) and deal with those differences
with profiling and other measurements, instead of Big-O notation.

~~~
AstralStorm
Sorry, but for asymptotic analysis pointer size is considered constant. So,
even the smallest list uses B-bit pointers.

Feel free to introduce a packed pointer type that would be asymptotically
faster, but then it is a different data structure.

~~~
MaulingMonkey
Since when does a linked list mandate unpacked, fixed-width pointers? Citation
needed! I note they are not called _pointered_ lists, and challenge the
assumption that they even need use pointers - or do you think this abstract
computer science concept is something different when applied to disk storage
formats by using file offsets as the link? Linked lists as a concept date back
to ~1955, when RAM was _very_ expensive - magnetic-core memory being just
introduced. Think punch cards.

Returning back to the modern era - one also does not need a packed pointer
type, merely support for multiple pointer sizes within the same process, and
the ability to dynamically dispatch to the correct one for the job. And
multiple pointer sizes within the same process are a matter of routine:

In the 16-bit C era, vanilla "near" and "far" pointers very clearly weren't
the same size. A very similar situation occurs under the hood when dealing
with 32-bit processes on 64-bit windows via WoW64, which can lead to some
interesting complications if you use a 64-bit application to create a
crashdump for a "32-bit" process, where you can actually inspect the 64-bit
side of things too (fire up WinDbg and dig in: [https://msdn.microsoft.com/en-
us/library/windows/desktop/aa3...](https://msdn.microsoft.com/en-
us/library/windows/desktop/aa384163\(v=vs.85\).aspx) )

This is to say nothing of 20+ byte member function pointers on 32-bit
architectures because C++ is only out-crazied by C++ compilers. Or dealing
with the complications of multiple pointer sizes via shared memory. Or dealing
with file "pointers" (read: offsets) of varying sizes.

------
DanWaterworth
> You'll know that iterating through a linked list is O(N), binary search is
> O(log(N)) and a hash table lookup is O(1). What if I told you that all of
> the above is wrong?

It's not wrong; it doesn't have enough contextual information to be right or
wrong.

~~~
tantalor
"Not even wrong"
[https://en.wikipedia.org/wiki/Not_even_wrong](https://en.wikipedia.org/wiki/Not_even_wrong)

~~~
Retric
Ehh, his assumption is that O notation should say something about behaviour at
small sizes, which is simply wrong. He should be testing 10-15+ GByte data
structures, not sub-megabyte data structures.

Consider, O(X * log(X)) is generally thought of as good. But let's assume
the algorithm takes 24 hours + x * log(x) nanoseconds. Well, for low values of
X that 24 hours is going to be a pain and it's going to look like constant
time, but for really, really big data sets x * log(x) will actually dominate.

------
kenjackson
There's been a fair bit written on the topic. One of the better papers that
has a parameterized model is here:
[https://www.computer.org/web/csdl/index/-/csdl/proceedings/f...](https://www.computer.org/web/csdl/index/-/csdl/proceedings/focs/1990/2082/00/089581.pdf)

I should note that this paper is more than 25 years old. :-)

------
StillBored
I guess the author is trying to simplify, but it's way more complex than that.
Simply assuming a few layers of cache completely misses all the other layers
that have effects, starting with:

Cache lines, RAM read vs. write turnaround, DRAM pages, number of open DRAM
pages, other CPUs interfering with the same RAM channel, remote NUMA nodes,
and probably some I'm forgetting. All this is very similar to secondary
storage access rules (even for SSDs)...

~~~
AstralStorm
Assuming each layer is at worst tree-like, you can still easily derive an upper
bound.

~~~
StillBored
Sure, but I think the point is that big-O notation fails miserably at
analyzing certain algorithms because it doesn't have a way to represent the
locality of the algorithm.

Put another way, all the little "constants" thrown away in the analysis may
not actually be constants, and their non-constantness may be enforced with
actual physics. In other words, like the article says, the idea that storage
access times are constant is nonsense. Due to physical limitations, this is
insurmountable rather than being a side effect of architecture. So, its quite
possible that for certain algorithms the "constant" factors may be overriding
terms in the analysis.

------
jlarocco
The article is conflating theoretical algorithm analysis and low level
implementation details.

Big O analysis is a theoretical measurement of algorithm performance. By
definition it ignores details like memory access speed, the exact instructions
used, and other details of specific hardware architectures.

Real life algorithm implementations obviously need to deal with those low
level implementation details, but that doesn't change the theoretical
analysis. It's easy enough to find (or design) machines without cache where
this difference in memory speed doesn't exist.

~~~
arielb1
"RAM+arithmetic"-style complexity analysis in terabyte-scale-and-above
problems is unable to distinguish between very practical and absolutely
impractical algorithms. "square-root ideal cache hierarchy + arithmetic" has
much better distinguishing power (for parallel programs, you also need to
remember to bound throughput, e.g. Bernstein's area*time, for similar
reasons).

------
maker1138
It's amazing how many people didn't actually read all 4 parts of the article.

His argument has nothing to do with caching or prefetching, etc.

First, it's about _random_ access. You can't prefetch a random fetch!

Second, he's measuring time, a perfectly valid thing to do. And the reality is
when you lay your memory cells out in 2 dimensions it takes order of sqrt(n)
time to fetch a random memory cell value, where n is the number of memory
cells you're using.

Third, it turns out order of sqrt(n) time is the best you can do even if you
had the best technology in the universe.

------
falcolas
I'm not sure the cost of accessing the storage medium belongs in the
complexity of the algorithm, since that cost will change based on the storage
medium, not the algorithm itself. It strikes me as more of a constant (even
though it isn't constant).

Still, interesting read, nonetheless.

~~~
malisper
In part two, the author demonstrates that if you packed information as densely
as possible, i.e. your computer is a black hole, the best you could do is
O(n^1/2).

~~~
falcolas
If, and only if, you're accessing that memory randomly. If you're accessing it
in a linear scan, that line is going to drop away from sqrt(N) very quickly.

~~~
yxhuvud
Yeah, but that is not an argument not to include it in the analysis of how a
function behaves when you throw lots of data at it. Also, the author addresses
this in one of the parts.

------
jimminy
I find this really odd; it's not wrong, but it doesn't invalidate O(1). It's
mashing together two things that are unnecessary to combine and can cause
misunderstanding.

Big-O provides a decent tool for generic analysis and an understanding of
access times of memory hierarchies. Since memory hierarchies can vary, they
shouldn't be considered while doing generic analysis, much anyways.

Both are important to understand. The key thing is setting your Big-O access
expectations to the slowest level of your hierarchy. In that way, your
expectation remains generic and still proximally accurate across the average
cases.

When you consider them together, think of the hierarchy as a series of
piecewise functions that modify the value of the constant time based on the
speed of the bounds that fit your data.

This square-root-of-N notation falls apart in other cases. 128GB of RAM would
have roughly the same access speed as the 8GB he had available, if he had
that much in his system. But having 128GB of RAM would completely destroy the
square-root fit by flattening out an entire order of magnitude of the curve.

But it is a nice display of memory hierarchies, IMO.

~~~
benkuykendall
If you go on to read the later parts, the author explains how physics limits
random access to N bits to O(sqrt(N)) even without a memory hierarchy. This
has practical implications when comparing sequential access in an array to
random access in a hashmap.

------
hacknat
Nah. Sorry, cache misses don't count as part of a theoretical analysis of
complexity. Why? Because you're getting into specific access pattern
performance. Complexity is about "all things being equal". Is it the only
thing you should consider? At first it should be; then, if you run into a
problem with a specific structure that has remarkable scale or access
patterns, go ahead and consider what the underlying hardware might be doing
with the specific access patterns your structure is encountering.

It's interesting to see a linked list as his example, because it is the most
likely to have cache misses as you move through it, since the allocations are
very fragmented. I'd be very curious to see the same chart on a warmed-up
hash table.

Also, if we're considering the hardware, can we take into account pre-fetching
and branch prediction? What are your numbers then? Yeah, RAM is farther out
than the local caches, but the CPU is also not completely ignorant of what it
has to do next.

~~~
zeroer
Did you read part II of the series? It's a deeper analysis than you seem to
give him credit for.

His argument is that in this universe, the specific laws of physics we have
here enforce the property that memory access takes o(sqrt(N)) time [1] where N
is the size of your data set.

[1] - What's actually going on is that it takes o(N * sqrt(N)) time to touch N
bits of information. Any particular bit might be very close to the processor
and thus fast, but hitting N bits necessitates that most of the bits you
access are far away.

~~~
hacknat
I did read part II, and I think his analogy is off. Memory isn't linearly
limited (it _happens_ to be in the processor, but this isn't a given), and I
don't even understand why he brings circles into his conception of memory.
Memory isn't linearly bound to the previous local cache. RAM isn't an order of
magnitude larger than the L3 cache, it's MANY orders of magnitude larger! His
analogy is just wrong. A better analogy would be: library A takes X amount of
time to access its books and can store 100 of them, and library B takes 10X
the amount of time to access its books, but it can store 1,000,000 books!

Yes, as we increase memory size, latency goes up, but he never proves that
this is sqrt(N) (he only shows a correlation). Each jump up can be explained
by cache misses in each successive local cache, but RAM can scale more than a
few orders of magnitude beyond the L3 cache. He needed to keep his chart going
past 1GB to see that there is actually a plateau to be hit. If he had scaled
to 10GB and then to 100GB he would have seen access times be about the same
as they were for 1GB.

~~~
zeroer
Re-read the "The theoretical limit" section in part II. His arguments depend
on the physics in this particular universe, not how computers happen to be
built.

~~~
hacknat
My response was about his math and physics:

If you scale the radius of a sphere by one unit then you indeed get an order
of magnitude increase in volume (bits of info), but that's not the correct
model! We don't increase by one unit! RAM isn't a one-unit increase in radius!
It's an order of magnitude increase! If you increase the radius by an order of
magnitude you get a three-orders-of-magnitude increase in memory. So you jump
up in latency by one order of magnitude and you get 3 orders of magnitude of
memory in return.

His analysis and math are wrong.

~~~
zeroer
You're totally missing his argument. If you scale RAM cubically at any positive
density, as you are suggesting, eventually you will achieve the matter density
sufficient for your RAM to gravitationally collapse into a black hole.

If the density of your RAM is d, then the volume that a mass of M RAM takes up
is M/d. If it's arranged in a sphere, the radius is ((3M)/(4 * pi * d))^(1/3).
Notice how this radius is a constant times the cube root of M.

Whereas the Schwarzschild radius scales proportionally to mass.

[https://en.wikipedia.org/wiki/Schwarzschild_radius](https://en.wikipedia.org/wiki/Schwarzschild_radius)

Thus, if you put enough mass of constant positive density together (no matter
how small that density is) eventually you get a black hole.
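
To put a rough (hypothetical) number on that: setting the two radii equal gives
M = c^3 * sqrt(3 / (32 * pi * d * G^3)). For d = 1000 kg/m^3 (water, an
arbitrary choice) that works out to roughly 3e38 kg, on the order of 10^8
solar masses, with a Schwarzschild radius of a few AU. A minimal sketch of the
arithmetic:

    #include <stdio.h>
    #include <math.h>

    /* Mass at which a constant-density sphere's physical radius
       (3M / (4*pi*d))^(1/3) equals its Schwarzschild radius 2GM/c^2.
       Solving for M gives M = c^3 * sqrt(3 / (32*pi*d*G^3)).
       The density d is an arbitrary choice (water). */
    int main(void)
    {
        const double G = 6.674e-11;      /* m^3 kg^-1 s^-2 */
        const double c = 2.998e8;        /* m/s */
        const double d = 1000.0;         /* kg/m^3 */

        double M = pow(c, 3) * sqrt(3.0 / (32.0 * M_PI * d * pow(G, 3)));
        double r = 2.0 * G * M / (c * c);

        printf("collapse mass:      %.3e kg (~%.1e solar masses)\n",
               M, M / 1.989e30);
        printf("radius at collapse: %.3e m (~%.1f AU)\n", r, r / 1.496e11);
        return 0;
    }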

N.B. - The author's argument is a little more subtle because he's talking
about information density via the Bekenstein bound, and he gets a square root
instead of a cube root. But the argument is the same flavor.

------
geophile
Why is this wrong-headed discussion top-rated on HN?

And why is there so much misunderstanding on HN of big-O notation wrt cache
misses lately?

All you kids, get off my lawn.

------
jandrewrogers
Closely related but unfamiliar to most software geeks, Bélády's work in the
1960s and later on the theoretical limits of operation throughput when using
cache hierarchies is very relevant to high-performance software design. The
theory generalizes nicely to any topology where you can control how access
latencies are distributed, and carefully designed software can get relatively
close to the throughput limits (though it is somewhat incompatible with the
way most software engineers design systems these days e.g. multithreaded
concurrency is a non-starter).

~~~
infinite8s
Do you think something like lightweight modular staging ([https://scala-
lms.github.io/](https://scala-lms.github.io/)) would allow fairly high level
code to get to the theoretical limit?

------
lorenzhs
> _At this point some of you may argue that the whole idea of Big-O analysis
> is to abstract architectural details such as memory latency. This is correct
> - but I argue that O(1) is the wrong abstraction._

No, your model is wrong. Others have already pointed out some issues with the
author's understanding of Big-O notation. However, this is a fundamental
misunderstanding. Big-O is a tool to analyse some function's asymptotic
behaviour, i.e., how it behaves as the input parameter grows towards
infinity. You have to put your model of cost into that function. If your
measure is time, and memory access doesn't take constant time in your model,
then you have to account for that in your cost function. You can just as well
use Big-O notation to describe the asymptotic space complexity of an algorithm
(how much memory does it need?). _O(1)_ has no special meaning - it's just the
set of all unary functions whose value stays below a constant, no matter how
large their input parameter gets.

The author is literally blaming his tools for his own misunderstandings.

~~~
AstralStorm
Blaming the authors for ignoring true memory models, more like.

The notation is worthless for big data use due to the unrealistic machine
model it implies.

~~~
lorenzhs
No, the _RAM model_ is worthless for big data use, because it doesn't model
distributed computing at all. You need a different model - one that also
includes communication, and probably file I/O as well. Big-O notation is not
to blame if the thing you put into it models the wrong thing. Just assign a
cost α to initiating a connection, β for sending a word of data, and similar
for I/O. Treat those as variables, and you'll get O(local work + β ·
communication volume + α · latency).

------
Symmetry
Thanks to the prefetcher, a low-entropy access to memory, like reading the next
value in an array, will tend to happen in constant time. For a linked list,
tree, or other data structure where the location of the next access can't be
predicted easily by something like stride analysis, the author is correct.

------
tailrecursion
The author argues that a random access to memory is not O(1) but instead
O(root N) because of distance.

The easy reactive response is that with respect to algorithm design the size
of RAM, N, is a constant.

On the other hand for very high scaling factors, as input size rises the size
of RAM must also rise. In this way N can be thought of as a variable and that
seems to be what the author is thinking. Different algorithms will behave
differently as they are scaled to infinity and beyond.

I think the author's argument is interesting but maybe it's better to make new
models for time complexity analysis. I think Bob Harper's students have done
good work on this.

In addition to distance there is also the cost of selection, namely the muxes
and decoders, which would multiply the cost of access by log N.

------
justAlittleCom
I am sorry... but no. The article is interesting and well written, but it has
nothing to do with big-O notation. Random access in memory is still O(1); it
doesn't depend on the size of the data structure (I am assuming that is the
"n" the author talks about when claiming that a memory access is O(sqrt(n))).
Even if you have a very complex memory architecture with 15 caching levels,
spread all over the world, if you have a maximum of a 5-day delay for accessing
your memory through the mail, it will still be O(1), because 5 days is a
constant; it does not depend on the size of the data structure.

The "n" the author is really talking about may be the depth of the cache
hierarchy.

~~~
ricardobeat
What he tried to show is precisely that in practice it _does depend on the
size of the data structure_.

------
truantbuick
What the graph really seems to indicate is that time is only linear when
working within a cache size on the author's computer (remember that iterating
a linked list accounts for the gradual increase in between the cache jumps).
If the theoretical upper bound of RAM access was really the important factor
at this scale, I wouldn't expect it to be almost flat and to suddenly jerk up
every time we have to go to the next cache.

Assuming the author's O(sqrt(n)) is correct, it seems only relevant on much,
much larger scales.

In light of that, it really doesn't make sense to pollute the typical use of
Big O notation. It should always be understood to be just one metric to
understand an algorithm.

------
vvanders
Related, Herb Sutter's fantastic talk about arrays:

[https://channel9.msdn.com/Events/Build/2014/2-661](https://channel9.msdn.com/Events/Build/2014/2-661)
@ 23:30

------
chris_va
The black hole piece in part II was amusing, if you keep reading.

~~~
cshimmin
I am pretty sure it's wrong. I just made a comment about this on the author's
article. He claims information N ~ r*m, but for any normal material you have
mass proportional to r^3. So N ~ r^4, not r^2 as the author claims.

Black holes and information is a thorny issue, see e.g. [1]. So I suspect
something went awry in the argument when (s)he brought quantum gravity into
it.

[1]:
[https://en.wikipedia.org/wiki/Black_hole_information_paradox](https://en.wikipedia.org/wiki/Black_hole_information_paradox)

~~~
sampo
You have a volume V1, and let d1 be the maximum density of information you can
store in V1. If you try to store any more, your storage will collapse into a
black hole, so d1 is the maximum density.

Now you get 8 times more data. But you cannot store this data in a volume 8*V1,
because if you try to use 8 volumes V1 with density d1, and place these next
to each other, they will collapse into a black hole.

Thus, the more data you have, the smaller the density you can use for storing
it.

------
captainmuon
I think this way of looking at the problem is misleading. O(1) or O(N) always
stays O(1) or O(N); just the constant changes. You can always access any
element in RAM (or on an SSD or HDD) in a bounded amount of time. Use that
pessimistic time as the time of one step.

Viewed in this way, O(N) is still O(N), and a processor with caches is a magic
device that somehow computes faster than O(N)... or for O(1) computes in sub-
constant time (if that can even be well-defined).

------
faragon
If I understood it correctly, the author links cache misses from the memory
subsystem hierarchy to asymptotic complexity (big O), so if servicing a cache
miss has a higher time complexity, he uses that instead of O(1).

Something similar happens when you write an O(1) algorithm while relying on
malloc(), which is usually O(n log n); thus your algorithm is not really O(1),
but O(n log n).

~~~
lorenzhs
What's your _n_? The amount of memory allocated? In that case, I'd be
interested to see such an algorithm that takes time O(1) but touches O(n)
memory cells in the process...

Obviously you need to include the time your calls take in the analysis, but
your example seems odd.

------
greggyb
I think there is a key point in the FAQ (article four, all linked through the
series):

> You are conflating Big-O with memory hierarchies

> No, I’m applying Big-O to memory hierarchies. Big-O is a tool, and I am
> applying it to analyze the latency of memory accesses based on the amount of
> memory you are using.

As some others have pointed out, the line is crossing hierarchies of cache,
and that he is not looking at the big O of instructions. Both of these are
accurate, and the author is aware of this.

He is using the tool of big O analysis to measure a performance
characteristic. That characteristic is not the traditional number of
instructions or amount of memory utilized in the computation of an algorithm.
It is the latency for access to a random piece of data stored on a system.

There are two cases considered, the practical, and the theoretical.

At the practical level, we do not have a unified physical implementation of
the address space in a modern computer. This means that accessing a random
address in memory is an action that will most likely cross levels of the cache
hierarchy. It is well known that there are order of magnitude jumps crossing
these levels. Perhaps it is uninteresting to you, and the importance of cache
locality in an algorithm is something that you already have a very strong
handle on. That makes his observation of time-to-access a random address
trivial, but not wrong.

Big O tells us that a binary search is the most efficient search algorithm for
an array (constraint - the array must be sorted), but in practice a linear
search with a sentinel value across an unsorted array will be faster if the
array fits in cache. Keeping in mind the big O latency of random memory access
across cache hierarchy levels would be the theoretical analysis to tell us
this. The traditional big O looks at number of instructions. These are both
valid tools in choosing an optimal algorithm.
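
For reference, the sentinel trick mentioned above looks roughly like this (a
minimal sketch; it assumes the caller owns one spare slot at a[n] for the
sentinel):

    #include <stddef.h>

    /* Linear search with a sentinel: put the key in the spare slot at a[n] so
       the loop needs no bounds check, only the comparison.  Returns n if the
       key is not present. */
    static size_t find_sentinel(int *a, size_t n, int key)
    {
        a[n] = key;               /* sentinel guarantees the loop terminates */
        size_t i = 0;
        while (a[i] != key)
            i++;
        return i;                 /* i == n means "not found" */
    }

On a small, cache-resident array this branch-light loop often beats a binary
search despite executing more comparisons, which is exactly the
instruction-count-versus-time distinction being drawn here.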

The second point the author makes is the theoretical limit. Assume the ideal
storage medium with minimum access latency and maximum information density.
This storage medium is matter. The limit of packing is the point at which you
would create a black hole.

With this ideal storage medium, you cannot pack an infinite amount of data
within a distance that can be traversed at the speed of light within one clock
cycle. For this colossal storage array, there are some addresses which cannot
be physically reached by a signal moving at the speed of light within the
amount of time that a single clock cycle (or single instruction) takes.
Accessing a random address is not a constant time operation, though the
instruction can be dispatched in a constant time. There is a variable time for
the result of that instruction to return to the processor.

At this theoretical limit, we would still end up with a cache hierarchy,
though it would be 100% logical. With a single storage medium and unified
address space, the cache hierarchy would be determined by physical distance
from CPU to physical memory location. Those storage cells (whatever form they
take) that can be round-tripped by a speed of light signal in one clock cycle
are the first level of cache, and so on. You could have very granular, number-
of-clock-cycles cache levels stepping by one at each concentric layer of the
sphere, or you could bucket the number of clock cycles. Either would
effectively act as a cache.

This theoretical exercise is an extreme limit, but it bears out the practical
implications that our current physical implementations of the cache hierarchy
exhibit.

Again, perhaps these observations are trivial, but I believe they do stand up
to scrutiny. The key insight is that the performance characteristic being
described by big O is time, not the more traditional space or number of
instructions.

I think time is a valuable metric in terms of algorithm selection. If we think
about end users - they don't care that one instruction or 1,000,000,000 are
being executed. They care about how quickly work is done for them by the
computer. Instruction-based analysis can be a huge help in this consideration,
but so can time-based analysis.

Neither should be ignored, and neither invalidates the other.

~~~
yxhuvud
> Big O tells us that a binary search is the most efficient search algorithm
> for an array (constraint - the array must be sorted), but in practice a
> linear search with a sentinel value across an unsorted array will be faster
> if the array fits in cache.

From a cache perspective, a binary search is actually the most pessimal case.
If you want efficiency, look at b-trees optimized for memory hierarchies.

------
bastijn
Only after reading the last article of the series did I check the link to share
it. Only then did I notice that I had misread the heading on the blog. I read
"I like big tits" and thought, is this page hacked or something? The URL
corrected my dirty mind :).

Great series. Even if you don't agree with the notation, it still has valuable
information. Thanks, author!

------
donrodriguez
Let me quote Einstein: "Everything should be made as simple as possible, but
NOT simpler!"

And that's IMHO exactly where the original author erred. But I find his
musings so incredibly funny and enlightening that I will use them as a future
reference for how NOT to do an analysis.

He didn't just do an apples vs. oranges comparison; he essentially threw
eggs, potatoes and ham into the mix and tried to deduce a universal law from
his concoction by sprinkling some quantum mechanics fairy dust on top!
Hilarious!

Just by simply looking at his sloppy graph (typical origin shenanigans are
often a dead giveaway for the quality of an examination) one should be able
to recognize the form of an underlying step function, as expected from a
multi-layered memory system (L1, L2, L3, RAM, ...).

But NO, he envisions a square-root function, just by arbitrarily placing a line
in a logarithmic coordinate system. WTF?! Where is the fitting? And how does
he defend his conclusion? _drum-roll_ QUANTUM MECHANICS... Muhahahaha! Great
show!

So essentially it's not cost that prohibits our brave engineers from increasing
L1 cache size ad infinitum, but quantum mechanics! Muhahahaha! Stop it, my
stomach hurts.....

~~~
angry_octet
I love that you have had the foresight to create throwaway accounts and leave
them for a year before using them to be insulting and indulging in pointless
theatrics.

~~~
donrodriguez
I'm sorry that you feel that way, but instead of criticizing me as
"insulting" and "indulging" in "pointless theatrics", maybe you could explain
how and why his argument of the "Myth of RAM" and O(√N) holds any merit.

Just by looking at his graphs you could argue that, if it's about RAM, he
should just look at the part from 10MB to 8GB, which shows an almost ideal
approximation to O(n log n). The L1 part is even better than O(1), which is
curious in itself.

So please excuse me if I'm a bit sarcastic about the "groundbreaking"
insights. "Myth of RAM" seems like an attention-grabbing caption, and I
couldn't help commenting in an according way. No insult or harm ever meant. If
perceived as such, I do apologize.

------
Double_Cast
Why is the information within a sphere bounded by m * r? Naively, I'd expect it
to be bounded by r^3 or m * r^3.

------
fengwick3
If anybody is curious about the physics, the principle he described is pretty
similar to the Holographic Principle.

[https://en.m.wikipedia.org/wiki/Holographic_principle](https://en.m.wikipedia.org/wiki/Holographic_principle)

------
whack
It's a very interesting experiment/conclusion, but it rests upon one
assumption: The assumption that the entire dataset has been preloaded into the
L1/L2/L3 caches.

This assumption is a shaky one to make, and is easily violated. Imagine if you
have a hashmap that is small enough to fit entirely in L3 cache. However, most
of it has been evicted from the L1/L2 caches, by other data that the core has
been reading/writing to as well. Eventually, the thread returns to the hashmap
and performs a single lookup on it. In this scenario, the time required will
indeed be O(1).

So what you really have is a best-case complexity of O(sqrt(N)), if your data
has been preloaded into the closest possible caches, and a worst-case
complexity of O(1) if your data is stuck in an outer-level cache/DRAM. Given
that we usually care more about the worst-case scenarios, not the best-case
scenario, using the O(1) time complexity seems like a reasonable choice.

Going back to the author's premise that the time-complexity of a single memory
access is O(sqrt(N)), not O(1), this is true only where N represents all/most
of the dataset being processed. If N represents only a small fraction of the
dataset being processed, and your caches are going to be mostly filled with
other unrelated data, then the time complexity is closer to O(1).

Clearly the O(sqrt(N)) is more accurate than O(1) under some circumstances,
but even so, it's not clear what benefit this accuracy confers. All models are
inaccurate simplifications of reality, but simple-inaccurate models can still
be useful if they help in decision-making. Big-O analysis isn't used to
estimate the practical running time of an application. For that, you'd be
better off just running the thing. Big-O analysis is more used to compare and
decide between different competing algorithms/data-structures. And in that
sense, whether you choose to model linked-lists/binary-search/hash-maps as
O(N * sqrt(N)) / O(log(N) * sqrt(N)) / O(sqrt(N)), or O(N)/O(log N)/O(1), the
recommendation you end up with is the same.

------
bryanlarsen
Great article. It gets better, too, so make sure you read all 4 parts.

------
rdiddly
The library example is a bad one, since it leads to O(N) and not O(√N), a
conclusion that contradicts the thesis.

"In general, the amount of books N that fits in a library is proportional to
the square of the radius r of the library, and we write N∝ r²."

No, the number of books N is proportional to the area of the front face of the
shelving, not the area enclosed within the circle. Assuming all libraries are
the same height, that means N is proportional to the circumference of the
circle, which is proportional to r, not r². Meanwhile, assuming that all books
are reachable in the same amount of time by the librarian no matter their
height on the shelf, that means T ∝ r (as before). Since T ∝ r and N ∝ r, that
means T ∝ N or T=O(N).

~~~
yxhuvud
Eh, your library seems to have a very poor use of floor space. Place the
shelves well and you will definitely have something that can store books
proportional to the area of the library.

Hint for more efficient placement: You can put shelves away from the outer
wall in arbitrary positions within the circle. At that point, the assumption
that the librarian can reach any point in the circle at constant time is
blatantly false (as the books closer to the centre will generally be faster to
reach).

~~~
rdiddly
It's not "my" library, that's how it was defined in the article (Part II).

------
lsh123
The graph in the article shows the impact of the L1, L2, and L3 caches. If the
array fits into the L1 cache the access will be the fastest, and then it
degrades with the L2 cache, then L3, then main memory.

------
caf
_If you instead iterate through an array of size K you will only pay O(√N + K)
since it's only the first memory access that's random. Re-iterating over it
will cost O(K). This teaches us an even more important lesson: If you plan to
iterate through it, use an array._

This is rubbish. Re-iterating it is the same as iterating it the first time:
if your array doesn't fit into cache, you're going to pay for pulling it from
further out in the memory hierarchy.

To anyone who doubts me: try it. Try iterating an array that fits entirely in
L1 many times, then do the same with an array that has to be pushed out to
swap. The slowdown will be considerably worse than linear.

------
joseraul
The theoretical discussion is interesting, especially the circular library
that gives some intuition of the square root law.

But in practice, you usually know the order of magnitude of your data, so
access is rather O(1), for some constant that depends on the size of the data.
Jeff Dean's "Numbers Everyone Should Know" quantifies this constant.

[http://highscalability.com/numbers-everyone-should-
know](http://highscalability.com/numbers-everyone-should-know)

------
haddr
I think that at some point this O(n * sqrt(n)) is actually not precise. Maybe
it works for the first few GB, but then other mechanisms come into play.

For example, processing 100GB of data doesn't actually have to be O(n * sqrt(n)),
because if you process it on a cluster, then the other machines are also using
their L1, L2, L3 caches and RAM. Then the whole process can be streamlined,
which means that some operations can be faster than the pessimistic n * sqrt(n).

~~~
Jweb_Guru
Unfortunately, you now have to deal with network transfer between the
machines, which are even more expensive than RAM access.

~~~
haddr
Definitely, but then is it still n * sqrt(n)? Do I have to add sqrt(n) to
_each_ n I'm processing? I doubt it.

I think the problem is that at each stage there is a different component
(sqrt(n)) that comes into play.

~~~
Jweb_Guru
There comes a point where the latency is high enough that you're very
unwilling to pay for random access. And very often that point is "broadcast
over a network," where think time _really is_ often dominated by things like
the speed of light. So yes, it's still n * sqrt(n) in some abstract sense
(where n is the amount of theoretically available memory within a given
distance over the network); in practice it's much _worse_ than that because
you never come close to saturating all the available space between computers
with storage nodes.

------
chongkong
Isn't it log(N) instead of sqrt(N)?

~~~
lorenzhs
See
[https://news.ycombinator.com/item?id=12385458](https://news.ycombinator.com/item?id=12385458)

------
wyager
"I can vaguely fit a line to this graph that's clearly nonlinear, so that line
describes the asymptotic complexity of the system."

Huh? Am I taking crazy pills, or is this a horrible analysis? It looks like
the behavior is O(whatever it's supposed to be) times a constant multiplier at
a few different regions. The OP conveniently cuts off the graph so you can't
see it level off.

~~~
pimterry
No, it will never level off.

You'll always eventually expand past the limits of whatever storage you're
using. It's clearly impossible to build _totally_ unlimited storage with
constant access time, and you'll always eventually need larger storage (to
disk, to other machines, to other data centres, etc), at the cost of speed.

Take a look at part two
([http://www.ilikebigbits.com/blog/2014/4/28/the-myth-of-ram-part-ii](http://www.ilikebigbits.com/blog/2014/4/28/the-myth-of-ram-part-ii)),
where he runs through this in more concrete depth to back his
claims up. The theoretical physical limit for accessing N bits of data is
O(sqrt(n)), no matter how you do it, just from the speed of light.

It's counter-intuitive, but this really is true in the general case, not only
in the regions shown on that graph.

~~~
vidarh
I don't think it's all that counter-intuitive. You just need to consider that
information transfer is limited by the speed of light - it naturally follows
that access times must grow with the size of the data unless you are able to
pack the information infinitely densely.

It's a great series for reminding us of that, though, and illustrating it well
and actually putting numbers on it.

~~~
wyager
If this were the only factor, access time would actually grow with the cube
root of N, because you could arrange memory in a sphere, which grows with r^3.

The ultimate theoretical limit is the Bekenstein bound, which implies that
the information content of a region is bounded by its area, not its volume.
This is where r^2 comes from.

~~~
vidarh
I get that there are additional constraints - my point was simply that,
intuitively, even without thinking through or knowing about the additional
ways the communication is constrained, you'll arrive at the necessity of an
increase in latency just from the limitation of the speed of light alone.

------
bjd2385
Now I wonder what would happen to our time complexities if we were near a
black hole...

------
Skunkleton
To me, all this article has shown is that depending on the size of a data
structure, you will need slower and slower memory. We already know that. The
article shows that within the bounds of a particular type of memory the access
time is mostly constant, which is exactly what O(1) means.

------
grabcocque
The Myth of RAM is that you need to have lots of it, but it's bad to use it.
Because that's 'bloat'.

~~~
pixl97
Why not both?

RAM should be filled with 1) Things that are likely to be reused or 2) Things
you want faster access to, or at least a constant fast access time when
needed.

RAM should not be filled with 1) Things you will never use again (for example
log files, or things that are very quickly invalidated by synchronous writes
but rarely read) and 2) bloated data structures. For example if you have data
format _a_ and data format _b_ that achieve the same thing, but _b_ takes 30%
more space at the same speed, _b_ is very bad and should not be used.

------
otterley
Editors, can you please date this submission? It's from 2014.

------
dingo_bat
My laptop has been frozen for half an hour now after running the benchmark
from the article :(

------
fractal618
> And so we come to the conclusion that the amount of information contained in
> a sphere is bounded by the area of that sphere - not the volume!

mindblown.jpg

------
known
"You'll know that iterating through a linked list is O(N), binary search is
O(log(N)) and a hash table lookup is O(1)"

Apples and Oranges? You'll select the relevant data structure depending on
your application needs.

------
solarexplorer
Something that the author seems to be missing is that traditional complexity
analysis (with mathematical proofs etc.) is done for Turing machines, which
have one-dimensional memory (an abstract tape), so reachable memory grows
linearly with time. Current microchips are two-dimensional, so reachable
memory grows with the square of time. If we had three-dimensional memory
(stacked chips?), then reachable memory would grow with the cube of time.

It all depends on what kind of machine you are talking about...

~~~
lorenzhs
What? No! (Sequential) algorithms are usually analysed in the Random Access
Machine Model (RAM model). That's a machine that can access any memory
location in constant time, and perform arithmetic operations in constant time.
See [https://en.wikipedia.org/wiki/Random-
access_machine](https://en.wikipedia.org/wiki/Random-access_machine)

A Turing machine can only go left or right _one cell in each step_ , so e.g.
comparing two strings for equality takes O(n²) time as you have to go
character by character, moving the head over a suffix of the first word and a
prefix of the second word for every character.

