Namely, the amortised time complexity of dynamic lists.
Amortised analysis doesn't treat operations as single events; it looks at the time complexity over the span of many operations (through a technique called the "accounting method"). An initial "investment" of array over-allocation is paid back by later inserts, but only over time. Appending to an ArrayList works out to O(1) given enough insert operations.
Essentially, you over-allocate in order to save time. You're trading time complexity for space complexity.
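To make the accounting concrete, here's a minimal sketch of the idea (a toy class, not how any real library is implemented): grow the backing store geometrically, so the occasional O(n) copy is paid for by the many cheap appends around it.

    # Toy append-only dynamic array with geometric over-allocation.
    class DynamicArray:
        def __init__(self):
            self._size = 0
            self._capacity = 4
            self._slots = [None] * self._capacity

        def append(self, value):
            if self._size == self._capacity:
                # The rare, expensive step: allocate a bigger block and copy.
                self._capacity *= 2            # any growth factor > 1 works
                new_slots = [None] * self._capacity
                new_slots[:self._size] = self._slots
                self._slots = new_slots
            self._slots[self._size] = value    # the common O(1) step
            self._size += 1

Over n appends the copies cost at most 4 + 8 + 16 + ... <= 2n element moves in total, so the average per append stays constant; that's the amortised O(1).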
A really good explanation of amortised analysis, which explicitly covers the ArrayList case, can be found on Wikipedia:
As always, a look into the source code of your language of choice helps. In Python, a list object over-allocates using a very peculiar, but finely tuned formula: https://github.com/python/cpython/blob/09dc3ec1713c677f71ba7...
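You can actually watch the over-allocation from Python itself; a rough sketch (the exact sizes depend on the CPython version and platform):

    import sys

    lst = []
    last = sys.getsizeof(lst)
    for i in range(64):
        lst.append(i)
        size = sys.getsizeof(lst)
        if size != last:                # a resize happened on this append
            print(f"len={len(lst):>2}  sizeof={size}")
            last = size

The size jumps in steps rather than by one slot per append, which is the over-allocation at work.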
e.g. arraylist: [ ]
Even though there is space left in the array, we still need to move all the values one index to the right to be able to insert at the front, which takes O(n) time. In contrast, inserting at the front of a linked list is simply a matter of moving pointers and therefore constant time.
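A quick way to feel that difference from Python: list.insert(0, x) shifts everything, while collections.deque (a block-linked structure, so not quite a textbook linked list) just attaches at the front. Only a rough sketch; absolute numbers will vary:

    from collections import deque
    from timeit import timeit

    N = 100_000

    def front_insert_list():
        lst = []
        for i in range(N):
            lst.insert(0, i)        # shifts every existing element: O(n) each

    def front_insert_deque():
        dq = deque()
        for i in range(N):
            dq.appendleft(i)        # adjusts a few pointers: O(1) each

    print("list :", timeit(front_insert_list, number=1))
    print("deque:", timeit(front_insert_deque, number=1))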
But it is helpful to keep the terminology clear, because an insert in the middle of a linked list is still O(1), but inserting into the middle of an array will require moving some fraction of the data.
The OP is discussing inserts at the start because it is the worst-case scenario for arrays; a positive result will still hold if some of the inserts are in the middle or even near the end.
This is true if you already have a reference to the middle of the list. If you don't (say, because you want to insert while preserving the fact that the list is sorted), then inserting into a linked list is O(n) just like the array is.
If you give metrics for the case that should be worst for your argument, people give you shit about using such a stupid example.
There's no way to avoid someone giving you grief no matter what you do.
I think the biggest takeaway from this isn't that big-O can fool you (which it very well can), but that misunderstanding analysis can lead to poor decisions.
A better example of big-O fooling you is Quicksort versus any proper O(n log n) sorting algorithm. Quicksort will outperform most of them in practical cases, but is technically O(n^2) in the worst case (average case O(n log n)).
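A sketch of that trap, using a deliberately naive first-element pivot (library quicksorts pick pivots more carefully, so treat this purely as an illustration):

    import random
    import sys
    import timeit

    sys.setrecursionlimit(10_000)       # the degenerate case recurses n deep

    def naive_quicksort(a):
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]       # first element as pivot
        return (naive_quicksort([x for x in rest if x < pivot])
                + [pivot]
                + naive_quicksort([x for x in rest if x >= pivot]))

    n = 2_000
    random_data = random.sample(range(n), n)
    sorted_data = list(range(n))        # worst case for this pivot choice
    print("random:", timeit.timeit(lambda: naive_quicksort(random_data), number=1))
    print("sorted:", timeit.timeit(lambda: naive_quicksort(sorted_data), number=1))

On the random input this behaves like O(n log n); on the already-sorted input every partition is maximally lopsided and the running time blows up quadratically.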
The Python growth allocation doesn't look peculiar to me. It's just calculating exponential growth without floating point operations. Whether you over-allocate in 50%, 25%, 12.5%, or whatever proportional increments (all trivial to calculate using bit shifts), they are all enough to make appends amortised O(1), i.e. O(n) total over n appends. Any exponential growth will do. If anything, Python's list growth looks to me like it trades off slightly too much time to save space. Actual results will vary depending on the realloc implementation etc., of course.
Would love to get Ierusalimschy and van Rossum in a room together and let them talk it out.
If you insert into arbitrary positions, or the front of these particular array lists, it will be O(n) even ignoring the growth issue, because every element has to be copied over 1 spot.
He's not benchmarking an O(1) operation against an O(n), or anything surprising, or even pointing out a "hidden" O(n) operation, it's simply a demonstration that O(n) + O(1) = O(n).
The main point is that the big O notation ignores the cache misses.
- try writing a quicksort as a template to avoid the comparison function calls
- in the sorting loop, switch to an insertion sort when buckets have <= 32 elements
You will see speed improvements of 2 to 3 times faster than a vanilla implementation of quicksort.
Although this approach is slightly more complex on the abstract level, it's faster because it avoids a lot of cache misses.
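The template trick is C++-specific, but the insertion-sort cutoff above can be sketched in any language. A rough Python version of the structure (an illustration, not a tuned implementation; the names and the cutoff of 32 are just placeholders):

    CUTOFF = 32    # below this size, insertion sort's sequential access wins

    def insertion_sort(a, lo, hi):
        # Sort a[lo..hi] in place; the ranges are tiny, so O(n^2) is irrelevant.
        for i in range(lo + 1, hi + 1):
            x, j = a[i], i - 1
            while j >= lo and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x

    def partition(a, lo, hi):
        mid = (lo + hi) // 2
        a[mid], a[hi] = a[hi], a[mid]           # middle element as pivot
        pivot, store = a[hi], lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        return store

    def quicksort(a, lo=0, hi=None):
        if hi is None:
            hi = len(a) - 1
        while hi - lo + 1 > CUTOFF:
            p = partition(a, lo, hi)
            # Recurse into the smaller side, loop on the larger: bounded stack.
            if p - lo < hi - p:
                quicksort(a, lo, p - 1)
                lo = p + 1
            else:
                quicksort(a, p + 1, hi)
                hi = p - 1
        insertion_sort(a, lo, hi)       # finish the small runs sequentially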
O(n) + O(n) = O(2n) = O(n) = O(n + 1) = O(n) + O(1)
Cache performance is just one example of a constant factor, right?
The point is to use data that was recently fetched into the (faster) cache memory as much as possible, instead of incurring the penalty of a cache miss.
Cache performance really depends on memory access patterns.
Anyways :) I'm being pedantic here, I should probably go back to work.
Yes, random access (missing the cache every time) may be 100x slower than sequential (hitting the cache almost always), but if you're iterating through an array twice as large, it will still be 100x slower.
I don't know about you, but I don't sort 100k records in a single batch, and if I am it's because I messed up. But I might sort different batches of 100 records 1000 times a minute.
If your software engineering begins and ends with Big-O notation, you're likely doing a terrible job. Measure, and test. Always.
As a field, do we measure and test the effect of using various programming paradigms, patterns, and coding standards? I don't think most companies do this. Most developer decisions are made by "gut." As a field, we are still more like the "before" of Moneyball than the after.
This seems to be entirely forgotten by the "measure, test" crowd: the scientific method requires a hypothesis to test for the measurements to be meaningful. And separately, not everything that gets built is "tested" in every aspect, since many things are well established, and applying that existing theoretical framework makes measurement and testing a waste of time.
Having a theoretical understanding of the thing you care about is far more important than the "measure, test" step - which is certainly important, but by itself is of really limited power.
i.e., it's not about checking your hypothesis, it's about checking the method by which you formed your hypothesis.
Checking your hypothesis is fine, but of value only to reject, not to actually find a hypothesis you can accept.
That's the simplistic grade school version. In practice, characterization and measurement is part of a feedback cycle that is used to develop and refine hypotheses. Experiments need a hypothesis to test, but measurements do not require experiments. Where the "measure, test" crowd usually goes off the rails is in not entering that scientific feedback cycle after the initial measurements and instead jumping to conclusions based on gut feel interpretation of them.
However, I disagree: measurements don't have any meaning unless they are conducted as experiments, or at least as exploratory work looking for an experiment to conduct. Even such exploratory work needs to be driven by at least the loosest of hypotheses, since measuring everything is infeasible. So it is just as much an experiment, just one whose outcome you don't have a high expectation of predicting.
Either way, my fundamental point is that this recognition of the necessity of at least a minimal hypothesis is typically lacking in all of these scenarios.
The cycle of revision of that hypothesis is clearly the next step and, as I say, I thought clearly implied.
> Either way, my fundamental point is that this recognition of the necessity of at least a minimal hypothesis is typically lacking in all of these scenarios.
That's an overly broad definition of hypothesis that results in every decision-making process fitting the model. Even the "measure, test" people you were complaining about have that. They formulate a "minimal hypothesis", take measurements, fix what they see, and call it a day. What they don't do is iterate on the process and get down to the root causes.
Furthermore, in some closed systems measuring everything (at some level of granularity) is feasible. It leads to a troubleshooting cycle of continuously focusing the measurement towards an indeterminate point, with the specific point being the answer. This kind of fault isolation is much more productive, in general, than continuously developing and testing hypotheses about what specific things might be an issue.
You don't have to start with measurements. You can often do some analysis before you have anything to test, and that may save you a lot of time measuring, testing and tinkering with something that doesn't have a prayer of working. If you are measuring the environment that the system will interact with, you have already done some analysis in determining what to measure, and developed some theories about what is going to matter; that is hard to do without some causal reasoning. If your tests show you have a problem, you need to think about what is going on in order to find a solution.
There's a curious anti-intellectualism in software development that is opposed to any suggestion that thinking about problems is useful. It seems to go hand-in-hand with a simplistic world view that only sees dichotomies, together with the assumption that there can only be one answer.
I'm not making one. I'm saying that measurements don't require an experiment to be meaningful.
> You don't have to start with measurements.
I didn't say you did, I said you can. The person I was responding to claimed otherwise.
> There's a curious anti-intellectualism in software development that is opposed to any suggestion that thinking about problems is useful. It seems to go hand-in-hand with a simplistic world view that only sees dichotomies, together with the assumption that there can only be one answer.
As someone who constantly rails at the lack of engineering rigor in software "engineering", I agree completely.
(the business people always pick the worse solution that sounds better unless you can explain why they should care, in english, meant for normal human beings, without sounding condescending about it)
The crossover point where it makes sense to use the O(n) algorithm will occur when:
200 * n = n lg n
which occurs for n > 2^200. Most of us are not dealing with problems larger than can be represented in our universe, so cache efficiency matters, kids.
Note that this changes for other comparisons. Going from n maximally cache inefficient vs n^2 cache efficient is only worth it for n up to 200, and n lg n --> n^2 up to n about 2223.
Of course, in practice your algorithm won't be missing the cache on every access, so reality is somewhere between these values.
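For what it's worth, the quoted crossover points are easy to sanity-check, assuming the same flat 200x penalty on every access (which, as noted, is the extreme case):

    import math

    # 200*n vs n*lg(n): the log side only loses once lg(n) > 200, i.e. n > 2**200,
    # far too big to brute-force but immediate from the algebra.

    # n cache-hostile vs n^2 cache-friendly, i.e. 200*n vs n*n:
    print(max(n for n in range(2, 1_000) if n * n < 200 * n))    # 199 (tie at exactly 200)

    # n log n cache-hostile vs n^2 cache-friendly, i.e. 200*n*lg(n) vs n*n:
    print(max(n for n in range(2, 10_000) if n * n < 200 * n * math.log2(n)))    # 2223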
Yes! All abstraction has a price. This is what people really mean when they say "all abstractions leak." Basically, all code and runtime features have a cost in terms of developer resources, CPU, memory, etc. The "inner game" of development isn't being able to muster huge amounts of "cleverness" power. The "inner game" is being able to put whatever resources you have to the best possible use.
There's plenty of abstractions which don't actually cost anything, and there's room for even more.
Abstractions leak because they are models, and thus are necessarily different from the concrete things that they model. These differences can and will cause problems. Generic programming is an abstraction that pretends you aren't working on a low-level un-typed system. And you pay for that abstraction any time you need to deal with low-level demands through that abstraction.
Not really with newer generics implementations (.NET, Rust, etc.). I don't think anyone thinks C++ templates are ideal, and the others show that this cost is at least non-essential.
> Greater number of functions in binaries, and thus more unique places for bugs to occur.
Less human-written code, so this shifts the bug surface to the compiler, which is an "issue" no matter how low-level your code is.
> Dynamic library bloat.
Your alternatives are A. duplicate the classes yourself B. let the compiler duplicate them. Where's the bloat? C++ requires you to explicitly instantiate templates if you want them in a library, and usually they're just left to the header. The others I listed won't generate anything unless you actually use a particular instantiation.
> developer time is spent dealing with all of this
It's pretty hard to argue that the developer time the compiler writer spends on such a broadly used feature outweighs the time it saves for the developers who use the language.
> Abstractions leak because they are models, and thus are necessarily different from the concrete things that they model.
What concrete thing do generics model? Tediously copy-pasted code? These will only be different if you mess up when you do the copy-pasting.
> Generic programming is an abstraction that pretends you aren't working on low-level un-typed system.
What? It's an abstraction over an already typed system that obviates the need to duplicate code, which after the generics are unwound produces the same code you would have had anyway. You're just as far or close to the low-level un-typed system as you were before.
"All abstractions leak" is a nice platitude, but handwaving about "models" and "concreteness" doesn't prove anything.
>It's pretty hard to argue that more developer time is wasted
That's NOT what I'm arguing. I'm saying that it takes a different amount of time. Hence the concept of "trade-offs", AKA "cost."
With that said...
>Your alternatives are A. duplicate the classes yourself B. let the compiler duplicate them. Where's the bloat?
C. Monomorphise at runtime.
>Less human written code, so this reduces the surface area to the compiler; which is an "issue" no matter how low level your code is.
Yes, but it's a different issue. Again, there are trade-offs.
>What concrete thing do generics model? Tediously copy-pasted code?
No. They model the behavior of the algorithm as it physically exists in a machine.
>What? It's an abstraction over an already typed system that obviates the need to duplicate code, which after the generics are unwound produces the same code you would have had anyway. You're just as far or close to the low-level un-typed system as you were before.
Generics are part of your language. They do not abstract over the language. They abstract over the behavior of your program. It's just a feature of the language that does so with greater generality than the rest of the language.
And the code you "would have written anyway" is machine code. Which you are abstracting over. Generics allow you to pretend your algorithm doesn't need a fixed machine representation. The fact that you can produce two different machine representations for the same template code is an example of what I'm talking about, not a counter example. The abstraction leaks. You write one algorithm, and it has to be compiled into two different representations. That's the cost of the abstraction. You get increased generality at the cost of memory and time.
Most generics systems allow the code to be unfolded into generics-free code; this is how templates are usually compiled in C++. So in a material sense the compiler treats templates as syntactic sugar added over C++ without templates (which is a valid language, and which the compiler uses internally).
> C. Monomorphise at runtime.
A lot of generics (like my complex number example) can't be monomorphized at runtime since floats and doubles don't have dynamic dispatch in these languages. Not to mention the loss of type safety even if you implemented it this way. So that is not equivalent.
> And the code you "would have written anyway" is machine code.
That's not analyzing the cost of generics, that's analyzing the cost of the whole language, and then putting it on generics. If that's fair, why wouldn't we add the cost of designing and operating computers instead of doing it on paper? There's nothing special about machine code here.
> The abstraction leaks. You write one algorithm, and it has to be compiled into two different representations. That's the cost of the abstraction.
If (as in my example before) you need complex numbers with floats and complex numbers with doubles, the compilation will likely be faster because the compiler doesn't need to take two parallel implementations through the whole parsing process, and can instead generate the necessary ASG itself. If you only use one, then maybe there is a detectable differential cost.
Also, how is that even a leak? That's the abstraction working absolutely perfectly and giving you exactly what you intended.
I think the problem is not so much the big-O notation, but that the underlying assumption that the data access is from an "all main memory" model with no cache or secondary storage is often forgotten.
For example, in database algorithms, all memory operations are often considered to be unit cost, since they are an order of magnitude cheaper than accessing disk storage.
The field of cache oblivious algorithms is focused explicitly on memory hierarchy, and accounts for cache misses.
If you're using solely big O to decide on an algorithm, you're fooling yourself. If performance is an issue, profile, benchmark, study, don't guess.
Don't let Big O notation fool you, don't misunderstand Big O.
5 means "write five times then read once per iteration" not "write five times".
This is not true at all. You'll be able to access it about as fast as the memory can operate - it'll still be much faster than randomly chasing pointers all over the place - but the maximum processing speed of the CPU is an order of magnitude greater than that again.
Arrays are still generally faster than lists.
The funny thing is Microsoft’s linked lists are faster than C++ standard vectors.
If I had to guess, it's because the std::vector is more conservative in memory use and it causes more malloc/array copy calls.
The elements are created directly adjacent to each other. This makes iteration faster because of RAM locality, despite the pointer-based data structure.
Linked lists are fast for inserting/deleting/getting at the beginning and end, but slow for random access for anything in the middle.
So far, that's what the article covered. Now if you want fast random insertions/deletions and fast random access, this is possible with balanced trees. A tree-based list can support all single-element operations in O(log n) time. Sample code: https://www.nayuki.io/page/avl-tree-list
The reason for this abnormality is cache locality (there are other reasons, which I will not go into right here).
Balanced trees can be pretty slow in fact. After some operations, a tree structure can become quite fragmented in memory, leading to many cache misses. In my experience, it is often faster to work with arrays instead of trees when processing in memory. With external storage, however, a B-tree often brings a significant performance gain.
How often does it matter that non-contiguous memory access is slow? Really. How much do those few microseconds really matter? In most apps, I would guess that a CPU cache miss isn't noticeable by humans.
Yes, non-contiguous structures are significantly slower, but if you don't need to be as fast as possible, eliminating them for that reason only (assuming that there is any non-perf reason to stick with them) is a premature optimization.
But if you ARE optimizing, yeah, you need to think about how often worst-case occurs. Because big O only tells you about worst-case. NOT the average.
I work in games, so the story I tell new kids is: The PlayStation2 ran at 300Mhz and a cache miss would cost you 50 cycles. The PlayStation3 ran at 3200Mhz and a cache miss would cost you 500 cycles. So, if you are cache missing a lot, your PS3 game will run as if it were on a PS2.
In other words, not paying attention to cache make your computer run like it's 10 years older than it is. You paid for a modern machine, but you are getting the results of a Craigslist junker. This is true outside of games. It's the reason 4x2Ghz cellphones struggle to run seemingly simple apps. It's a big part of the reason people struggle to orchestrate armies of servers (distributed computing is easier when it's 90% less distributed).
Is it really harder to work with the cache system instead of ignoring it? Yeah, it requires a tiny bit of study and a little bit of planning. In contrast, the theme I see a lot online is to completely dismiss physical reality in favor of theory. And the theme I see almost universally in the students (and many senior engineers) I interview is complete ignorance of the very existence of the cache and its effects on the code they write. It's very concerning...
However, in 90% of applications, it's not going to matter, because those applications are spending hundreds of cycles waiting anyways: Disk or network IO, user input, all that stuff is way slower than a cache miss. If you're writing a video game, or a database, or other software with very high soft-realtime speed requirements, or heavy data access, by all means, optimize to avoid cache miss.
But if you're writing a company-internal Rails app, nobody's going to notice, even if you're getting cache miss after cache miss. Which you probably won't.
Actually, if your language isn't compiled, a cache miss is the least of your worries, perf-wise.
And now I've got to see if I can optimize my code to avoid cache misses. But the code's in Scheme, so unless the initial access cost is amortized, I'm already doomed...
Most of the dynamic languages are data and instruction cache-miss machines. They chase objects and pointers all around the memory.
...That doesn't sound like a cache miss. Knowing Android, A cache miss is probably the least of your worries.
>Users may need a second to click or touch a button, but when they do the software should react instantly, and that does not leave you that many cycles.
You raise a good point...
>Most of the dynamic languages are data and instruction cache-miss machines. They chase objects and pointers all around the memory.
...and this is part of my point. If you look at a problem and think, "A high-level language is fast enough," then you are implicitly saying that the latency of a cache miss is acceptable. And IME, in most cases that's true. I mean, heck, I'm using Scheme, so while I may have pointer chases like the Amazon has trees, I CAN optimize them into array lookups, and my code is compiled: not great, but better than most HLLs.
It's the same argument as always: perf vs. development speed. You can be in the C and FP loop, or the Lisp and JS loop.
My example was meant to illustrate the user input problem. From what I know about Android, the abysmal performance is very much a case of "death from a thousand cuts".
> It's the same argument as always: perf vs. development speed. You can be in the C and FP loop, or the Lisp and JS loop.
The fast(er) languages we have are old and full of warts, and that makes them slow to develop in. The heavily used HLLs such as Python and Ruby were made by people who did not care much (at all?) about performance, and it shows in many design decisions. But here's the thing: we could have both at the same time. I don't buy this dichotomy.
That's actually not true: OO, dynamism, late binding, and a lot of the other things that HLLs have to offer require a lot of pointer chasing and non-consecutive datastructures. I'm mostly a Schemer, and Scheme and Lisp have had decades of research put into making them compile and run fast. Most dynamic languages aren't so lucky. But the required pointer chasing and garbage collection mean they'll never be as fast as C.
Functional programming languages, however, are rarely late-binding, and don't expose as much about their implementation, so some of the pointer chasing can be avoided.
Rust doesn't need a GC, and is fairly C-like - or rather, ALGOL-like and BLISS-like - with added memory safety. So with a programmer who knows what they're doing, it can be pretty fast. But here's the rub: the faster a language is, the closer it has to be to the metal, and the less it can do with high-level features.
So yes, you can make HLLs faster, but you can't take the cache misses out of an HLL, and you can't make a systems language wearing an HLL's clothing - although Rust is making an admirable attempt.
None of these are required for ease of development. At least the first two often result in precisely the opposite.
HLLs are not required to focus on slow abstractions. For example, homogeneous arrays of tagged unions can replace inheritance most of the time. And they avoid breaking your code in 10 files and 20 classes (though for some reason this metric is seen as a good thing way too often).
And while I don't set much store by inheritance, I set considerably more store by duck-typing and polymorphism.
>None of these are required for ease of development. At least the first two often result in precisely the opposite.
So are you talking about Java-style OO? Because I was talking about Smalltalk OO, which is pretty different.
Required, no, but they're tools, and they come in handy. Certainly make development a lot more comfortable.
And some of the time, they make development a lot easier.
This is precisely the kind of thing that leads to unmaintainable "magic" code, even though it can still be useful but with extreme moderation. So I don't see the point of making that a core feature of any language.
If you have any example to the contrary, I'd love a link to a good open-source project that uses these things extensively.
As for not seeing the point of making it a feature of your language, how about Lisp? And I'm not just talking macros. Lambdas, late binding, and in Scheme, the ability to rebind anything lead to a lot of cool tricks and capabilities.
And late binding is extraordinarily important!
Well I guess it's good at making CRUD web sites. Hardly rocket science.
Tinyclos is a fairly sophisticated implementation of OO and the MOP, written in Scheme.
Its descendants, COOPS, GOOPS, and others, are in most schemes today. Many of them are written in their respective dialect of scheme, with little or no specific compiler support.
SXML allows for writing XML in native scheme syntax.
The anaphoric macros (aif, acond, etc.) are all, well, macros, and thus use metaprogramming principles.
tclOO, [incr tcl], and other OO TCL systems are usually implemented in regular TCL.
Give or take, any large Lisp or Smalltalk codebase takes advantage of dynamic typing, late binding, and some form of metaprogramming.
However, you've made it clear that you hate Ruby, dynamic typing, and other such things, and since much of metaprogramming requires this sort of flexibility, I very much doubt anything I say will convince you that dynamic languages are in any way useful.
All your examples are programming gimmicks, and I've yet to see stuff that solves actual hard problems. I'm not interested in programming for programming's sake. I want to use it to make my computer do useful stuff.
Maxima is a descendant of the original MACSYMA, developed at MIT. It is still a usable and viable system, even if it has aged a bit.
Emacs is a popular programmer's text editor written in C and a dialect of lisp.
Both of the above programs are large, useful, and written in an HLL - one particularly amenable to pointer chasing, I might add - and they make use of the variety of abstractions which that HLL provides.
If those aren't modern enough for you, check out some Clojure applications.
There is at least one area where even a split microsecond will cost you millions.
P.S.: even in consumer apps, microseconds here and there add up, and then the user gets an overall sluggish experience.
But I do agree that even if you don't have to worry about cache latency for every single thing you write, it is something you should know about.
I think multi-threading is an equally important skill, that gets less attention.
Look at the code; termination of the loop is even based on an integer count pulled from the container.
A True Scotsman's linked list has no such thing. It's either an empty indicator (like NIL in Lisp) or a binary cell consisting of an item and a pointer to the next one.
The benefit of that abstraction is that you can recurse over it directly without clumsy additional parameters having to be passed.
Another benefit is substructure sharing. Not only can we insert at the front of a list in O(1) time, which is great; perhaps more importantly, existing copies of the list before the insertion do not change. It is the same if we delete from the front: we just move our local head pointer to the next node, which doesn't affect anyone else who is still holding on to the pointer to the original front node.
These lists also allow lock-free operation, unlike "heavy weight" containers. Suppose we have a list that acts purely as a container, but is shared by multiple threads. We can insert into it by consing a new node onto the head of the current snapshot of the list, and then doing an atomic-compare-swap to install that head in place of the old head. If it fails, it means the list changed behind our back; we cons the node onto the new list (or rewrite the cons to point to the new one as its "rest" pointer) and try again.
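A bare-bones illustration of the substructure-sharing point, with plain tuples standing in for cons cells (just a sketch; any language with immutable pairs would do):

    # A cell is (item, rest); the empty list is None.
    def cons(item, rest):
        return (item, rest)

    xs = cons(3, cons(2, cons(1, None)))    # the list 3 -> 2 -> 1
    ys = cons(4, xs)                        # "insert at the front": O(1)

    # xs is untouched, and ys shares xs's entire structure as its tail.
    assert ys[1] is xs
    # "Deleting" from the front is just looking at the rest; nothing mutates.
    zs = ys[1]
    assert zs is xs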
Some of the caching benefits of the array will disappear if the array holds only references/pointers to items. In this example, the containers are typed: the List<int> can actually allocate the int objects in an array that packs them together in memory, whereas the LinkedList<int> has individually allocated nodes which hold the int.

Suppose the List and LinkedList hold pointers to heaped objects. Then the impact of caching is softened. It's still the case that multiple pointers in the array can be cached together, but these have to be traversed to access the items themselves. In the case of the LinkedList, we have to traverse a pointer to get to the next node and a pointer to get to the heaped object. But two pointer traversals versus one is not as bad as one traversal versus zero.

If the objects being traversed are fairly complex, and have pointers to additional objects that have to be examined during the traversal, it matters even less what kind of list they are accessed from. If I have a list of File objects, for each of which a stream is opened and a regex scan performed, why would I care whether it's an array list or a linked list?
The results shown in this test case tell me that the performance of the two containers is not that far off! Here they are subjected to a test case that is designed to highlight the difference between them by making the container content, and its processing, almost as trivial as possible. That 38 seconds versus 51 difference is almost purely in the container-related operations. That is as bad as it gets: from there, the more actual real work you do per container node, the smaller the actual difference.

(What is 51 versus 38 in "orders of magnitude"? Why, 0.12 orders. It's 0.42 "binary orders of magnitude", where one binary order is a doubling; terminology mine. So in terms of classic Moore's Law, speed doubling every 18 months, that's a 7.6 month advance. "My arrays are 7.6 months ahead of your linked list container, in Moore's Law terms, in a pure benchmark; eat my dust!")
If you don't want to go that route, most modern browsers have a reader function built in, which extracts the page content and displays it with high contrast, a readable font, and no annoyances.
What would you have preferred the author do differently?
1. causing resentful displeasure; highly irritating, angering, or annoying:
offensive television commercials.
2. unpleasant or disagreeable to the sense:
an offensive odor.
3. repugnant to the moral sense, good taste, or the like; insulting:
I don't know how long it will take us till we finally stop with the white backgrounds...
But all in all it is a good page for people with poor vision.
No low-contrast and no white background, which are both considered bad on screens.
It is also worth noting that some manuals say that appending to an array list is O(1) amortized. Which is true, if you make an amortized analysis (which essentially distributes the workload of copying the array into a larger block over all the inserts). That's to keep in mind for systems that have to be realtime or at least produce stable framerates. The worst-case is important and amortized analysis generously glosses over it.
EDIT: Not inserting, appending
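One way to see the worst case hiding under the amortised O(1), sketched in Python (timer noise and GC add their own spikes, but for anything with a frame budget the resize spikes are the point):

    from time import perf_counter_ns

    lst, samples = [], []
    for i in range(1_000_000):
        t0 = perf_counter_ns()
        lst.append(i)                               # amortised O(1)...
        samples.append((perf_counter_ns() - t0, i))

    # ...but the slowest individual appends are the O(n) reallocate-and-copy steps.
    print(sorted(samples, reverse=True)[:5])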
For an insertion at a random position you'd have, on average,

(n + (n-1) + ... + 1)/n = O(n^2)/n = O(n)

The sum covers the shift cost of each possible insertion position, each occurring with probability 1/n. You might also have to grow the list, but that's O(1) amortised (O(n) at worst), so the bound stays O(n).