
In not so many words, the article is pointing out that O(n) + O(1) is not necessarily quicker than O(n) + O(n): both reduce to O(n) (since only the highest-order term matters), and the constant factor is unsurprisingly better for array lists.

He's not benchmarking an O(1) operation against an O(n) one, or showing anything surprising, or even pointing out a "hidden" O(n) operation; it's simply a demonstration that O(n) + O(1) = O(n).




Of course this was a single example.

The main point is that big-O notation ignores cache misses.

For example:

- try writing a quicksort as a template to avoid the comparison function calls

- in the sorting loop, switch to an insertion sort when buckets have <= 32 elements

You will see speedups of 2 to 3x over a vanilla quicksort implementation.

Although this approach is slightly more complex at the abstract level, it's faster because it avoids a lot of cache misses.
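For anyone curious, a rough sketch of what that hybrid might look like in C++ (the function names and structure are mine, and a production version would pick pivots more carefully):

    #include <functional>
    #include <utility>

    // Insertion sort: cheap on tiny ranges that already sit in cache.
    template <typename T, typename Cmp>
    void insertion_sort(T* a, int lo, int hi, Cmp cmp) {
        for (int i = lo + 1; i <= hi; ++i) {
            T key = std::move(a[i]);
            int j = i - 1;
            while (j >= lo && cmp(key, a[j])) {
                a[j + 1] = std::move(a[j]);
                --j;
            }
            a[j + 1] = std::move(key);
        }
    }

    // Quicksort with the comparator as a template parameter, so the compiler
    // can inline it instead of calling through a function pointer (as qsort does).
    template <typename T, typename Cmp = std::less<T>>
    void quicksort(T* a, int lo, int hi, Cmp cmp = Cmp()) {
        while (lo < hi) {
            if (hi - lo + 1 <= 32) {              // small partition: insertion sort
                insertion_sort(a, lo, hi, cmp);
                return;
            }
            T pivot = a[lo + (hi - lo) / 2];
            int i = lo, j = hi;
            while (i <= j) {                      // Hoare-style partition
                while (cmp(a[i], pivot)) ++i;
                while (cmp(pivot, a[j])) --j;
                if (i <= j) { std::swap(a[i], a[j]); ++i; --j; }
            }
            quicksort(a, lo, j, cmp);             // recurse into one half,
            lo = i;                               // loop on the other
        }
    }

The inlined comparison removes an indirect call from the hottest loop, and the 32-element cutoff means the bottom of the recursion works on small, contiguous, mostly cache-resident ranges.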


Surely the main point is that

    O(n) + O(n) = O(2n) = O(n) = O(n + 1) = O(n) + O(1)
and the constant factors left out of O() often dominate execution time.

Cache performance is just one example of a constant factor, right?


Cache performance is not constant; if it were, we wouldn't even consider it when optimizing.

The point is to reuse data that was recently fetched into the (faster) cache memory as much as possible, instead of incurring the penalty of a cache miss.

Cache performance really depends on memory access patterns.

Anyways :) I'm being pedantic here, I should probably go back to work.


Memory access time is a constant multiplier, whether it hits or misses the cache. There can be a big difference in that constant depending on the access pattern, but it is still considered a constant factor.

Yes, random access (missing the cache every time) may be 100x slower than sequential (hitting the cache almost always), but if you're iterating through an array twice as large, it will still be 100x slower.
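A hedged way to see both effects on your own machine: sum the same array in sequential order and in a shuffled order, at two sizes. The exact ratio will vary by CPU and array size, but both loops are O(n).

    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    // Time one pass over 'data' visiting elements in the order given by 'idx'.
    static double sum_ms(const std::vector<uint32_t>& data, const std::vector<uint32_t>& idx) {
        auto t0 = std::chrono::steady_clock::now();
        uint64_t sum = 0;
        for (uint32_t i : idx) sum += data[i];
        auto t1 = std::chrono::steady_clock::now();
        std::printf("(sum=%llu) ", (unsigned long long)sum);   // keep the loop from being optimized out
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        for (size_t n : {size_t{1} << 23, size_t{1} << 24}) {  // n, then 2n
            std::vector<uint32_t> data(n, 1), idx(n);
            std::iota(idx.begin(), idx.end(), 0u);
            double seq = sum_ms(data, idx);                    // sequential order
            std::shuffle(idx.begin(), idx.end(), std::mt19937{42});
            double rnd = sum_ms(data, idx);                    // shuffled order
            std::printf("n=%zu  seq=%.1f ms  rnd=%.1f ms  ratio=%.1fx\n", n, seq, rnd, rnd / seq);
        }
    }

Doubling n roughly doubles both times; the sequential/random ratio is the constant factor that big-O throws away.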


I agree: on the same machine, the slow-downs are constant for each level of cache and for RAM access.


One of my favorite ACM papers was an analysis of how the asymptotic runtime behaviors of various sort algorithms didn't tell the full story: the asymptotically faster algorithms took a pretty big data set before they even broke even with some of the other sorts. In fact I think at even 10,000 records the 'faster' algorithm was still about 30% slower than a 'worse' one, and you had to get up to sorting a considerable amount of data before it was actually 'best'.

I don't know about you, but I don't sort 100k records in a single batch, and if I do it's because I messed up. But I might sort different batches of 100 records 1000 times a minute.


Not necessarily. You might want to create an algorithm that iterates in a sequential fashion through memory, since your memory controller will see the iteration and prefetch the next block of memory. This is a huge performance win on larger data structures.
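As an illustration (a sketch of mine, not from the parent): the two walks below touch exactly the same nodes in the same block of memory; only the link order differs, so only one of them gives the prefetcher a pattern it can follow.

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    struct Node { int value; int next; };    // 'next' is an index; -1 ends the chain

    // Link the nodes so a walk visits them in the given order.
    static void link(std::vector<Node>& nodes, const std::vector<int>& order) {
        for (size_t i = 0; i + 1 < order.size(); ++i)
            nodes[order[i]].next = order[i + 1];
        nodes[order.back()].next = -1;
    }

    static void walk(const std::vector<Node>& nodes, int head, const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = 0;
        for (int i = head; i != -1; i = nodes[i].next) sum += nodes[i].value;
        auto t1 = std::chrono::steady_clock::now();
        std::printf("%s: sum=%lld  %lld ms\n", label, sum, (long long)
            std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count());
    }

    int main() {
        const int n = 1 << 24;
        std::vector<Node> nodes(n, Node{1, -1});
        std::vector<int> order(n);
        std::iota(order.begin(), order.end(), 0);

        link(nodes, order);                              // chain in address order
        walk(nodes, order.front(), "sequential chain");

        std::shuffle(order.begin(), order.end(), std::mt19937{42});
        link(nodes, order);                              // same nodes, shuffled order
        walk(nodes, order.front(), "shuffled chain");
    }

Same data structure, same O(n) traversal; the difference is whether the hardware prefetcher can see where the walk is going next.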


Also, Big-O notation is a theoretical construct for thinking about complexity that works well for academic proofs but glosses over many real world considerations. Those ignored constant factors can be large and the datasets are not always large enough to amortize their cost.


Furthermore, amortization isn't always accounted for in Big-O. Inserting into the head of a linked list is O(1), and for a vector it is O(n). But on modern hardware vector inserts are much faster.
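A sketch for checking that claim on a given machine (where the crossover sits depends heavily on element size, allocator, and n, so measure rather than assume):

    #include <chrono>
    #include <cstdio>
    #include <list>
    #include <vector>

    // Time inserting n ints at the front of a container, in milliseconds.
    template <typename Container>
    static double front_inserts_ms(int n) {
        Container c;
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i)
            c.insert(c.begin(), i);       // O(1) per insert for list, O(size) for vector
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        for (int n : {1000, 10000, 100000}) {
            std::printf("n=%6d  list=%9.2f ms  vector=%9.2f ms\n", n,
                        front_inserts_ms<std::list<int>>(n),
                        front_inserts_ms<std::vector<int>>(n));
        }
    }

The list does one small allocation per insert and scatters nodes across the heap; the vector does one big contiguous move per insert. Which constant wins depends on the sizes you actually run at, which is exactly the "measure and test" point.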

If your software engineering begins and ends with Big-O notation, you're likely doing a terrible job. Measure, and test. Always.


I would say, "If your software engineering begins and ends with [jargon] you're likely doing a terrible job. Measure, and test. Always."

As a field, do we measure and test the effect of using various programming paradigms, patterns, and coding standards? I don't think most companies do this. Most developer decisions are made by "gut." As a field, we are still more like the "before" of Moneyball than the after.


There's a third option: analysis and understanding.

This seems to be entirely forgotten by the "measure, test" crowd: the scientific method requires a hypothesis to test for measurements to be meaningful. And separately, not all things that are produced are "tested" in every aspect, since many things are well established, and applying that existing theoretical framework makes measurement and testing a waste of time.

Having a theoretical understanding of the thing you care about is far more important than the "measure, test" step, which is certainly important but by itself is of really limited power.


The programming field isn't lacking in typical programmers with hypotheses. What it's lacking are typical programmers who are checking their hypotheses.


If you take the word hypothesis in its most minimal sense, and ignore everything else I said, sure. But a hypothesis formed in the absence of analysis or understanding of the situation, of any appreciation of the existing body of literature on the topic, or of any other informing principle, is of limited value.

i.e., it's not about checking your hypothesis; it's about checking the method by which you formed your hypothesis.

Checking your hypothesis is fine, but it is of value only for rejecting a hypothesis, not for actually finding one you can accept.


Many would rate Aristotle as a 1st rate thinker. Too bad he didn't check his hypotheses more.


> the scientific method requires a hypothesis to test for measurements to be meaningful

That's the simplistic grade school version. In practice, characterization and measurement is part of a feedback cycle that is used to develop and refine hypotheses. Experiments need a hypothesis to test, but measurements do not require experiments. Where the "measure, test" crowd usually goes off the rails is in not entering that scientific feedback cycle after the initial measurements and instead jumping to conclusions based on gut feel interpretation of them.


I deleted from my comment an explicit calling out of this feedback loop, as it was implied, I thought, by the statement that the measurement and testing were both important, but of limited value by themselves.

However I disagree that measurements have any meaning unless they are conducted as experiments, or potentially as exploratory work looking for an experiment to conduct. Even such exploratory work needs to be driven by at least the loosest of hypotheses, since measuring everything is infeasible. So this is just as much an experiment, just one you don't have a high expectation of predicting the outcome of.

Either way, my fundamental point is that the recognition of the necessity of at least a minimal hypothesis in all of these scenarios is typically lacking.

The cycle of revision of that hypothesis is clearly the next step and, as I say, I thought clearly implied.


> However I disagree that measurements have any meaning unless they are conducted as experiments, or potentially as exploratory work looking for an experiment to conduct. Even such exploratory work needs to be driven by at least the loosest of hypotheses, since measuring everything is infeasible. So this is just as much an experiment, just one you don't have a high expectation of predicting the outcome of.

> Either way, my fundamental point is that the recognition of the necessity of at least a minimal hypothesis in all of these scenarios is typically lacking.

That's an overly broad definition of hypothesis that results in every decision-making process fitting the model. Even the "measure, test" people you were complaining about have that. They formulate a "minimal hypothesis", take measurements, fix what they see, and call it a day. What they don't do is iterate on the process and get down to the root causes.

Furthermore, in some closed systems measuring everything (at some level of granularity) is feasible. It leads to a troubleshooting cycle of continuously focusing the measurement towards an indeterminate point, with the specific point being the answer. This kind of fault isolation is much more productive, in general, than continuously developing and testing hypotheses about what specific things might be an issue.


I am not buying your distinction between measurements and experiments - experiments are measurements with a purpose.

You don't have to start with measurements. You can often do some analysis before you have anything to test, and that may save you a lot of time measuring, testing and tinkering with something that doesn't have a prayer of working. If you are measuring the environment that the system will interact with, you have already done some analysis in determining what to measure, and developed some theories about what is going to matter; that is hard to do without some causal reasoning. If your tests show you have a problem, you need to think about what is going on in order to find a solution.

There's a curious anti-intellectualism in software development that is opposed to any suggestion that thinking about problems is useful. It seems to go hand-in-hand with a simplistic world view that only sees dichotomies, together with the assumption that there can only be one answer.


> I am not buying your distinction between measurements and experiments - experiments are measurements with a purpose.

I'm not making one. I'm saying that measurements don't require an experiment to be meaningful.

> You don't have to start with measurements.

I didn't say you did, I said you can. The person I was responding to claimed otherwise.

> There's a curious anti-intellectualism in software development that is opposed to any suggestion that thinking about problems is useful. It seems to go hand-in-hand with a simplistic world view that only sees dichotomies, together with the assumption that there can only be one answer.

As someone who constantly rails at the lack of engineering rigor in software "engineering", I agree completely.


Hell, let's go a step farther. If your software engineering begins and ends with jargon, you're likely doing a terrible job.

(the business people always pick the worse solution that sounds better unless you can explain why they should care, in English, meant for normal human beings, without sounding condescending about it)


What's worse, the field is constantly changing. What was true on one architecture may not be true on another, and even different generations of a single architecture can change a lot.




