What's interesting is the implicit assumption that mutating memory is O(1).
Unfortunately, for real memory (caches, DIMMs, swap, etc.) that's not true. It's O(log n) for any non-trivial amount of memory, and the constant factors for mutating memory shared between processors can be especially bad.
Of course functional languages have hidden and not-so-hidden costs too, starting with the garbage collector, but including the time taken to get the larger working set into cache.
I'm interested in any studies that look at these total costs in real systems.
You raise a valid point, and one that a lot of algorithm and data structure textbooks neglect to mention at all; seen purely from a practical perspective, things like locality of data with respect to CPU caches are probably more important than picking a slightly cleverer algorithm/data structure on any modern-day CPU.
This once again raises the more subdued or neglected aspect of "premature optimisation": blindly picking an algorithm that looks better on paper may in reality be much slower than a naive algorithm that better matches the characteristics of a computer.
Some algorithms are O(1), and you could easily be fooled into believing that they are always better than counterparts that are asymptotically worse than constant time. That may not be so, though, if the "constant-time" computation on non-trivial inputs takes longer than, say, a linear-time one.
> You raise a valid point, and one that a lot of algorithm and data structure textbooks neglect to mention at all; seen purely from a practical perspective, things like locality of data with respect to CPU caches are probably more important than picking a slightly cleverer algorithm/data structure on any modern-day CPU.
It may not make it into the basics, but locality of data is absolutely considered in a lot of real work on algorithms and data structures. For example, the theoretical advantages of hopscotch hashing rest on data locality concerns. On a hypothetical machine for which memory access always takes the same amount of time, I believe - but have not bothered to prove - that hopscotch hashing and other open addressing techniques should actually be slower than other forms of collision resolution.
As for how it affects the question of imperative vs. functional data structure performance, it should tend to favor the imperative ones. The issue is that purely functional data structures aren't allowed to just grab a block of memory all at once and insert data into it at leisure. (That would be a form of mutation.) Instead space has to be allocated as it is needed, in a piecemeal fashion. As a result, longer-lived and more active purely functional data structures will tend to scatter across the heap over time.
Actually, the question then becomes one of trading off sharing vs. cache locality. To wit, for many functional languages, memory allocation is essentially just a pointer increment, so there is no a priori reason not to assume that a freshly allocated tree occupies a contiguous region of memory, or is at least contiguous on each page where it's been allocated.
Claims about data structures in the context of how they're allocated in a modern machine architecture / memory model are not the same thing as claims regarding the semantics that are exposed to the user of a programming language.
I think you do raise an interesting question, namely what the in-memory layout of a tree-like data structure is in, e.g., Scheme or Haskell, both when the tree is built from scratch and after it's been updated and a garbage collection run has compacted the data. Oooo, I totally want to investigate this now! (I don't have the time to, but I totally want to.)
Point being, depending on the memory management semantics for heap data in a programming language, any claimed locality for tree data may or may not materialize.
This actually raises a really beautiful question: what is the complexity of any garbage collection algorithm that exactly or approximately maximizes the locality of tree- or directed-graph-like data structures in memory? I want to speculate that perhaps the data structures people find fast in practice are, in some sense, enabling the GC to preserve locality. I wonder how optimal the layout can be when only linear or nearly linear time algorithms are considered!
Very true! Hence the part where I'm thinking out loud regarding how the gc ordering of data after a copy/compact (in the one generation gc) impacts locality. And I suppose this also applies to more modern gc setups like multiple generations or parallel collection. Is there some notion of data locality that we can formalize and analyze for at least simplified versions of modern memory hierarchy? How well do standard gc algorithms perform when analyzed thusly? How efficient can each style of gc algorithm be if it's designed to optimize locality?
It clearly has some relation to optimal caching problems, but with more explicit interdependencies. It's also seemingly both an online and an offline problem in terms of how to analyze it, but I need to think on it more.
> As for how it affects the question of imperative vs. functional data structure performance, it should tend to favor the imperative ones. The issue is that purely functional data structures aren't allowed to just grab a block of memory all at once and insert data into it at leisure. (That would be a form of mutation.) Instead space has to be allocated as it is needed, in a piecemeal fashion. As a result, longer-lived and more active purely functional data structures will tend to scatter across the heap over time.
It's a bit more complicated than that. To talk about runtime (and memory) of an algorithm or a data structure, you need to have a machine model. If you look at what happens in, say, Haskell: We write our algorithms in a purely functional way, but we are interested in their runtime after compilation into native code. The compiler is free to do any transformation that preserves correctness, so if it's smart enough, it might grab a block of memory all at once. Or if it can prove that data won't be used again, it can safely change variables in place. (The language Clean even has type system support for that kind of mutation. (See https://en.wikipedia.org/wiki/Linear_type_system))
You might be able to analyse functional algorithms with a machine model that doesn't involve compilation to some form of imperative code. Perhaps by counting reductions in lambda calculus. But then it's hard to compare that to asymptotic performance of imperative algorithms.
Absolutely. But many of those hypothetical compiler transformations would only work if the code isn't really taking advantage of some of the most important benefits of purely functional data structures. Persistence, for example.
That, and in a sense it's side-stepping the theoretical issue more than anything else. A compiler that's smart enough to find spots where a performance improvement can safely be gained by replacing a pure structure with a mutable one in the background doesn't really counter the idea that impure structures can be faster so much as acknowledge it.
Yes. And instead of comparing purely functional vs imperative languages, we could sidestep those issues by comparing persistent vs ephemeral data structures, regardless of language.
(For an example of the distinction--straight out of Okasaki's book: the standard purely-functional queue implementation as two linked lists has O(1) amortized runtime in adding and removing only if used `single-threaded' / ephemeral. As a persistent data structure you get a worst case of O(n).)
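A minimal sketch of that two-list queue may make the distinction concrete (the names are mine, not Okasaki's): enqueue pushes onto the back list, dequeue pops from the front list, and the back list is reversed only when the front runs dry. That O(n) `reverse` amortizes away only if each version of the queue is used once; persistently re-dequeuing from the same "expensive" version re-pays it every time.

```haskell
-- A persistent FIFO queue as two lists: `front` for dequeuing,
-- `back` (kept reversed) for enqueuing.
data Queue a = Queue [a] [a]

emptyQ :: Queue a
emptyQ = Queue [] []

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue front back) = Queue front (x : back)

dequeue :: Queue a -> Maybe (a, Queue a)
dequeue (Queue [] [])      = Nothing
dequeue (Queue [] back)    = dequeue (Queue (reverse back) [])  -- the O(n) step
dequeue (Queue (x : f) b)  = Just (x, Queue f b)
```

Used ephemerally, each element is reversed at most once, giving O(1) amortized operations; used persistently, an adversary can hit the `reverse` clause over and over.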
Even the "on-paper" analyses can take some of this into account, with algorithm analysis models that take into account cache and NUMA architectures. But, people do tend to just look at the classic big-O analyses, which may not be right.
In a way that kind of dependence isn't new. If you look at old versions of Knuth's TAOCP, for many algorithms he has separate analyses for the in-RAM case and the on-tape case, using two different memory models, and in some cases which algorithm is better differs a lot between the two cases. But it may be that algorithms reference books are doing a worse job keeping up now than they used to.
Knuth is probably the only (major) author I can think of off-hand who took the application of his algorithms and data structures so seriously, going so far as to invent his own little instruction set, MIX (later the RISC-like MMIX), in a book that concerned itself with the theory as well as the practical aspects. Most books focus almost exclusively on one or the other.
I think aside from things like cache coherency, memory models and things like SIMD, computer scientists will have to think of their algorithms (or at least cover it when they write books for undergrads or professionals) in terms of commutativity and associativity so people can see how parallelisable an algorithm is or can be.
I can't upvote you enough. Too often programmers conflate "functional programming" with "availability of an O(1) array data type". The former does not imply lack of the latter, and (as you stated) lack of the latter does not imply the former.
The important takeaway I think, is that O(log n) approaches O(1) for all practical values of n.
To the downvoters: gee golly, would it hurt you to reply to my comment? If you think I'm wrong then say why!
Edit: O(log n) is clearly worse than O(1) if you just considered an O(log n) array-replacement data structure or hash table-replacement data structure and considered what its performance would be. I mean, how is this even a question?
Also if memory is O(log n) then arrays are O(log n) but BSTs are O((log n)^2).
(Edit II: Let's be clear that BSTs are already O((log n)^2) if you're counting CPU cycles with O(1) memory access. They're O(log n) if you count cache misses or comparisons; or you might say they're O((log n)^2), but the first log is base 2 (or less) and the second is base 2^256, so only one really counts. (The second log manifests as a cache-line miss whenever your key comparison walks past the first 32 bytes of your key.))
(Actually memory access is at best O(n^(1/3)) thanks to space being three dimensional, but really worse thanks to cooling concerns, and uh, for sufficiently large n, worse because of gravity.)
> O(log n) is clearly worse than O(1) if you just considered an O(log n) array-replacement data structure or hash table-replacement data structure and considered what its performance would be. I mean, how is this even a question?
Why yes, it is worse in some theoretical manner. But if you think it is necessarily worse as a practical matter, you have failed to understand the math fully.
Suppose that n is 1000 in some toy problem. Then you scale it up to a huge data set, a billion items! The log(n) factor only got worse by a factor of 3. Maybe we didn't go high enough. Let's take a truly large data set; how about all of the data produced at CERN's Large Hadron Collider in a year? That's 15 petabytes. Now the log(n) factor has grown by a whopping factor of 5.39.
In other words while log(n) represents unbounded possible growth, it gets worse in practice by at most a fairly small constant factor.
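A quick sanity check of those numbers (the function name is my own):

```haskell
-- Ratio by which the log n term grows when the input scales from n0 to n1.
logFactor :: Double -> Double -> Double
logFactor n0 n1 = logBase 2 n1 / logBase 2 n0
-- logFactor 1000 1e9   is about 3.0
-- logFactor 1000 15e15 is about 5.39
```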
How about O(1)? O(1) does not mean fast. It just means constant relative to the data size. It can be a bad constant, but as long as it is a constant it qualifies as O(1).
Let's take a practical example. Wavelets are cool in lots of ways, but one of the properties people noticed quickly is that doing a basic wavelet transform on a data set with n elements takes time O(n). Doing a FFT (Fast Fourier Transform) on the same data set takes time O(n log(n)). Yay, we're faster!
But hold on a minute. The interesting wavelets that we like to use are O(n) with a worse constant than the FFT. So the FFT tends to be faster on practical data sets. (There are lots of reasons why one would prefer wavelets over the FFT, but speed of calculation is not generally one of them.)
I was just explaining the comment that O(log(n)) approaches O(1) for all practical values of n.
If you understood this, then why did you start off your post arguing that O(log(n)) was worse than O(1)?
Incidentally, comparing a logarithmic lookup to a hash lookup: a lot of NoSQL solutions use hash tables because O(1) is good, right? For instance, look at Apache's Cassandra. By contrast, Google's BigTable uses an ordered data structure which is logarithmic.
Guess what, if you've used both, BigTable is better. Because the extra log cost is irrelevant for the normal use case, and being ordered reduces the operational cost of things like range searches.
> I can't upvote you enough. Too often programmers conflate "functional programming" with "availability of an O(1) array data type". The former does not imply lack of the latter, and (as you stated) lack of the latter does not imply the former.
Yes, even pure Haskell has unboxed, mutable O(1) arrays. They're just a bit hidden from the beginner.
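One place they live is the `array` package's `ST`-based interface: mutation happens in place inside `runSTUArray`, and what escapes is an ordinary immutable unboxed array. A small sketch (the function name is mine):

```haskell
import Data.Array.ST (runSTUArray, newArray, writeArray)
import Data.Array.Unboxed (UArray, (!))

-- Fill an unboxed array in place with O(1) writes; the result is pure.
squares :: Int -> UArray Int Int
squares n = runSTUArray $ do
  arr <- newArray (0, n) 0
  mapM_ (\i -> writeArray arr i (i * i)) [0 .. n]
  pure arr
```

The type system (via the `s` phantom variable in `ST`) guarantees the mutation can't be observed from outside, so `squares` is referentially transparent despite the in-place writes.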
To my knowledge, mutating memory is asymptotically the same as just reading it. Hence any asymptotic slowdown purely functional languages have will translate, regardless of how long memory operations take.
Are you sure? Writing to memory that might be shared between processors certainly isn't the same as reading, and so a parallel algorithm (and what isn't, these days?) should do better sharing lots of read-only data, rather than mutating shared state.
The top-voted answer mentions that the example O(n log n) algorithm can be made O(n) by replacing eager evaluation with lazy evaluation. My question is, does this count as using the same algorithm or not?
I sort of view lazily-evaluated data structures (like generator expressions) as not data structures at all, but rather control flow constructs. They control when, how, and if expressions are evaluated and results are returned. As a result, I feel that using lazy equivalents for your data structures qualifies as an algorithmic change. In fact, one could manually code lazy data access in an eager language, and the resulting algorithm would look very different from the original.
On the other hand, the laziness is enabled by the purity of the algorithm, so... well, it depends on your perspective. You could say that laziness makes the functional equivalent just as fast, or you could say that the functional equivalent is slower, but there's a different functional algorithm that is just as fast.
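A tiny illustration of that "data structure or control flow?" ambiguity (names are mine): a lazy list can be conceptually infinite, and the consumer, not the producer, decides how much of it is ever computed, exactly like a generator expression.

```haskell
-- Conceptually infinite list of naturals; nothing is computed yet.
nats :: [Integer]
nats = [0 ..]

-- `take` acts as the control-flow construct: only n squares are forced.
firstSquares :: Int -> [Integer]
firstSquares n = take n (map (^ 2) nats)
```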
Except they're still data structures. They're like, a quantum superposition of data structure and control-flow operator. Which one it is depends on how you look at it.
Yes. That's why in Haskell programming a common sentiment is that data structures are control structures. And why we need non-linear data structures for concurrent execution (http://vimeo.com/6624203).
You're getting dangerously close to describing continuations. And I think it's important to remember that just because a programming language doesn't give you access to a data structure, it doesn't mean it's not using it under the covers. The call stack is a data structure in all programming languages, only a few provide you with easy access to it.
> O(n log n) algorithm can be made O(n) by replacing eager evaluation with lazy evaluation. My question is, does this count as using the same algorithm or not?
I don't think anyone is claiming that it is the same algorithm. Rather, they're claiming that using lazy evaluation gives you more choice of algorithms; and one of these algorithms is faster than the best algorithm under strict evaluation. I haven't been able to download Bird et al.'s paper, but I wouldn't assume it's simply "implementing the same algorithm in Haskell instead of strict ML".
In addition to the great answer mentioned there, it should also be mentioned that every language referred to as functional still has the ability to use mutable arrays in some manner. Yes, even Haskell, it just requires you to carefully control where the mutation is used.
Yes, I think the great lesson we can take from functional programming is not of being functional 100% of the time, but of not having mutable global state. Your inner loops, local variables, etc. can all be mutable and nobody will care, as long as your interface is functional.
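Haskell's `ST` monad is the canonical way to do exactly this: a mutable accumulator inside, a pure function outside. A sketch (the function is my own example, not from the thread):

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, modifySTRef')

-- Internally an imperative loop over a mutable cell; externally a pure
-- function. No caller can observe the mutation.
factorial :: Integer -> Integer
factorial n = runST $ do
  acc <- newSTRef 1
  mapM_ (\i -> modifySTRef' acc (* i)) [1 .. n]
  readSTRef acc
```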
Yes and no. It is true that what matters most of the time is a functional interface, not how that is achieved. That being said, it is much harder to successfully implement a functional interface if you are using side-effects willy-nilly inside.
Purely functional programming? I dunno, what's the asymptotic cost of flipping a switch on a box and watching it warm up?
Nobody advocates purely functional programming, at least not in the implementation of a language itself, but it can be an incredibly valuable model for a language interface. In the same way, a stack (or better yet, semi-infinite tape) can be an incredibly valuable model for a language interface as well. Functional programming is, by definition, just another way of saying that your model of computation is based on the lambda calculus instead of a Turing machine. You can just as easily ask what the asymptotic 'cost' of using a non-random access Turing machine is.
If people really want a functional language with a purely functional implementation, then you have to say 'farewell' to even things like tail-recursion, since you can make the argument that converting recursion to a loop is anti-lambda calculus.
EDIT: It seems there's some confusion over the distinction between implementing a purely functional algorithm in a language and implementing a language in a purely functional manner. My point is that, even if you implement your algorithm in a purely functional manner, if your compiler is simply converting it to (well-optimized) imperative, destructive code behind the scenes while making it appear immutable, then this entire question is rather beside the point.
 (With apologies to Simon Peyton Jones).
If it were really purely functional it wouldn't even warm up. When you compile your code, referential transparency would result in your executable being nothing more than the final value produced by your code.
If we want to get really pedantic, I guess we could say that we're using one Turing machine (our computer) to emulate another equivalent model of computation (the lambda calculus), so the computer warms up because the Turing Machine is an inefficient emulator of the instantaneously-evaluating lambda calculus evaluator. But the lambda calculus evaluator doesn't change state.
...so, what we really need is to wait for someone to invent a lambda calculus evaluator. Then we wouldn't need to emulate them with these silly Turing machines, and we could get instant evaluations of our programs based solely on β-reductions, with no side-effects (thermal side-effects included). That would put an end to this debate!
(Furthermore, who cares if P=NP if β-reductions can be evaluated 'directly' instead of being emulated by these obsolete Turing machines? Even the hardest decidable problems would be solved instantaneously!)
Depends on the program. If it were computing all the twin primes it would warm up; here the input would be, roughly, how long you kept the program running. And since each time a bit is erased or an irreversible operation is performed at least kT ln 2 joules must be dissipated, a sufficiently sophisticated being could use the timings and heat signature to infer the state and extract a running output. Another output you could more easily get: if it started to cool down, and you had a notion to return an output after 10^(absurdly large number) successive failures, then you have a pretty good confidence bound on the falsity of the conjecture.
Okay, great, but what's the point? At the end of the day, we're not running anything on Turing machines; we're running them on computers, which are almost but not quite the same thing. Remember that the Turing machine is intended as a model for algorithms and not a model for the actual execution process/environment of those algorithms. (Again, a distinction that's irrelevant when discussing TMs in most contexts, so it's generally brushed over).
And at the end of the day, my original point stands: we're talking about writing purely functional code, but if the code is being compiled by a compiler that takes advantage of non-functional code optimization (and as far as I know, nearly all general-purpose compilers do to some degree), then it doesn't make sense to compare the functional code vs. non-functional code. Even if it appears that data structures are immutable, if the compiler is mutating them behind-the-scenes, it all depends on how well-optimized the compiler is, and that's a very different discussion than the one implied by the title.
Do you ever try to understand the asymptotic performance of a program in terms of the source code, or only read compiled binaries?
You can describe the performance of a functional program in something like asymptotic number of reductions. You can write compilers that will run those programs on real hardware in time related to the performance bound you get by working in the source model. Also, perfect optimization is undecidable. So, the question is whether there are problems where you necessarily lost asymptotic performance by working with a purely functional programming language.
In the general case, not necessarily for any particular subset, and this would apply to non-functional algorithms as well. Just because I write a 'simple' for/while loop, that doesn't necessarily imply anything about the actual machine code that gets executed.
At the end of the day, an algorithm is simply an unambiguous, executable, and terminating specification of a way to solve a problem. We may implement an algorithm by creating a physical or virtual machine that conforms to that specification, but even abstractions like Turing machines and lambda calculi are just attempts at creating a deterministic model that we can manipulate in order to understand the algorithm, which is an abstract specification.
The problem is that in general, algorithms are not specified in perfect detail, or they are implemented in ways that are logically equivalent, but not exactly equivalent, with the specification. (Or oftentimes both). When dealing with any specific algorithm, we can usually infer that analyses of one will translate to the other, but in the general case, this assumption is murky.
The difference between functional and non-functional for an algorithm happens at the level of the specification, which is different from the level of the implementation. The translation from one to the other involves optimization. And if perfect optimization is undecidable, as you point out, so is the question of whether a perfectly optimized implementation of one class of algorithms is 'better' than a perfectly optimized implementation of another class, since we're no longer dealing with individual instances.
Lazy evaluation probably adds more unintuitive cost. E.g., Haskell uses thunks to track yet-to-be-evaluated values. These thunks take up memory and cost CPU time. For lazily evaluated function calls (or expressions), each call has a thunk created, to be evaluated later when needed. For deep recursive calls the chain of thunks can grow very long.
Lazy evaluation is great in that the actual computation might turn out not to be needed at all, but there is non-trivial cost to tracking and evaluating all the thunks.
It's an extra dimension to consider when comparing the cost of doing a simple computation with strict evaluation vs. utilizing lazy evaluation with the added thunking cost.
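The textbook example of that thunk buildup is `foldl` vs. `foldl'`. Both compute the same answer, but lazy `foldl` accumulates a chain of n unevaluated `(+)` thunks before anything is forced, while `foldl'` forces the accumulator at each step:

```haskell
import Data.List (foldl')

lazySum, strictSum :: [Int] -> Int
lazySum   = foldl  (+) 0   -- builds ((((0+1)+2)+3)...) as live thunks
strictSum = foldl' (+) 0   -- constant-space strict accumulator
```

On a large enough list, `lazySum` can exhaust the stack when the thunk chain is finally forced, even though both functions are "the same algorithm" on paper.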
There is a very large difference between what you write and what your computer actually does.
Something may look like it is O(1) but in reality it may be much slower. The only way to find out is to put the program through its paces with various values for 'n', and to measure the time it took to compute the answer.
There seem to be a lot of people confusing complexity with practical issues.
I think we can agree that:
1) you can write most software in a functional programming language without any (noticeable) loss of performance
2) there will always be software (video encoders, HPC stuff) that will always be done in Fortran/C :D
However, I think the question is very interesting from a theoretical point of view: if you model imperative algorithms as programs for a RAM machine (a good theoretical approximation of real single-core computers) and purely functional algorithms as programs for a TBI machine (a good theoretical approximation of a hypothetical machine executing pure algorithms), is there a problem for which a faster solution exists on a RAM than on a TBI?
Note. RAM machines exist and are standard in literature; to my knowledge TBI (to-be-invented) machines do not (yet) exist.
> 2) there will always be software (video encoders, HPC stuff) that will always be done in Fortran/C :D
Alas, yes. But there are efforts underway to challenge that. Theoretically, your compiler should be able to do a better job, if it knows more about your intent. I'm learning about some techniques for really high performance Haskell. E.g.:
The C equivalent of stream fusion would be loop fusion: if you have two for loops, one after another, iterating over the same array, you can fuse them into one loop. That same trick works in Haskell, even more often and with more benefits.
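The simplest instance of the idea is the `map` fusion law: two separate traversals are equivalent to one fused traversal, and GHC's rewrite rules apply this kind of transformation automatically. A sketch (names are mine):

```haskell
-- Two traversals, written separately.
twoPasses :: [Int] -> [Int]
twoPasses = map (* 2) . map (+ 1)

-- One fused traversal, by the law: map f . map g = map (f . g).
onePass :: [Int] -> [Int]
onePass = map ((* 2) . (+ 1))
```

Because the code is pure, the compiler can prove these equal and rewrite the first into the second without any whole-program analysis, which is exactly where purity pays off.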
I've had some whoppers in Erlang, where "garbage generated" ends up costing N-squared, even though the operations were just N. In fact, the garbage collection footprint turns out to be one of the most significant markers for functional performance in my experience. Erlang's per-process heaps help here, because each heap can typically be tiny, but it's still something that needs to be managed as closely as, say, stale pointers in C++.
A copying GC helps with that somewhat. When the young generation gets copied you only pay for data that is still alive, not for garbage generated. I believe Erlang uses mark-and-sweep rather than copying, though.
It is kind of sad that there are no real results since Okasaki 1996. I remember reading that while I was doing my startup-interrupted PhD in the late 1990s. A result that lazy eval was algorithmically equivalent to imperative would be very compelling.
It is not a matter of imperative or functional. It's the difference between mutable and immutable style programming. You can have an immutable imperative program or a mutable functional program.
Lots of algorithms maintain mutable state. When these algorithms are translated into an immutable-state style, their complexity (big-O) shoots up. The SO question is asking what the increased complexity is when moving to a PURE functional programming style.
Uhh… none, seeing as any imperative program can be translated directly into a purely functional one by threading an implicit global state parameter around everywhere. Then it's just a matter of your compiler recognizing this paradigm (for example, Mercury does this).
Equivalently, I can write an x86 emulator in a functional language whose time per instruction is independent of the amount of memory used. Then I can just compile an imperative program and run it on my emulator without any asymptotic slowdown.
How, you ask? Either (a) my language provides me O(1) functional arrays (e.g. Mercury does this), or (b) I use an O(log n) structure (such as a binary tree): since n is bounded by the size of physically addressable memory, this reduces to O(1). (See also rwmj's comment about memory access times.)
The real question to ask is, what is the asymptotic cost of programming in a language without an O(1) array structure? (A property which unfortunately has falsely come to be associated with purely functional programming, due to early functional languages being merely naïve interpreters of the lambda calculus.) The answer to this question of course is likely O(log n).
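For the record, here's what such an O(log n) array replacement looks like: a persistent binary trie keyed on the bits of the index. Both operations are a log n pointer chase, and an update shares every untouched subtree with the old version, so you get persistence for free. (A sketch with my own names; real libraries use wider branching for better constants.)

```haskell
-- Persistent "array" as a binary trie on index bits.
data Trie a = Leaf a | Node (Trie a) (Trie a)

getT :: Int -> Trie a -> a
getT _ (Leaf x) = x
getT i (Node l r)
  | even i    = getT (i `div` 2) l
  | otherwise = getT (i `div` 2) r

-- Rebuilds only the O(log n) nodes on the path; the rest is shared.
setT :: Int -> a -> Trie a -> Trie a
setT _ x (Leaf _) = Leaf x
setT i x (Node l r)
  | even i    = Node (setT (i `div` 2) x l) r
  | otherwise = Node l (setT (i `div` 2) x r)

-- Complete trie of depth d filled with a default value.
fullT :: Int -> a -> Trie a
fullT 0 x = Leaf x
fullT d x = let t = fullT (d - 1) x in Node t t
```

Note that after a `setT`, the old version is still fully intact, which is exactly the capability an in-place array update gives up.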
>(b) I use an O(log n) structure (such as a binary tree): since n is bounded by the size of physically addressable memory, this reduces to O(1). (See also rwmj's comment about memory access times. )
Uh, no. By definition, an O(log n) structure takes, on average, log n operations to get to the node you want to get to. That never reduces down to constant time, because any given access will take log n reads to get to the correct place.
Also, keep in mind that O(1) array lookup isn't really constant time. It's still variable, depending on where the information actually is stored (i.e. in the L2 cache, RAM, on HDD, etc.); the practical time can vary from double-digit nanoseconds to several milliseconds. Cache misses can be hugely expensive.
> O(log n) structure takes, on average, log n operations to get to the node you want to get to
Big-O notation is for asymptotic upper bounds, not the average case. That is, it's a guarantee that using the algorithm will never take longer than the function given. It is not a statement of the average case. In order to do that, you must make some analysis on the distribution of the inputs.
Sorry, but that's exactly what Big-O analysis is intended to do. If you have an binary search tree that can be imbalanced, then the lookup time is O(n). It's not "can be," it is. Worst-case analysis is across all inputs. Not "for most," but all.
The only binary search trees that people use in practice are balanced. That is, the algorithms that control insertion and deletion in the tree guarantee that the tree is always balanced, and they themselves run in O(log n) time. In my experience, most trees used in practice are in fact red-black trees, but a much easier to understand balanced binary tree is an AVL tree. So when we say that a red-black tree has O(log n) lookup, that is a guarantee across all inputs. There is no input to a red-black tree that will yield a linear lookup time.
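Haskell's `Data.Map` (from the bundled containers package) is such a balanced search tree, so even a fully sorted insertion order, which drives a naive BST to a linear-depth spine, stays O(log n) per operation. A small check (the function name is mine):

```haskell
import qualified Data.Map.Strict as M

-- Insert keys in sorted order: the worst case for an unbalanced BST,
-- routine for a balanced one.
sortedInserts :: Int -> M.Map Int Int
sortedInserts n = foldr (\k m -> M.insert k (k * k) m) M.empty [1 .. n]
```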
O(f(n)) means that an operation takes less than k f(n) time to complete (where k is some constant) for all n.
I predicated my assertion on bounded memory, hence n is constant. Therefore O(log n) means that the operation is bounded in time by k log n. But since n is also a constant, this is equal to some other constant k'. This is equivalent to saying that the operation is bounded by O(1).
So you're arguing that all algorithms on a bounded-memory machine (that is, all of them) are O(1). Sure, but that constant in the O(1) is sufficiently large on most machines that this isn't useful to programmers who need to analyze the run times of their algorithms.
While strictly true, that is not a very useful approach to take. By the same reasoning, _every_ operation is either O(1) or does not terminate, because a system with bounded memory has a bounded number of possible states.
I can see why it's especially tempting to make that jump in the case of O(log n), but that's just not the way this type of analysis is performed - a more conventional way of stating your point would be to say "yes, it's theoretically worse than O(1) but for practical purposes the constant factors matter far more".
>I predicated my assertion on bounded memory, hence n is constant. Therefore O(log n) means that the operation is bounded in time by k log n. But since n is also a constant, this is equal to some other constant k'. This is equivalent to saying that the operation is bounded by O(1).
You're fundamentally misunderstanding what big-O notation means. You cannot say that since n has an upper limit, everything is constant. By your argument, the actual speed of an algorithm doesn't matter, period, since the sample size is bounded, which means that everything evaluates to a constant.
Big-O notation is used to give a ballpark estimate of an algorithm's efficiency. In this case, the metric being used is the number of data fetches the CPU needs to get to the specific location that it's looking for.
O(1) means that the number of possible CPU instructions is constant, no matter how large n is. So a static array lookup is O(1) because, no matter how large N is, the CPU only has to do a fetch. That's 1 instruction, and while the actual time is variable due to cache misses, the total number of instructions is constant.
O(log n) means that it will take log n operations to get the result. Binary trees are a great example. Assuming a perfectly balanced BST, the worst possible number of lookups(i.e. to the farthest leaf) will be log n different fetches. If you have a system that creates 127 buckets, the worst possible number of lookup operations is 7, which is log base 2 of 128.
> Big-O notation is used to give a ballpark estimate of an algorithm's efficiency.
No it's not. Big-O notation is used to denote asymptotic running times. The OP's question is invalid because his assumption of O(1) array accesses is only valid on hardware with finite memory size, which makes what is actually a O(log n) or O(n^(1/3)) memory read/write appear to be O(1).
I made the same invalid assumption and came to the same invalid conclusion as the OP (i.e. that language X has O(1) array accesses). You can't deny me this assumption (and therefore conclusion, which I reached logically) without denying it to the OP as well.
> O(1) means that the number of possible CPU instructions is constant, no matter how large n is.
This is simply false. If n is larger than 2^32, the number of instructions (and more importantly, time) grows immensely.
Now if on the other hand you want to posit an infinite O(1) memory: OK, I posit a lambda calculus reducer which can treat infinitely sized terms as a single redex. Bam, I've got O(1) array accesses in lambda calculus (and thus functional programming) now.