
Scala Vector operations aren't “Effectively Constant” time - dmit
http://www.lihaoyi.com/post/ScalaVectoroperationsarentEffectivelyConstanttime.html
======
kbenson
I'm a little disappointed that I bothered to read what turned out to be a
massive act of pedantry.

A few sections are devoted to showing that O(log32(n)) is the same as
O(log(n)). This is entirely correct, as shown, because the difference between
them is reducible to a constant multiplier, which big-O notation does not care
about. It's also, judging from the problematic sources given, an entirely
made-up argument, given that I didn't find O(log32(n)) mentioned in any of them
that I was able to view. Plenty of notes in them about it being logarithmic,
though...
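
(For reference, the reduction is just the change-of-base rule; a quick Scala
check, purely for illustration:)

    // log_32(n) = log_2(n) / log_2(32) = log_2(n) / 5 -- a constant-factor difference
    def log2(n: Double)  = math.log(n) / math.log(2)
    def log32(n: Double) = math.log(n) / math.log(32)
    
    log2(math.pow(2, 30))   // 30.0
    log32(math.pow(2, 30))  // 6.0, i.e. always exactly log2 divided by the constant 5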

I understand making sure people know that it's not constant time, but when all
the examples of problematic descriptions you give make sure to say that it _is
logarithmic_ time and _then_ go on to explain why it's "effectively constant",
you're railing against a non-issue with an absurdly complicated argument.

~~~
KirinDave
Words may be relative, but that's why we set up systematized math notation to
appeal to when we need to start characterizing the precise nature of
algorithms and systems.

To argue that this is an act of pedantry is to argue that any form of canonical
notation to describe the essential nature of algorithms is itself pedantry.
It's to suggest big-O notation itself is pedantry, and by extension so is its
defense.

In the face of this, I'd say, "If you don't like it, don't use it. But many
people do find it useful, and it's corrosive in the extreme to demand the right
to redefine it for what amounts to marketing purposes." Scala's got a bunch of
actors in its community that do this, and it's frustrating when you're trying
to evaluate the ecosystem.

The asymptotic boundaries of algorithms ARE theoretical tools that can be
misleading. For example, how many people incorrectly suggest that quicksort is
asymptotically faster than merge sort? It's not.

But the response to this should not be to vaguely redefine the terms. The
response to this should be to quantify the implementation's docs with real
metrics.

~~~
kbenson
> In the face of this, I'd say, "If you don't like it, don't use it. But many
> people do find it useful and it's corrosive in the extreme to demand the
> right to redefine it for what amounts to marketing purposes."

But as I pointed out, in all the examples provided in the article as evidence
of this behavior, not only did they _not_ mention O(log32(n)), they often
(possibly always in the links presented) _did_ specifically call out running
time as logarithmic prior to mentioning what that meant _in practice_.

> But the response to this should not be to vaguely redefine the terms. The
> response to this should be to quantify the implementation's docs with real
> metrics.

This is not a case of vague wording being used, it's a case of exact wording
being used and additional context being provided.

Edit: I noticed some of the examples weren't as upfront about the time
complexity; I was mistakenly seeing what was presented for Lists as applying
to Vectors. In light of that, I agree that they could, and should, be more
upfront about the _theoretical_ time and the _practical_ time if they plan to
continue using "effectively constant" in documentation.

~~~
KirinDave
I've seen numerous examples of "essentially constant time" on slide decks,
presentations and talks. But I'd argue even if you did do this after a prior
disclaimer, it's still somewhat misleading. It's adopting language that sounds
very similar and creating greater confusion.

> This is not a case of vague wording being used, it's a case of exact wording
> being used and additional context being provided.

And yet the confusion persists. And I can point to enthusiastic jr. engineers
who don't fully grasp the difference and who were somewhat surprised when I
corrected them (the equivalence of log32 and log2 should be enough, but...)

Similar stories abound from multiple actors in the world of Scala. Another
great example is the (now infamous) "non-blocking backpressure." Same
principle, taking well-understood industry terms (some of which even have
precise definitions lifted from more rigorous disciplines) and wrangling them
into something almost unrecognizable.

------
kevinwang
Effectively constant seems like a reasonable thing to say if the real-world
number of operations is upper-bounded by six. I don't interpret it as making
any claim about the theoretical asymptotic complexity of the algorithm.

Of course, as the author notes near the end, this also means that any real
world algorithm running on a real world data type which is bounded in size can
also be considered "effectively constant", which does throw doubt onto the
effectiveness of the term.

I guess in the real world we should use graphs in units of time instead of
classes of functions to discuss performance based on input size, since that's
what we care about.

Also interesting to note is that "effectively constant", although extremely
hand-wavy and not rigorous, is used even by computer scientists to denote
extremely slowly growing functions. A professor once used similar words to
describe the complexity of the inverse Ackermann function:
[https://en.m.wikipedia.org/wiki/Ackermann_function#Inverse](https://en.m.wikipedia.org/wiki/Ackermann_function#Inverse)

~~~
victor___
I don't think pulling out the number 6 separately from the constant factor
results in an apples-to-apples comparison with other operations with
logarithmic complexity.

The 32-way tree operations have asymptotic complexity 6 * C_1 * log(n) = C_2 *
log(n).

Operations on a balanced binary tree have asymptotic complexity C_3 * log(n) =
6 * C_4 * log(n).

The only difference between the two data structures is the actual values of
the constants.

I think the Scala people have a valid point that logarithmic complexity may be
as good (or nearly as good) as constant-time in practice. The precise way the
claim is formulated is wrong and abuses big-O analysis.

A more correct argument is to choose a practical upper bound on log(n), e.g.
maybe 64. Then multiply that by the constant factor (not playing any tricks
with splitting out the 6). If that number is always good enough in practice,
then you don't need to worry about your operation being logarithmic.
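
Something like the following, where the per-level cost is a made-up number
purely for illustration:

    // back-of-envelope sketch; the 5 ns per tree level is a hypothetical constant
    val maxLogN     = 64                    // practical upper bound on log2(n)
    val nsPerLevel  = 5.0                   // assumed cost of visiting one node, in ns
    val worstCaseNs = maxLogN * nsPerLevel  // 320 ns per operation in the worst case
    // if ~320 ns per lookup is acceptable in practice, the log factor never matters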

~~~
rrobukef
I don't think they should use "effectively constant", because while the
difference between O(1) and O(log(n)) may not be very big, the difference
between O(n) and O(n log(n)) is very measurable, even for small n.

------
smitherfield
Even a statically-allocated C array doesn't "really" have constant-time lookup
with respect to size; the more elements it has the greater the rate of cache
misses and page faults on lookups becomes (I'm pretty sure this can be proven
true of all data structures). Moreover the possibility of cache misses and
page faults means that while lookups take a fixed number of CPU cycles _on
average,_ any single lookup might take between <1 and thousands of cycles. If
you truly need your data accesses to take a fixed number of CPU cycles, you
have to handwrite assembly and put everything in registers. And even _that_
doesn't account for instruction reordering and the possibility of interrupts
or context switches.

I would assume the reason the Scala website calls Vector log( _n_ ) element
lookup "effectively constant" is because in real-world use (as opposed to
hypothetical ideal von Neumann machines), the slowdown of the lookup algorithm
with respect to _n_ is negligible compared to the decrease in cache and
allocation efficiency with respect to _n._

If you're creating some sort of ultra-fine-grained real-time system where
everything needs to take a fixed number of clock cycles, Scala would be a
pretty terrible tool for the job. For that sort of thing you'd want MISRA C or
assembly running on bare metal, or an FPGA or ASIC.

~~~
arnioxux
There's an interesting argument based on physics on why memory access is
O(sqrt(N)):

[http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-p...](http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-part-i)

Relevant quote:

> The amount of information that can fit within a sphere with radius r can be
> calculated using the Bekenstein bound, which says that the amount of
> information that can be contained by a sphere is directly proportional to
> the radius and mass: N ∝ r·m. So how massive can a sphere get? Well, what is
> the most dense thing in existence? A black hole! It turns out that the mass
> of a black hole is directly proportional to its radius: m ∝ r. This means
> that the amount of information that can fit inside a sphere of radius r is N
> ∝ r². And so we come to the conclusion that the amount of information
> contained in a sphere is bounded by the area of that sphere - not the
> volume!

> In short: if you try to squeeze too much L1 cache onto your CPU it will
> eventually collapse into a black hole, and that would make it awkward to get
> the results of the computation back to the user.

But honestly I find all of these arguments equally pedantic.

People often forget Big Oh is a tool just like any other tool. It's useful for
estimating the number of X, whether X is CPU instructions, cache misses, disk
reads, or whatever. When it stops being a good estimate (whether it's because
your real-world N is too small for the growth rate to dominate, or the things
you're counting aren't unit-cost comparable, or you're not even counting the
right things in the first place), just stop using it and count cycles on a
benchmark instead.

~~~
deepsun
You're comparing the radius (in the Bekenstein bound) and the event horizon
(in the black hole).

An event horizon is a pretty different beast than a radius. If we define the
"density" of a black hole using the event horizon instead of the radius (black
holes don't have a real radius), then supermassive black holes are actually
pretty "light". A black hole with the mass of our universe would have the
"density" of the universe, which is mostly empty.

~~~
arnioxux
Not my article and I don't know any theoretical physics.

In case that point happens to be false, I think his other argument still
works:

\- "data centers ... are spread out on the two-dimensional surface of the
earth"

\- if we tightly pack these data centers on the surface it will form a circle
with area O(N) so distance to furthest data center is O(sqrt(N)). (sky scraper
height is negligible)

Anyway, it's more of an interesting thought experiment than of any practical
use. Nobody sane would estimate cost of generic "information retrieval" that
mixes cost of ram/disk/network.

------
pierrebai
This mainly shows that the author mistakes the purpose of Big-O notation. It's
a measure of asymptotic performance, so there needs to be an asymptote to
reach. If the range of values you're measuring is small (and a non-constant
factor that varies between 1 and 6 is small), then it's the fact that one uses
Big-O to characterize the algorithm at all that is wrong.

It's a classic mistake. There are many pitfalls in performance measurement,
including: using Big-O when the range is small, or ignoring the constant
factor when the constant is big. Both of these have led people down the wrong
path, for example using a theoretically faster algorithm that is in reality
slower because in the typical use case the constant outweighs the complexity
factor.

The counter-example to the author's argument would be to say that a binary
decision is linear over the input range.

~~~
flgr
> and a non-constant factor that varies between 1 and 6 is small

Wouldn't the 6 in O(((n^1000)^n)^6) be a non-constant factor that makes a big
difference in asymptotic performance though?

~~~
mamon
This is not a constant factor, this is an exponent. Constant factors are what
stands in front of the whole term, like this: O(7*((n^1000)^n)^6), with 7
being the constant factor.

------
lorenvs
I think the author missed an opportunity to point out what I feel is the
most convincing argument against labeling these operations as "Effectively
Constant". The only reason we drop constant factors out of Big-O expressions
is because we measure as N approaches infinity. Once you start reasoning about
the size of N being bounded, the rationale for dropping constant factors
vanishes.

The cost of maintaining a 32-element tree node might not be measurable as N
approaches infinity, but for N < 2 ^ 32 it is, especially in relation to the
cost of maintaining 2-element tree nodes. Once you consider constants, the
runtimes are likely comparable between a binary tree with depth 32 and a
32-ary tree with depth 6 (with a skew towards the 32-ary tree for locality of
access).

------
deepsun
TL;DR:

Scala doc claims that their algorithm is O(log32(N)), where N is not greater
than 2^31, so it's O(6)=O(1) -- "effectively constant".

Author claims that then all real-world algorithms are constant, for example
bubble sort is O(N^2), where N is not greater than 2^31, so it's O(2^62)=O(1)
-- also "effectively constant".
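
Spelled out with the article's own numbers:

    // once N is capped at 2^31, both bounds reduce to a constant
    val maxN = math.pow(2, 31)
    math.log(maxN) / math.log(32)  // ~6.2          -> vector lookup: "effectively constant"
    maxN * maxN                    // ~4.6e18 (2^62) -> by the same logic, bubble sort
                                   //                   is "effectively constant" too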

~~~
benlorenzetti
Well put! Taking the author's basic proposition to its logical conclusion
shows that faulting Scala for this is wrong.

Scala has illuminated the author to a pedantic dissonance in academia: in this
case, that true asymptotic analyses should be using O(log n) instead of O(1)
for random memory access.

Hence why algorithms like Hoare's quicksort are so fast: they actually are
Theta(n log n) in practice because _linear_ memory access looks like O(1) on
modern machines.

(but again pedantically it is probably still O(logn) from gate depth of
multiplexing/demultiplexing the architecture's pointer size.)

Edit: Remembering that O((logn)^2) = O(logn), I suppose the author is probably
correct.

------
Asdfbla
Seems like a pedantic article, but I liked his remark that the 'effectively
constant' argument can of course be extended to any algorithm where the input
size is bounded and therefore, shouldn't really be used to describe your
algorithmic complexities, even in real life implementations (except when you
maybe use the inverse Ackermann function or something).

------
cameldrv
This is a pet peeve of mine in interviews when I'm asked what the big O of hash
table lookup is and I say "you're probably looking for O(1), but it's really
O(log n)." The assumption is that you can hash a word-sized quantity and access
the corresponding memory in constant time, but the whole point of Big O is
asymptotic complexity, and just taking the hash of N bits is going to take
O(N) time. Now I get called a pedant for this, but really log n time is for
most purposes close enough to constant time to be negligible. Log n factors
are unlikely to determine whether an algorithm is feasible, as the constant
factor is usually way more significant.
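
(The counting step that makes this work: keys that stay distinct across n
elements need at least log2(n) bits, so even computing one hash already costs
Omega(log n). A quick Scala illustration, purely for intuition:)

    // minimum key width (in bits) needed to keep n keys distinct
    def bitsNeeded(n: Long): Int = 64 - java.lang.Long.numberOfLeadingZeros(n - 1)
    
    bitsNeeded(1L << 20)  // 20 -- about a million keys need ~20-bit keys
    bitsNeeded(1L << 40)  // 40 -- about a trillion keys need ~40-bit keys
    // hashing a k-bit key costs O(k), so the hash alone grows like log n rather than O(1)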

~~~
boethian
Be careful here, because you hash the element, granted that may take O(n)
where n is bits in the element, but n usually denotes the number of elements
in a list, which can obviously be much larger and is thus far more important.
EDIT: in part because the number of bits in each element doesn't "grow"; it's
usually fixed.

------
stcredzero
By now certain elements of a modern programming environment have become pretty
apparent. (Vectors and maps, for example) Could there be something gained by
supporting these directly in the architecture?

------
DSrcl
Labelling an operation whose number of steps is upper-bounded by 6 as
"effectively constant" is different from treating any real-world algorithm
with bounded input size that way.

Calling Scala's immutable vector ops "effectively constant" is not a
stretch/hack by any means, considering we also say the same for integer
addition and multiplication.

~~~
MaulingMonkey
> Calling Scala's immutable vector ops "effectively constant" is not a
> stretch/hack by any means, considering we also say the same for integer
> addition and multiplication.

Taking this rationale ad absurdum, we could call an in-memory linear search
"effectively constant" because we have an upper bound of 2^32 or 2^64,
depending on processor architecture.

Yes, log32 is very nice. Even log2 is pretty nice though, and we don't call
that "effectively constant", because we already have a much more precise and
well defined term for it - "logarithmic" - which the docs also use:
[http://docs.scala-lang.org/overviews/collections/performance...](http://docs.scala-lang.org/overviews/collections/performance-characteristics.html)

You _might_ be able to tell me with a straight face that labeling log32
"effectively constant" and log2 "logarithmic" is "correct enough". But can you
tell me with that same straight face, that labeling it "Log32" (or even just
"Log") isn't just as correct, more informative, and less confusing?

------
flgr
This is why I've never liked stating that something has "run-time of
O(log(n))" since that's rarely true — the assumption for that is that all
machine instructions take pretty much the same time, which is not the case.
CPU instructions involving cache misses are multiple orders of magnitude more
expensive than others.

I think it makes much more sense to talk about concrete operations (or cache
misses) instead. Sounds like their implementation has O(log(n)) cache misses.

------
wohlergehen
Has anyone actually benchmarked the vector lookup for various vector sizes in
Scala?

That would straightforwardly determine if the runtime is "effectively
constant" or "effectively logarithmic", e.g. because it is really
O(a + b*log(n)) with a >> b.

~~~
flgr
Sounds like the author did —
[http://www.lihaoyi.com/post/BenchmarkingScalaCollections.htm...](http://www.lihaoyi.com/post/BenchmarkingScalaCollections.html#vectors-are-ok)

Notably, Vector index access takes 4,260,000 ns for a vector with 1,048,576
elements, versus a measured 0 ns for native arrays. That is about +4 ns extra
per element and hints at the whole data set still fitting into L2 cache.

If the process ends up accessing more working memory than fits into L2 (32 or
64 MB or so) and lookups aren't nicely bundled together, the overhead will
approach about +80 ns per element access. Or +0.08 seconds per million element
accesses.
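
(Back-of-envelope for those numbers:)

    // per-element overhead implied by the benchmark figures quoted above
    4260000.0 / 1048576.0  // ~4.06 extra ns per lookup vs. a raw array (cache-friendly case)
    80e-9 * 1e6            // 0.08 s of added time per million lookups if each one misses cache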

It does seem unlikely that every element access would end up causing a cache
miss since I'd expect lookups to happen close together, but this can be
significant for intense workloads such as OLAP.

~~~
andrewla
Based on the numbers for lookup, it looks worse than that -- it seems to be
linear (r2 = .9988).

    
    
        df <- data.frame(
          s=c(0,1,4,16,64,256,1024,4096, 16192, 65536,
              262144, 1048576),
          t=c(0, 1, 5, 17, 104, 440, 1780, 8940, 38000,
              198000, 930000, 4260000))
        > summary(lm(t ~ s + 0, data=df))
        ...
        Coefficients:
          Estimate Std. Error t value Pr(>|t|)
        s  4.02825    0.04161   96.82   <2e-16 ***
    
        Residual standard error: 45060 on 11 degrees of freedom
        Multiple R-squared:  0.9988,    Adjusted R-squared:  0.9987
        F-statistic:  9374 on 1 and 11 DF,  p-value: < 2.2e-16

~~~
cscheid
That's because the program is looking up every item. It's _right there in the
text_:

> Note that this time is measuring the time taken to look up every element in
> the collection, rather than just looking up a single element.

[http://www.lihaoyi.com/post/BenchmarkingScalaCollections.htm...](http://www.lihaoyi.com/post/BenchmarkingScalaCollections.html#lookup-performance)

~~~
andrewla
Shame on me for not looking more closely!

So when you look at the per-operation cost (t/s in my model), what emerges is,
indeed, logarithmic (r2 ~ .96 for lm(t/s ~ s + log(s))). Bounded, for sure,
but if you walk in with a vector-indexing-bound but array-size-independent
algorithm expecting that the performance for 1e3 elements will be
representative of the behavior for 1e6, you're in for a surprise.

------
clhodapp
log_32(x) is only a constant factor of 5 off of log_32(x). If we are going to
call log_32(collection_size) "effectively constant", we may as well call any
log-complexity operation "effectively constant". If you call the op
"logarithmic-complexity", the "effective-constant" property should be just as
clear to someone who knows what's up with time complexities and logarithms
plus you gain the benefit of being technically accurate. Actually calling
log_2 ops "logarithmic" and log_32 ops "effectively constant" crosses the
boundary into being actively misleading.

~~~
wtetzner
I think the point is more that the implementation is effectively constant
time, even if the algorithm isn't. In the Scala implementation, vectors have a
maximum size of 2^32. So we know that a lookup won't need to walk more than 6
levels deep, no matter what.
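
A minimal sketch of the idea (this is not Scala's actual Vector code, just the
general shape): each level of a radix-32 trie consumes 5 bits of the index, so
roughly a billion elements (32^6) fit within 6 levels.

    // hypothetical radix-32 trie lookup, purely to illustrate the bounded depth
    sealed trait Trie[A]
    case class Leaf[A](values: Array[A]) extends Trie[A]
    case class Branch[A](children: Array[Trie[A]]) extends Trie[A]
    
    @annotation.tailrec
    def lookup[A](t: Trie[A], index: Int, shift: Int): A = t match {
      case Leaf(values)     => values(index & 31)                  // last 5 bits of the index
      case Branch(children) =>                                     // next 5 bits pick a child
        lookup(children((index >>> shift) & 31), index, shift - 5)
    }
    // for up to 32^6 (~1.07e9) elements, shift starts at 25 and at most 6 nodes are visited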

Hash tables are usually considered to have constant time lookup, but of course
in practice they have to deal with collisions. I don't think it's unreasonable
to say that Scala's vector lookups are "effectively" constant time, because
they are.

~~~
clhodapp
That's definitely a fair way to describe things (even if it is strictly less
precise than just saying "log_32") but if you're going to do that, you should
call all of the logarithmic-in-size operations "effectively constant". Having
a bounded max of e.g. 30 lookups isn't markedly different from having a
bounded max of 6. Basically... my claim is that it's misleading for the Scala
docs to draw the distinction where they've drawn it.

Edit: Ah! I messed up the first sentence of my original comment but it seems
to be too late to edit it! I meant to say "log_2(x) is only a constant factor
of 5 off of log_32(x)".

~~~
wtetzner
> Basically... my claim is that it's misleading for the Scala docs to draw the
> distinction where they've drawn it.

Yeah, I see where you're coming from.

I guess it would be constant time if they made sure lookups always took as
long as the worst case :)

------
marvinalone
Wow, you really showed that straw man what's what.

