> but count_ones stays a bottleneck; I'm pretty sure you can't do that in O(1) even on a word RAM machine
It's actually easy to do (log n)-bit operations in O(1) time on the kind of machine you are talking about too. Just make a table with the result for all ((lg n)/2)-bit inputs. Such a table only takes sqrt(n) memory and time to create, and since you seem to accept O(1) table lookups, you now have count_ones in two lookups.

Similarly, it's easy to make small tables that allow you to do arbitrary binary operations on ((lg n)/4)-bit strings.
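Concretely, here's a minimal sketch of the trick (my illustration, assuming n = 2^32, so a word is 32 bits and the table covers all 16-bit halves):

```rust
// Half-word table trick: precompute popcounts for every 16-bit value
// (2^16 = sqrt(n) entries when n = 2^32), then count_ones on a full
// 32-bit word is just two lookups.

/// Build the table in O(sqrt(n)) time using the recurrence
/// popcount(i) = popcount(i >> 1) + (i & 1).
fn build_table() -> Vec<u8> {
    let mut table = vec![0u8; 1 << 16];
    for i in 1..(1usize << 16) {
        table[i] = table[i >> 1] + (i & 1) as u8;
    }
    table
}

/// count_ones on a 32-bit word via two table lookups.
fn count_ones_lookup(x: u32, table: &[u8]) -> u32 {
    (table[(x & 0xFFFF) as usize] + table[(x >> 16) as usize]) as u32
}

fn main() {
    let table = build_table();
    assert_eq!(count_ones_lookup(0b1011_0110, &table), 5);
    assert_eq!(count_ones_lookup(u32::MAX, &table), 32);
}
```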
> at least definitely not what most people have in mind when talking about computational complexity.
I'm pretty sure if you look in CLRS or any standard algorithms textbook, they allow (lg n)-bit operations in constant time. Ever heard people say "Sorting takes O(n log n) time"? They are clearly assuming that comparing two numbers can be done in constant time.
Also look at any lecture notes from actual CS researchers, like this: http://www.cs.cmu.edu/~odonnell/toolkit13/Lecture05.pdf

> Doesn’t that imply that we need w ≥ log n?

> Answer: Yes! And that’s a standard assumption! The first time you see this it may seem totally weird: you’re assuming the computer hardware size depends on the input size?! That seems to make no sense. But once you calm down, it’s actually quite logical and cool. Of course you want a single pointer to fit into a word. You should just think of w as a parameter of the model, and w ≥ log n as an assumption.
> Every heard people saying "Sorting takes O(n logn) time"? They are clearly assuming comparing two numbers can be done in constant time.
Where n is the number of items. Not the number of digits! And we don't normally talk about "time" when discussing sorting algorithms in the first place. We talk about number of comparisons (precisely because we have no clue how big the items are!).
> Just make a table with the result of all (lg n)/2 bit inputs
We're interested in the number of ones in a C-bit integer, where C is the size of the character set (and incidentally the maximum acceptable value of the window size W). Simply counting the ones bit by bit costs O(C) time. Making a lookup table for all (C/2)-bit numbers - as you seem to suggest - costs O(2^(C/2)) time and space. And the algorithm we wanted to improve runs in O(N * C) time.
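For reference, the bit-by-bit count under discussion looks something like this (a sketch under my own assumptions: the C-bit value is stored as a slice of 64-bit words):

```rust
// Naive popcount of a C-bit value held in 64-bit words: O(C) time,
// because every bit is inspected individually.
fn count_ones_bit_by_bit(words: &[u64]) -> u32 {
    let mut ones = 0;
    for &w in words {
        for i in 0..64 {
            ones += ((w >> i) & 1) as u32;
        }
    }
    ones
}

fn main() {
    // A 128-bit value (C = 128) as two words.
    assert_eq!(count_ones_bit_by_bit(&[0b1011, u64::MAX]), 3 + 64);
}
```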
Look, I'm trying to decipher what your point is, but you seem to be confusing the size of the input space with the size of the input. So AFAIC, this is as much attention as this algorithm deserves. It's a nice party trick, but nothing more.
> And we don't normally talk about "time" when discussing sorting algorithms
Sure, you might be a purist and talk about comparisons, but I'm pretty sure if you Google it, most people would talk about time. And n log n is exactly the amount of time it takes to sort n word-length integers in word RAM.
> We're interested in the number of ones in a C-bit integer
You're not really engaging with the point that you can build a table in sqrt(n) time that allows counting the bits in (log n)-bit strings in constant time. That means the final algorithm runs in O(NC / log N) time.
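A sketch of how that lands in practice (my illustration; the mask layout and word size are assumptions): store the C-bit mask as C/64 machine words and count each word's ones in O(1), via the table above or the hardware instruction, so the whole mask costs O(C / log N) rather than O(C) per window update:

```rust
// C-bit mask packed into 64-bit words; each word is counted in O(1)
// (hardware popcount here; in the pure word-RAM model the lookup table
// above serves the same role), so the total is O(C / w) = O(C / log N).
fn count_ones_packed(words: &[u64]) -> u32 {
    words.iter().map(|w| w.count_ones()).sum()
}

fn main() {
    // A 256-bit mask (C = 256) as four words.
    let mask = [u64::MAX, 0b1011, 0, 1 << 63];
    assert_eq!(count_ones_packed(&mask), 64 + 3 + 1);
}
```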
This is definitely not a party trick, as you would know if you'd ever written a program using these methods. Now, with AVX instructions, it's more important than ever.

You also didn't engage with O'Donnell's lecture notes. I can find you many more textbooks explaining why this is the right assumption to make, in case you don't like the reasons I've given above.