Yes, you're missing something. Radix sort, like most sorting algorithms, requires storing the intermediate results in memory. This problem seems impossible at first blush because storing 1 million 4-byte integers would seem to require 4M of RAM, and only 2M is available.
But if I pick my buckets right, I can store only the lsb in each bucket.
Or more efficient, dynamically construct an implicit trie/heap type thing, where each leaf is the count of times that int appears (and the int itself is encoded in the location)
You're sort of on the right track, but of course you have to encode those counts very carefully because 1M counts will exceed 2MB of memory unless you work hard.
Right, you can't store them as ints. 1 million bits is the max you need, but like this whole thing is just super tricky because you can't just address things normally anywhere, since that's too big.
For (I=0; I<1<<31; I++) For (J=0; J<1e6; J++) If (input[J]==I) print(I);
Shoddy runtime, but hey...