It's still pretty useful for sorting data where you know the keys are small integers (say, less than a machine word).
So if you put a bound on the size of the keys, let's say 32bit, it becomes linear? Obviously it would be cheating to put a giant number here :)
If k ∈ Θ(log N), then O(Nk) becomes O(N log N), which is asymptotically no better than any optimal comparison based sort.
However, if k ∈ o(log N), then we get an asymptotically better algorithm.
A comparison isn't always O(k). It's O(k) for strings. It's O(1) for machine-word integers. It can be much more than O(k) for user-defined types, eg. it can be O(k^3) if you have to multiply a matrix out and take the determinant of the product.
So the fact is that quicksort is a factor O(log n) slower than radix sort. The flip side is that quicksort is more generally applicable to any comparison function whereas radix sort only works for lexicographic sorts. In almost all cases that is exactly what you want, but for example I'm not aware of any way to efficiently sort rational numbers with radix sort.
Number theory? Not seeing the connection.
The usual bound of Omega(n log n), proved using decision trees, is only applicable when your only operation is to compare two elements. Radix sort asks for more than this, and so it can assume a specific structure of its input, so it can violate the lower bound.
Depending on which operations you assume, sorting can become more or less easy. In the extreme case, if you can ask the array "Please sort yourself." as a basic operation, sorting is O(1). Radix sort assumes bitmasking as a basic operation, which falls into the "make things easier" spectrum, leading to an O(n) algorithm under the stated assumption of constant bit length (or any encoding, really, it doesn't really need to be bits).
For example, one might implement a variant of InsertionSort that stores a duplicate count for distinct elements, and then an insertion sort on e.g. 32-bit integers would require at most 4294967296 comparisons per insertion -- a constant factor that can be technically ignored in the complexity analysis. (I did warn you that the constant factor would become unwieldy!)
Note that this doesn't require values to be integers -- it suffices for them to be comparable with a lot of duplicates. The variant of InsertionSort described above would require O(k×n) comparisons where `n` is length of the input list and `k` is the number of distinct values, or O(n) whenever `k` is bounded by a constant (as required for radixsort to work in linear time).
That's not to say that radixsort doesn't outperform other sorting algorithms in practice -- it usually does. However, that isn't obvious from a strictly complexity theoretic point of view.