Radix sort is technically O(k * n), where k is the number of digits. This is very useful when you know k falls within a bounded range (e.g. sorting a bunch of integer keys, all of which range from 0-255), but it reduces to O(n log n) for arbitrary keys, because in general you need log n digits to represent n distinct items.
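A minimal LSD (least-significant-digit) radix sort sketch, assuming non-negative integer keys below 2**key_bits and 8-bit digits. The function name and parameters are illustrative, not from any library. Each pass is a stable counting sort over one digit, so the total work is k passes of O(n + 256) each:

```python
def lsd_radix_sort(keys, key_bits=8, digit_bits=8):
    """Sort non-negative ints below 2**key_bits with LSD radix sort.

    Runs k = ceil(key_bits / digit_bits) stable counting-sort passes,
    each O(n + 2**digit_bits), so the total is O(k * n) for a fixed digit size.
    """
    radix = 1 << digit_bits
    mask = radix - 1
    passes = -(-key_bits // digit_bits)  # ceiling division
    out = list(keys)
    for p in range(passes):
        shift = p * digit_bits
        # Count how many keys fall into each bucket for this digit.
        counts = [0] * radix
        for x in out:
            counts[(x >> shift) & mask] += 1
        # Exclusive prefix sums turn counts into starting positions.
        total = 0
        for d in range(radix):
            counts[d], total = total, total + counts[d]
        # Stable scatter: equal digits keep their order from the previous pass.
        buf = [0] * len(out)
        for x in out:
            d = (x >> shift) & mask
            buf[counts[d]] = x
            counts[d] += 1
        out = buf
    return out
```

With the default key_bits=8 (keys in 0-255) this is a single pass; 32-bit keys take 4 passes regardless of n.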
Whenever you talk about big-O, you have to be aware of what your primitive operations are. When talking about normal sorting algorithms, we usually assume comparison is a primitive operation, and then we're measuring the number of comparisons. This is not actually the case for strings (and several other data types), where a single comparison can take time proportional to the length of the shared prefix, but that cost is the same regardless of which comparison sort you use, so it usually doesn't matter in your analysis.
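To make that hidden cost visible, here is a sketch that wraps a hypothetical instrumented comparator around Python's built-in sorted via functools.cmp_to_key, counting both comparisons and characters examined:

```python
from functools import cmp_to_key

def count_sort_cost(strings):
    """Sort strings with an instrumented three-way comparator.

    Returns (sorted_list, comparisons, chars_examined). 'comparisons' is what
    textbook analysis counts; 'chars_examined' is the per-comparison cost it hides.
    """
    stats = {"comparisons": 0, "chars": 0}

    def compare(a, b):
        stats["comparisons"] += 1
        # Scan character by character until the strings differ.
        for ca, cb in zip(a, b):
            stats["chars"] += 1
            if ca != cb:
                return -1 if ca < cb else 1
        # One string is a prefix of the other (or they are equal).
        return (len(a) > len(b)) - (len(a) < len(b))

    result = sorted(strings, key=cmp_to_key(compare))
    return result, stats["comparisons"], stats["chars"]
```

Sorting ["d", "c", "b", "a"] and the same four letters each prefixed with 50 copies of "x" performs the same number of comparisons, but the second input examines 51 characters per comparison instead of 1.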
With radix sort, you're usually considering it precisely because k is likely to be significantly smaller than log n, so the cost of the primitive operations is absolutely relevant to the actual problem at hand.
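Some back-of-the-envelope arithmetic, as a Python sketch with illustrative helper names: 32-bit keys split into 8-bit digits need k = 4 passes, while log2 n for a million elements is about 20. This compares pass counts only; the constants per radix pass and per comparison differ, so it is not a runtime prediction.

```python
import math

def radix_passes(key_bits, digit_bits):
    """k: number of LSD radix passes for key_bits-wide keys and digit_bits-wide digits."""
    return math.ceil(key_bits / digit_bits)

def comparison_factor(n):
    """log2(n): the per-element factor in an O(n log n) comparison sort."""
    return math.log2(n)

def min_key_bits(n):
    """Bits needed to give n items distinct keys (grows like log2 n)."""
    return max(1, (n - 1).bit_length())
```

For n = 1,000,000, min_key_bits gives 20, which is why radix sort only wins on this comparison when the keys come from a bounded range rather than needing to grow with n.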
(For that matter, multiplication is not constant time either: it takes at least linear time in the number of bits (quadratic for schoolbook multiplication), and the number of bits is O(log N) in the magnitude N of the values stored, but this is conveniently forgotten in most algorithm analysis. If you limit the problem to integers that fit into a machine word, this factor drops out as a constant, and nobody cares.)
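A schoolbook multiplication sketch that counts single-digit products; the helper is hypothetical and base 10 is chosen only for readability. Multiplying a d1-digit number by a d2-digit number takes d1 * d2 digit products, so doubling the number of digits roughly quadruples the work. Asymptotically faster algorithms such as Karatsuba exist, but none of them is constant time.

```python
def schoolbook_multiply(a, b, base=10):
    """Multiply non-negative ints digit by digit; return (product, digit_products)."""
    def digits(x):
        # Little-endian digits of x in the given base (0 -> [0]).
        ds = []
        while True:
            x, d = divmod(x, base)
            ds.append(d)
            if x == 0:
                return ds

    da, db = digits(a), digits(b)
    acc = [0] * (len(da) + len(db))
    ops = 0
    for i, x in enumerate(da):
        carry = 0
        for j, y in enumerate(db):
            ops += 1  # one single-digit multiply
            t = acc[i + j] + x * y + carry
            acc[i + j], carry = t % base, t // base
        acc[i + len(db)] += carry
    # Rebuild the integer from its little-endian digits.
    product = 0
    for d in reversed(acc):
        product = product * base + d
    return product, ops
```

For example, 123 * 456 takes 9 digit products, while two 40-digit numbers take 1600.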
Regardless of what algorithm you're working with, you have to be aware of the limits of the abstraction you use to analyze it. Fibonacci heaps offer O(1) amortized insert and decrease-key, but almost nobody uses them because their constant factors swamp simpler structures with worse asymptotic complexity. And sometimes it's faster to use a red-black tree (or even linear search over an array) than a hashmap, because hashmaps are technically O(k) in key size. Red-black trees are too, for comparisons, but in a sparse key space the processor usually only has to examine the first 1-2 characters before it can bail out of the comparison routine, while the hashmap has to examine every character.
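A sketch of why a comparison can bail out early while a hash cannot, counting bytes examined. It uses 32-bit FNV-1a purely as an example hash function; real hashmap implementations use various hash functions, and some cache a string's hash after computing it once.

```python
FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193

def fnv1a_32(key: bytes):
    """32-bit FNV-1a; returns (hash, bytes_examined). Always reads every byte."""
    h = FNV_OFFSET
    examined = 0
    for byte in key:
        examined += 1
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h, examined

def compare_bytes(a: bytes, b: bytes):
    """Three-way compare; returns (result, bytes_examined). Stops at the first difference."""
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            return (-1 if x < y else 1), examined
    return (len(a) > len(b)) - (len(a) < len(b)), examined
```

Comparing b"zebra_" plus 1000 filler bytes against b"apple_" plus the same filler stops after 1 byte, while hashing either key reads all 1006.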
True enough. The idea behind big-O notation is really cost = O(whatever) * (the algorithm's constant difficulty factor) + (the algorithm's overhead). My point is that if you start adding difficulty factors, the same terms often wind up in your other algorithms too. Granted, string comparisons are generally O(log k) and pure radix sort would end up as O(k), but you can also short-circuit an MSD radix sort once the buckets are small enough, which effectively drops things back to O(log k), assuming sparse inputs. (If the input isn't sparse, you're not going to be doing anything past a depth of about 4 anyway.)
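A sketch of an MSD radix sort with that kind of short circuit, assuming byte-string keys (256 buckets per level) and an arbitrary cutoff of 16. Below the cutoff it hands the bucket to Python's built-in sorted instead of recursing further; the function name and cutoff value are illustrative.

```python
def msd_radix_sort(strings, cutoff=16):
    """MSD radix sort on bytes objects, switching to a comparison sort
    once a bucket holds fewer than `cutoff` items."""
    def sort_range(items, depth):
        if len(items) < cutoff:
            # Small bucket: every item shares its first `depth` bytes,
            # so ordering by the remaining suffix gives the full order.
            return sorted(items, key=lambda s: s[depth:])
        # Keys that end exactly at this depth sort before any longer key.
        ended = [s for s in items if len(s) == depth]
        buckets = [[] for _ in range(256)]
        for s in items:
            if len(s) > depth:
                buckets[s[depth]].append(s)  # indexing bytes yields an int 0-255
        out = ended
        for bucket in buckets:
            if bucket:
                out.extend(sort_range(bucket, depth + 1))
        return out

    return sort_range(list(strings), 0)
```

With cutoff=1 this degenerates to a pure MSD radix sort; larger cutoffs stop the recursion once buckets get small, which is where sparse inputs let it bail out after a few levels.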