There is a (fairly) simple proof of that, however. You may well be aware of this already, but in case someone hasn't seen it before:
Each comparison of objects provides one bit of information.
So k comparisons provide k bits of information, or 2^k possibilities.
Therefore to distinguish between n options, we need at least log_2(n) comparisons.
There are n! ways of ordering a list of length n.
log(n!) ~ n * log (n)
So to sort a list of length n, we need at least O(n * log(n)) comparisons.
It certainly requires more maths than just counting 'for loops', but I find the explanation surprisingly elegant.
Information-theoretic lower bounds are neat. Here's an even simpler example. In the decision tree model of computation (where decision nodes have constant degree), consider the problem of finding the value corresponding to a key in a dictionary with n elements. Each leaf is labeled by a value, and in the general case where all values are distinct there are n leaves. So, the tree must have at least one path of length Omega(log n). Note that this is true even when the key space is a contiguous interval like [0..n], where it's just an array indexing problem. That's why the RAM model of computation is strictly more powerful than the decision tree model.
> log(n!) ~ n * log (n)
Random aside: while this is usually proven with Stirling's approximation, there is a much simpler proof. For the upper bound, n! <= n^n because 1 <= n, ..., n <= n. For the lower bound, n! >= (n/2)^(n/2) because you can drop the first n/2 terms without increasing the product and estimate the last n/2 terms as being >= n/2. That gives the double inequality (n/2)^(n/2) <= n! <= n^n. Take logs and you get (n/2) log(n/2) <= log(n!) <= n log(n). What makes this proof tick is that while (n/2)^(n/2) is a really shitty lower bound on n!, the shittiness evaporates once you take logs.