The points about LLVM being designed for "static C-like languages" isn't totally on-point. There as an impedance mismatch between Python and LLVM, but it's less about dynamic versus static than it is about the nature of the stack frame.
In LLVM, most optimizations operate on SSA values (the Value class in the LLVM IR). There is some support for CSE-ing loads, etc, but the effectiveness of that depends heavily on your alias information. So to get good optimization out of LLVM, you've got to represent things as Value's.
This is hard to do in Python. Python's lexical scoping semantics are a little bit wonky and there are lots of scenarios in which the thread's stack frame is reified as an object. So without some heavy analysis, you end up keeping your local variables in an activation frame object on the heap, and at that point you're toast. The local variables themselves won't be represented as SSA Value's, and most of the LLVM optimizations won't do anything with them.
This is not a "dynamic language" thing per se. Lisp, which is also a dynamic language, actually maps quite cleanly to LLVM. Every binding introduced by LET is already in SSA form unless it is either closed-over, assigned-to, or both.
1) If the value is just closed-over, you demote it to the function's environment and replace uses of the variable with a load from the environment vector.
2) If the value is just assigned-to, you just demote it to a stack slot via ALLOCA and LLVM's mem2reg pass will take care of re-promoting it to a Value. This latter technique is exactly what Clang does for all C local variables, so LLVM is highly-tuned for handling this scenario. In C, variables that have their address taken cannot be promoted, but this cannot happen in Lisp so every assigned-to value demoted to an ALLOCA should be promotable.
3) If a value is both assigned-to and closed over, you demote it to a heap-allocated box and replace all uses with heap references.
After this transformation, nearly every Lisp variable is an SSA Value, and the optimizers can work with them. Even if you use function calls to do generic arithmetic, etc, LLVM will happily CSE them for you as long as you mark those functions as readnone (ie: pure).
Now, there are some things LLVM won't do for you. It can't const-propagate generic arithmetic because it does't know the semantics. It can't do reassociation, etc, because you're not using the ADD/SUB, etc instructions. I don't see anything that would prevent you from doing it yourself in a custom pass, however.
In short, the criticism isn't so much that LLVM has an impedance mismatch with dynamic languages as it is that it only handles the bottom of the optimization stack. You still need to do the high-level language-specific optimizations before handing things to LLVM.
I think the deeper problem is not that stuff like arithmetic is done by function calls, but that it is done by method calls. This requires points-to-analysis and a lowering to function calls first. Since this analysis will be very conservative for performance reasons, you want to steal more tricks. Tracing seems to be en vogue today.
Global type inference (which is what you'd be doing if you used a points-to analysis to figure out what methods would be called) is a fragile optimization.
Runtime type feedback (observing the actual type at call sites and doing optimistic compilation based on that) is more robust and predictable (your points-to analysis doesn't suddenly fail just because there is a single polymorphic use somewhere in the code). That in itself is orthogonal to tracing. Runtime type feedback was developed on whole-method compilers, after all. Tracing is neat, and it's in vogue because it gets you to a "good" optimizer with relatively little development cost, but you can do all the things tracing does in a whole-method compiler with more engineering work. On-stack replacement (necessary for doing optimistic compilation) is arguably easier in a tracing compiler, and is definitely the major thing missing in LLVM for dynamic language implementations.
But all of this is one step beyond the threshold problem. Even all these tricks don't help you if you can't reify your local variables as SSA values --- all that fancy LLVM machinery won't see anything to optimize.
There are various aspects of Python's local variable semantics that seem to harken to days when stack frames were implemented with dictionaries (eg: del). There may be a way to map this cleanly to register variables but it's not clear to me how you'd do this.
Beyond that there are several places where you can introspect the activation. See: locals(), some of the fields in the code object, etc. Again, there may be implementation techniques you can use to get around this, but there isn't an obvious and clean mapping to SSA values as there is for Lisp-like languages.