Here's a whole post on the issue by an actual hardware designer:
To see the limitations of hardware support, let's first look at what hardware can do to speed things up. Roughly, you can really only do two things:
1. Specialization - save dispatching costs in speed and energy.
2. Parallelization - save time, but not energy, by throwing more hardware at the job.
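
To make the difference between the two concrete, here is a toy cost model. It is only a sketch and is not taken from the quoted post. The function names, the `dispatch` and `work` costs, and the idea that each unit burns one unit of energy per unit of busy time are all assumptions chosen for illustration. It also assumes the work splits perfectly across units with no communication overhead.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    time: float    # wall-clock time, arbitrary units
    energy: float  # total energy, arbitrary units


def baseline(n_ops, dispatch=4.0, work=1.0):
    """One general-purpose unit: every op pays dispatch overhead plus the work."""
    per_op = dispatch + work
    return Cost(time=n_ops * per_op, energy=n_ops * per_op)


def specialized(n_ops, work=1.0):
    """One fixed-function unit: the dispatch overhead is gone, only work remains."""
    return Cost(time=n_ops * work, energy=n_ops * work)


def parallel(n_ops, units, dispatch=4.0, work=1.0):
    """Several general-purpose units: time is divided, total energy is not."""
    per_op = dispatch + work
    return Cost(time=n_ops * per_op / units, energy=n_ops * per_op)


if __name__ == "__main__":
    print("baseline    ", baseline(1000))
    print("specialized ", specialized(1000))
    print("parallel x4 ", parallel(1000, units=4))
```

In this model specialization lowers both columns because the per-operation overhead disappears, while parallelization only lowers the time column: the same total amount of switching still happens, just spread over more hardware.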