One of my favorite things about ML is using it to reduce human labor hours on tasks like these.
Furthermore, if you have a decent physical model and/or some constraints, such as a fixed FOV (true of lots of industrial QA, cell counting, etc.), you can do quite well with classic approaches, and they can be quite robust. Some of the deep models you see performing well on, e.g., small curated sets for conferences just don't generalize well at all, which isn't surprising given the setup.
I would also be interested in any alternative approach that even gets into the ballpark of DL performance in pose estimation.