As always, the answer is “it depends”. If you are getting a lot of cache misses and are memory bound, adding more threads will not help you much, since the extra threads compete for the same caches and memory bandwidth. If the processor's back end has idle execution units, with FP, integer, or memory units sitting there doing nothing, adding more threads might extract more performance from the part.