Erlang now has a run queue per scheduler (by default one per core) as opposed to a single shared run queue, which should improve scaling, and they have also improved message passing and optimised ETS table locking.
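If you want to see the per-scheduler run queues for yourself, something like the following rough sketch should do on a reasonably recent OTP release. The module and function names (`runq_peek`, `report/0`) are made up for illustration; `erlang:system_info(schedulers_online)` and `erlang:statistics(run_queue_lengths)` are the only runtime calls used, and the latter may not exist on very old releases.

```erlang
-module(runq_peek).
-export([report/0]).

%% Hypothetical helper: prints how many schedulers are online and how
%% many processes/ports are currently waiting in each run queue.
report() ->
    Online = erlang:system_info(schedulers_online),
    %% One list entry per run queue; the exact number of entries
    %% depends on the OTP release and how the VM was started.
    Lengths = erlang:statistics(run_queue_lengths),
    io:format("schedulers online: ~p~n", [Online]),
    io:format("run queue lengths: ~p~n", [Lengths]),
    {Online, Lengths}.
```

Calling `runq_peek:report()` under load gives a quick feel for whether work is actually spread across the schedulers or piling up in one queue.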
The "problems" with >16 (and even >2) cores have mostly not been with the Erlang runtime; quite often it's an inherent problem with the application, or at least with the way it's been written.
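To make that concrete, here is a minimal sketch (module and function names are hypothetical, not from any real application) of the same work written two ways. The first version keeps everything in one process, so no amount of extra cores will help; the second spawns a process per item, which at least gives the schedulers something to spread around.

```erlang
-module(bottleneck_sketch).
-export([serial/2, parallel/2]).

%% Everything runs in the calling process, one item after another,
%% so only one scheduler is ever busy with this work.
serial(F, Items) ->
    [F(I) || I <- Items].

%% One linked process per item; results are collected in the original
%% order by matching on each item's unique reference.
parallel(F, Items) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() -> Parent ! {Ref, F(I)} end),
                Ref
            end || I <- Items],
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

Both return the same list for a pure `F`. Whether `parallel/2` is actually faster depends on how expensive `F` is compared with the cost of spawning and message passing, which is exactly the kind of application-level question the runtime can't answer for you.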