The loudness war isn't happening because of "crappy earbuds". The earbuds included with smartphones have been rather good for a long time now. The ones that came with my Samsung S8 were designed partially by AKG (Samsung owns the Harman Group, including AKG) and are really damn good. Apple's included earbuds are also very good now, a far cry from the original iPod earbuds, which were decidedly mediocre.
The real issue is radio, plus YouTube and the streaming services from before they implemented loudness targets. It's been going on since the 50s at least. Just listen to some old singles from back then: they're mastered as loud as they possibly could be with the technology of the day. The objective has always been to make your song sound louder than the next song, because louder music sounds more impressive to a casual listener. It's simply more attention-grabbing.
At the beginning of the digital era, there was actually some hope for better dynamics. In the guidelines for Sony's earliest digital recording equipment, the recommendation was to target an average level of -20dBFS, to use very little or no compression, and "let peaks fall where they may". Just imagine that, 20dB of headroom!
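To put that number in perspective, here is a rough sketch of the dB arithmetic in Python. The function names are made up for illustration, and the only figure taken from the guideline is the -20dBFS average; the rest is the standard 20*log10 amplitude relationship.

```python
import math

def dbfs_to_linear(dbfs: float) -> float:
    """Convert a level in dBFS to an amplitude ratio, where 1.0 is full scale."""
    return 10 ** (dbfs / 20)

def headroom_db(average_dbfs: float, ceiling_dbfs: float = 0.0) -> float:
    """Distance in dB between the average level and the digital ceiling."""
    return ceiling_dbfs - average_dbfs

sony_average = -20.0  # average level from the guideline mentioned above

print("headroom (dB):", headroom_db(sony_average))
print("average as a fraction of full scale:", dbfs_to_linear(sony_average))
```

In other words, an average of -20dBFS sits at roughly a tenth of full-scale amplitude, so peaks can be about ten times larger than the average before they hit the ceiling.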
In the worst days of the loudness war (~early 2000s), a lot of music was mastered with barely 3-4dB of dynamic range, with peaks banging hard against 0dBFS. I have some CDs from that era, and they clip and distort like crazy, because everything was just pushed to 11 to be as loud as possible. "Californication" by Red Hot Chili Peppers is an excellent example. It's absolutely horrid.
Since then, two major things have happened to improve sound quality somewhat. Firstly, compression devices and plugins have improved massively. Modern sidechain compression is really impressive, and entire genres like EDM/dubstep simply wouldn't exist if not for the improvements in compression tech. Secondly, all of the streaming services use volume normalization now, with a set average sound level. Songs can peak over this average value, but the average must be in line with the target. This also results in brickwalled "turn everything to 11" tracks sounding a lot quieter, because they have no peaks to use the additional dynamic range available.
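Here's a minimal sketch of why normalization punishes brickwalled tracks, under some loud assumptions. Real services measure loudness in LUFS (ITU-R BS.1770), not plain RMS, and the -14 target below is just an illustrative value, not any particular service's setting. The two "tracks" are synthetic sine waves and every name in the code is hypothetical.

```python
import math

TARGET_DB = -14.0  # illustrative target only; real services use LUFS, not RMS
PERIOD = 40        # samples per sine cycle

def sine(n_samples, period, amplitude):
    return [amplitude * math.sin(2 * math.pi * n / period) for n in range(n_samples)]

def rms_dbfs(samples):
    """Average level as RMS in dBFS (a crude stand-in for a loudness meter)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def peak_dbfs(samples):
    """Highest absolute sample value in dBFS."""
    return 20 * math.log10(max(abs(s) for s in samples))

def normalization_gain_db(samples, target_db=TARGET_DB):
    """Gain that moves the track's average level onto the target."""
    return target_db - rms_dbfs(samples)

def apply_gain(samples, gain_db):
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

# "Dynamic" track: one loud burst followed by a long quiet passage.
dynamic = sine(PERIOD, PERIOD, 0.9) + sine(960, PERIOD, 0.25)
# "Brickwalled" track: a sine pushed far too hot, then hard-clipped at full scale.
brickwalled = [max(-1.0, min(1.0, s)) for s in sine(1000, PERIOD, 4.0)]

for name, track in (("dynamic", dynamic), ("brickwalled", brickwalled)):
    gain = normalization_gain_db(track)
    normalized = apply_gain(track, gain)
    print(f"{name}: gain {gain:+.1f} dB, "
          f"average {rms_dbfs(normalized):.1f} dBFS, "
          f"peak {peak_dbfs(normalized):.1f} dBFS")
```

Both tracks end up at the same average level, but the clipped one has to be turned down much further to get there, so its peaks land far below full scale while the dynamic track keeps its peaks near the top.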