The method in the paper seems different from what was reported. They used a Last.fm dataset of two billion listens across 50 million songs. They enriched the data and restricted the study to 1990-2020, creating a balanced dataset of 350k songs for training their stat tools. Then they used those tools to analyze 2,400 songs per genre, 12,000 total. I didn't see "top 40" mentioned anywhere in the paper.