
The Data Science of K-Pop: Understanding BTS Through Data and A.I - soylentsucks
https://towardsdatascience.com/the-data-science-of-k-pop-understanding-bts-through-data-and-a-i-part-1-50783b198ac2?fbclid=IwAR230y-ppWZGKc1B7iHUCbaqEwTkgNTOXFaeHav0hL7l0te9GdewPUFuqjY
======
plouffy
I didn't know about SHAP graphs; that's pretty cool. I'd love to be corrected,
but an ensemble of LightGBM, Gradient Boosting and Random Forest with 12
features just seems like overkill. Why an ensemble and not just one of the
three models? A single model seems much less likely to overfit. Looking at the
average values, BTS's speechiness values are so high compared to the others
that I'm not sure whether the other features are adding anything. Seeing the
feature importances would be interesting (it's difficult to infer the
importance of one feature relative to another, since all we know is the
ranking but not the values).
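
For what it's worth, one way to get magnitudes rather than just a ranking is
permutation importance: shuffle one feature column at a time and see how much
the accuracy drops. Below is a minimal stdlib-only sketch. The data is random
toy data, the feature names other than speechiness are placeholders I picked,
and `toy_model` is a hypothetical stand-in, not the author's ensemble.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (assumption): three made-up features, label depends only on the first.
FEATURES = ["speechiness", "danceability", "energy"]
data_rng = random.Random(42)
X = [[data_rng.random() for _ in FEATURES] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    """Hypothetical stand-in classifier that only looks at the first feature."""
    return int(row[0] > 0.5)


scores = permutation_importance(toy_model, X, y)
for name, score in zip(FEATURES, scores):
    print(f"{name}: {score:.3f}")
```

If one feature really dominates the way speechiness seems to, it should show a
large drop here while the others stay near zero.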

~~~
soylentsucks
Yeah, SHAP is pretty cool. Hm, I don't think the author used the three
models in succession. I think he just tried out different models to compare
the results. Yeah, feature importance would've been cool too. I think the SHAP
plot does list the features in order of importance on the y-axis. Wonder if
there's a way to extract more signal somehow?
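
On getting numbers behind that ordering: as far as I know, the usual global
score is the mean absolute SHAP value per feature. Here is a rough sketch for
the simplest case, a linear model with features treated as independent, where
the SHAP value of feature i is w_i * (x_i - mean(x_i)). The weights and data
are made up for illustration and have nothing to do with the article's models.

```python
from statistics import mean


def linear_shap(weights, X):
    """SHAP values for a linear model, assuming independent features:
    phi_i = w_i * (x_i - mean of feature i over the data)."""
    means = [mean(col) for col in zip(*X)]
    return [[w * (x - m) for w, x, m in zip(weights, row, means)] for row in X]


def mean_abs_shap(shap_values):
    """Global importance per feature: mean of |SHAP value| over all rows."""
    return [mean([abs(v) for v in col]) for col in zip(*shap_values)]


# Hypothetical weights and data, chosen only to illustrate the calculation.
FEATURES = ["speechiness", "danceability", "energy"]
weights = [3.0, 0.5, 0.0]
bias = 0.1
X = [
    [0.1, 0.9, 0.3],
    [0.8, 0.2, 0.5],
    [0.4, 0.4, 0.9],
    [0.7, 0.6, 0.1],
]


def predict(row):
    """The toy linear model itself."""
    return bias + sum(w * x for w, x in zip(weights, row))


shap_values = linear_shap(weights, X)
importance = mean_abs_shap(shap_values)
for name, score in sorted(zip(FEATURES, importance), key=lambda p: -p[1]):
    print(f"{name}: {score:.4f}")
```

Sorting by that score gives the ranking, and the scores themselves give the
relative sizes that a ranking alone hides.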

------
soylentsucks
Thought this was pretty cool, exploring BTS and K-pop through data science.
Does anyone have thoughts on it? I'm interested in what the author will cover
in the lyrics portion.

