> and the simplistic categorization of humans into ~5 races corresponds well to ...

nhchris · on March 2, 2023

> it exaggerates the density and distinctness of clusters

It does the opposite - the PCA graph shown is 2D, meaning points in the N-dimensional space are projected onto a single plane. Information is lost in this projection, i.e. point clouds that are distinct in 3D may overlap in 2D. While we are shown the axes along which the data varies most, the true clusters are even more distinct, since every added dimension can only add to the distance between two points, never reduce it.
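A minimal sketch of the projection point, assuming synthetic data (three made-up Gaussian clusters in 10 dimensions, not genetic samples) and PCA done by hand with NumPy's SVD. It only checks that no pairwise distance grows when the points are projected onto the top two components; it does not settle whether separation relative to within-cluster spread improves in the full space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three Gaussian clusters in 10 dimensions.
centers = rng.normal(scale=4.0, size=(3, 10))
x = np.vstack([c + rng.normal(size=(100, 10)) for c in centers])

# PCA via SVD: center the data, then project onto the top two components.
xc = x - x.mean(axis=0)
_, _, vt = np.linalg.svd(xc, full_matrices=False)
proj = xc @ vt[:2].T


def pairwise(a):
    """Euclidean distance between every pair of rows in `a`."""
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


d_full = pairwise(xc)    # distances in the original 10-D space
d_proj = pairwise(proj)  # distances in the 2-D PCA plot

# An orthogonal projection cannot increase any distance
# (small tolerance for floating-point error).
shrink_everywhere = bool(np.all(d_proj <= d_full + 1e-9))
print(shrink_everywhere)
```

Within-cluster distances shrink under the projection too, so this is an argument about lost information rather than a measure of how "clean" the clusters are.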

rafram · on March 2, 2023

No, sorry, you’re just wrong here. You can run PCA with pretty much any set of genetic samples and get a very nicely clustered graph, because that’s what PCA is for. See this PCA graph [0] on population groups present in Brazil. Should Xavante people be the sixth race in your taxonomy? They sure look distinct from other Amerindians and very distinct from the rest of the world!

(No, of course they shouldn’t. It’s just that extracting features to artificially cluster points along the most characteristic axes is what PCA is for.)

[0]: https://commons.wikimedia.org/wiki/File:3D_PCA_plot_of_Xavan...

nhchris · on March 2, 2023

> No, sorry, you’re just wrong here.

I believe I described PCA accurately, so could you elaborate on which part of my statement is wrong?

> extracting features to artificially cluster points along the most characteristic axes is what PCA is for

How is the clustering "artificial"? If you generate data without clusters (e.g. points evenly distributed within a sphere), applying PCA to it won't show clusters either.
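This is easy to try. A rough sketch, assuming synthetic data and a crude, made-up check (`central_fraction`, the share of points near the middle of the first principal component) rather than any standard clustering statistic:

```python
import numpy as np

rng = np.random.default_rng(0)


def pca_scores(x, k=2):
    """Project centered data onto its top-k principal components (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T


def central_fraction(scores, width=0.1):
    """Share of points whose PC1 score lies in the middle `width` of the range.

    Hypothetical helper: two well-separated clusters leave this region
    nearly empty, while an unclustered blob is densest there.
    """
    pc1 = scores[:, 0]
    return float(np.mean(np.abs(pc1) < width * np.abs(pc1).max()))


n = 5000

# Points uniformly distributed inside a unit sphere: random direction,
# radius scaled by the cube root of a uniform variate.
direction = rng.normal(size=(n, 3))
direction /= np.linalg.norm(direction, axis=1, keepdims=True)
ball = direction * rng.uniform(size=(n, 1)) ** (1 / 3)

# Two clearly separated Gaussian clusters, for contrast.
offset = np.array([3.0, 0.0, 0.0])
blobs = np.vstack([
    rng.normal(scale=0.5, size=(n // 2, 3)) + offset,
    rng.normal(scale=0.5, size=(n // 2, 3)) - offset,
])

ball_mid = central_fraction(pca_scores(ball))
blobs_mid = central_fraction(pca_scores(blobs))
print(ball_mid, blobs_mid)
```

For the sphere, the middle of PC1 is the densest region; for the two-cluster data it is essentially empty. PCA rotates and projects the data, so the gap shows up only when it was already there.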