UMAP - topology-based manifold learning method
Ivis - Siamese triplet-network autoencoder
Both of them will blow PCA out of the water on basically all datasets. PCA's only advantages are speed and interpretability (it's easy to see the explained variance).
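For what it's worth, that interpretability is one attribute away in scikit-learn; a minimal sketch on made-up data:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(500, 20)                 # made-up data: 500 samples, 20 features
    pca = PCA(n_components=5).fit(X)

    # Fraction of the total variance captured by each principal component
    print(pca.explained_variance_ratio_)
    print(pca.explained_variance_ratio_.sum())  # how much of the signal the 5 components keep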
Ignoring the eigen aspect would miss a lot of both theory and practice.
Um, these seem to be perhaps the two most important advantages something could have? What advantages do UMAP and Ivis have that are more important than speed and interpretability?
When you want to perform some sort of clustering on top of the dimensionality reduction, it's a very useful property.
Note that it is very common to get rid of 90% of the dimensionality with PCA and then apply something like UMAP on top to improve speed (and because PCA can be trusted to keep the relevant information for later postprocessing, which cannot be said of other techniques).
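A rough sketch of that two-stage approach with scikit-learn and umap-learn (the component counts here are arbitrary placeholders):

    import numpy as np
    from sklearn.decomposition import PCA
    import umap  # pip install umap-learn

    X = np.random.rand(10_000, 500)               # toy high-dimensional data

    # Stage 1: PCA shrinks the problem while keeping most of the variance
    X_pca = PCA(n_components=50).fit_transform(X)

    # Stage 2: UMAP runs much faster on 50 dimensions than on the raw 500
    X_2d = umap.UMAP(n_components=2).fit_transform(X_pca)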
I also think PCA still rules in a lot of cases, like when people try to create a new index representative of a bunch of numeric variables.
When have you needed something stronger than PCA? Anybody have good stories?
You can see some of the code here on GitHub, although a lot of it depends on having some audio to test, among other closed-source projects (sorry, I have no control over that).
 - https://en.wikipedia.org/wiki/Mel-frequency_cepstrum
 - http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Herrera_19...
 - https://scikit-learn.org/stable/modules/generated/sklearn.ma...
 - https://scikit-learn.org/stable/modules/generated/sklearn.ma...
 - https://umap-learn.readthedocs.io/en/latest/
Does the work you're doing give you the ability to compare the outcomes of various reduction schemes?
The work I am doing now lets me compare different plots as well as experiment with another layer of processing that runs clustering on the output of the dimensionality reduction. I then manually play through the clusters to see how good the groupings seem to me, in terms of how 'close' the audio samples are to one another and how well one cluster is differentiated from another.
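Roughly, that pipeline looks something like this (a simplified sketch, not the actual project code; the feature file, reducer settings, and cluster count are all placeholders):

    import numpy as np
    import umap
    from sklearn.cluster import KMeans

    # Hypothetical file: one row of audio features (e.g. MFCC statistics) per sample
    features = np.load("audio_features.npy")

    # Reduce the feature space, then cluster the low-dimensional embedding
    embedding = umap.UMAP(n_components=2).fit_transform(features)
    labels = KMeans(n_clusters=8, n_init=10).fit_predict(embedding)

    # Group sample indices by cluster so each cluster can be auditioned in turn
    clusters = {c: np.where(labels == c)[0] for c in np.unique(labels)}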
I hope to have some documentation on my project soon, as it's going to be part of my dissertation in music technology.
Also, to answer your question about guitar pedals - Stefano Fasciani has done some work on finding smoother parameter combinations across chaotic synthesisers.
That is perhaps up your alley!
Also, that TSAM vid is exactly what I had in mind, thanks!
You need to save your degrees of freedom. You want 10 to 20 observations per predictor. So you can use PCA to collapse a subset of predictors and keep the predictors you are making inferences about. This will help the sensitivity of your test.
Another case is when you do linear regression, or any modeling where multicollinearity is a problem. That problem arises when predictors are confounded or affect each other. PCA changes the basis so that the new predictors are orthogonal to each other, getting rid of the multicollinearity problem.
A toy example is:
student GPA, SAT score, math grade, height, hours of study
where student GPA is the response, i.e., what you want to predict.
If you apply PCA to SAT score (x1), math grade (x2), height (x3), and hours of study (x4), it'll give you new predictors that are linear combinations of those predictors. Some statistics books cover this as a form of regression.
Anyway you may get new variables as:
new_x1 = 0.4*sat_score + 1.2*math_grade
new_x2 = 0.1*height + 0.5*hours_of_study
These new predictors are orthogonal to each other, so they don't suffer from multicollinearity. You can now do linear regression using these predictors.
The problem is interpretation: sometimes you get a grouping like height + hours of study.
Actually just look here for example: https://www.whitman.edu/Documents/Academics/Mathematics/2017...
Under "6.4 Example: Principal Component Analysis and Linear Regression"
Amazingly little code too. NumPy and SciPy are awesome :-)
Satellite data is collected using sensors that are multispectral/hyperspectral (for example, Landsat has 11 bands, but sometimes there are over 100), but this can be cumbersome to work with. PCA can be applied to the data so that you have a smaller subset that contains most of the original information, which makes further processing faster/easier.
What PCA does (to reduce this large number of dimensions) is hang the data on a new set of axes, letting the data itself indicate them. PCA starts by choosing its first axis along the direction of highest variance. The second axis is then chosen perpendicular (orthogonal) to the first, in the direction of highest remaining variance. You continue until you've captured the majority of the variance, which should be feasible in far fewer dimensions than you started with. Mathematically, these axes are found via the eigenvectors of the covariance matrix.
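That description maps almost directly onto a few lines of NumPy (a bare-bones sketch on toy data, keeping the top k components):

    import numpy as np

    X = np.random.rand(1000, 11)            # e.g. 11 spectral bands per pixel
    k = 3                                   # keep the top 3 components

    Xc = X - X.mean(axis=0)                 # center each band
    cov = np.cov(Xc, rowvar=False)          # 11x11 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh, since the covariance matrix is symmetric

    order = np.argsort(eigvals)[::-1]       # sort axes by variance, largest first
    components = eigvecs[:, order[:k]]

    X_reduced = Xc @ components             # project onto the new axes
    explained = eigvals[order[:k]].sum() / eigvals.sum()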
My teacher wanted to buy a car and needed help choosing; he wanted a "good" deal, so he applied PCA to all the car models for sale:
His real question was:
* what are the most important variables that make up a car's PRICE? Or, said another way,
* if I have to compare two cars that have the same price, with which car do I get the most for my money?
The answer was pretty surprising:
the most important variable is WEIGHT
So, while choosing a car, always check its weight! Do two cars have the same price? Take the heavier one :)
(These results relate to the 90s; are they still valid now? Not sure: we'd need to run PCA again.)
I've been using this result ever since, applying it in different contexts (which is, of course, not correct): when I am in doubt about which product to choose, I always choose the heavier one. I would not use this 'method' to buy a racing bicycle, ...or to choose the best girl ;)
And then you even have somebody stating that we use this method even without knowing the explanation: https://www.securityinfowatch.com/integrators/article/122343...
Can't comment on longevity, I went for realism over real-time not long after.
We had 200,000 dimensions (ACGTs), which we reduced to 2 via PCA, and sure enough, if someone said they were "Filipino" then they generally appeared close to the other folks who said they were "Filipino".
https://breckuh.github.io/eopegwas/src/main.nb.html (chart titled: QC: PCA of SNPs shows clustering by reported ethnicity, as expected)
1) Reduce the dimensionality of your data, then perform the inverse transform. This projects your data onto a subspace of the original space.
2) Measure the distance between the original data and this 'autoencoded' data. This measures the distance from the data to that particular subspace. Data which is 'described better' by the transform will be closer to the subspace and is more 'typical' of the data and its associated underlying generative process. Conversely, the data which is far away is atypical and can be considered an outlier or anomalous.
Precisely which dimensionality reduction technique (PCA, neural networks, etc.) is chosen depends on which assumptions you wish to encode into the model. The vanilla technique for anomaly/outlier detection using neural networks relies on this idea, but encodes almost zero assumptions beyond smoothness in the reduction operation and its inverse.
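With PCA as the reduction step, steps 1 and 2 come out to just a few lines (a sketch; the threshold choice is arbitrary):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(2000, 30)                     # toy data

    pca = PCA(n_components=5).fit(X)
    X_hat = pca.inverse_transform(pca.transform(X))  # project down, then back up

    # Reconstruction error = distance from each point to the learned subspace
    errors = np.linalg.norm(X - X_hat, axis=1)

    # Flag, say, the worst 1% as anomalies (the threshold is up to you)
    threshold = np.quantile(errors, 0.99)
    outliers = np.where(errors > threshold)[0]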
An intro example: principal component regression -- it simplifies the inputs to a regression technique (and can be used with other ML methods).
The general algorithm is called Singular Value Decomposition, and it can be used in lossy compression and other similar simplifications.
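A quick sketch of the lossy-compression angle, using a truncated SVD on a toy matrix (the rank is arbitrary):

    import numpy as np

    A = np.random.rand(512, 512)     # e.g. a grayscale image as a matrix
    r = 20                           # rank of the approximation

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Keep only the r largest singular values/vectors: the best rank-r approximation
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

    # Storage drops from 512*512 values to r*(512 + 512 + 1)
    compression_ratio = A.size / (r * (U.shape[0] + Vt.shape[1] + 1))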
I find this visualization really helpful when I need to explain PCA:
(Some of the hardest parts of ML, imho, are in the selection step.)
PCA is very practical for dimensionality reduction, ICA for blind source separation.
You wouldn't usually use ICA for dimensionality reduction unless you have a known contribution you want to get rid of but for some reason have difficulty identifying.
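For the blind-source-separation case, a minimal FastICA sketch (two made-up sources and a made-up mixing matrix, in the spirit of the classic cocktail-party demo):

    import numpy as np
    from sklearn.decomposition import FastICA

    # Two made-up source signals mixed together by two "sensors"
    t = np.linspace(0, 8, 2000)
    sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
    mixing = np.array([[1.0, 0.5],
                       [0.5, 2.0]])
    observed = sources @ mixing.T            # what the sensors actually record

    # ICA tries to recover statistically independent sources from the mixtures
    ica = FastICA(n_components=2, random_state=0)
    recovered = ica.fit_transform(observed)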