What would it look like if we used PCA rather than UMAP? PCA is simpler than UMAP, so it's in some sense "less arbitrary". If the image is similar then we know we're seeing something about numbers rather than about our methods.
Also, I don't think that the (very legitimate) "dimensional" critique of PCA applies here. The units on the coordinates of the representation are the same: the presence or absence of that prime factor.
To the original question: my suspicion is that PCA might pull out the even numbers (first PC) and the divisible-by-3 numbers (second PC), because these two factors may explain the most variability in the underlying vector representation. If it did, that would be pretty intuitive, although not as interesting.
Edited to add: Suspicion turned out to be true. For the first 2000 integers, the top 6 PCs turned out to correspond to the first 6 primes (2, 3, 5, 7, 11, 13).
Plot at: https://imgur.com/a/qi2Sx5u?
function [nums, pcs] = pca_prime(nMax, nPC)
nums = zeros(nMax, nMax);
for k = 2:nMax
    nums(k, factor(k)) = 1;  % vector representation of "k"
end
% 2:end because we don't care about 1 as a "prime"
pcs = pca(nums(2:end,:), 'NumComponents', nPC);
end

[nums, pcs] = pca_prime(2000, 10);  % "svd" would work too
plot(pcs(:,1:6));                   % first 6 PCs
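For anyone without MATLAB, here is a rough Python/NumPy sketch of the same idea (the function name `pca_prime` and the matrix layout are carried over from the snippet above; PCA is done via SVD of the centered matrix, which the comment notes is interchangeable):

```python
import numpy as np

def pca_prime(n_max, n_pc):
    """Binary prime-factor matrix for 2..n_max, then PCA via SVD."""
    nums = np.zeros((n_max - 1, n_max + 1))
    for k in range(2, n_max + 1):
        m, p = k, 2
        while p * p <= m:              # trial division, distinct primes only
            if m % p == 0:
                nums[k - 2, p] = 1     # column p: "p divides k"
                while m % p == 0:
                    m //= p
            p += 1
        if m > 1:                      # leftover prime factor
            nums[k - 2, m] = 1
    centered = nums - nums.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return nums, centered @ vt[:n_pc].T

nums, pcs = pca_prime(2000, 6)
# The top PCs line up with the indicator columns for the smallest primes;
# e.g. pcs[:, 0] tracks (up to sign) "divisible by 2".
```

Plotting `pcs` should reproduce the observation above: the top components correspond to divisibility by the first few primes.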
Cov(i, j) = floor(n / (p[i]*p[j])) / n - floor(n / p[i]) * floor(n / p[j]) / n^2   (covariance between the divisibility indicators for p[i] and p[j])
Var(i) = floor(n / p[i]) / n - ( floor(n / p[i]) / n )^2   (variance of the indicator for p[i])
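Reading those as the covariance between the divisibility indicators for p[i] and p[j], and the variance of the indicator for p[i], a quick NumPy check with n = 2000 and p[i] = 2, p[j] = 3 (example values of my choosing):

```python
import numpy as np

n, pi, pj = 2000, 2, 3               # example values; any distinct primes work
k = np.arange(1, n + 1)
xi = (k % pi == 0).astype(float)     # indicator: divisible by pi
xj = (k % pj == 0).astype(float)     # indicator: divisible by pj

# Closed forms from the expressions above
cov_formula = (n // (pi * pj)) / n - (n // pi) * (n // pj) / n**2
var_formula = (n // pi) / n - ((n // pi) / n) ** 2

# Empirical (population) covariance and variance
cov_emp = (xi * xj).mean() - xi.mean() * xj.mean()
var_emp = xi.var()
```

The covariance comes out essentially zero, which is why the indicator columns act like nearly independent axes and PCA picks them off one prime at a time.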
Those are the same thing. Is your first case just a reader who has no idea what the SVD is?
What does this mean?
The prime factors of a number form the ultimate high-dimensional space.
Damn. The more I see UMAP, the more I think it is going to be a central and generic tool for high-dimensional analysis. I haven't taken the time to go in depth into it yet, though :/
So far, my understanding of it is: t-SNE on steroids
* t-SNE is great for local proximity, but it 'rips' high-dimensional global structures apart too early. UMAP handles both scales by using transformations to stitch together the overlapping, locally relevant lower-dimensional patches.
* It is faster than t-SNE, and scales better with dataset size.
* t-SNE is about moving the points, whereas UMAP is about finding the transformations that move the points... which means:
a) It yields a model that you can use to create embeddings for unseen data, which also means you can share your work by contributing to public model zoos.
b) You can also do supervised dimension reduction as you create your embedding. I.e. you can judge whether the shape looks good for unseen data (aka whether it generalizes well), and then correct the embedding by choosing which unseen instances to add to the training set. This means you control the cost of labeling data: you can see where your errors are and feed them back to the collection process in a cost-effective manner. For high-dimensional data.
* You can choose your metric! Specify a distance function and you're good to go. Haversine for a great orange peeling, Levenshtein for visualizing word spelling (and maybe providing an embedding for ML-based spell checking?)
* You can choose the output space to be greater than 2 or 3, in order to stop the compression at a specified level.
I believe it will replace t-SNE in the long term.
Here is a great video of the author presenting his work:
t-SNE, at least in the implementation I usually work with (the R Rtsne package), happily accepts any distance matrix as input. I have successfully used all kinds of distance measures with it.
1. A Riemannian manifold is constructed from the dataset.
2. The manifold is approximately mapped to an n-dimensional topological structure.
3. The reduced embedding is an (n - k)-dimensional projection equivalent to the initial topological structure, where k is the number of dimensions you'd like to reduce by.
I don't know how well that answers your question because it's difficult to simplify the math beyond that. But you can also check out the paper on arXiv. 
The underlying idea is to transform the data into a topological representation, analyze its structure, then find a much smaller (dimensionally speaking) topological structure which is either the same thing ("equivalent") or very close to it. You get most of the way there by thinking about how two things which look very different can be topologically the same based on their properties. A pretty accessible demonstration of that idea is the classical donut <-> coffee mug example on the Wikipedia page for homeomorphisms. 
In the strict sense, two things which are equivalent share the same properties, yes. This is the topological analogue of algebraic isomorphisms and analytic bijections. See the example about coffee mugs and donuts both being topological tori.
That being said I can't really comment on the potential artifacting details of this specific algorithm. In theory the overarching idea makes sense because if you find structure preserving maps between sets of varying dimensions you should expect relations within the set to be preserved (i.e. the relational information in the smaller set is equivalent, there's just less of it). But practically speaking not all datasets can be usefully abstracted to a manifold in this way, which means that (efficiently) finding an equivalent lower dimensional projection for the embedding might involve a fair amount of approximation.
With enough approximation you'll introduce spurious artifacts. But that's precisely where all the innovation comes in - finding ways to efficiently find equivalent structures with the representative data in fewer dimensions. This isn't the first proposal for topological dimension reduction (see information geometry); the devil is really in the details here.
As in the root of the "What caused the Big Bang" or "but who created God?" questions. Stuff like 1+1=2 shouldn't need a root cause, and would give rise to patterns like these.
As a new batch of integers is introduced in going from frame N to frame N+1, the prime numbers within that batch would become new points in the 2d projection, because they have not been seen before.
Then, as you go from frame N+1 to frame N+2, etc., the new prime factors from frame N start to re-occur in those successive frames, and new points are added to the loop.
n:      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
v2(n):  0  1  0  2  0  1  0  3  0  1  0  2  0  1  0
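The second row here looks like the exponent of 2 in each n's factorization (the "ruler sequence"); a quick check in Python (the helper name v2 is mine):

```python
def v2(n):
    """Exponent of 2 in the prime factorization of n."""
    e = 0
    while n % 2 == 0:
        n //= 2
        e += 1
    return e

row = [v2(n) for n in range(1, 16)]
# row == [0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0]
```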
There is topological structure https://math.stackexchange.com/questions/2879/mapping-natura...
http://ncatlab.org/nlab/show/arithmetic+topology as well as geometric (arithmetic geometry).
The question is how the picture corresponds to any of it. Glancing at the UMAP paper https://arxiv.org/abs/1802.03426, it looks like a proper invariant-preserving embedding, so maybe the loops in the image are real. Even if not, it hints artistically at the sorts of structure that do exist.
I enjoyed the plain markdown page though, very readable.
The site you linked does look much friendlier (still... self-host your JS when you can!)