Hacker News new | past | comments | ask | show | jobs | submit login
Counterintuitive Properties of High Dimensional Space (2018) (marckhoury.github.io)
230 points by behnamoh on May 20, 2023 | hide | past | favorite | 60 comments



Excellent article. I especially liked the explanation of Kissing Numbers and was surprised to learn how few dimensions we've found that have them defined. Additionally,

I think sometimes the people in some contexts that shake their fists at high-dimensional data forget that having lots of dimensions is not always inherently a curse.

In my perspective, calling it the Curse of Dimensionality in an neural network context is to me sort of like calling it the Curse of the Storage Shed.

Oh no, all of this gestures hands mathematical space in this place to put my stuff orthogonally, whatever shall I do. Oh, woe is me. That's not the Borsuk–Ulam theorem, hiding over there, that's just a fake shrub! I should never take a look at that. Oh, woe is me indeed. (extremely thickly-laid facetious remark concluded)


Well it's definitely a curse for anyone that wants their embeddings/vectors to be meaningful outside of the most nearest of neighbors.


Could you give an example of an embedding space that is only meaningful locally?

The word vector idea of modifying a word by some operation e.g King->Queen ~= man->woman ~= uncle->aunt Which intuitively seems non-local to me.


Toss that intuition out. That's in some sense the major problem with higher dimensional spaces; our intuition about locality is built in a super-small-dimensional space.

Compare the locality of King, man, and uncle to King, lemma, and titrate. The former set is quite local.


The parent commenter in this case is referencing the tendency for high dimensional spaces to very quickly go "off the rails" in that, if I understand correctly, most smooth spaces are locally flat, but as you get further and further away, the curvature can really screw with things.

Even with that fixed, Euclidean and otherwise measures do not translate intuitively to high dimensions outside of much more local spaces.


I'm unfortunately less-experienced in embeddings than I'd want to be for this discussion.

Something I do find interesting is that lots of people use a scaled cosine distance for vectors (the dot product) that is much more tolerant to high dimensions during training, and then a Euclidean distance metric which is not as friendly towards high dimensions during inference.

I wish I had more classical statistical experience to dive into the nittier and grittier of the details here with appropriate aplomb.


I thought that they use the dot product during training[0] and then the cosine similarity during inference.

Sometimes the vectors are pre-normalized for inference (thus reducing cosine sim. to just the dot product), is this what you're thinking of perhaps?

[0]. Not at my desktop, but I read this in the Stanford NLP course in the past two weeks, specifically the section on the skip gram algorithm (word 2 vec)


That's why I said the scaled cosine distance, the dot product between two vectors is a scaled version of the cosine distance.

This is practically every neural network layer that uses a MAD (multiply-add) computational motif, including 3x3s.

Surprisingly few people know this.

I believe you might have the reduction order a bit backwards -- the dot product reduces to cosine similarity and not the other way around as cosine similarity is the normalized dot product.


You're correct! It's dot product reduces to cosine sim, not the other way around.

Could you clarify your second statement about Mad and 3x3s? Afraid I don't follow.


Ah, of course. Sure. I can do that.

Any operation in neural networks that multiplies two vectors together and then collects them with a single addition operation is a dot product.

So 3x3 convs are technically a dot product, even though they have a spatial dimension. Same for the initial part of most transformers' attention layers, MLP layers, etc....

The overlap can be a bit tricky to deal with in terms of implications, but I've found it a very helpful formulation for squeezing out some performance boosts in previous implementations of neural networks that I've worked on.

Hope this helps, feel free to let me know if you have any more questions/thoughts/etc, love! <3 :)) :D :fireworkds:


Interesting!

I'm working through implementing each of these w/ only numpy, so the "it's just a dot product" should come in handy:)

Case in point I just finished implementing the skip gram algorithm... In possibly the least efficient way you could imagine xD


if your vectors are normalized, dot and cosine are the same thing.


The person you're replying to also said that.


Great article.

A corollary of one of the facts listed there is that the ratio between the volumes of unit sphere and of the unit cube converges to 0 with d. Meaning, in high dimensions, the unit cube is all corners and no interior.

Another one is to consider the n-dimensional standard Gaussian distribution. In dimensions 1, 2 and 3 these look like a fuzzy ball around the origin. But in high dimensions, it is more like a thin shell of radius sqrt(n). You can see this as the (squared) length of a random vector from that distribution is from the Chi^2 distribution which becomes more and more concentrated in higher dimensions.


> in high dimensions, the unit cube is all corners and no interior.

Wouldn't it be rather "all faces" ?

As you say, it is useful to think about these volumes in terms of probabilities. You can pick a random point in a one-million dimensional cube by throwing one million random numbers uniformly between -1 and 1. Being near a corner means that each one of these coordinates is very close to either 1 or -1, which is extremely unlikely. Being near a face means that just one of the coordinates is near 1 or -1, which is almost sure to happen! Thus, essentially the whole volume of the cube is near its boundary.

By a similar reasoning, you can see that the volume of the sphere is negligible with respect to that of the cube. Consider your random point inside the cube, it will fall inside the sphere if x_1^2 + ... x_n^2 < 1, which is extremely unlikely if n is large: you are summing a million numbers between 0 and 1, how likely is it that the sum is smaller than 1? Very unlikely: if just one of these numbers was too close to 1, all the others must be nearly zero.


Well depends what you mean. It's a nice way of looking at the "all faces" bit.

My point is that if you instead define the "interior" as the inscribed sphere, that becomes vanishingly small.


> In dimensions 1, 2 and 3 these look like a fuzzy ball around the origin. But in high dimensions, it is more like a thin shell of radius sqrt(n).

But it’s also still like a ball. The “shell” thing is not something particular the Gaussian: the same happens for a hard ball. As the dimension increases the share of the mass of the ball close to its surface goes up.


That's true for any ball, even in 2D - most volume is by the edge. But a Gaussian in low dimensions looks "closer" to 0.


Sure, in a 2D ball most volume is by the edge and most of a 2D Gaussian is around radious sqrt(2).

In either case the “concentration” gets more and more important as the number of dimensions goes up.

(Concentration in quotes because for the Gaussian the typical density goes down.)


In the article the unit cubes have side length 1. So isn't the volume always 1, whatever the dimension?


Yes, the volume of the unit cube is 1 in any dimension, but the volume of the unit sphere goes to 0 as the dimension increases.


I believe thats because volume is a measurement of the ratio between "amount of stuff" in the hyper-object vs the amount of stuff in a hyper-cube

The amount of stuff increases with higher dimensions in a sphere, but not as fast as with a cube, hence the ratio between the two converge to zero


Another oddity: in 4D or higher, you cannot tie a knot - the rope can always become untied by shifting in a higher dimension and bypassing the obstruction.

You also can't tie a knot in 2D at all because you can't even cross the rope over itself.

So sailing in other-dimensioned universes will be very different!


Sailing is about tying knots around things. In 4D you can still tie a rope around a 2d plane, like we tie ropes around sticks.

I presume you can similarly tie knots in 'planes' with perhaps an even richer set of 'knots'.


Well, at least it would explain "sheet bend": the use of the word "sheet" to mean (some) rope must be an import from 4D seafaring!


You probably know this, but for the record: https://en.wikipedia.org/wiki/Sheet_(sailing)


You can tie a 2-knot (a sphere) in 4D.

And in general, an n-sphere knots in n+2 dimensions. You even get new kinds of knot moves in higher dimensions.


You can tie planes in a 4d knot


A good chunk of this comes directly from Richard Hamming's incredible The Art of doing Science and Engineering (a video of the specific lecture on n-dimensional spaces can be found here[0]) and yet I see no mention of this talk anywhere in the article, which is unfortunate.

I highly recommend checking out Hamming's lectures if you find this enjoyable.

0. https://www.youtube.com/watch?v=uU_Q2a0S0zI


Something also happens to regular polytopes. In 2D there are infinitely many regular polygons, in 3D there are 5 regular polyhedra, in 4D: 6, but for >=5D there are only 3.

[1] https://en.wikipedia.org/wiki/Regular_polytope


"At 7 dimensions or higher, things start to acquire a strong smell of vanilla..." was literally what my college professor said when trying to explain n-dimensional space to us.


Except with so many dimensions the vanilla-smelling molecules would have a difficult time diffusing into your nose.


“High dimension cubes are qualitatively more like hedgehogs than building blocks.”

https://plus.maths.org/content/round-peg-square-hole-or-squa...


If you lived in a 1000 dimensional space, would you actually be able to figure out the number of dimensions?

You could try counting how many sticks you could place at right angles to each other, but the problem is that you can arrange 1001 vectors so that they're all very nearly at right angles.


Don't pick sticks. Pick sides. Sides of a box. Every time you pick a direction with a stick, also set up two hyperplanes spaced appart. Effectively trying to build a hypercube.

Once you have 1000 directions chosen (i.e. 2000 sides) you will have fully enclosed a volume. You probably won't have a perfect cube, but this trick works as long as the set of vectors you picked 'are in general position' which has probability 1.


These beings can do experiments that get better and better to finally find that the number of dimensions can’t be more than 1000 but also not less.


I don't know, us poor three-dimensional beings figured out that the continuous functions from R to itself form an infinite-dimensional vector space.

https://math.stackexchange.com/questions/466707/what-are-som...


I’m a little bit interested in intuitive properties that high dimensional space retains as well, just to get a handle on how weird these shapes get. Like, you can inscribe a circle inside a square and have it touch all sides without extending outside the square, and you can similarly “inscribe” a sphere inside a cube such that the sphere touches all sides of the cube without extending outside the cube. Does this hold true in higher dimensions? The picture near the end of the article showing hyper-cubes with jutting out vertices and hyper-spheres being pushed inwards sort of indicates maybe they do, but obviously that isn’t an actual mathematical answer.


You can and as the dimensions go up the volume of the inscribed ball as a percentage of the volume of the hypercube shrinks down a lot. There's a nice video about this from Numberphile and 3blue1brown - https://www.youtube.com/watch?v=6_yU9eJ0NxA


I. Hypersphere: length(v) <= 1

II. Hypercube: abs(max(v)) <= 1

Clearly II implies I, therefore the hypersphere is inside the hypercube.

The points where they touch are: length(v) = abs(max(v)) = 1, the solutions of which are the points with exactly one ±1 coordinate.


> Like, you can inscribe a circle inside a square and have it touch all sides without extending outside the square, and you can similarly “inscribe” a sphere inside a cube such that the sphere touches all sides of the cube without extending outside the cube. Does this hold true in higher dimensions?

Of course, points with only one nonzero coordinates xi in $\{1,-1}$ are in both the sphere and the sides of the cube, and those are the minimal distance points from the sides of the cube to the center of the circle (since they are orthogonal to the hyperplane xi=+1, or xi=-1 that contains it). That also shows that no side of the cube extends outside of the circle. Some more math: A well known result in math is that the minimum distance from the origin O to a linear subspace of R^n is attained at points P such that OP is orthogonal to the direction of the subspace. In this case, since the linear space is a hyperplane there is just one orthogonal direction to that hyperplane, and there is just one point in the intersection of the hyperplane and the line defined by the point O and the orthogonal direction to the hyperplane, and that intersection is just the points we alluded before.


It does. For an n-dimensional cube of size 2, all surface points are at least distance 1 away from the center of the cube (the surfaces are where one of the dimensions is exactly 1 from the center, and the remaining dimensions can only increase that distance), and so a unit sphere around the center point is always contained within the cube.


What does volume mean in higher dimensions? 2-D has no volume. 3D has a volume. What's volume in 4D and higher?

Edit: Just to further the thought. Do all dimensions have the same unit length? In 3D, the spacial x,y,z dimensions have the same unit length. In 4D like 3D+Time, does the t-dimension have the same unit length as the spacial dimensions? What is 1-unit of t meant vs 1-unit of x? What does it mean to say a sphere in 4D with radius r having the same "distance" from the center in all 4 dimensions? What unit length is r in?


A hyper cube with sidelength 1 has a 'volume' of 1. Then you sorta just define 'volume' (or the proper term 'hypervolume' based on how many hyper cubes you can fit. If the fit isn't perfect, half the size of your cubes, and try again. Whatever limit this system approaches is the true volume. (Edge cases are fractals, and they are already complicated in 3 dimensions)

Edit: each dimension has the same unit length, by definition of that dimension. At least, in 'flat' space. Distance is just done by Pythagoras. This is actually meaningful because Distance is unique in being preserved by rotations. And rotations are special because they form a subgroup of all linear transformations.

Hence the choice to use Pythagoras for Distance is not arbitrary, it is very natural. Because rotations are very natural.


> Do all dimensions have the same unit length?

The units of length are whichever you choose them.

You are free to choose different units of length for each of the dimensions of a multi-dimensional space, but that does not bring any benefit and it introduces many constant factors corresponding to the ratios between the different units of length in all mathematical formulae.

Therefore everybody chooses to have the same unit of length for all dimensions, in order to have the simplest formulae.

Of course, this is true for the multidimensional spaces in which a scalar product of the vectors is defined. When there is no scalar product, the units of length for the different dimensions are independent. There is no way to compare them and the question whether they are the same or different is meaningless. In such spaces, angles, areas and volumes are also undefined, only the affine geometry is applicable.


You need to distinguish between physics and mathematics. The explainer is about math. Here, 2D absolutely has volume (i.e. area). Also, higher dimensionality spaces are the same: each dimension uses the same units, therefore r is always the same. Thinking in "time" dimension doesn't help here imho.


> What does volume mean in higher dimensions?

I think you need a scalar product to define volume. But fuzzy thinking an analogy: Think friendship, define three concepts related to friendship and that they are orthogonal, now establish some numeric scale to measure each concept, then you have just defined a volume to measure the degree of friendship. Unfortunately there is no canonical way to establish those orthogonal features, but they could be obtained applying a linear model to big data sets of measures of friendship.


2D volume is area.


> The volume of the unit -sphere goes to 0 as grows!

But, that's equivocating on the word volume. Consider that 2D "volume" is just area. The area of a unit circle is pi, and the volume of a unit sphere is 3/4 pi. So the latter is numerically smaller; but it's volume and not area! Cubic meters are vastly bigger than square meters. Each higher dimensional volume is vastly bigger than the previous.


It is just an artifact of the Euclidean distance. If one tries a different metric, the story would be different. It should not be that surprising.


If you want to work in a Hilbert space (i.e. have angles) then you are stuck with the euclidean metric up to scaling of the axes.


Indeed … by definition … artifact of the choice then :-)


Do you know of any other metrics that result in more intuitive properties?


Consider the metric where the distance between <a1, a2, …> and <b1, b2, …> is

max(|a1-b1|, |a2-b2|, …)

In the this metric, no point of a unit n-cube is more than one unit from any other point.


I don't think L Infinity norm is an intuitive distance metric at all.


I think here is something that helps me understand intuitively how volume goes to zero in an n-sphere

I think the counterintuitive part for me was I kinda misunderstood what volume really means

So lets think instead in terms of how hyperdimensional volume relates to a basic concept of the "amount of stuff" contained within an n-dimensional object. The "amount of stuff" in a cube increases as you go from a 2d hypercube (i.e. a square), then to a 3d hypercube, and so on to higher dimensions, even though measurement of the volume (volume for 3d, area for 2d, etc...) stays the same

Another way to think about this "amount of stuff" is to imagine some kind of infinite dimensional "fluid" with extremely small particles that covers a square with one layer of particles. Thats going to have X**2 number of particles right? Where X is the approximate number of particles along each dimension. Now if it covers a cube, that will then have X**3 number of particles. So the number of particles of that fluid thats covering an n-cube increases exponentially as you increase dimensions of the n-cube

Similar thing happens with the number of particles blanketing first a circle, then a sphere, then to higher dimensions, the number of particles increases, but not as fast as the unit volume. I think this can be more easily seen where the diameter is 1, rather than the radius being 1, and comparing that to the unit cube

So the volume is really kinda like the ratio between the number of those hyper particles in the object vs the number of those hyper particles in the sphere

And this ratio converges to zero at higher dimensions, even though the amount of stuff increases (Perhaps to infinity as well? I suspect so but not entirely sure on that one. That would take a mathematical proof and I'm not quite cut out for that right now)


Not sure I understand what you’re saying. It sounds like the limit of the ratio of the surface area to volume is 0. But how is that the same thing as volume? Eg you don’t use this definition of volume for a 3-sphere.


So by surface area of the square, thats the "volume" of a 2d hypercube

Then the "volume" of a 3d hypercube is the same, but has "more stuff" in it

i.e. not talking about ratio of surface area to volume in an object, but that volume stays the same but amount of stuff increases as dimensions increases


For me the n-sphere problem makes intuitive sense if you think of it as a subset of the volume of a unit n-cube:

No matter what n is, a unit n-cube has volume 1^n = 1. So the problem is equivalent to: what proportion of that 1 is inside the n-sphere? And then think about how the difference between a unit cube with a unit cylinder vs. a unit sphere in it: the cylinder takes the same proportion as a circle from a square; the sphere takes a subset of that because instead of just pushing the circle through the cube, you rotate it around a point.

A rigorous proof would show that the series in fact trends to zero, and not just monotonically smaller values, but for intuitive purposes, that's good enough for me.


I'm pretty good with math but this article is making absolutely no sense at all to me. I'd like to focus mainly on just the volume-of-a-sphere and corner-of-cubes aspects if someone here can clarify?

Because the idea is that the "volume of a d-sphere goes to 0 as d grows". The chart as provided seems nonsensical -- because the unit of volume is different in every dimension. What does it even mean to say to compare the area of a circle with the volume of a sphere? Because they don't have the same units.

The best I can guess is that it's attempting to plot the volume of a d-sphere of radius r as a proportion of the volume of a d-cube it fits inside of with edge length 2r, so it is now unitless. Is that correct?

So in that case, a circle with radius 1 occupies 79% of its square, and a sphere occupies 52% of its cube, and a 3-sphere occupies 31% of its hypercube. All of which is fairly intuitive, that the sphere takes up proportionally less space of a hypercube the more dimensions you add. This is what you'd expect, because we already see it from 2 to 3 dimensions. And the intuition here is easy, because adding each dimension is an "extrude-and-slice" operation -- extrude the current shape along the new dimension (which preserves proportion), then slice away with a freely spinning circle-cutter (which decreases proportion).

But for some reason the chart shows sphere volume increasing up to 6 dimensions before it decreases, which I don't understand at all.

And then it attempts to show this "graphically" by drawing a kind of concave rhombus figure, which is even more mystifying, because every point on a hyphersphere is still the same distance from the center. There are no "corners"!

And that figure is next to a drawing of a hypercube that looks more like a star with the explanation "Notice that the corners of the cube are much further away from the center than are the sides." But this is also given zero explanation whatsoever, and a hypercube is the easiest thing in the world to visualize in higher dimensions because a unit hypercube simply stretches from 0 to 1 in every dimension.

I fail to see how the corners are "much farther away". According to what measure? Because the distance from the center of a unit square to its corner is .707 units. The distance from the center of a unit cube to its corner is .866. The distance in 20 dimensions is 2.236, and in 100 dimensions is 5.0. So yeah the corners get a little bit farther away -- exactly as you'd expect intuitively -- but the increase is also extremely slow, as you'd expect. And a distance of 5.0 in a hundred dimensions doesn't really seem "much" farther. It's a mere factor of 7 after adding a whole 98 additional dimensions. If anything, this shows that corners always stay "close by" no matter how many dimensions you add.

So I'm utterly failing to see what's supposedly so counterintuitive about high dimensional spaces at all here. This article baffles me completely.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: