
Making big data manageable - seycombi
https://news.mit.edu/2016/making-big-data-manageable-1214
======
unix-junkie
Here is the paper: [https://papers.nips.cc/paper/6596-dimensionality-reduction-o...](https://papers.nips.cc/paper/6596-dimensionality-reduction-of-massive-sparse-datasets-using-coresets.pdf)

------
fnord123
This is neat. I bet if you used a hypercircle technique, you could project
Wikipedia articles into a GiST, BallTree, or similar n-dimensional tree
structure. Search could then be done by projecting a query into the hypercircle
and returning the results close by.
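A minimal sketch of the "project, then search a tree" idea, assuming the articles have already been reduced to short numeric vectors. It uses a plain k-d tree rather than a GiST or BallTree, and the names `build`, `nearest`, and the toy article vectors are hypothetical, for illustration only.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    """One k-d tree node: a stored point, its label, and the axis it splits on."""

    def __init__(self, point: Point, label: str, axis: int,
                 left: Optional["Node"], right: Optional["Node"]):
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right


def build(items: List[Tuple[Point, str]], depth: int = 0) -> Optional[Node]:
    """Recursively split on the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Tuple[float, str]] = None) -> Optional[Tuple[float, str]]:
    """Return (distance, label) of the stored point closest to `query`."""
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Cross the splitting plane only if it is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical 2-D "article vectors" after dimensionality reduction.
docs = [((0.9, 0.1), "Astronomy"), ((0.1, 0.8), "Cooking"),
        ((0.85, 0.2), "Physics"), ((0.2, 0.9), "Baking")]
tree = build(docs)
print(nearest(tree, (0.88, 0.15))[1])  # Astronomy
```

The pruning test (`abs(diff) < best[0]`) is what makes tree search fast in low dimensions; in high dimensions it rarely fires, so the search degrades toward a linear scan.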

~~~
stdbrouw
You would have a representative dataset, but you'd still be stuck with the
curse of dimensionality that plagues nearest-neighbor search: in high
dimensions, everything is far away from everything else and nothing is close
to the center.
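To make that concrete, here is a small experiment, assuming points drawn uniformly from the unit cube (an illustrative setup, not anything from the paper; the function names are hypothetical). It measures how much farther the farthest point is than the nearest one, and what share of points falls inside the ball inscribed in the cube.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """(farthest - nearest) / nearest distance from a random query to uniform points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


def fraction_near_center(dim: int, n_points: int = 1000, seed: int = 0) -> float:
    """Share of uniform points in [0, 1]^dim inside the radius-0.5 ball at the cube's center."""
    rng = random.Random(seed)
    center = [0.5] * dim
    inside = sum(math.dist(center, [rng.random() for _ in range(dim)]) <= 0.5
                 for _ in range(n_points))
    return inside / n_points


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}  contrast={relative_contrast(dim):8.3f}  "
          f"near_center={fraction_near_center(dim):.3f}")
```

Expect the contrast to shrink sharply as `dim` grows and the near-center share to collapse toward zero, which is why "returning results close by" stops meaning much.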

------
collyw
This sounds like something that could be done with a materialized view, or am
I missing something?

~~~
zbjornson
The point of this algorithm is sort of to construct a materialized view, but
how it's done is the cool part. It's not a simple database query: it reduces
the number of dimensions to a manageable number.
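For a feel of what "reducing the number of dimensions" means, here is a sketch of a plain truncated SVD computed by subspace iteration. It does not implement the paper's coreset construction; it only shows each row being replaced by k coordinates. The helpers `gram_schmidt`, `top_k_directions`, `project` and the toy rows are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def gram_schmidt(vectors: Matrix) -> Matrix:
    """Orthonormalize vectors in order."""
    ortho: Matrix = []
    for v in vectors:
        w = list(v)
        for u in ortho:
            proj = sum(a * b for a, b in zip(w, u))
            w = [a - proj * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm < 1e-12:
            raise ValueError("k exceeds the numerical rank of the data")
        ortho.append([a / norm for a in w])
    return ortho


def top_k_directions(rows: Matrix, k: int, iters: int = 100, seed: int = 0) -> Matrix:
    """Approximate the top-k right singular vectors of `rows` by subspace iteration on A^T A."""
    rng = random.Random(seed)
    dim = len(rows[0])
    basis = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)])
    for _ in range(iters):
        nxt = []
        for v in basis:
            av = [sum(a * b for a, b in zip(row, v)) for row in rows]      # A v
            nxt.append([sum(av[i] * rows[i][j] for i in range(len(rows)))  # A^T (A v)
                        for j in range(dim)])
        basis = gram_schmidt(nxt)
    return basis


def project(rows: Matrix, basis: Matrix) -> Matrix:
    """Replace each row by its k coordinates in the reduced basis."""
    return [[sum(a * b for a, b in zip(row, u)) for u in basis] for row in rows]


# Toy 4-D rows that actually live in the 2-D span of (1,0,1,0) and (0,1,0,1).
rows = [[1, 0, 1, 0], [0, 1, 0, 1], [2, 3, 2, 3], [-1, 2, -1, 2]]
reduced = project(rows, top_k_directions(rows, k=2))
print(reduced)  # four 2-D rows; pairwise distances match the 4-D originals
```

On a matrix the size of Wikipedia, this dense loop over every entry is the expensive part; the paper's pitch, going by its title, is shrinking the input with a coreset before a step like this.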

~~~
jortiz81
Indeed, the core contribution (section 1.2) is the cool part.

------
MeteorMarc
But then it is not big data anymore!

------
afian
I know these people! They are amazing :)

