
Data diversity: Preserving variety in data sets should aid machine learning - upen
http://news.mit.edu/2016/variety-subsets-large-data-sets-machine-learning-1216
======
rokosbasilisk
I believe they are using MCMC (Markov chain Monte Carlo) at the core. This
might be useful if you are wondering what it is:
[http://mlwhiz.com/blog/2015/08/19/MCMC_Algorithms_Beta_Distr...](http://mlwhiz.com/blog/2015/08/19/MCMC_Algorithms_Beta_Distribution/)
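
For anyone who wants the gist without clicking through, here is a minimal sketch of random-walk Metropolis sampling from a Beta distribution. This is my own illustration, not code from the linked post or from the MIT paper. The function names, the step size, the burn-in length, and the starting point are all assumptions chosen for the example.

```python
import random


def beta_unnormalized(x, a, b):
    """Unnormalized Beta(a, b) density; zero outside the open interval (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return x ** (a - 1) * (1.0 - x) ** (b - 1)


def metropolis_beta(a, b, n_samples, step=0.2, burn_in=1000, seed=0):
    """Random-walk Metropolis sampler targeting Beta(a, b).

    step, burn_in, and seed are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    x = 0.5  # arbitrary start inside (0, 1), so the density is nonzero
    samples = []
    for i in range(n_samples + burn_in):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step)
        # Acceptance ratio needs only the unnormalized density,
        # because the normalizing constant cancels.
        ratio = beta_unnormalized(proposal, a, b) / beta_unnormalized(x, a, b)
        if rng.random() < ratio:
            x = proposal  # accept; otherwise stay at the current state
        if i >= burn_in:
            samples.append(x)
    return samples


if __name__ == "__main__":
    draws = metropolis_beta(2.0, 5.0, 20000)
    print("sample mean:", sum(draws) / len(draws))  # exact mean is a / (a + b)
```

The point is that you never need the normalizing constant: the chain only compares the density at the proposed state against the current one, and proposals outside (0, 1) are always rejected because their density is zero.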

------
q_revert
I think this is the paper, which oddly isn't linked in the article:

[https://arxiv.org/abs/1509.01618](https://arxiv.org/abs/1509.01618)

~~~
sidrajaram
That seems to be a precursor to the work mentioned in the article. This is the
one that was presented at NIPS this year:
[https://papers.nips.cc/paper/6182-fast-mixing-markov-chains-...](https://papers.nips.cc/paper/6182-fast-mixing-markov-chains-for-strongly-rayleigh-measures-dpps-and-constrained-sampling)

------
opaqe
Is there a more detailed paper describing the algorithm? The description is
very vague in the article. When they pick the two points, is there an
evaluation on how much "diversity" increases w/r/t each of the three possible
operations, and that's how they choose?

edit: thanks @q_revert for linking the paper

