
CrossCat: a domain-general, Bayesian method for analyzing high-dimensional data - Jach
http://probcomp.csail.mit.edu/crosscat/
======
zbjornson
Paper describing the software, including some example figures:
[http://arxiv.org/abs/1512.01272](http://arxiv.org/abs/1512.01272)

------
MrQuincle
There are not many nonparametric Bayesian alternatives for multiview
clustering yet, so this software is a really nice contribution!

Alternatives:

* The group of Ghahramani [1] uses a stick-breaking representation, unlike the authors of CrossCat, and a variational method rather than sampling.

* It's even possible to reuse features across categories by stacking an Indian Buffet Process and a Chinese Restaurant Process [2].
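
To make the two representations mentioned in the list above a bit more concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from CrossCat or from [1] or [2]; the function names `stick_breaking_weights` and `crp_assignments`, the truncation level, and the concentration parameter `alpha` are all assumptions chosen for the example. The first function draws truncated stick-breaking weights, the second samples a partition sequentially in the Chinese Restaurant Process style.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking: repeatedly break off a Beta(1, alpha)
    fraction of whatever is left of a unit-length stick."""
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # `remaining` is the mass of all components beyond the truncation.
    return weights, remaining


def crp_assignments(n_items, alpha, rng):
    """Chinese Restaurant Process: each item joins an existing table with
    probability proportional to its size, or opens a new table with
    probability proportional to alpha."""
    counts = []       # number of items at each table
    assignments = []  # table index chosen by each item
    for _ in range(n_items):
        total = sum(counts) + alpha
        r = rng.random() * total
        table = len(counts)  # default: open a new table
        cumulative = 0.0
        for k, count in enumerate(counts):
            cumulative += count
            if r < cumulative:
                table = k
                break
        if table == len(counts):
            counts.append(0)
        counts[table] += 1
        assignments.append(table)
    return assignments, counts


rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=1.0, truncation=10, rng=rng)
assignments, counts = crp_assignments(n_items=50, alpha=1.0, rng=rng)
```

Both describe the same kind of prior over partitions. The stick-breaking form gives explicit mixture weights, which is convenient for variational methods, while the sequential form is the one that samplers usually work with.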

The difficulty for the person who has to cluster the data is that the
inference method has to be derived from scratch. Each time something
small changes, not only does the model have to be altered, but a new
inference procedure also has to be implemented.

So, suppose we have dependencies between the categories, whether spatial
or temporal: this again leads to more complex processes.

I think it's important for CrossCat to describe what kind of structure it can
discover. To be future-proof it's also important to describe what the roadmap
will be.

[1] [http://www.ece.neu.edu/fac-ece/jdy/papers/nonparametric-mult...](http://www.ece.neu.edu/fac-ece/jdy/papers/nonparametric-multiclust-kdd10.pdf)

[2] [http://machinelearning.wustl.edu/mlpapers/paper_files/AISTAT...](http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2012_NiuDG12.pdf)

------
lqdc13
Looks like a good lib, but only Python2 with lots of C++ hooks =/

It would probably be fairly time consuming to port this to Python3.

However, one shouldn't look a gift horse in the mouth.

------
urvader
Sounds interesting! Are there any examples of the output/analysis somewhere?

