
HoloClean: Weakly Supervised Data Repairing - polm23
https://holoclean.github.io/gh-pages/blog/holoclean.html
======
yboris
Very related: _Open Refine_

[http://openrefine.org/](http://openrefine.org/)

> is a powerful tool for working with messy data: cleaning it; transforming it
> from one format into another; and extending it with web services and
> external data.

~~~
massaman_yams
Thanks for sharing! This looks much more practical than HoloClean.

------
y04nn
There is also ExceLint [1] for fixing bad data in (Excel) spreadsheets. I
have never tried it.

[1]
[https://github.com/ExceLint/ExceLint](https://github.com/ExceLint/ExceLint)

------
dx034
The code is on GitHub[1] for anyone interested.

[1]
[https://github.com/HoloClean/HoloClean](https://github.com/HoloClean/HoloClean)

------
hooande
This looks useful.

I wonder if the concept could be used to make predictions about data instead
of just preparing it, i.e., to interpolate likely missing values as in
collaborative filtering?

There are probably several datasets that could be explained with this kind of
scalable, weak semi-supervision.

------
arcboii92
This looks really cool, but I must admit: when I saw the name, I hoped it was
a HoloLens AR gamification of tidying up my house that gives me points for
each dish washed or item of clothing folded. But hey, data is cool too.

------
polskibus
HoloClean was presented at this year's SIGMOD conference.

