
"Data Science Machine" finds patterns more effectively than most humans - dmckeon
http://www.csail.mit.edu/data_science_machine
======
querious
Original paper: [https://groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uplo...](https://groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/DSAA_DSM_2015.pdf)

Basically how the "Data Science Machine" works: 1) "Feature synthesis":
features related to the target are discovered by following foreign-key
relationships in a relational database, automatically generating "deep"
queries involving many joins. Aggregates like MIN, MAX, AVG, and STD are
automatically calculated to be used as additional features. 2) "Dimensionality
reduction": truncated SVD reduces the number of feature dimensions. 3)
"Modeling": the remaining features are clustered and then modeled by random
forests with learned hyperparameters.
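
To make step 1 concrete, here is a rough sketch of the aggregation idea in plain Python. It is not the paper's code: the `customers`/`orders` tables, the column names, and the `synthesize_features` helper are all made up for illustration, and I'm assuming population standard deviation for STD since the summary above doesn't say which variant is used. It only follows a single foreign key one level deep; the real system chains many joins.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy tables: orders.customer_id is a foreign key to customers.id.
customers = [
    {"id": 1, "churned": 0},
    {"id": 2, "churned": 1},
]
orders = [
    {"id": 10, "customer_id": 1, "amount": 20.0},
    {"id": 11, "customer_id": 1, "amount": 40.0},
    {"id": 12, "customer_id": 2, "amount": 5.0},
]

# Aggregates named in the comment above; pstdev (population std) is an assumption.
AGGREGATES = {"MIN": min, "MAX": max, "AVG": mean, "STD": pstdev}


def synthesize_features(parents, children, fk, column):
    """Aggregate one child column up to each parent row via the foreign key."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[fk]].append(row[column])

    result = []
    for parent in parents:
        values = grouped.get(parent["id"], [])
        feats = dict(parent)  # keep the parent's own columns
        for name, fn in AGGREGATES.items():
            # Parents with no child rows get None rather than a made-up value.
            feats[f"{name}({column})"] = fn(values) if values else None
        result.append(feats)
    return result


features = synthesize_features(customers, orders, "customer_id", "amount")
for row in features:
    print(row)
```

Each customer row comes back with four extra columns such as `AVG(amount)`. Steps 2 and 3 (truncated SVD, then a random forest) would take a matrix built from rows like these as input.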

There are a ton more optimizations, but that's the gist.

The machine can't be evaluated as a stand-alone product, though, because the
researchers also mention manually generating features on the problems they
tested, without reporting the effect of their own tampering.

They do conclude the system is more of a time-saving device than the general
pattern-recognition one that our journalists seem to think it is.

Edited: formatting

------
BenoitP
Is this related to BayesDB (also from MIT's CSAIL)?

[http://probcomp.csail.mit.edu/bayesdb/](http://probcomp.csail.mit.edu/bayesdb/)

------
dang
URL changed from [http://phys.org/news/2015-10-human-intuition-algorithms-outp...](http://phys.org/news/2015-10-human-intuition-algorithms-outperforms-teams.html), which copies this.

------
degenerate
What a crap article. No examples of "predictive patterns in unfamiliar data
sets" are given, no explanation of the competition, nothing of value. For all
we know the humans could be toddlers.

