"Data Science Machine" finds patterns more effectively than most humans

querious · on Oct 21, 2015

Original paper: https://groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uplo...

Basically, here is how the "Data Science Machine" works:

1) "Feature synthesis": features related to the target are discovered by following foreign-key relationships in a relational database, automatically generating "deep" queries that involve many joins. Aggregates like MIN, MAX, AVG, and STD are calculated automatically and used as additional features.

2) "Dimensionality reduction": truncated SVD reduces the number of features.

3) "Modeling": the remaining features are clustered and then modeled with Random Forests whose hyperparameters are learned.
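To make step 1 concrete, here is a toy sketch of aggregate feature synthesis over foreign-key relationships. It is not the paper's implementation: the customers/orders/items schema, the sample rows, and the `children` and `synthesize` helpers are all hypothetical. It computes a "deep" feature by aggregating through two relationships: first SUM item prices per order, then take MIN/MAX/AVG/STD of those order totals per customer.

```python
from statistics import mean, pstdev

# Hypothetical schema: customers <- orders <- items,
# where orders.customer_id and items.order_id are foreign keys.
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 1},
    {"id": 12, "customer_id": 2},
]
items = [
    {"order_id": 10, "price": 5.0},
    {"order_id": 10, "price": 15.0},
    {"order_id": 11, "price": 40.0},
    {"order_id": 12, "price": 8.0},
]

# Aggregate functions applied to the depth-1 feature.
AGGREGATES = {
    "MIN": min,
    "MAX": max,
    "AVG": mean,
    "STD": pstdev,  # population standard deviation
}


def children(rows, fk, parent_id):
    """Return rows whose foreign key `fk` points at `parent_id` (a join)."""
    return [r for r in rows if r[fk] == parent_id]


def synthesize(customer_id):
    """Build a feature dict for one customer by following foreign keys."""
    features = {}
    # Depth 1: per-order totals, i.e. SUM(items.price) joined on order_id.
    order_totals = [
        sum(i["price"] for i in children(items, "order_id", o["id"]))
        for o in children(orders, "customer_id", customer_id)
    ]
    features["COUNT(orders)"] = len(order_totals)
    # Depth 2: aggregate the depth-1 feature across the customer's orders.
    for name, fn in AGGREGATES.items():
        key = f"{name}(orders.SUM(items.price))"
        features[key] = fn(order_totals) if order_totals else None
    return features


if __name__ == "__main__":
    for c in customers:
        print(c["id"], synthesize(c["id"]))
```

Each nested aggregate corresponds to a chain of joins plus GROUP BY in SQL. Stacking more relationships and more aggregate functions is what makes the number of candidate features grow quickly, which is presumably why a dimensionality-reduction step follows.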

There are a ton more optimizations, but that's the gist.

The machine can't be evaluated as a stand-alone product, though, because the researchers also mention manually generating features for the problems they tested, without reporting the effect of their own tampering.

They do conclude that the system is more of a time-saving device than the general pattern-recognition system our journalists seem to think it is.

Edited: formatting

BenoitP · on Oct 18, 2015

Is this related to BayesDB (also from MIT's CSAIL)?

http://probcomp.csail.mit.edu/bayesdb/

dang · on Oct 18, 2015

Url changed from http://phys.org/news/2015-10-human-intuition-algorithms-outp..., which copies this.

degenerate · on Oct 18, 2015

What a crap article. No examples of "predictive patterns in unfamiliar data sets" are given, no explanation of the competition, nothing of value. For all we know the humans could be toddlers.