"Data Science Machine" finds patterns more effectively than most humans (csail.mit.edu)
24 points by dmckeon on Oct 18, 2015 | 4 comments



Original paper: https://groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uplo...

Basically, the "Data Science Machine" works in three stages:

1) "Feature synthesis": features related to the target are discovered by following foreign-key relationships in a relational database, automatically generating "deep" queries involving many joins. Aggregates like MIN, MAX, AVG, and STD are calculated automatically and used as additional features.

2) "Dimensionality reduction": truncated SVD reduces the number of feature dimensions.

3) "Modeling": the remaining features are clustered and then modeled with random forests of decision trees, using learned hyperparameters.
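To make step 1 concrete, here is a minimal sketch of aggregate feature synthesis across a single foreign-key relationship, using only the Python standard library. It is not the paper's implementation: the `customers` and `orders` tables, their column names, the `synthesize_features` helper, and the use of population standard deviation for STD are all assumptions made for illustration. Steps 2 and 3 are not covered.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical parent table: one row per customer (the entity we predict for).
customers = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "DE"},
]

# Hypothetical child table: each order points at a customer via a foreign key.
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]

# The aggregates named in the comment above. Using the population standard
# deviation for STD is an assumption; it yields 0.0 for a single child row.
AGGREGATES = {
    "MIN": min,
    "MAX": max,
    "AVG": mean,
    "STD": pstdev,
}


def synthesize_features(parents, children, key, column, prefix):
    """Return copies of the parent rows, each extended with one aggregate
    feature per entry in AGGREGATES, computed over its child rows."""
    # Group the child column's values by foreign key.
    grouped = defaultdict(list)
    for child in children:
        grouped[child[key]].append(child[column])

    rows = []
    for parent in parents:
        row = dict(parent)
        values = grouped.get(parent[key], [])
        for name, func in AGGREGATES.items():
            # Parents with no child rows get None rather than a made-up value.
            row[f"{name}({prefix}.{column})"] = func(values) if values else None
        rows.append(row)
    return rows


features = synthesize_features(customers, orders, "customer_id", "amount", "orders")
for row in features:
    print(row)
```

A "deep" feature in the sense described above would repeat this across several hops, for example aggregating order lines up to orders and then aggregating those results up to customers.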

There are a ton more optimizations, but that's the gist.

The machine can't be evaluated as a stand-alone product, though, because the researchers also mention manually generating features on the problems they tested, without reporting the effect of their own tampering.

They do conclude that the system is more of a time-saving device than the general pattern-recognition machine that journalists seem to think it is.

Edited: formatting


Is this related to BayesDB (also from MIT's CSAIL)?

http://probcomp.csail.mit.edu/bayesdb/



What a crap article. No examples of "predictive patterns in unfamiliar data sets" are given, no explanation of the competition, nothing of value. For all we know the humans could be toddlers.



