Basically how the "Data Science Machine" works:
1) "Feature synthesis": features related to the target are discovered by following foreign-key relationships in a relational database, which automatically generates "deep" queries involving many joins. Aggregates like MIN, MAX, AVG, and STD are computed automatically and used as additional features.
2) "Dimensionality Reduction": truncated SVD reduces the number of features.
3) "Modeling": the remaining features are clustered and then modeled by Random Forest decision trees with learned hyperparameters.
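To make step 1 concrete, here's a minimal sketch of what "following foreign keys and stacking aggregates" means. It assumes a toy in-memory schema (the `customers`/`orders`/`items` tables and the `aggregate` helper are hypothetical, not DSM's code or API): aggregate items per order, then aggregate those order-level features again per customer, which is where the "deep" features come from.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy schema: orders.customer_id -> customers.id,
# items.order_id -> orders.id (foreign keys).
customers = [{"id": 1}, {"id": 2}]
orders = [
    {"id": 10, "customer_id": 1},
    {"id": 11, "customer_id": 1},
    {"id": 12, "customer_id": 2},
]
items = [
    {"order_id": 10, "price": 5.0},
    {"order_id": 10, "price": 15.0},
    {"order_id": 11, "price": 8.0},
    {"order_id": 12, "price": 2.0},
    {"order_id": 12, "price": 4.0},
    {"order_id": 12, "price": 6.0},
]

# The aggregates named above; pstdev (population std) also works on a single value.
AGGREGATES = {"MIN": min, "MAX": max, "AVG": mean, "STD": pstdev}


def aggregate(child_rows, fk, columns, prefix):
    """Group child rows by foreign key `fk` and apply every aggregate to every column.

    Returns {parent_id: {"AGG(prefix.column)": value}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in child_rows:
        for col in columns:
            grouped[row[fk]][col].append(row[col])
    return {
        parent: {
            f"{agg}({prefix}.{col})": fn(values)
            for col, values in cols.items()
            for agg, fn in AGGREGATES.items()
        }
        for parent, cols in grouped.items()
    }


# Depth 1 (one join, items -> orders): e.g. MAX(items.price) per order.
order_feats = aggregate(items, "order_id", ["price"], "items")

# Attach the order-level features to each order row so they can be aggregated again.
enriched_orders = [{**o, **order_feats[o["id"]]} for o in orders]
depth1_columns = list(next(iter(order_feats.values())))

# Depth 2 (two joins, items -> orders -> customers): aggregating the depth-1
# features yields "deep" features such as AVG(orders.MAX(items.price)).
customer_feats = aggregate(enriched_orders, "customer_id", depth1_columns, "orders")

# One feature row per customer; customers with no orders get no features here.
features = {c["id"]: customer_feats.get(c["id"], {}) for c in customers}

print(features[1]["AVG(orders.MAX(items.price))"])  # 11.5
```

With 4 aggregates at each of two levels, a single `price` column already turns into 16 customer-level features, which is why a dimensionality-reduction step follows.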
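Step 2 can be sketched with plain NumPy, assuming the synthesized features have already been assembled into a numeric matrix. The `truncated_svd` helper below is hypothetical (it isn't whatever DSM actually calls); it just shows the idea: keep the top-k singular directions and project onto them, without mean-centering first, which is what separates truncated SVD from PCA.

```python
import numpy as np


def truncated_svd(X, k):
    """Project the rows of X onto its top-k right singular vectors (no centering)."""
    # Thin SVD: U is n x r, s has r values in descending order, Vt is r x d.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:k]            # k x d: the kept directions in feature space
    X_reduced = X @ components.T   # n x k: equals U[:, :k] * s[:k]
    # Share of the squared Frobenius norm of X captured by each kept component.
    energy = s[:k] ** 2 / np.sum(s ** 2)
    return X_reduced, components, energy


rng = np.random.default_rng(0)
# 100 rows of 20 "synthesized" features that really live near a 3-D subspace.
latent = rng.normal(size=(100, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 20))

X_reduced, components, energy = truncated_svd(X, k=3)
print(X_reduced.shape)  # (100, 3)
```

Many redundant aggregate features collapse onto a few directions, so the downstream model sees a much shorter feature vector.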
There are a ton more optimizations, but that's the gist.
The machine can't be evaluated as a stand-alone product, though, because the researchers also mention manually generating features for the problems they tested, without reporting the effect of their own tampering.
They do conclude the system is more of a time-saving device than the general pattern-recognition one that our journalists seem to think it is.
What a crap article. No examples of "predictive patterns in unfamiliar data sets" are given, no explanation of the competition, nothing of value. For all we know the humans could be toddlers.