
Even if it were practically useless (which it is not, although the practical applications are less impressive than the research achievements at this point), it would be magical. Deep learning has dominated ImageNet for a decade now, for example. One reason this is magical is that the SOTA models are extremely overparametrized. There exist weights that fit the training data perfectly but give random answers on the test data [0]. Yet in practice, SGD does not find these degenerate weights. What's going on there? As far as I know, there is no satisfying explanation.

[0] https://arxiv.org/abs/1611.03530
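
A minimal sketch of the random-label experiment that paper is getting at, assuming PyTorch; the architecture, dataset sizes, and hyperparameters here are illustrative choices, not the paper's. An overparametrized MLP memorizes labels that carry no signal, and accuracy on fresh randomly labeled points stays at chance:

  import torch
  import torch.nn as nn

  torch.manual_seed(0)

  n_train, n_test, dim, classes = 512, 512, 64, 10
  X_train = torch.randn(n_train, dim)
  y_train = torch.randint(0, classes, (n_train,))   # labels carry no signal
  X_test = torch.randn(n_test, dim)
  y_test = torch.randint(0, classes, (n_test,))

  # Far more parameters than training points -> the net can memorize anything.
  model = nn.Sequential(
      nn.Linear(dim, 2048), nn.ReLU(),
      nn.Linear(2048, 2048), nn.ReLU(),
      nn.Linear(2048, classes),
  )
  opt = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
  loss_fn = nn.CrossEntropyLoss()

  for step in range(2000):
      opt.zero_grad()
      loss_fn(model(X_train), y_train).backward()
      opt.step()

  with torch.no_grad():
      train_acc = (model(X_train).argmax(1) == y_train).float().mean()
      test_acc = (model(X_test).argmax(1) == y_test).float().mean()

  print(f"train acc: {train_acc:.2f}  (memorized)")
  print(f"test  acc: {test_acc:.2f}  (~1/{classes}, i.e. chance)")

The puzzle in [0] is that the same capacity, trained on real labels, generalizes anyway.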




If you look at these “degenerate” parameterizations, they’re clearly islands in the sea of weight space. What you’re searching for is not a “minimum” per se but an amorphous, fuzzy, blobby manifold. Think of it like sculpting a specific 3D shape out of clay: there are exact moves that produce the shape, but even if you’re just gently forming the clay you can get very close to the final form, with only some rough edges left.
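
A rough, self-contained sketch of that "blobby basin" intuition, assuming PyTorch; the synthetic task, net, and noise scales are all illustrative. Train a small net with SGD, then jiggle its weights with Gaussian noise of increasing scale and watch the training loss:

  import copy
  import torch
  import torch.nn as nn

  torch.manual_seed(0)

  # Easy synthetic task: classify points by the sign of their first coordinate.
  X = torch.randn(1000, 16)
  y = (X[:, 0] > 0).long()

  model = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 2))
  opt = torch.optim.SGD(model.parameters(), lr=0.1)
  loss_fn = nn.CrossEntropyLoss()

  for _ in range(500):
      opt.zero_grad()
      loss_fn(model(X), y).backward()
      opt.step()

  def perturbed_loss(scale: float) -> float:
      """Loss after adding N(0, scale^2) noise to every weight."""
      noisy = copy.deepcopy(model)
      with torch.no_grad():
          for p in noisy.parameters():
              p.add_(torch.randn_like(p) * scale)
          return loss_fn(noisy(X), y).item()

  for scale in (0.0, 0.01, 0.05, 0.1, 0.5):
      print(f"noise scale {scale:>4}: loss {perturbed_loss(scale):.4f}")

Typically the loss barely moves for small scales: the solution SGD lands on sits in a wide, flat region rather than on a knife-edge point minimum.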

As for a formal analysis, I just can’t imagine one that captures the distinctly qualitative aspects of ML. It’s like coming up with physics equations to explain art.



