
>This kind of pattern matching is also fairly evidently not all that difficult, since much simpler brains than ours can manage it, as can ML models with caveats (albeit caveats often misunderstood and exaggerated).

Do tell me, since I'm writing a paper on a related topic, which current ML models can "pattern match" to recognize or generate multimodal (i.e., visual, auditory, and tactile) percepts of cats, in arbitrary poses, in any context where cats are usually/realistically found?

Or did you just mean that the "cat" subset of Imagenet is as "solved" as the rest of Imagenet?




Please try to argue in good faith. I've already said ML models have caveats; obviously I don't think they're perfect or par-human.


I think that "perfect" or "par-human" would be a judgement about performance on a set computational task. My caveat is that ML models are usually performing a vastly simplified task compared to what the brain does. But it looked like you were saying they perform "pattern matching" with the same task setting and cost function as the brain, and just need to perform better at it. What's your view?


“Not all that difficult” is in the context of the brain, where things tend to vary between ‘pretty difficult’ and ‘seemingly impossible’. I say ML shows pattern matching of this sort isn't all that difficult because progress has been significant over very short stretches of time, without any particular need to solve hard problems, and with a general approach that looks like it will extend into the future.

We have this famous image showing progress over the last 5 years.

https://pbs.twimg.com/media/Dw6ZIOlX4AMKL9J?format=jpg&name=...

The latest generator in this list has very powerful latent spaces, including approximately accurate 3D rotations.

https://youtu.be/kSLJriaOumA?t=333

We have similarly impressive image segmentation and pose estimation results.

https://paperswithcode.com/paper/deep-high-resolution-repres...

Because you mentioned it, note that models that use multimodal perception are possible. The following uses audio with video.

https://ai.googleblog.com/2018/04/looking-to-listen-audio-vi...

For sure, these are not showing off the full breadth of versatility that humans have. I can still reliably distinguish StyleGAN faces from real faces, and segmentation still has issues. These all have fairly prominent failure cases, can't refine their estimates with further analysis like humans can, and humans still learn much, much faster than these models.

However, note that (for example) StyleGAN has 26 million parameters, and with my standard approximate comparison of 1 bit:1 synapse, that puts it probably somewhere around the size of a honey bee brain. Such a model already captures sophisticated structure fairly reliably, using sophisticated variants of old techniques and without needing a complete rethink. The same cannot be said for (e.g.) high-level reasoning, where older strategies (e.g. frames) are pretty much completely discredited. Given that, “not all that difficult” seems like a pretty defensible stance.
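
The arithmetic behind that size comparison can be sketched roughly as below. Only the 26 million parameter figure comes from the comment; the 32 bits per parameter (float32 weights) and the order-of-magnitude figure of about a billion synapses for a honey bee brain are assumptions made for illustration, so treat the result as a ballpark and nothing more.

```python
# Back-of-envelope check of the "1 bit : 1 synapse" comparison.
STYLEGAN_PARAMS = 26_000_000         # figure quoted in the comment
BITS_PER_PARAM = 32                  # assumption: weights stored as float32
HONEY_BEE_SYNAPSES = 1_000_000_000   # assumption: rough order-of-magnitude estimate


def synapse_equivalents(params: int, bits_per_param: int = BITS_PER_PARAM) -> int:
    """Convert a parameter count to 'synapse equivalents' at 1 bit per synapse."""
    return params * bits_per_param


equiv = synapse_equivalents(STYLEGAN_PARAMS)
ratio = equiv / HONEY_BEE_SYNAPSES
print(f"{equiv:,} bits, about {ratio:.2f}x the assumed honey bee synapse count")
```

Under those assumptions the two numbers land within the same order of magnitude, which is all the comparison is meant to claim; a different weight precision or synapse estimate would shift the ratio by a small factor.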



