I'd need to see the math for this: how will the Huffman codes preserve a semantic bijection with the original English while throwing out the spellings as noise? It seems like if you're throwing out information, rather than moving it into prior knowledge (bias-variance tradeoff, remember?), you shouldn't be able to biject your learned representation to the original input.
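To make the bijection point concrete: Huffman coding is lossless, so decoding recovers the input exactly, spellings and all. That injectivity is exactly why a Huffman code cannot also throw spelling away as noise. A minimal sketch, assuming character-level symbols (`build_huffman_code`, `encode`, and `decode` are hypothetical helpers written for illustration, not any library's API):

```python
import heapq
from collections import Counter

def build_huffman_code(text):
    """Return a dict mapping each symbol in `text` to a prefix-free bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tiebreaker, {symbol: code_so_far}); the integer
    # tiebreaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees: left branch gets 0, right gets 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

text = "the kings and the queens"
code = build_huffman_code(text)
bits = encode(text, code)
assert decode(bits, code) == text  # lossless: every spelling survives
print(len(text) * 8, "bits raw ->", len(bits), "bits coded")
```

The compression comes entirely from exploiting symbol frequencies, which is information about the input's statistics. Nothing is discarded, which is the only reason the round trip works.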
Also, spelling isn't all noise. It's also morphology, verb conjugation, etc.
>Humans have produced more than enough language for a sufficiently smart algorithm to construct a world model from it.
Then why haven't you done it?
>Any fact you can imagine is contained somewhere in the vast corpus of all English text.
Well, no. Almost any known fact I can imagine, plus vast reams of utter bullshit, can be reconstructed by coupling some body of text somewhere to some human brain in the world. When you start trying to take the human (especially the human's five exteroceptive senses and continuum of emotions and such) out of the picture, you're chucking out much of the available information.
There's damn well a reason children have to learn to speak, understand, read, and write, and then have to turn those abilities into useful compounded learning in school -- rather than just deducing the world from language.
>Even very crude models can learn these things. Even very crude models can produce nearly sensible dialogue from movie scripts.
Which doesn't do a damn thing to teach the models how to shave, how to tell kings from queens by sight, or how to avoid getting hit by a car when crossing the street.
>Models with millions of times fewer nodes than the human brain. It's amazing this is possible at all.
The number of nodes isn't the important thing in the first place! It's what they do that's actually important, and by that standard, today's neural nets are primitive as hell:
* Still utterly reliant on supervised learning and gradient descent.
* Still subject to vanishing-gradient problems when we try to make them deeper without imposing very tight regularizations or very informed priors (e.g., convolutional layers instead of fully-connected ones).
* Still can't reason about compositional, productive representations.
* Still can't represent causality or counterfactual reasoning well or at all.
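The vanishing-gradient point can be illustrated with a toy: backpropagating through a deep chain of sigmoid units multiplies in one derivative per layer, and the sigmoid's derivative never exceeds 0.25, so with unit weights the input gradient shrinks at least geometrically with depth. A minimal sketch, assuming a scalar chain with one shared weight (a deliberately simplified stand-in for a real network; `input_gradient` is a hypothetical helper):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def input_gradient(depth, weight=1.0, x=0.0):
    """d(output)/d(input) for the scalar chain h_{k+1} = sigmoid(weight * h_k)."""
    grad, h = 1.0, x
    for _ in range(depth):
        h = sigmoid(weight * h)  # h now holds s = sigmoid(weight * h_prev)
        # Chain rule: d sigmoid(w * h_prev) / d h_prev = w * s * (1 - s)
        grad *= weight * h * (1.0 - h)
    return grad

for depth in (1, 5, 10, 20):
    print(depth, input_gradient(depth))
```

With large enough weights the same product can blow up instead, which is the exploding-gradient side of the same problem; architectural priors like convolutions are one way of constraining it.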
>Trying to model video data first is wasted processing power. It's setting the field back. Really smart researchers spend so much time eking out 0.01% better benchmark on MNIST/imagenet/whatever, with entirely domain specific, non general methods. So much effort is put into machine vision, when language is so much more interesting and useful, and closer to general intelligence. Convnets, et al., are a dead end, at least for AGI.
Well, what do you expect to happen when people believe in "full AGI" far more than they believe in basic statistics or neuroscience?
I don't think anything like that exists today, or ever will exist. And in fact, you are making an even stronger claim than that: not just that vision will be helpful, but that it is absolutely necessary.