No one would ever imagine that locking a baby in a featureless room with a giant stack of books would give them general intelligence. I don't understand why AI researchers think it will work for AIs. They need bodies that are biologically connected with the rest of the biosphere, with an intrinsic biological imperative, if they are ever to understand the world. I'm not saying they have to be exactly like us, but they will only be able to understand us to the extent that they have body parts and social experiences that are analogous to ours.
This isn't an engineering problem, it's a philosophical problem: We are blind to most things. We can only see what we can relate to personally. We can use language and other symbol systems to expand our understanding by permuting and recombining our personal experiences, but everything is grounded in our interactive developmental trajectory.
And here's the rub: Once you give an AI an animal-like personal developmental trajectory to use for grounding the semantics of their symbol systems, you end up with something which is not particularly different or better than a human cyborg.
I believe we can get AI from just text. Obviously that won't work for babies, because babies get bored quickly looking at text. AIs can be forced to read billions of words in mere hours!
Look at word2vec. By applying simple dimensionality reduction to the frequencies with which words occur near each other in news articles, it can learn really interesting things about the meaning of words. The famous example is that the vector for "king" minus the vector for "man" plus the vector for "woman" comes out closest to the vector for "queen". It's learning the meaning of words, and the relationships between them.
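If it helps to see how little machinery this takes, here is a minimal sketch with gensim (word2vec proper is a predictive model rather than literal dimensionality reduction, but the effect is the same; `sentences` is assumed to be an iterable of tokenized sentences from some news corpus, and the hyperparameters are just illustrative):

```python
# Assumes gensim 4.x is installed and `sentences` is an iterable of tokenized
# sentences (lists of lowercased words) from a news corpus.
from gensim.models import Word2Vec

model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)

# The classic analogy: king - man + woman should land nearest to queen,
# provided the corpus is large enough for the pattern to emerge.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```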
Recurrent NNs, using similar techniques, but with much more complexity, can learn to predict the next word in a sentence very well. They can learn to write responses that are almost indistinguishable from humans. And it's incredible this works at all, given RNNs have only a few thousand neurons at most, and a few days of training, compared to humans' billions of neurons trained over a lifetime.
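For concreteness, a hedged sketch of what such a next-word predictor looks like, here as a small LSTM (one kind of recurrent net) in PyTorch; `corpus`, the vocabulary size, and the hyperparameters are placeholders I'm assuming rather than anyone's actual setup:

```python
import torch
import torch.nn as nn

class NextWordRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len) of word ids
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)                 # logits over the vocabulary at each step

vocab_size = 10000                              # placeholder vocabulary size
model = NextWordRNN(vocab_size)
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for batch in corpus:                            # `corpus` is assumed to yield (batch, seq_len) LongTensors
    logits = model(batch[:, :-1])               # predict token t+1 from tokens up to t
    loss = loss_fn(logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```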
All of the information of our world is contained in text. Humans have produced billions of books, papers, articles, and internet comments. Billions of times more information than any human could read in their entire lifetime. Any information you can imagine is contained in text somewhere. I don't think it's necessary for AIs to be able to see, or interact with the world in any way.
If you can predict the word a human would say next, with enough accuracy, then you could also produce answers indistinguishable from theirs. Meaning you could pass the Turing test, and perform any language task they could do just as well. So language prediction alone may be sufficient for AGI.
This is the theory behind the Hutter Prize, which proposes that predicting (compressing) Wikipedia's text is a measure of AI progress. The Hutter Prize isn't perfect (it uses only a sample of Wikipedia, which is very small compared to all the text humans have produced), but the idea is solid.
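As a toy illustration of the compression-as-progress idea (this is just an off-the-shelf compressor as a crude baseline, not the actual contest rules), assuming you have a local copy of the enwik8 Wikipedia sample:

```python
# A better predictive model of the text shows up as a smaller compressed size.
import zlib

with open("enwik8", "rb") as f:     # assumed local copy of the Wikipedia sample
    data = f.read()

compressed = zlib.compress(data, 9)
print(f"compressed to {len(compressed) / len(data):.3f} of original size")
```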
> All of the information of our world is contained in text
Communicate the concept of "green" to me in text.
The sound of a dog barking, a motor turning over, a sonic boom, or the experience of a Doppler shift. Beethoven's symphony.
Sour. Sweet. What does "mint" taste like? Shame. Merit. Learn facial and object recognition via text.
Vertigo.
Tell a boxer how to box by reading?
Hand eye coordination, bodies in 3 dimensional space.
Look, I love text, maybe even more than you do. But all these things imbue, structure, and influence our text; they are not contained in it.
To make substantial inroads toward something that looks like human-esque AI, text is not enough. The division of these fields is artificial, based on our current limited tech and the specialisation of our researchers and faculties.
When we read, we play back memories, visions, sounds, feelings, etc, and inherent ideas gained through experience of ourselves as physical bodies in space.
Strong AI, at least to be vaguely recognised as such, must work with algorithms and machinery that understand these things, but which then work at the next level of abstraction to combine them into proper human-type concepts.
Of course, there is the question of why we would want to create a human-like AI. It's my contention that human-like AI isn't actually what many of us would want, but that's another topic...
I won't touch the qualia aspect, but everything necessary to flawlessly pretend to understand the color green, the sound of a dog's bark, or the experience of hearing a sonic boom can be represented with text. As an existence proof, you could encode an adult human as a serial stream of characters.
But if you must pretend to be sighted and hearing, there are many descriptions of green, of dogs barking, of motors, etc, scattered through the many books written in English (and other languages.)
Are these descriptions perfect? Maybe not. But they are sufficient to mimic or communicate with humans through text. They're sufficient to pass a Turing test, to answer questions intelligently, to write books, novels, political arguments, etc. If that's not AGI, I don't know what is.
Yes, they are. However, is a blind, deaf person with absolutely no motor control, no sense of touch, and no proprioception intelligent? Unclear. They certainly have no language faculties.
But a blind person can't describe green. A deaf person can't describe the sound of a motorboat. A person without taste can't describe mint flavor. That is the point I was making.
I don't propose that a human could lose all of their senses and still be able to communicate. But I do believe computers could do so, if they are designed to do that. Humans are not designed to work lacking those senses.
So a blind person would never be able to understand the different categories of color (other than that they are placeholders for distinct categories of something).
Now we are just speculating. We believe a computer might be able to understand things for which it doesn't have the sense - but that is speculation and totally untested, and certainly can no longer be justified by using human minds as an example.
A blind person could pretend to be sighted though. There have been blind authors who wrote about sighted characters, for instance. They need not experience the thing themselves. Just learn from experience how sighted people behave and describe things, and mimic that.
You can explain red by saying it's a "warm" color, for example. Metaphors work, and so do analogies; sensation from one sense can be explained using sensations from another. But then you need to have at least one sense, which machines clearly don't have.
I don't think raw feels, qualia type stuff really counts as information in the information theoretic sense. Nor is understanding its nature necessary for artificial general intelligence (though perhaps it is (or perhaps not) for artificial consciousness, which is not the same thing.)
> All of the information of our world is contained in text
Even if this were a true statement, it's still the case that it might not be enough. There is a class of functions that are simply not learnable without some prerequisite knowledge. This is directly analogous to a one-time pad in crypto. It is entirely possible that the function 'language' is in this class of unlearnable functions. While it may be the case that certain varieties of intelligence are learnable tabula rasa from a powerful neural net, the surface form of human natural language (the part you're recommending measuring) may simply not have enough information in it to decode the whole picture. It is entirely possible that you need to supply some of your own information to the picture as well, in a specific manner so as to act as a kind of decryption key. A record needs a record player, even if you can make similar sounds with cassettes and CDs.
And so, I'm willing to bet that you simply cannot, using raw, uninformed statistical techniques, predict what word a human would say next. You need to understand more of the underlying structure of humans first.
I will agree, however, that success on the Hutter Prize is a valuable demonstration of AI progress. Simply because I believe that maximal compression and the kind of intelligence I'm talking about are one and the same thing. You need to offload as much of the semantic weight of the corpus into the compression algorithm as you can. That means building a very complex model of natural language. And if you accept the premise that this model is not simply learnable by observing the surface form, then that means building Strong AI.
>There is a class of functions that are simply not learnable without some prerequisite knowledge. This is directly analogous to a one-time pad in crypto. It is entirely possible that the function 'language' is in this class of unlearnable functions.
I don't understand how this could possibly be the case. We can already make great progress on language understanding with simple methods like word2vec, or perhaps even markov chains. There are tons of statistical patterns in text that can be learned by computers.
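Even a toy bigram Markov chain picks up surface regularities with no prior knowledge at all. A minimal sketch, assuming `text` is some large corpus string:

```python
# `text` is assumed to be one big string of English; everything else is stdlib.
import random
from collections import defaultdict

def train_bigrams(tokens):
    table = defaultdict(list)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev].append(nxt)                 # duplicates preserve the frequencies
    return table

def generate(table, start, length=20):
    word, out = start, [start]
    for _ in range(length):
        if word not in table:
            break
        word = random.choice(table[word])       # sample the next word given the current one
        out.append(word)
    return " ".join(out)

tokens = text.lower().split()
print(generate(train_bigrams(tokens), start="the"))
```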
It can be the case if Chomsky was right, and Universal Grammar and other similar structures are a thing. That would mean that part of our ability to understand language comes from the particular structure of our brain (which everyone seems to by and large share). That would mean that some of our ability to understand language is genetic in nature, by whatever means genes direct the structure of brain development.
So if language comes from the structure of the brain, what would stop us from simulating that structure to give a machine mastery of language? And specifically what would imply that a machine which had some of that structure would need to learn by interaction as the top level comment suggests?
Nothing would stop us from simulating human brain-like (or analogously powerful) structures to build a machine that genuinely understands natural language. I'm arguing that we can't just learn those structures by statistical optimization techniques though.
If it turns out that the easiest, or even only means of doing this is by emulating the human brain, then it is entirely possible that we inherit a whole new set of constraints and dependencies such that world-simulation and an embodied mind are required to make such a system learn. If this turns out not to be the case, that there's some underlying principle of language we can emulate (the classic "airplanes don't fly like birds" argument) then it may be the case that text is enough. But that's in the presence of a new assumption, that our system came pre-equipped to learn language, and didn't manufacture an understanding from whole cloth. That the model weights were pre-initialized to specific values.
If there is an innate language structure in the brain then we know that it's possible to develop such a structure by statistical optimization, since this is exactly what evolution did, no?
But I don't see any reason a "universal grammar" couldn't be learned. It may take something more complicated than ANNs, of course. But it would be really weird if there was a pattern in language that was so obfuscated it couldn't be detected at all.
It comes down to the limits of available Information with a capital 'I'. If you're working within the encoding system (as you're recommending here with the "all the text in the world" approach), then in order to learn the function that's generating this information, the messages you're examining need to convey a minimum amount of information. There needs to be enough visible structure purely within the context of the messages themselves to make the underlying signal clear.
I don't think it's so weird to imagine that natural language really doesn't convey a ton of explicit information on its own. Sure, there's some there, enough that our current AI attempts can solve little corners of the bigger problem. But is it so strange to imagine that the machinery of the human brain takes lossy, low-information language and expands, extrapolates, and interprets it so heavily as to make it orders of magnitude more complex than the lossy, narrow channel through which it was conveyed? That the only reason we're capable of learning language and understanding each other (the times we _do_ understand each other) is because we all come pre-equipped with the same decryption hardware?
1) They appear to have crafted the skeleton of a grammar as it is with their nodes, super nodes, and slot collocations. This is directly analogous to something like an X-bar grammar, and is not learned by the system; therefore, if anything, it's strengthening a Chomskian position; the system is learning how a certain set of signals satisfies its extant constraints.
2) They don't appear to go beyond generative grammar, which already seems largely solvable by other ML methods, and is a subset of the problem "language". Correct me if I'm wrong here; it's a very long paper and I may have missed something.
Connotation. Connotation is a huge part of human language, and is completely orthogonal to the denotation, which is what a vector is going to find. For instance, an AI should accurately be able to distinguish the fact that calling someone "cheap" is different from calling them "frugal", even though both objectively mean that the person doesn't spend much money.
There's also the related phenomenon of "subtext" -- the idea that some language has a different meaning than what's said. For instance, when I ask about whether a signature line on a form is required, and the other person says, "Yes, it's required. However you think best to get the signature." There's a subtext there of, "This signature won't actually be checked, so don't worry about it."
Wouldn't you still need to attach meanings to the words though? How could an AI system ever understand, for example, the Voynich Manuscript? There's plenty of text in it, and encryption methods when it was written weren't particularly strong. Or how would a person do if they were locked in a room with lots of books written in a language unknown to them?
Of course we have no idea how the Voynich manuscript is encrypted (which would make the assumptions of word2vec wrong), or if it even has any meaning at all. And it's an incredibly small dataset compared to modern text corpuses, so there is probably significant uncertainty and overfitting. And other problems like inconsistent spellings, many errors in transcriptions, etc. But in principle this is a good strategy.
>how would a person do if they were locked in a room with lots of books written in a language unknown to them?
If you spent all day reading them, for years, and you somehow didn't get bored and kept at it, eventually you would start to see the patterns. You would learn how "slithy toves" are related to "brillig", even if you have no idea how that would translate to English. Study it long enough, and you may even be able to produce text in that language, indistinguishable from the real text. You may be able to predict the next word in a sentence, and identify mistakes, etc. Perhaps carry out a conversation in that language.
And I think eventually you would understand what the words mean, by comparing the patterns to those found in English. Once you have guesses for translations of just a few words, you can translate the rest. Because you know the relationships between words, and so knowing one word constrains the possibilities of what the other words can be.
If the translation it produces is nonsense, the words you guessed must have been wrong, and you can try again with other words. Eventually you will find a translation that isn't nonsense, and there you go. This would be very difficult for humans, because the number of hypotheses to test is so large, and analyzing text takes forever. Computers can do it at lightspeed though.
I'm familiar with this particular attack, as it was discussed here previously. It's a worthwhile attempt but the identification as star names, if real, hasn't been confirmed. But your reservations are justified.
More generally, has any attempt been made to identify the meanings of words in any sufficiently large corpus of text in a known foreign language (for example, Finnish), without being provided with a translation into English, and then compare the identified meanings to the actual meanings, as a first step towards translation?
There was a paper where they trained word vectors for English and Chinese at the same time. But they forced a few Chinese words to have the same vectors as their translated English words. This gave accurate translations for many Chinese words that hadn't been given translations.
Doing this without any translated words at all would be more difficult. But I believe it's possible. It's actually a project I want to try in the near future.
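To make that concrete, here's a rough sketch of one common approach (not necessarily the method of the paper above): learn a linear map between the two embedding spaces from a handful of seed translations, then translate other words by nearest neighbour. `en_vecs`, `zh_vecs`, and `seed_pairs` are assumed inputs:

```python
# `en_vecs` and `zh_vecs` map words to numpy vectors trained separately on each
# language; `seed_pairs` is a short list of (english, chinese) known translations.
import numpy as np

X = np.stack([en_vecs[e] for e, z in seed_pairs])
Y = np.stack([zh_vecs[z] for e, z in seed_pairs])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)       # least-squares linear map: X @ W ~= Y

def translate(english_word, k=1):
    q = en_vecs[english_word] @ W               # project into the Chinese space
    words = list(zh_vecs)
    sims = [q @ zh_vecs[w] / (np.linalg.norm(q) * np.linalg.norm(zh_vecs[w]))
            for w in words]
    order = np.argsort(sims)[::-1]
    return [words[i] for i in order[:k]]        # nearest Chinese words = candidate translations
```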
> All of the information of our world is contained in text.
This statement is false. There is a well known thought experiment called Mary’s Room the gist of which is that knowing all conceivable scientific knowledge about how humans perceive color is still not a substitute for being a human and perceiving the color red: https://philosophynow.org/issues/99/What_Did_Mary_Know
The experience of seeing red is an example of what is called “qualia”.
With Google AI systems that identify cats, birds, etc., it is reasonable to imagine AI technology evolving toward systems that can discuss those objects at the level of a typical person. However, with an AI based on text only there is no possibility of that. It would be like discussing color with a blind person or sound with a deaf person.
Mary's Room and qualia is totally irrelevant. I'm not asking if the computer will "feel" "redness", simply if it can pretend to do so through text. If it can talk about the color red, in a way indistinguishable from any other human talking about red.
In any case, at some level everything is symbols. A video is just a bunch of 1's and 0's, as is text, and everything else. A being raised on only text input would have qualia just like a being raised on video input. It would just be different qualia.
As explained elsewhere here, it may be that our brains share a genetically coded "decryption key", and that many of the things we talk about are expressed too poorly and noisily for a purely text-based computer AI to ever truly replicate the processes going on inside our brains. Insufficient data, simply put, and no way to get it.
It may sure look indistinguishable, but on the inside it just wouldn't be the same.
The Mary's Room thought experiment is garbage, if you ask me. You can't just assume your hypothesis and then call the result truth.
If you assert that a person can understand everything there is to know about the color red and then still not understand what it is like to see red, you have either contradicted yourself or assumed dualism.
They assert that Mary understands the physical phenomenon of red. That is, she understands photons and eye structure, and therefore knows that light of a particular wavelength will trigger these sensors in the eye and thereafter be interpreted as "red" by a brain. All the physical components necessary to produce and sense the color "red". But when Mary sees the apple for the first time, did she learn something more about "red"?
Also, it's a thought experiment. Some people will claim the answer to that question is no, she learned nothing. Others will claim that she did. It's that thing she learned beyond the physical that theoretically cannot be conveyed by science, or even possibly by language.
To further this comment, also see Tacit Knowledge, of which the very definition is essentially knowledge that cannot or is extremely difficult to transfer through words alone.
> It's learning the meaning of words, and the relationships between them.
Word2vec is definitely an impressive algorithm. But at the end of the day, it's just a tool that cranks out a fine-grained clustering of words based on (a proxy measure for) contextual similarity (or rather: an embedding in a high-dimensional space, which implicitly allows the words to be more easily clustered). And yes, some additive relations between words, when the signal is strong enough.
But to say that it's "learning" the "meaning" of these words is really quite a stretch.
I wonder if anyone has ever run this system on Lewis Carroll's Jabberwocky[1] or even something like Anthony Burgess's A Clockwork Orange, both of which contain a large number of made-up words/slang/re-use.
I remember that when I first read A Clockwork Orange, it took me a while but I finally started to understand the meanings of those words/phrases (though I may not have ever encountered them before.) It did feel like my brain was re-wiring itself to a new language. It'd be interesting to see how some type of language AI would treat these works.
Word2vec may be crude, but it demonstrates that you can learn non-trivial relationships between words with even such a simple algorithm. What is the meaning of a word, if not the relationship it has to other words?
Gender was just an example. There is a lot of semantic information learned by word2vec, and the vectors have been shown to be useful in text classification and other applications. It can learn subtle stuff, like the relationships between countries, celebrities, etc. All that information is contained in a few hundred dimensions, which is tiny compared to the number of neurons in the brain.
I use word2vec a lot, and things like it, and I've always found it overstated to say that it "learns the relationships between things".
You say, as many people do, that the operation "king - man + woman = queen" indicates that it understands that the relation between "king" and "queen" is the relation between "man" and "woman". But there is something much simpler going on.
What you're asking it for, in particular, is to find the vector represented by "king - man + woman", and then find the vector Q with the highest dot product with this synthetic vector, out of a restricted vocabulary.
The dot product is distributive, so distribute it: you want to find the maximum value of (king * Q) - (man * Q) + (woman * Q).
So you want to find a vector that is like "king" and "woman", and not like "man", and is part of the extremely limited vocabulary that you use for the traditional word2vec analogy evaluation, but not one of the three words you used in the question. (All of these constraints are relevant.) Big surprise, the word that fits the bill is "queen".
(I am not the first to do this analysis, but I've heard it from enough people that I don't know who to credit.)
It's cool that you can use sums of similarities between words to get the right answer to some selected analogy problems. Really, it is. It's a great thing to show to people who wouldn't otherwise understand why we care so much about similarities between words.
But this is not the same thing as "solving analogies" or "understanding relationships". It's a trick where you make a system so good at recognizing similarities that it doesn't have to solve analogies or understand relationships to solve the very easy analogy questions we give it.
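Concretely, the standard analogy evaluation amounts to something like this sketch, assuming `wv` is a dict of unit-normalized word vectors and `vocab` is the restricted evaluation vocabulary:

```python
# `wv` maps words to unit-normalized numpy vectors; `vocab` is the restricted
# vocabulary used by the usual analogy benchmark.
import numpy as np

def analogy(wv, a, b, c, vocab):
    """Return argmax over q of (b . q) - (a . q) + (c . q), excluding a, b, c."""
    best, best_score = None, -np.inf
    for q in vocab:
        if q in (a, b, c):                      # the exclusion constraint doing much of the work
            continue
        score = wv[b] @ wv[q] - wv[a] @ wv[q] + wv[c] @ wv[q]
        if score > best_score:
            best, best_score = q, score
    return best

# analogy(wv, "man", "king", "woman", vocab) tends to return "queen": it is
# similar to "king" and "woman", dissimilar to "man", and the three query
# words themselves are disallowed.
```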
Well in my example the AI doesn't have to interact with the world at all. To pass the Turing test simply requires imitating a human, predicting what words they would say. You only need to know the relationships between words.
If literally the only thing you know is the relationship between words, but you have a perfect knowledge of the relationship between words, you'll quickly determine that "Day" and "Night" are both acceptable answers, and have no means of determining which is the right one. At the very minimum, you need a clock, and an understanding of the temporal nature of your training set, to get the right one.
A beautiful rainbow glimmering gently in the sky after a summer shower.
What do you see?
What do you smell?
What do you hear?
What does the landscape look like?
What memory does this bring up?
These are messages that the language is communicating. If an AI can't understand at least some of the content of the message then can it compose one effectively? I'm not certain it can understand the meaning from words alone, but we can certainly try.
Only knowing the relationships between words alone would just be a poor proxy for knowing the meanings of the words, e.g. what real world concepts the words attempt to represent. You might be able to get pretty far with this technique, but I would bet a lot of money you would not be able to get reliable, in-depth human level communication. The system needs to have an understanding of the world.
And then there is the fundamentally dynamic aspect of language, which strengthens the need for a rich understanding of the world that words describe and convey.
There are other tests for AI besides the Turing Test, some of which require more understanding on the part of the program. Check out Winograd Schemas: http://www.cs.nyu.edu/faculty/davise/papers/WinogradSchemas/... which hinge on understanding the subtleties of how words refer to the world.
But it and methods like it are still very limited in what they can learn. For example, they can't learn relations involving antonyms. They can't tell apart hot from cold or big from small.
> All of the information of our world is contained in text.
That's a bit overstated. I think maybe you mean to say "all of the information you'd need to be an effective citizen of our world is contained in text"? Or something similar? I think even that is too strong a claim, but it's at least understandable.
As stated the assertion doesn't make any sense. There is more information in a glass of milk than could be stored on all of the computers on earth, and Heisenberg showed that it's impossible to even record all of the information about a single particle.
Certainly there is no textual information available about my grandfather's eyes, but that information is accessible in the world for those who'll look. You seem to be underestimating the quantity of information you absorbed as a baby, just by reacting robotically to the people and events around you and absorbing the relationships between percepts and feelings.
Well of course that's what I meant. Any information an average person knows is contained in text, somewhere. That is all common sense knowledge. Everything from detailed descriptions of trees, to the color of the sky, to the shape of the human face, etc. But also much more, like all of our scientific knowledge and written history. Billions of things the average person doesn't know.
Learning the statistics of language is not going to tell the ML model anything about the underlying stuff to which the language actually refers. It will need actual "sense-data" to do that. For instance, to get a model that generates image captions, you need to train it with actual images.
If the much-vaunted "general intelligence" consists in both vague and precise causal reasoning and optimal control with respect to objects in the real world, then no, it obviously cannot be done with mere language. At least one sensor and one effector will be needed to train an ML/AI model to perform those tasks.
There is nothing magical about "sense data". A video is just a bunch of 1's and 0's, just like text data. A model of video data is not superior in any way to one of text data, they are just different.
The internet is so large and so comprehensive (especially if you include digitized books and papers, e.g. libgen or google books) that I doubt any important information that can be learned through video data, can't be obtained through text data.
>A video is just a bunch of 1's and 0's, just like text data. A model of video data is not superior in any way to one of text data, they are just different.
Uhhhh... there's this thing called entropy. A corpus of video data is vastly higher in entropy content than text data, and probably a good deal more tightly correlated too, making it much easier to learn from.
Remember, for a human, the whole point of speech and text is to function as a robust, efficient code (in the information-theoretic sense) for activating generative models we already have in our heads when we acquire the words. This is why we have such a hard time with very abstract concepts like mathematical ones: the causal-role concepts (easy to theorize how to encode those as generative models) are difficult to acquire from nothing but the usage statistics of the words and symbols, in contrast to "concrete", sense-grounded concepts, which have large amounts of high-dimensional data to fuel the Blessing of Abstraction.
Nevermind, I should probably just get someone to let me into a PhD program so I can publish this stuff. If only they'd consider the first paper novel enough already!
I think you mean redundancy. And yes videos are highly redundant. But I don't see how that's any kind of advantage. Text has all the relevant information contained within it, with a ton of irrelevant information discarded. But there are still plenty of learnable patterns. Even trivial algorithms like word2vec can glean a huge amount of semantic information (much easier than is possible with video, currently.)
I don't know if humans have generative models in their heads. There are people who have no ability to form mental images, and they function fine. Regardless, an AI should be able to get around that by learning our common patterns. It need not mimic our internal states, only our external behavior.
No, I meant a pair of specific information-theoretical quantities I've been studying.
>And yes videos are highly redundant. But I don't see how that's any kind of advantage.
Representations are easier to learn for highly-correlated data. Paper forthcoming, but conceptually so obvious that quantifying it is (apparently) non-novel.
>I don't know if humans have generative models in their heads.
The best available neuroscience and computational cognitive science says we do.
>Regardless, an AI should be able to get around that by learning our common patterns. It need not mimic our internal states, only our external behavior.
Our external behavior is determined by the internal states, insofar as those internal states are functions which map sensory (including proprioceptive and interoceptive) statistics to distributions over actions. If you want your robots to function in society, at least well enough to take it over and kill everyone, they need a good sense of context and good representations for structured information, behavior, and goals. Further, most of the context to our external behavior is nonverbal. You know how it's difficult to detect sarcasm over the internet? That's because you're trying to serialize a tightly correlated high-dimensional data-stream into a much lower-dimensional representation, and losing some of the variance (thus, some of the information content) along the way. Humans know enough about natural speech that we can usually, mostly reconstruct the intended meaning, but even then, we've had to develop a separate art of good writing to make our written representations conducive to easy reconstruction of actual speech.
Deep learning can't do this stuff right now. Objectively speaking, it's kinda primitive, actually. OTOH, realizing the implications of our best theories about neuroscience and information theory as good computational theories for cognitive science and ML/AI is going to take a while!
>Representations are easier to learn for highly-correlated data. Paper forthcoming, but conceptually so obvious that quantifying it is (apparently) non-novel.
I know what you are saying, but I don't think it's true.
Imagine a hypothetical language that is so compressed, so non-redundant, so little correlated, that it's indistinguishable from random noise. Learning this language may seem an impossible task. But in fact it's very easy to produce text in this language. Just produce random noise! As stated, that's indistinguishable from real text in this language.
Real language, of course, has tons of statistical patterns, and is definitely not random. But I don't see how it is harder to learn than, say, a more redundant audio recording of the same words, or a video recording of the person speaking them. That extra information is irrelevant and will just be discarded by any smart algorithm anyway.
>The best available neuroscience and computational cognitive science says we do.
>Our external behavior is determined by the internal states, insofar as those internal states are functions which map sensory (including proprioceptive and interoceptive) statistics to distributions over actions. If you want your robots to function in society, at least well enough to take it over and kill everyone, they need a good sense of context and good representations for structured information, behavior, and goals.
Robots are never going to have exactly the same internal states and experience as humans. They could be very, very different, in structure, to the human brain. Being exactly like humans isn't the goal. Mimicking humans is an interesting diversion, but it's not necessary, or the goal in and of itself.
And you may be right that a robot without vision would be disadvantaged. I think that's mostly anthropomorphism, imagining how disadvantaged blind humans are (and in fact even blind humans can function better than most people expect.) But even if it's true, my point is that sight is not strictly necessary for intelligence.
In fact I think vision may even be a disadvantage. So much of the brain is devoted to visual processing. While text, and even language itself, are hacks that evolution created relatively recently. A brain built purely for language could be much more efficient at it than we can probably imagine. Ditching vision could save a huge amount of processing power and space.
>I know what you are saying, but I don't think it's true.
Reading your post, you actually seem quite confused.
>Imagine a hypothetical language that is so compressed, so non-redundant, so little correlated, that it's indistinguishable from random noise. Learning this language may seem an impossible task.
Well yes, learning a class of strings in which each digit of every finite prefix is statistically independent from each other digit, is very hard, bordering on impossible (or at least, impossible to do better than uniform-random guessing).
>But in fact it's very easy to produce text in this language. Just produce random noise!
But that isn't the learning problem being posed! You are not being asked to learn `P(string | language)` (which is, in fact, the uniform distribution over arbitrary-length strings), but `P(language | string_1, string_2, ..., string_n)`, which by the way you've posed the problem factorizes into `P(language | character_1) x P(language | character_2) x ... x P(language | character_m)`. If the actual strings are sampled from a uniform distribution over arbitrary-length strings, then we have two possibilities:
1) The prior is over a class of languages some of which are not optimally compressed, and which thus do not render each character (or even each string) conditionally independent. In this case, the posterior will favor languages that do render each character conditionally independent, but we won't be able to tell apart one such hypothesis from another. We've learned very little.
2) The prior is over a class of languages all of which yield strings full of conditionally-independent noise: no hypothesis can compress the data. In this case, the evidence-probability and the likelihood cancel, and our posterior over languages equals our prior (we've learned nothing).
>Real language, of course, has tons of statistical patterns, and is definitely not random. But I don't see how it is harder to learn than, say, a more redundant audio recording of the same words, or a video recording of the person speaking them. That extra information is irrelevant and will just be discarded by any smart algorithm anyway.
Noooo. Compression does not work that way. Compression works by finding informative patterns in data, not by throwing them away. If your goal is to learn the structure in the data, you want the structure to be more redundant rather than less.
I'm telling you, once the paper is submitted, I can send you a copy and just show you the equations and inequalities demonstrating this fact.
Differences in sensorimotor cortex function that leave the brain unable to perform top-down offline simulation with a high subjective-sensory precision don't invalidate the broad theory that cortical microcircuits are generative models (in particular, hierarchical ones, possibly just large hierarchies in which the individual nodes are very simple distributions).
>Robots are never going to have exactly the same internal states and experience as humans. They could be very, very different, in structure, to the human brain.
Duh. However, if we want them to work, they probably have to run on free-energy minimization somehow. There is more necessity at work here than connectionism believes in, but that's a fault in connectionism.
>Being exactly like humans isn't the goal. Mimicking humans is an interesting diversion, but it's not necessary, or the goal in and of itself.
I didn't say that a working robot's representations had to exactly match those of humans. In fact, doing so would be downright inefficient, since robots would have completely different embodiments to work with, and thus be posed different inference problems in both perception and action. The fact that they would be, necessarily, inference problems is the shared fact.
>And you may be right that a robot without vision would be disadvantaged. I think that's mostly anthropomorphism, imagining how disadvantaged blind humans are (and in fact even blind humans can function better than most people expect.) But even if it's true, my point is that sight is not strictly necessary for intelligence.
Sight isn't. Some kind of high-dimensional sense-data is.
>In fact I think vision may even be a disadvantage. So much of the brain is devoted to visual processing. While text, and even language itself, are hacks that evolution created relatively recently. A brain built purely for language could be much more efficient at it than we can probably imagine. Ditching vision could save a huge amount of processing power and space.
That's putting the cart before the horse. Language is, again, an efficient but redundant (ie: robust against noise) code for the models (ie: knowledge, intuitive theories, as you like) the brain already wields. You can take the linguistic usage statistics of a word and construct a causal-role concept for them in the absence of a verbal definition or sensory grounding for the word, which is arguably what children do when we read a word before anyone has taught it to us, but doing so will only work well when the concepts' definitions are themselves mostly ungrounded and abstract.
So purely linguistic processing would work fairly well for, say, some of mathematics, but not so much for more empirical fields like social interaction, ballistic-missile targeting, and the proper phrasing of demands made to world leaders in exchange for not blowing up the human race.
>But that isn't the learning problem being posed! You are not being asked to learn `P(string | language)` (which is, in fact, the uniform distribution over arbitrary-length strings), but `P(language | string_1, string_2, ..., string_n)`...
Hold on. Let's say the goal is passing a Turing test. I think that's sufficient to demonstrate general intelligence and do useful work. In that case, all that is required is mimicry. All you need to know is P(string), and you can produce text indistinguishable from a human.
>Noooo. Compression does not work that way. Compression works by finding informative patterns in data, not by throwing them away. If your goal is to learn the structure in the data, you want the structure to be more redundant rather than less.
OK, let's say I convert English words to shorter Huffman codes. This should be even easier for a neural network to learn, because it can spend less effort trying to figure out spelling. Of course some encodings might make it harder for a neural net to learn, since NNs make some assumptions about how the input should be structured, but in theory it doesn't matter.
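A rough sketch of the kind of re-encoding I mean, assuming `words` is a tokenized corpus (this is just standard Huffman coding over word frequencies, nothing specific to any particular NN setup):

```python
# `words` is assumed to be a tokenized English corpus (a list of words).
import heapq
from collections import Counter

def huffman_codes(words):
    freqs = Counter(words)
    # heap entries: (total frequency, tiebreaker, [(word, code-so-far), ...])
    heap = [(f, i, [(w, "")]) for i, (w, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = [(w, "0" + c) for w, c in left] + [(w, "1" + c) for w, c in right]
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return dict(heap[0][2])

codes = huffman_codes(words)
encoded = "".join(codes[w] for w in words)   # the corpus as one bit string; frequent words get short codes
```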
>Some kind of high-dimensional sense-data is [necessary]... purely linguistic processing would work fairly well for, say, some of mathematics, but not so much for more empirical fields
These are some really strong assertions that I just don't buy, and I don't think you've backed up at all.
Humans have produced more than enough language for a sufficiently smart algorithm to construct a world model from it. Any fact you can imagine is contained somewhere in the vast corpus of all English text. English contains a huge amount of patterns that give massive hints to the meaning. E.g. that kings are male, or that males shave their face and females typically don't, or that cars are associated with roads, which is a type of transportation, etc.
Even very crude models can learn these things. Even very crude models can produce nearly sensible dialogue from movie scripts. Models with millions of times fewer nodes than the human brain. It's amazing this is possible at all. Of course a full AGI should be able to do a thousand times better and completely understand English.
Trying to model video data first is wasted processing power. It's setting the field back. Really smart researchers spend so much time eking out a 0.01% better benchmark on MNIST/ImageNet/whatever, with entirely domain-specific, non-general methods. So much effort is put into machine vision, when language is so much more interesting and useful, and closer to general intelligence. Convnets, et al., are a dead end, at least for AGI.
>OK, let's say I convert English words to shorter Huffman codes. This should be even easier for a neural network to learn, because it can spend less effort trying to figure out spelling.
I'd need to see the math for this: how will the Huffman codes preserve a semantic bijection with the original English while throwing out the spellings as noise? It seems like if you're throwing out information, rather than moving it into prior knowledge (bias-variance tradeoff, remember?), you shouldn't be able to biject your learned representation to the original input.
Also, spelling isn't all noise. It's also morphology, verb conjugation, etc.
>Humans have produced more than enough language for a sufficiently smart algorithm to construct a world model from it.
Then why haven't you done it?
>Any fact you can imagine is contained somewhere in the vast corpus of all English text.
Well no. Almost any known fact I can imagine, plus vast reams of utter bullshit, can be reconstructed by coupling some body of text somewhere to some human brain in the world. When you start trying to take the human (especially the human's five exteroceptive senses and continuum of emotions and such) out of the picture, you're chucking out much of the available information.
There's damn well a reason children have to learn to speak, understand, read, and write, and then have to turn those abilities into useful compounded learning in school -- rather than just deducing the world from language.
>Even very crude models can learn these things. Even very crude models can produce nearly sensible dialogue from movie scripts.
Which doesn't do a damn thing to teach the models how to shave, how to tell kings from queens by sight, or how to avoid getting hit by a car when crossing the street.
>Models with millions of times fewer nodes than the human brain. It's amazing this is possible at all.
The number of nodes isn't the important thing in the first place! It's what they do that's actually important, and by that standard, today's neural nets are primitive as hell:
* Still utterly reliant on supervised learning and gradient descent.
* Still subject to vanishing gradient problems when we try to make them larger without imposing very tight regularizations/very informed priors (ie: convolutional layers instead of fully-connected ones).
* Still can't reason about compositional, productive representations.
* Still can't represent causality or counterfactual reasoning well or at all.
>Trying to model video data first is wasted processing power. It's setting the field back. Really smart researchers spend so much time eking out a 0.01% better benchmark on MNIST/ImageNet/whatever, with entirely domain-specific, non-general methods. So much effort is put into machine vision, when language is so much more interesting and useful, and closer to general intelligence. Convnets, et al., are a dead end, at least for AGI.
Well, what do you expect to happen when people believe in "full AGI" far more than they believe in basic statistics or neuroscience?
I don't think we are getting anywhere. Look, can you point to any instance of machine vision being used to improve a language model of English? Especially any case where the language model took more computing power to train than the model aided with vision?
I don't think anything like that exists today, or ever will exist. And in fact you are making an even stronger claim than that. Not just that vision will be helpful, but absolutely necessary.
A video is encoded to a series of ones and zeros by a codec. That codec determines the interpretation of that data. The codec basically becomes a sensor -- it serves the same purpose as the eyes, which is taking raw data (photon excitements or a binary string) and turning it into meaningful data (an image). And without that codec, the information is basically meaningless.
If we ever want 'true AI', then interacting with the world would certainly be a crucial component of building one. Text is a crucial component, but ultimately being able to 'see' gives one an entirely new perspective on what words mean. There's a reason, after all, why toddlers touch hot surfaces even after repeatedly being told that it will hurt - until they experience it for themselves, the word 'hot' doesn't quite connect, even if they've heard it dozens of times and have a very good abstract sense of what it means.
I agree that learning to reason about the world likely does not require experience with motor control and proprioception (i.e. literally how babies do it), though I do think you need at least some sort of tempo-spatial experience (e.g. visual). Tempo-spatial representations are just extremely hard to convey by text only. You might get the idea of closeness by saying 'close is when two words are close in a sequence of words' and ordering by saying 'this word comes after that word', but I think it would be very difficult to extrapolate that concept to more than one dimension, and to dimensions that have not just an ordering but also a metric (just think about our inability to reason about even the fourth dimension). You need rich representations of our 3+1 dimensional world to be able to reason about it, and text only gives you perhaps "0.5" dimensions (because it lacks a metric, i.e. it does not convey durations in terms of the ticks of the recurrent network).
But I doubt, too, that interaction with the world is necessary. In fact I think it would be rather easy for an AI to simply write motor programs in a programming language, given unrestricted and noiseless memory, once it has learned to reason about tempo-spatial patterns from just observing them and identifying them with our language-coded shared concept space. It is not constrained to real-time performance of actions as humans are, so it can take the much easier route of programming any interaction with the world as needed, on the fly. Our shared concept space likely conveys enough of our general (common sense) knowledge of how these patterns interact and evolve over time, once rudimentary tempo-spatial representations are in place.
tl;dr I think you need a small set of tempo-spatially grounded meanings (though not necessarily agent-related) and you can bootstrap everything from that using only textual knowledge.
I agree that AI is possible from text alone but only with the added stipulation that full understanding of text requires a very sophisticated learner. In order to predict what a human will say next, you need the ability to maintain extended context, use world models plus deduction to narrow and maintain multiple possibilities; all while being able to infer the state of the thing you are trying to predict (which means the AI itself has complex internal state far beyond anything an RNN or Neural Turing machine could manage today).
If I said "That is one huge fan", could you predict what my next word will be? It would depend a lot on context and the ability to reason within a complex world model. Depending on whether I had gone to a concert or to a wind tunnel, your distribution over guesses would alter. If I had gone whale watching you might even suspect I made a typo. Changing huge to large would lead to major to no adjustments, depending on each guess.
So while I agree an AI could emerge from text alone, it would have to be very sophisticated to do this.
> Look at word2vec. By applying simple dimensionality reduction to the frequencies with which words occur near each other in news articles, it can learn really interesting things about the meaning of words
Doesn't that depend on the syntax of a language? My guess is that it would not work for a language that has a more flexible word order than English; so it wouldn't quite work for languages such as Russian, Turkish, and Finnish (among others).
I think if AGI were possible from the basic statistical NLP techniques outlined in most advanced NLP textbooks, it would have already happened a decade ago.
I'm not saying it is possible from just basic statistical NLP techniques. It may take much more advanced techniques. And it may take much more computing power than we have even now.
But I do believe it is possible, someday. Probably within our lifetime.
I certainly think AGI is possible. I just don't think Word2vec, RNN's, i.e. stuff from the NLP textbooks, is in the same ballpark as what it will take to achieve.
Edit - To be more clear I also agree AGI could be possible with text as the only input. I just think we need a new paradigm. Ostensibly AGI is meant to mimic human intelligence (minus the pitfalls), so IMO, the best approach will be that which mimics the underlying processes of human intelligence - not just the results. Traditional statistical NLP methods will probably have some role in this final system, but not the heart of it, as far as mimicking intelligence by mimicking intelligence's underlying processes goes.
> It's learning the meaning of words, and the relationships between them.
It's learning a meaning, not the meaning. It's just a probabilistic model for the occurrence of a word based on the words that surround it. This should not serve as a base for the rest of your claims.
Anyway -- the improvements gained by multi-modal systems essentially disprove your thesis. Which is good news! We're making progress.
The problem seems to be in understanding the overall meaning over long periods of time. We can understand general structure quite easily and a single sentence might make sense, but overall it never ends up cohesive or meaningful.
Maybe we can do that with just words, but humans certainly don't. Words are related to concepts first, then we figure out the meaning of the rest.
I can't say anything about the feasibility of what you describe, but something about the idea of being a being composed of pure text, with no ability to perceive or visualise the world around me except through the medium of text, utterly horrifies me.
So a related question becomes, can you learn to understand and thus predict physics (the way a child does - I'm not talking about quantum mechanics) from literature only, without interacting in space?
Yes, the child doesn't learn physics; he or she learns motor control of his or her body. A baby learning to talk is more about the brain learning to control its body through its neurological system.
Look at the animal kingdom: some animals are walking within about 5 minutes of being born and running within hours.
Giving the machine a camera and wheels is basically embedding it in the real world. It goes against the spirit of my question, and since I was asking it skeptically it's actually the thing I intuitively (without any relevant expertise!) expect to be required.
Strangely (or maybe horribly?) there are a number of studies of children raised in Romanian orphanages that somewhat cover this area.
Under Nicolae Ceaușescu the government outlawed abortion (with some exceptions) in an attempt to increase the birth rate (https://en.wikipedia.org/wiki/Abortion_in_Romania). Coupled with a poor economy, this led to masses of infants and children being given over to government "care" in orphanages. These were pretty bleak places for infants and children, with infants often spending hours in a crib with little stimulation.
It's probably a mistake to assume that just because it's the way we do it that it has to be the way machines do it. Although that's usually the initial assumption. In the early days of flight most attempts were based on birds, similarly submersible vehicles were based on fish. We know now it's better to use propellers. It could be we just haven't found what is analogous to a propeller for the AI world.
The only general intelligence we know of is us. It stands to reason that the first step towards creating AGI is to copy the one machine we know is capable of that type of processing. Why doesn't our research focus on understanding and copying biological brains? Numenta did, with good results, but it isn't an industry trend.
The steps to learning to make aircraft didn't come from understanding how birds flap their wings. I can't imagine the kind of intelligence we consider general will come from studying how humans biologically think.
> Why doesn't our research focus on understanding and copying biological brains?
There's lots of basic research being done to better understand the biological brain. Progress is slow and steady, but the brain remains poorly understood at this point in time. Most applied research has pursued more pragmatic methods because these methods have had faster progress and proven more useful in practice.
I agree that machines won't necessarily have to do everything the same way we do things to be "intelligent". On the other hand, the concept of artificial general intelligence implies a machine that can perform all of the same functions a human can: "Artificial general intelligence (AGI) is the intelligence of a (hypothetical) machine that could successfully perform any intellectual task that a human being can."[0]
And if you look at what humans do that we characterize as "intelligence", it's a substantial list of different functions: "Human intelligence is the intellectual capacity of humans, which is characterized by perception, consciousness, self-awareness, and volition. Through their intelligence, humans possess the cognitive abilities to learn, form concepts, understand, apply logic, and reason, including the capacities to recognize patterns, comprehend ideas, plan, problem solve, make decisions, retain information, and use language to communicate. Intelligence enables humans to experience and think."[1] If I take an engineering perspective and look at the list of functions that an AGI would have to perform to be an AGI, I would definitely want to give it more than a text-in, text-out interface so that it could have, for example, perception and volition.
For example, imagine a theoretical "AGI" that exclusively deals with a stream of text in (as human language) and a stream of text out (again, as human language). If you ask it any questions about its physical surroundings, it's either going to make things up (which isn't perception and therefore fails both at being useful and meeting the definition of intelligence above), or it's going to get information about its physical surroundings via some proxy that feeds it a linguistic stream of information about its surroundings. But it doesn't matter if it's getting perceptual information from a proxy or from more directly embedded sensory interfaces; if it's getting perceptual information and able to use it sensibly, then it's performing the function of perception. And in that case the proxy source of perceptual information may as well be considered part of the "intelligent" system.
These kinds of articles seem a bit silly to me, because they seem to imply that we should expect to be able to create a system which is "intelligent" but which only performs the function of understanding and producing meaningful language. But if you're only handling language, you're taking away all of the other functions that are part of the definition of "intelligence" above, which leaves you with a system which is far short of anything we'd consider "intelligent".
> This isn't an engineering problem, it's a philosophical problem [...]
Indeed. However, we cannot rule out the possibility of engineering a system without a "body" (I think you are referring to the Embodied mind thesis?). It is a complicated topic and discussions about it are futile without precise definitions of loaded terms like "intelligence", "body", etc.
A rather well-defined test is the classic Turing test and I wouldn't dismiss the possibility that it can be passed by a bodyless machine/program-thing.
> It is a complicated topic and discussions about it are futile without precise definitions of loaded terms, like "intelligence", "body", etc.
On the contrary, to make progress we should give up on making precise definitions of non-technical terms such as 'body' and 'intelligence'. They are folk notions that don't have relevant precise definitions; talking about them in an engineering context is distracting.
You don't need a definition of 'beauty' to paint. You don't need a definition of 'justice' to practice law. You don't need a definition of 'intelligence' to build clever robots.
I don't find the Turing Test convincing either, because someone smart enough to build the machine should be smart enough to recognize it from its answers. And if the outcome depends on the intelligence of the questioner, whose intelligence is really being tested?
IIRC the test is a binary classifier, but intelligence is a spectrum that's fuzzy and therefore inherently hard to define.
I.e., how readily someone is willing to assume a lack of intelligence in a human is not a good definition of general intelligence, as that's circular reasoning.
I would suppose that babies possess general intelligence, but lack knowledge about their environment.
The Turing Test equivalent in the game of Go is not beating the computer. Instead, it is: given a history of moves, can you determine whether or not a computer was playing? Determining whether or not AlphaGo was playing seems to me like a relatively easy task for the designers. Since they have access to the AlphaGo system, they can just calculate the probability that AlphaGo would have made each move.
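For concreteness, that check could look something like the sketch below (mine; the policy_prob interface is a hypothetical stand-in for whatever access the designers have to AlphaGo's move probabilities, nothing publicly exposed): score a game record by the average log-probability the policy assigns to the moves that were actually played.

    # Hypothetical sketch: scoring a game record against a policy network.
    # policy_prob(board, move) is an assumed stand-in for the designers'
    # internal access to AlphaGo's move probabilities; no such public API exists.
    import math

    def mean_log_likelihood(game_record, policy_prob):
        """Average log-probability the policy assigns to the moves actually played.

        game_record: iterable of (board_state, move) pairs
        policy_prob: function(board_state, move) -> probability in (0, 1]
        """
        total, count = 0.0, 0
        for board, move in game_record:
            p = max(policy_prob(board, move), 1e-12)  # guard against log(0)
            total += math.log(p)
            count += 1
        return total / max(count, 1)

    # A game whose average log-likelihood is much higher than that of typical
    # human games would suggest the moves came from (something close to) the
    # policy itself.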
> Instead, it is: given a history of moves, can you determine whether or not a computer was playing?
No, it's really not. For the Turing Test, the AI is meant to be adversarial: its objective is to convince you that it is human.
AlphaGo's objective isn't to "play like a human," it is to win. If they gave it an objective of playing like a human, I'm sure AlphaGo could play in a way that would be indistinguishable from a human.
> Since they have access to the AlphaGo system, they can just calculate the probability that each move corresponds to one AlphaGo would make.
Peeking at the system/data is cheating. Obviously the person who sets up a Turing test knows which player is AI.
> If they gave it an objective of playing like a human, I'm sure AlphaGo could play in a way that would be indistinguishable from a human.
It could just play unbelievably badly and appear to be a beginner. That wouldn't prove intelligence.
> Peeking at the system/data is cheating
Someone ignorant of computers would hardly ever assume they were talking to a machine. Of course, omitting this rule would leave us needing a judge smarter than the computer.
If you bring statistics into it, i.e. the machine only has to convince a fair share of humans, then defining the threshold is a problem. Intelligence would then depend on the development of the society doing the judging. I thought this was about an intrinsic property.
It's an interesting thought experiment, but hardly conclusive, just observational.
> It could just play unbelievably bad and appear like a beginner. That wouldn't prove intelligent.
Sure, which is why it's not a very good metric. The right metric for whether computational game intelligence has exceeded human capacity is whether computers can consistently beat humans.
To be clear, I'm not convinced that we'll ever make a generalized intelligence which can pass the Turing Test. My point was merely that the fact that humans create the system is not a good argument for why it's impossible: in many domains, we can already create computer systems which vastly outperform ourselves.
Good question. Someone did beat it. He and his games were used as training data during AlphaGo's development.
I edited the post, did you read that? You are making my point: you can't bootstrap a definition of artificial intelligence by comparison to humans when human intelligence is not well defined either.
I read your post, but it's very muddled. You might consider advancing a clearer thesis, because it seems that you are under the impression that it's impossible for humans to build systems which are smarter than themselves.
The first versions of AlphaGo were certainly inferior to human players, but the current version is superior to any human.
> because it seems that you are under the impression
I made a hopeful hypothesis and immediately countered it myself: human intelligence might just not be optimized for recognizing intelligence. It is optimized for other things, like not wasting energy, and because of that it does indeed judge something that plays Go very well but can do nothing else as rather less intelligent.
You do make a strong point there: specialized computers are stronger than humans at a specific task, but we are talking about general intelligence. I have to admit, too, that I have a hard time seeing the bigger picture and get confused too easily. I haven't read any of the literature that would define the problem rather well, as the OP put it, so the discussion is likely less informative.
In my opinion the comparison is still unequal, because the computer used a ton more resources and memory. There aren't enough Go professionals you could put together to let their averaged opinion learn and play while consuming as much energy.
But that's a big if. You've just taken one really hard problem (learning about the world) and turned it into an even harder problem (simulating the world).
I second this. To further illustrate that semantics is far more than syntax and that emotions are an inseparable part of understanding language, do the following experiment: take any word or concept and look up its definition in a dictionary. Take note of the words used in the definition and look up the meaning of those words. Repeat recursively, and if you go far enough down the tree you will find that all words lead to self-recursive definitions that state "it is a feeling of" this or that. This is proof that the semantics of language, and the intelligence required to grasp those semantics, is inseparable from subjective states.
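If you want to try a rough version of this yourself, here's a sketch (my own, assuming NLTK with its WordNet corpus downloaded; WordNet stands in for the dictionary). It won't literally print "it is a feeling of", but it does show how quickly the definitions circle back onto words that themselves needed defining:

    # Rough sketch of the dictionary-regress experiment, using WordNet as the
    # dictionary. Assumes: pip install nltk, then nltk.download('wordnet').
    import re
    from collections import deque

    from nltk.corpus import wordnet as wn

    def definition_words(word):
        """Content-ish words used across all WordNet definitions of `word`."""
        words = set()
        for synset in wn.synsets(word):
            for token in re.findall(r"[a-z]+", synset.definition().lower()):
                if len(token) > 2:                 # crude stop-word filter
                    words.add(token)
        return words

    def walk_definitions(start, max_depth=2):
        """Breadth-first walk of the definition graph, counting how often a
        definition leans on a word we already had to define earlier."""
        seen = {start}
        queue = deque([(start, 0)])
        revisits = 0
        while queue:
            word, depth = queue.popleft()
            if depth >= max_depth:
                continue
            for w in definition_words(word):
                if w in seen:
                    revisits += 1
                else:
                    seen.add(w)
                    queue.append((w, depth + 1))
        print(f"'{start}': reached {len(seen)} words, {revisits} circular references")

    walk_definitions("shame")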
What does the experiment prove? Of course dictionaries are circular. But that doesn't mean they don't contain huge amounts of information about the world, and about language. Information an AI could infer without any interaction with the outside world at all.
This means that the semantics of the language is rooted in subjective states. Restated, this means humans "understand" language because of humans' emotions. Computers may "understand" language too, but it surely will not be due to the subjective states as it is with humans. If we define AI as a computer that must "understand" language the same way as humans do, then by definition, AI is not possible.
This is why Turing invented his famous test. At the time people were arguing about what it would take to prove a machine is intelligent. People argued that the internal properties of the machine mattered, that it needed to feel emotions and qualia just like humans.
But Turing argued that if the machine could be shown to do the same tasks that humans can do, and act indistinguishable from a real human, then surely it's intelligent. Its internal properties don't matter, at least not to judge intelligence.
Exactly. The computer will fail such a test, where the test can only be passed if the agent under test experiences subjective states. Since humans can always craft this class of tests, and the computer cannot pass them, it will always fail the "Turing Test".
What test could possibly test for subjective states? You can ask the computer how it feels, and it can just lie, or predict what a human would say if asked the same question. There's no way to know what the computer actually feels, and it doesn't really matter for this purpose.
The easy answer is this: these tests exist. Since no computer put to the Turing test has passed it, simply look up those tests and observe how humans induced the computers to fail.
In practice, a good class of tests is one where producing a sensible answer requires an emotional response: art interpretation, questions involving allegory, interpreting a poem, and so on.
It's important to note that whatever the challenge is, it must always be a new example, as in never seen before. Anything already in the existing corpus the computer can simply look up. In other words, there is no one concrete thing you can use again and again.
Example of a test that would foil a computer: a personally written poem and a discussion about it.
This sort of anthropomorphic intuition pump is counterproductive. It would be much easier to believe that a baby could do that if we had designed the baby's brain from scratch to be able to.
In this vein, have you read about the work done by people mostly from the AI lab at the Vrije Universiteit in Brussels (Belgium)? (They're also affiliated with the Sony CSL in Paris: http://csl.sony.fr/language.php) They're precisely interested in the philosophical problem of how a grounded language emerges and is perpetuated among a population of embodied agents, as opposed to the engineering problem of, say, understanding complex, context-dependent natural language queries.
There's a great book which gives an overview of this field, The Talking Heads Experiment: Origins of Words and Meanings by Luc Steels, which discusses many of the advances made in this field (including for instance how having a grammar, as opposed to just stringing words related to what you want to say at random, is an evolutionary advantage because it boosts communicative success). It's published as open access, so go grab your free copy! :)
Chapter 4 in particular has a very interesting discussion of what's problematic with the machine learning approach -- that it takes a lot of training examples for a classifier to start making interesting decisions -- and presents a selectionist alternative to that, where distinctions (as in e.g. nodes in decision trees) are grown randomly and they're reinforced / pruned based on feedback. Crucially, the categories (semantic distinctions) are not labels given at the outset, but they emerge along with the language, based on the environment the agents encounter and the tasks they're using language for.
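To make that concrete, here is a toy reconstruction of such a discrimination game (my sketch, not Steels' actual code): distinctions are random splits on a perceptual channel, a split that manages to single out the topic object from the rest of the scene is reinforced, and everything else decays and gets pruned.

    # Toy selectionist discrimination game (my reconstruction, not Steels' code):
    # random feature splits are proposed, splits that single out the topic
    # object from the rest of the scene are reinforced, the rest decay away.
    import random

    class Distinction:
        def __init__(self, channel, threshold):
            self.channel = channel      # which perceptual feature to look at
            self.threshold = threshold  # where to split that feature
            self.score = 0.0

        def category(self, obj):
            return obj[self.channel] >= self.threshold

    def discrimination_game(repertoire, scene, topic, decay=0.05, reward=1.0):
        """One round: find a distinction that puts `topic` in a category of its own."""
        winner = None
        for d in repertoire:
            topic_side = d.category(topic)
            if all(d.category(o) != topic_side for o in scene if o is not topic):
                winner = d
                break
        if winner is not None:
            winner.score += reward                       # reinforce success
        else:
            channel = random.randrange(len(topic))       # grow a new random split
            repertoire.append(Distinction(channel, random.random()))
        for d in repertoire:
            d.score -= decay                             # everything decays a bit
        repertoire[:] = [d for d in repertoire if d.score > -1.0]  # prune dead weight

    # Objects are just tuples of feature values in [0, 1].
    repertoire = []
    for _ in range(500):
        scene = [tuple(random.random() for _ in range(3)) for _ in range(4)]
        discrimination_game(repertoire, scene, topic=scene[0])
    print(len(repertoire), "distinctions survive after 500 games")

The point of the selectionist setup is that the repertoire of categories is not fixed in advance; it grows and shrinks with the discriminations the environment actually demands.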
In general, I'd recommend Chapters 1 and 2 for a quick introduction, but in a pinch, I attempted to give a 50,000-foot summary in an essay I wrote (look under the heading Evolutionary Linguistics):
I realize that engineering applications of these ideas might be a long way off (and perhaps they'll never materialize), but boy are these exciting discoveries about the very fabric of language :)
If you could simply program a human's behavior into a machine, that would be fine, but humans aren't capable of encoding their neural circuitry in a programming language; in fact, the combined efforts of all humans have only begun to shed light on what human behavior is. As such, generating formalized information (that describes behavior) requires some process other than human hands to do it -- and in the case of a machine that interacts with the world, "Turing machine" is not a complete description of any such process.
By Shannon's information theory, everything is bits of information. Who writes the initial bits on the tape is completely irrelevant to the mechanism of a Turing machine. For all I care, the world is the tape and the computer is the head being moved through it. It's the old fallacy of seeing the brain as a computer, since we've built computers after structures from our brains. Hence I wondered what more there should be.
> you end up with something which is not particularly different or better than a human cyborg.
Even if that were true, which I am not sure about, such a robot will have a very different moral status, as an artifact, and one that can be reproduced cheaply and indefinitely. This is very useful indeed, so on this axis it could be counted as 'better'.
The idea of development being important for AI is an old one, but it hasn't had much concrete success. Brooks' Cog robot at MIT is a prominent example of a robot that didn't do very much, despite this approach being taken in a good-faith effort by talented and well-supported people.
Humans seem to be created cheaply and indefinitely too. Population is growing out of control. We could attach them to pods and harvest energy from their souls!
Here is the reason why I don't believe in that: first of all, it seems that mind space is huge, i.e. there are very different programs that can lead to general intelligence, many of which will be very different from humans (this alone is evidenced by how strongly human characters and intellects vary, e.g. highly functioning mental conditions). A lot in machine learning points to that possibility. There are, for example, many ways to get supervision signals: reconstruction error, prediction error, adversaries, intrinsic motivation (reducing the number of bits required for a representation), compression, sparseness, etc.
We basically just need a system that comes up with efficient representations of the world such that it can reason about it, i.e. such that it can tell you which hypotheses about the world are likely true given some data. This computation allows you to make predictions, and predictions are really at the heart of intelligence. If you can follow a hypothetical trajectory of generated, hallucinated or simulated samples of reality, i.e. samples that likely correspond to what actually happens in the world (and in the agent's own brain), then you can perform actions that are targeted at some purpose (e.g. maximizing reward signals).
However, there are many sources of data that essentially give you the same information. Whether you create a representation by directly interacting with the world or just watch many examples of how the world generally evolves over time and how different entities interact with one another, you essentially get the same idea about how the world works (except that in the first case you also learn a motor system).
I think the anthropomorphism is really misplaced here, because computer systems are not dependent on actually performing in the real world. Since computers have near-unlimited, noiseless memory and super-fast access to it, they can plan interactions by careful reasoning on the fly; they don't need to first learn motor skills for manipulating objects, eating and handwriting before they can get anywhere near the access to reliable external memory that computers already have. A computer system also does not have hormone and neuromodulator levels that need to be met for healthy development (e.g. dopamine), so the intuition that deprivation of interaction with the world prevents learning is extremely misleading.
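As a toy illustration of getting a supervision signal from prediction error alone (my example, not drawn from any particular system): a character bigram model fit by counting, judged purely by how badly it predicts held-out text.

    # Toy illustration (my example): "prediction error" as the only supervision
    # signal. A character bigram model is fit by counting, and its quality is
    # measured purely by cross-entropy (bits per character) on held-out text.
    import math
    from collections import Counter, defaultdict

    def train_bigram(text):
        counts = defaultdict(Counter)
        for prev, nxt in zip(text, text[1:]):
            counts[prev][nxt] += 1
        return counts

    def bits_per_char(counts, text, vocab_size=256):
        """Average negative log2-probability of each next character."""
        total = 0.0
        for prev, nxt in zip(text, text[1:]):
            c = counts[prev]
            # Laplace smoothing so unseen transitions don't blow up to infinity.
            p = (c[nxt] + 1) / (sum(c.values()) + vocab_size)
            total -= math.log2(p)
        return total / (len(text) - 1)

    train = "the cat sat on the mat. the dog sat on the log. " * 20
    model = train_bigram(train)
    print("prediction error:",
          round(bits_per_char(model, "the dog sat on the mat."), 2), "bits/char")

The only feedback the model ever gets is its own prediction error on text; nothing in the loop requires interaction with the physical world.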
> They need bodies that are biologically connected with the rest of the biosphere
On what basis do you conclude that "biologically" is important here? There may be some reason to suspect that human-like intelligence requires human-like ability to sense and interact with the outside world, but I see less reason to suspect that it is important that the mechanism of the sensors or manipulators must be biological.
In fact it is a very hard engineering problem, which doesn't exclude a "philosophical" one. There's a line of research surrounding anthropomimetic robots with the specific task of studying cognition, going back at least a decade.
Helen Keller's life argues against the proposition that a machine requires the same sort of interaction with its environment that the average human experiences, before it can achieve intelligence.
She was blind and deaf, but she still had enormous amounts of tactile information and the ability to physically interact with her environment. Moreover, those sensory elements were critical for her to finally be able to start learning language--she finally caught on that the signs another person was making in one hand represented the water being run over her other hand. And she didn't make that breakthrough until she was 7 years old, and only then began to learn with the persistent help of an instructor.[0]
I would say that Helen Keller's life argues that an intelligent machine must be able to have experiences and the capacity to associate experiences with language. The machine probably doesn't need all of the perceptual modalities that we have, as Helen Keller demonstrated, but it should probably have some similarities with our own so that there would be common ground for initiating communication about experiences. A machine with just a text-in / text-out interface has nothing in common with us.
So Keller's "enormous amounts of tactile information and the ability to physically interact with her environment" means that her achievements were not that remarkable? Speaking personally, if I were to lose my sight and hearing, I imagine I would find life extremely daunting, even though I have what seems to me the very considerable advantage of having learned language.
Anne Sullivan's equally remarkable (IMHO) role as a teacher is not really an issue here, as training is also an option for AI, though it might be evidence in a rather different discussion about whether unsupervised learning alone, particularly as practiced today, is likely to get us to AI (clearly, the evolution of intelligence can be cast as unsupervised learning, but that is a very long and uncertain process...)
> This isn't an engineering problem, it's a philosophical problem: We are blind to most things. We can only see what we can relate to personally. We can use language and other symbol systems to expand our understanding by permuting and recombining our personal experiences, but everything is grounded in our interactive developmental trajectory.
The kitten-in-a-cart experiment demonstrates this clearly: http://io9.gizmodo.com/the-seriously-creepy-two-kitten-exper... Interaction is crucial for perception. Sensation is not experience.
> And here's the rub: Once you give an AI an animal-like personal developmental trajectory to use for grounding the semantics of their symbol systems, you end up with something which is not particularly different or better than a human cyborg.