If you enjoyed his blog posts, I highly recommend watching his talk on "Automated Image Captioning with ConvNets and Recurrent Nets". In it he raises many interesting points that he hasn't had a chance to cover fully in his articles.
He humbly says that his captioning work is just stacking image recognition (CNN) onto sentence generation (RNN), with the gradients effectively influencing the two to work together. Given that we now have powerful enough machines, I think we'll be seeing a lot of stacking of previously separate models, either to improve performance or to perform multi-task learning. A very simple concept, but one that can still be applied to many other fields of interest.
One of the earliest: "Parsing Natural Scenes and Natural Language with Recursive Neural Networks" http://nlp.stanford.edu/pubs/SocherLinNgManning_ICML2011.pdf
yup. this is the first time I understood someone from this field. Honestly, this dude just broke down the wall.
What's more, passion flows through his writing, and it can be felt. I got so excited while reading it.
As a bonus, there's an ongoing class on deep learning architectures for NLP which covers Recurrent (and Recursive) Neural Nets in depth (as well as LSTMs and GRUs). Check out cs224d.stanford.edu for lecture notes and materials. The lectures are definitely being recorded, but I don't think they're publicly available yet.
His username is badmephisto if you're interested.
(a) He seems to be very intelligent. Kudos. But…
(b) How good of an idea is it really to create software with these abilities? We're already making machines that can do most things that had once been exclusive to humans. Pretty soon we'll be completely obsolete. Is that REALLY a good idea? To create "face detectors" (his words!)?
Our relevance is ephemeral, but our influence will be lasting. Do we want to have a legacy of clinging to our personal feelings of importance, or of embracing the transience of our existence and nurturing our (intellectual) progeny?
Assignments 1 and 2 alone give a solid intro to implementing these algorithms, and the lab-oriented IPython-based format gives you a very high probability of writing a correct implementation even if you're clueless at the start.
I wonder if all that's missing is just a few more layers, and another source of input. Maybe a list of requirements/output/input matched with the code so it understands why what was written was written. I wonder what would happen if you ran the program, took the output, and fed it back in as input.
Really cool stuff here.
As an exercise, when I think of the word "circle", images of circles and spheres show up in my head. Also the equation of a circle. My quick definition of it would be "a perfectly round object" which leads to questions of what "round" and "perfect" mean. The more I think about it, all my knowledge seems quite circular in that there are no axiomatic concepts, everything is relative and it just builds on itself. I wonder if that's the key to decipher meaning, increase the connections of the web -- with strong enough references you can pinpoint which of the nodes in the web something refers to.
In the case of this article, the NN isn't being asked to do any abstract task like "decipher meaning", but the very concrete task of "predict the next word". As the article shows, NNs can do this fairly well.
There is also evidence that they can learn very high level knowledge about words and objects. See the success of word vectors: http://technology.stitchfix.com/blog/2015/03/11/word-is-wort...
There seems to be some evidence that this stuff is fairly central to human intelligence and that the ability to visualize in 3D is kind of hard-wired. Deciphering meaning is approximately "seeing what it means", which can correspond to visualizing it in your head. For example, "the cat sat on the mat" is a bunch of symbols, but if someone or some machine can convert that to an image of a cat sitting on a mat, then I guess they've understood it.
As one of the other commenters pointed out - it is like a tree (words/concepts) branching out from one another. I would be fascinated by seeing if this research can be continued into adulthood, where the individual "concepts" aren't as important as the interplay between them.
I once asked a similar question on an online forum where many linguists hung out. My question was: if an English-only speaking household left a general-interest Spanish-language TV station on most of the time when they weren't actively using the TV to watch something, so that their child received very large exposure to Spanish-language programming (news, sports, soap operas, sitcoms, movies, etc.) from birth onward, would the child naturally learn Spanish?
I don't recall for sure what the linguists who responded said, but I think they all said the child would not learn Spanish from this.
 I have no recollection of where this was.
If the child actually watches the Spanish TV, he will learn the language.
EDIT: Even now I often learn new Japanese words (and remember them) just by watching anime. The difference is that now I have English subtitles, but back then I had no subtitles, only the images to help me understand the meaning.
Not knowing basically anything about the state of the art in AI: what stops us from feeding an RNN image data and text data and making it correlate them automatically by context? Just like a child learns words by hearing them many times in similar contexts, so could an RNN.
I imagine the biggest problem is gathering and structuring the data. We humans receive lots of data and have lots of time over our lives to process it by comparison. And by lots I mean a difference of a few orders of magnitude. It's amazing what this thing learns in just a few hours of processing.
I've picked up quite a bit of Russian by watching Discovery channel this way.
The methods involve providing more detailed feedback on each example. With most training data used now, we give a 0 or 1: does this example belong to this class? In the teacher networks, they were able to teach with more subtlety: this is definitely not a car, it is very lizard-like and a little snake-like.
 - https://www.youtube.com/watch?v=EK61htlw8hY
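For the curious, a rough numpy sketch of what that richer feedback might look like: the student trains against graded class probabilities from a teacher instead of a one-hot label. The class list and all the probability values here are illustrative assumptions on my part, not from the talk.

    import numpy as np

    # Hypothetical soft-label feedback in the spirit of teacher networks:
    # instead of the one-hot "this is a lizard", the teacher says
    # "very lizard-like, a little snake-like, definitely not a car".
    classes = ["car", "lizard", "snake", "dog"]
    hard_target = np.array([0.00, 1.00, 0.00, 0.00])
    soft_target = np.array([0.01, 0.80, 0.15, 0.04])

    def cross_entropy(target, predicted, eps=1e-12):
        # loss the student would minimize against either kind of target
        return -np.sum(target * np.log(predicted + eps))

    student = np.array([0.05, 0.60, 0.25, 0.10])
    print(cross_entropy(hard_target, student))  # feedback from a 0/1 label
    print(cross_entropy(soft_target, student))  # richer feedback from the teacher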
Although for obvious reasons this is very hard to study experimentally:
Seems it would be far harder to infer the basic initial structure from just plain text.
Put another way there is nothing magical about a child learning about the world. A child's brain is just a large neural network being fed patterned data over the course of many years by a variety of extremely high resolution analog sensors. Eventually the child begins to respond to the patterns.
Second, the 3D topology of a neuron is more complex than its reduction to an FP32 activation threshold (all IMO, of course).
Finally, I have to admit as a former biologist, I'm intrigued by microtubule activity and it seems like Dileep George and even Geoffrey Hinton are heading towards smarter but fewer neurons as opposed to just increasing the neuron count. Not surprisingly, the deep learning digerati are resisting this notion mightily just like the SVM peeps harped on neural networks until they kicked them in the keester.
TLDR: It's still early, and I'm biased that there are some interesting twists and turns yet to unfold here.
If you can computationally define how different common neurotransmitters affect the function of neurons at a broad, high level, then you can create your "psychoactive drug" by just writing a routine that excessively applies the function that those neurotransmitters represent.
An artificial serotonin reuptake inhibitor would just allow the serotonin-like activity to be more active in the model.
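To make that concrete, here's a toy sketch; the `serotonin_level` gain and where it's applied are made-up modeling choices, just to show how "excessively applying the function" could literally be a one-line routine:

    import numpy as np

    def neuron_layer(x, weights, serotonin_level=1.0):
        # Toy model: a global "serotonin-like" gain scales how readily
        # the simulated neurons respond to their inputs.
        return np.tanh(serotonin_level * (weights @ x))

    x = np.random.randn(8)
    w = np.random.randn(4, 8)
    baseline = neuron_layer(x, w)                         # normal activity
    ssri_like = neuron_layer(x, w, serotonin_level=1.5)   # activity stays elevated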
Seems a bit early to jump to the conclusion that we understand cognition. We don't. I agree that there is nothing exotic or metaphysical about brain meat, but really we're still feeling around in the dark with respect to how thinking occurs.
I'm confident that we'll get there eventually though.
mimicking the brain's power-consumption-to-compute-power ratio is difficult, if not impossible, with today's technology.
An aside: since reading an article about the potential role of quantum mechanics in photosynthesis, I've wondered, as a layperson, whether quantum mechanics plays a role in human cognition.
Why not? Your brain isn't magic, just highly associative. We can do the same thing with computers real soon now.
Haven't people been saying this for decades? AI has a long history of impressive results, but somehow none of them have actually produced "thought".
Nobody even understands how the brain "thinks" at a neural level, let alone how to model that. All we can do at this point is try different models (which may or may not actually match reality) and hope we find one that works. But there's no evidence that we'll find a working model "real soon now". Impressive results that we can kinda-sorta imagine being the product of an intelligent system haven't historically been enough.
A handful of years ago I put together a fully loaded computer that gave me 1 teraflop of computing power.
Today I can put together a computer the same size that will give me 32 to 50 teraflops of programmable computing power.
Many of the "AI" advances since 2007 are just running old 1970s-1990s AI algorithms on faster and faster and more parallel hardware. If you have to train a model for a few hundred trillion instructions, but your CPU only does 20 operations per second (and you have to share it with 1,000 other people), you can't iterate your science fast enough to make progress. Now we can iterate our science almost too quickly.
> how the brain "thinks" at a neural level,
Planes don't fly like birds. Birds don't fly like bees. True AI doesn't have to replicate mammalian (or avian or reptilian) neural topology.
I also agree that AI will never be "human" (i.e. it will be different), however without understanding how the human brain works, what chances do we have to create AI?
And we have yet to crack that nut. We have yet to understand even high-level stuff in detail, like how information is flowing from short-term memory to long-term and how we forget and why we do that (i.e. forgetfulness is surely an evolutionary trait). A brain is also fascinating in how it recovers from serious strokes by re-purposing brain structures. We have yet to produce software that is that sophisticated. And we don't even understand the brain from a biological perspective yet.
Surely huge progress has been made, but on the other hand we may still be hundreds of years away, and there's a very real possibility that we lack the intellectual capability, or maybe the resources, to do it (we have a history of settling for lesser solutions when we stop seeing financial benefits, like with space exploration).
Turing-complete platforms are universal simulators. There's nothing they can't represent.
> like how information is flowing from short-term memory to long-term
Sure, we know that. The little seahorse helps out.
> re-purposing brain structures
rudimentary artificial neural nets do the same thing. they also self-specialize automatically with no innate programming (line detectors, edge detectors, eye detectors, cat detectors, all the way up—automatically).
> we may still be hundreds of years away
lol. nope. gotta think exponentially.
> lesser solutions if we stop seeing financial benefits, like with space exploration
can't do space exploration without the approval of a nation-state. can do AI tomfoolery in your own basement with nobody else finding out until it's too late.
No existing computer is a universal Turing machine. The infinite RAM requirement is pretty hard to implement in practice.
And yet it is less limiting and more efficient than pretty much all analog computing devices we have built. I don't think the hardware is the issue anymore, I suspect that with the right models and training we can have thinking machines.
I could have misinterpreted their work, though, as I'm far from an expert, but that's what it sounded like to me.
I think what the parent is trying to say is not that it's easy (it's not) but that there is nothing, in principle, to stop us from writing a program that acts like a brain.
It's also not pure algorithm, it's a physical entity, tangible and with real world properties and interactions.
Who said (or proved) it's just an information processing device?
So are computers.
Whether that's the case in human cognition remains to be shown (else we're taking for granted what we're trying to prove).
That's not correct.
> The physical properties of the computer don't matter in this regard.
> You could do exactly what a computer does with pen and paper (it would just take a much longer time).
Yes, and that time matters greatly as it's the difference between practical and hypothetical. Beyond that, programs that can evolve their hardware have been shown to come up with optimizations no human could have created and thus the physical properties of the computer do matter.
If we can bridge the simulated world to our world then we can interact with it.
Being in different worlds does not imply that it can never reach consciousness (among other properties). To imply that is invoking magic.
Anything from our world can be simulated.
To be literally "as real in its world (as we in ours)" several things need to happen:
1) its world should be a 1-1 simulated mapping of our world. Perhaps not to its whole extent (e.g. not the whole universe), but to ANY extent that affects the final result.
2) its world should have randomness equivalent to the quality of randomness (not sure if it's perfect) that our world has.
As for "Anything from our world can be simulated" -- that's a bold claim, provided that we haven't simulated ANYTHING at all yet, to the degree of interactions and complexity that exist in our world.
When we simulate the behavior of water in a fluids physics simulation, or the behavior of planets etc, it's amazing how much stuff we leave out. Our simulations are to a full-blown simulation what South Park cut-outs are to a photograph.
Besides, this notion reminds me of the naive 19th century ideas, that they could predict the course of the universe if only they had the details (motion, momentum, weight, etc) of all objects and the capacity to calculate their interactions. QM put a hole in that.
As for 2), likewise: randomness isn't a requirement. You're arbitrarily picking one quality and saying that quality has to be identical for it to be real. Why? I don't believe that for a second.
I'm fully aware of the simplifications of simulation... being simplified compared to an external universe does not change the premise of it being real to its inhabitants.
Quantum Mechanics does not say that the universe is not mechanistic, just that there is a random element (that in itself may ultimately be modelled).
It has, if it has to be "moisture" and also to be "just as real".
Else, you can define as "moisture" any parameter in the simulation (since it can be "whatever it likes").
E.g. the property of being "alive" in Conway's Game of Life.
In what sense will that be a simulation of "moisture" and "just as real" inside the simulation as moisture is to us?
A child knows that if it says "Mama food," it is likely to get attention, and if it gets attention, it is likely to minimize its hunger. Right now, a neural network can be trained to know that "Mama" occurs often in human dialogue, what words occur around it, even its dictionary definition and images of mothers. But it's not making the deeper connection to a strategy that minimizes hunger.
When I think about this, I wonder if insights from the world of gaming "AI" would be useful in developing the training datasets for real AI. Because you can't be a mother to a billion virtual babies, but you might be able to program a set of heuristics to be a mother to a billion virtual babies. Then you have some system that trains on their life experiences...? All speculation, but very interesting stuff.
See any of the recent papers from Google DeepMind, such as  or their most recent work which is startlingly good 
The problem is simulations of the brain are not "machines", they are algorithms, e.g. they assume everything is happening at the information processing level.
To use your own example, we can design an algorithm to simulate making coffee. But the algorithm can never make coffee -- unless it's fitted and connected to a coffee making apparatus.
Or take something being "wet" for example. We can emulate the motions and powers in play in liquids, but not "wetness" in the sense of the physical property (moisture etc). If something depends on it, e.g. the emulation actually watering some actual flowers, then it will fail. An emulation can only water emulated flowers.
Simulations are executed on concrete machines that exist in the real world. Algorithms are abstract concepts.
> e.g. they assume everything is happening at the information processing level.
Everything does happen at the information processing level. Any kind of physical process can be seen as a type of information processing. Information processing is not an abstract concept like an algorithm, for it to occur requires the time-evolution of concrete physical processes.
> We can emulate the motions and powers in play in liquids, but not "wetness" in the sense of the physical property (moisture etc).
The physical property is experienced as sensory input. Machines can have sensory input.
> An emulation can only water emulated flowers.
You are asserting that virtual reality is different from reality, which is true. That's not the GP's question. The question is whether there is a fundamental difference between machines in the real world (with sensors and arms and so on) and the human body and brain.
This is pure philosophy, as no one yet knows the answers, but what if brain-like intelligence is an emergent property of non-deterministic processes? Wouldn't it then follow that a classical computer would not be able to compute the "think function" before the heat death of the universe?
Personally, my intuition says that strong AI cannot be encoded in silicon, or that it is a victim of the halting problem. I think we need a different substrate on which to model cognition. Or maybe not. Who knows?
That's an irrational and indefensible position.
Ah, so strong AI is finally here. A computer program that makes just the same mistakes as humans when writing in TeX.
I believe Markov chains as a model quickly become inefficient (especially memory-wise) as you increase the complexity (long-range correlations) of your prediction. It's an unnecessarily restrictive model for high-complexity behavior that state-of-the-art RNNs skip entirely.
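A quick sketch of why, assuming a toy k-order character model (the `input.txt` filename is just a placeholder): the context table can grow toward |alphabet|^k entries as you lengthen the correlations you want to capture.

    from collections import defaultdict, Counter

    def train_markov(text, k):
        # k-order character Markov model: next-char counts per k-char context
        model = defaultdict(Counter)
        for i in range(len(text) - k):
            model[text[i:i+k]][text[i+k]] += 1
        return model

    text = open("input.txt").read()
    for k in (1, 3, 5, 7):
        # the number of stored contexts grows rapidly with k
        print(k, len(train_markov(text, k)))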
If your prediction is good enough that you can always come up with two possible predictions for each character, each of which has a 50% chance of being correct, then obviously you can compress your input down to one bit per character by storing just enough information to tell you which choice to pick. More generally, you can use arithmetic coding to do the same thing with an arbitrary set of letter probabilities, which is exactly what you get as the output of a neural network.
When the blog post says the model achieved a performance of "1.57 bits per character", that's just another way of saying "if we used the neural network as a compressor, this is how well it would perform."
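In code, the equivalence is just cross-entropy in base 2; a minimal sketch (the probability values are made up for illustration):

    import numpy as np

    def bits_per_character(p_actual):
        # average -log2 of the probability the model assigned to each
        # character that actually occurred; an arithmetic coder driven
        # by the same model would compress to roughly this size
        return float(np.mean(-np.log2(np.asarray(p_actual))))

    print(bits_per_character([0.5, 0.5, 0.5]))        # 1.0, the two-choice case
    print(bits_per_character([0.9, 0.2, 0.4, 0.33]))  # ~1.35 bits per character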
Ther deat is more; for in thers that undiscorns the unwortune,
the pangs against a life, the law's we know no trave, the hear,
thers thus pause.
Here's some character-by-character output from the same Markov Chain model.
T,omotsuo ait pw,, l f,s teo efoat t hoy tha fm nwo
bs rs a h enwcbr lwntikh wqmaohaaer ah es aer
do'ltyuntos sih i etsoatbrbdl
maybe the computer was drunk?
Second, even if it was, really? As if we see plays on Kundera titles regularly on the web?
I was thinking that both Eugene Wigner's 1960 article 'The Unreasonable Effectiveness of Mathematics in the Natural Sciences' and Karpathy's 'The Unreasonable Effectiveness of Recurrent Neural Networks' probably touch deep aspects of the nature of existence. The first on why the universe exists and is mathematical - because at the fundamental level it is mathematical, and in Karpathy's case the RNNs are probably effective because they are close to the mechanisms of human consciousness.
 Wigner's article: http://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.html
 'physical world is completely mathematical' theory: http://en.wikipedia.org/wiki/The_Unreasonable_Effectiveness_...
It's an excellent read for anyone interested in learning about recurrent neural networks.
Nitpick: although tty == tty is, as you say, vacuously true in this case, that's just because tty is a pointer. If tty were a float, this wouldn't be the case, since it could be NaN. I wouldn't be surprised if it learned to test a variable for equality against itself from some floating point code.
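For anyone who hasn't run into it, the float case is easy to demo (Python here, but IEEE 754 behaves the same way in C):

    nan = float("nan")
    print(nan == nan)  # False: NaN compares unequal to itself (IEEE 754)
    print(1.0 == 1.0)  # True for any non-NaN value
    # so `x == x` is a legitimate, if cryptic, NaN test in float code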
It would drive those who attempt to understand & reference it absolutely crazy. :D
This kind of demo shows that deep neural networks can capture the structure of language, if not the semantics, in a very general way. And we have separate evidence that they can (in principle) capture semantic meaning and algorithmic reasoning as well, for example: http://arxiv.org/pdf/1410.5401v2.pdf (the "neural Turing machines" paper from DeepMind)
(And I mean a plain Markov chain, not something with additional logic that understands code structure.)
The comment by samizdatum shows pretty well how Markov chains perform without some tweaking.
Take this example of code processing, and instead front it with a parser that generates an AST. For now, an actual parser for a single language. Maybe later, a network trained to be a parser. The AST is then fed to our network. What could we get out of the AST network? Could we get interesting static analysis out of it? Tell us the time and/or space complexity? Perhaps we discover that we need other layers to perform certain tasks.
This, of course, has parallels in language processing. Humans don't just go in a single (neural) step from excitation of inner ear cells ("sound") directly to "meaning". Cog sci and linguistics work has broken out a number of discrete functions of language processing. Some have been derived via experiment, some observed via individuals with brain lesions, others worked out by studies of children and adult language learners. These "layers" provide their own information and inspiration for building deep learning systems.
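As a rough sketch of the "front it with a parser" idea, using Python's own ast module; flattening the tree into a node-type sequence is just one illustrative encoding, not a claim about what the right input representation would be:

    import ast

    source = "def f(xs):\n    return sum(x * x for x in xs)\n"

    tree = ast.parse(source)
    # flatten the AST into a node-type sequence, one plausible encoding
    # to feed a downstream network instead of raw characters
    print([type(node).__name__ for node in ast.walk(tree)])
    # ['Module', 'FunctionDef', 'arguments', ...]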
There is no need for it to produce readable code; readability helps humans, but computers have no problem generating and subsequently understanding unreadable assembly.
For a brief while RNN-NADE made an appearance as well, though I do not know of an open source implementation.
There are also a few of us who are working on more advanced versions of this model for speech synthesis, versus operating on the MIDI sequence. Stay tuned in the near future!
I can say from experience that some of the samples from the LSTM-DBN are shockingly cool, and drove me to spend about a week using K-means coded speech. It made robo-voices at least but our research moved past that pretty fast.
You can make money out of that kind of thing btw!
(Obviously not the same thing but the point is that silly robo-voice code is marketable :)
Here's a Bach-inspired computer-generated song:
With neural nets NOBODY really understands how they work.
Then again, this is essentially black magic to me:
However, you won't be able to understand why or how it works. That also means you won't be able to modify/improve/fix it using systematic methods. Only trial and error and it will be 'error' most of the time.
They will if it gives the answers they want to hear. History is full of critical decisions based on ridiculous pretexts or unclear processes.
That's the (morally neutral) wonder of the market--it'll beat ideological or emotional objections into the ground, for better or for worse.
And sooner or later, someone might start a company where all decision making is performed by a neural net...
Eliezer Yudkowsky would likely disagree with you: http://www.yudkowsky.net/singularity/aibox
EDIT: Also - http://www.explainxkcd.com/wiki/index.php/1450:_AI-Box_Exper...
" Never! Companies would never sacrifice principle and safety to save money! "
People are always worried about "computers taking factory jobs" resulting in mass unemployment, but the truth is, a rudimentary AI with acceptance tests on output will obsolete every programmer alive.
Hell, half the programming people do these days is just gluing APIs together then seeing if it actually works. It doesn't take 16 years of rich inner human life experience to accomplish that, just exhaustive combinatorial parameter searching on the subset of API interactions you're interested in evaluating.
I'm all for it, it's going to be a productivity gain. It's like going from a manual screwdriver to a motorized one.
With ngrams, Markov models are perfectly sufficient. With individual characters, complex concepts need to be remembered across many, many characters of input.
Deep learning has made great strides in recent years, but I don't think architectures which aren't recurrent will ever give rise to mammalian "thought". In my opinion, thought is equivalent to state, and feed forward networks do not have immediate state. Not in any relevant sense. Therefore they can never have thought.
RNNs, on the other hand, do have state, and therefore are a real step towards building machines that possess the capacity to think. That said, modern deep learning architectures based around feed forward networks are still very important. They aren't thinking machines, but they are helping us to build all those important pre-processing filters mammalian brains have (e.g. the visual cortex). This means we won't have to copy the mammalian versions, which would be rather tedious. We can just "learn" a V1, V2, etc from scratch. Wonderful. And they'll be helpful for building machines with senses different than biology has yet evolved. But, again, these feed forward networks won't lead to thought.
My second musing is where I think the next leap in machine learning will occur. To date, efforts have been focused on how to build algorithms that optimize the NN architecture (i.e. optimize weights, biases, etc). But mammalian brains seem to possess the ability to problem solve on the fly, far faster than I imagine tweaks to architecture could account for. We solve problems in-thought, rather than in-architecture; we think through a problem. Machine learning doesn't possess this ability. It can only learn by torturing its architecture.
So, I believe there is this distinction to the learning that mammalian brains are able to do on the fly, using just their thoughts, and the learning they do long term by adjusting synaptic connections/response. It seems as if they solve a problem in the short term, and then store the way they solved it in the underlying architecture over the long term. Tweaking the architecture then makes solving similar problems in the future easier. The synaptic weights lead to what we call intuition, understanding, and wisdom. They make it so we don't have to think about a class of problems; we just know the solutions without thought. (Note how I say class of problems; this isn't just long term memory).
Along those lines, I come to my final musing: that mammalian brains are motivated by optimization of energy expenditure. Like anything in biologically evolved systems, energy efficiency is key, since food is often scarce. So why wouldn't brains also be motivated to be energy efficient? To that end, I believe tweaking synaptic weights, the kind of learning that machine learning does so well, is a result of the brain trying to reduce energy expenditure. Thoughts are expensive. Any time you have a thought running through your brain, it has some neuronal activity associated with it. That activity costs energy. So minimizing the amount we have to think on a day-to-day basis is important. And that, again, is where architecture changes come in. They are not the basis for learning; they are the basis for making future problem solving more efficient. Like I said, once a class of problems has been carved into your synaptic weights, you no longer have to think about that class of problems. The solutions come immediately. You don't think about walking; you just do it. But when you were a baby, I'll bet the bank that your young mind thought about walking a lot. Eventually all the mechanics of it were carved into your brain's architecture and now it requires many orders of magnitude less energy expenditure by your brain to walk.
So, the obvious question is: how do mammalian brains problem solve using just thoughts? The answer to that, as I mentioned, is likely to lead to the next leap in machine learning. And it will, more likely than not, come from research on RNNs. What we need to do is find a way to train RNNs that are able to adapt to new problems immediately without tweaking their weights (which should be a slower, longer term process).
P.S. Yes, I know this was probably a bit off-topic and quite a bit wandering. I've had these musings percolating for a while and don't really have an outlet for them at the moment. I hope it's on topic enough, and at least stimulates some interesting discussion. Machine learning is fascinating.
That doesn't square with empirical reality. Evolved biological systems appear to be optimized for robustness to perturbations, not efficiency (John Doyle argues that there is in fact a fundamental tradeoff between robustness and efficiency, for all types of complex systems not just biological).
> how do mammalian brains problem solve using just thoughts.
They don't. Sensory input is required for brains to learn new classes of problems.
> find a way to train RNNs that are able to adapt to new problems
Is this something different than multi-task learning?
Sensory input is required to gain the knowledge, but then you can just as easily muse over your gained knowledge for further insights in a sensory deprivation chamber as you can in a classroom.
Feed-forward networks do have state, but all the useful parts are obtained through explicit training (ye olde backprop, ye older Hebbian). The typical scenario is "train model (write mode), deploy model (read-only mode)," which, as you point out, has no "thought" since at runtime no changes or introspections are happening.
> So therefore they can never have thought.
The key idea here would be: generative models. Most current AI fads are driven by discriminative models (image recognition, speech recognition, etc) which provide very narrow "faster than human" output, but, as you point out, have no thought or will or motives of their own.
But, once you have a sufficiently connected network, you can start to ask it open-ended questions ("draw a cat for me") in the form of sampling from the network (Gibbs sampling, MCMC, ...) and it fills in the blanks.
The extra oomph of providing actual agency and intent and desire to the model is an exercise left to the reader.
> (which should be a slower, longer term process).
Sleep is a requirement of all things with neural network based brains as far as we know.
Suri and Schultz argue that dopamine in the mammalian brain follows the "reward prediction error" from Reinforcement Learning [doi:10.1016/S0306-4522(98)00697-6]. (Indeed the DQN paper mentions dopamine in the very first paragraph.)
Because of this, I am very excited about DQN. (I do think that it's only a building block towards building a self-aware brain, though.)
1) Take the entire works of several popular content creators in a given field, complete with links out to articles etc.
2) Concatenate them into a single file
3) Train this thing to generate new articles
4) Create a map of popular articles that other people have written, to articles you have written on similar topics
5) Replace the originals with your articles
6) Publish millions of articles that can't be detected as spam automatically by Google
It's like bot wars: Spammers can train their robots to try and defeat Google's robots.
i mean, it's not like that's exactly what's happening right now.
I then buy, say, 1,000 domains. Doesn't matter what they are -- or I buy 100 domains and set up 300 Tumblr blogs, 300 Blogger blogs, and 300 wordpress.com blogs.
Now I drip feed content to each of those blogs, but instead of linking to the articles on content marketing that kissmetrics and neil patel originally reference, I link to articles I have created instead.
How can Google tell the difference between a tonne of nobody bloggers linking to Neil Patel's articles, and my bots linking to my articles? The fact is that if you blog on niche topics, with good article titles reflecting low-competition long-tail keywords, you'll get some traffic from Google pretty easily -- how can Google possibly tell that links are coming from shitty bot-generated pages versus from a tonne of obscure bloggers with virtually no audiences (of which there are thousands)?
The way they can tell the difference is Panda (or Penguin? I think it's Panda ... ). So as long as your pet robot can learn from Neil Patel and Kissmetrics well enough to produce content that cannot be penalised by Panda, and so long as you don't do it stupidly (like having the same anchor text for all the articles, or posting 1,000 articles overnight) and instead phase it in so that it looks as though you're getting a reasonable organic spread, you'll be able to game Google's rankings pretty reliably for the real articles you're trying to promote. You'd get higher volumes of traffic to those articles than you would by just focusing on niche, long-tail articles, for example because you'd be able to get on page #1 or in the top 5 for much higher volume keywords.
You would then get shares etc. for your actual content -- just because those "spam farms" don't have social shares or backlinks from PR6 blogs doesn't mean Google completely disregards them; it just means that you need a lot more of them to make the same impact as lots of shares/backlinks from PR6 blogs.
This strategy is old, and was killed by Panda, but if you could beat Panda using a RNN then this would work again.
An AI is a computer doing those things a computer cannot do. As such, anything that a computer cannot do isn't AI, and anything a computer can do isn't AI either.
Writing a program to play chess is not AI, but doing so has helped us figure out learning.
The form of the title has become a common trope.
Might there be properties of our biological brain that silicon can't capture? Is this related to the concept of computability? I'm not suggesting that there is a spiritual or metaphysical component to thinking. I'm not, I'm a materialist through and through. I just wonder if maybe there is some component of non-deterministic behavior occurring inside a brain that our current silicon-based computing does not capture.
Another way to ask this is will we need to incorporate some form of wetware to achieve strong AI?
Most researchers believe that brains are Turing machine equivalent, therefore can be simulated by any other equivalents. Even Gödel believed this, though he believed the mind had more capabilities than the brain. As a materialist, you would share the commonly-accepted view and reject his latter claim.
There is a small minority of philosophers and physicists who believe that there are meaningful quantum reactions happening in the brain, distinguishing them from classical computers. Some recent computer simulations have shown this to be plausible, but the general impression is that it seems unlikely, and we don't have specific evidence of effects of this sort.
Quantum effects of certain sorts are computationally infeasible to perform with classical computers. And it's theoretically plausible that such effects can not be conducted at scale with in-development quantum computer technology, and is only practical with organic chemistry, but again, this is quite a minority view.
It's also possible that classical brain features, such as its massive concurrence or various clever algorithms, prove difficult to replicate or simulate. If these are easy problems to solve, then strong AI may arrive in decades; if very difficult, centuries. In the latter case, it seems plausible that incorporating wetware would be a useful shortcut. But there's good reason to believe that the practical disadvantages of wetware (e.g. keeping it alive, coordinating with its slow "clock speed") overwhelm the computational conveniences.
> There is a small minority of philosophers and physicists who believe that there are meaningful quantum reactions happening
I wonder why this is a minority view. Bear in mind that I am an armchair scientist, but I recall reading that meaningful quantum effects are responsible for the efficiency of photosynthesis. It seems quite plausible (due to the electro-chemical nature of brain functioning) that there might be similar effects present in the brain.
I thought the difference is that an RNN allows connections back to previous layers, compared to a feed-forward net. Not this talk about "fixed sizes" and "accepting vectors". Or am I wrong?
In this case, his point was that one way RNNs differ from FFNNs is their ability to accept arbitrarily sized inputs and generate arbitrarily sized outputs. That's pretty important, which is likely why he emphasizes it.
But the rest of the article shows the salient point; RNNs are NNs that hold a state vector.
Saying that RNNs are NNs that allow connections back to previous layers is true, but that's only one way of looking at it. Holding state is another, since it implies backwards connections. Feedback is another term. And because they have backwards connections, state, feedback, etc., they also possess the capacity to handle non-fixed-size inputs and outputs.
In summary: it's different viewpoints of the same mathematical object. Karpathy focuses on the ability of RNNs to handle arbitrarily long inputs and outputs, because that's something FFNNs cannot do.
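A minimal numpy sketch along the lines of the snippet in the article makes both viewpoints visible: the hidden state h is the state vector, the W_hh term is the backwards/feedback connection, and because step() can be called any number of times, input and output length are unconstrained. Sizes below are arbitrary.

    import numpy as np

    class RNN:
        def __init__(self, n_in, n_hidden, n_out):
            self.W_xh = np.random.randn(n_hidden, n_in) * 0.01
            self.W_hh = np.random.randn(n_hidden, n_hidden) * 0.01  # feedback
            self.W_hy = np.random.randn(n_out, n_hidden) * 0.01
            self.h = np.zeros(n_hidden)  # the state vector

        def step(self, x):
            # the new state depends on the new input AND the previous state
            self.h = np.tanh(self.W_xh @ x + self.W_hh @ self.h)
            return self.W_hy @ self.h

    rnn = RNN(10, 32, 10)
    for x in np.random.randn(100, 10):  # a sequence of arbitrary length
        y = rnn.step(x)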
It's "unreasonable" mainly because it occasionally captures subtle aspects of the data source for "free".
If you've worked with procedurally generated content, Markov chains, and so on, you probably have had to perform a few tweaks in order to get plausible results.
From the article, an excerpt of the output from an RNN trained on Shakespeare:
They would be ruled after this chamber, and
my fair nues begun out of the fact, to be conveyed,
Whose noble souls I'll have the heart of the wars.
Come, sir, I will make did behold your worship.
I'll drink it.
It's also unreasonable that the same framework works well for so many different data sources.
My experience with other generative methods has been that they were fragile and prone to pathological behaviour, and that getting them to work for a specific use case required a bunch of unprincipled hacks.
It used to be that when a talk started to veer towards generative models, I'd start looking around the room, wondering whether I could survive the drop from any outside-facing windows.
But with RNNs using LSTM (or neural Turing machines!) you can consider incorporating a generative model in the solution you're putting together without having to spend a huge chunk of time massaging it into usefulness, or purchasing time on a supercomputer.
1. I once wrote a quick Reddit bot with the aim of learning to repost frequent highly upvoted comments and trained it using a simple k-Markov model... it was not good at first, and in order to get it to work I had to do a lot of non-fun stuff like sanitizing input and adding heuristics for when/where to post, and in the end it was mediocre.
2. Alex Graves (from DeepMind) has a demo about using RNNs to "hallucinate" the evolution of Atari games, using the pixels from the screen as inputs. It's interesting because it shows that same sort of tendency to capture the subtle stuff: https://youtu.be/-yX1SYeDHbg?t=2968
3. As in occult knowledge and rules-of-thumb, but you might also read this as a double entendre about myself and my colleagues.
4. Well, you still might need an AWS GPU instance if you don't have a fancy graphics card.
My power to give thee but so much as hell:
Some service in the noble bondman here
A quick search of Shakespeare's corpus also shows that Shakespeare never called a bondman 'noble'; there must be some conception of parts of speech being captured by the RNN, to enable it to decide that 'bondman' is a reasonable word to follow 'noble'.
So yes, "unreasonable" seems about right.
(Put another way, English text is a lossy representation of English speech.)
Perhaps if you were to feed the IPA representation of each word in alongside the text, the RNN would do a bit better, though admittedly I'm not sure how you would do so.
If this is the case, I'd imagine training it against Lojban text would see similar results.
DopeLearning: A Computational Approach to Rap Lyrics Generation
Backpropagation suffers from vanishing gradients on very deep neural nets.
Recurrent Neural Nets can be very deep in time.
Or the weights could be evolved using Genetic Programming.
Especially when using saturating functions (tanh/sigmoid).
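A toy scalar demonstration (the weight and initial state values are made up): the gradient carried back through T tanh steps picks up a factor of w * tanh'(z) per step, and since tanh'(z) = 1 - tanh(z)^2 <= 1, the product shrinks geometrically.

    import numpy as np

    h, w = 0.5, 0.9   # scalar "hidden state" and recurrent weight
    grad = 1.0
    for t in range(50):            # 50 time steps
        h = np.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain-rule factor for one step
    print(grad)                    # shrunk by orders of magnitude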
> Or the weights could be evolved using Genetic Programming
GA, not GP http://en.wikipedia.org/wiki/Genetic_algorithm
Some algorithms, such as NEAT, use a genetic algorithm to describe not only the weights on edges in the network, but also the shape of the network itself - e.g., instead of every node of one layer connected to every node of the next, only certain connections are made.
Their next paper is "Reinforcement Learning Neural Turing Machines"
http://arxiv.org/abs/1505.00521 based on Graves "Neural Turing Machines" http://arxiv.org/abs/1410.5401, which attempts to infer algorithms from the result.
In a lost BBC interview from 1951, Turing reputedly spoke of evolving CPU bitmasks for computation.
Thanks for the link, I'll take a look.
That would be one positive feedback loop to rule them all.
> They accept an input vector x and give you an output vector y. However, crucially this output vector's contents are influenced not only by the input you just fed in, but also on the entire history of inputs you've fed in in the past.
I was curious whether the overhead of learning how to spell words (vs. a pure task of sentence construction with word objects) outweighs the reduction in sample set size.
(Awesome article for a RNN newbie)
That said, I think the RNNs here are limited by the corpus. They need to be exposed to more writing. Even if all you want is a Shakespeare generator, you still need to expose it to other literature. That will give it greater context, and more freedom of expression and, dare I say, creativity. I mean, imagine if all you were exposed to your whole life was Shakespeare. Nothing else (no other senses). Even with your superior mind, I doubt you'd generate anything better than what this RNN spits out.
So yeah, it needs a large corpus to build a broader model. Then we need a way to instruct the broadly trained RNN to generate only Shakespeare-like text. Perhaps by adding an "author" or "style" input.
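One minimal way to sketch that "style" input (the sizes and the author index are arbitrary placeholders): concatenate a one-hot author vector onto the character input at every step, so a single RNN trained on many authors could be steered at generation time.

    import numpy as np

    VOCAB, AUTHORS = 65, 4   # illustrative sizes
    SHAKESPEARE = 2          # hypothetical author index

    def make_input(char_onehot, author_idx):
        style = np.zeros(AUTHORS)
        style[author_idx] = 1.0
        # fed to the RNN at each step: character + persistent style signal
        return np.concatenate([char_onehot, style])

    x = np.zeros(VOCAB)
    x[10] = 1.0
    step_input = make_input(x, SHAKESPEARE)  # shape (VOCAB + AUTHORS,)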
And, as I mentioned upthread, it has been known for about ten years, long before the current neural net revival, that high-order character-based models are competitive with word-based models (at least in terms of perplexity).
Aftair, unsuch, hearly, arwage, misfort, overelical, ...
(although I admit, some of them may be just old words I haven't heard of before)
You'd probably find the paper here: http://aclweb.org/anthology/ (everything in CL is open access). You want the proceedings of CL, TACL, ACL, EMNLP, EACL, and NAACL. Don't bother with the workshops.
Optimization of NNs isn't really that bad. Stochastic gradient descent is extremely powerful and roughly linear with the number of parameters, possibly better.
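A minimal sketch of why the per-step cost is linear: one update is a gradient evaluation plus an elementwise O(n) step over the n parameters. The least-squares objective here is just a stand-in, not anything specific to NNs.

    import numpy as np

    def sgd_step(params, grad_fn, lr=0.01):
        # one update: a gradient evaluation plus an O(n) elementwise step
        return params - lr * grad_fn(params)

    # stand-in objective: least squares on one random minibatch
    X, y = np.random.randn(64, 1000), np.random.randn(64)
    grad_fn = lambda w: X.T @ (X @ w - y) / len(y)
    params = np.zeros(1000)
    for _ in range(100):
        params = sgd_step(params, grad_fn)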
Is there any chance someone's come up with an RNN that has dynamic amounts of memory?
In this case, the memory of the RNN is an ensemble of differentiable stacks.
Second, one could envision paging the hidden units back to system memory on a coprocessor-based implementation (GPUs/FPGAs/not Xeon Phi, gag me). 256 GB servers are effectively peanuts these days relative to developer salaries and university grants (datapoint: my grad school work system was ~$100K in 1990 dollars) so unless you're trying to create the first strong AI, I don't think this is a serious constraint.
Good luck with that no matter what Stephen Hawking, Elon Musk, and Nick Bostrom harp on about: we have no idea what the error function for strong AI ought to be and even if we did, it's over a MW using current technology to achieve the estimated FLOPS of a human cerebrum.