If you're also interested in real-time OCR like this, I did a write-up of the approach that worked well for my project. It only needed to recognize Scrabble fonts, but it could be extended to more fonts by using more training examples.
Can't get https://www.dcs.shef.ac.uk/intranet/teaching/campus/projects...
More generally, I really like the idea of generating controlled synthetic images and then messing them up for regularization.
I've not tried it on anything else, but I remember thinking that it has a lot of potential uses. Also I only used it on gray-scale features, but I'm sure it could make use of full RGB too. I'll have to try it some time!
Neural networks and deep learning are truly awesome technologies.
A neural net is a graph, in which a subset of nodes are "inputs" (that's where the net gets information), some are outputs, and there are other nodes which are called "hidden neurons".
The nodes are interconnected in a fashion called the "topology" or sometimes "architecture" of the net. For example, I-H-O is a typical feed-forward net, in which I is the input layer, H the hidden layer and O the output layer. All the hidden neurons connect to the input neurons' outputs, and all the output neurons connect to the hidden neurons' outputs. The connections are called "weights", and training adjusts the weights of all the neurons over lots of cases until the desired output is achieved. There are also algorithms and criteria to stop before the net "learns too much" and loses the ability to generalize (this is called overfitting). In particular, a net with one hidden layer and one output layer is a universal function approximator -- that is, an estimator that can model any mathematical function of the form f(x1, x2, x3, ..., xn) = y.
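To make that I-H-O topology concrete, here's a minimal forward pass in plain Python. The weights are made up purely for illustration (sigmoid activations, no biases):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_ih, w_ho):
    """One forward pass through an I-H-O net: every hidden neuron
    sees every input, and every output neuron sees every hidden output."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs))) for ws in w_ih]
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w_ho]

# 2 inputs -> 2 hidden -> 1 output, with arbitrary example weights
w_ih = [[0.5, -0.4], [0.3, 0.8]]
w_ho = [[1.2, -0.7]]
print(forward([1.0, 0.0], w_ih, w_ho))
```

Training is then just the search for the `w_ih`/`w_ho` values that make the output match the desired one.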
Deep learning means you're using a feedforward net with lots of hidden layers (I think it's usually between 5 and 15 now), which often apply convolution operators (hence "convolutional" neural networks), and lots of neurons (on the order of thousands). All this was nearly impossible until GPGPUs came along, because of the time it took to train even a modest network (minutes to hours for a net with between 50 and 150 neurons in one hidden layer).
This is a very shortened explanation -- if you want to read more, I recommend this link, which gives some simple Python code to illustrate and implement the innards of a basic neural network so you can learn it from the inside. Once you get that, you should move to more mature implementations, like Theano or Torch, to get the full potential of neural nets without worrying about implementation.
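In that spirit (a sketch of my own, not the linked code): a complete toy net trained with plain gradient descent on XOR, the classic function a single-layer net cannot learn. Pure Python, one hidden layer of 3 neurons:

```python
import math, random

random.seed(0)
sig = lambda x: 1.0 / (1.0 + math.exp(-x))

# XOR truth table: input pair -> target
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 3  # hidden neurons
w_ih = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]  # last weight is a bias
w_ho = [random.uniform(-1, 1) for _ in range(H + 1)]                  # last weight is a bias

def forward(x):
    h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_ih]
    o = sig(sum(w * hi for w, hi in zip(w_ho, h)) + w_ho[-1])
    return h, o

def loss():
    return sum((forward(x)[1] - y) ** 2 for x, y in data)

before = loss()
lr = 0.5
for _ in range(10000):
    for x, y in data:
        h, o = forward(x)
        d_o = (o - y) * o * (1 - o)               # output delta
        for j in range(H):                        # backprop into both layers
            d_h = d_o * w_ho[j] * h[j] * (1 - h[j])
            w_ho[j] -= lr * d_o * h[j]
            w_ih[j][0] -= lr * d_h * x[0]
            w_ih[j][1] -= lr * d_h * x[1]
            w_ih[j][2] -= lr * d_h
        w_ho[-1] -= lr * d_o
print(before, "->", loss())
```

The squared error should drop sharply over training; that drop is the "learning" stored in the weights.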
Oh humbug! The black magic comes from the vast resources Google drew on to obtain perfect training datasets. Each step in the process took years to tune, demonstrating that data is indeed for those who don't have enough priors.
> [...] the "black magic" part comes mostly from their mathematical nature and very little from them being "intelligent computers". A brain is a graph, in which a subset of neurons are "inputs", some are outputs, and others are "hidden". The nodes are interconnected between each other in a fashion, which is called the "topology" or sometimes "architecture" of the net.
The deep question about deep learning is "Why is it so bloody effective?"
The effectiveness comes from their non-linear nature and their ability to "learn" (store knowledge in the weights, derived from the training process). And black magic, of course!
As a side note, I was playing a board game last night (Terra Mystica, I believe) and wondering if you could get 5 different neural networks to play the game and then train them against each other (and once they are good enough, against players). I wonder how quickly one could train a network that is unbeatable by humans? Maybe even scale it up to training it to play multiple board games till it is really good at all of them before setting it loose on a brand new one (within a similar genre). Maybe Google could use this to make a Go bot.
But what happens if this is used for evil instead? Say a neural network that reads a person's body language and determines how easily they can be intimidated by either a criminal or the government. Or one that is used to hunt down political dissidents. Imagine the first warrant to be signed by a judge for no reason other than a neural network saying the target is probably committing a crime...
This is likely due to the way Go works: random playout provides a rough estimate of who controls what territory (this is how Go is scored).
Recently two deep-learning papers showed very impressive results.
The neural networks were tasked with predicting what move an expert would make given a position.
The MCTS takes a long time - 100,000 playouts are typical - while once trained, the neural nets are orders of magnitude faster.
The neural nets output a probability for each move (that an expert would make that move) - all positions are evaluated in a single forward pass.
Current work centers around combining the two approaches, MCTS evaluates the best suggestions from the neural net.
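As a toy illustration of what "random playout" buys you (on tic-tac-toe rather than Go, and much simplified - the board, position and playout count are all invented for this sketch):

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

def playout(board, player):
    """Finish the game with uniformly random moves; return the winner (or None)."""
    b = board[:]
    while True:
        w = winner(b)
        if w or '' not in b:
            return w
        b[random.choice([i for i, c in enumerate(b) if not c])] = player
        player = 'O' if player == 'X' else 'X'

def estimate(board, player, n=2000):
    """Fraction of random playouts won by X -- a crude position value."""
    wins = sum(playout(board, player) == 'X' for _ in range(n))
    return wins / n

random.seed(0)
# X has two in a row and it is X to move: playouts should favour X
board = ['X', 'X', '', '', 'O', '', '', 'O', '']
print(estimate(board, 'X'))
```

Even with no knowledge of tactics, the playout statistics reflect who is ahead - which is the rough-territory-estimate effect described above, and what MCTS then refines move by move.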
Expert human players are still unbeaten by computer Go programs.
It learns to master level from self-play.
also his lecture on bootstrapping from tree-based search
and Silver's overview of board-game learning
There was in fact a group within Google that worked on this: http://www.cs.toronto.edu/~cmaddis/pubs/deepgo.pdf
Move Evaluation in Go Using Deep Convolutional Neural Networks
Chris J. Maddison, Aja Huang, Ilya Sutskever, David Silver
- They use more parameters (and fewer computations per parameter.)
- They are hierarchical (convolutions are apparently useful at different levels of abstraction of data).
- They are distributed (word2vec, thought-vectors). Not restricted to a small set of artificial classes such as parts-of-speech or parts of visual objects.
- They are recurrent (RNN).
Seemed like a great way to highlight the limitations of patterns.
Convolutional networks are only one kind of deep learning. In particular, they are mostly applied to image-like data.
Two remarks. First, these guys probably don't know very well why what they are doing works so well ;) It requires a lot of trial and error, a lot of patience, and a lot of compute power (the latter being the reason we are seeing breakthroughs only now).
Second, training a neural net requires different computing power from deploying the net. The neural network that is installed on your phone has been trained using a lot of time and/or a very large cluster. Your phone is merely "running" the network, and this requires much less compute power.
I took a few screenshots. Aligning the phone (focus, light, shadows) on the small menu font was difficult. You must keep it steady. Sadly, I ended up hitting the volume control on this best example. Tasty cockroaches! Ha! http://imgur.com/j9iRaY0
It worked OK (much better than nothing). However, there were a number of times when there were very large issues in the translations that created some pretty big misunderstandings. Luckily we had a friend who was fluent in English and Portuguese and would translate when things got too confused.
To reduce errors, you do need to be really careful to use short, complete sentences with simple and correct grammar. It's also better to use words that aren't ambiguous. (Those two sentences would probably not translate well.)
e.g. Please write simple words, short phrases and simple phrases. Please write words with just one meaning. Those phrases and words are easier to translate.
Those words are very rare and tend to only be useful in very technical contexts.
It seems it can't really handle context, so 'cockroaches' may have been a mistranslation of 'cheap' in some contexts, and the 'it had stopped chestnut' may have simply been 'brazil nuts'.
See here for what they did for offline speech recognition:
WordLens was awesome for translating fragments of foreign languages - stuff like signs, menus and so on. But its offline translation seemed to be little more than a word->word translation, so there is a huge scope for improvement there. Very difficult when working offline!
Now if the features in the problem domain are more well-defined, like credit ratings, and data is sparse, and domain expertise is available, decision trees are perfectly valid options.
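For contrast with the neural-net examples in this thread, here's a hand-rolled decision stump (a one-split tree) on invented credit data - the feature names, values and threshold are all made up for illustration:

```python
# Each row is (income_in_thousands, late_payments, defaulted?)
data = [
    (25, 4, True), (80, 0, False), (40, 2, True), (95, 1, False),
    (30, 3, True), (60, 0, False), (55, 1, False), (20, 5, True),
]

def best_split(rows, feature):
    """Try every threshold on one feature; return (threshold, errors)
    for the split that best separates defaulters from non-defaulters."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        errs = sum((r[feature] >= t) != r[2] for r in rows)
        errs = min(errs, len(rows) - errs)  # allow either split direction
        if best is None or errs < best[1]:
            best = (t, errs)
    return best

print(best_split(data, 1))  # split on late_payments -> (2, 0)
```

On this toy data, "late_payments >= 2" separates the classes perfectly - and unlike a neural net, the resulting rule is directly readable by a domain expert, which is the usual argument for trees in settings like credit scoring.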
Just wanted to add: and word/character/phrase embeddings.
Was there any advantage of using a deep learning model instead of something more computationally simple?
On my desktop, the english dictionary is ~1 megabyte uncompressed, and compresses to ~250k with gzip. The download for Google Translate is somewhere around 30 megabytes.
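You can check that kind of ratio with the stdlib (using synthetic text here, since the system dictionary varies by machine):

```python
import gzip

# Synthetic stand-in for a wordlist: repetitive ASCII text compresses well
text = "\n".join(f"word{i}" for i in range(20000)).encode()
packed = gzip.compress(text)
print(len(text), len(packed), round(len(packed) / len(text), 2))
```

A plain wordlist shrinks several-fold under gzip, so the bulk of a 30 MB translation download is presumably model parameters, not vocabulary.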
The new fad for using the 'deep' learning buzzword annoys me though. It seems so meaningless. What makes one kind of neural net 'deep' and are all the other ones suddenly 'shallow' ?
If this is a serious question, then googling "what is a deep neural network" would take you to any number of explanations. But to summarize very briefly, it's not a buzzword; it's a technical term referring to a network with multiple nonlinear layers that are chained together in sequence. Deep networks have been talked about for as long as neural networks have been a research subject, but it's only in the last few years that the mathematical techniques and computational power have been available to do really interesting things with them.
The "fad" (as you call it) is not mainly because the word "deep" sounds cool, but because companies like Google have been seeing breakthrough results that are being used in production as we speak. For example:
Deep learning is a form of representation/feature learning.
> In the end, we were able to get our networks to give us significantly better results while running about as fast as our old system—great for translating what you see around you on the fly.
Suggests that they previously were not using neural networks, or were using less powerful ones.
Number of layers
It's that simple
Eventually, people realized that you could improve performance by adding more hidden layers. So while theoretically Cybenko was correct, practically stacking a bunch of hidden layers made more sense. These network architectures with stacks of hidden layers were then labelled as "deep" neural networks.
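As a sketch of what "stacking" means mechanically (random weights, no training - the layer sizes are arbitrary):

```python
import math, random

random.seed(0)
sig = lambda x: 1.0 / (1.0 + math.exp(-x))

def make_net(sizes):
    """One random weight matrix per consecutive layer pair,
    e.g. [4, 8, 8, 8, 2] has three hidden layers; [4, 8, 2] has one."""
    return [[[random.uniform(-1, 1) for _ in range(m)] for _ in range(n)]
            for m, n in zip(sizes, sizes[1:])]

def forward(net, x):
    for layer in net:  # depth = how many of these nonlinear steps you chain
        x = [sig(sum(w * xi for w, xi in zip(ws, x))) for ws in layer]
    return x

shallow = make_net([4, 8, 2])     # one hidden layer: "shallow"
deep = make_net([4, 8, 8, 8, 2])  # three hidden layers: "deep"
print(len(shallow), len(deep))    # 2 vs 4 weight matrices
```

The only structural difference between "shallow" and "deep" is the number of chained nonlinear layers in that loop - the training difficulties (and payoffs) all follow from that.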
> To achieve real-time, we also heavily optimized and hand-tuned the math operations. That meant using the mobile processor’s SIMD instructions and tuning things like matrix multiplies to fit processing into all levels of cache memory.
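Python can't show SIMD, but the cache-tiling idea they mention can be sketched: process the matrices in small blocks so each tile is reused while it's still in fast memory (block size and matrices here are invented for illustration; in Python the win is only conceptual, since the real speedup needs native code):

```python
def matmul_blocked(A, B, bs=2):
    """Tiled matrix multiply: accumulate C block by block so the working
    set of A, B and C tiles stays small -- the cache-fitting idea."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, m, bs):
            for k0 in range(0, k, bs):
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, m)):
                        C[i][j] += sum(A[i][p] * B[p][j]
                                       for p in range(k0, min(k0 + bs, k)))
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_blocked(A, B))  # → [[19.0, 22.0], [43.0, 50.0]]
```

The output matches the naive multiply; only the traversal order changes, which is exactly why this kind of tuning is free of accuracy cost.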
Let's see how this turns out. I'm still wondering whether other apps might crash because of this.