
Ersatz - Deep neural networks in the cloud - dave_sullivan
http://www.ersatz1.com
======
bravura
Recommendation: Reach out to my colleague James Bergstra, and build out
automatic hyperparameter selection. This will make your offering work off-the-
shelf, which is what is necessary for it to see wider adoption.

Why? The real pain in the ass in training a deep network is the hyperparameter
selection.

What is your learning rate? What is your noise level? What is your
regularization parameter?

Choosing these values is a far bigger pain than almost everything else
combined.

Doing a grid search is intractable. Random hyperparameter search is better.
You can also use a more sophisticated strategy, such as the one Bergstra et
al. have proposed.
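Random search over hyperparameters is only a few lines. A minimal sketch, with a made-up `validation_error` stub standing in for an actual GPU training run:

```python
import random

# Hypothetical stand-in for the real training run: in practice this
# would train a deep net and return its validation error.
def validation_error(learning_rate, noise_level, l2_penalty):
    # Toy objective with a known optimum, for illustration only.
    return ((learning_rate - 0.01) ** 2
            + (noise_level - 0.3) ** 2
            + (l2_penalty - 1e-4) ** 2)

def random_search(n_trials=100, seed=0):
    rng = random.Random(seed)
    best_err, best_params = float("inf"), None
    for _ in range(n_trials):
        # Sample log-uniformly where plausible values span orders of magnitude.
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "noise_level": rng.uniform(0.0, 1.0),
            "l2_penalty": 10 ** rng.uniform(-6, -2),
        }
        err = validation_error(**params)
        if err < best_err:
            best_err, best_params = err, params
    return best_err, best_params

err, params = random_search()
```

With a fixed seed, running more trials can only improve (never worsen) the best configuration found, which is part of why random search is such a sane default.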

~~~
ninjin
I agree that hyper-parameter selection is a huge pain. Personally, though, I
am more familiar with the work of Snoek et al. [1] from NIPS last December.
They even distribute a neat Python package that performs Bayesian
optimisation combined with MCMC [2], so that even people like me, who are not
yet familiar with Gaussian Processes, can deploy it easily.

[1]: <http://arxiv.org/pdf/1206.2944v2>

[2]: <http://www.cs.toronto.edu/~jasper/software.html>
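This is not the Snoek et al. implementation, but the loop their package automates can be sketched in miniature: fit a Gaussian-process surrogate to the evaluations so far, then pick the next point by an acquisition rule. A toy 1-D, stdlib-only illustration, using a lower confidence bound as the acquisition (their paper uses expected improvement) and a made-up objective:

```python
import math

def solve(A, b):
    # Gaussian elimination with partial pivoting; fine for tiny systems.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def kernel(a, b, ell=0.3):
    # Squared-exponential (RBF) covariance between two 1-D points.
    return math.exp(-((a - b) ** 2) / (2 * ell ** 2))

def gp_posterior(xs, ys, x, noise=1e-3):
    # GP posterior mean and variance at x, given observations (xs, ys).
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [kernel(a, x) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(kernel(x, x) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, var

def objective(x):
    # Made-up black-box "validation loss"; minimum at x = 0.7.
    return (x - 0.7) ** 2

# The loop: fit surrogate, pick the next point by a lower confidence
# bound (mean minus two standard deviations), evaluate, repeat.
xs = [0.0, 0.5, 1.0]
ys = [objective(x) for x in xs]
grid = [i / 100 for i in range(101)]
for _ in range(10):
    def lcb(g):
        mean, var = gp_posterior(xs, ys, g)
        return mean - 2.0 * math.sqrt(var)
    nxt = min(grid, key=lcb)
    xs.append(nxt)
    ys.append(objective(nxt))
best = min(xs, key=objective)
```

The point of the surrogate is that it is cheap to query, so you can afford to think hard about where to evaluate the expensive objective next.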

------
tlarkworthy
OMG I have been waiting for something like this. Deep Belief Networks have
been smashing machine learning records in just about every domain. The only
problem was that they were annoyingly slow to converge, and hard to
program/debug.

See Hinton's Google talk for more info on how powerful these things are:
<http://www.youtube.com/watch?v=AyzOUbkUf3M> (that's 2007; things are even
spicier now)

~~~
deadairspace
That was a great talk, and an impressive demo of feature generation.

------
RyanZAG
A little confusing what this actually is. Is this

1) Cloud GPU computation where you upload some special model code that is run
on the neural network? I.e., your own code.

2) Upload data and run some pre-specified models on it, as in your example
with '-d model=spanish_speech_recognizer' - in which case the offering is all
about how many pre-defined models you have and how good they are.

The two different use cases are for completely different target audiences.

~~~
dave_sullivan
Sure, I see the confusion.

So basically, you bring the data, pick the neural network architecture you
want to use, and set its parameters. The model trains on the data you've given
it using a GPU cluster (which still takes a while).

'spanish_speech_recognizer' is the name of the model you just trained, where
'MRNN' is the actual architecture (a multiplicative recurrent neural network
as described in <http://www.cs.toronto.edu/~ilya/pubs/2011/LANG-RNN.pdf>) used
in the example.
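Roughly, the "multiplicative" part of the MRNN means the current input gates the hidden-to-hidden transformation through a factored weight tensor, so each input effectively selects its own recurrence matrix. A stdlib-only sketch of a single step, with toy sizes and random weights (not the paper's training setup):

```python
import math
import random

random.seed(0)
n_h, n_x, n_f = 4, 3, 5  # toy hidden, input and factor sizes

def mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
            for _ in range(rows)]

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

# Factor weights: the input and the previous hidden state each project
# onto a factor layer, and their elementwise product gates the recurrence.
W_fx, W_fh = mat(n_f, n_x), mat(n_f, n_h)
W_hf, W_hx = mat(n_h, n_f), mat(n_h, n_x)

def mrnn_step(h, x):
    # Input-gated factors, then the usual tanh hidden-state update.
    f = [a * b for a, b in zip(matvec(W_fx, x), matvec(W_fh, h))]
    return [math.tanh(a + b)
            for a, b in zip(matvec(W_hf, f), matvec(W_hx, x))]

h = mrnn_step([0.0] * n_h, [1.0, 0.0, 0.0])
```

The factorisation is what keeps this affordable: a full per-input recurrence matrix would need n_x separate n_h-by-n_h matrices, while the factor layer needs only three modest weight matrices.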

So the models themselves aren't pre-defined, but the architectures you can use
are. You can play with a lot of different parameters (if you want), but you
don't have to worry about optimizing the code for GPU or making sure your
implementation of the algo itself is correct. At least that's the idea.

~~~
snippyhollow
Just curious, are you using an existing library like Theano, Torch or some
other GPU-enabled lib?

~~~
dave_sullivan
Theano mostly--it really is good at what it does.

------
gojomo
Does this, and the Google Prediction API, and similar emerging offerings,
herald the beginning of an "AIaaS" ("Artificial Intelligence as a Service")
market?

------
ghc
I'd like to see a post-mortem for this submission once the traffic dies down.
Most startups these days feature flashy designs, and I've always wondered how
much it really mattered. This design is straightforward and not flashy at all,
so I'd be interested in seeing the conversion rate of an "academic-looking"
design.

------
benmccann
This seems like a very small market you're going after. It requires people to
have good knowledge of deep neural networks (they have to choose the model,
architecture, hidden units, multiplicative units, etc.). I think it would be
more interesting, and would open things up to a wider audience, if some of
these parameters could be chosen for you.

~~~
cynusx
Machine learning in general is a domain for specialists; it won't work most
of the time if you don't know what you are doing. As for it being a small
market, I disagree. There may be few people who understand it, but those are
the ones who are put in charge of trillions of rows of data to analyze.

------
thechut
The company I work for could be very interested in testing this service. I
requested a beta invite and filled out the survey. Any idea when you might
start letting people test it out / accept beta invites?

~~~
dave_sullivan
Yes, most likely Monday. Are you with IOTworks? If so, I've got your survey
and will be sure you're on the list.

Although I should add that the response so far has been way beyond what we
thought it would be (which is fantastic!), so it may take some time to get to
everyone. The beta literally just opened this morning, and I'm not ready to
open up the product before working with beta users to polish it up.

------
mimog
Isn't the CPU/GPU intensive part of working with neural networks the training
of them? Once the network is trained to within an acceptable error-rate why
would you need the cloud?
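That is usually right: training is the GPU-hungry part. Once trained, a feed-forward net is just a set of stored weights, and a prediction is a cheap forward pass you could run anywhere. A toy illustration, with made-up weights standing in for whatever a training service would hand back:

```python
import math

# Toy 2-2-1 network. The weights below are invented for illustration;
# in practice they would be the output of the expensive training phase.
W1 = [[0.5, -0.2], [0.8, 0.3]]   # hidden-layer weights
b1 = [0.1, -0.1]                 # hidden-layer biases
W2 = [0.7, -0.4]                 # output-layer weights
b2 = 0.05                        # output-layer bias

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(x):
    # A forward pass is just dot products and a squashing function:
    # no GPU required once the weights are fixed.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)

y = predict([1.0, 0.0])
```

So the cloud earns its keep during training (and retraining); whether you also query the trained model through the service or export it is a product decision, which is exactly what the question is probing.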

------
tluyben2
How about presenting some demos to see how it works and what it can do? I
worked a bit with Theano and would like to see (video/tour) how this relates.

------
ajankovic
Layout is broken for me <http://imgur.com/Les2I>

Latest Firefox on Ubuntu 12.04.

------
chetan51
Demo demo demo!

------
cloudshoring
Can we process images and/or video as inputs to this service?

~~~
dave_sullivan
Yes.

------
sabalaba
Are you hosting your own cluster or using EC2 GPU instances?

~~~
dave_sullivan
Hosting our own for now--although weighing pros/cons of using AWS.

On one hand, AWS GPUs do seem a bit slower than bare metal, and it's
theoretically (actually?) more expensive. Plus we get more control if we host
our own.

But then again, running a data center is a problem that has been solved and
I'm not sure we can create much value there. In practice, it will probably end
up a bit of both, depending on demand.

~~~
sabalaba
Hah, do you guys just have servers sitting around, or are you at least
co-locating?

------
indrax
Can we export the trained model back out and use it, or only query it with the
API?

