
Introducing Keras 2 - mirceam
https://blog.keras.io/introducing-keras-2.html
======
minimaxir
Copying my rare product endorsement from the previous submission:

Keras is so good that it is effectively cheating in machine learning: even
Tensorflow tutorials can be replaced with a single line of code, which is
important for iteration (Keras layers are effectively Lego blocks). A
simple read of the Keras examples
([https://github.com/fchollet/keras/tree/master/examples](https://github.com/fchollet/keras/tree/master/examples))
and documentation
([https://keras.io/getting-started/functional-api-guide/](https://keras.io/getting-started/functional-api-guide/))
will let you reverse-engineer most of the revolutionary Deep Learning
clickbait thought pieces.
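To make the "Lego blocks" point concrete, here is a minimal sketch of the Keras 2 Sequential API; the layer sizes are arbitrary and chosen purely for illustration:

```python
from keras.models import Sequential
from keras.layers import Dense

# Stack layers like Lego blocks: each Dense entry is one block.
model = Sequential([
    Dense(32, activation='relu', input_shape=(64,)),  # hidden layer
    Dense(10, activation='softmax'),                  # 10-class output
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.output_shape  # (None, 10)
```

Swapping an architecture is mostly a matter of editing that list and recompiling.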

It's good to see that backward compatibility is a priority in 2.0, since it
sounds like a lot has changed.

~~~
raverbashing
Yes

Trying to do something simple in TF is a pain. In the docs there are some
conflicting examples, and code snippets that "train" a network just to print a
loss number on the screen but actually do nothing besides that.

Keras is easy to use, and a better fit if you're running CPU-only.

------
juxtaposicion
Will Keras2 support PyTorch as backend, in the future?

Answer: [0] No, there are no plans to support PyTorch. There is nothing to be
gained in supporting every novelty framework that crops up every quarter. Our
goal is to make deep learning accessible and useful to as many people as
possible, and that goal is completely opposite to building up deep learning
hipster cred.

[0]:
[https://github.com/fchollet/keras/issues/5299](https://github.com/fchollet/keras/issues/5299)

~~~
fchollet
To put this quote in context: this isn't specifically about PyTorch. Every
couple of months since mid-2015, a new deep learning framework gets released.
In the following week, someone inevitably asks "will X get added as a Keras
backend?".

Supporting _several_ backends is a strong positive. But chasing every new
framework as a backend is a quick way to kill Keras, via bloat, support issues
and general technical debt. We should only support a backend that is
considered mature, and we should stay away from the hype surrounding the
release of every new framework. There will be another hyped up framework next
quarter anyway. And the one after.

It is in fact possible that Keras will eventually support PyTorch. But if it
ever happens, it would be at least 1-2 years in the future. When PyTorch
becomes "uncool", just like Keras :)

~~~
jph00
Yay, making deep learning uncool again! ;)
([http://www.fast.ai/about/](http://www.fast.ai/about/))

But seriously - does it even make sense to have a "define by run" dynamic
framework as a backend? It seems to me that keras is particularly suited to
wrapping frameworks that define and then run a computation graph.

~~~
fchollet
With the functional API of Keras, it would definitely make sense. In fact I do
think that imperative model definition would be great to have at some point in
the future. We'll see :)
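For reference, the functional API he mentions already expresses a model as a graph of layer calls over tensors; a minimal sketch (shapes arbitrary, for illustration):

```python
from keras.layers import Input, Dense
from keras.models import Model

# Build the graph by calling layers on tensors, then wrap it as a Model.
inputs = Input(shape=(64,))
x = Dense(32, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
```

An imperative (define-by-run) mode would let the same layer calls happen eagerly rather than building a static graph first.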

~~~
jph00
I'm intrigued!... The kernel-launch overhead and lack of any GPU
while/scan/map/etc for PyTorch seems like a limitation, but I guess on second
thought you can still do all the keras fit/predict stuff and the
auto-connecting of layers.

~~~
apaszke
These ops are just not needed in PyTorch. while is just a Python while loop.
Scan is a for loop, map is a list comprehension that applies modules. No need
for anything fancy.
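The point can be illustrated outside any framework: in a define-by-run setting these ops are plain Python control flow. A sketch using numpy arrays as a stand-in for framework tensors (function name made up):

```python
import numpy as np

def scan_cumsum(xs, init=0.0):
    # A "scan" is just a Python for loop carrying an accumulator.
    acc, outs = init, []
    for x in xs:
        acc = acc + x
        outs.append(acc)
    return outs

scan_cumsum(np.array([1.0, 2.0, 3.0]))  # [1.0, 3.0, 6.0]
```

In a static-graph framework the same thing requires a dedicated scan op, because the loop must be part of the graph.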

~~~
jph00
Sure - but on pytorch they suffer the kernel launch overhead each time through
the loop, whereas on tensorflow and theano they do not. Which really impacts
the kinds of algorithms that work well on each platform. Does that seem like a
reasonable assessment to you?

~~~
smhx
Currently not many frameworks have actual fusion of kernels (to avoid
launching many GPU kernels). If you look underneath a theano.scan or TF.scan,
GPU kernels are still being launched individually (but are likely stream-
overlapped where appropriate).

With TF's XLA compiler, they are slowly getting towards kernel fusion, which
will then reduce launch overheads.

We have similar things in the works for pytorch: quickly JIT-compiling, at
runtime, the dynamic graph that is getting executed. More news on this will
come when the time is appropriate.

~~~
whyrt12
I WANT to use pytorch, but there's no bayesian learning or stochastic nodes
like in edward. Any chance there are plans for a compatibility layer with
Edward, or to roll your own bayesian stuff?

Also, have you looked at Numba to do the jitting? Probably best not to have
yet another separately maintained python JIT.

~~~
smhx
As core devs, we don't plan to build in something like Edward. However, folks
in the community are brewing something:

[https://discuss.pytorch.org/t/bayesian-computation-in-pytorch/755](https://discuss.pytorch.org/t/bayesian-computation-in-pytorch/755)
[https://discuss.pytorch.org/t/distribution-implementations/400/6](https://discuss.pytorch.org/t/distribution-implementations/400/6)

------
caoxuwen
Highly recommend this course -
[http://course.fast.ai/](http://course.fast.ai/) that utilizes Keras as the
main programming tool

~~~
bpicolo
I'm definitely going to give this a shot, thanks for the link. Approaching ML
at a higher level is exactly what I need to develop a better interest in it. I
realize that the underpinnings are important, but waiting 30 minutes for MNIST
to process on my localhost is just unbearably boring.

~~~
jph00
If you try the course, be sure to make use of the forums for it too:
[http://forums.fast.ai](http://forums.fast.ai) . As you'll see, they're
_extremely_ active and helpful for all deep learning students (and all
practitioners in general).

Disclaimer: I teach the course. Although it is free and ad-free... :)

~~~
bpicolo
Will do. I'm in a good position to start (no shortage of python/aws
experience), so the only fighting will be with the deep learning bits, hah.

Side note: Great job on making AWS setup/teardown straightforward. 90c/h is
not terrible, but not cheap to accidentally leave on!

~~~
jph00
Regarding AWS, one participant has created a system that lets you use spot
instances for the course. It's published on the forum. Great way to save $$$
(400% or more...)

------
epberry
Keras is fantastic. Not the tightest analogy, and probably unoriginal, but I
think of it as the Python to Tensorflow's C. It's easy to drop into tensorflow
when needed, but you can probably get away with Keras for a long time.
Also, Francois helped us when we DM'd him on Twitter, which was incredible.

Thank you so much Francois! I'm incredibly excited about this release!

------
krick
I'm only starting with all this machine-learning, NN stuff and, like many
others, I want to ask for some guidance/resources/learning material. What I
feel is especially lacking is something very broad and generic, some overview
of existing techniques (but not as naïve as Ng's ML course, I assume). There
exist a lot of estimators and classifiers, there exist a lot of techniques and
tricks to train models, there exist a lot of details on how to design a NN
architecture. So how, for instance, do I even decide, that Random Forest is
not enough for this task and I want to build some specific kind of neural net?
Or maybe I don't actually need any of these fancy famous techniques, but
rather there exist some very well defined statistical method to do what I
want?

What should I read to start grokking these kinds of things? I feel quite ready
to go full "DIY math PhD" mode and consume some heavy reading if necessary,
but where do I even start?

~~~
nl
_So how, for instance, do I even decide, that Random Forest is not enough for
this task and I want to build some specific kind of neural net?_

The problem here is that it's really hard to give generic advice. As an
analogy this is like asking "how do I know if Rails is enough for this task".

The answer is usually "yes", but the specifics matter a lot.

So in this specific case (and I realize you aren't looking for specific advice
here, but I think the principles are useful):

Random Forests are very powerful, and work really well for hundreds, maybe
thousands of features, on large but not huge amounts of data and are fairly
easy to train.

There are a large number of types of neural networks. One of the big
advantages of deep neural networks is that they can reduce the need for manual
feature engineering. For example, convolutional neural networks extract
features from images that work better than any human-engineered features, and
LSTMs (and variations) work well at extracting features from text. The problem
with deep neural networks is that they (generally) need a lot of data to
train.

So, as usual the answer is "it depends".

In industry though, 90% of the time the question isn't "what classifier should
I use". It's "how do I get the data"/"how do I extract features", and then
"let's try all the classifiers and see what works best".
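That last step can be taken quite literally. A sketch with scikit-learn, using synthetic data and two arbitrarily chosen classifiers purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for real features; in practice getting these is the hard part.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

results = {}
for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("random_forest", RandomForestClassifier(n_estimators=100,
                                                           random_state=0))]:
    # Cross-validated accuracy: "try them all and see what works best".
    results[name] = cross_val_score(clf, X, y, cv=3).mean()
```

The same loop extends to gradient boosting, SVMs, etc.; the point is that the comparison is cheap once the features exist.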

~~~
mattkrause
"Try 'em all" is not just an answer, but the only answer.

The _No Free Lunch Theorem_ says that averaged across all possible problems,
no single classifier is the best; in fact, they're all equivalent.

However, you probably don't care about all possible problems, but a specific
one. Over the last decade or so, we've discovered that deep learning works
really well on certain classes of problems, particularly those that may have
some kind of nested structure, as in object or speech recognition. If your
problem resembles one of those, a deep neural network might be a good place to
start.

~~~
nl
Right - this is good advice.

To paraphrase the learnings of thousands of data scientists over years of
Kaggle competitions:

A quick and dirty model for a baseline: Random Forest

Structured data: Use a boosted tree algorithm (specifically the XGBoost
implementation of gradient boosting), ensembled with maybe Extra Trees, Random
Forests and MLPs

Some kind of time component on large datasets: FTRL regression, XGB

Binary data (images or sound): Deep neural nets

Text: Try LSTMs, but this will often be beaten by manual feature engineering
and Word2Vec derived features put into XGB.

~~~
minimaxir
LightGBM
([https://github.com/Microsoft/LightGBM](https://github.com/Microsoft/LightGBM))
is shaping up to beat XGBoost; it has mostly API parity and it won in
benchmarks _before_ a v2 with a new algorithm.

~~~
nl
I tried LightGBM for a Kaggle. I couldn't get anywhere near XGB.

I was using the LambdaRank stuff. Given the boasting the LightGBM team had
done I had assumed it would be close to XGB out-of-the-box for a ranking
problem (since XGB only does pairwise ranking). It was far enough away that I
had to ask if I was misinterpreting the output[1].

That was 6 months ago now, so maybe it has improved. I know they made big
claims.

[1]
[https://github.com/Microsoft/LightGBM/issues/37](https://github.com/Microsoft/LightGBM/issues/37)

~~~
minimaxir
Development was rapid when I was working on a blog post in January using the
tool. Things have likely improved if you want to give it another shot.

~~~
nl
Yeah, I might, thanks.

Did you manage to replicate their results vs XGB?

I don't think anyone has successfully used it for a high result in a Kaggle
yet, which - for all its faults - is a good way to see what the maximum
performance of a software package seems to be.

LibFFM is the other thing I should have mentioned previously as being worth
trying.

------
uptownfunk
The mathematician in me has kept me from jumping into deep learning before I
understand the mathematical and statistical underpinnings of the algorithms
involved. Looking forward to reading through the latest book out from MIT
Press and giving things a whirl with Keras, which I've heard so much about.

~~~
ice109
what's the name of the book?

~~~
ma2rten
[http://www.deeplearningbook.org/](http://www.deeplearningbook.org/)

------
gidim
I love Keras, but I think this update broke more things than you realized. For
example, it's no longer possible to get the validation set score (val_acc)
during training, which renders early stopping impossible. This was a
documented feature in your FAQ.

Is the old documentation still available? I'd like to wait before I upgrade.

Edit: typo

~~~
fchollet
You can try opening an issue on Github. `val_acc` is definitely still
accessible by callbacks, and the `EarlyStopping` callback, which relies on it,
is fully unit-tested.
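For reference, the patience logic that an early-stopping callback applies to a monitored score like val_acc, sketched in plain Python (a simplification of the real callback's bookkeeping):

```python
def stop_epoch(scores, patience=2):
    """Return the epoch at which training would stop: the first epoch whose
    monitored score has failed to improve for `patience` consecutive epochs."""
    best, wait = float('-inf'), 0
    for epoch, score in enumerate(scores):
        if score > best:
            best, wait = score, 0   # improvement: reset the counter
        else:
            wait += 1               # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(scores) - 1          # ran to completion

stop_epoch([0.80, 0.85, 0.85, 0.84, 0.86])  # stops at epoch 3
```

A callback only needs access to the monitored value each epoch to implement this, which is why `val_acc` being visible to callbacks is the crucial part.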

------
backpropaganda
1\. Still no support for multiple losses. Models like VAEs cannot be
idiomatically implemented. The second loss has to be 'hacked' in. Notice how
in the official example for VAE, the kl_loss is computed using variables which
are NOT available via the loss function
([https://github.com/fchollet/keras/blob/master/examples/variational_autoencoder.py#L46](https://github.com/fchollet/keras/blob/master/examples/variational_autoencoder.py#L46))

2\. It's still an input->output paradigm, rather than a {input, output}->loss
paradigm which gives more flexibility.

These two issues are the main reason why I stick to slightly lower level APIs,
even though I _want_ to use Keras.

~~~
fchollet
See the release notes:
[https://github.com/fchollet/keras/wiki/Keras-2.0-release-notes](https://github.com/fchollet/keras/wiki/Keras-2.0-release-notes)

\- You can use a Keras model to compute some tensor(s), turn that into a loss,
and manually add that loss to the model via `add_loss` (it just needs to only
depend on the model's inputs).

\- Not all of your model outputs have to have a loss associated with them. So
you can do both {input, output}->loss and input->output in your workflow, as
you wish. Effectively, losses and outputs are decoupled.

The VAE example hasn't yet been updated to use the `add_loss` feature, but it
should be.
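For context, the KL term in the VAE example depends only on the encoder outputs (z_mean, z_log_var), not on y_true/y_pred, which is why it cannot flow through an ordinary loss function and has to be attached separately. The quantity itself, sketched in numpy:

```python
import numpy as np

def kl_gaussian(z_mean, z_log_var):
    # KL(N(mu, sigma^2) || N(0, 1)) per sample: the standard VAE regularizer.
    return -0.5 * np.sum(1 + z_log_var - np.square(z_mean) - np.exp(z_log_var),
                         axis=-1)

kl_gaussian(np.zeros((2, 4)), np.zeros((2, 4)))  # zero when posterior == prior
```

With `add_loss`, a tensor computing this same expression can be registered on the model directly, so the training loop minimizes it alongside the reconstruction loss.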

~~~
andrew3726
I will look into updating the VAE example, as I've ported the example to the
keras 2.0 API recently. There is currently no documentation on add_loss as far
as I can see, so I will have to try a few things.

------
Kiro
Is it better to learn Keras instead of tflearn?

Copying a comment I made in another thread where one response recommended
Keras:

I currently have a small pet project where I think some simple ML would be
cool but I don't know where to start.

Basically my use case is that I have a bunch of 64x64 images (16 colors) which
I manually label as "good", "neutral" or "bad". I want to input this dataset
and train the network to categorize new 64x64 images of the same type.

The closest I've found is this:
[https://gist.github.com/sono-bfio/89a91da65a12175fb1169240cde3a87b](https://gist.github.com/sono-bfio/89a91da65a12175fb1169240cde3a87b)

But it's still too hard to understand exactly how I can create my own dataset
and how to set it up efficiently (the example is using 32x32 but I also want
to factor in that it's only 16 colors; will that give it some performance
advantages?).

~~~
bayesian_horse
If you don't know how to set up a dataset, it's probably too early for you to
worry about performance and efficiency.

If you haven't already, I'd suggest learning some general machine learning,
including how to use logistic regression, random forests and SVMs.

Keras is certainly capable of what you want to do, at least from your
description.

One way is to interpret the palette indices as grayscale values; that would be
the fastest option. If, however, the 16 colors are actually from a palette, it
may be better to convert the image to three channels, r/g/b. And if the 16
colors are 16 entirely different things, like 0 - water, 1 - sand, 2 - earth
and so on, you could even turn one 16-color image into 16 binary (1-bit)
images, and get a better model.
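The last suggestion (one binary plane per palette index) is essentially one-hot encoding; a numpy sketch (the helper name is made up):

```python
import numpy as np

def palette_to_planes(img, n_colors=16):
    # (H, W) array of palette indices -> (H, W, n_colors) binary planes.
    return (img[..., None] == np.arange(n_colors)).astype(np.float32)

img = np.array([[0, 1],
                [2, 0]])
palette_to_planes(img).shape  # (2, 2, 16)
```

Each plane then acts as an independent input channel, so the network doesn't have to learn that index 3 and index 4 are unrelated.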

Again, getting into machine learning or deep learning is not as easy as
reading the Keras documentation. You need to understand the basics first.

------
chestervonwinch
Keras is a great wrapper library built upon two fantastic frameworks -- theano
and tensorflow. I'm glad to see it is moving forward, and kudos to everyone
involved in all these libraries!

------
diminish
Slightly irrelevant but curious question about the Analytics numbers for 7-day
(34K), 14-day and 30-day active users. I'm running a similar site, so could it
be that a lot of users reading documentation are using ad/tracking blockers,
so that the real active user count is higher than what appears in GA?
Documentation users tend to read quite a lot of pages per session. If I'm
right, then they should see fewer page views per user than expected.

~~~
srrr
For most websites I am familiar with, 20-30% of users use adblockers or other
privacy plugins. On content with a target audience of developers I have
sometimes seen 60%. Most of these users are not recorded in Google Analytics,
so the real number of unique users is higher than reported by Analytics.

Since no data from these users is sent at all (no user data and no pageview
data), page views per user is not directly influenced, because you are missing
users _and_ pageviews in your reporting. It could be influenced because users
with an adblocker behave differently than users without. To analyse this you
would have to look at your server-generated web logs.

The given data in the image is often not used to find out the total number of
unique users on your webpage. It is used for computing engagement: monthly
active users vs. daily active users. In this example we only have 7-day active
users and no daily active users, but it basically works like:
34738/107942=0.32. At a value of 1 (the maximum) you have high engagement. In
simple terms: each user would come back every week for this month. 0.32 is
quite low. Around 0.25 would be the lower bound, because we have 4 weeks in a
month.
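Spelled out, with the numbers from the comment above:

```python
weekly_active = 34738    # 7-day active users
monthly_active = 107942  # 30-day active users

# Weekly/monthly stickiness: 1.0 would mean every monthly user returns weekly.
engagement = weekly_active / monthly_active
round(engagement, 2)  # 0.32
```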

------
mooneater
Awesome. Yet "codebases written in Keras 2 next month should still run many
years from now": given that deep learning is so new, how can they be that
confident that this API will remain relevant years down the line?

~~~
mixedCase
What does one thing have to do with the other? Regardless of how "relevant"
Keras 2 stays years from now, code written with it now should still be capable
of running then, that's the thing they claim.

------
rfeather
I wonder why they decided to get rid of MaxoutDense. Is there something
better, or is it so trivial to implement that they decided to drop it?

~~~
jph00
It's trivial with 'max' merge param:
[https://github.com/fchollet/keras/pull/3128](https://github.com/fchollet/keras/pull/3128)

