Keras will be added to core TensorFlow at Google (fast.ai)
343 points by jonbaer on Jan 17, 2017 | 42 comments



> because TensorFlow’s API is verbose and confusing, and because Keras has the most thoughtfully designed, expressive API I’ve ever experienced

This doesn't feel fair to say, just like it wouldn't if you replaced "Tensorflow" with "C" and "Keras" with "Python". They operate at fundamentally different levels and offer a similar trade-off between control and ease of use.

We started using Keras for a project a few months ago, and it was great while it supported what we were doing. Once we needed to go outside of the box a little bit we essentially had to rewrite it in just Tensorflow.

This is great news though! Hopefully it will make the barrier to entry much lower for getting started with Tensorflow.


I strongly recommend checking out TensorFlow-Slim if you're looking for a lightweight abstraction for using TensorFlow – at least the layers, anyway.

The API looks similar to something like Keras or Lasagne, but the layers are just simple operations on tensors, so it integrates seamlessly with vanilla TensorFlow in a way that something like TF-Keras won't.

At the same time, though, the layers are the same foundation that TF-Keras looks like it'll use, so you get much of the same expressive power without sacrificing flexibility.
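
A rough sketch of that style, in case it helps (tf.contrib.slim as of TF ~1.0; the contrib namespace does move around between releases) - each layer call is just a function that takes a tensor and returns a tensor:

    import tensorflow as tf

    slim = tf.contrib.slim

    images = tf.placeholder(tf.float32, [None, 28, 28, 1])
    net = slim.conv2d(images, 32, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.flatten(net)
    logits = slim.fully_connected(net, 10, activation_fn=None, scope='fc1')
    # `logits` is an ordinary Tensor, so it composes with any other TF ops.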


> it integrates seamlessly with vanilla TensorFlow in a way that something like TF-Keras won't.

Google also released prettytensor, which is designed to address the same problem.

https://github.com/google/prettytensor


The Keras functional API is just simple operations on tensors too.
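
For instance (a minimal sketch with the TF backend, along the lines of the "Keras as a simplified interface to TensorFlow" post - the shapes are made up), calling a Keras layer on a TF placeholder just returns another TF tensor:

    import tensorflow as tf
    from keras.layers import Dense

    img = tf.placeholder(tf.float32, shape=(None, 784))
    preds = Dense(10, activation='softmax')(img)  # preds is a plain tf.Tensor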


> This doesn't feel fair to say, just like it wouldn't if you replaced "Tensorflow" with "C" and "Keras" with "Python". They operate at fundamentally different levels and offer a similar trade-off between control and ease of use.

That's a valid point. But I'd still like to see Tensorflow improve in a few areas, in particular the documentation and the overall marketing/positioning. For instance, the Udacity course and the Tensorflow tutorials do not make it at all clear that Tensorflow is the low-level plumbing that you only need if you really have to customize the algorithms or build new ones.

Furthermore, the API really is over-complex, and the docs and tutorials tend to show the full complexity when much simpler approaches exist. I'd like to see context managers, scope, sessions, and explicit graphs disappear from all but the most advanced documentation - show us how to build in Tensorflow without all that cognitive overhead (and indeed, it can be done!)
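
To sketch what I mean (TF ~1.0-era API; there's still a session under the hood, but none of the `with` blocks the tutorials lead with - everything lands on the default graph):

    import numpy as np
    import tensorflow as tf

    # No tf.Graph(), no `with graph.as_default()`, no `with tf.Session() as sess:`.
    x = tf.placeholder(tf.float32, [None, 784])
    w = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(x, w) + b

    sess = tf.InteractiveSession()        # registers itself as the default session
    tf.global_variables_initializer().run()
    print(logits.eval({x: np.zeros((2, 784), np.float32)}).shape)  # (2, 10)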

Tensorflow's defaults are unfriendly too. For instance, grabbing all available memory on all of your GPUs is unexpected and unhelpful. Open up another Jupyter notebook tab and you've got a nasty error message coming up...
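
For anyone bitten by the GPU memory grab: it can be turned off per session via the standard config options (worth double-checking the docs for your version):

    import tensorflow as tf

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True   # allocate GPU memory only as needed
    # or cap the fraction of each GPU this process may claim:
    # config.gpu_options.per_process_gpu_memory_fraction = 0.4
    sess = tf.Session(config=config)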

> We started using Keras for a project a few months ago, and it was great while it supported what we were doing. Once we needed to go outside of the box a little bit we essentially had to rewrite it in just Tensorflow.

I'm surprised to hear that - I've found it so much easier to implement parts in Tensorflow or Theano and then call them from Keras. Trying to reimplement all the DL best practices from scratch in Tensorflow is a huge amount of work and hard to get right the first time (e.g. handling dropout in RNNs correctly), and you still end up with an API that's less elegant than what Keras already provides.
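
As a concrete (if made-up) sketch of what I mean: custom TF code can usually be dropped into a Keras model through a Lambda layer instead of abandoning Keras - `custom_piece` below is a hypothetical stand-in for whatever the tweak is:

    import tensorflow as tf
    from keras.layers import Input, Lambda, Dense
    from keras.models import Model

    def custom_piece(x):
        # arbitrary TensorFlow ops on tensors can live here
        return tf.clip_by_value(x * 2.0, -1.0, 1.0)

    inp = Input(shape=(32,))
    h = Lambda(custom_piece)(inp)
    out = Dense(1, activation='sigmoid')(h)
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='binary_crossentropy')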

Why did you find you needed to rewrite in TF rather than integrate with Keras?


Those seem like fair criticisms, although I'm just a beginner with Tensorflow right now. I don't doubt there's room for improvement!

We were building a character-based RNN for generating text, with some special tweaks. Based on what we were reading, that wasn't a good fit for Keras, and there was a great, straightforward Tensorflow example already out there that we could work off of, so it made more sense to us at the time.


wrt GPU memory: If we made InteractiveSession by default grow memory, would that work for you? Seems sensible to change the default for InteractiveSession (e.g., for use in notebooks), but not the default for Session (production training / inference) due to the potential memory fragmentation.


(I made this change anyway).


They are also comparing tf unfavourably to theano.

Personally I've found the somewhat obscure, low-level, and (comparatively) little-documented DyNet [1] much easier to understand than tensorflow, so the critique rings true to me.

[1] http://dynet.readthedocs.io


+1 for DyNet. Chris Dyer is a boss (and just moved from Carnegie Mellon to DeepMind)


Disclaimer: Very biased player in the space.

I'd personally like to see Keras become the industry-standard interface. We are starting to give classes with it for our customers and the feedback has been positive.

CNTK is adding one: https://github.com/Microsoft/CNTK/issues/797

Mxnet is there: https://github.com/fchollet/keras/issues/1313

TF being there obviously helps.

//Begin blatant self promotion :D

No one will care about ours as much (research crowd, python, etc.), but I'll throw our hat in the ring here too: Keras has been amazing for our use case and we're personally pushing more Python people towards it: https://github.com/deeplearning4j/deeplearning4j/pull/2562

^^ that will bridge the spark ecosystem and friends so you can run on yarn/mesos/spark etc from a familiar interface.

//End blatant self promotion


Somewhat off topic -- HN introduced me to fast.ai and I have found their hands-on/practical approach to teaching particularly useful!


I'm glad you liked it :)

As it happens, I literally just now posted the curriculum for part 2 of the course - http://www.fast.ai/2017/01/17/curriculum2/ . If you're near SF, you may want to join us. Either way, I'd love to get feedback on the curriculum - anything you'd like to see added? Anything from part 1 that you'd like to know more about?


I am just getting started actually, so I will provide you more feedback as I complete part 1!

For starters, I found the pairing of the lecture, the code, AND the documentation particularly useful. The setup in anaconda really enables you to compare/understand inputs and outputs, which, at least for me, is very helpful! Big fan of learning through practical application, which the aforementioned combo is well suited for, imo.

Kudos and thank you (and the team) very much for all of the hard work! I am not sure I'll be able to attend part 2 in person but I will be sure to follow along online. If I am ever in SF at the time of a course, I will certainly apply!

All the best.


I'm very excited for this in Part 2:

"A key teaching goal for us is that you come away from the course feeling much more comfortable reading, understanding, and implementing research papers. We’ll be sharing some simple tricks that make it much easier to quickly scan and get the key insights from a paper."

My interest is musical style transfer. I'd like to replicate these examples from Sony Computer Science Lab-Paris: http://www.flow-machines.com/odetojoy/

They've published papers, but not code (except for DeepBach).


This looks very interesting to me. Being responsible for campaign sites that allow people to upload images, I would love to use neural networks in some of my projects, even if just for a simple "flag this picture as suspicious" check - people are supposed to upload pictures of their cakes, but sometimes they upload something else instead. Over a few years of running the site we've accumulated a human-curated database of a few dozen thousand images of cakes, and with traffic ever growing, I would love to use a neural network to prioritize suspicious submissions for moderators.

As a person without any education in the field of maths or neural networks, the examples from the Keras docs and blog resonated with me far more than the TensorFlow ones.
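
For what it's worth, the kind of thing described above is roughly this much Keras code (just a sketch - the image size, layer sizes, and of course the cake/non-cake setup are placeholders, and starting from a pretrained base tends to work better than training from scratch on a few dozen thousand images):

    from keras.applications.vgg16 import VGG16
    from keras.layers import Dense, Flatten
    from keras.models import Model

    # Reuse an ImageNet-pretrained network; train only a small "cake or not" head.
    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    for layer in base.layers:
        layer.trainable = False

    x = Flatten()(base.output)
    x = Dense(256, activation='relu')(x)
    out = Dense(1, activation='sigmoid')(x)   # probability the image is a cake

    model = Model(base.input, out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    # model.fit(...) on the labeled images, then sort the moderation queue
    # by predicted probability.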


The blog post links to a tweet, and that tweet has a couple more details. The new Keras is going to be tensorflow-only and built on top of tf.contrib.layers (so a complete rewrite).

https://twitter.com/fchollet/status/820746845068505088


It actually sounds like Keras as we know it will continue to exist and support both Theano and TF.

Theano support will continue for as long as Keras exists. This integration changes nothing for existing Keras users, only for TF users. - https://twitter.com/fchollet/status/821090410659344384

Rather than a "new" Keras, it sounds (I could be wrong) as if the Keras API will now be included with TF as an alternative way of interacting with TF (as it has been for some time), simply bundled with TF.


I wonder how Keras will be integrated into TF and how keras-in-tf will evolve with the separate Keras. More specifically, how many more APIs will keras-in-tf have in addition to the current keras? Wouldn't this hurt keras as a backend-agnostic library? Will the keras-in-tf APIs always keep in sync with standalone-keras? I will be curious to see how these pan out before getting too excited.


That isn't what that Tweet says at all. Notably, he makes this very explicit:

Theano support will continue for as long as Keras exists. This integration changes nothing for existing Keras users, only for TF users.

https://mobile.twitter.com/fchollet/status/82109041065934438...


This is amazing news, as using Keras is, in my opinion, the best way for most people to start building models in TensorFlow (and having support for Theano is definitely a plus).


That is potentially very bad news.

I fear that from now on the Python API will be the best documented API used by most example code, and the low level API will over time become something obscure that is hard to approach and badly documented, just like Torch's C API.

I don't like this split between a core system written in C or C++ and a high level API written in a language that's too slow and memory hungry to write the core system in.

This architecture is probably meant to reflect an existing split between users and implementors of the system and I can understand the arguments in favor. But I think it also creates and reinforces that split, which isn't a good thing at all.

We were actually looking into moving to TensorFlow from Torch because the Torch C API is considered an internal thing not meant for public consumption.


I don't really understand why people want to funnel themselves into these ML frameworks when you have the freedom to implement arguably more interesting things in C++/Fortran with libs like Armadillo, OpenBLAS, SuperLU, etc. I mean, I get it if people just wanna copy-paste the functions they see used in blogs/press/toy demo releases and get on with things - is that it?


I have an uneasy feeling about these frameworks as well, but I'm not an expert in this field. So if I'm told that it's not worth the effort to recreate all of that stuff from scratch I'm going to have to believe it.

What I do insist on is that we keep the capability to selectively dig deeper where we need to and combine different libraries with our own code.

That's why I'm always trying to reduce the frameworkness and increase the librariness of any third party code we use. Scripting language runtimes are a major roadblock in that approach.


> I have an uneasy feeling about these frameworks as well, but I'm not an expert in this field. So if I'm told that it's not worth the effort to recreate all of that stuff from scratch I'm going to have to believe it.

I wouldn't consider myself an expert in the "ml" field either; I'm only really informed because of my academic/contract research work in neuroimaging (DSP and beamforming) and having to rely on performant code that, at the end of the day, comes down to manipulating matrices under some set of mathematical/physics constraints. I wish reviewers had given us an easier time with our paper and just believed our results instead of taking months to accept it, haha.

But if you are able to use these frameworks/hack around the quirks for your use case with minimum effort and solve your problems to a satisfactory degree, I completely understand that they can be good enough.

>…we keep the capability to selectively dig deeper…

>That's why I'm always trying to reduce the frameworkness and increase the librariness of any third party code we use. Scripting language runtimes are a major roadblock in that approach.

I don't disagree with your sentiments at all, but it seems like you are already at the intersection where you just need to go down your own road, because it's probably unrealistic to expect Google's needs to be aligned with yours in the long term. Then again, I'm the type of person who would fork the nightly version of Firefox and remove and add the stuff I want out of the box, so maybe this is not the best suggestion >.<


The killer feature is that you get to have the same source code run your model on your $300 netbook's CPU or on a server/workstation's multiple GPUs.

When your model changes every day, you really need a framework that abstracts away the underlying computational resources.


I'm glad this is happening and honestly surprised it didn't happen earlier. fchollet, the main Keras author, works at Google and had integrated TensorFlow into Keras very early on (given he'd known about its existence long before the general public). Even though Keras is backend agnostic, you don't really lose any of the flexibility that TensorFlow would give you - it's a pretty transparent abstraction[3]. Best of all, if a new underlying tensor library appears tomorrow that's better than TensorFlow, you'll be getting support for that too!

Keras goes beyond simply being a concise API, a variety of examples, or a strong community: its "best practices included" philosophy is opinionated but quite useful.

As an example, the settings for an LSTM are complex and require reasonably thorough understanding of many topics. There's dropout (and the many debated ways one could apply it), there's the forget bias, there's weight initialization, there's ... You get the idea.

If you use `keras.layers.recurrent.LSTM`[1], however, bam, you get an opinionated version of these for free. Initialization is Glorot uniform for most of the weights but orthogonal[2] for the inner weights. The forget bias is set to one - as I hope every library has by default now, but that wasn't the case for some time. Dropout is variational-inference-based dropout - recent, likely what you want, and zero complexity.
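
To make that concrete, "for free" really is the one-liner below (Keras ~1.x argument names; the point is how little you need to spell out to get those defaults):

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Dense

    model = Sequential()
    model.add(Embedding(20000, 128))              # 20k-token vocabulary
    # Glorot-uniform weights, orthogonal inner weights, forget bias of 1, etc.
    # are all defaults; only the dropout rates are given explicitly here.
    model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy')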

At some point you'll likely want to learn about all the details - and this provides a smooth easy transition for that as you go "wait, what's a Glorot?" - but for getting your feet wet and/or solving a specific task, "best practices included" seems the best combination. I've successfully recommended this to high school students and they've been up and running with neural networks in short order!

Given all of this, whilst I'm a researcher who works on fiddly novel architectures that require some pretty specific features (so I use Chainer[4] at work), I turn to Keras for my fun side projects as it keeps me sane and happy :)

Full disclosure, I've committed examples to the Keras codebase and know Francois in person.

[1]: https://keras.io/layers/recurrent/#lstm

[2]: https://smerity.com/articles/2016/orthogonal_init.html

[3]: https://blog.keras.io/keras-as-a-simplified-interface-to-ten...

[4]: http://chainer.org/


What does chainer have that keras is missing?


Chainer has a "define by run" architecture for defining computation graphs, so you can have very dynamic computation graphs. TF/Theano have static computation graphs, though an optimized define-by-run library is coming to TF in ~February; not sure if there are any plans to make it play nice with Keras.
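
Roughly, "define by run" means the graph is whatever your Python actually executed, so data-dependent control flow is just ordinary Python. A Chainer-flavoured sketch (toy example, not a real model):

    import numpy as np
    import chainer
    import chainer.functions as F

    x = chainer.Variable(np.random.randn(1, 10).astype(np.float32))
    h = x
    # The number of iterations depends on the data itself - awkward in a static
    # graph, trivial here because the graph is recorded as the code runs.
    while float(F.sum(F.absolute(h)).data) > 1.0:
        h = 0.5 * h
    loss = F.sum(h * h)
    loss.backward()   # backprop through exactly the ops that actually ran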


Ohh, that's very similar to ours actually. Our computation graphs just work relative to their context. We've found it very easy to add arbitrary vertices because of it. Not that you care, but someone might: we ended up using this style because we opted not to have autodiff in our layers, due to speed as well as the state of our framework at the time (layers already being hand-implemented).

We get the use case for autodiff, but not being focused on research, we decided it was easier to hand-implement the layers and just have the graphs be defined at the layer level.

Thanks for clarifying.


But what will happen to support for the Theano backend?


Is there any way to do Keras or Tensorflow on AMD or will there be? I know AMD was pushing something new called ROCm or whatever. But I looked at the OpenCL thread on Tensorflow and they seem to be pretending ROCm isn't relevant.

I am considering just using WebGL or something like Turbo.js to experiment with my Radeon hardware for AI.

Maybe AMD should hire some people to work full time on Tensorflow. Anyway, at this point the next thing I buy will probably be an Nvidia card, and I'll chalk the AMD graphics card purchase up to ignorance. At least I played a few games on it.


Have you seen Keras.js? It uses WebGL to run certain computations on the GPU. https://github.com/transcranial/keras-js


Pretty cool! Just to make sure: keras.js so far only supports inference/prediction, but not training (including autodiff, SGD, etc). Is that correct?


No didn't know about that thanks!


For me, `tf.contrib.layers` is high-level enough. Besides, it's easier to integrate with other TensorFlow functionality than Keras is.
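
For context, that looks roughly like this (tf.contrib.layers as of TF ~1.0; contrib APIs shuffle around between releases):

    import tensorflow as tf
    from tensorflow.contrib import layers

    x = tf.placeholder(tf.float32, [None, 784])
    net = layers.fully_connected(x, 256)                     # ReLU by default
    logits = layers.fully_connected(net, 10, activation_fn=None)
    # Plain tensors throughout, so savers, summaries, custom losses, etc.
    # plug in with no extra glue.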


Can you use Keras to train on GCP?


From what I see, TensorFlow is getting what Torch users have had for a while with nn.


I like this news; as someone coming from an abstract, non-programming background, Keras is gorgeous.


Is this bad news for TFlearn?


Excellent news!


Great News!



