Fastai: A Layered API for Deep Learning (arxiv.org)
214 points by pama on Feb 14, 2020 | 42 comments

Jeremy is probably one of the smartest people out there. The fact that he decided to allocate his time to teach and to build open source tools — that is: a selfless way to provide value to his community — is something rare & worth being acknowledged.

So true. And he cares about ethical behaviour as well (it appears from the outside where I am), in AI and work life.

Nice to see our paper on HN! FYI the paper covers v2, which is a from-scratch rewrite that introduces a new layered API.

There are quite a few comments here pointing out (quite correctly!) that v1 was not at all easy to hack on. We've spent the last couple of years fixing that. Have a look at the paper to see what I mean - especially the "mid-layer API".

Oh btw if you don't have time to read the paper, I've summarized some key bits here: https://mobile.twitter.com/jeremyphoward/status/122797513809...

My team and I are huge fans of your and Rachel's work!

I've recently begun to experiment with nbdev. Really like the concept.

I think this is cool... but... personally, I feel that PyTorch’s API is more or less where I want a deep learning API to be.

TensorFlow is too low level. Fastai abstracts too much away and feels way too magical - once things start going wrong, or if you need to do something totally outside the bounds of the library, good luck.

Guess I’m mostly just a bit tired of frameworks for what feels like frameworks’ sake.

For most cases, objects in the fastai framework inherit from PyTorch objects and are often rather thin wrappers. This means you can usually roll your own objects if the fastai ones aren't sufficient, which really helps alleviate the feeling that you're "on the rails" of the library.
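As a plain-Python illustration of the thin-wrapper pattern described above (the class names here are made up; a real fastai object would inherit from the corresponding PyTorch class):

```python
class BaseDataset:
    """Stand-in for a framework base class (e.g. a PyTorch Dataset)."""
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]


class UppercaseDataset(BaseDataset):
    """A 'thin' subclass: inherits everything and overrides one hook,
    so swapping in your own version stays a small, local change."""
    def __getitem__(self, i):
        return super().__getitem__(i).upper()
```

Because the subclass only overrides the one method it cares about, replacing it with your own class is a small edit rather than a rewrite.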

That being said, a current project I'm working on didn't really fit the fastai pipeline and I needed to drop into pytorch. It's a total shame since there are so many little niceties that make things easier.

Can you go into more details on why pytorch was a better fit for your particular use-case?

My coworker recently completed a bootcamp using pytorch, and I am working through the fastai course, so it has been interesting to compare experiences.

I'm training a semi-supervised task that consumes three text samples at a time. In retrospect, I might have been able to concatenate them and treat them as one, but as it was, I wasn't able to implement some of the functions fastai's DataBunch required.
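The concatenation idea mentioned above could look something like this minimal sketch (plain Python; the class name and separator token are hypothetical):

```python
class TripleTextDataset:
    """Joins three parallel text samples into one string, so a
    single-input text pipeline can consume them."""

    SEP = " xxsep "  # hypothetical separator token

    def __init__(self, texts_a, texts_b, texts_c):
        # The three lists must be parallel (one sample per index).
        assert len(texts_a) == len(texts_b) == len(texts_c)
        self.samples = list(zip(texts_a, texts_b, texts_c))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        # Return one concatenated string per example.
        return self.SEP.join(self.samples[i])
```

With a distinctive separator token added to the vocabulary, the model can still tell where one sample ends and the next begins.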

Might give it another go if I decide to give it another refactor.

I feel the same way about fastai v1. Hard to use bits and pieces of it without rewriting everything. Apparently they heard this and addressed it in fastai v2 though, so you can pull in just the critical bits you need.

Personally, I really enjoy the design and level of abstraction that Pytorch provides. FastAI has always felt too abstracted and trying to go outside the bounds of it gets really hard and complex, really fast. I glanced through the paper and while it certainly looks interesting and to be an improvement, I am not very convinced that doing custom things will be easy, like it is with Pytorch. My experience has been that it is easy, if you already understand all the pieces of the entire library, which is not an easy task at all.

That being said, I admire the passion and effort that goes into FastAI and I think it does a great job at providing an entry point into the world of ML/deep learning that is far more accessible to people. I'll probably still read through the library because I always pick up interesting ideas and learn new things when reading other peoples code, especially when a lot of thought has been put into structuring a library.

The lectures have a ton of gems in them, but I have a hard time following the lectures as a beginner/intermediate practitioner, as around 1-5 percent is relevant for me. I don't particularly like the fastai library as it feels like an obstacle between me and the underlying pytorch library. After trying to use fastai for some time, I started looking directly at pytorch and found that all the abstractions and features I liked with fastai were actually from pytorch. That said, it's a great resource and if you're doing cookiecutter stuff it seems pretty nice. Also, students of fastai have done some pretty amazing things. I like how Jeremy emphasizes the top-down view, but it's very hard for me to submit to it. I can't shake the need to understand the underlying ideas and build stuff bottom-up, even though I see my progress suffers from it.

The second part of the course (https://course.fast.ai/part2) builds stuff bottom-up, starting from matrix multiplication all the way up to ResNets. This is a great resource even if you only want to use Pytorch.

Great tip. Thanks:)

My small complaint is that there seems to be no backwards compatibility whatsoever between fastai versions.

I get that it is supposed to work on the very bleeding edge of deep learning technologies, but at the same time it is sold as "practical". I would be at least slightly uncomfortable doing anything in production with a library that is all but guaranteed to get no (compatible) development love whatsoever a couple of months after the developers have started working on the next version.

But I guess it may just be a too tough nut to crack to provide a bleeding edge deep learning library with production quality life cycle support.

We're actually using fastai in production and will happily switch to v2. Sure, there are serious questions about long-term stability and we know these projects will be high maintenance.

However, they would be anyway: Core models and algorithms are quickly outdated and any change that allows us to achieve similar or better results with less effort in creating training data is easily worth the engineering work.

That said, I really hope v2 feels a bit more like other libraries: extending v1 models has been pretty painful on several occasions. E.g. making some changes to the underlying pytorch models was very straightforward, but still using all the training goodies built into fastai (in particular all the stuff based on the work of Leslie Smith, tuned for best practices inside the fastai universe) was pretty painful. It is awesome to have a library actually implement best practices from the latest research, but sometimes all this greatness was pretty hard for me to transfer to changed models.

That said, it has worked for us in v1 and the benefits outweighed the problems by far.

Yeah, it was a conscious decision, but it's certainly not the only "one right way". I saw Keras have an API freeze a few years ago, and it didn't really make sense to me when the field is changing so much. I can certainly see that there will be people for whom that is what they want, however.

So instead we'll be maintaining fastai v1 as a separate branch and accepting PRs as long as people are using it. But v2 is designed to leverage a lot of the new ideas that have come up in the last couple of years, both in our research, and more widely.

Hi Jeremy, just wanted to say you are amazing. I've followed all your lectures and I wish I had you as a teacher for all my assignments in college. You truly have a gift for explaining hard things in a very simple way. And the things you are doing with fastai are a gift to the community that is moving science forward. I'm doing my Master's Thesis, and fastai is a key piece of my experiments. So again, a big thank you. I hope some day I can do you a favor as you've done for me.

Jeremy is a fantastic teacher. The paper is written in his usual lucid style and mixes code with narrative in a seamless way. The paper feels like an extended version of one of his pedagogical jupyter notebooks.

Bleeding edge is great for fun and personal development, and Jeremy (with the fast.ai team as a whole) is wonderful and all that, but you need your own suitable case study to make the most of such techniques and implementations: NLP for the time being; it was image processing when they were just starting and I followed their courses. In 2020, year V, VI or VII of this new machine learning explosion, generalists cannot do much on their own anymore imho.

My impression from 3 years of working under the job title of "data scientist" is that generalists have a huge advantage compared to specialists. Data scientists these days have to have some idea of how each step in the data science workflow works, including non-technical aspects like business, office and institutional politics, and domain knowledge.

I like fast.ai a lot. For those who remember OpenRDF, fast.ai reminds me of how nice it was to use Sesame Sails (though sometimes you had to create your own sail components). Sails were modular semantic reasoning and triple store layers you could stack in different ways to 'make your AI ship go'.

Is "fastai to pytorch" what "keras is to tensorflow?"

Glad to see pandas support for tabular data.

Kind of. Fastai abstracts a lot more of Pytorch than Keras does of TF.

Does it still have the requirement for a GPU (driver) to be around to even run, like fastai 1.0 had? (Had to manually comment requirements and imports to have it run CPU-only...)

I get it that for any serious use you'd want a GPU, but for learning and toying around you might want to be able to run and debug code on your freakin macbook! Is that too much to ask? (Some of us do code in IDEs, not in notebooks + vim on a server, and we'd want at least our test suite to be able to run locally ffs!)

(Also, hopefully they've got rid of the lovecraftian architecture with methods that can mutate an object's class [?!] - I understood the practical appeal and why they did it, but as a software engineer with sympathy for functional-programming that almost made me wanna barf :|)

Anyway, fastai is awesome for learning and experimenting, keep up the good work! I just hate it that it's so obnoxious to use and learn for anyone with a more traditional software engineering background...

It is possible to run everything on CPU, even with fastai 1.0. It's just that training can be ~100 times slower than on GPU. Even for toy exercises involving image processing and actual deep networks (30-150 layers) that means hours or days of training.
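As a rough back-of-envelope illustration of what a ~100x slowdown means in practice (the specific numbers here are made up for the sake of the arithmetic):

```python
# Hypothetical model: 3 minutes per epoch on a GPU, 10 epochs total.
gpu_minutes_per_epoch = 3
slowdown = 100        # CPU roughly 100x slower, per the comment above
epochs = 10

# Total CPU wall-clock time in hours.
cpu_hours = gpu_minutes_per_epoch * slowdown * epochs / 60
print(f"GPU: {gpu_minutes_per_epoch * epochs} min total, "
      f"CPU: ~{cpu_hours:.0f} hours (~{cpu_hours / 24:.1f} days)")
# → GPU: 30 min total, CPU: ~50 hours (~2.1 days)
```

So even a half-hour GPU job becomes a multi-day CPU job, which is why CPU-only runs are mostly useful for debugging rather than real training.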

It is not FastAI's fault though.

My actual use case some time ago was to run test suites for some tricky and convoluted data wrangling code that used fastai dataframes and such (by necessity, since it was doing some funky tiled image loading and segmentation), locally on CPU, to debug those tests... neither training nor even inference actually, just running a useless micro-training session at the end to sanity-test that data was loaded in a usable way and things didn't break when you tried to train.

But in fastai 1.0 it was all bundled together in one big yarn, with everything depending in the end on some data loading classes that depended on GPU driver etc.

Anyway, it was really bad architecture and dev practices in the codebase I was working on though; the tested behavior would probably not have matched the production one 100%... I don't blame fastai much for not helping with a broken workflow, but I prefer more barebones and less opinionated frameworks, aka using tf or pytorch directly, since sometimes you really need to get that "broken" thing running in production before you work on a refactored version of it :P Fastai seems very research-oriented and opinionated.

I'll definitely look into fastai 2.0 though :)

I think the whole reason AI has become what it has is because these are “brute force” things you can’t do with a normal CPU. So functional programming and massively parallel algorithms are what make it possible.

Every year it gets more accessible to a wider audience. Soon there will probably be frameworks that hide the complexity completely and you can just say here’s a massive dataset, I want to train it to be a conversation bot or cat pic classifier, go. But we’re not quite there yet.

I believe OP would, for example, want to start training locally just to check for errors, then do the run somewhere remote.

Synchronizing local and remote code shouldn't take much time, but it's still at least a few seconds on the critical path for the run->fail->fix->rerun loop.

VSCode's remote mode might be worth a try for people with such a setup.

I really like the fastai abstractions and their attention to detail. Also their callbacks are amazing. I always look at their implementations for inspiration.

But my main issue with it is the code formatting (https://docs.fast.ai/dev/style.html#layout). Maybe I am too used to PEP-8 and black (https://github.com/psf/black) formatted code, but honestly I cannot stand the code format.

Ufff that's rough to read. Totally. The only one I agree with is the one that says:

    Aim to align statement parts that are conceptually similar. It allows the reader to quickly see how they’re different. E.g. in this code it’s immediately clear that the two parts call the same code with different parameter orders.
That'd turn something like this:

    class OneClass:
        def __init__(self, a, b1, b2, c_long):
            self.a = a
            self.b1 = b1
            self.b2 = b2
            self.c_long = c_long
Into this:

    class OneClass:
        def __init__(self, a, b1, b2, c_long):
            self.a      = a
            self.b1     = b1
            self.b2     = b2
            self.c_long = c_long
(maybe not the greatest example, there are places where this helps much more)

The others are all un-pythonic and make the code less readable.

It's certainly unpythonic - as the link explains, it's based on research that goes back many more decades than Python has existed, and that PEP 8 entirely ignored.

But it only makes the code unreadable if you don't make a tiny effort to adjust. If you do make the effort, there's some great payoff, like this code:

    try:
        self._split(b);                                  self('begin_batch')
        self.pred = self.model(*self.xb);                self('after_pred')
        if len(self.yb) == 0: return
        self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
        if not self.training: return
        self.loss.backward();                            self('after_backward')
        self.opt.step();                                 self('after_step')
    except CancelBatchException:                         self('after_cancel_batch')
    finally:                                             self('after_batch')
That's the inner part of the training loop. You can see at a glance: what steps are in the loop; what callbacks are in the loop, in what order; what step corresponds to each callback. And you can see the whole training loop at once, which is great for getting a clear picture of what's going on.

By looking at that code without knowing much of the context in which it works, I assume self('begin_batch') and the like are "signals" that set some state or are used for logging. That behaviour could be achieved using other mechanisms (perhaps some metaprogramming magic or an observer pattern).
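The observer-pattern alternative speculated about here could be sketched roughly like this (a simplified stand-in, not fastai's actual mechanism):

```python
class TrainingEvents:
    """Observer-pattern sketch: handlers subscribe to named events,
    and the training loop fires one event per step instead of
    calling self('event_name') inline."""

    def __init__(self):
        self.handlers = {}  # event name -> list of callables

    def on(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def fire(self, event, **kwargs):
        for handler in self.handlers.get(event, []):
            handler(**kwargs)


# Usage: record which events fire during a pretend batch.
events = TrainingEvents()
log = []
for name in ("begin_batch", "after_pred", "after_loss", "after_batch"):
    events.on(name, lambda name=name, **kw: log.append(name))

for name in ("begin_batch", "after_pred", "after_loss", "after_batch"):
    events.fire(name)
# log is now ["begin_batch", "after_pred", "after_loss", "after_batch"]
```

The trade-off is the one the thread is circling: this decouples the callbacks from the loop body, but you lose the at-a-glance view of which callback fires at which step.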

And while I can appreciate it can be quick to see where the signals are sent, the use of ; and having two things in a line still aren't convincing me.

Even more, if I were to run the line_profiler here, I know it'd report weird numbers precisely for having more than one thing per line.

The other thing that I dislike is opening blocks and closing them in the same line. It may be force of habit for me, but that screams unreadability at my face.

Rounding it up, all I see is behaviour that can be achieved through other mechanisms, plus dev/tools unfriendliness. And notice I'm not saying anything about PEP-8, because:

a) There are parts of it with which I don't agree either.

b) Many people use PEP-8 as a sort of "silver bullet" and argument-ending remark. That's not what it should be: it should be a _guide_, used when it helps and ignored sparingly (with reason and consideration of _why_ you decide to ignore it, for the sake of readability).

I've long wished black would do this!

While I appreciate what black can do (no more discussions about code style!), I am lucky enough to manage a small team (2-5 programmers) who understand and follow the style convention we set.

Even the creator of Python says "Black is good for large projects, but I find it uglifies all code compared to manually formatting..."

How does the default NLP tokenizer work with transfer learning? Are all the fast.ai pretrained NLP models pretrained with the same tokenization scheme?

FYI, I also created an HTML version of the paper, which is much easier to read on mobile, and probably easier on a computer too if you're not printing it out:


Jeremy <3

Is this paper about version v1 or v2 of the API?

V2 according to his twitter.

You can listen to the paper here: https://youtu.be/XO26sNuJaXg

All hail Jeremy et al. A leader and a legend.
