
Fastai: A Layered API for Deep Learning - pama
https://arxiv.org/abs/2002.04688
======
giansegato
Jeremy is probably one of the smartest people out there. The fact that he
decided to allocate his time to teaching and building open source tools, a
selfless way to provide value to his community, is something rare and worth
acknowledging.

~~~
sabertoothed
So true. And he cares about ethical behaviour as well (at least as it appears
from the outside, where I am), in AI and in work life.

------
jph00
Nice to see our paper on HN! FYI the paper covers v2, which is a from-scratch
rewrite that introduces a new layered API.

There are quite a few comments here pointing out (quite correctly!) that v1
was not at all easy to hack on. We've spent the last couple of years fixing
that. Have a look at the paper to see what I mean - especially the "mid-layer
API".

~~~
jph00
Oh btw if you don't have time to read the paper, I've summarized some key bits
here:
[https://mobile.twitter.com/jeremyphoward/status/122797513809...](https://mobile.twitter.com/jeremyphoward/status/1227975138097819650)

------
king_magic
I think this is cool... but... personally, I feel that PyTorch’s API is more
or less where I want a deep learning API to be.

TensorFlow is too low level. Fastai abstracts too much away and feels way too
magical - once things start going wrong, or if you need to do something
totally outside the bounds of the library, good luck.

Guess I’m mostly just a bit tired of frameworks for what feels like
frameworks’ sake.

~~~
jszymborski
In most cases, objects in the fastai framework inherit from pytorch objects
and are often rather thin. This means that you can usually roll your own
objects if the fastai ones aren't sufficient, which really helps alleviate
the feeling that you're "on rails" with the library.

That being said, a current project I'm working on didn't really fit the fastai
pipeline and I needed to drop into pytorch. It's a total shame since there are
so many little niceties that make things easier.

~~~
dillonmckay
Can you go into more detail on why pytorch was a better fit for your
particular use case?

My coworker recently completed a bootcamp using pytorch, and I am working
through the fastai course, so it has been interesting to compare experiences.

~~~
jszymborski
I'm training a semi-supervised task which consumes three text samples during
training. In retrospect, I might have been able to concatenate them and treat
them as one, but as it was, I wasn't able to implement some of the functions
fastai's DataBunch required.

Might give it another go if I decide on another refactor.

------
_coveredInBees
Personally, I really enjoy the design and level of abstraction that Pytorch
provides. FastAI has always felt too abstracted, and trying to go outside its
bounds gets really hard and complex, really fast. I glanced through the
paper, and while it certainly looks interesting and like an improvement, I am
not very convinced that doing custom things will be as easy as it is with
Pytorch. My experience has been that it is easy only if you already
understand all the pieces of the entire library, which is not an easy task at
all.

That being said, I admire the passion and effort that goes into FastAI, and I
think it does a great job of providing an entry point into the world of
ML/deep learning that is far more accessible to people. I'll probably still
read through the library, because I always pick up interesting ideas and
learn new things when reading other people's code, especially when a lot of
thought has been put into structuring a library.

------
locuscoeruleus
The lectures have a ton of gems in them, but as a beginner/intermediate
practitioner I have a hard time following them, since only around 1-5 percent
is relevant for me. I don't particularly like the fastai library, as it feels
like an obstacle between me and the underlying pytorch library. After trying
to use fastai for some time, I started looking directly at pytorch and found
that all the abstractions and features I liked in fastai were actually from
pytorch. That said, it's a great resource, and if you're doing cookiecutter
stuff it seems pretty nice. Also, students of fastai have done some pretty
amazing things. I like how Jeremy emphasizes the top-down view, but it's very
hard for me to submit to it. I can't shake the need to understand the
underlying ideas and build stuff bottom-up, even though I can see my progress
suffers for it.

~~~
aratauto
The second part of the course
([https://course.fast.ai/part2](https://course.fast.ai/part2)) builds stuff
bottom-up, starting from matrix multiplication all the way up to ResNets. It
is a great resource even if you only want to use Pytorch.
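
To give the flavour of that "from scratch" start, here is a rough sketch in
the spirit of the course (not its actual code):

        import torch

        # Naive matmul with explicit loops; the course then refactors this
        # step by step with elementwise ops, broadcasting, and einsum.
        def matmul(a, b):
            ar, ac = a.shape
            br, bc = b.shape
            assert ac == br, "inner dimensions must match"
            c = torch.zeros(ar, bc)
            for i in range(ar):
                for j in range(bc):
                    c[i, j] = (a[i, :] * b[:, j]).sum()
            return c

        a, b = torch.randn(3, 4), torch.randn(4, 5)
        assert torch.allclose(matmul(a, b), a @ b, atol=1e-5)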

~~~
locuscoeruleus
Great tip. Thanks:)

------
beefield
My small complaint is that there seems to be no backwards compatibility
whatsoever between fastai versions.

I get it that it is supposed to be working on the very bleeding edge of deep
learning technologies, but at the same time it is sold as "practical". At
least I would be slightly uncomfortable doing anything in production with a
library that is all but guaranteed to get no (compatible) development love
whatsoever after a couple of months when the developers have started working
on the next version.

But I guess providing a bleeding-edge deep learning library with
production-quality life cycle support may just be too tough a nut to crack.

~~~
bjoernbu
We're actually using fastai in production and will happily switch to v2. Sure,
there are serious questions about long-term stability and we know these
projects will be high maintenance.

However, they would be anyway: Core models and algorithms are quickly outdated
and any change that allows us to achieve similar or better results with less
effort in creating training data is easily worth the engineering work.

That said, I really hope v2 feels a bit more like other libraries: extending
v1 models has been pretty painful on several occasions. E.g. making some
changes to the underlying pytorch models was very straightforward, but still
using all the goodies for training built into fastai (in particular all the
stuff based on the work of Leslie Smith, tuned for best practices inside the
fastai universe) was pretty painful. It is awesome to have a library actually
implement best practices from the latest research, but sometimes all this
greatness was pretty hard for me to transfer to changed models.
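
For a sense of what those goodies look like, a v1-flavoured sketch from
memory (my_custom_model stands in for any pytorch module we modified):

        from fastai.vision import *  # fastai v1-style imports

        # Wrap a modified pytorch model in a Learner to reuse the training
        # machinery built on Leslie Smith's work.
        data = ImageDataBunch.from_folder(untar_data(URLs.MNIST_SAMPLE))
        learn = Learner(data, my_custom_model, metrics=accuracy)
        learn.lr_find()                      # Smith's LR range test
        learn.fit_one_cycle(4, max_lr=1e-3)  # Smith's one-cycle policy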

That said, it has worked for us in v1 and the benefits outweighed the problems
by far.

------
helloiloveyou
Hi Jeremy, just wanted to say you are amazing. I've followed all your
lectures, and I wish I'd had you as a teacher for all my assignments in
college. You truly have a gift for explaining hard things in a very simple
way. And the things you are doing with fastai are a gift to the community
that is moving science forward. I'm doing my Master's thesis, and fastai is a
key piece of my experiments. So again, a big thank you. I hope some day I can
do you a favor as you've done for me.

------
pama
Jeremy is a fantastic teacher. The paper is written in his usual lucid style
and mixes code with narrative in a seamless way. It feels like an extended
version of one of his pedagogical Jupyter notebooks.

------
DrNuke
Bleeding edge is great for fun and personal development, Jeremy (with the
fast.ai team as a whole) is wonderful, and all that, but you need your own
suitable case study to make the most of such techniques and implementations:
NLP for the time being; it was image processing when they were just starting
out and I followed their courses. In 2020, year V, VI or VII of this new
machine learning explosion, generalists cannot do much on their own anymore,
imho.

~~~
EForEndeavour
My impression from 3 years of working under the job title of "data scientist"
is that generalists have a huge advantage compared to specialists. Data
scientists these days have to have some idea of how each step in the data
science workflow works, including non-technical aspects like business, office
and institutional politics, and domain knowledge.

------
Communitivity
I like fast.ai a lot. For those who remember OpenRDF, fast.ai reminds me of
how nice it was to use Sesame Sails (though sometimes you had to create your
own sail components). Sails were modular semantic reasoning and triple store
layers you could stack in different ways to 'make your AI ship go'.

------
RocketSyntax
Is "fastai to pytorch" what "keras is to tensorflow?"

Glad to see pandas support for tabular data.

~~~
alex000kim
Kind of. Fastai abstracts a lot more of Pytorch than Keras does of TF.

------
nnq
Does it still have the _requirement_ for a GPU (driver) to be around to even
run, like fastai 1.0 had? (I had to manually comment out requirements and
imports to get it to run CPU-only...)

_I get that for any serious use you'd want a GPU, but for learning and toying
around you might want to be able to run and debug code on your freakin
macbook! Is that too much to ask?_ (Some of us do code in IDEs, not in
notebooks + vim on a server, and _we'd want at least our test suite to be
able to run locally ffs!_)

(Also, hopefully they've gotten rid of the lovecraftian architecture with
_methods that can mutate an object's class_ [?!] - I understood the practical
appeal and why they did it, but as a software engineer with sympathy for
functional programming, that almost made me wanna barf :|)
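
For anyone who hasn't seen that pattern, it looks roughly like this (a
generic illustration of the idiom, not fastai's actual code):

        class Caterpillar:
            def pupate(self):
                # Mutates this instance's class in place: from now on,
                # method lookup happens on Butterfly, not Caterpillar.
                self.__class__ = Butterfly

        class Butterfly(Caterpillar):
            def fly(self): return "airborne"

        c = Caterpillar()
        c.pupate()
        print(type(c).__name__, c.fly())  # Butterfly airborne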

Anyway, _fastai is awesome for learning and experimenting_, keep up the good
work! I just hate that it's so obnoxious to use and learn for anyone with a
more traditional software engineering background...

~~~
aratauto
It is possible to run everything on CPU, even with fastai 1.0. But training
can be ~100 times slower than on a GPU. Even for toy exercises involving
image processing and actual deep networks (30-150 layers), that means hours
or days of training.
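
In plain pytorch the device choice is explicit, and fastai v1 exposed a
default-device setting for the same purpose (a sketch; worth checking the
docs for your version):

        import torch

        # Device-agnostic setup: use the GPU when present, else the CPU.
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        model = torch.nn.Linear(10, 2).to(device)
        x = torch.randn(4, 10, device=device)
        print(model(x).shape)  # torch.Size([4, 2])

        # fastai v1 equivalent, from memory:
        # from fastai.vision import *
        # defaults.device = torch.device('cpu')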

It is not FastAI's fault, though.

~~~
nnq
My actual use case some time ago was to _run test suites for some tricky and
convoluted data wrangling code that used fastai dataframes and stuff (by
necessity, since it was doing some funky tiled image loading and
segmentation)_, locally on CPU, to debug those tests... neither training nor
even inference, actually, just _running a useless micro-training session at
the end to sanity-check that data was loaded in a usable way and that things
didn't break when you tried to train_.
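
Something like this kind of CPU-only smoke test, minus the fastai data
pipeline (a hypothetical sketch, not the actual project code):

        import torch
        import torch.nn as nn

        # One tiny forward/backward pass to check that batches come out of
        # the pipeline in a trainable shape; no GPU needed.
        def test_one_training_step():
            model = nn.Linear(8, 2)
            xb = torch.randn(4, 8)          # stand-in for a real batch
            yb = torch.randint(0, 2, (4,))
            loss = nn.functional.cross_entropy(model(xb), yb)
            loss.backward()
            assert all(p.grad is not None for p in model.parameters())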

But in fastai 1.0 it was all bundled together in one big ball of yarn, with
everything ultimately depending on some data loading classes that depended on
the GPU driver, etc.

Anyway, it was really bad architecture and dev practices in the codebase I
was working on, and the tested behavior would probably not have matched the
production one 100%... I don't blame fastai much for not helping with a
broken workflow, but _I prefer more barebones and less opinionated
frameworks, aka using tf or pytorch directly, since sometimes you really need
to get that "broken" thing running in production before you work on a
refactored version of it :P Fastai seems very research-oriented and
opinionated._

I'll definitely look into fastai 2.0 though :)

------
ya3r
I really like the fastai abstractions and their attention to detail. Also
their callbacks are amazing. I always look at their implementations for
inspiration.

But my main issue with it is the code formatting
([https://docs.fast.ai/dev/style.html#layout](https://docs.fast.ai/dev/style.html#layout)).
Maybe I am too used to PEP-8 and black
([https://github.com/psf/black](https://github.com/psf/black)) formatted code,
but honestly I cannot stand fastai's style.

~~~
dr_zoidberg
Ufff, that's rough to read, totally. The only rule I agree with is the one
that says:

    
    
        Aim to align statement parts that are conceptually similar. It allows the reader to quickly see how they’re different. E.g. in this code it’s immediately clear that the two parts call the same code with different parameter orders.
    

That'd turn something like this:

    
    
        class OneClass:
            def __init__(self, a, b1, b2, c_long):
                self.a = a
                self.b1 = b1
                self.b2 = b2
                self.c_long = c_long
    

Into this:

    
    
        class OneClass:
            def __init__(self, a, b1, b2, c_long):
                self.a      = a
                self.b1     = b1
                self.b2     = b2
                self.c_long = c_long
    

(Maybe not the greatest example; there are places where this helps much
more.)

The others are all un-pythonic and make the code less readable.

~~~
jph00
It's certainly unpythonic - as the link explains, it's based on research that
goes back many more decades than Python has existed, and that PEP 8 entirely
ignored.

But it only makes the code unreadable if you don't make a tiny effort to
adjust. If you do make the effort, there's some great payoff, like this code:

    
    
        try:
            self._split(b);                                  self('begin_batch')
            self.pred = self.model(*self.xb);                self('after_pred')
            if len(self.yb) == 0: return
            self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
            if not self.training: return
            self.loss.backward();                            self('after_backward')
            self.opt.step();                                 self('after_step')
            self.opt.zero_grad()
        except CancelBatchException:                         self('after_cancel_batch')
        finally:                                             self('after_batch')
    

That's the inner part of the training loop. You can see at a glance: what
steps are in the loop; what callbacks are in the loop, in what order; what
step corresponds to each callback. And you can see the whole training loop at
once, which is great for getting a clear picture of what's going on.

~~~
dr_zoidberg
Looking at that code without knowing much of the context in which it works, I
assume self('begin_batch') and the like are "signals" that set some state or
are used for logging. That behaviour could be achieved with other mechanisms
(perhaps some metaprogramming magic, or an observer pattern; a sketch below).
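
A toy sketch of what I mean by an observer pattern (event names made up):

        # Handlers subscribe to named events; the training loop publishes
        # them, instead of calling self('event_name') inline.
        class EventBus:
            def __init__(self): self.handlers = {}
            def on(self, event, fn):
                self.handlers.setdefault(event, []).append(fn)
            def emit(self, event, **kw):
                for fn in self.handlers.get(event, []): fn(**kw)

        bus = EventBus()
        bus.on('after_loss', lambda loss: print(f"loss: {loss:.4f}"))
        bus.emit('after_loss', loss=0.1234)  # prints: loss: 0.1234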

And while I can appreciate that it's quick to see where the signals are sent,
the use of ; and having two statements on one line still aren't convincing
me.

Even more, if I were to run line_profiler here, I know it'd report weird
numbers, precisely because there is more than one statement per line.

The other thing I dislike is opening blocks and closing them on the same
line. It may be force of habit on my part, but that screams unreadability in
my face.

Rounding up: all I see is behaviour that can be achieved through other
mechanisms, plus dev-tool unfriendliness. And notice I'm not saying anything
about PEP-8, because:

a) There are parts of it with which I don't agree either.

b) Many people use PEP-8 as a sort of "silver bullet" and argument-ending
remark. That's not what it should be; it should be a _guide_, used when it
helps and ignored sparingly (with reason and consideration of _why_ you
decide to ignore it, for the sake of readability).

------
jellyksong
How does the default NLP tokenizer work with transfer learning? Are all the
fast.ai pretrained NLP models pretrained with the same tokenization scheme?

------
jph00
FYI, I also created an HTML version of the paper, which is much easier to read
on mobile, and probably easier on a computer too if you're not printing it
out:

[https://www.fast.ai/2020/02/13/fastai-A-Layered-API-for-Deep...](https://www.fast.ai/2020/02/13/fastai-A-Layered-API-for-Deep-Learning/)

------
Lucasoato
Jeremy <3

Is this paper about version v1 or v2 of the API?

~~~
nestorD
V2, according to his Twitter.

------
vackosar
You can listen to the paper here:
[https://youtu.be/XO26sNuJaXg](https://youtu.be/XO26sNuJaXg)

------
1_over_n
All hail Jeremy et al. A leader and a legend.

