
The State of Machine Learning Frameworks - hughzhang
https://thegradient.pub/state-of-ml-frameworks-2019-pytorch-dominates-research-tensorflow-dominates-industry/
======
rtkaratekid
I work at a small company as an engineer and recently was asked to do a
project that would require some neural net magic. I had some experience with
keras/tensorflow so that was my first choice.

Despite the absolute nightmare of getting it installed and running on a gpu, I
managed it and had a fantastic model. It was doing so well that the company
wanted to expand the project and build out a multi-gpu rig as part of it. So I
got to building that environment, installed the latest CUDA, cuDNN, and nvidia
driver, and used tensorflow 2.0 aaaaaand it wouldn't work. I actually spent a
long time hacking on it until I read on a forum that it was just a bug that
hadn't been fixed yet.

At this point I decided to see what Pytorch was like. In literally one day I
installed everything and migrated my project completely over to pytorch. Same
speed, same accuracy, works perfectly on a multi-gpu rig when I set it to. It
was like a breath of fresh air.

The next day I wrote some C++ to import a saved pytorch model so it could run
in a deployment environment. The C++ API is also great. The docs are lacking a
little bit, but a Facebook researcher mentioned to me on the forums that
they're hoping to have it all done by next month.

It's unlikely that I'll be going back to tensorflow.

~~~
ackbar03
Could you provide a link or resources on how to run pytorch models from C++?

~~~
JacobiX
[https://pytorch.org/cppdocs/](https://pytorch.org/cppdocs/) This is a brief
description of the API. It worked fine for us even for complex models.
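
For a rough sense of the workflow, here's a minimal sketch of the Python side
(the model choice and input shape are just placeholders); the saved file can
then be loaded from C++ with torch::jit::load:

```python
import torch
import torchvision

# Trace a model with an example input to produce a TorchScript module.
model = torchvision.models.resnet18(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Serialize it; a C++ program can then load "traced_model.pt"
# with torch::jit::load and run it without a Python runtime.
traced.save("traced_model.pt")
```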

------
_coveredInBees
We use Pytorch extensively in our startup. We tackle a lot of new research
problems as consultants/partners to help develop products or devise new
algorithms/models to solve tasks for our customers. We have never regretted
our choice to pick Pytorch. I found the article pretty spot on when comparing
Tensorflow and Pytorch. The things that have appealed to me about Pytorch are:

1\. Extremely easy to debug and work with. Being able to debug effortlessly in
PyCharm makes life very easy.

2\. The API is quite clean, fits in really well with Python, and nothing feels
hacky. I've developed my own Keras-like framework for experimenting with,
training, and evaluating models quickly and easily, and the entire experience
has been really enjoyable.

3\. The nicest thing, though, is that, as the article points out, a huge
percentage of researchers have moved to Pytorch, and this allows us to look at
other researchers' code, experiment with things, and incorporate ideas and
cutting-edge research into our own work more easily. Even for things that are
released in TensorFlow, if it is an important publication that gains attention
and traction in the community, you will likely have implementations in Pytorch
pop up soon enough.

I do think that TensorFlow still has an edge on the deployment at scale/mobile
side of things as pointed out by the article. But Pytorch is a lot younger and
they are making a lot of progress with every release in that space.

~~~
UncleOxidant
Last year I was tasked with looking into a NAS (Neural Architecture Search)
paper and analyzing the algorithm. The paper came with a TensorFlow
implementation. Trying to read that TF code was quite difficult. I searched
around and found a PyTorch implementation - much easier to read and
understand, and it ran about 50% faster as well (the latter was a bit
surprising). I tend to think that TensorFlow lends itself to the creation of
code that's difficult to reason about. That may be different now with the
various flavors of TF (like TF Eager).

I'll add that it was much easier to install PyTorch with GPU support than it
was to install TensorFlow with GPU support - at least that's how it was around
November of last year. The PyTorch install was painless, whereas we ended up
having to build TF from source to work with our setup. Could be different now
as I haven't looked at TF since then.

~~~
usmannk
> That may be different now with the various flavors of TF (like TF Eager)

Unfortunately, if anything I think it's the opposite. The constant creation
and deprecation of TF flavors (tf-eager, tf-slim, tf-learn, keras,
tf-estimator, tf.contrib [RIP]) has made reading tensorflow code online somewhat
disastrous. Everybody, including the TF team, is using a different API and
it's difficult to keep all of them straight. It seems that you're doomed to
end up using some combination of many of the above in a way that makes sense
to you and your team, adding another confusing model to the pile.

~~~
halflings
Agree overall, but tf.eager doesn't have much to do with the rest of the list.

tf.contrib is just a module where user-contributed code was stored, which
included both low-level constructs and higher level APIs. tf.estimator is an
abstraction that is mostly used for productionizing models. tf.slim/tf.learn
were indeed redundant with keras (a library developed externally), but were
necessary steps before keras became part of tensorflow.

------
arugulum
Granted that PyTorch and TensorFlow both heavily use the same CUDA/cuDNN
components under the hood (with TF also having a billion other non-deep
learning-centric components included), I think one of the primary reasons that
PyTorch is getting such heavy adoption is that it is a Python library first
and foremost.

There are maybe all of two "surprises" I've encountered in all my time using
it, if even (1. Gradients are accumulated in state, 2. nn.Module does funky
things with attributes, so use something like nn.ModuleDict if you're going to
be dynamically setting modules). Everything else works like a dream, and works
almost exactly how you expect.
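
For anyone who hasn't hit those two, a minimal sketch (layer sizes are made
up):

```python
import torch
from torch import nn

model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Surprise 1: backward() accumulates into .grad, so zero the gradients
# every iteration or they silently pile up across steps.
opt.zero_grad()
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

# Surprise 2: modules stored in a plain dict are invisible to nn.Module,
# so their parameters never reach .parameters(); use nn.ModuleDict.
heads = nn.ModuleDict({name: nn.Linear(4, 1) for name in ["a", "b"]})
```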

Model parameters? .parameters() gives you a dict-friendly generator of
tensors. Model state? .state_dict() is a dictionary. Loading model state?
load_state_dict(state_dict)... just loads a dictionary. Reusing modules across
different modules? Just assign them! Determining what parameters to optimize?
Just ... give the list of parameters to the optimizer.
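
Concretely, something like this (arbitrary layer sizes):

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# state_dict() really is just an ordered dict of named tensors...
state = model.state_dict()
print(list(state.keys()))  # ['0.weight', '0.bias', '2.weight', '2.bias']

# ...so saving, loading, and inspecting are plain dict operations.
torch.save(state, "weights.pt")
model.load_state_dict(torch.load("weights.pt"))

# And the optimizer takes whichever parameters you choose to hand it.
opt = torch.optim.Adam(model[2].parameters(), lr=1e-3)
```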

You can use all your usual Python development and debugging tools, and it
feels 100% natural. I can fit it into other Python workflows without making
the whole program centered around TensorFlow.

TensorFlow is undoubtedly powerful, and if you have the time/resources to put
into a static-ish TensorFlow-centric workflow, it could pay off many times
over. But it definitely feels like learning an entirely new language, with an
entirely different debugging pattern. And furthermore, it is a language that is
constantly changing its patterns and best practices, outside of the
super-standard Keras examples.

To put this into context, even the official TensorFlow models repository
produces deprecation warnings when run, whereas torchvision works seamlessly
and reads like a reference for writing PyTorch model code.

There is just a developer-centric focus to PyTorch that makes it a joy to use.

~~~
_coveredInBees
Yup, you make some great points and I couldn't agree more. Very recently, I
was looking into training an object-detector for a custom problem with not
many training examples. One of the classes (hardest one to train from few
examples) was "person".

I was able to create a custom detection network for a 3-class problem, load up
the COCO pretrained weights for the network, strip out all the other weights
at the "head" for all the other COCO classes except for the "person" class and
then fine-tune the model on my custom 3-class dataset. The resulting model
generalized exceptionally well on people as it was still able to retain a lot
of its performance from the COCO pre-training. It was so easy to do all of
this. Literally, maybe 10 lines of code, and so easy to figure out since I
could introspect the state_dict and the weights file directly in my PyCharm
interpreter while working out how to do this.

~~~
mlevental
how did you strip out weights for other classes? what does that even mean?

~~~
_coveredInBees
So this will depend a bit on the architecture (I was working with CenterNet).
In my situation, the final feature maps for each class are all obtained by a
series of network "heads" that perform a set of convolutions on the same
slightly deeper set of feature maps. Each convolutional "head" is responsible
for object detections for a single class. So in the case of COCO,
if you have 80 classes, you have 80 such heads. In this situation, if I wanted
to create a new CenterNet model that predicted (let's say) the following 3
classes: "person", "teapot" and "donkey", only the class "person" exists in
COCO and I already have a wonderfully robust person detector if I can utilize
the person detecting weights from the COCO model.

So what I can do is instantiate a CenterNet model of identical architecture,
except with only 3 heads for the 3 classes instead of 80 heads for the 80 COCO
classes. Now, when I try to load the COCO weights in, there
will be a mismatch and typically you end up with the heads being left with
their default initialization while the rest of the backbone gets the COCO
weights... this is the traditional way you do transfer learning on related
problems because you are still starting off with a much better set of weights
for your entire network backbone than random weights, which will help with
training on related tasks.

However, we can go a step further and load up the state_dict from the COCO
weights file, figure out which set of weights are for the "person" head and
assign them to let's say the 1st of your 3 heads in your new architecture. You
can even go a step further... Since the "donkey" class is quite similar to the
"horse" class in COCO, you could also transfer the weights for the "horse"
head in the COCO weights to your 2nd head. So now you have a network with 2 of
the 3 heads already set up to be robust person and horse detectors. These are
much better poised to then be fine-tuned on your application specific data for
examples of people and donkeys. You end up with a model that is much more
robust on those 2 classes despite only having (let's say) a couple of hundred
labeled images for your specific application.

Hope all of this made sense. It's just nice that in Pytorch, everything is
pretty straightforward and weights are just dicts and it's super easy to
introspect them and splice them, etc.
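
In code, the splice really is just a few dict operations. A rough sketch - the
key names and head indices here are hypothetical (inspect your checkpoint's
state_dict().keys() for the real ones), and `model_3class` stands in for the
new 3-head network:

```python
import torch

coco = torch.load("centernet_coco.pth")  # hypothetical 80-head COCO checkpoint

# Map COCO heads onto the new model: COCO person -> new head 0,
# COCO horse -> new head 1 (a decent stand-in for "donkey").
remapped = {}
for key, tensor in coco.items():
    if key.startswith("heads."):
        if key.startswith("heads.0."):                      # person head
            remapped[key] = tensor
        elif key.startswith("heads.17."):                   # horse head
            remapped[key.replace("heads.17.", "heads.1.")] = tensor
        # every other COCO head is simply dropped
    else:
        remapped[key] = tensor                              # backbone weights

# strict=False ignores the missing keys, so the unmatched third head
# keeps its fresh initialization while everything else comes from COCO.
model_3class.load_state_dict(remapped, strict=False)
```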

~~~
zo1
Very nice write up! I'd be very curious if you had any links/resources related
to your solution, or that helped you come up with it.

~~~
_coveredInBees
Thanks. I don't really have anything to link to. It's just something I decided
to attempt as it made sense to me to try it. Person detection is notoriously
hard to get right on a small set of data, but when trained on something like
COCO, modern person detectors really feel like magic because they are so
amazingly good at detecting people even in the weirdest of poses. So I wanted
to leverage the robustness of a COCO-trained person detector and then
fine-tune it for my problem, while still having other new classes in my
network, and this seemed like the way to go about it.

------
danieldk
We are considering moving to PyTorch; we really dislike how the Tensorflow
1.x -> 2.0 transition is being handled. For years a lot of stuff was added to
_tf.contrib_, some things were only available in _tf.contrib_, and now that
it's dropped from TF, a lot of projects (including ours) have to do quite
large rewrites. For the last few 1.x iterations, Tensorflow has been
complaining that the older RNN layers are deprecated and that we have to move
to the Keras RNN layers, which it claims are equivalent. However, when we
tried a couple of months back, it made RNN-based training 45% slower. It is
all fixable, but it takes time and a lot of testing of all the model variants
to make sure there are no regressions. It feels quite a bit worse than Python
2 -> 3.

I am a bit saddened by all of this, because I really liked how easy it is to
define a graph in Tensorflow in Python, serialize it, and then use its
minimalistic C API to use the graph in Go, Rust, or wherever you need it.

How is your experience with PyTorch and backwards API compatibility (I know
that they only reached 1.0 fairly recently)?

~~~
_coveredInBees
It's been pretty good with PyTorch. The API has been fairly stable, and I've
adapted code developed on anything from 0.4.0 to 1.0.0+ with barely any need
for tweaks. Granted, it's a younger project, so for now things are stable, but
maybe 3 years from now they may have some giant API refresh. Still, I find
their API quite nice for the most part, so I don't see them needing to switch
everything up periodically.

~~~
cmarschner
Wait, pytorch is torch at its core, right? That is almost 10 years old and the
last rewrite was version 7.

------
king_magic
Anecdotally, I've dumped TensorFlow in favor of PyTorch for almost all new
work I'm doing at my organization (industry focused). Biggest gripes with
TensorFlow are overly complex APIs, instability from release to release,
constantly broken code in Google's repos, and poor documentation. Maybe TF 2.0
will be better, but for me, the PyTorch ship has already sailed, and I am
sailing on it.

~~~
tedivm
TF2 still seems pretty beta to us, honestly. There were things that were
pretty easy to do in TF1 that are close to impossible as is in TF2.

------
oli5679
I have worked as a data scientist on a lot of finance domain problems -
forecasting default, fraud, conversion probability, etc.

The LightGBM library has consistently performed well. I've been struck by how
many colleagues instantly jump to neural nets when, in my experience, they
often don't beat LightGBM on medium-sized datasets not related to text/images.
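
For context, a decent baseline on that kind of tabular problem is only a
handful of lines (the data here is a random stand-in):

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# Stand-in data; in practice X, y are your tabular features and labels
# (e.g. default / fraud flags).
X, y = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)

model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)],
          early_stopping_rounds=50)
print(model.score(X_val, y_val))
```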

~~~
jimfleming
More anecdata: we consistently outperform lightgbm, xgboost, random forests,
linear models, etc. using neural networks even on smaller datasets. This
applies whether we implemented the other algorithms ourselves or simply
compared to someone else’s results with them. In my experience it really comes
down to how many “tricks” you know for each algorithm and how well you can
apply and combine these “tricks”. The difference is that neural networks have
many more of these tricks and a broader coverage of research detailing the
interactions between them.

I call them “tricks” but really they’re just design decisions based on what
current research indicates about certain problems. This is largely where the
“art” part of neural networks that many people refer to comes from. The search
space is simply too big to try everything and hope for the best. Therefore,
how a problem is approached and how solutions are narrowed and applied really
matter. Even simple things like which optimizer you use, how you leverage
learning rate schedules, how the loss function is formulated, how weights are
updated, feature engineering (often neglected in neural networks), and
architectural priors make a big difference on both sample efficiency and
overall performance. Most people, if they’re not just fine-tuning an existing
model, simply load up a neural network framework, stack some layers together
and throw data at it expecting better results than other approaches. But
there’s a huge spectrum from that naive approach to architecting a custom
model.

This is why neural networks are so powerful and why we tend to favor it
(though not for every problem). It’s much easier to design a model from the
ground up with neural networks than it is for e.g. xgboost because not only
are the components more easily composable thanks to the available frameworks
but there’s a ton more research on the specific interactions between those
components.

That doesn’t mean that every problem is appropriate for neural networks. I
completely agree with you that no matter what the problem is, you should never
jump to an approach just because it’s popular. Neural networks are a tool and
for many problems you need to be comfortable with every one of those decision
points to get the best results and even if you’re comfortable it can take time
and that isn’t always appropriate for every problem. My other point is that I
wouldn’t draw too many conclusions about a particular algorithm being better
or worse than another. I’m not saying that was the intention with your comment
but I know many people in the ML industry tend to take a similar position. It
really depends on current experience with the applied algorithms, not just
experience with ML in general.

~~~
suresk
This was a really interesting and insightful comment, thanks for sharing. I
think the conclusion I shared in my sibling comment was probably a little too
broad.

I particularly like this:

> In my experience it really comes down to how many “tricks” you know for each
> algorithm and how well can you apply and combine these “tricks”. The
> difference is that neural networks have many more of these tricks and a
> broader coverage of research detailing the interactions between them.

This is pretty true - the lack of knobs to turn on something like XGBoost or
LightGBM both makes it pretty easy to get good results and makes it harder to
fine-tune results for your specific problem. Maybe this isn't the most correct
way to look at it, but I've always sort of pictured it as a curve where you
are plotting effort vs. results: the one for LightGBM/XGBoost starts out
higher but is more flat, and the NN one is steeper.

I guess reading your post makes me wonder where the two curves cross? Do you
have good intuition for that, or do you feel so comfortable with neural
networks that they are sort of your default? I peeked at the company you have
listed in your bio, and it looks like you have pretty deep experience with
neural networks and work with other people who have been in research roles in
that area too, and I wonder how that changes your curve compared to the
average ML practitioner? Certainly figuring out how to pick the best layer
combinations, optimizer, loss functions, etc benefits hugely from intuition
gained over years of experience.

~~~
jimfleming
I think your conclusions are accurate. For many problems LightGBM or xgboost
can often yield decent results in short amounts of time and for many problems
that’s sufficient. A lot of the work we do is about pushing the results as far
as we can take them and the business case justifies the extra time it can take
to get there. For those types of problems, today, we would probably choose a
neural network because then we have a lot more knobs as you mentioned.

Just like the rest of ML, whether neural networks are the right choice still
depends on the problem at hand and the team implementing the solution. It
definitely impacts where the performance / time curves intersect. If we just
need something decent fast, or we’re working with another team that doesn’t
have the same background, we tend to focus on approaches with fewer moving
pieces. If we need the best possible performance, have a qualified team to get
there, and have the time to iterate on development then the curves would favor
neural networks.

------
acgan
We really enjoyed editing this piece. Just wanted to doubly highlight a few of
Horace's (chillee on HN) resources linked at the bottom:

Code: [https://github.com/Chillee/pytorch-vs-
tensorflow](https://github.com/Chillee/pytorch-vs-tensorflow)

Ablation of claims:
[https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...](https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1efb/)

JS interactive charts: [https://chillee.github.io/pytorch-vs-
tensorflow/](https://chillee.github.io/pytorch-vs-tensorflow/)

~~~
ankeshanand
Great piece, you might want to update the article with the mention of PyTorch
Mobile that released today:
[https://pytorch.org/mobile/home/](https://pytorch.org/mobile/home/)

~~~
acgan
Thanks! We're currently discussing this with Horace.

------
amrrs
I think tensorflow dominates industry purely because of its capability of
exporting the model to Core ML or an Android model, or the ease of moving it
to production in a GCP environment or in whatever form. Pytorch might have to
build a good production pipeline around it to catch up in this game.

With the fastai module that's built on Pytorch, learning and developing deep
learning solutions has become a lot easier. So there's a real game on now.

~~~
pillefitz
You can do the same with PyTorch, right? Just export the model as .onnx and
import it with whatever inference engine you like.
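
The export step itself is tiny - a sketch with an illustrative model; whether
the target engine supports all the resulting ops is where, as the reply below
notes, practice diverges from theory:

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)  # example input that fixes the shapes

# Export by tracing; unsupported or highly dynamic ops are where
# the theory-versus-practice gap usually shows up.
torch.onnx.export(model, dummy, "model.onnx", opset_version=10)
```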

~~~
tsbinz
This sounds fine in theory but makes you curse in practice. Trained TF model
-> deployment using TF Lite is the most robust pipeline currently, as far as I
know.

There are lots of improvements going into pytorch for mobile right now, but
I'll wait and see how it turns out - I didn't have much fun with caffe2 when
"train in pytorch and deploy with caffe2" was the storyline FB pushed (e.g.
problems with binary size and slow depthwise convolutions), so I'm not too
eager to migrate back at the moment.

------
snendroid-ai
I've been using Keras for the last 3 years. Most of the time where I have to
deal with core TF code is when I have to write some custom layers. I totally
agree on the part where hacking together TF code seems like a nightmare (well,
initially... but not once you know what you're doing), whereas PyTorch looks
more like a blissful experience (I have not tried PyTorch yet, just speaking
from reading all these comments). I'm genuinely curious about how one can use
trained PyTorch models in production. For example, I've got 6 TF-based
translation models + 1 classification model running on a single AWS instance
with TensorFlow Serving with 1 GPU and 8 CPU cores. These 7 models are
deployed to take advantage of all the resources of this instance and
everything runs smoothly. Now, supposing I had these same models in PyTorch,
what are my options to do the same?

~~~
ericd
What challenges are you worried about with transferring PyTorch to production?
It’s been wonderful to work with, but I haven’t put a PyTorch model in high
volume production yet, so I’m curious too.

~~~
snendroid-ai
1) My laziness to look for what’s available to do this. 2) Core belief to
_NOT_ use any product backed by FB.

But anyway, at this point I've got so many things already running over TF +
Keras that I don't see any point in porting an entire code base written over
3-4 years to another platform just because new grads from university are using
some library more than another. I've got everything I need, so why suffer
unnecessarily? I can just spend the same amount of time polishing existing
things rather than chasing something that's unlikely to reach the same level
as what already exists.

~~~
ericd
Yeah, I guess if TF works for you, stick with it. I started learning with
Theano+Keras, then with TF, and finally PyTorch, and was much happier when I
switched to PyTorch, FWIW. I think it’s worth trying if you haven’t.

------
onlyyimte
It's only a matter of time until PyTorch also dominates industry.

It's always like this.

Think of how Ubuntu took over the server market because amateurs preferred it
to Redhat/CentOS. And when they became professionals or were in a position to
decide, they also put Ubuntu on the server, because that is what they knew
best.

~~~
jniedrauer
I'm not sure that's a great example, given that AWS mostly runs on RHEL-based
OSs and Debian is still preferred for Docker. Ubuntu did not "take over the
server market".

~~~
terranoct
RHEL is popular for solutions like running a datacenter mostly because it has
a nice enterprise support story. It's what the E in that acronym is for, after
all. Ubuntu, meanwhile, is quite popular among us mere mortals who have to fix
our own boxen.

Debian is popular for Docker images exactly because many of the people trying
Docker were already familiar with Ubuntu. Those users quickly ended up wanting
smaller images, making Debian an obvious thing to try out since Ubuntu is
basically Debian with bells on.

Ubuntu fought a sea of distros and came out as what's very nearly an industry
standard, if not an official one. The 90s were a fricking mess by comparison.
Slackware on floppies.

(And now I need "Slackware on floppies" dubbed over the "Jesus wept" scene
from Hellraiser.)

~~~
jniedrauer
> Ubuntu fought a sea of distros and came out as what's very nearly an
> industry standard, if not an official one.

I think you may be living in a bubble. I've been running devops for various
shops for half a decade and I've only once used Ubuntu, because it was already
being used by an acquisition.

I won't deny that Ubuntu is popular. It's certainly got the lion's share of
the desktop market. But there is no such consensus in the server market.

~~~
Sammi
Come on guys. You're fighting anecdote with anecdote.

Here's some data I could dig up with a couple of minutes of googling:

[https://w3techs.com/technologies/details/os-
linux/all/all](https://w3techs.com/technologies/details/os-linux/all/all)
[https://w3techs.com/technologies/history_details/os-
linux](https://w3techs.com/technologies/history_details/os-linux)

Other sibling comments link to more.

------
__sy__
I really enjoyed this write up. Thank you for putting it together. Even as a
TF user, I feel it's a really fair assessment of TF vs PyTorch.

A quick observation that may not be 100% accurate but still worth mentioning:
in some ways TF feels like it was written to solve large scale issues on day
one. For example, when I started playing with the new TF 2.0 distribution
strategies and dataset pipeline, I quickly got the sense that this thing was
meant to move and ingest bucketloads of data across hundreds/thousands of vm
instances. In a way, I suppose it's a reflection of Google culture where
there's a strong emphasis on not doing things that don't scale to Google
Scale.

As a result of this, I sort of feel that you should start with PyTorch and
eventually graduate to TF if/when the scale requires it. This is sort of like
starting with Rails/Django/Node, and migrating to a Go/JVM/[Insert Your
Favorite Static Language Here] stack when the traffic load warrants it.

------
kcolford
Whatever happened to Julia? Wasn't it supposed to incorporate all these
incredible abstractions at the language level and run quickly on GPUs and
everything in between? Is it just lack of adoption, or is it something else?

~~~
ddragon
If you mean Zygote.jl, it's a very ambitious project (like Swift for
Tensorflow which has been under development for even longer I believe) with
not many people working on it compared to Tensorflow and pytorch. And
Pytorch, for example, only supports the methods it decides to overload, while
Zygote aims to support everything in the language (including stuff that isn't
as obvious, like state, IO, and control flow in general). And then you have
optimizations over
the computation graph, memory management on GPU and many corner cases I can't
imagine.

Though you can already use very clean Pytorch style libraries like Flux and
Knet or the Tensorflow bindings to leverage the benefits of Julia for high
performance numerical processing on the adjacent tasks such as data
preprocessing.

~~~
carapace
"A Differentiable Programming System to Bridge Machine Learning and Scientific
Computing"

[https://arxiv.org/abs/1907.07587](https://arxiv.org/abs/1907.07587)

[https://news.ycombinator.com/item?id=20477873](https://news.ycombinator.com/item?id=20477873)

From the abstract:

> We describe Zygote, a Differentiable Programming system that is able to take
> gradients of general program structures. We implement this system in the
> Julia programming language. Our system supports almost all language
> constructs (control flow, recursion, mutation, etc.) and compiles high-
> performance code without requiring any user intervention or refactoring to
> stage computations.

Just linking to this for those who haven't seen it.

------
mlthoughts2018
Keras still is the very best in terms of expressive models and end to end
workflows. It leans heavily on the design idea that you should deliberately
design for end to end use cases and all intermediate abstractions should be
building blocks that serve precisely that purpose. This is discussed in [0]
which IMO is something that deserves to be more widely talked about in
software engineering. Lots of other disciplines of software engineering _say_
you should design this way, but in my experience it’s very rare no matter what
discipline you’re in. Take TensorFlow itself. It’s a huge mess with no clear
abstractions useful for end to end solutions. Just a hodgepodge of disparate
APIs, with way too many underlying engineering concepts elevated to
abstractions for engineer (instead of user) convenience.

Constraining design by end to end use cases is a remarkably robust and useful
process.

PyTorch is way better at having clean engineering abstractions than
TensorFlow, but still falls short when things like “forward” or maintaining
your own training loop and gradient metadata are necessary concepts for a
practitioner’s end to end workflow.
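
To make the contrast concrete, a rough sketch with placeholder data - Keras
makes the end-to-end use case the API, while in PyTorch the loop and gradient
state stay with the practitioner:

```python
import numpy as np
from tensorflow import keras

x_train = np.random.rand(256, 8)   # placeholder data
y_train = np.random.rand(256, 1)

# Keras: the end-to-end use case (compile/fit) is the abstraction.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(x_train, y_train, epochs=2)

# PyTorch, by contrast, leaves forward and the loop to you (pseudocode):
# for x, y in loader:
#     opt.zero_grad()
#     loss = loss_fn(net(x), y)
#     loss.backward()
#     opt.step()
```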

[0]: [https://blog.keras.io/user-experience-design-for-
apis.html](https://blog.keras.io/user-experience-design-for-apis.html)

~~~
abakus
In terms of model expressiveness, I made a functional NN building API for
PyTorch (just like keras'), which offers the optimal balance of flexibility
and expressiveness: [https://github.com/blue-
season/pywarm](https://github.com/blue-season/pywarm)

------
Dzugaru
In my experience in computer vision research, it doesn't matter much what you
use. Yes, immediate mode is slightly more convenient, but research time is
influenced much more by your computing power, your dataset
acquisition/labeling/relabeling capacity and, last but not least, by code
quality and easy, efficient collaboration - that's why you need tools like
DVC/Argoproj. We did get amazing results using Caffe v1 back in the day.

~~~
Q6T46nT668w6i3m
I agree. I haven’t encountered a strong preference in academic computer vision
or machine learning. Keras and PyTorch dominate, of course, but I wouldn’t be
shocked if everyone started using something new in the future.

------
rayalez
Does anyone have any opinions on TF 2.0? They released it recently, and it
seems like it should be much closer to PyTorch now, but I don't know enough to
evaluate it properly.

[https://www.youtube.com/watch?v=EqWsPO8DVXk](https://www.youtube.com/watch?v=EqWsPO8DVXk)

~~~
sails
Jeremy Howard [0] has some takes here [1], mostly negative if I recall
correctly.

[0] [https://www.fast.ai/about/](https://www.fast.ai/about/) [1]
[https://www.youtube.com/watch?v=J6XcP4JOHmk&t=4152s](https://www.youtube.com/watch?v=J6XcP4JOHmk&t=4152s)

~~~
ehsankia
Wouldn't the CEO of Fast AI, a library using PyTorch, be slightly biased?

~~~
losvedir
They're also championing TensorFlow in Swift, though.

------
phillipcarter
What I really liked about this article is near the end, where it identifies
two things:

* Automatic differentiation of higher-order derivatives being important, and how there's clearly room to disrupt there

* Increasing hardware diversity seems to mean that both frameworks will run into a brick wall as-is

Exciting space. It'll be fascinating to see how dramatically, or not, things
change in the coming years.

------
cs702
Like others here, at work we switched over from TensorFlow to PyTorch when 1.0
was released, both for R&D and production. Our productivity and happiness with
PyTorch are noticeably, significantly better.

Back when we were using TensorFlow, whenever we wanted to try something new,
sooner or later we would find ourselves _wrestling_ with its computational
graph abstraction, which is non-intuitive, especially for models with more
complex control flow.

That said, we are keeping an eye on Swift + MLIR + TensorFlow. We think it
could unseat PyTorch for R&D and eventually, production, due to (a) the
promise of automatic creation of high-performance GPU/TPU kernels without
hassle, (b) Swift's easy learning curve, and (c) Swift's fast performance and
type safety. Jeremy Howard has a good post about this:
[https://www.fast.ai/2019/03/06/fastai-
swift/](https://www.fast.ai/2019/03/06/fastai-swift/)

~~~
jeffshek
I've read a reasonable amount about this and listened to a bit of the podcasts
they've done.

It feels a bit too early to tell. I don't believe many researchers will switch
to Swift though.

~~~
_coveredInBees
Yeah, I'm not convinced at all. It's the same reason why Julia hasn't replaced
Python for scientific computing either. There is wayyy too much infrastructure
in Python for datascience / machine-learning to just up and switch to Swift. I
get that he's excited about a new challenge, but I don't think it's going to
be great for the Python library when Jeremy switches over to Swift.

As it is the API for FastAI is constantly changing and has hardly ever felt
particularly stable. I don't see it ending up becoming this complete, stable,
polished framework if they keep switching focus. I don't care much one way or
the other, as I don't personally use it: it is way too complex to extend to do
anything simple if you just have your own networks and Dataset class that you
want to plug into their infrastructure. Being familiar with Pytorch
and Python, I've always found it much easier to just work with those 2 rather
than trying to bend the fastai library to do things that don't fit perfectly
into the applications they designed for.

------
thanatropism
I remember using early Torch (in Lua! As someone who knew only Matlab!) in
2015-ish, and then using Keras (which is supposed to be an abstraction layer
over NN frameworks) and finding it much more verbose and complicated to use
without resorting to code snippets.

Perhaps it’s the nature of the game that changed, with many new kinds of
architectures and so on. But maybe Keras is already overengineered for someone
who just wants to make thumbnail-sized GAN stuff at home.

------
nmca
Jax, for those that haven't heard of it, is the thing y'all want.

~~~
mlevental
why would you use Jax over pytorch? even if it has technical merits it lacks
an ecosystem of readily available models to study and tweak.

~~~
nmca
At some point you stop caring about being able to import a set of ImageNet
pretrained weights and start caring about extreme flexibility. Think about
implementing, say, "Scene Representation Networks"
[https://arxiv.org/abs/1906.01618](https://arxiv.org/abs/1906.01618) in each
of the three frameworks. Tf is a pig, pytorch is slow, and Jax is going to
crush the problem.

The lack of say, keras.applications is a shame, but it won't last, and if you
have a GPU or 8 the power of optimized (p/v)map definitely makes up for it.
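
For those who haven't tried it, the flexibility comes from jax composing
function transformations directly. A toy sketch:

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))                   # compiled gradient of loss
per_example = jax.vmap(loss, in_axes=(None, 0, 0))  # auto-vectorize over a batch
# jax.pmap does the same across devices, i.e. the "GPU or 8" case.

w = jnp.ones(3)
g = grad_fn(w, jnp.ones((5, 3)), jnp.zeros(5))
```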

~~~
chillee
I mean, the authors implemented it in pytorch:
[https://github.com/vsitzmann/scene-representation-
networks](https://github.com/vsitzmann/scene-representation-networks)

Do you have any particular evidence that PyTorch is slow here?

------
franciscop
This is _very_ interesting and telling:

> Great API. Most researchers prefer PyTorch’s API to TensorFlow’s API. This
> is partially because PyTorch is better designed and partially because
> TensorFlow has handicapped itself by switching APIs so many times (e.g.
> ‘layers’ -> ‘slim’ -> ‘estimators’ -> ‘tf.keras’).

Arguably, one of the biggest issues Google had with _Angular_ was the switch
from 1.x to 2.x. You'd have thought they learned about how _not_ to make major
changes on OSS projects.

Facebook, on React for instance, does an amazing job here: they add prefixes
like "UNSTABLE_" to anything they don't want to support, and show warnings
forever when they actually plan to make something small obsolete.

I tried to learn from both, so in some of my bigger personal OSS projects (in
terms of the amount of work involved), like npm's "server", I purposefully
made some APIs a bit more limited than I could have, to leave more flexibility
later on if I didn't like the direction. Of course this is at a different
level; I am a single dev doing OSS in my free time, after all.

But I understand in a project of the size of e.g. Tensorflow it's not an
individual dev learning, it's more about the company learning how to do things
better.

------
chips2001
Pytorch Mobile release today -
[https://pytorch.org/mobile/home/](https://pytorch.org/mobile/home/)

------
elwell
Has anyone used MXNet? I've been meaning to check it out because it appears to
have solid Clojure support.

------
o10449366
And Amazon is still trying to desperately get people to adopt MXNet.

~~~
chillee
MXNet is actually pretty good. It got to the "mixing eager and graph mode"
semantics before either PyTorch or TensorFlow did. On top of that, it's also
blazing fast (usually the fastest of the frameworks).

Admittedly, I've never used MXNet so it might have more issues that I'm not
aware of. Judging from the benchmarks I've seen, however, MXNet got a lot of
things right.

Unfortunately, I just don't think it added enough on top of PyTorch or
TensorFlow for people to consider switching. People switched from TensorFlow
to PyTorch because eager mode was just _so_ much easier to use.

------
rossdavidh
...because in research, the person doing the programming gets to choose,
whereas in industry, their boss does. (ducks)

~~~
pmiller2
More like people who have long since left the company get to choose.

------
nbeleski
We have been using mostly Dlib[0]. We needed to develop solutions that can be
statically compiled and produce dependency-free DLLs, and dlib delivered
remarkably well on that front.

I haven't had success doing so with frameworks such as Torch and TF, even if
their toolkit is better for developing new solutions.

Also we get to write code in C++, which can be a big positive when developing
machine learning SDKs. I personally still do most of the prototyping in Python
though.

I'll be checking out the link in the post that mentions that pytorch allows
models to be converted to C++; it looks promising, actually.

[0] [http://dlib.net/](http://dlib.net/)

------
IshKebab
That first graph is super confusing. The Y axis says "Percentage of unique
mentions" but it only goes up to 0.7%? Was it meant to be "Fraction of unique
mentions"?

And then the title is "PyTorch vs Tensorflow", but it never says whether the
Y axis counts unique mentions of PyTorch or of Tensorflow. From the context I
guess PyTorch, but come on!

The Y axis should be "Fraction mentioning PyTorch", and the title should be
"Papers that only mention PyTorch or Tensorflow" (assuming I have understood
this correctly).

Shame it was labelled so badly because it's an amazing graph otherwise!

~~~
chillee
Oh shit, you're right.

I fixed these properly at some point, but I made some last minute
modifications to the text size and such.

These interactive figures are probably a bit better overall too:
[https://chillee.github.io/pytorch-vs-
tensorflow/](https://chillee.github.io/pytorch-vs-tensorflow/)

I'll change that ASAP. Thanks for the heads up!

EDIT: Fixed! Lemme know if that addressed your issues.

------
stefan_
You can just shorten it to "Python dominates research", "C++ dominates
industry".

~~~
p-morais
Pytorch has a C++ API now.

------
swampthinker
If I were to start a hypothetical computer vision company today, which would I
be better off using?

~~~
probably_wrong
My very biased opinion: you start with PyTorch because it's easy to develop
and debug, and there's no point in having the fastest tools for a model that
you can't train properly.

Once your model is running, and if/when you start hitting performance
bottlenecks, _then_ you consider migrating your model to TensorFlow.

~~~
albertzeyer
But isn't TF eager mode just as easy to develop and debug with, and isn't the
migration from TF eager mode to TF static mode then probably simpler?

~~~
reubenmorais
TF eager mode has been a stable/supported thing for 10 days, since the
release of 2.0. Before that it was available as opt-in behavior that, once
enabled, meant all bets were off as to whether things would work or explode.
So I think it's too early to answer your question. Maybe 2.0 bridges the gap
to PyTorch in development speed. But maybe the momentum has already shifted to
PyTorch.
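
(For reference, the 2.0 story is roughly: write and debug eagerly, then opt
into graphs per function. A sketch, with a made-up model and training step:)

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
opt = tf.keras.optimizers.SGD(0.1)

def train_step(x, y):
    # Ordinary eager code: debuggable line by line.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(x) - y))
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Opting into graph execution is then a one-line wrap (or a decorator).
fast_step = tf.function(train_step)
fast_step(tf.random.normal([8, 4]), tf.random.normal([8, 1]))
```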

------
dmix
One thing the AI/data scene has going for it is great data on its own
industry. It reminds me of how Ruby used to have the best-designed websites
for its various tools.

------
Der_Einzige
I don't even believe the thesis about one dominating the other in a specific
domain. I don't think mentions in top conferences are a good measure of usage.

~~~
chillee
Why not?

Ok, admittedly, there are a couple reasons. The fact that most papers don't
mention the framework they use is a big one. So if users of one framework
disproportionately mentioned that framework in their paper, it would be
overrepresented.

I did cover this concern, though, in the Appendix. Check out the "Biased
Sample" section
([https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...](https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1efb/)).

Basically, some conferences have encouraged researchers to submit code.
Instead of checking the papers, I checked their code instead. The results are
pretty much the same. So I think that mentions in top conferences probably
correlates well with uses in code.

------
buildbot
While PyTorch is awesome, one thing it suffers from, in my opinion, is having
no "one way" to do things - I've found it difficult to take someone's model
and training code and tweak it so it fits into your own code, compared to
Chainer, which has nice abstractions for a trainer, updaters, models, etc.

PyTorch is easy to use and modify, but Chainer, and by extension cupy (a
separate, awesome project!), are really, really easy to work with.

------
minimaxir
For the majority of production use cases (which tend to get all the AI/ML
hype), TensorFlow/Keras is more than powerful enough and accessible enough. If
you need to dive down to custom layers/optimizers, PyTorch has value there,
but for people looking to get their start in AI/ML, the meme that "TensorFlow
sucks" is highly misleading.

------
Grimm1
I thought the article was a good read and compared the two frameworks with
only small hints of personal bias, but one point about industry changing to
use pytorch because of researchers already knowing it seems like wishful
thinking. Unless PyTorch addresses its mobile and serving issues it is simply
not a great choice for many production situations. This article actually
influenced me to stick with TF instead of learning PyTorch due to my industry
needs.

Additionally, I think tensorflow enabling eager execution by default is fine,
maybe even good. Many models are relatively simple, and I doubt the gains from
rewriting them to utilize the execution graph will be worth it when, with the
keras frontend, you can just dump the h5py model and run it from there, which
many companies already do.
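
(That dump-and-reload flow is roughly the following; the toy model is just for
illustration:)

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

model.save("model.h5")  # architecture + weights in one HDF5 file
restored = keras.models.load_model("model.h5")
print(restored.predict(np.zeros((1, 4))))
```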

Rewriting will only be an issue for sufficiently complex models and at that
point I imagine competent ML professionals will have baked the time for that
into the estimate of the engineering costs.

------
ausjke
I doubt this, and feel the conclusion might be just the opposite: that TF 2
will be the top choice for most developers. I just started learning TF2 and it
does feel like a great upgrade. I'm still new to this, and I need TF2 for
products instead of research; tensorflow lite and tensorflow.js seem very
useful, plus tensorboard looks promising as well.

------
coderheed
Google Trends data [0] shows TensorFlow being much more popular for now,
though the gap seems to be narrowing slowly.

[0]
[https://trends.google.com/trends/explore?date=2017-01-11%202...](https://trends.google.com/trends/explore?date=2017-01-11%202019-10-10&q=%2Fg%2F11gd3905v1,%2Fg%2F11bwp1s2k3)

------
haolez
Is anyone here using Gorgonia?

I’m working in a Go code base and I’m thinking of using it instead of creating
a separate service in Python.

[https://gorgonia.org/](https://gorgonia.org/)

~~~
woah
Isn't it a bit cumbersome working in Go for this kind of stuff? Not judging,
just asking.

------
mark_l_watson
Thank you very much for writing this up. I have been using TensorFlow since it
was first released, and even though I am now retired I have been looking at my
own open source models with an idea of converting them to TF 2.

Since I am just keeping up with deep learning in particular and AI in general
for my own interests, I will likely switch over to PyTorch because there is no
risk involved and learning something new is fun. This is a big change since I
have years of TF experience and perhaps four or five evenings spent with
PyTorch.

------
eanzenberg
What about unique mentions of keras and/or tensorflow? I'm wondering if papers
mention keras only and don't mention tensorflow at all.

~~~
chillee
I talk about this in the "ablation" section/appendix.

[https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1e...](https://thegradient.pub/p/cef6dd26-f952-4265-a2bc-f8bfb9eb1efb/)

------
krastanov
A funny reason I am stuck with Tensorflow: I would love to use PyTorch, but
for the moment only Tensorflow has support for complex numbers.

pytorch bug tracker:
[https://github.com/pytorch/pytorch/issues/755](https://github.com/pytorch/pytorch/issues/755)

------
sairahul82
I think it's a matter of time. New things get adopted first in research, and
I think pytorch will take over tensorflow. I was also a tensorflow user, and
when I switched to pytorch I never looked back. I was also participating in a
kaggle competition, and the top 20 models were all implemented in pytorch.

------
goliathDown
This was for Computer Vision and NLP conferences, but would the same be true
if AutoML were thrown into the mix? I care mostly about efficiency and
optimization, and the author wasn't able to show that Pytorch is any better or
worse there, save for the two anecdotes.

------
33MHz-i486
It's been my observation that most researchers/DS prefer PyTorch because it
lets them hack in python, and most production software engineers prefer that
models be written in TF because of the effortless portability and performance
of TF Graphs.

I work on a team that does the latter, and lately DS have been handing off
PyTorch models that we can't scale or make performant, because Torchscript
doesn't really work with any realistic code complexity and authors include all
sorts of random python libraries. So we can't load the models in C++ or get
them under 50ms.

So the framework divide very much feels like dynamic vs. statically typed
languages. People that don't have real production demands love dynamic
languages for the productivity.

------
patagurbon
The selection of frameworks here doesn’t quite match the title. Then again,
research-wise I’m sure those two frameworks make up 99% or more of papers.

I much prefer PyTorch; effectively all the graph frameworks are there. Very
nice to see TPU support with 1.3 as well.

------
anirudhgarg
I have heard that Keras has made using Tensorflow much easier. Isn't that the
case?

------
lettergram
For a long time Caffe was the most popular framework for researchers. Most
people don’t even know of it now.

My point is that researchers using a framework (e.g. Matlab) does not mean
it’s used heavily in industry, or even in all industries.

~~~
sgillen
To be fair I think Matlab is still heavily used in industry, which I view as a
direct result of being so dominant among students and researchers. Maybe not
for machine learning, but for controls engineering, signal processing, stuff
like that it feels unavoidable.

Personally, I think all these deep learning frameworks just haven't had as
much time to mature; I have a feeling that once they do, the one that
dominates academia will eventually dominate industry.

------
zitterbewegung
Going from doing ML research to data ingestion and analysis to web frameworks
to API design, blogging, and static site generation is very powerful. The
trend seems to be that python will dominate all of these.

~~~
Dude2029
Maybe in the scaly Python-bubble.

------
chadmeister
What great timing for Pytorch Mobile to just be announced

[https://news.ycombinator.com/item?id=21217169](https://news.ycombinator.com/item?id=21217169)

------
gryffin
TensorFlow & PyTorch, Angular & React,

Or even Karma & Jest,

Facebook seems to be late to market, but learns from Google's mistakes, to
create simpler and more elegant tools.

------
chewxy
Shame. I've been using Gorgonia for years now. Progress in the library is slow
but it is superior in deployment when compared to TF or PyTorch.

------
clatan
I wonder how the HN crowd feels about H2O.ai

------
boringg
Similar to the R vs python debate?

------
ineedasername
Can someone provide a tldr on the differences? I know enough to implement
models in tensorflow (via keras) and have a decent understanding of parameter
tweaking, but really don't understand the fundamental difference between these
two libraries. Thanks!

------
qwerty456127
Why?

~~~
chillee
Well, you could read the article :^)

As a summary, though:

PyTorch has become dominant in research because of its API (both its stability
+ having eager mode).

TF has become dominant in industry because A. it came out several years before
PyTorch and industry is slow to move, B. It supported a lot of production use
cases (mobile, serving, removing Python overhead) that PyTorch didn't for a
long time.

~~~
qwerty456127
If only articles were as concise as your summary, I would enjoy reading them,
but as long as they are many pages long I have no time to read beyond the
titles, abstracts, conclusions and comments.

BTW, I've also read (here on HN) that PyTorch trains much faster than
TensorFlow does.

~~~
tecleandor
Then it wouldn't be an article, it would be a summary :)

------
machinelearning
Why engineers like Tensorflow:

\- More code to check-in (Looks more productive)

\- More infrastructure, e.g. checkpoints, exporters etc. (Looks like they're
doing more work)

\- Fancy visualizations (Allows them to look impressive while presenting loss
plots)

\- Easier to reuse things others have implemented and still get credit for it
(TF model zoo, research repo etc.)

Why researchers like pytorch:

\- Way easier to hack together their novel idea

\- Looks scrappier (which somehow makes the individual look like a better
researcher instead of an ordinary programmer)

\- Lots of other researchers release code in pytorch, so if you're working
off of their idea, you use pytorch to avoid having to reproduce their results.

Open to debate on these ideas; let me know if you have a counterpoint or any
other reasons to add.

~~~
stinos
_Why engineers like Tensorflow:_

With those bullet points, it looks like you didn't talk to actual engineers,
but rather to middle-layer management people.

~~~
machinelearning
Unfortunately, you're way off the mark here. It's actually more the opposite.
Perhaps we have different sets of experiences, but your assumption here is not
the cause of our difference in opinions.

