
PyTorch 1.5 - vpj
https://pytorch.org/blog/pytorch-1-dot-5-released-with-new-and-updated-apis/
======
smhx
notable co-releases along with PyTorch 1.5:

\- TorchServe: model serving infrastructure for scalable model deployment

\- TorchElastic w/Kubernetes: fault-tolerant "elastic" neural network
training, allowing nodes to join and leave (e.g. to leverage spot pricing)

\- Torch_XLA: updates for PyTorch TPU support

\- New releases of torchvision, torchaudio and torchtext

Summary blogpost at [https://pytorch.org/blog/pytorch-library-updates-new-model-serving-library/](https://pytorch.org/blog/pytorch-library-updates-new-model-serving-library/)

~~~
FridgeSeal
Damn they’re on fire at the moment (pun sort of intended).

Particularly excited about the launch of Serve. I didn’t know about
TorchElastic, the name makes me think of ElasticSearch but apart from that I’m
keen to get stuck into that as well.

Edit: C++ API now having complete parity with Python API is pretty cool,
hopefully when that flows through into the Rust binding crate, that should
make writing nn applications in Rust nicer.

------
mratsim
I'm curious about the NHWC layout they mentioned.

AFAIK cuDNN always had optimizations for NCHW, and that was one of TensorFlow's
speed issues when they chose to default to NHWC, plus the related issues with
writing transformation pipelines.

So what does NHWC enable that is new?

Relevant in-depth discussion, including the cuDNN team lead, Julien Demouth, and
Scott Gray, who implemented Winograd convolution for Nervana Neon (which,
interestingly, was CHWN, so batch last): [https://github.com/soumith/convnet-benchmarks/issues/93#issuecomment-192621350](https://github.com/soumith/convnet-benchmarks/issues/93#issuecomment-192621350)
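
For concreteness, the stride difference between the two layouts can be sketched
in NumPy (the shapes here are illustrative, not from any benchmark):

```python
import numpy as np

# Toy tensor sizes (illustrative only).
N, C, H, W = 1, 3, 4, 4

# NCHW: consecutive channels are H*W elements apart in memory.
nchw = np.zeros((N, C, H, W), dtype=np.float32)

# NHWC: channels are the innermost dimension, so consecutive channels
# are adjacent in memory, which favors reductions over C.
nhwc = np.zeros((N, H, W, C), dtype=np.float32)

# Element strides = byte strides / itemsize.
print(nchw.strides[1] // nchw.itemsize)  # distance between channels: H*W = 16
print(nhwc.strides[3] // nhwc.itemsize)  # distance between channels: 1
```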

~~~
jhj
The optimal memory layout usually depends upon which dimensions are the
reduction dimensions (for 2d convnet convolution, the spatial dimension on
which there is also reduction is likely less important than the batch or
channel dimension). Thus, the optimal memory layout for forward and backward
passes usually differs a lot, but transposing between the different layouts on
the fly has high cost.

Other alternatives beyond just permuting the dimensions include strip
mining/tiling and raising/sinking dimensions, techniques which come from loop
nest analysis (and correspond exactly to what one would do with the loops of
the code): e.g., translating NCHW -> N(C/4)(4)HW -> N(C/4)HW(4) for
vectorization purposes, where we turn a 4 dimensional array into a 5
dimensional array, with the innermost dimension being a set of 4 contiguous
channels which is sunk into the loop nest, and is amenable to vectorization.
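
A minimal NumPy sketch of that NCHW -> N(C/4)(4)HW -> N(C/4)HW(4) rewrite (the
sizes and the blocking factor of 4 are illustrative):

```python
import numpy as np

N, C, H, W = 2, 8, 5, 5
x = np.arange(N * C * H * W, dtype=np.float32).reshape(N, C, H, W)

# NCHW -> N(C/4)(4)HW: split the channel dimension into blocks of 4.
x_split = x.reshape(N, C // 4, 4, H, W)

# N(C/4)(4)HW -> N(C/4)HW(4): sink the block of 4 channels to the
# innermost position, where SIMD lanes can consume them contiguously
# (a real kernel would materialize this, e.g. via np.ascontiguousarray).
x_vec = x_split.transpose(0, 1, 3, 4, 2)

assert x_vec.shape == (N, C // 4, H, W, 4)
# Lane 1 of channel block 0 is original channel 1.
assert x_vec[0, 0, 0, 0, 1] == x[0, 1, 0, 0]
```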

Since many of these kernels are hand-tuned or generated by library vendors,
there is likely not much of a choice available, but there are likely many
other more optimal memory layouts out there that would require machine
learning-driven compilation or mathematical optimization techniques like
polyhedral compilation to explore and discover.

~~~
mratsim
Sure, you might be interested in the research in loop nest scheduling I
gathered here:
[https://github.com/numforge/laser/blob/master/research/automatic_loop_nest_scheduling.md](https://github.com/numforge/laser/blob/master/research/automatic_loop_nest_scheduling.md)

The most promising here are the Halide and Tiramisu compilers. Halide uses
machine learning to discover schedules, and Tiramisu uses a polyhedral
approach.

Also, I haven't added them yet, but Stanford Legion
[https://legion.stanford.edu/](https://legion.stanford.edu/) and ETH Zurich's
DaCe
[http://spcl.inf.ethz.ch/Research/DAPP/](http://spcl.inf.ethz.ch/Research/DAPP/)
are more focused on memory locality and are obtaining excellent results as
well.

------
dynamite-ready
I like the fact that the C++ API now has the same features as the Python one.
It was hard to find good C++-based NN libraries until 2-4 years ago. The
likes of TensorFlow had a C++ API, but the documentation was odd. Now that
Facebook and Google (with TensorFlow) appear to be committed to maintaining
well-documented C++ APIs for their ML projects, perhaps it might draw a few
people away from using Python for this work.

While it's a notoriously verbose language, your deployment options do increase
with C++, and you also get type safety, which seems like a good thing for ML
work.

~~~
M5x7wI3CmbEem10
so TF is catching up to PT?

~~~
dynamite-ready
The last milestone TF release made the Keras API part of its core, which I
thought was a pretty smart move. But I don't write enough NN code to be a
decent judge of how useful that decision will prove to be. I just know I
prefer the docs and examples for PT.

------
oehtXRwMkIs
Still waiting for better ROCm support without Docker. I wish it were at least
at the level of TensorFlow.

~~~
t-vi
You can compile PyTorch for this pretty easily; I wrote a recipe here:

[https://lernapparat.de/pytorch-rocm/](https://lernapparat.de/pytorch-rocm/)

No docker involved, just Debian + the ROCm packages from their repository.

I'm doing this relatively regularly, and it has worked well for me for the
past half year or so.

I guess I could upload wheels somewhere, but I'm never sure how tight the
dependency on ROCm has to be.

Over the last release cycle, PyTorch/ROCm gained support for TorchVision,
including GPU ops, among other things (based on support for extensions
compiled via setup.py). Master has support for loading extensions that are
compiled just in time.

If you find things not working as well as they should, don't hesitate to tag
me on a bug report or send a mail.

(The PyTorch/ROCm work is mostly for my own entertainment, so I don't speak
for anyone.)

------
0xcoffee
I'm not one to normally criticise some minor layout choices on a blog, but
this font is really difficult to read for me.

On Firefox: [https://i.imgur.com/zIHis3x.png](https://i.imgur.com/zIHis3x.png)

It's so wavy

~~~
dfan
I'm not sure what's going on with your computer (or with mine), but the
lowercase t's (which I assume are what you are noticing) look fine in my
Firefox (75.0 on macOS 10.14.5).

------
jamisteven
Check out PyTorch Lightning.

~~~
sytelus
I want to love PyTorch Lightning, but like many other wrapper frameworks it
has a severe problem: you can't use your dataloaders and nn.Module models
as-is. fastai likewise invents its own DataBunch, dynamically specified models,
etc. All of this reduces lines of code, but then you are forced to adapt the
whole world to your new language.

~~~
JPKab
Fastai's DataBunch objects, along with all its other data wrapper objects, are
incredibly simple. You can basically take any iterable and use it within them.

