Hacker News
Deep Learning with PyTorch: A 60 Minute Blitz [video] (pytorch.org)
610 points by vyuh 30 days ago | 59 comments



For anyone who's interested in learning PyTorch, here's the best video course I was able to find:

https://www.youtube.com/playlist?list=PLZbbT5o_s2xrfNyHZsM6u...

They explain things incredibly well, videos are easy to understand, engaging, and to the point. Highly recommend it to everyone!

I've also heard that Udacity has some good courses, but I can't vouch for those yet.


For people who know the basics, this article describes PyTorch in more detail: http://blog.ezyang.com/2019/05/pytorch-internals/


One quarter of the way through this playlist now. It's very good!

I'm having to learn this framework for a course assignment, and I feel a lot better about it now than I did after going through the OP.

Thanks for sharing!


Do you have any more recommendations?

I'm an undergrad student, and I'm nervous about choosing between TensorFlow+Keras and PyTorch.

It looks like many more companies are hiring for TensorFlow, and there's a wealth of information out there on learning ML with it. In addition, it just got the 2.0 update.

But PyTorch is preferred nearly every time I see the discussion come up on HN and in Google searches. I'm having a hard time deciding what to dedicate my time to.


Abstract away from the tools. They come and go. You will need to adopt a new one every other year.

Instead, make sure to understand the math and the concepts, and then it's easy to translate that to an implementation.

One way of doing this (though not sufficient) is to learn both tools.

Right now the pull is away from TF (increasingly convoluted API and lots of deprecations) and towards PyTorch (more support from the research community and increasing performance in production).


My 2c: learn the methods deeply, pick up the frameworks as needed. Knowing PyTorch or TF well won't make you a good data scientist or statistician.

YMMV


I'd recommend the fast.ai course [1].

At the end of the course you will be able to implement almost any state-of-the-art ML solution (classification, regression and computer vision).

Sounds too good to be true? Jeremy has that effect. The other day on a podcast they called him a saint.

[1]: https://course.fast.ai/

It's free btw.


I posted this link but now the title has somehow changed. I do not know what the policy on HN is, but a title saying "[video]" might give the wrong impression that this points to an hour-long video. The link points to a tutorial which embeds an entirely optional two-minute video that introduces the main content, which is spread over five web pages.


I was very confused when I clicked the link and spent a while looking for the full video.


Wow, thanks, I was planning to not even click the link because, well, 60 minute video.


One thing I've noticed is that it's quite hard to have vibrant discussions about DL because it is all either so simple or it is dauntingly complicated/unpredictable. Mostly my DL conversations end up being about frameworks. Anyone else experience this?

Also the number of DL submissions on HN seems surprisingly low given the applicability of the technology.


It's pretty easy when you're talking to people who understand the fundamentals of deep learning, but that understanding isn't very common even on HN. I think that's because the real-world, valuable use cases of DL are not very accessible:

(a) DL is pretty complicated in a way that's unfamiliar to most software engineers. You are consistently working with Tensors that have a couple more dimensions than people are used to holding in their heads (e.g. images mean you are typically working with 4D Tensors).

(b) You learn from academic papers, not blogs. It's a new workflow for many software people and intimidating to some (although the papers are usually closer to blog posts than rigorous academic papers).

(c) It's very difficult to learn deep learning on your own without it getting pretty expensive. Advanced uses pretty much require GPUs/TPUs and that's either a big upfront purchase or a serious per-experiment cost.

(d) Deep Learning is not a single field. It is CV, NLP, RL, speech recognition and probably others I'm forgetting about. They overlap, but it further reduces the number of people you can have informed discussions with because being knowledgeable about computer vision does not mean you are able to have a vibrant discussion about NLP.
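
To make point (a) a bit more concrete, here is a minimal pure-Python sketch of why image batches end up as 4D tensors. The (batch, channels, height, width) ordering is an assumption borrowed from the usual PyTorch convention, and the sizes and the `make_batch` / `shape` helpers are made up for illustration.

```python
# Hypothetical sizes, chosen only for illustration.
BATCH, CHANNELS, HEIGHT, WIDTH = 2, 3, 4, 5

def make_batch(n, c, h, w, fill=0.0):
    """Build a nested list laid out as [image][channel][row][column]."""
    return [[[[fill for _ in range(w)] for _ in range(h)]
             for _ in range(c)] for _ in range(n)]

def shape(nested):
    """Return the dimensions of a regular (non-ragged, non-empty) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

batch = make_batch(BATCH, CHANNELS, HEIGHT, WIDTH)
batch_shape = shape(batch)

# Addressing one pixel takes four indices: image 1, channel 2, row 3, column 4.
batch[1][2][3][4] = 1.0
print(batch_shape)
```

A single grayscale image is only 2D; colour channels add a third axis and batching adds the fourth, which is where the "couple more dimensions" come from.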


Could you list some of the places you look to learn? Where do you find relevant academic papers? It's hard to find information about what this learning workflow is.

Any recommended resources?


Gwern’s resources are surprisingly good:

https://www.gwern.net/GPT-2

https://www.gwern.net/Faces

These are “hands on” in the sense that you can replicate the results just by pasting in the same code. It’s kind of like a tutorial notebook in essay form.

Speaking of tutorial notebooks, pbaylies’ stylegan-encoder is quite good and you can run it on colab: https://colab.research.google.com/github/pbaylies/stylegan-e...

(Set runtime to GPU up in the menu.)

https://github.com/pbaylies/stylegan-encoder

In my experience the best place to have informal ai discussions is Twitter. The community is shockingly helpful. Follow @jonathanfly, @roadrunning01, @pbaylies and whoever pops up in the stuff they post. Roadrunning in particular posts tweets of the form “here’s some research; here’s the code” often with an interactive notebook.


Ugh, it's so easy compared to what I've been wrangling in TensorFlow.


Keep in mind that tutorials will always make it look easy compared to debugging actual production code. If you look through TensorFlow tutorials, they also look very easy, especially with TF2.

That said, I've experimented with PyTorch and I agree that it is really nice to work with.

Disclaimer: I work at Google and do use TensorFlow, though I don't work on the TensorFlow team.


PyTorch is 10x easier to debug than even TF2, and it's been that way all along. TF2 is no easier to debug than the previous releases if you're not using eager mode (which most people aren't), and even in eager mode it sometimes errors out in ways that do not offer any suggestion as to _which op_ caused the error. This is nuts. Modern architectures have hundreds, sometimes thousands of ops. It basically boils down to flying blind and guessing, and it can easily take days of trial and error to figure out each issue. Plus, every time you start a TF program it just sort of sits there for a minute or so before it starts doing anything. This severely hampers productivity when debugging.

To all the folks who are just starting out: just go with PyTorch. It's downright intuitive compared to anything Google has been able to put out so far.

Disclosure: ex-Googler. Used TF while there (and DistBelief before it). Gave it up as soon as PyTorch came out. Couldn't be happier.


TF1 or 2?


It's very good that PyTorch has emerged as a serious contender to TF. While TF still provides more production-grade tools (TFX, TensorRT, TF Serving), PyTorch continues to evolve, and I hope we'll soon have a more complete ecosystem.


I really like JAX as well: https://github.com/google/jax. It's younger than PyTorch and TF, but feels cleaner and more expressive. It has a very nice autodiff implementation (based on https://github.com/HIPS/autograd) and performance is comparable to TF in my experience.


It feels like JAX doesn't have any of the high-level APIs that PT/TF/MXNet have, which are vital for fast prototyping of model architectures. Is that correct?


It has stax, which is a minimal example of how to build a high level library: https://github.com/google/jax/blob/master/jax/experimental/s...

It seems that the JAX developers are focusing their time on making the core framework better and are leaving the task of building high-level APIs to the community for now. I suspect we'll see a few high-level APIs emerge over the next few months that explore different approaches before the community settles on a particular one.


I hope not. That's part of what makes TF so miserable - the core library didn't provide the tooling people actually needed so the community built a ton of different tools and it just made TF confusing to use.


Is there a drop-in replacement for TensorBoard? It’s probably the biggest thing keeping me using TensorFlow. Ideally the API of the PyTorch equivalent would be about the same too.

I answered my own comment before posting it. But in case it’s helpful to anyone else, I’ll put the answer here: yes, TensorBoardX. Looks like it’s very easy to use: https://tensorboardx.readthedocs.io/en/latest/tutorial.html

Anyone have thoughts on TF2.0 vs PyTorch? Over on Twitter people seem to be pretty hyped about TF2.0, but when I tried learning it, it just felt... not very fun. I need to give it a fair shot though.


PyTorch supports logging to TensorBoard too. More details can be found at https://pytorch.org/docs/stable/tensorboard.html


Whoever changed the title did a bad job.


Anyone know somewhere that has a good overview of the various ML and DL model types and what they are good for? I've been looking for a survey paper or book or just a glossary of ML.


When you hear autoregressive model, think “predicting a sequence”. These are good for text to speech since you can say “given some text, generate a spectrogram.” GPT-2 is probably the most impressive example of autoregressive techniques (I think).

GANs, and especially StyleGAN, are good for generating high quality images up to 1024x1024. These take about 5 weeks to train and $1k of GCE credits. The dataset size is around 70k photos for FFHQ. Mode collapse is a concern, which is when the generator settles on a narrow set of outputs that fool the discriminator and stops covering the full variety of the training data. StyleGAN has some built-in techniques to combat this. IMLEs recently showed that mode collapse can be solved without GANs at all.

Hmm.. what else... I’ll update this as I think of stuff. Any questions?

EDIT: Regarding IMLE vs GAN, here are some resources:

Mode collapse solved (original claim): https://twitter.com/KL_Div/status/1168913453744103426

Overview of mode collapse, why it occurs, and how to solve it with IMLE: https://people.eecs.berkeley.edu/~ke.li/papers/imle_slides.p...

Paper + code: https://people.eecs.berkeley.edu/~ke.li/projects/imle/scene_...

Some simple code for reproducing IMLE from scratch (I haven't seen this referenced many other places; stumbled onto it by accident): https://people.eecs.berkeley.edu/~ke.li/projects/imle/

Super resolution with IMLE: https://people.eecs.berkeley.edu/~ke.li/projects/imle/superr...

For comparing images, I believe they use the standard VGG perceptual loss metric that StyleGAN uses. (See section 3.5 of https://arxiv.org/pdf/1811.12373.pdf)

It seems to me that the main disadvantage of IMLE is that you might not get any latent directions that you get with StyleGAN. E.g. I'm not sure you could "make a photograph smile" the way you can with StyleGAN. But in the paper, they show that you can at least interpolate between two latents in much the same way, and the interpolations look pretty solid.
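
For anyone unsure what "interpolate between two latents" means in practice, here is a minimal sketch. It assumes plain linear interpolation, which may or may not be what the paper uses, and the latent vectors and the `lerp` helper are made up for illustration; in a real setup each interpolated vector would be fed to the generator to render one frame.

```python
def lerp(z0, z1, t):
    """Linearly blend two latent vectors: t=0 gives z0, t=1 gives z1."""
    return [(1 - t) * a + t * b for a, b in zip(z0, z1)]

# Two hypothetical 2-dimensional latents (real ones have hundreds of dimensions).
z_start = [0.0, 0.0]
z_end = [2.0, 4.0]

# Five evenly spaced points along the path from z_start to z_end.
path = [lerp(z_start, z_end, i / 4) for i in range(5)]
for z in path:
    print(z)
```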


IMLE (implicit maximum likelihood estimation), as far as I can tell, is a trivial method of parameterizing a random variable distribution and tuning it to make true data examples (e.g., images) more likely. The technique relies on finding nearest-neighbor example images, which in turn needs a metric of image distance. Original IMLE uses least-squares pixel distance, for example, which is not a very flexible or effective metric in practice (e.g., it is completely confused by rotation).

Arguably the whole advantage of GANs is that they do NOT need an explicit distance metric for comparing images. Instead, the discriminator effectively learns the metric in order to improve its ability to distinguish real images from generated/fake ones.

So to argue that IMLE can solve mode collapse is a false equivalency.
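
The point above about least-squares pixel distance being confused by rotation can be sketched with a toy example. This is not IMLE itself, just the distance metric in isolation; the 4x4 "images" and helper names are made up for illustration.

```python
def pixel_distance(a, b):
    """Sum of squared per-pixel differences between two equal-sized images."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))

def rotate90(img):
    """Rotate a square image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

# A toy 4x4 "image": a bright vertical bar on a dark background.
bar = [[0, 1, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 0, 0]]
rotated = rotate90(bar)                 # the same bar, now horizontal
blank = [[0] * 4 for _ in range(4)]     # an image with no bar at all

d_rotated = pixel_distance(bar, rotated)
d_blank = pixel_distance(bar, blank)
print(d_rotated, d_blank)
```

Under this metric the rotated copy of the bar is farther from the original than an empty image is, so a nearest-neighbor search would prefer the blank image over the rotated bar.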


StyleGAN can be trained significantly faster than 5 weeks (although cost is still ridiculous). I thought StyleGAN used inception distance.


I found a strange bifurcation recently while collecting papers on a sub-topic of this question: China-based authors citing other China-based authors extensively, in English with math, of course. Meanwhile, the US and Western EU papers seem like "it"; in other words, all the papers referenced seem like the ones you would expect to reference, and so on. Each group is self-consistent.


That's one of the incredibly unfortunate things about science out of China: it may or may not be trustworthy, as in the data may be just straight false. I'm not surprised that you saw that split; I'd be leery of quoting/referencing a potentially false paper myself.


Autoregressive models use their own output at past time steps as part of the input to predict the next value. If your sequence generator does not do that then it’s not “autoregressive”.
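
A minimal sketch of that feedback loop, in plain Python. The `toy_model` function is a hypothetical stand-in for a trained network; the only point is that each prediction is appended to the history and becomes input for the next step.

```python
def generate(predict_next, seed, steps):
    """Roll a model forward, feeding each prediction back in as input."""
    sequence = list(seed)
    for _ in range(steps):
        next_value = predict_next(sequence)   # conditioned on its own past outputs
        sequence.append(next_value)
    return sequence

def toy_model(history):
    """Hypothetical stand-in for a trained model: sum of the last two values."""
    return history[-1] + history[-2]

generated = generate(toy_model, seed=[1, 1], steps=5)
print(generated)
```

If `predict_next` only ever looked at a fixed external input and never at `sequence`, the loop would still produce a sequence, but it would not be autoregressive in the sense described above.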


Does PyTorch have a learning-to-rank module? TensorFlow released a ranking module earlier this year, but I’d like to try out PyTorch.


Not as far as I know. It does have max-margin loss [1], which is pretty much all you need to implement a neural ranking model, apart from data iterators and training loops.

[1] (https://pytorch.org/docs/stable/nn.html?highlight=margin%20l...)
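
For reference, a pairwise max-margin loss is small enough to sketch in plain Python. This is the hinge form max(0, -y * (a - b) + margin), which as far as I know is the same form PyTorch's margin ranking loss computes; the scores below are made-up model outputs and the default margin of 1.0 is an arbitrary choice for the sketch.

```python
def margin_ranking_loss(score_a, score_b, target, margin=1.0):
    """Hinge loss on one pair.

    target=+1 means item a should be ranked above item b, -1 the reverse.
    The loss is zero once the preferred item wins by at least `margin`.
    """
    return max(0.0, -target * (score_a - score_b) + margin)

# Hypothetical relevance scores from a ranking model.
correct_order = margin_ranking_loss(2.0, 0.5, target=1)    # a beats b by 1.5
wrong_order = margin_ranking_loss(0.5, 2.0, target=1)      # a loses to b by 1.5
too_close = margin_ranking_loss(1.0, 0.75, target=1)       # a wins, but only by 0.25
print(correct_order, wrong_order, too_close)
```

In a real ranking model the two scores would come from the same network applied to two documents for one query, and the loss would be averaged over many such pairs.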


As a chess player, "60 minute blitz" sounds very wrong.


Does no one build their own ML algos anymore? I don't understand the need for PyTorch and TensorFlow. I honestly thought TensorFlow was nothing but a teaching thing for undergrads.


This type of reasoning can be extended to any high-level tool: "Does no one write their own OS? I don't understand the need for Linux or Windows. I honestly thought Windows or Linux was nothing but a tool for undergrads to use Excel or host a WordPress site." And this is not a caricature of your argument. There is a lot of stuff under the hood that TensorFlow or PyTorch implement for a programmer. So much so that people have written wrappers around TF or PyTorch to abstract the workings of the library even further. Implementing deep learning architectures is less of a science and more of a "let me try this or that", and iterating on ideas quickly is of the utmost importance. Also, I can implement a neural network in C (CUDA) (although not the autodiff part, but I could if given time to research), but if I started implementing my own library, it would take an order of magnitude (or even more) more time to do the stuff I do daily. We don't need to reinvent the wheel here, guys.


That's what I'm getting at, the stuff under the hood is what's important, devil in the details and all that. I'm also a quant so every ml algo needs to be tailored so idk


Do you also write your own automatic differentiation tools? Using libraries like TF and PyTorch makes sense if you use neural networks because they provide automatic differentiation (who wants to write out their gradients by hand?) and standard neural network components.

Edit: If your algorithm is not using neural networks, then libraries like TF may or may not be a good fit, it depends on the algorithm. Writing custom low-level code can still make sense in those cases.
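
To give a feel for what "automatic differentiation" saves you from, here is a toy reverse-mode sketch in plain Python. It only handles scalar addition and multiplication, and the `Value` class is made up for illustration; it is nowhere near what TF or PyTorch actually implement.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents   # pairs of (input Value, local derivative)

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        """Apply the chain rule in reverse topological order."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad

x, y = Value(2.0), Value(3.0)
z = x * y + x     # by hand: dz/dx = y + 1 and dz/dy = x
z.backward()
print(z.data, x.grad, y.grad)
```

Even this toy version needs graph bookkeeping and a traversal order; supporting tensors, broadcasting, GPU kernels and numerically stable ops is where the real work in a mature library goes.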


People do write their own AD tools... in a day, sometimes!

http://blog.rogerluo.me/2018/10/23/write-an-ad-in-one-day/

http://blog.rogerluo.me/2019/07/27/yassad/

Although the endpoint is likely to be a better understanding of the choices made by a mature implementation, and of the work involved in fixing up edge cases.


Not all of us need to build their own ML algos. Just in the same way that not all of us need to build their sorting libraries or data structures. Some people specialize in this to develop and do research, while other software engineers just want something they can use without much hassle and with just a superficial understanding.


>Not all of us need to build their own ML algos. Just in the same way that not all of us need to build their sorting libraries or data structures.

And yet they love to ask you to do exactly that at technical interviews... coming up next: what ML algos you need to know to ace that interview.


And another reason is standardization. It's a lot easier to use or tweak any given network if it is implemented in the same framework.


good luck trying to land a job using ml with only ‘superficial understanding’


I can actually speak to this because I was involved with maintenance on a consultant's neural net built in raw Java.

What I have to say is this: please don't build your own.


They're frameworks which implement high performance tools commonly used in ml problems like tensor operations, automatic differentiation, various gradient descent optimisers, and also neural network building blocks
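
Of the building blocks listed above, the gradient descent optimiser is the easiest to sketch by hand. The following is a minimal illustration on a one-dimensional toy objective with a hand-written gradient; the function name, learning rate and step count are arbitrary choices for the sketch, not anything a framework prescribes.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, as a plain optimiser would."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)
print(w_min)
```

Frameworks combine this loop with automatic differentiation, so the `grad` function does not have to be derived and written by hand for every model.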


Do you write your own crypto libraries too?


>Do you write your own crypto libraries too?

Some people do. It's a good challenge.[0]

[0] https://cryptopals.com/


To be fair, the main thing you learn doing the cryptopals challenge is to not write your own crypto.

I had a lesson in writing crypto once, when I made what I thought was a good enough secret mixing procedure to encode some data I wanted to email outside of a company that didn’t allow web access. (Long time ago, circa 2000). It all looked undecipherable and I sent most of the data before I discovered that strings of binary zero were leaking my secret key. Oops, pretty stupid.
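
The comment doesn't say what the mixing procedure was, so the following is only a guess at the failure mode: it assumes a repeating-key XOR, which is the classic scheme where runs of zero bytes leak the key. The key and plaintext are made up for illustration.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"s3cret"                      # hypothetical secret key
plaintext = b"hi" + bytes(12)        # two bytes of text, then a run of zero bytes
ciphertext = xor_cipher(plaintext, key)

# Anything XOR 0 is itself, so the zero run exposes the key bytes directly.
leaked = ciphertext[6:12]
print(leaked)
```

The output looks scrambled wherever the plaintext is ordinary text, which is why this kind of scheme can pass a casual eyeball test and still give the key away.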


>To be fair, the main thing you learn doing the cryptopals challenge is to not write your own crypto.

D'oh! Good point though lol.


But the OP clearly didn't ask about doing it as a challenge or to understand how it works (which is what cryptopals is about), but actual usage.


>But the OP clearly didn't ask about doing it as a challenge or to understand how it works (which is what cryptopals is about), but actual usage.

But the parent of the comment I was replying to clearly had the former in mind, as a subsequent comment showed.[0]

[0] https://news.ycombinator.com/item?id=21240429


That’s quite naive...

I am sure you could write stuff like Differentiable Processors or the like from scratch with numpy, but if you respect yourself and your time, you won’t. Complicated architectures are orders of magnitude harder than writing feed-forward networks from scratch. For example, see the Merlin paper.


Why would you write your own AutoDiff if you do not have to?


childish troll attempt


can we see some of the "ml algos" you've built? in particular your autodiff engine


They're all owned by my firm lol, I can't share anything like that




