Show HN: Fast Deep Reinforcement Learning Course (dibya.online)
148 points by gh1 75 days ago | 37 comments
I worked on this applied Deep Reinforcement Learning course for the better part of 2021. I made a Datacamp course [0] before, and this served as my inspiration to make an applied Deep RL series.

Normally, Deep RL courses teach a lot of mathematically involved theory. You get the practical applications near the end (if at all).

I have tried to turn that on its head. In the top-down approach, you learn practical skills first, then go deeper later. This is much more fun.

This course (the first in a planned multi-part series) shows how to use the Deep Reinforcement Learning framework RLlib to solve OpenAI Gym environments. I provide a big-picture overview of RL and show how to use the tools to get the job done. This approach is similar to learning Deep Learning by building and training various deep networks using a high-level framework e.g. Keras.

In the next course in the series (open for pre-enrollment), we move on to solving real-world Deep RL problems using custom environments and various tricks that make the algorithms work better [1].

The main advantage of this sequence is that these practical skills can be picked up fast and used in real life immediately. The involved mathematical bits can be picked up later. RLlib is the industry standard, so you won't need to change tools as you progress.

This is the first time that I made a course on my own. I learned flip-chart drawing to illustrate the slides and notebooks. That was fun, considering how much I suck at drawing. I am using Teachable as the LMS, LaTeX (Beamer) for the slides, Sketchbook for illustrations, Blue Yeti for audio recording, OBS Studio for screencasting, and Filmora for video editing. The captions are first auto-generated on YouTube and then hand-edited to fix errors and improve formatting. I do the majority of the production on Linux and then switch to Windows for video editing.

I released the course last month and the makers of RLlib got in touch to show their approval. That's the best thing to happen so far.

Please feel free to try it and ask any questions. I am around and will do my best to answer them.

[0] https://www.datacamp.com/courses/unit-testing-for-data-scien...
[1] https://courses.dibya.online/p/realdeeprl

To be honest, though, the practical side of RL can be hit-and-miss in terms of "fun", depending on the person. It requires a lot of manual hand-tuning, reward shaping, hyperparameter tuning, and general trial and error to make an agent do a seemingly simple task, and these tricks are applied far more heuristically and haphazardly than you would expect from more "conventional" programming. It is fun for the right people (those who love tinkering and have the perseverance to keep running RL experiments that can last hours or even days), but I would imagine many people getting bored by the whole experience. (Pssst... I was one of them, and switched to doing something else in the middle of grad school.)
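For readers new to the term, reward shaping means adding extra reward terms to guide learning. One principled variant is potential-based shaping, which adds F = gamma * phi(s') - phi(s) for a hand-chosen potential phi and is known to leave the optimal policy unchanged. Below is a minimal plain-Python sketch, assuming a hypothetical 1-D task and potential function; it is not the commenter's setup or any library's API.

```python
from typing import Callable, Hashable


def shaped_reward(
    reward: float,
    state: Hashable,
    next_state: Hashable,
    potential: Callable[[Hashable], float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return r + gamma * phi(s') - phi(s): the env reward plus a potential-based shaping term."""
    # Convention: treat the potential of a terminal next state as 0.
    next_phi = 0.0 if done else potential(next_state)
    return reward + gamma * next_phi - potential(state)


def distance_potential(position: int) -> float:
    """Hypothetical potential for a 1-D task whose goal is position 10: closer means higher."""
    return float(-abs(10 - position))


if __name__ == "__main__":
    # Stepping from 3 to 4 moves toward the goal, so the shaped reward is positive
    # even though the environment itself paid nothing.
    print(shaped_reward(0.0, 3, 4, distance_potential, gamma=1.0))  # 1.0
```

Because the shaping terms telescope over an episode, the agent cannot farm them by looping; the hard part in practice is still choosing a potential that actually helps.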

By the way, RLlib is good if you want to try out simple experiments with well-established RL algorithms, but it's really awful to use when you want to modify an algorithm even a little bit. So it's not bad for beginner-level tutorials, but once you get the basics it might become very frustrating. I would recommend simpler frameworks like Stable Baselines 3 (https://stable-baselines3.readthedocs.io/en/master/) for a much more stable experience, if you have a fair bit of Python/ML programming skill and don't have trouble reading well-maintained library code.

RLlib maintainer here -- We've been in the process of making many API changes over the past couple of months to make it easy to modify or implement custom algorithms. The full set of changes and updated docs will be released along with Ray 2.0 in August!

Ah, good to meet you here. I used RLlib while doing research back in grad school (which eventually became a SIGGRAPH conference paper this year!), and I've even sent some small pull requests before (under a different ID). Sorry if this is a bit off-topic, but I want to share some inconveniences I experienced while using RLlib:

- The framework seems to be built mainly on the assumption that it will run on a cloud machine like AWS/Azure. However, many researchers use HPC-type cluster machines, which are far different from these cloud setups, and I found RLlib's support for them lackluster. (In our case we had four 16-core Xeon CPUs and one V100 GPU per node, with multiple nodes connected via InfiniBand, CentOS 7 / OpenHPC installed, and job control done via SLURM.) It was quite disappointing to find out that the framework didn't support InfiniBand communication at all, since these interconnects are really costly to have (for good reason!). I also found allocating workers based on lower-level details like CPU affinity/NUMA very cumbersome, since the API assumes you want workers assigned automatically rather than pinned manually for the highest performance. (The last time I used RLlib I looked at placement groups to do this but found them too confusing.) Running your environments NUMA-aware can be crucial for getting the best performance when you're running heavy custom-made environments in C++. I did some experiments and found that parallelizing the environment on the C++ side (via threading) on each NUMA node was much faster than blindly running one process per physical CPU core (which is what RLlib defaults to). You can hack a bit and write your VecEnv on the C++ side, but this breaks many assumptions RLlib makes and creates a whole lot of other issues in the code. Seeing promising solutions like Envpool (https://github.com/sail-sg/envpool) coming up, I think these issues with parallelizing environments can be improved.

- As I said before, the framework makes simple, established things very easy, but anything custom, like modifying RL algorithms to fit your research, becomes very hard. All I needed to do was modify the PPO algorithm to run a custom learning step inside each epoch, and I still found it surprisingly hard. Using the declarative "Observable-like" API to write RL code in Python was incredibly painful, since you have no way to debug your code and no idea whether it is correct until you run the whole RL pipeline, only to hit a strange TypeError 30 minutes into training. (It gave me horror flashbacks to using modern JS and Angular, but in a much worse form.) I get the feeling that the overall codebase is incredibly complex, uses too many weird, dark Python metaprogramming tricks, and is a pain to navigate and extend compared to much cleaner solutions like Stable Baselines 3 (which isn't as "general" a solution as RLlib, but can be more easily modified to one's needs). Maybe my needs were a bit special, and it might have been much better to hand-roll my own PPO implementation with torch.distributed... (if only I had had more time...)
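The "VecEnv" idea from the first point above, stepping many environment copies as one batch, can be sketched in plain Python. This is a minimal single-process illustration using a hypothetical toy environment (`LineWalkEnv`), similar in spirit to Stable Baselines 3's `DummyVecEnv`; it does not reproduce the threaded, NUMA-aware C++ setup described above or Envpool's API, where the real speedups come from native code.

```python
from typing import List, Tuple


class LineWalkEnv:
    """Hypothetical toy env: move left (0) or right (1); the episode ends at either wall."""

    def __init__(self, half_width: int = 3):
        self.half_width = half_width
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.pos += 1 if action == 1 else -1
        done = abs(self.pos) == self.half_width
        reward = 1.0 if self.pos == self.half_width else 0.0
        return self.pos, reward, done


class SyncVecEnv:
    """Step several environment copies as one batch inside a single process."""

    def __init__(self, envs: List[LineWalkEnv]):
        self.envs = envs

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]) -> Tuple[List[int], List[float], List[bool]]:
        if len(actions) != len(self.envs):
            raise ValueError("need exactly one action per environment")
        batch_obs, batch_rewards, batch_dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done = env.step(action)
            if done:
                obs = env.reset()  # auto-reset so every slot always holds a live episode
            batch_obs.append(obs)
            batch_rewards.append(reward)
            batch_dones.append(done)
        return batch_obs, batch_rewards, batch_dones


if __name__ == "__main__":
    vec = SyncVecEnv([LineWalkEnv() for _ in range(4)])
    print(vec.reset())             # [0, 0, 0, 0]
    print(vec.step([1, 0, 1, 0]))  # ([1, -1, 1, -1], [0.0, 0.0, 0.0, 0.0], [False, False, False, False])
```

The learner sees one batched observation per step, which is what lets a policy network run a single forward pass for all environment copies.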

But still, your framework helped tremendously in our research; we wouldn't have finished the paper without it. These are just some lamentations from a former grad student who struggled with these issues some years ago. (I'm not doing any reinforcement learning nowadays, but many people would certainly benefit from these improvements.)
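For readers wondering what a hand-rolled PPO would center on: its core is the clipped surrogate objective, min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between the new and old policy for a sampled action and A is the advantage estimate. Below is a plain-Python sketch of that loss for one minibatch. A real implementation would compute it on tensors (e.g. in PyTorch) so gradients can flow, and the function name here is hypothetical, not RLlib's or SB3's API.

```python
from typing import Sequence


def ppo_clip_loss(
    ratios: Sequence[float], advantages: Sequence[float], clip_eps: float = 0.2
) -> float:
    """Negative mean of PPO's clipped surrogate objective over a minibatch.

    ratios[i] = pi_new(a_i | s_i) / pi_old(a_i | s_i); advantages[i] = estimated A(s_i, a_i).
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and the same length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(ratios)


if __name__ == "__main__":
    # A ratio of 1.5 with a positive advantage is capped at 1.2 (eps = 0.2).
    print(ppo_clip_loss([1.5], [1.0]))
```

A "custom learning step inside each epoch" would typically sit around this loss: computed per minibatch, then extended or followed by whatever extra update the research needs.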

My experience matches yours. Recently, I was trying to solve an optimization problem using Deep RL. As usual, I had to run many experiments over several days using various tricks and hyperparameters. Finally, it turned out something related to the symmetry of the action space made a huge difference in learning.

Anyhow, the experimentation stage requires a certain discipline and feels tedious at times. But the moment when learning takes off, it feels great, and for me personally, compensates for the tedious phase before.

It's certainly not fun for everyone, but I guess it could be fun for the target audience of the course (ML engineers/Data Scientists).

Regarding frameworks, my experience has been different. I find RLlib to be more modular and adaptable than SB3. But the learning curve is certainly steeper. The biggest differentiating factor for me is production readiness. Assuming that we are learning something in order to actually use it, I would recommend RLlib over SB3. The equation for researchers may be different though.

Have you ever encountered a situation where RL solved a (IRL "people paid me non-research-grant money for this") problem for you faster than classical controls engineering and/or planning? I have not.

Depends on what you mean by faster. Do you mean "time to solution" or "time to inference"? I think there are also more factors to take into consideration when considering the merit of the method e.g. performance, robustness, ability to handle non-linearity, ability to solve the full online problem etc.

When all these factors are taken into account, I have encountered situations where Deep RL performed better.

There are also very public examples of this e.g. Google's data center cooling [0] and competitive sailing [1].

[0] https://www.technologyreview.com/2018/08/17/140987/google-ju...
[1] https://www.mckinsey.com/business-functions/mckinsey-digital...

> Do you mean "time to solution" or "time to inference"?

I meant time to a real solution that works well enough to put into a product.

> There are also very public examples of this e.g. Google's data center cooling [0] and competitive sailing [1].

DeepMind really needed DRL wins on real problems.

McKinsey has a strong incentive to be able to say "we know all about the AI RL magic" (and all the better that it's in the context of an oligarchy's entry in a Rich Person Sport... such C-suite/investor-class cred!)

In both cases, DRL was used because it was the right tool for the job. But, in both cases, proving DRL can be useful was the job! Go is a better example, but of course wasn't solving a real problem.

If you throw enough engineering time and compute at DRL, it can usually work well enough. (There is a real benefit to "just hack at it long enough" over "know the right bits of control theory".)

This looks nicely done, but for anyone interested I'd like to mention that these courses aren't something that can replace learning the fundamental concepts and theories behind ML/RL, for which there exist excellent books and courses that focus more on math and theory. I would go there.

These courses teach you how to call a library and use an API. You get nearly the same thing from just looking at the docs. Please don't say you "know RL" after this.

Any references you have would be greatly appreciated.

IMO the best intro book is Sutton's [1]; it's extremely accessible (little math background needed) and covers all the basic concepts. Work through David Silver's course (search YouTube; it overlaps heavily with the book above), and then you are ready for something more advanced like [2] and can start reading and implementing research papers.

[1] http://incompleteideas.net/book/the-book.html

[2] http://rail.eecs.berkeley.edu/deeprlcourse/

This is exactly what I was hoping for, thank you!

I think the idea is not to replace learning the theory or math, but rather to postpone it. Learning the practical aspects of an engineering discipline can provide the necessary motivation to study the theory/math, and this is the oft-ignored factor. I also think there is some benefit in learning a topic using the tool that you will actually use in production (if that is possible without adding unnecessary complexity to the syllabus).

I personally learned DRL from David Silver's course and Sutton & Barto back in the day. They were the only good resources around and I liked them very much. But I think that with the advent of high-level frameworks in DRL, there are better learning paths.

I do intend to teach the theory/math in a later installment of this series, but I wanted to do it by showing students how to implement the various classes of algorithms e.g. Q-learning (DQN/Rainbow), policy gradients (PPO) and model-based (AlphaZero) using RLlib. This would kill two birds with one stone: you can simultaneously pick up the theory/math and the lower level API of the tool that you will be using in the future anyway.
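As a taste of the theory behind the Q-learning family, here is the tabular Q-learning update that DQN generalizes by replacing the table with a neural network (plus additions such as a replay buffer and a target network). This is a plain-Python sketch with hypothetical names, not RLlib code.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

# Maps (state, action) pairs to value estimates; unseen pairs default to 0.0.
QTable = DefaultDict[Tuple[Hashable, int], float]


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: int,
    reward: float,
    next_state: Hashable,
    actions: Sequence[int],
    done: bool,
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> float:
    """Apply Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); return the TD error."""
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


if __name__ == "__main__":
    q: QTable = defaultdict(float)
    q_learning_update(q, "s0", 1, reward=1.0, next_state="s1", actions=[0, 1], done=False, alpha=0.5, gamma=0.9)
    print(q[("s0", 1)])  # 0.5
```

DQN minimizes the squared version of this same TD error with gradient descent instead of applying the update to a table entry directly.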

Enrolled! Went through the detailed lesson plan and you have done a great job structuring the course. I am looking forward to doing it over the weekend.

One suggestion: Instead of naming all the Jupyter notebooks "coding_exercise.ipynb", maybe name them differently? That way, they won't overwrite the previous download.

Good catch. I can imagine that this is annoying. I have put it in my todo.

I hope you enjoy the course over the weekend.

Thank you for doing this!

I haven't looked deeply enough, but does this course use a higher-level 'package' such as OpenAI Gym or teach at a lower-level? (Is lower-level stuff even possible...)

I think the levels (high, low, etc.) are relevant for the Deep RL algorithm, not the environment. The lower-level counterpart of OpenAI Gym's canned environments would be custom Gym environments. I don't see much reason to go any lower than that.
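A custom environment mostly means implementing `reset()` and `step()`. Below is a dependency-free sketch of a hypothetical toy environment following the classic (pre-0.26) Gym convention, where `step()` returns `(obs, reward, done, info)`; Gym 0.26 and later changed this so that `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. With Gym installed you would subclass `gym.Env` and also declare `observation_space` and `action_space`, which frameworks like RLlib use to build the model. The class name, demand pattern, and costs are all made up for illustration.

```python
from typing import Dict, Tuple


class ServerCapacityEnv:
    """Hypothetical toy env in the pre-0.26 Gym style.

    Each step the agent picks how many servers to run (0..max_servers). Demand follows a
    fixed, made-up pattern; the reward penalizes both unmet demand and idle servers.
    """

    DEMAND = [2, 3, 5, 8, 6, 4, 3, 2]  # hypothetical demand per time step

    def __init__(self, max_servers: int = 10, idle_cost: float = 0.5):
        self.max_servers = max_servers
        self.idle_cost = idle_cost
        self.t = 0

    def reset(self) -> Tuple[int, int]:
        self.t = 0
        return (self.t, self.DEMAND[self.t])  # observation: (time step, current demand)

    def step(self, action: int) -> Tuple[Tuple[int, int], float, bool, Dict[str, int]]:
        if not 0 <= action <= self.max_servers:
            raise ValueError(f"action must be in [0, {self.max_servers}]")
        demand = self.DEMAND[self.t]
        unmet = max(0, demand - action)
        idle = max(0, action - demand)
        reward = -float(unmet) - self.idle_cost * idle
        self.t += 1
        done = self.t >= len(self.DEMAND)
        obs = (self.t, 0 if done else self.DEMAND[self.t])
        return obs, reward, done, {"unmet": unmet, "idle": idle}


if __name__ == "__main__":
    # A "perfect" policy that always matches current demand collects zero penalty.
    env = ServerCapacityEnv()
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, info = env.step(obs[1])
        total += reward
    print(total)
```

In a real problem the demand would be stochastic and only partly observable, which is what makes the learning problem non-trivial.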

The situation looks different for Deep RL algorithms. You can implement them from scratch yourself using TensorFlow or any other similar library. Alternatively, you could just use a higher-level library like RLlib, which implements the algorithms using modular components and exposes hyperparameters as configuration parameters.

In many real world use cases, all one needs to do is to use RLlib's implementation and then tune the hyperparameters. In that way RLlib is to Deep RL what Keras is to Deep Learning.

This course uses RLlib. Does that answer your question?

Great, yep, that is good to know.

When I visit the site using Edge, even with Adblockers disabled, I'm unable to view the courses listed as "preview", such as https://courses.dibya.online/courses/fastdeeprl/lectures/383..., and instead get a notice that "this page has been blocked by Microsoft Edge".

I am sorry about that. Unfortunately, the same thing happens in Firefox when the tracking protection is set to "strict".

This is apparently happening after Teachable updated their video player. Earlier, they used Wistia. Now they use Hotmart.

I have informed Teachable about this issue. They said they will look into it.

The current workaround would be to use Chrome or Firefox (with tracking protection set to a level below "strict").

(Related but kind of off-topic)

I’m a software engineer (non-ML) currently working at big tech company that does ML and has a fair amount of open roles in ML and I’ve wondered is ML the sort of thing you could jump into a team and learn on the job? Or do you really need to take some courses, read some books, or even get a degree?

I got a CS/Math bachelors but it’s been nigh on a decade and my higher level math is rusty. Curious on people’s thoughts here.

If it's neural networks: I think to some extent you can self-study and learn on the job, yes. You can learn how to construct and use neural networks, and tweak and improve the structure and results, without understanding the maths underneath.

You'd need to (self-)study and learn how to train a network, e.g. via a course, a book, or articles?

Hmm, isn't that what this HN post is about? :-) The course: https://courses.dibya.online/p/fastdeeprl, 4 hours it says, self-study.

I think the most challenging part often isn't the ML, but gathering training data and cleaning and preparing it so the ML has something to learn from.

I love the illustrations in the slides. How long did it take you to learn flip chart drawing and how did you do the overlays in LaTeX?

Thanks. I learned it from a Udemy course [0]. It took just a couple of weeks to pick up. Regarding overlays, Sketchbook supports layers. I simply put different elements of the illustration in different layers. Sketchbook gives me PSD files that can be imported into GIMP. I then export many PNG files by progressively selecting more layers in GIMP. These PNG files then go into Beamer as successive overlays.

The % sign is important and it maintains the correct positioning of the images.

[0] https://www.udemy.com/course/drawing-for-trainers-leaders-an...
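Writing the overlay lines by hand gets tedious with many layers. A small, hypothetical Python helper could emit them, assuming the cumulative exports are named `<basename>-1.png`, `<basename>-2.png`, and so on, with image k containing layers 1 through k. Each line uses Beamer's `\only<k>` so exactly one image shows per overlay, and ends with `%` so the line break does not add a space that would shift the images. This is one plausible setup, not necessarily the exact one used for the course slides.

```python
from typing import List


def beamer_overlay_lines(basename: str, n_layers: int, width: str = r"\textwidth") -> List[str]:
    r"""Build one Beamer overlay line per cumulative PNG export.

    Hypothetical naming scheme: <basename>-k.png contains layers 1..k.
    \only<k>{...} shows image k on overlay k only; the trailing % comments out the
    line break so LaTeX does not insert a space that would shift the next image.
    """
    if n_layers < 1:
        raise ValueError("need at least one layer")
    return [
        rf"\only<{k}>{{\includegraphics[width={width}]{{{basename}-{k}.png}}}}%"
        for k in range(1, n_layers + 1)
    ]


if __name__ == "__main__":
    # Paste the printed lines into the body of a frame.
    print("\n".join(beamer_overlay_lines("pipeline", 3)))
```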

Congrats on the launch. I have seen your Deep RL tutorials circulating on YouTube. I like your presentation style: crisp and precise.

My suggestions for learning deep RL are the book Grokking Deep RL and the Spinning Up website. These are reading focused obviously. Then, when your implementations don't work, compare them to minimal-rl. I don't intend to detract from this course, just adding some of my own suggestions on the topic.




I like the reading-focused SpinningUp course as well. Thanks for the other pointers.

Would this teach transformers? Or is that something else?

Also any tips for finding a study group for learning the large language models? I can’t seem to self motivate.

Maybe this would help you differentiate: GPT-3, DALL-E 2, etc. use transformers, while AlphaGo, OpenAI Five, etc. use Deep Reinforcement Learning. They are not mutually exclusive, just different things.


Transformers have been used in Deep RL for months at least.

Try these: https://scholar.google.com/scholar?q=transformer+deep+reinfo...


Before jumping into Deep Reinforcement Learning I highly recommend doing the Reinforcement Learning course by David Silver [1].

[1] https://www.deepmind.com/learning-resources/introduction-to-...

Nice resource, but still 10+ hours of video and nothing else.

No code, coding assignments, math problems or coding problems.

Very little RoI.

I watched them all from start to finish. I had a superficial, shallow "understanding" but no real knowledge.

The best (very short) book to learn Deep RL is the one by Zai and Brown from Manning.

And keep the classic Sutton and Barto nearby. That's it.

If you want a video course that closely follows the book with quizzes and assignments, check out the University of Alberta's MOOC on Coursera.

(Hugging Face also has a new Deep RL course taught by Simonini. You could check that out, but I haven’t seen it.)

The HF course covers Decision Transformers.

Sutton and Barto is the best start for foundations. Start there.

Seconding this. His talks are very elaborate and have great pointers to reading material/coursework, as if you were sitting alongside the students at UCL. Very involved, though, if you have a 'day job'.

This looks great! Thank you for all the thought and effort you have put into it.

I am currently working on a project where I need to use RLlib for a capacity planning problem. Looks like I will learn a thing or two over the weekend.

I will eventually need to use a custom environment, so it's great to see it's included in your roadmap. Most courses I have seen totally ignored that. Fancy Atari envs are great for practice and have wow factor, but you need a custom environment to do anything resembling real work.

Would I need a beefy GPU for the coding challenges?

I am glad you like it. The coding exercises don't require a GPU. Thankfully, most RL problems (and certainly the ones used in the course) require small neural nets which can be trained in reasonable time using a CPU.

