Dive into Deep Learning (d2l.ai)
528 points by soohyung 8 months ago | 89 comments



As an engineer I find myself in this type of situation quite often - if anyone can point me to some good resources or has any advice, I'd be quite grateful:

- Some non-technical stakeholder comes to me and asks, "Can we solve this problem with machine learning?" Usually it's something like "there need to be two supervisors on the factory floor at all times, and I want an email alert every time there are fewer than two supervisors for more than 20 minutes."

- I ask for some sample footage to build a prototype and get a few very poor quality videos, at a very different standard from what I see in most of these tutorials.

- I find some pre-trained model that is able to do people detection or face detection and return bounding rectangles and download it in whatever form

- After about 30 minutes of fiddling and googling errors, I run it against the sample footage

- I get about 60% accuracy - this is no good. Where do I go from here? Keep trying different models? There are all sorts of models like YOLO and SSD and RetinaNet and YOLOv2 and YOLOv3.

- At some point I try a bunch of models and all of them are at best 75% good. At this point I figure I should train with my own dataset, so I need to arrange to have the footage labelled. In my experience stakeholders are usually willing to appoint someone to do it, but they want to know how much footage they need to label, whether their team will need special training to do the labelling, and whether, after it's all done, this is even going to work.

What are some effective / opinionated workflows for this part of the overall process that have worked well for you? What's a labelling tool that non-technical users can use intuitively? How good are tools/services like Mechanical Turk and Ground Truth?

This part of the process costs time and money - stakeholders, particularly non-technical managers, tend to want an answer beforehand: "If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?". How do you handle these kinds of conversations?

I find this space fairly well-populated with ML tutorials and resources but haven't been able to find content that is focused on this part of the process.


I'm somewhat surprised at the responses for this.

I believe your issue can be easily solved - have supervisors wear a color distinct from non-supervisors. For example, let's say it's yellow.

OK, so now you have yellow-wearing supervisors and everyone else. To solve the problem you've described, acquire a month or so of footage, with per-minute labels describing how many yellow-wearing supervisors and how many people (in total) there are.

So the data you have is:

1. Yellow wearing supervisors

2. Total number of workers on the floor

Then with this data you can train a network to do what you're describing pretty easily. Assuming there are a lot of workers on the floor, trying to do person detection or face detection would require too much data. Just have a uniform enforced and train on the colors/presence.
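For what it's worth, the colour cue may not even need a neural net to get a first baseline. Here's a minimal sketch of the idea with OpenCV, assuming a fixed camera and a reasonably saturated yellow vest; the HSV bounds and blob-size threshold are guesses you'd tune on real footage:

    import cv2
    import numpy as np

    # Rough HSV range for "safety yellow" - these bounds are assumptions to tune.
    YELLOW_LO = np.array([20, 100, 100])
    YELLOW_HI = np.array([35, 255, 255])
    MIN_AREA = 800  # ignore small specks; depends on resolution and camera distance

    def count_yellow_vests(frame_bgr):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, YELLOW_LO, YELLOW_HI)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        n_labels, _, stats, _ = cv2.connectedComponentsWithStats(mask)
        # label 0 is the background; count blobs large enough to plausibly be a vest
        return sum(1 for i in range(1, n_labels)
                   if stats[i, cv2.CC_STAT_AREA] >= MIN_AREA)

Even if the raw count is too noisy on its own, it's a cheap feature to feed into whatever learned model you end up training.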


This is a pragmatic and valid approach. No matter what anyone else says.

Imagine you told a 10-year-old child to do this task. Even the child would ask the same question: how do I know who is a supervisor and who is not?

Not only is face recognition hard, it is almost impossible to accomplish in a factory-floor setting. Not totally impossible, but it is really, really hard. Face detection is still possible, but face recognition is far more computationally expensive. You'll need a shit ton of data and you'll need access to the employee database. You'll need a whole new engineering pipeline to make this happen, and of course a team.

Compared to that expense and time, you are way better off getting the company to approve special vests for supes.


I remember hearing a similar story circa 2003-2008. Some BigCo was spending a bunch of money to automate their inter-office mail and was hung up on OCR of handwritten stuff. Some consultants came in to look, and one asked if they could just use different color envelopes/baskets. The answer was "yes".


Sorry but it was a scenario I imagined and not something that happened in reality. I can't talk about some of the real-world scenarios that I am asked to consult on, so I made up a rather poorly thought-out one.


Something to look at is classic image-processing algorithms, which can be effective and, more importantly, behave predictably.

In your example, take footage of the factory floor when it is empty; then, once work begins, use an approximately human-sized/shaped rectangular sliding window and look for areas that exceed a threshold of difference from the image of the empty floor.

You can then use that window as input to a classifier, which will be easier due to the considerable dimensionality reduction, or perhaps you can get sufficient performance using further deterministic techniques.
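A minimal sketch of that idea, assuming a fixed camera, a stored image of the empty floor, and a window roughly the size of a person at the camera's scale (the filenames and thresholds below are placeholders to tune):

    import cv2
    import numpy as np

    empty = cv2.cvtColor(cv2.imread("empty_floor.png"), cv2.COLOR_BGR2GRAY)

    def candidate_windows(frame_bgr, win=(40, 90), stride=20, diff_thresh=35, min_frac=0.3):
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        changed = (cv2.absdiff(gray, empty) > diff_thresh).astype(np.uint8)
        ww, wh = win
        h, w = changed.shape
        hits = []
        for y in range(0, h - wh, stride):
            for x in range(0, w - ww, stride):
                # a window is a candidate if enough of it differs from the empty floor
                if changed[y:y + wh, x:x + ww].mean() > min_frac:
                    hits.append((x, y, ww, wh))
        return hits  # feed these crops to a small classifier, or just count/merge them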


Interesting approach - is this documented in a blog post or tutorial somewhere, where something similar is done?


"Easily solved - just have them wear special clothes." Everything is easy if you can arbitrarily change the requirements!


This is good problem-solving. Why spend tens (if not hundreds) of thousands of dollars building technology to do a complicated task if you can cut that effort in half or more by having somebody wear a funny vest?

Remember, the problem is "I need to know when I don't have two managers on the floor," not "how do I use machine learning to know when I don't have two managers on the floor."


This particular problem is "I need to know when I don't have two managers on the floor, and they aren't always wearing funny vests just because the computer guys are bad at deep learning".

If we can make up arbitrary rules and assumptions then just have them jot down on a piece of paper when they come and go, and if they are the last to leave then they have to send an email.


I don't think they are making up arbitrary rules, I think it's problem solving. Brainstorming alternative solutions that are cost effective and solve the problem is a useful exercise. We shouldn't just blindly use machine learning because it's there.


Honestly, despite your facetiousness, this is the best starting point. And then from here work up to more complex solutions if there are reasons why this simple one isn't suitable.


This wouldn't work as there is a time requirement of 20 minutes. A solution to this would have to be real-time and not require one to manually log their presence, which would defeat the whole point.


The problem with supervisors was just an example. The person asking the question isn't served by simplifying the problem, because clearly they are after a more general solution.


The requirements were not changed. Supervisors in almost every working-class job already wear different clothes to begin with. Heck, even doctors wear different clothing from nurses, teachers from students, coaches from athletes, etc.

The general point is to capitalize on preexisting information rather than pursue the "true" solution, which is error-prone and which even a human might not perform with 100% accuracy, given that in certain settings (such as this hypothetical) the perfect solution cannot be accomplished without constraints.


This solution would change the workplace culture - and I 100% bet it would lead to a lot of good (for MY definition of good!) supervisors leaving.

Imagine where you worked suddenly introduced this: "Yes, previously everyone could wear whatever they wanted - but from today, just the senior programmers must code while wearing a high-vis jacket around the office so we can track when they are at their desks".

The supervisors have now changed their relationship with coworkers - signaling their superiority, while simultaneously feeling stalked by their bosses and looking "unfashionable"/un-cool - all because someone couldn't figure out how to do deep learning properly... which is what the OP was actually asking about!


I really don't understand your comment.

1. Supervisors are already by definition "superior" to their subordinates.

2. Supervisors in factories already wear distinctive clothing - especially in fully automated factories.

Finally, you have yet to propose a solution to the problem yourself that would be highly accurate and easy to train. You vastly underestimate the difficulty of creating a bespoke solution from scratch with no data.

In any case, since the supervisor thing was just an example, the original poster's only real choice is to manually label everything; but AI is really problem-centric, so it's hard to recommend anything without knowing the actual problem. Assuming it really is a [someone in an area for a period of time] kind of problem, and the difficulty is picking apart the "someone" and you cannot influence their behavior, you just need massive amounts of data. Even then there's no guarantee you'll have high accuracy.

If high accuracy is required the problem itself needs to be examined on a higher level.


I don't have a deep-learning solution to the problem (I know nothing about it, that's why I clicked on it!). Seems really hard to me. I'd certainly go with an obvious "clock in and out" or RFID approach... but I've worked in factories - and if you make someone wear special clothes (or do some special tasks) when they didn't have to before, you're asking for trouble. People really hate change. That's presumably why the OP was looking for an answer to a tough problem.

A nerd analogy would be making a programmer change OSs (or even text editors) against their will: They could do it, but they won't be very happy about it.


"nerd", "computer guys bad at..." it feels like you have an irrational axe to grind here, when a simple solution presented causes this line of argument.


Changing the context is one of the many well respected ways to solve a problem.


This is not bad, but once in this territory, why not just add some tracking beacon to a badge?


That is also a good idea. It really depends on what the rest of the requirements are.



Just passing on info, but ANA (the airline company) has colored helmets in their maintenance facility to distinguish supervisors (color 1) from non-supervisors (color 2), 1st-year employees (color 3), and guests (color 4). I don't know if they do any tracking.


This example was totally made-up.

In my experience related to the type of arrangement you're describing - in reality (at least anecdotally speaking) the helmets are often not worn, or the colors are not enforced, or the colors don't get picked up due to poor quality video.

I deal mostly with third-world countries so safety standards are not always the best.


- We use GCP for labeling [1]

- YOLOv3 is state of the art for speed. I think RetinaNet does better if you have the horsepower.

- I can't recommend FastAI [2] enough for learning things to try.

- 60% on a frame-by-frame basis might be enough, as long as you have a low false positive rate. Combine with OpenCV mean shift tracking if you need real time (rough sketch after the links below).

- Start small. Show success with pre-trained models, then move on to transfer learning. Start with a small dataset. Agree on a metric beforehand.

- Use a notebook. [3] Play around, don't let it run for days then look at the result.

[1] https://cloud.google.com/ai-platform/data-labeling/docs/

[2] https://course.fast.ai/

[3] https://github.com/Mersive-Technologies/yolov3/blob/master/f...
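Since mean shift came up in the list above, here's a rough sketch of using OpenCV's meanShift to follow a person between (slower) detector runs; the video path and the initial box are placeholders, and in practice the box would come from your detector:

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("factory.mp4")   # placeholder footage
    ok, frame = cap.read()

    # Pretend a detector gave us an initial (x, y, w, h) box around a person.
    x, y, w, h = 300, 200, 60, 120
    track_window = (x, y, w, h)

    # Hue histogram of the region, so mean shift can follow it via back-projection.
    roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(roi, np.array([0, 60, 32]), np.array([180, 255, 255]))
    roi_hist = cv2.calcHist([roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
        _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
        # track_window now follows the person until the next detector pass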



Thanks I will check out these resources


Most AI stuff is just horribly over-hyped, so the sad truth might be that what you are seeing is the state of the art and nobody else has found a better way yet.

As a practical example, figuring out where a given pixel moves from one video frame to the next one, when working on real-world videos, the best known algorithms get about 50% of the pixels correct. With clever filtering, you can maybe bump that to 60 or 70%, but in any case you will be left with a 30%+ error rate.

NVIDIA / Google / Microsoft / Amazon will tell you that you need to buy or rent more GPUs or cloud GPU servers and do more training with more data. And there are plenty of companies in cheap-labor countries offering to do your data annotation at a very reasonable rate. But both of them are just trying to sell to you. They don't care if it will solve your problem, as long as you're feeling hopeful enough to buy their stuff.

Judging from the bad results that even Google / Facebook / NVIDIA show at benchmarks, having a near-unlimited budget is still not enough to make ML work nicely.

Oh and for these image classification networks like YOLO, they have their own flavor of problems: https://www.inverse.com/article/56914-a-google-algorithm-was...


>As a practical example, figuring out where a given pixel moves from one video frame to the next one, when working on real-world videos, the best known algorithms get about 50% of the pixels correct. With clever filtering, you can maybe bump that to 60 or 70%, but in any case you will be left with a 30%+ error rate.

What do you mean by this? Optical flow isn't really a learning problem - it's a classical problem with very good classical algorithms:

https://www.mia.uni-saarland.de/Publications/brox-eccv04-of....

https://people.csail.mit.edu/celiu/OpticalFlow/

https://github.com/pathak22/pyflow


It used to be. Then the AI fanboys arrived and started treating it like a learning problem.

https://arxiv.org/abs/1612.01925

https://arxiv.org/abs/1709.02371

https://arxiv.org/abs/1904.09117

BTW, the classical algorithms also deal very badly with noise and repetitive textures, e.g. a video of a forest in the afternoon.


Ever tried "DIS optical flow" in OpenCV? Works like a charm for me even in challenging conditions.
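For reference, a minimal sketch of DIS optical flow in OpenCV (needs a reasonably recent OpenCV build; the frame filenames are placeholders):

    import cv2

    prev = cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2GRAY)
    curr = cv2.cvtColor(cv2.imread("frame_001.png"), cv2.COLOR_BGR2GRAY)

    dis = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_MEDIUM)
    flow = dis.calc(prev, curr, None)   # HxWx2 array of per-pixel (dx, dy) motion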


Not yet, thanks for the suggestion :)


There are a load of questions here.

> Where do I go from here? Keep trying different models?

> ...after [the labeling is] all done is this even going to work?

> [How to label]

> If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?

Generally, you're discussing the space of model improvement and refinement. This is the costliest and most dangerous part of any ML pipeline. Without good evaluation, stakeholder support, and real reason to believe that the algorithm can be improved this is just a hole to throw money into.

The short answer to most questions is that you don't really know. Generally speaking, more data will improve ML algorithm performance, especially if that data is more specific to your problem. That said, more data may not actually substantially improve performance.

You will get much more leverage by using existing systems, accepting whatever error rate you receive, and building systems and processes around these tools to play to their strengths. People have suggested asking the floor managers to wear a certain color. You could also use the probabilistic bounds implied by the accuracies you're seeing to build a system which doesn't replace manual monitoring, but augments it.

Perhaps you can emit a warning when there's a likelihood exceeding some threshold that there aren't enough people on the floor. This makes it easier for the person monitoring manually, catches the worst case scenarios, and helps improve the accuracy of the entire monitoring system.
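As a concrete illustration of that kind of augmentation, here's a hedged sketch of turning noisy per-frame model outputs into an alert only when the condition persists; the sampling rate, window length, threshold, and the alert hook are all assumptions:

    from collections import deque

    SAMPLES_PER_MIN = 60                         # say we classify one frame per second
    window = deque(maxlen=20 * SAMPLES_PER_MIN)  # the 20-minute requirement
    THRESHOLD = 0.8                              # fraction of the window that must look under-staffed

    def send_alert():
        print("Fewer than 2 supervisors for ~20 minutes - please check the floor")

    def update(p_understaffed):
        """p_understaffed: model's probability that fewer than 2 supervisors are visible."""
        window.append(p_understaffed)
        if len(window) == window.maxlen and sum(window) / len(window) > THRESHOLD:
            send_alert()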

Not only can these systems be implemented more cheaply, they will provide early wins for your stakeholders and provide groundwork for a case to invest in the actual ML. They might also reduce the problem space that you're working in to a place where you can judge accuracy better and build theories about why the models might be underperforming. This will support experiments to try out new models, augment the system with other models, or even try to fine-tune or improve the models themselves for your particular situation.

In terms of software development lifecycles, it's relatively late in the game when you can afford the often nearly bottomless investment of "machine learning research". Early stages should just implement existing, simple models with minimal variation and work on refining the problem such that bigger tools can be supported down the line if the value is there.


Thanks - this validates many of the assumptions I had about this part of the process.

It has been challenging communicating many of these realities to non-technical folks, who seem to be quite misguided about implementing these types of systems, as opposed to "non-ML" systems where there is a clearer and more predictable idea of what's possible, how well it will work, and how much effort is required to pull it off.


In my opinion, there's space for an "ML Product Manager" as a specialization for someone who understands the technical aspects of both software and ML systems, but also can design roadmaps, build stakeholder buy-in, and generally shepherd the project. That feels like a big open space right now.


Yeah IME expectations with ML are just the worst. Somehow, non-ML-educated stakeholders expect it to be predictable, like they pretend traditional software engineering is... but also to be magical in scope.


Honestly, it's not surprising. ML is billed as a tool, one that in the last 5 or so years we've surprisingly "figured out". This is vast overselling, but it still creates the basic mental model for folks without further training: ML is a tool you can apply to certain situations to achieve outcomes that you used to need people for, especially in vision and NLP.

I personally believe this is false, and false in the sense that we're remarkably far away from that. Even more than software, predictive automation is a process. It often relies on particular customization to your own situation to be successful. It can demand vast resources. It's wildly difficult to debug.

So we should be working to retrain those around us. ML is a process.


Gee, you just described my practice :-)


Ha, that's good to hear. Would love to chat with you about it if you're interested.


Have you tried fast.ai's Practical Deep Learning For Coders? https://course.fast.ai/ I think it's great for answering many of the exact questions you have.

I was able to answer my own versions of many of those questions after the first few video lessons. It demonstrated to us that our data is a great fit for machine learning. I didn't feel comfortable turning my experiments into something production-worthy but I feel confident enough to at least have conversations about it and sketch out a possible plan for what a contractor could work on this year.


There seem to be a lot of courses in this space - I'll give this one a try since you're recommending it. Most of them seem to focus more on the theoretical / math aspects of stuff, which is quite interesting but I find it more interesting to implement these things and solve real-world problems.


FastAI has you detecting dog breeds in lesson one :)


More advertising from fast.ai


People always bring up fast.ai because it's a good course and it's free. As someone who has gone through it, I can attest to its quality.


Awesome summary. Welcome to some lessons/truths (circa 2019 state of technology):

1. Deep learning (by itself) is often a shitty solution. It takes a lot of fiddling, not just with the models but also with the training data, to get anything useful. Often the data generation team/effort becomes larger than the model-building effort.

2. It is hopeless to use neural networks as an end-to-end solution. This example will involve studying whether detections are correlated/independent in neighboring frames... whether information can be pooled across frames... whether you can use that to build a robust real-time model of the scene of interest, etc. That will involve lots of judicious software system design using broader ideas from ML / statistical reasoning.

This is why I find it hopelessly misleading to tell people to just find tutorials with TensorFlow/Pytorch and get started. You really need to understand what’s going on to be able to build useful systems.

That’s apart from all the thorny ethical questions raised by monitoring humans.


You need to start from what sort of accuracy you need for the task from a business perspective (including what is acceptable in terms of false positives and false negatives). Just back-of-the-envelope stuff. You have a rough idea of the "I copied stuff other people have done" rate and the "I spent a few days mucking about" rate. This stuff always follows a logistic curve with time, starting at your first rate and asymptotically approaching the high 90s. Use this to get a ballpark estimate of how long it will take / cost. If the accuracy required is close to 100% you can probably give up straight away. For things like this that I have done in the past, a good mental model has been: if it isn't worth "manually automating" the task (i.e. paying someone somewhere to watch a webcam and send the email, so you always have the end product and you eventually get labeled data as a byproduct), it might not be worth trying to automate it.
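A back-of-the-envelope version of that logistic-curve estimate; the (effort, accuracy) points below are made up purely for illustration:

    import numpy as np
    from scipy.optimize import curve_fit

    effort_days = np.array([1, 3, 6, 12])          # from "copied other people's stuff" to "mucked about for a while"
    accuracy = np.array([0.55, 0.70, 0.85, 0.92])  # illustrative measurements

    def logistic(t, ceiling, k, t0):
        return ceiling / (1.0 + np.exp(-k * (t - t0)))

    params, _ = curve_fit(logistic, effort_days, accuracy, p0=[0.95, 0.3, 0.0], maxfev=10000)
    print("estimated accuracy ceiling:", params[0])
    print("projected accuracy after 30 days:", logistic(30, *params))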


Why is this a machine learning problem? Does your factory not have keycard access? Or just require your supervisors to carry some sort of RFID/BLE tracking device. These are well-solved problems.


I had to lie about specifics in the example because I post under my real name and there are things I can't talk about :)

Apologies - I figured the primary intent of my comment, i.e. the questions at the end, would be the focus of most responses.


>What's a labelling tool that non-technical users can use intuitively?

I haven't used it, but Microsoft has this:

https://github.com/microsoft/VoTT

>"If we spend all this time and money labelling footage, how well is this going to work?"

"not well at all because we don't have facebook/google scale training data. let's try to figure out a conventional way to do it". for the supervisors problem i would recommend bluetooth beacons.


> "If we spend all this time and money labelling footage, how well is this going to work? How much footage do we need to label?"

Start by labeling some data yourself. If you need to scale things up, you're going to need very clear rubrics for how things should be labeled and you're not going to be able to make them without having labeled some data yourself.

Definitely think about what the easiest form of your task is. Labeling bounding boxes is time-intensive; labeling whether there are 2 or more supervisors on the floor should be a lot easier, and you can easily label a bunch of frames all at once.

You're going to need to figure out what tooling you will need for labeling, is this available out of the box, or will you need something custom?

Label X data points yourself and do some transfer learning. Label another X data points and see how much better things get.

The rough rule of thumb is performance increases logarithmically with data[1]. After you have a few points on the curve about how much better things get from more data, fit a logarithmic curve and make a prediction of how much data you will need, though be prepared that you might be off by a factor of 10.
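A hedged sketch of that curve-fit-and-extrapolate step, with illustrative numbers standing in for your real measurements:

    import numpy as np
    from scipy.optimize import curve_fit

    n_labels = np.array([250, 500, 1000, 2000])     # labels used in each experiment
    accuracy = np.array([0.62, 0.68, 0.73, 0.77])   # measured accuracy at each size

    def log_curve(n, a, b):
        return a + b * np.log(n)

    (a, b), _ = curve_fit(log_curve, n_labels, accuracy)

    target = 0.90
    needed = np.exp((target - a) / b)
    print(f"rough extrapolation: ~{needed:,.0f} labels for {target:.0%} accuracy")

As said above, treat the result as an order-of-magnitude guess, not a promise.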

As others have mentioned, it's worth thinking about false positive/negative tradeoffs and how much you care about either.

If the numbers you're extrapolating to aren't satisfactory, then yeah, you need to keep messing around with your training until you bend the curve enough that it seems like you'll get there with labeled data.

[1] https://ai.googleblog.com/2017/07/revisiting-unreasonable-ef...


60% on a per-frame basis might be enough if all you need to do is identify the condition "two supervisors are not on the floor" for at least 20 minutes.

As in, if you compute your per-frame score and compare it over bigger chunks of time, is it sufficiently different when 2 are on the floor and 2 are not?


I wrote random numbers for the sake of narrating a scenario, but yeah, I suppose you could do Supervisor Present Y/N for 180-frame chunks @ 30fps and pick up the value per minute.


Honestly, I'd recommend trying Google's AutoML. I'm not a shill or employee at all, I've just had really great luck with it identifying my cats (each by name) with only a small number (a couple hundred to low thousands) of labeled frames.

In my case it probably used transfer learning on like a resnet-150 or inception or something. Regardless, it approaches the limits of what an expert in machine learning can accomplish, so you'll know very quickly whether you need higher quality video / yellow vests.


For annotation, check out Prodigy [0].

Generally speaking, as classification systems themselves are pretty dumb, there isn't really a way to know what architecture will work best for your task other than trial and error. Of course you can optimize parameters in a less chaotic way (grid search or AutoML). In my experience it mostly boils down to data. Try augmentation methods, acquiring more data, or transfer learning with varying degrees of layer retraining.

[0]: https://prodi.gy/


Looks great, will give it a try. I'm assuming I can just host this somewhere and send users the link.


There are probably other ways to do this: RFID/GPS/something that tracks people going through a doorway... or get a better camera. I am really not sure why this needs ML, though; this is not a new problem, and you are taking a nuke to it when a good ol' hammer works just fine (I know this isn't what you asked, I don't care... this kind of illogic cuts companies to death little by little).


I would recommend CVAT for annotation (for images/video): https://github.com/opencv/cvat

In general, annotating data for object detection or segmentation tends to be very hard to do effectively—expect low quality and inconsistent labels.


The shocking thing that at least I ran into is the sheer quantity of training data you really need. The large companies doing this successfully are using utterly gigantic libraries of training data that are beyond anything others could ever come up with. It really brought home to me what a blunt instrument deep learning really is.


Is there some kind of rule of thumb for a minimum of how much data is needed for various types of problems?


Retraining an existing model does not need many (the fast.ai lesson 1 example retrains a net to distinguish cricketers from baseball players with 30 images). For a full net trained from scratch, it's on the order of thousands per category.
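For concreteness, here's roughly what retraining an existing model looks like with a recent fastai version; the folder layout (one subfolder per class under data/) and the epoch count are placeholders:

    from fastai.vision.all import *

    path = Path("data")   # e.g. data/supervisor/ and data/other/, a few dozen images each
    dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(3)    # a few epochs is often enough when starting from a pretrained net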


It's about agile development. I would really like a write-up of how Google, for example, reduced the energy consumption in their data centers. I have a hunch the 30% energy reduction was based on an insight into the specific causal relationship between demand and supply flows. This kicked off a development sprint and eventually led to a major energy reduction. A traditional waterfall project plan starting with a requirement to reduce energy by 30% collapses before it starts.


I would do the following:

- Manually scan through a couple of hours of data and set up a human baseline.

- Run standard algorithms and find their accuracy.

- Find errors in the model and analyze why the errors are happening. Is the model classifying some other object as a supervisor? Is the model failing to classify the supervisor in certain lighting conditions or scenarios?

- Retrain the model with the failure scenarios so that it learns.


Seems much easier and more accurate to have the supervisors install an app on their mobile phones that checks whether the phones are moving (and thus being carried by a person) via the accelerometer, and whether they are on the factory floor via Wi-Fi/Bluetooth beacons, and reports to a central server.

In general, it's much better to not use machine learning at all if at all possible.


At this stage the sibling comment recommending GCP is a pretty solid recommendation.

You can use Google for labelling (Mechanical Turk style), and AutoML Vision to train your model. It's going to be a bit pricey, but cheaper than your time to do the equivalent, and it will give you an educated guess at how much work it'll be to beat it. It costs about $100 to train a cloud vision model, I think (not including labelling)? You can also try the API for free to see how well Google does at finding people; they have better off-the-shelf models than you can get publicly.

https://cloud.google.com/vision/automl/docs/

You can try exploiting other things. Is your scene static? Try using frame differences as a feature. If it's a fixed environment then you should get a boost when fine tuning a model, versus some general person detector. COCO pretrained models should be quite good at finding people out of the box.

I wrote my own labelling tool specifically for YOLO, which you may find useful (i.e. you label your data and export to a train-ready format): https://github.com/jveitchmichaelis/deeplabel

People who are not experienced are usually terrible at tagging images. They're not consistent, they miss objects, and they don't understand why it's an issue. It will be faster to pay an "expert" service like Mechanical Turk, or do it yourself.

Basically a lot of your questions are open research problems. How much data do you need? Not a clue. It depends how your model is failing, which is always worth checking anyway. Figure out what the model is bad at and try to improve it; it should be doable to figure out where that 25% is going.

You should do better with a model like Faster R-CNN or its ilk. AutoML will do something like this, and you can try Facebook's Detectron2 toolkit or the TensorFlow Object Detection API.

Detecting unique people is a hard problem, by the way (eg two people versus the same person detected twice). You're better off just using an established method like RFID tags for presence/absence.

Another sibling made a great point. Don't detect people, train a model to output the number of people in the frame. This is how ML is applied to camera trap data with animals. In your case you can reduce this to a binary classification problem - >= 2 people, positive output.
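A hedged sketch of that reframing with a pretrained ResNet in PyTorch/torchvision (recent versions); the folder layout, class names, and hyperparameters are assumptions, not anything from the parent comment:

    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    # frames/train/ok/ and frames/train/alert/ hold frames labelled ">= 2 supervisors" vs. not
    train_ds = datasets.ImageFolder("frames/train", transform=tfm)
    loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 2)   # swap the head for a 2-class output

    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(3):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()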


Isn't this a human learning problem? Just tell your supervisors to be aware of their counterpart on the floor, at all times?


I'll chip in with my book, which is written with programmers in mind, implements everything from scratch, and works on CPU and GPU at great speed. It directly links theory to implementation, and you can use it along with Goodfellow's Deep Learning book. It also discusses all the steps, and does not skip gradients by relying on autograd.

Deep Learning for Programmers: An Interactive Tutorial with CUDA, OpenCL, DNNL, Java, and Clojure.

https://aiprobook.com/deep-learning-for-programmers/


And, what makes me want to dive into this the most, there's some Clojure! Will definitely have to take a look at this one. Thanks.


There's lots of Clojure! (in relative terms. In absolute terms, there's not much of it because Clojure is so concise and powerful that everything is implemented with very little code :)


Is there a print version planned? I usually don't buy ebooks.


Only a limited hand-crafted hardcover edition is planned. That being said, you can print a dead tree version from the PDF at your local printing shop (or at home) if you care about the text, and not that much about binding.


If you print technical documents often, I highly recommend a wire-binding machine like this one: https://www.renz-germany.de/produkte/renz-ring-wire/eco-s-36...

I've got a lower-end model, second-hand, still not cheap, but it's so cool.

You get real wire-o binding (not spiral binding!), so your book lays open flat on the table and the pages are right next to each other, not slightly displaced vertically like with spiral binding.


Yes, that would be an option.


Most local print shops will do something like that for <$10. Totally worth it, IMHO.

And I just did a quick price check, and https://xpress.lulu.com/ will do it for $10 as well, with shipping in 2 or 3 days (US).


See also Dive into Deep Learning Compiler from the same team: http://tvm.d2l.ai/


Great guide - though unless I missed it I think this is missing the latest advancements around Transformers, BERT, ELMo, etc.

This stuff is pretty fresh, so it's understandable, but the NLP chapter would be greatly enhanced by covering these newer topics


Is there any book which has more than a passing mention of BERT?



Does MXNet as a DL framework still have a place, given that PyTorch/TensorFlow pretty much dominate all use cases? Amazon/AWS still "officially" supports it, but given its product-driven culture it could replace it with whatever framework moves faster and is more in demand from customers. Vendor lock-in in this case probably won't work as well, since Amazon is not quite a leader here.


MXNet existed before AWS picked it, and it has a lot of strengths. I’d use it (especially with Gluon) over TF any day. But that said, PyTorch is usually easy to use on AWS... the preference for MXNet seems weak


It existed at CMU, but it seems like even CMU has moved over to PyTorch. I think Amazon just doesn't want to seem like an "also ran" by conceding to one of its competitor's frameworks.


This looks pretty good, certainly much better than Goodfellow's Deep Learning book. The diagrams and code are definitely much appreciated, but I'm curious: why MXNet over PyTorch?


I find this comment amusing - have you read Goodfellow's book? That book is amazing.


I suppose you and I have very different notions of the word 'amazing'.


I read the first half of it very closely and skimmed the second.


All the authors look to be Amazon employees, and I think MXNet is Amazon's "chosen" DL framework.


Ah, that makes sense. Should've googled the authors' names. I just assumed they were academics because of the large number of unis using the book.


Has anyone read this book? It looks very attractive, but I want to hear some feedback before bookmarking another ML book.


Kinda did, but mostly the first chapters, actually up to the CNN chapter (where real modern DL starts). But so far, I really liked what I read. It has a very good blend of code and theory, with hands-on applications throughout the whole book. Most importantly, all those applications can perfectly well be copy-pasted into your own environment. So it actually reminded me of a very thorough tutorial on a framework, more so than a regular textbook, although the authors don't compromise on mathematical arguments (but don't get lost in them either; they skimmed pretty fast over regularization theory IMHO). If you've had previous exposure to classical ML, I think it's a fantastic introduction to DL, enough to get started.


RFID at the doors?



