Hacker News
Ask HN: What are the best resources to learn computer vision?
214 points by ameyades on June 28, 2017 | 57 comments
Ideally, I would like to be good enough to get a job at an AI/robotics startup. I already have a CS degree, a decent math background, and am working as an embedded software developer for a large company.

For a full course, nothing beats CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.stanford.edu/ by Andrej Karpathy et al.

Also, for a general, high-level introduction to neural networks, I wrote Learning Deep Learning in Keras http://p.migdal.pl/2017/04/30/teaching-deep-learning.html, focusing on visual tasks.

I did this course but couldn't finish all the assignments. Loved it. Please note that this is a convolutional neural networks course, not computer vision as such. From what I know, computer vision encompasses a variety of non-machine-learning algorithms, which are not covered in this course.

As an engineer, it's difficult to know when to use deep learning and when to use more classical algorithms. Often, you have to try both and see which is better (twice the work, hooray!). The classical algorithms are often very understandable, and you can reason about what's going on and figure out what is breaking. Deep learning is so much harder, e.g., are my hyperparameters bad, or do I need another 30 GPUs running for a week?

Imho, deep learning has little to do with engineering, and more to do with guessing, hoping, and praying. But it seems you can often get something to work if you do those three things hard enough.

This has been my experience as well :). Deep learning is a lot of random guesswork and trial and error. I am almost always in 'brute force' mode. However, in this course, you learn more about the fundamentals of convolutions and backprop. You have to implement your own backprop, though I'm not sure of what use that is, given that it's one line of code in TensorFlow.
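For what it's worth, implementing backprop by hand does demystify what that one TensorFlow line is doing. Here's a toy sketch of my own (not from the course): a single sigmoid layer with a squared-error loss, with the hand-derived gradient checked against finite differences.

```python
import numpy as np

# One sigmoid layer, y = sigmoid(W x), with loss L = 0.5 * ||y - t||^2.
# Names and sizes are illustrative only.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, x, t):
    y = sigmoid(W @ x)
    loss = 0.5 * np.sum((y - t) ** 2)
    return y, loss

def backward(W, x, t):
    y = sigmoid(W @ x)
    dL_dy = y - t               # derivative of the loss w.r.t. the output
    dy_dz = y * (1 - y)         # sigmoid derivative
    delta = dL_dy * dy_dz       # dL/dz by the chain rule
    return np.outer(delta, x)   # dL/dW

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
t = rng.normal(size=3)

# Gradient check: perturb each weight and compare to the analytic gradient.
grad = backward(W, x, t)
eps = 1e-6
num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        num[i, j] = (forward(Wp, x, t)[1] - forward(Wm, x, t)[1]) / (2 * eps)

print(np.max(np.abs(grad - num)))  # tiny: analytic and numerical gradients agree
```

The gradient check at the end is the real payoff of doing this once by hand: it's the standard way to convince yourself a backward pass is correct.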

I watched a great video on Tensorflow (link below). It mostly introduces very basic deep learning concepts, but there are a few key moments in the 2 hour+ video, where he explains what to do if something goes wrong. It's definitely not a "science", but with enough experience in deep learning, you can intuit what's going on inside the black box, and there are best practices on what to try next.

For example, he goes through a few examples where a neural net has too many weights, too little data, or improperly connected nodes. All three result in problems, but the problems exhibit themselves in slightly different ways, and with expertise you can start identifying them.
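The "too many weights, too little data" failure shows up even outside neural nets. A toy illustration of my own (not from the video): a degree-9 polynomial fit to 10 noisy samples nails every training point but generalizes badly between them.

```python
import numpy as np

# A degree-9 polynomial has 10 free parameters: exactly enough to memorize
# 10 noisy samples of a sine wave, noise and all.
rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.normal(size=10)

coeffs = np.polyfit(x_train, y_train, 9)  # enough capacity to memorize the noise

x_test = np.linspace(0.05, 0.95, 50)
y_true = np.sin(2 * np.pi * x_test)

train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(coeffs, x_test) - y_true) ** 2)

print(train_err)  # essentially zero: the model memorized the data
print(test_err)   # much larger: it learned the noise, not the signal
```

The same train/test gap is exactly the diagnostic you watch for in a deep net, just at a scale where you can see everything.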


Studying how to write your own back propagation algorithm could be useful if you're a deep learning researcher. But for most people it would be like studying semiconductor physics if you only want to write software.

This is pretty old school, but I recommend Multiple View Geometry by Hartley and Zisserman (http://www.robots.ox.ac.uk/~vgg/hzbook/) to get through the fundamentals...it's really good to understand the geometric foundations for the past 4 decades. Along the same lines, you have Introductory Techniques for 3-D Computer Vision by Trucco and Verri (https://www.amazon.com/Introductory-Techniques-3-D-Computer-...), which also goes over the geometry and the fundamental problems that computer vision algorithms try to solve. It often does come down to just applying simple geometry; getting good enough data to run that model is challenging.
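To make the "simple geometry" point concrete, here is a minimal sketch of linear (DLT) triangulation in the spirit of Hartley and Zisserman. The camera matrices and the point are made-up numbers for illustration, not from the book.

```python
import numpy as np

# Project a known 3D point into two pinhole cameras, then recover it by
# linear (DLT) triangulation.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])  # intrinsics

# Camera 1 at the origin; camera 2 translated 1 unit along x (a stereo baseline).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

X = np.array([0.5, -0.2, 4.0, 1.0])  # homogeneous 3D point

def project(P, X):
    x = P @ X
    return x[:2] / x[2]

x1, x2 = project(P1, X), project(P2, X)

# DLT: each image point contributes two linear constraints on X.
A = np.vstack([
    x1[0] * P1[2] - P1[0],
    x1[1] * P1[2] - P1[1],
    x2[0] * P2[2] - P2[0],
    x2[1] * P2[2] - P2[1],
])
_, _, Vt = np.linalg.svd(A)
X_hat = Vt[-1]
X_hat /= X_hat[3]

print(X_hat[:3])  # recovers the original point [0.5, -0.2, 4.0]
```

With noise-free correspondences the recovery is exact; with real detections, the same null-space solve becomes a least-squares estimate, which is why "getting good enough data" is the hard part.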

If you just throw everything into a neural network, then you won't really understand the breadth of the problems you're solving, and you'll be therefore ignorant of the limitations of your hammer. While NNs are incredibly useful, I think a deep understanding of the core problems is essential to know how to use NNs effectively in a particular domain.

After getting a grip on those concepts, Szeliski's Computer Vision: Algorithms and Applications (http://szeliski.org/Book/) had some really amazing coverage of CV in practice. Mastering OpenCV (https://www.amazon.com/Mastering-OpenCV-Daniel-Lelis-Baggio/...) was very useful when actually implementing some algorithms.

I think the question is a little too unspecific for there to be a good answer. The field is vast and depending on which thing in computer vision you want to tackle the best learning paths may vary greatly. Just to give a bit of an overview:

Before the Deep Learning craze started in 2011, more classical Machine Learning techniques were used in CV: Support Vector Machines, Boosting, Decision Trees, etc.

These were (and still are!) used as a high level component in areas like recognition, retrieval, segmentation, object tracking.

But there's also a whole field of CV that doesn't require Machine Learning at all (although it can benefit from it in some cases). This is typically the area of geometrical CV, like SLAM, 3D reconstruction, Structure from Motion and (Multi-View) Stereo: anything, generally, where you can write a (differentiable) model of reality yourself using hand-coded formulas and heuristics and then use standard solvers to obtain the model parameters given the data.

Whenever it's too hard to do that (for example trying to recognize many different things in images) you need a data-driven / machine learning approach where the computer comes up with the model itself after seeing lots of training examples.

As for resources the other answers are already giving a great overview. Use Karpathy's course for an intro to Deep Learning for CV but don't expect it to be comprehensive in terms of giving you an overview of CV.

Learn OpenCV for more low level, non-ML and generally more "old-school" Computer Vision.
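As a flavor of what's under OpenCV's hood, here's a Sobel x-gradient done by hand in NumPy (so it runs even without OpenCV installed); `cv2.Sobel` wraps essentially this convolution.

```python
import numpy as np

# A synthetic 8x8 image with a single vertical edge: dark left, bright right.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)  # Sobel kernel for d/dx

# Slide the 3x3 kernel over every valid position (no padding).
h, w = img.shape
grad = np.zeros((h - 2, w - 2))
for i in range(h - 2):
    for j in range(w - 2):
        grad[i, j] = np.sum(img[i:i + 3, j:j + 3] * kx)

print(np.abs(grad).max())  # 4.0, on the columns straddling the edge
```

The loop is deliberately naive; OpenCV does the same thing with separable filters and SIMD, which is why it's worth learning the library rather than reimplementing it, but seeing it once removes the magic.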

A personal recommendation of mine is http://www.computervisionblog.com/ by Tomasz Malisiewicz. It's an excellent resource if you want to get an overview of what's happening in the field.

Great points.

I would argue that kinetic or geometrical computer vision problems (tracking, mapping, reconstruction, depth estimation) are best suited to classical approaches like VO, SfM/MVS, SIFT/SURF, HOG, etc., and are a separate category of CV problems from object recognition/detection/segmentation, which is much more amenable to ML because its dimensionality is reduced.

But there's also a whole field of CV that doesn't require Machine Learning at all (although it can benefit from it in some cases).

In fact, Machine Learning has made almost no progress on most of what you mention, specifically SLAM and Multi-View Stereo. It takes completely rethinking how those are done when they are approached from the Deep Learning perspective.

I absolutely love Adrian's blog. http://www.pyimagesearch.com/

He has articles on solving actual problems with OpenCV, dlib and tensorflow. I subscribe to the blog and try to do some of the tutorials myself.

Udacity is another great resource. Their self driving and robotics nanodegrees are great.

I am on the same path as you, trying to pivot my career from full-stack engineering by adding CV + ML skills to it.

When we have decent robot hardware, I want to be the one programming them, not the one getting replaced by them.

A lot of real world computer vision is implemented on embedded devices with limited computational resources (ARMs, DSPs, etc.) so understanding how a lot of commonly used algorithms can be efficiently implemented in embedded systems is important. It is possibly a way for you to jump the gap from "embedded software developer" to "computer vision engineer". Also keep in mind that in many companies a "computer vision engineer" is fundamentally a different beast from a "software developer". A CV engineer creates software but the emphasis tends to be more on systems and is not 100% about software. This will vary a lot by company but if you're working with prototype hardware you will need to get at least a working knowledge of optics.

Fun and trendy though it may be, I would not focus on deep learning / convolutional neural networks to start off. Deep learning is a small subset of computer vision. I would focus more on understanding the basics of image processing, camera projection geometry, how to calibrate cameras, stereo vision, and machine learning in general (not just deep learning). Working with OpenCV is a good place to start for all of these topics. Set yourself a project with tangible goals and get to work.
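One of those stereo-vision basics worth internalizing early: in a rectified stereo pair, depth falls straight out of similar triangles, Z = f * B / disparity. The numbers below are illustrative.

```python
import numpy as np

f = 700.0   # focal length, in pixels
B = 0.12    # baseline between the two cameras, in meters

# Disparity: how far (in pixels) the same feature shifts between the two views.
disparity = np.array([70.0, 35.0, 14.0])
depth = f * B / disparity

print(depth)  # [1.2, 2.4, 6.0] meters: smaller disparity means farther away
```

This also explains a practical limit you hit when calibrating real rigs: depth precision degrades quadratically with distance, because a one-pixel disparity error matters far more at small disparities.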

I once created a list of 20 problems in CV. If you solve them in order, you will know basic CV. Here goes: https://gist.github.com/abhinai/b6eebecb4d19c57cfb1ee64c2b53...

A good online reference is: http://opencv-python-tutroals.readthedocs.io/en/latest/py_tu...

Surprised nobody has posted http://course.fast.ai/ yet. I've been following along with it so far, through the first 4 lessons, and it has been extremely helpful for understanding how deep learning works, from the perspective of someone who did not have much related baseline knowledge beyond how to program. Jeremy is an excellent practical teacher.

Glad it's interesting for you.

I too got this URL referred by somebody, and I got excited after their extended intro about why and how their course is different and better than any other.

Though after 5 videos I know nothing more than I did before from any other ML/AI guide on the internet. 99% of it is only about image classification, and I'm simply seeing too many guides for that.

If anybody has some good links/videos on ML/AI for structured data, please comment and I'll be thankful and happy to click them :)

The author himself said that DL isn't the best option for structured data.

"Certainly I'd pick DL over more linear models for most problems. But I'd pick random forests over DL for most structured data problems."

"Deep learning is best for unstructured data, like natural language, images, audio, etc. it sounds like you may be dealing more with structured data, in which case the Coursera ML course would be a better option for you"

More discussion here - https://www.reddit.com/r/MachineLearning/comments/5jg7b8/p_d...

Many thanks! Much appreciated, will read it over the weekend.

Seconded - can't recommend the course highly enough

A lot of 'traditional' computer vision methods e.g. Hough detector are simply inferior to deep learning approaches.
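Inferior on hard natural-image tasks, maybe, but the Hough transform is a nice example of how interpretable the traditional methods are. A stripped-down sketch (my own, not any library's implementation): points vote for every (rho, theta) line through them, and collinear points pile their votes into one accumulator cell.

```python
import numpy as np

# Points lying on the vertical line x = 5.
pts = [(5, y) for y in range(0, 51, 5)]

thetas = np.deg2rad(np.arange(180))
rho_max = 51  # largest |rho| we need to represent
acc = np.zeros((180, 2 * rho_max + 1), dtype=int)

for x, y in pts:
    # Line in normal form: rho = x*cos(theta) + y*sin(theta).
    rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
    for t in range(180):
        acc[t, rhos[t] + rho_max] += 1  # offset so negative rho fits

t_best, r_best = np.unravel_index(np.argmax(acc), acc.shape)
print(t_best, r_best - rho_max)  # theta = 0 degrees, rho = 5: the line x = 5
```

Every step here is inspectable; when it fails, you can look at the accumulator and see exactly why, which is precisely the debuggability argument made upthread.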

Plus, it's a lot easier than you'd think to get up and running, especially when you leverage pre-trained models...

OpenCV and http://www.pyimagesearch.com/

disclaimer: not related to any of these

Adrian here, author of the PyImageSearch blog. Thank you for mentioning it, I appreciate it. If anyone has any questions about computer vision, deep learning, or OpenCV, please let me know.

Regarding the OP's original question, I'm actually working on solving your very problem right now. About 1.5 years ago I created the PyImageSearch Gurus course (https://www.pyimagesearch.com/pyimagesearch-gurus/) with the aim of bridging academia with actual real-world computer vision problems. The course has helped readers in their academic careers, such as securing grants (http://www.pyimagesearch.com/2016/03/14/pyimagesearch-gurus-...), as well as helped students become practitioners and land jobs in the CV startup space (http://www.pyimagesearch.com/2017/06/12/pyimagesearch-gurus-...).

Within the next month I'll be launching PyImageJobs, which will connect PyImageSearch readers (especially the Gurus course graduates) with companies/startups that are looking to hire.

Finally, I'm also working on my upcoming "Deep Learning for Computer Vision with Python" book (https://www.pyimagesearch.com/deep-learning-computer-vision-...) which is now 100% outlined and I'm on to the writing phase.

Definitely take a look and if you have any questions, please let me know or use the contact form on my website if you want to talk in private.

PyImageSearch is absolutely fantastic. We couldn't have got a top spot in an AI hackathon without your blog: https://www.youtube.com/watch?v=OreCICEcQWY&t=2m45s

Looking forward to your book. Keep up the great work.

2nd for pyimagesearch. The author (not me) is a prolific and dedicated blogger who really wants to share his knowledge. Dude has a 'bootcamp' as well

Highly recommend PyImageSearch!

I found Computer Vision: Algorithms and Applications really good. You can download it for free (for personal use) at http://szeliski.org/Book/.


This is the most comprehensive book I know of on Computer Vision. The diagrams in the book (including captions) themselves do a great job of explaining things.

I started by getting a webcam or two and trying out various projects: marker tracking (made an optical IR-pass filter and tracked an IR LED with two cameras) and object segmentation (e.g., measuring the geometry of certain-colored objects).

Measure the speed or count the number of cars passing by your street. Try to implement OCR for a utility meter. There are lots of applications you can train yourself on, and I guarantee that you will learn a ton from each and every one of them.

Grad-level CV courses, all recently offered:

Princeton CS598F Deep Learning for Graphics and Vision

Stanford CS331B: Representation Learning in Computer Vision

UVa CS 6501: Deep Learning for Computer Graphics

GaTech CS 7476 Advanced Computer Vision

Berkeley CS294 Understanding Deep Neural Networks

Washington CSE 590V: Computer vision seminar

UT Austin CS 395T - Deep learning seminar

Berkeley CS294-43: Visual Object and Activity Recognition

UT Austin CS381V: Visual Recognition

And best of luck to you!

Surprised no one has mentioned it, but Udacity has a very good course on computer vision.


Does anyone know if tech like OpenCV is used at companies developing their own computer vision products, say at Tesla? Or do they build their own technology from scratch that isn't available in the public domain? Or do they, say, fork OpenCV and build upon it, heavily modifying it, since OpenCV could be seen as 'outdated' technology?

Disclaimer: never worked with any technology related to computer vision, just a bloody beginner Python programmer.

Used to work on Tesla Vision / Autopilot Vision. They used Caffe, were switching to Tensorflow when I left, but might be moving to Caffe2 now.

Usually no OpenCV on successful products. Facebook Ads has dedicated research engineers implementing their real time photo analysis algorithms.

cs231n by Andrej Karpathy : http://cs231n.github.io

Yes, this course is very good: https://youtu.be/2uiulzZxmGg

In addition, you can try to work through a serious project related to computer vision to help you solidify concepts. I used Style Transfer as my motivating example: https://harishnarayanan.org/writing/artistic-style-transfer/

This is a great resource. I give it to people who need to learn about convolutional neural networks.

However let's keep in mind that the field of computer vision is much vaster than that. Deep learning approaches have been very successful at solving problems in computer vision, but not all of them and not without drawbacks. I believe any course on classic computer vision will give him more insight as to what challenges computer vision aims to solve, how, and what approach might solve what problem.

You don't specialize in surgery before learning biology. Similarly, you don't specialize in CV before learning basic ML and DL. The fundamental concepts are the same no matter if the modality is text, image or video (for example: regularization, loss, cross validation, bias, variance, activation functions, KL divergence, embeddings, sparsity - all are non-trivial concepts that can't be grasped in a few minutes, and are not specific to CV alone).
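Regularization is a good example of such a modality-agnostic fundamental. A minimal sketch of my own, using ridge regression (the closed-form cousin of the weight decay used in deep nets): adding lambda*I to the normal equations tames the weights on nearly collinear features.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
# Two almost identical features: the unregularized problem is ill-conditioned.
X = np.column_stack([x, x + 1e-4 * rng.normal(size=50)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

def fit(X, y, lam):
    # Solve (X^T X + lam * I) w = X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = fit(X, y, 0.0)    # typically huge, unstable weights
w_ridge = fit(X, y, 1.0)  # shrunk toward the sensible answer, roughly [1, 1]

print(np.abs(w_ols).max(), np.abs(w_ridge).max())
```

The same intuition (penalize large weights to trade a little bias for a lot of variance) carries over directly whether the inputs are pixels, text embeddings, or audio features.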

PyImageSearch by Adrian Rosebrock. http://www.pyimagesearch.com/

Adrian here, author of PyImageSearch. Thanks for mentioning the blog. If anyone has any questions regarding learning computer vision, please see my reply to "sphix0r" below.

Hey, amazing blogs. Currently working on degraded scanned documents. Are there algorithms distinguishable for documents and natural images?

I am using OpenCV to process the documents, and I'm curious whether I am missing out on a chunk of CV algorithms specifically for scanned administrative documents (financial, personal documents).

I'm not sure what you mean by "algorithms distinguishable for documents and natural images" -- can you elaborate? OpenCV itself doesn't have built-in functionality to take documents and fit them to a pre-defined template; that tends to be a specific use case/niche of computer vision for document processing. The general idea is to take a document a user has filled out and "fit" it to a blank template, where you know exactly where each field is. That way you can extract the information from the document.

"The general idea is to take a document a user has filled out and "fit" it to a blank template" - I agree point for point. However, I am struggling with templatization due to the poor quality of the document images. To process those documents (denoise, super-resolution, HE, etc.), the OpenCV algorithms are not working well enough and require a lot of tuning that varies with each document.

So I was wondering if those algorithms work better for natural images (buildings, people, things, etc.) than for document images (text, graphics), and if so, whether there exist algorithms for processing such documents that I am unaware of.
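One document-specific trick that may help, if you haven't tried it: adaptive thresholding (OpenCV exposes it as cv2.adaptiveThreshold). Scanned pages often have uneven illumination, so no single global threshold separates ink from paper, but comparing each pixel to its local mean does. A pure NumPy sketch on a synthetic "page" with a brightness ramp, so the idea is visible without OpenCV:

```python
import numpy as np

h, w = 20, 20
page = np.tile(np.linspace(100, 200, w), (h, 1))  # uneven lighting, left to right
ink = [(5, 3), (5, 4), (12, 15), (12, 16)]        # a few "text" pixels
for r, c in ink:
    page[r, c] -= 50  # ink is darker than the local paper

# Global threshold: the whole dark half of the page gets misclassified as ink.
global_mask = page < page.mean() - 10

# Adaptive threshold: compare each pixel to the mean of its local window.
adaptive_mask = np.zeros_like(page, dtype=bool)
k = 2  # half-size of a 5x5 window
for r in range(h):
    for c in range(w):
        win = page[max(0, r - k):r + k + 1, max(0, c - k):c + k + 1]
        adaptive_mask[r, c] = page[r, c] < win.mean() - 10

# Adaptive flags exactly the ink pixels; global flags a large chunk of paper.
print(global_mask.sum(), adaptive_mask.sum())
```

For badly degraded scans you'd still tune the window size and offset, but per-document tuning is usually much less sensitive than with a global threshold.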

It's a really broad field, so don't expect to get up to speed very quickly. A lot of people have recommended a lot of books already, and I could add to that list. One thing you might think about is Safari Books Online. You'll notice a lot of the recommended books are there, and even though it's a bit pricey, I think you'll find you'd save money by the time you get enough of the books that seem useful to you. You'll also lose nothing by jumping from book to book because they're too advanced/not advanced enough until you find one that's at your level.

I would recommend starting with one of the many OpenCV tutorial books, and maybe work your way through a few of those. Then move into books that cover more of the algorithms behind the library like "Multiple View Geometry" by Hartley and "Machine Vision" by Davies, among many others.

I learned OpenCV using the O'Reilly book by Bradski and Kaehler (back when it was OpenCV 2). I found it well-structured and it worked for me. They have an updated version for OpenCV 3.

However, I can't tell you if OpenCV is still the framework of choice and/or widely used in the field you want to go into.

We made a mind map for learning computer vision:


Nodes with a map will go to other mind maps with resources. :)

There is a processing library with opencv bindings here: https://github.com/atduskgreg/opencv-processing

https://github.com/biometrics/openbr - these guys are amazing; see what they are doing and you will have a good start.

OpenCV is a great start, and is where I began.

This book: http://shop.oreilly.com/product/9780596516130.do has a number of worked examples that explain things well.

It does touch on machine learning, but it focuses much more on the fundamentals of computer vision, like the feature detection that allows things like SLAM to exist.

Is OpenCV (traditional CV techniques) better to use, or a deep learning based approach? Has anyone done a comparison of the two approaches? The obvious flaw with deep learning is that it requires large labeled data sets, but assuming that is available, which one is more accurate at object detection (hotdog or not) and at detecting features in an image (faces, manufacturing defects)?

For the widely used handwriting dataset, MNIST, there is a site that actively tracks the best algorithms (although I'm not sure when it was last updated): http://rodrigob.github.io/are_we_there_yet/build/classificat...

You'll notice that all of the top contenders use Neural Networks, but I would bet that many of them use at least some traditional CV techniques to transform the images at various steps. That said, many of the more modern deep learning approaches are ditching CV altogether, just feeding in raw pixels without any normalization or transformation, leaving fewer parameters to tweak.

I use OpenCV for reading and writing real-time video streams (like webcams or video frames) for my hobby computer vision projects, but tend to use other ML or image-specific libraries for actual processing. The cascade classifiers in OpenCV are okay, but it isn't too difficult to set up something comparable in scikit-learn that is more modern and robust, though a bit less performant if you do need real-time response.

Best example that I have is a pulse rate detector that I put together, that uses OpenCV for video frame extraction & display but bare numpy/scipy for the rest.
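The signal-processing half of such a pulse detector is surprisingly small. A sketch of the idea (synthetic data standing in for the camera): average the skin-pixel brightness per frame, then find the dominant frequency with an FFT.

```python
import numpy as np

# Pretend "video": 20 seconds at 30 fps of mean skin brightness, with a faint
# 1.2 Hz flicker standing in for the heartbeat signal.
fps = 30.0
t = np.arange(0, 20, 1 / fps)
brightness = 100 + 0.5 * np.sin(2 * np.pi * 1.2 * t)

sig = brightness - brightness.mean()   # remove the DC component
spectrum = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(len(sig), d=1 / fps)

bpm = 60 * freqs[np.argmax(spectrum)]
print(bpm)  # ~72 beats per minute
```

On real frames you would band-pass to the plausible heart-rate range (roughly 0.7 to 4 Hz) before taking the peak, since lighting drift and motion dominate the low end of the spectrum.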


For classical (non-machine learning) material, there's the nice text by Ballard and Brown: http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/bandb.htm

Even if you end up using neural nets, understanding how to think about the problems is useful.

Try reading the samples that come with the predominantly used programming library/framework. This works with pretty much everything. I know a thing or two about statistics because I study R examples at night, I know a decent chunk of WinAPI and Windows IPC because of Delphi, I know some CV because I studied the samples from OpenCV and tried to solve problems with it, etc.

I learned mostly from MATLAB documentation. Good if you want theory and implementation, and most capabilities can be done with open source equivalents if you don't have MATLAB.


Coursera has some; I typically keep the images-2012 one locally, but you also have things like dsp-001, which is a bit more advanced. Generally speaking, Coursera has good material in many related domains.

Here is a guide I have developed over the 6 years since I dove into computer vision around 2011. My path was self-taught until recently, when I took a graduate course.

I started from wanting to develop AR apps during my undergrad. Here are the best resources I have found to date:

Computer Vision is very theoretical and experimental, so the more hands on, the better! My approach has been to go top-down, overview the landscape and slowly progress deeper.

Begin with the best library for CV, in my opinion: OpenCV. The tutorials are amazing!

Python tutorials: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_tutorial...

C++ tutorials: http://docs.opencv.org/3.0-beta/doc/tutorials/tutorials.html

Immerse yourself in these and build any apps you think of!

Then go into the pyimagesearch tutorials (http://www.pyimagesearch.com/) and aishack.in (http://aishack.in/): tons of great tutorials to learn different topics of vision, with coding walkthroughs. Understand the examples and rewrite the applications.

Then Dive Deep:

Get the new OpenCV 3 book, a nice deep overview of many topics in computer vision. https://www.amazon.com/Learning-OpenCV-Computer-Vision-Libra...

And watch this course on youtube:


I feel that by then you will have so much exposure that when you dive into formal classes and textbooks, you will really understand and be enlightened.

This was the general way I learned computer vision, and recently I completed a CV internship at nanit.com. I was not hired for my formal knowledge; they were impressed by all the various projects I've done and the knowledge I had of many vision topics.

I also recently took a formal course of vision at Cornell: http://www.cs.cornell.edu/courses/cs5670/2017sp/

All the assignments have starter code in python and opencv. This was an amazing class as it dove deep into 3D computer vision, which is so relevant to augmented reality!

Also, here is a link to OpenCV examples for iOS: https://github.com/Itseez/opencv_for_ios_book_samples

And here are links to OpenCV examples for Android: https://web.stanford.edu/class/ee368/Android/

Hope this helps! Shoot me a dm if you or anyone has more questions!

OpenCV documentation

TensorFlow tutorial
