
Ask HN: What are the best resources to learn computer vision? - ameyades
Ideally, I would like to be good enough to get a job at an AI/robotics startup. I already have a CS degree, a decent math background, and am working as an embedded software developer for a large company.
======
stared
For a full course, nothing beats CS231n: Convolutional Neural Networks for
Visual Recognition [http://cs231n.stanford.edu/](http://cs231n.stanford.edu/)
by Andrej Karpathy et al.

Also, for a general, high-level introduction to neural networks, I wrote
Learning Deep Learning in Keras
[http://p.migdal.pl/2017/04/30/teaching-deep-learning.html](http://p.migdal.pl/2017/04/30/teaching-deep-learning.html),
focusing on visual tasks.

~~~
deepGem
I did this course, but couldn't finish all the assignments. Loved it. Please
note that this is a convolutional neural networks course, not a computer vision
course as such. From what I know, computer vision encompasses a variety of
algorithms that are not based on machine learning, and this course does not
cover them.

~~~
speedplane
As an engineer, it's difficult to know when to use deep learning and when to
use more classical algorithms. Often, you have to try both and see which is
better (twice the work, hooray!). The classical algorithms are often very
understandable, and you can reason about what's going on and figure out what is
breaking. Deep learning is so much harder, e.g., are my hyperparameters bad,
or do I need another 30 GPUs running for a week?

~~~
amelius
Imho, deep learning has little to do with engineering, and more with guessing,
hoping and praying. But it seems you can often get something to work if you do
those three things hard enough.

~~~
deepGem
This has been my experience as well :). Deep learning is a lot of random
guesswork, trial and error. I am almost always in 'brute force' mode.
However, in this course, you learn more about the fundamentals of convolutions
and backprop. You have to implement your own backprop - not sure of what use
that is, given that it's one line of code in TensorFlow.
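
For anyone wondering what implementing backprop by hand looks like at the smallest scale, here is a minimal sketch in plain Python. It is not taken from the course assignments: the tiny one-hidden-unit network, the parameter values, and the function names are all made up for illustration. It applies the chain rule manually and then checks the result against finite differences.

```python
import math


def forward(params, x, t):
    """Tiny network: y = w2 * tanh(w1 * x + b1) + b2, loss = 0.5 * (y - t)^2."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    y = w2 * h + b2
    loss = 0.5 * (y - t) ** 2
    return loss, (h, y)


def backward(params, x, t):
    """Hand-written backprop: chain rule from the loss back to each parameter."""
    w1, b1, w2, b2 = params
    _, (h, y) = forward(params, x, t)
    dy = y - t                 # dL/dy
    dw2 = dy * h               # dL/dw2
    db2 = dy                   # dL/db2
    dh = dy * w2               # dL/dh
    dz = dh * (1.0 - h * h)    # tanh'(z) = 1 - tanh(z)^2
    dw1 = dz * x               # dL/dw1
    db1 = dz                   # dL/db1
    return [dw1, db1, dw2, db2]


def numerical_grad(params, x, t, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((forward(plus, x, t)[0] - forward(minus, x, t)[0]) / (2 * eps))
    return grads


# Hypothetical parameter values and a single made-up training example.
params = [0.5, -0.3, 0.8, 0.1]
analytic = backward(params, x=1.5, t=0.7)
numeric = numerical_grad(params, x=1.5, t=0.7)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print("largest gap between analytic and numeric gradient:", max_err)
```

The gradient check is arguably the useful part of the exercise: it is the kind of sanity test you can still run when a framework computes the gradients for you.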

~~~
speedplane
I watched a great video on TensorFlow (link below). It mostly introduces very
basic deep learning concepts, but there are a few key moments in the 2 hour+
video where he explains what to do if something goes wrong. It's definitely
not a "science", but with enough experience in deep learning, you can intuit
what's going on inside the black box, and there are best practices on what to
try next.

For example, he goes through a few cases where a neural net has too many
weights, too little data, or improperly connected nodes. All three result in
problems, but the problems show up in slightly different ways, and
with expertise you can start identifying them.

[https://www.youtube.com/watch?v=vq2nnJ4g6N0](https://www.youtube.com/watch?v=vq2nnJ4g6N0)

------
wyc
This is pretty old school, but I recommend Multiple View Geometry by Hartley
and Zisserman
([http://www.robots.ox.ac.uk/~vgg/hzbook/](http://www.robots.ox.ac.uk/~vgg/hzbook/))
to get through the fundamentals. It's really good to understand the geometric
foundations the field has built up over the past four decades. Along the same
lines, there is Introductory Techniques for 3-D Computer Vision by Trucco and
Verri
([https://www.amazon.com/Introductory-Techniques-3-D-Computer-...](https://www.amazon.com/Introductory-Techniques-3-D-Computer-Vision/dp/0132611082)),
which also goes over the geometry and the fundamental problems that computer
vision algorithms try to solve. It often does come down to just applying simple
geometry; the challenging part is getting good enough data to run that model.

If you just throw everything into a neural network, then you won't really
understand the breadth of the problems you're solving, and you'll therefore be
ignorant of the limitations of your hammer. While NNs are incredibly useful, I
think a deep understanding of the core problems is essential to know how to
use NNs effectively in a particular domain.

After getting a grip on those concepts, Szeliski's Computer Vision: Algorithms
and Applications ([http://szeliski.org/Book/](http://szeliski.org/Book/)) had
some really amazing coverage of CV in practice. Mastering OpenCV
([https://www.amazon.com/Mastering-OpenCV-Daniel-Lelis-
Baggio/...](https://www.amazon.com/Mastering-OpenCV-Daniel-Lelis-
Baggio/dp/1786467178)) was very useful when actually implementing some
algorithms.

------
rsp1984
I think the question is a little too unspecific for there to be a good answer.
The field is vast, and depending on which area of computer vision you want to
tackle, the best learning paths may vary greatly. Just to give a bit of an
overview:

Before the Deep Learning craze started in 2011, more classical Machine Learning
techniques were used in CV: Support Vector Machines, Boosting, Decision Trees,
etc.

These were (and still are!) used as a high level component in areas like
recognition, retrieval, segmentation, object tracking.

But there's also a whole field of CV that doesn't require Machine Learning
at all (although it can benefit from it in some cases). This is
typically the area of geometrical CV, like SLAM, 3D reconstruction, Structure
from Motion and (Multi-View) Stereo: generally anything where you can write a
(differentiable) model of reality yourself using hand-coded formulas and
heuristics and then use standard solvers to obtain the model parameters given
the data.
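
To make the "hand-written model plus standard solver" idea concrete, here is about the simplest possible sketch: a pinhole projection model with a single unknown (the focal length), fitted to 3D-to-pixel correspondences by closed-form linear least squares. Everything in it is an assumption chosen for illustration: the principal point is taken to be at the origin, the 3D points are assumed known, and the points, the pixel offsets standing in for measurement noise, and the function names are made up. Real SLAM or Structure from Motion pipelines solve for many more parameters with iterative nonlinear solvers.

```python
def project(f, point):
    """Pinhole projection of a 3D point (X, Y, Z); principal point assumed at (0, 0)."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


def fit_focal_length(points_3d, pixels):
    """Least-squares f for the model u = f * X/Z, v = f * Y/Z.

    The model is linear in f, so minimising the squared reprojection error
    has the closed form f = sum(a_i * b_i) / sum(a_i^2), where a_i are the
    normalised coordinates and b_i the observed pixel coordinates.
    """
    num = 0.0
    den = 0.0
    for (X, Y, Z), (u, v) in zip(points_3d, pixels):
        xn, yn = X / Z, Y / Z
        num += xn * u + yn * v
        den += xn * xn + yn * yn
    return num / den


# Made-up 3D points (metres, camera frame) and a made-up "true" focal length.
points = [(0.5, 0.2, 2.0), (-0.3, 0.4, 3.0), (0.1, -0.6, 1.5), (0.8, 0.1, 4.0)]
true_f = 800.0

# Fixed sub-pixel offsets stand in for measurement noise.
offsets = [(0.4, -0.2), (-0.3, 0.1), (0.2, 0.3), (-0.1, -0.4)]
observed = []
for p, (du, dv) in zip(points, offsets):
    u, v = project(true_f, p)
    observed.append((u + du, v + dv))

estimated_f = fit_focal_length(points, observed)
print("estimated focal length (pixels):", estimated_f)
```

The same pattern (write the forward model, define a residual, hand it to a solver) carries over to the harder problems, only with more unknowns and a nonlinear optimiser in place of the closed form.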

Whenever it's too hard to do that (for example trying to recognize many
different things in images) you need a data-driven / machine learning approach
where the computer comes up with the model itself after seeing lots of
training examples.

As for resources the other answers are already giving a great overview. Use
Karpathy's course for an intro to Deep Learning for CV but don't expect it to
be comprehensive in terms of giving you an overview of CV.

Learn OpenCV for more low level, non-ML and generally more "old-school"
Computer Vision.

A personal recommendation of mine is
[http://www.computervisionblog.com/](http://www.computervisionblog.com/) by
Tomasz Malisiewicz. It's an excellent resource if you want to get an overview
of what's happening in the field.

~~~
AndrewKemendo
Great points.

I would argue Kinetic or Geometrical Computer Vision problems, things like
Tracking, Mapping, Reconstruction, Depth Estimation, are best suited for the
classical approaches like VO, SFM/MVS, SIFT/SURF, HOG etc., and are a
separate category of CV problems from object
recognition/detection/segmentation, which are much more amenable to ML
because dimensionality is reduced.

 _But there's also a whole field of CV that doesn't require Machine Learning
at all (although it can benefit from it in some cases)._

In fact, Machine Learning has made almost no progress on most of what you
mention, specifically SLAM and Multi-View Stereo. It takes completely
rethinking how those are done when they are approached from the Deep Learning
perspective.

------
rjdagost
A lot of real world computer vision is implemented on embedded devices with
limited computational resources (ARMs, DSPs, etc.) so understanding how a lot
of commonly used algorithms can be efficiently implemented in embedded systems
is important. It is possibly a way for you to jump the gap from "embedded
software developer" to "computer vision engineer". Also keep in mind that in
many companies a "computer vision engineer" is fundamentally a different beast
from a "software developer". A CV engineer creates software but the emphasis
tends to be more on systems and is not 100% about software. This will vary a
lot by company but if you're working with prototype hardware you will need to
get at least a working knowledge of optics.

Fun and trendy though it may be, I would not focus on deep learning /
convolutional neural networks to start off. Deep learning is a small subset of
computer vision. I would focus more on understanding the basics of image
processing, camera projection geometry, how to calibrate cameras, stereo
vision, and machine learning in general (not just deep learning). Working with
OpenCV is a good place to start for all of these topics. Set yourself a
project with tangible goals and get to work.
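
As a taste of how simple the core geometry can be, here is a small sketch of the standard depth-from-disparity relation for a rectified stereo pair, Z = f * B / d. The focal length, baseline, and disparity values are made up for illustration, and the function name is hypothetical; on a real rig those numbers would come from calibration and a stereo matcher.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a rectified stereo pair: Z = f * B / d.

    focal_px     -- focal length in pixels
    baseline_m   -- distance between the two camera centres in metres
    disparity_px -- horizontal pixel shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px


# Made-up rig: 700 px focal length, 12 cm baseline.
focal_px = 700.0
baseline_m = 0.12
disparities = [84.0, 42.0, 8.4]
depths = [depth_from_disparity(focal_px, baseline_m, d) for d in disparities]
for d, z in zip(disparities, depths):
    print(f"disparity {d:5.1f} px -> depth {z:.2f} m")
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points than for near ones, which is one reason calibration quality and baseline choice get so much attention in practice.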

------
abhinai
I once created a list of 20 problems in CV. If you solve them in order, you
will know basic CV. Here goes:
[https://gist.github.com/abhinai/b6eebecb4d19c57cfb1ee64c2b53...](https://gist.github.com/abhinai/b6eebecb4d19c57cfb1ee64c2b538643)

A good online reference is: [http://opencv-python-
tutroals.readthedocs.io/en/latest/py_tu...](http://opencv-python-
tutroals.readthedocs.io/en/latest/py_tutorials/py_tutorials.html)

------
lightbyte
Surprised nobody has posted [http://course.fast.ai/](http://course.fast.ai/)
yet. I've been following along with it so far for the first 4 lessons and it
has been extremely helpful in understanding how deep learning works from the
perspective of someone who did not have much of any related baseline knowledge
except how to program. Jeremy is an excellent practical teacher.

~~~
thinkMOAR
Happy it is interesting for you.

I was also referred to this URL by somebody, and I got excited after their
extended intro on why and how their course is different from and better than
any other.

But after 5 videos I know nothing more than I did before from any other ML/AI
guide on the internet. 99% of it relates only to image classification, and
I'm simply seeing too many guides for that.

If anybody has some good links/videos on ML/AI for structured data, please
comment and I'll be thankful and happy to click them :)

~~~
goxul
The author himself said that DL isn't the best option for structured data.

"Certainly I'd pick DL over more linear models for most problems. But I'd pick
random forests over DL for most structured data problems."

"Deep learning is best for unstructured data, like natural language, images,
audio, etc. it sounds like you may be dealing more with structured data, in
which case the Coursera ML course would be a better option for you"

More discussion here -
[https://www.reddit.com/r/MachineLearning/comments/5jg7b8/p_d...](https://www.reddit.com/r/MachineLearning/comments/5jg7b8/p_deep_learning_for_coders18_hours_of_lessons_for/)

~~~
thinkMOAR
many thanks! much appreciated, will read it over the weekend.

------
sphix0r
OpenCV and [http://www.pyimagesearch.com/](http://www.pyimagesearch.com/)

disclaimer: not related to any of these

~~~
zionsrogue
Adrian here, author of the PyImageSearch blog. Thank you for mentioning it, I
appreciate it. If anyone has any questions about computer vision, deep
learning, or OpenCV, please let me know.

Regarding OP's original question, I'm actually working on solving your very
problem right now. About 1.5 years ago I created the PyImageSearch Gurus
course ([https://www.pyimagesearch.com/pyimagesearch-
gurus/](https://www.pyimagesearch.com/pyimagesearch-gurus/)) with the aim of
bridging academia with actual real-world computer vision problems. The course
has helped readers in their academic careers, such as securing grants
([http://www.pyimagesearch.com/2016/03/14/pyimagesearch-gurus-...](http://www.pyimagesearch.com/2016/03/14/pyimagesearch-gurus-member-spotlight-tuomo-hiippala/)),
and has helped students become practitioners and land jobs in the CV startup
space
([http://www.pyimagesearch.com/2017/06/12/pyimagesearch-gurus-...](http://www.pyimagesearch.com/2017/06/12/pyimagesearch-gurus-member-spotlight-saideep-talari/)).

Within the next month I'll be launching PyImageJobs which will connect
PyImageSearch readers (especially the Gurus course graduates) with
companies/startups that are looking to hire.

Finally, I'm also working on my upcoming "Deep Learning for Computer Vision
with Python" book ([https://www.pyimagesearch.com/deep-learning-computer-
vision-...](https://www.pyimagesearch.com/deep-learning-computer-vision-
python-book/)) which is now 100% outlined and I'm on to the writing phase.

Definitely take a look and if you have any questions, please let me know or
use the contact form on my website if you want to talk in private.

~~~
nojvek
PyImageSearch is absolutely fantastic. We couldn't have got a top spot in an
AI hackathon without your blog:
[https://www.youtube.com/watch?v=OreCICEcQWY&t=2m45s](https://www.youtube.com/watch?v=OreCICEcQWY&t=2m45s)

Looking forward to your book. Keep up the great work.

------
maffydub
I found Computer Vision: Algorithms and Applications really good. You can
download it for free (for personal use) at
[http://szeliski.org/Book/](http://szeliski.org/Book/).

~~~
alok-g
+1

This is the most comprehensive book I know of on Computer Vision. The diagrams
in the book (including captions) themselves do a great job of explaining
things.

------
fest
I started by getting a webcam or two and trying out various projects: marker
tracking (made an optical IR pass filter and tracked an IR LED with two
cameras), object segmentation (e.g. measuring the geometry of certain-colored
objects).
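
A colour-segmentation project like that can start very small. Below is a rough sketch in plain Python, with no camera or OpenCV involved: threshold pixels by colour, group them into 4-connected blobs with a flood fill, and measure each blob's area and bounding box. The toy image, the "red" threshold values, and the function names are all made up for illustration; on real frames you would more likely threshold in a colour space such as HSV and use a library's connected-components routine.

```python
from collections import deque


def segment(image, is_target):
    """Binary mask: True where the pixel colour satisfies is_target."""
    return [[is_target(px) for px in row] for row in image]


def connected_components(mask):
    """Group True pixels into 4-connected blobs using BFS flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def describe(blob):
    """Area in pixels and bounding box (top, left, bottom, right) of one blob."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    return {"area": len(blob), "bbox": (min(ys), min(xs), max(ys), max(xs))}


# Toy 4x5 RGB image: R is a reddish pixel, K is near-black background.
R = (200, 30, 30)
K = (10, 10, 10)
image = [
    [R, R, K, K, K],
    [R, R, K, K, R],
    [K, K, K, K, R],
    [K, K, K, K, R],
]

# Hypothetical "red" test; the thresholds are arbitrary.
mask = segment(image, lambda px: px[0] > 150 and px[1] < 80 and px[2] < 80)
stats = [describe(b) for b in connected_components(mask)]
print(stats)
```

With a known pixel-to-millimetre scale, the area and bounding box give you physical measurements of each object.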

Measure the speed or count the number of cars passing by on your street. Try to
implement OCR for a utility meter. There are lots of applications you can
train yourself on, and I guarantee that you will learn a ton from each and
every one of them.

------
indescions_2017
Grad-level CV courses, all recently offered:

Princeton COS 598F: Deep Learning for Graphics and Vision

[https://www.cs.princeton.edu/courses/archive/spring17/cos598...](https://www.cs.princeton.edu/courses/archive/spring17/cos598F/)

Stanford CS331B: Representation Learning in Computer Vision

[http://web.stanford.edu/class/cs331b/](http://web.stanford.edu/class/cs331b/)

UVa CS 6501: Deep Learning for Computer Graphics

[http://www.connellybarnes.com/work/class/2016/deep_learning_...](http://www.connellybarnes.com/work/class/2016/deep_learning_graphics/)

GaTech CS 7476 Advanced Computer Vision

[http://www.cc.gatech.edu/~hays/7476/](http://www.cc.gatech.edu/~hays/7476/)

Berkeley CS294 Understanding Deep Neural Networks

[https://bcourses.berkeley.edu/courses/1453965](https://bcourses.berkeley.edu/courses/1453965)

Washington CSE 590V: Computer vision seminar

[https://courses.cs.washington.edu/courses/cse590v/16au/](https://courses.cs.washington.edu/courses/cse590v/16au/)

UT Austin CS 395T - Deep learning seminar

[http://www.philkr.net/CS395T/](http://www.philkr.net/CS395T/)

Berkeley CS294-43: Visual Object and Activity Recognition

[https://sites.google.com/site/ucbcs29443/](https://sites.google.com/site/ucbcs29443/)

UT Austin CS381V: Visual Recognition

[http://vision.cs.utexas.edu/381V-fall2016/](http://vision.cs.utexas.edu/381V-fall2016/)

And best of luck to you!

------
RobertDeNiro
Surprised no one has mentioned it, but Udacity has a very good course on
computer vision.

[https://www.udacity.com/course/introduction-to-computer-
visi...](https://www.udacity.com/course/introduction-to-computer-vision--
ud810)

------
mattfrommars
Does anyone know if tech like OpenCV is used at companies developing their own
"computer vision" products, maybe at Tesla? Or do they build their own
technology from scratch, which isn't publicly available? Or do they, say,
fork OpenCV and heavily modify it, since OpenCV could be seen as
'outdated' technology?

Disclaimer: Never worked with any technology related to Computer Vision, just
a bloodboy beginner Python programmer.

~~~
chrinic726
Used to work on Tesla Vision / Autopilot Vision. They used Caffe, were
switching to TensorFlow when I left, but might be moving to Caffe2 now.

Usually no OpenCV on successful products. Facebook Ads has dedicated research
engineers implementing their real time photo analysis algorithms.

------
trwoway
cs231n by Andrej Karpathy : [http://cs231n.github.io](http://cs231n.github.io)

~~~
hnarayanan
Yes, this course is very good:
[https://youtu.be/2uiulzZxmGg](https://youtu.be/2uiulzZxmGg)

In addition, you can try to work through a serious project related to computer
vision to help you solidify concepts. I used Style Transfer as my
motivating example:
[https://harishnarayanan.org/writing/artistic-style-transfer/](https://harishnarayanan.org/writing/artistic-style-transfer/)

------
visarga
You don't specialize in surgery before learning biology. Similarly, you don't
specialize in CV before learning basic ML and DL. The fundamental concepts are
the same no matter if the modality is text, image or video (for example:
regularization, loss, cross validation, bias, variance, activation functions,
KL divergence, embeddings, sparsity - all are non-trivial concepts that can't
be grasped in a few minutes, and are not specific to CV alone).

------
zelon88
PyImageSearch by Adrian Rosebrock.
[http://www.pyimagesearch.com/](http://www.pyimagesearch.com/)

~~~
zionsrogue
Adrian here, author of PyImageSearch. Thanks for mentioning the blog. If
anyone has any questions regarding learning computer vision, please see my
reply to "sphix0r" elsewhere in this thread.

~~~
Trun_wal
Hey, amazing blogs. Currently working on degraded scanned documents. Are there
algorithms distinguishable for documents and natural images?

I am using OpenCV to process the documents, and I'm curious whether I am
missing out on a chunk of CV algorithms designed specifically for scanned
administrative documents (financial, personal documents)?

~~~
zionsrogue
I'm not sure what you mean by "algorithms distinguishable for documents and
natural images" -- can you elaborate? OpenCV itself doesn't have built-in
functionality to take documents and fit them to a pre-defined template; that
tends to be part of a specific use case/niche of computer vision for document
processing. The general idea is to take a document a user has filled out and
"fit" it to a blank template, where you know exactly where each field is. That
way you can extract the information from the document.

~~~
Trun_wal
"The general idea is to take a document a user has filled out and "fit" it to
a blank template" - I agree on every point. However, I am struggling with
templatization due to the poor quality of the document images. To process those
documents (denoising, super resolution, HE, etc.), the OpenCV algorithms
are not working well enough and require a lot of tuning that varies with each
document.

So, I was wondering if those algorithms work better for natural images
(buildings, people, things etc) than document images (text, graphics) and if
so, there must exist algorithms to process such documents I am unaware of.
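
For readers unfamiliar with the abbreviation, "HE" here presumably means histogram equalization. As a reference point, this is a minimal plain-Python sketch of the global version on a tiny made-up greyscale patch; the patch values and the function name are assumptions for illustration only, and it is not a fix for the document-quality problem described above.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of a greyscale image (list of rows of ints)."""
    # Count how often each grey level occurs.
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1

    # Cumulative distribution of grey levels.
    cdf = []
    total = 0
    for count in hist:
        total += count
        cdf.append(total)

    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]

    # Map each level so the occupied range spreads over 0..levels-1.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[v] for v in row] for row in image]


# Made-up low-contrast patch: all values squeezed between 100 and 130.
faded = [
    [100, 100, 110, 110],
    [100, 110, 120, 120],
    [110, 120, 130, 130],
]
result = equalize_histogram(faded)
for row in result:
    print(row)
```

Because one mapping is applied to the whole image, global equalization can amplify background noise and uneven lighting on scanned pages. OpenCV also ships locally adaptive tools such as `cv2.createCLAHE` and `cv2.adaptiveThreshold`, which may be worth trying on documents.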

------
gmiller123456
It's a really broad field, so don't expect to get up to speed very quickly. A
lot of people have recommended a lot of books already, and I could add to that
list. One thing you might think about is Safari Books Online. You'll notice a
lot of the recommended books are there, and even though it's a bit pricey, I
think you'll find you'd save money by the time you get enough of the books
that seem useful to you. You'll also lose nothing by jumping from book to
book because they're too advanced/not advanced enough until you find one
that's at your level.

I would recommend starting with one of the many OpenCV tutorial books, and
maybe work your way through a few of those. Then move into books that cover
more of the algorithms behind the library, like "Multiple View Geometry" by
Hartley and Zisserman and "Machine Vision" by Davies, among many others.

------
lauritz
I learned OpenCV using the O'Reilly book by Bradski and Kaehler (back when it
was OpenCV 2). I found it well-structured and it worked for me. They have an
updated version for OpenCV 3.

However, I can't tell you if OpenCV is still the framework of choice and/or
widely used in the field you want to go into.

------
nikivi
We made a mind map for learning computer vision:

[https://learn-anything.xyz/computer-vision](https://learn-
anything.xyz/computer-vision)

Nodes with a map will go to other mind maps with resources. :)

------
supernumerary
There is a Processing library with OpenCV bindings here:
[https://github.com/atduskgreg/opencv-
processing](https://github.com/atduskgreg/opencv-processing)

------
peter_retief
[https://github.com/biometrics/openbr](https://github.com/biometrics/openbr)
these guys are amazing, see what they are doing and you will have a good start

------
KaiserPro
OpenCV is a great start, and is where I began.

This book:
[http://shop.oreilly.com/product/9780596516130.do](http://shop.oreilly.com/product/9780596516130.do)
has a number of worked examples that explain things well.

It does touch on Machine Learning, but it focuses much more on the
fundamentals of computer vision, like feature detection, that allow things
like SLAM to exist.

------
a_d
Is it better to use OpenCV (traditional CV techniques) or a deep-learning-based
approach? Has anyone done a comparison of the two approaches? The obvious flaw
with deep learning is that it requires large labeled data sets, but assuming
those are available, which one is more accurate at object detection (hotdog or
not) and at detecting features in an image (faces, manufacturing defects)?

~~~
speedplane
For the widely used handwriting dataset, MNIST, there is a site that actively
tracks the best algorithms (although I'm not sure when it was last updated):
[http://rodrigob.github.io/are_we_there_yet/build/classificat...](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html)

You'll notice that all of the top contenders use Neural Networks, but I would
bet that many of them use at least some traditional CV techniques to transform
the images at various steps. That said, many of the more modern deep learning
approaches are ditching CV altogether, just feeding in raw pixels without any
normalization or transformation, leaving fewer parameters to tweak.

------
madhadron
For classical (non-machine learning) material, there's the nice text by
Ballard and Brown:
[http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/bandb.htm](http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/bandb.htm)

Even if you end up using neural nets, understanding how to think about the
problems is useful.

------
nurettin
Try reading the samples that come with the predominantly used programming
library/framework. Works with pretty much everything. I know a thing or two
about statistics because I study R examples at night. I know a decent chunk of
WinAPI and Windows IPC because of Delphi, I know some CV because I studied
samples from OpenCV and tried to solve problems with it, etc.

------
graeham
I learned mostly from the MATLAB documentation. It's good if you want theory
and implementation, and most capabilities can be reproduced with open-source
equivalents if you don't have MATLAB.

[https://uk.mathworks.com/products/computer-
vision.html](https://uk.mathworks.com/products/computer-vision.html)

------
nycode
This one is pretty good: [https://www.udemy.com/computer-vision-with-
python/](https://www.udemy.com/computer-vision-with-python/)

------
ux
Coursera has some; I usually keep a local copy of the images-2012 one, but you
also have things like dsp-001, which is a bit more advanced. Generally speaking,
Coursera has some good material in many related domains.

------
mendeza
Here is a guide I have developed over the 6 years since I dove into computer
vision around 2011. I was self-taught until I recently took a graduate
course.

I started out wanting to develop AR apps during my undergrad. Here are the
best resources I have found to date:

Computer Vision is very theoretical and experimental, so the more hands on,
the better! My approach has been to go top-down, overview the landscape and
slowly progress deeper.

Begin with the best library for CV in my opinion: OpenCV. The tutorials are
amazing!

Python tutorials:
[http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_tutorial...](http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_tutorials.html)

C++ tutorials:
[http://docs.opencv.org/3.0-beta/doc/tutorials/tutorials.html](http://docs.opencv.org/3.0-beta/doc/tutorials/tutorials.html)

Immerse yourself in these and build any apps you think of!

Then go into the pyimagesearch tutorials
[http://www.pyimagesearch.com/](http://www.pyimagesearch.com/) and aishack.in
[http://aishack.in/](http://aishack.in/):
tons of great tutorials for learning different topics in vision, with coding
walkthroughs. Understand the examples and rewrite the applications.

Then Dive Deep:

Get the new OpenCV3 book, a nice deep overview of many topics in computer
vision. [https://www.amazon.com/Learning-OpenCV-Computer-Vision-
Libra...](https://www.amazon.com/Learning-OpenCV-Computer-Vision-
Library/dp/1491937998/ref=pd_lpo_sbs_14_img_0/130-9452832-4153358?_encoding=UTF8&psc=1&refRID=M57H0VTXQ0WZZE2KGEHC)

And watch this course on youtube:

[https://www.youtube.com/watch?v=skaQfPQFSyY&list=PL4B3F8D4A5...](https://www.youtube.com/watch?v=skaQfPQFSyY&list=PL4B3F8D4A5CAD8DA3)

I feel like then, you will have so much exposure that when you dive into
formal classes and textbooks, you will really understand and be enlightened.

This was the general way I learned computer vision, and recently I completed a
CV internship at nanit.com. I was not hired for my formal knowledge, but
they were impressed by all the various projects I've done and the knowledge I
had of many vision topics.

I also recently took a formal course on vision at Cornell:
[http://www.cs.cornell.edu/courses/cs5670/2017sp/](http://www.cs.cornell.edu/courses/cs5670/2017sp/)

All the assignments have starter code in python and opencv. This was an
amazing class as it dove deep into 3D computer vision, which is so relevant to
augmented reality!

Also, here is a link to OpenCV examples for iOS:
[https://github.com/Itseez/opencv_for_ios_book_samples](https://github.com/Itseez/opencv_for_ios_book_samples)

Here is a link to OpenCV examples for Android:
[https://web.stanford.edu/class/ee368/Android/](https://web.stanford.edu/class/ee368/Android/)

Hope this helps! Shoot me a dm if you or anyone has more questions!

------
rhlala
OpenCV documentation

------
tanilama
TensorFlow tutorial

------
mickbig
Buy a book:
[https://www.amazon.com/gp/aw/s/ref=is_s?k=computer+vision](https://www.amazon.com/gp/aw/s/ref=is_s?k=computer+vision)

