Ask HN: Best books on AI?
375 points by mavmak on March 18, 2017 | 91 comments
Which are the best books on AI for someone who doesn't have much knowledge of the topic? Mostly practical, and not academic.

You cannot really go practical in AI without academic rigor. You can follow recipes for what's already been done using a TensorFlow book, but that's as far as one can go. If one is serious about getting into AI today, a great way to do so is to read the following books, in order:

1. AI: A Modern Approach by Stuart Russell and Peter Norvig.

2. Deep Learning by Ian Goodfellow and Yoshua Bengio.

It is amazing how approachable both books are for beginners, but you will be diving a lot into academic stuff as you go along.

I love Paradigms of Artificial Intelligence by Peter Norvig much more than AIMA, which I found excessively encyclopedic and shallow.

While some will argue it is dated, I think it presents many timeless ideas that will get in vogue soon with little tweaks to their inference schemes.

Same for The Art of Prolog.

It's Paradigms of Artificial Intelligence Programming (PAIP) http://norvig.com/paip.html

AIMA provides a better introduction to a wider range of subjects, but PAIP is one of the most elegant and timeless books for both programming and old-school AI.

Why not both?

If you're shipping anything after 2010, you're not going to get within an order of magnitude of the state of the art with that book (PAIP), unfortunately.

There are basically no numerics in that book about anything that'll pass muster at NIPS or ICML nowadays or would be shipped by one of the big corporate AI labs, I'm sorry to say.

PAIP is one of my favorite books ever, but taken as a book about the craft of programming, not about AI. AI has grown, and the broadness of AIMA matches the subject. (It does need another update, and I heard they're working on one.)

PAIP is also very, very high on my list as well. I'm very pro-lisp and still develop new projects in lisp (CL, Scheme) and promote it when/where I can. But the person who picks up PAIP wanting to learn AI might not necessarily want to worry about picking up lisp programming skills at the same time (nor is learning lisp strictly necessary). This is why AIMA is still the best option, IMO, because it employs a language agnostic approach.

PAIP is awesome. It will make you a better thinker and developer. A truly hidden gem.

Art of Prolog was one of the best books I've read, and Prolog itself is one of my favorite languages. I wish there were more of both.

CTM by Van Roy & Haridi?

I will check it out. Thanks.

I had the feeling it was more of a Common Lisp book than AI book.

It's both, but the AI coverage is quite out of date.

Sutton & Barto's Reinforcement Learning completes this triumvirate.


David Silver's reinforcement learning course is based on Sutton & Barto.


I liked all three of them. I also feel Murphy's MLAPP provides additional useful material and is well written.

For whatever reason, Sutton's was the first serious book I read in any area of AI. The balance between explaining the history, the concepts and the code is handled really well.

I have both, and I agree these are great. I really appreciate the style of the Goodfellow book, it's very approachable.

Some on this thread have recommended Norvig's PAIP, but that's kind of an old school AI book in that it focuses on heuristic search and logic (implementing prolog in lisp at one point, very impressive stuff actually); but is lacking any coverage of statistical machine learning, which is the approach that underlies most of the cool stuff these days. It's still a great book, but I'd instead recommend a path that focuses on machine learning:

- The Master Algorithm: made for a general audience, gives you a lay of the land

- Python Machine Learning by Sebastian Raschka: gives you practical skills using python, scikit-learn, numpy, jupyter notebooks, pandas etc. From zero to kaggle in 4 chapters, goes deeper after that. Also goes into enough theory you aren't flying completely blind.

After that, I'm afraid I think you do need to go "academic", if by that you mean learning some of the underlying math to approach AI / ML from a more rigorous probabilistic perspective. I'd recommend studying probability theory and then working your way through Bishop's Pattern Recognition and Machine Learning. After that, a lot more doors open up to more specialized topics like computer vision, reinforcement learning, etc.

I've written up a lot more about this here:


You don't need to focus on just one side or the other. AlphaGo wasn't made by just hacking together a couple neural nets, it built on heuristic search, MCTS, and the "old-school" AI.

I agree that basic statistics + Bishop's book is a great way to start getting into machine learning -- but AI is a much broader field than that.

"old-school" AI is just a pretentious name for algorithms/graph theory/combinatorics, plus a bit of (now very outdated) PLT mixed in.

"new-school" AI (machine learning) is just a more pretentious name for statistics/control theory/randomized algorithms.

What's to say the human brain doesn't engage in graph combinatorics with some learned heuristics?

Some of the old ideas with enough computation power are actually pretty amazing.

Because the human brain isn't a symbolic machine. It's a living organism made up of cells trying to survive and reproduce in the real world. All our symbol manipulation capabilities are built up on older abilities based on feeling, experiencing, socializing, and moving about.

Hi Karl, I just read through your very interesting post about the sabbatical. I've been thinking about this for a while too. I see that you are currently in an AI role at umich. I was wondering whether you have written anything about how you went about actually landing a role in this particular field.

Thank you.

Long story short, I became aware of the position through networking in the tech/startup community in Ann Arbor, and eventually got introduced to an ML prof who knows the profs who are actually running the lab. I applied to the position with a resume and cover letter customized to relevance of the role (compressed a lot of startup / product / leadership stuff down, went into more details on tech stuff, emphasized ML studies, excitement for mission of their lab) and met with the profs a few times and it worked out.

If I can extract any advice from this, it's that if you put yourself out there and let the world / your network know that you are interested in something or working towards something (in my case: a transition to a career applying ML), things might turn up. Also: if you have more experience you should feel comfortable completely customizing your resume to the role so you have one page jam-packed with relevance; it's ok if they don't see (or won't care about) some of your experience.

I also notice that in some of your previous HN discussions you lament companies not really looking at your open source work; it's annoying that they wouldn't take the chance to look. But if I were you I might highlight specific projects on your resume relevant to the role you are applying to, if you haven't been doing this already; this could elevate your open source work to job experience in its emphasis. Assume 99% of people will only see your resume; everything else should be supporting resources in case they get interested enough to look (or wish to validate your claims).

+1 to Python Machine Learning, I feel like some of artificial intelligence is understanding a lot of finer (but critical) mathematical nuances, but some of it is just getting your hands dirty, and the second is definitely more accessible for somebody who is starting to learn.

First, read fucking Hastie, Tibshirani, and whoever. Chapters 1-4 and 7. If you don't understand it, keep reading it until you do.

You can read the rest of the book if you want. You probably should, but I'll assume you know all of it.

Take Andrew Ng's Coursera. Do all the exercises in Matlab and python and R. Make sure you get the same answers with all of them.

Now forget all of that and read the deep learning book. Put tensorflow or torch on a Linux box and run examples until you get it. Do stuff with CNNs and RNNs and just feed forward NNs.

Once you do all of that, go on arXiv and read the most recent useful papers. The literature changes every few months, so keep up. There. Now you can probably be hired most places. If you need resume filler, do some Kaggle competitions. If you have debugging questions, use StackOverflow. If you have math questions, read more. If you have life questions, I have no idea.

source: fizixer https://news.ycombinator.com/item?id=13890952

FWIW, a "super harsh" guide to (learning) ML [1] was posted on reddit a few days ago.

[1] https://redd.it/5z8110

Edit: The entire Reddit discussion feels slightly similar to this one, if more snarky. The first reply there also links all the resources listed above. I don't really know enough to add anything.

Statistical learning is only one part of AI. If the person wants to learn about AI, wouldn't starting with something like "AI: A Modern Approach" be a better start?

I would add one more item to the list: replicate the results of at least a few "classic" deep learning papers from scratch in one of the popular frameworks (TensorFlow, Torch, Caffe, etc.), instead of downloading code written by others. For example, build and train Alexnet or one of the VGGnets, a Word2Vec model, an image captioning model (joint CNN and LSTM RNN), and a pong- or breakout-playing AI (CNN with reinforcement learning). It's possible to do all of this on a single machine with a relatively inexpensive GPU.

To reiterate what others said, but what was not emphasized enough from my point of view:

AI is academic (as a synonym for 'theoretical' and 'math-intensive'). Once you look beyond purely symbolic AI, which proved to be infeasible as @curuinor pointed out somewhere here, you will need to build up at least basic knowledge in probability theory and linear algebra.

The path I'm following at the moment is a quite rigorous one and is outlined here (http://www.deeplearningweekly.com/pages/open_source_deep_lea...).

If you've never had any exposure to probability theory or statistics, I recommend having a look at the course "MIT 6.041 Probabilistic Systems Analysis and Applied Probability" taught by John Tsitsiklis at MIT (video lectures are available through YouTube and MIT OpenCourseWare for free). Both the course and Tsitsiklis' book are superb learning materials to get into probabilistic thinking.

Edit: Link was broken. Thanks to @blauditore.

Strang's class is very pretty and excellent and just a little bit off from the center of the sorts of linear algebra used in machine learning. Not a lot off, but a little off.

A field that does inspire a lot of deep learning folks and never gets mentioned in this sort of thing is the theory of physical dynamical systems. Attractor is a term that came from here, for example, and much of the mathematics behind the numerical fuckery behind deep nets is dynamical in nature. RNNs are entirely dynamical systems. The classic there is Strogatz's book (https://www.amazon.com/Nonlinear-Dynamics-Chaos-Applications...).

There is also information theory, of course, which is part of the MacKay source.

Many of the earlier papers in deep learning-land are really nontrivial to read, because the terminology and worldview of everybody has changed so much. So reading original Werbos or Rumelhart is really difficult. This is really not the case for Sutton and Barto, "RL: An Introduction" (http://webdocs.cs.ualberta.ca/~sutton/book/the-book.html). Two editions, apparently the second edition is basically getting with the program on shoving DL into everything.

Schmidhuber often mentions that Gauss was the original shallow learner. This is a technically correct statement (best kind of statement), but you definitely should probably know linear and logistic regression like the back of your hand before starting on DL too much.
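
For anyone who wants to check how well they know those two models, here is a minimal sketch in plain Python. It assumes one input feature, a closed-form least-squares fit for the linear case, and batch gradient descent on the log loss for the logistic case. The function names, learning rate, and step count are illustrative choices, not taken from any of the books mentioned.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, lr=0.1, steps=5000):
    """Logistic regression p(y=1|x) = sigmoid(w*x + b), fit by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
    print(fit_logistic([-2, -1, 1, 2], [0, 0, 1, 1]))
```

The only real difference between the two fits is the sigmoid and the loss; the update for logistic regression has the same "prediction minus target, times input" shape as a gradient step for least squares would.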

To preface, I'm currently learning several disciplines in tandem along a route suggested by the link, so kudos to them for putting together a solid list of resources.

Now, from the link: "Few universities offer an education that is on par with what you can find online these days. The people pioneering the field from industry and academia so openly and competently share their knowledge that the best curriculum is an open source one."

On the one hand, it is true there are a ton of resources where the largest cost is the time it takes to go through the learning process. And I'm awestruck that research papers are so openly available and practitioners are so willing to share their knowledge to others both in posting their books as PDFs/HTML files and creating online courses.

On the other hand, how feasible is it for an individual to work on notable AI companies/projects without a Masters or PhD in a related field? Can that gap be crossed merely by becoming fluent in the various disciplines involved in AI, before contributing non-formally academic research/experiments you've conducted on your own?

The Google Brain Residency is a cool program for non-academics to get into deep learning research, and you can always get into AI on the applications side, but in both cases you're going to have to really try.

Your link is broken, leading to http.com - this is the correct one: http://www.deeplearningweekly.com/pages/open_source_deep_lea...

I think to be successful at machine learning you also need a good understanding of calculus, besides probability and linear algebra.

What is too much though? Backpropagation uses derivatives, some filters in Computer Vision use multivariate calculus. If you want to have a thorough understanding then calculus is necessary. That said, Andrew Ng was quite good at avoiding calculus in his Machine Learning MOOC, and for applied machine learning I guess calculus is not that important.
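
To make the "backpropagation uses derivatives" point concrete, here is a small sketch for a single sigmoid neuron with a squared-error loss. The setup (one weight, one bias, one training example) and the function names are assumptions chosen for illustration; real networks apply the same chain rule layer by layer.

```python
import math


def forward(w, b, x, y):
    """One sigmoid neuron with squared-error loss; returns (z, activation, loss)."""
    z = w * x + b
    a = 1.0 / (1.0 + math.exp(-z))
    loss = 0.5 * (a - y) ** 2
    return z, a, loss


def backward(w, b, x, y):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw, and likewise for b."""
    _, a, _ = forward(w, b, x, y)
    dL_da = a - y
    da_dz = a * (1.0 - a)  # derivative of the sigmoid
    return dL_da * da_dz * x, dL_da * da_dz


def numeric_grad_w(w, b, x, y, eps=1e-6):
    """Central finite difference, used only to check the analytic gradient."""
    _, _, loss_plus = forward(w + eps, b, x, y)
    _, _, loss_minus = forward(w - eps, b, x, y)
    return (loss_plus - loss_minus) / (2 * eps)


if __name__ == "__main__":
    analytic, _ = backward(0.5, -0.3, 1.2, 1.0)
    print(analytic, numeric_grad_w(0.5, -0.3, 1.2, 1.0))
```

If the two printed numbers agree to several decimal places, the analytic derivative is right. Frameworks compute these derivatives for you, which is why applied work can get by with less calculus.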

A great place to study math is www.khanacademy.org; they have courses on calculus, probability/statistics and linear algebra.

Strang's complaint is that there's too little linear algebra. This is true. This doesn't overshadow the fact that you're not going to get out of using some partial derivatives in neural net land (and many other AI subfields).

Though it is an academic book, Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.

The first chapter in the book provides a detailed analysis of how other disciplines contribute to the idea of AI - from Philosophy to Psychology, Biology to Computer Science. Makes for an interesting read, even for a non-tech reader.


If you're also looking for a course that goes alongside the book, I highly recommend UC Berkeley's CS188 (you can find it at http://ai.berkeley.edu).

The lecturer Pieter Abbeel does such a good job explaining stuff and the programming exercises are really neat.

Edit: Formatting

The course is also quite easy to follow without buying the book. I love the exercises in which you are programming an intelligent agent to move through a maze. It reminded me of how we learned programming in university using Karel The Robot.

This alongside Andrew Ng's Machine Learning course was my first exposure to the field. https://www.coursera.org/learn/machine-learning

I can also recommend Sebastian Thrun's Artificial Intelligence for Robotics course: https://www.udacity.com/course/artificial-intelligence-for-r...

I wouldn't exaggerate the "detailed analysis" here. For example, I found the philosophical parts quite weak and superficial.

Get your feet wet:

* http://cs231n.stanford.edu/ (the course notes are excellent)

* http://neuralnetworksanddeeplearning.com/

* http://www.deeplearningbook.org/

Use Tensorflow to train a few small neural nets. Move on to CNNs and RNNs. Make sure you actually do this. By this point you'll have read a lot, and retain none of it if you don't put it to use. Look at reinforcement learning. Use the book by Sutton and Barto, the new edition: https://webdocs.cs.ualberta.ca/~sutton/book/the-book-2nd.htm... Read the first 4-5 chapters, then go online and read about Deep Q learning, policy gradients, DDPG, etc. Then try to solve some problems on OpenAI Gym.
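
As a warm-up before Deep Q-learning, here is a minimal sketch of tabular Q-learning. The five-cell corridor environment, the reward of 1 at the right end, and all hyperparameters are made up for illustration; none of it comes from Sutton and Barto or from OpenAI Gym.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action):
    """Toy environment: move along the corridor, reward 1.0 on reaching the end."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action choice, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


if __name__ == "__main__":
    for s, (left, right) in enumerate(q_learning()):
        print(s, round(left, 3), round(right, 3))
```

Deep Q-learning replaces the table `q` with a neural network that maps a state to one value per action; the target in the update stays essentially the same.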

Once you have an idea of the kinds of problems you can solve, and have a couple you're interested in, go back and learn the foundational math, and start reading research papers.

In general, start with modern books that mention deep learning. With older books or high-level-overview books, you'll get frustrated when you see something cool on /r/machinelearning and can't find any mention of it in the book.

AI is not a field where the practitioners can safely ignore the academic. A huge number of people got into rabbit-holes in its history and basically failed completely and wasted decades of their lives, in some cases.

Anecdotal. Do you have specific records of such failures?

Well, it's not very nice to talk about people spending decades on failure, but I guess I can give some examples.

Cycorp still exists, from 1984. However, D. Lenat's approach to AI by ontology engineering has basically been completely infertile after about the early 90's.

Feigenbaum's expert systems stuff was a basic bust, it led to the Japanese just throwing that stuff away. People spent an incredible amount of effort and time systematizing expert knowledge and making expert systems and it was not a happy time. Much of that knowledge went into probabilistic forms, culminating in the Bayes net. The most famous application of Bayes net: Clippy (there are a lot more successful applications, but still...)

It was believed shortly after the AI conference that computer vision could be solved by a summer project in the 50's. That didn't happen.

More failures.

Minsky and Papert published a critique of single-layer perceptrons in '69 where they proved that they could only make linear discriminators and therefore were useless for anything not linearly separable, the XOR problem being the classic example. They were wrong, given that we call multi-layer perceptrons neural networks.
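
The XOR point is easy to see in code. The sketch below assumes threshold units with a step activation. The brute-force search over a coarse weight grid is only an illustration, not a proof (the general result is that XOR is not linearly separable), and the hand-picked hidden-layer weights are one arbitrary choice among many.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def step(z):
    return 1 if z > 0 else 0


def single_unit(w1, w2, b, x1, x2):
    """A single-layer perceptron: one linear threshold unit."""
    return step(w1 * x1 + w2 * x2 + b)


def any_single_unit_solves_xor(grid):
    """Try every (w1, w2, b) from the grid and report whether any computes XOR."""
    for w1, w2, b in product(grid, repeat=3):
        if all(single_unit(w1, w2, b, *x) == y for x, y in XOR.items()):
            return True
    return False


def two_layer_xor(x1, x2):
    """Two hidden threshold units plus one output unit, with hand-picked weights."""
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # OR but not AND


if __name__ == "__main__":
    grid = [i / 2 for i in range(-8, 9)]
    print("single unit solves XOR:", any_single_unit_solves_xor(grid))
    print("two-layer outputs:", [two_layer_xor(*x) for x in XOR])
```

One extra layer is enough to fix the problem; what was missing at the time was a practical way to train the hidden units rather than wiring them by hand.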

Simon and Newell made their model and thought that models like theirs with production rules would point the way towards the way that humans could systematize thought. That didn't happen, although they had some cool papers.

People saw ELIZA and SHRDLU and thought that good NLP was coming in only a decade or so.... in the 60's.

Beveridge report. "The spirit was willing, but the flesh was weak" to "The vodka was good, but the meat was rotten." (that last one's a bit apocryphal, but still)

Less symbolic failures.

There was a huge and abiding torrent of neural net stuff that dealt with evolving topologies in late 90's. I see very little of it in any way shape or form in industry or academia today, because it's a lot of computation for basically no gain.

They thought that layerwise pretraining of neural nets was the way to go in 2006, before they realized that initializations, normalization, and better activations were the better way.

A disgusting amount of why Watson won Jeopardy was because it could buzz faster than Jennings and Rutter. Ain't that nice?

Lisp machines (ok that's symbolic again).

A nitpick on the buzzer point:

The skill-cap in Jeopardy is sort of low. The top players can all answer almost all questions, so victory comes down to the buzzer even between Jennings and Rutter.

The important thing is that Watson hit that skill cap. From there it wins on tie-breaks every time. I think we'll see this dynamic in many human/AI contests. If both competitors' skills are at the saturation point, the contest is decided either by luck, or some strategically unsatisfying thing like diligence or mechanics. I don't see why humans will ever have an advantage at this.

Is pretraining really all that much of a failure? I haven't really found an authoritative answer on whether or not pretraining is worth it these days. Hinton's 2012(?) Coursera course still focuses pretty deeply on generative/layer-by-layer pretraining with RBMs but I'm just not really sure if that's fallen by the wayside today. Or maybe it's still useful only in specific circumstances?

Saxe Ganguli McClelland, 2013, about linear nets and orthogonal initialization. But then, read Li Jiao Han Weissman 2017 (maybe preprint), "Demystifying ResNet", which makes a nice claim about the niceness being conditioning of Hessian at init.

Tldr: it's good conditioner but you can do better ab initio

Interesting, I will check these out!

AlphaGo relied heavily on (supervised) pretraining, and that seemed fairly successful.

I still wonder if the many who spent years on obtaining the Loebner Prize would be considered a "failure" in this domain ... https://en.wikipedia.org/wiki/Loebner_Prize - I think no matter how much you read about AI or look into the rabbit hole you will always end up in some type of Chinese room argument ... https://en.wikipedia.org/wiki/Chinese_room

After listening to him for a bit, reading some of his books, annoying him a fair amount, I think that my opinion of J Searle is that he doesn't know jack shit about AI.

Gedankenexperiment as a methodology has had considerable success in physics and miserable, complete, ridiculous, awful failure in psychology and cognitive science.

Superintelligence by Nick Bostrom. It's a book that explores how superintelligence could emerge, the different ways it can take off and what it means to us as humans. More importantly, the book takes on the difficult task of figuring out ways to make sure the AI is safe and does not end up in the wrong hands. Pretty interesting read that doesn't really require technical know-how.

There is lots of content in this book but there is no practical technical content in this book. Interesting philosophy.

Much of AI philosophy is done by extremely non-practitioners. John Searle can't code. Nick Bostrom came to coding extremely late in life. Geoffrey Hinton and the other ex-PDP folks wrote some philosophy papers, though, which are of interest if you like the philosophy.

The book is interesting, but the level of detail he goes into in some of his speculation is completely unjustified. It's kind of ridiculous.

"There's no sense in being precise when you don't even know what you're talking about." - von Neumann

If you want to get into actually implementing deep neural nets (what everyone is calling "AI" these days), then look at TensorFlow and Keras.

This is a good getting started book for TensorFlow:


The Quest for Artificial Intelligence: A History of Ideas and Achievements by Nils Nilsson

Gives a great run through of the history of AI research. Understanding the approaches that have been tried before gives you a sense of why the state of the field is what it is today. It is worth bearing in mind that AI research expands far beyond computer science into psychology, philosophy, linguistics etc.

I think there is a pdf draft of the book legally available from http://ai.stanford.edu/~nilsson/QAI/qai.pdf

"Python for Data Science For Dummies" by Luca Massaron & John Paul Mueller is a very practical book on machine learning. In a real world scenario your first obstacles will be learning how to use the programming language as well as preparing data. Both topics are well covered in this book. The "learning from data" chapter contains an introduction into basic machine learning algorithms as well as ensemble learning. The book contains minor inaccuracies but I think it's a good, practical start for a novice. It doesn't include anything on neural nets however.

Try this site: http://www.allitebooks.com/?s=artificial And the books are free and downloadable :)

You know, I have never found a "casual" book covering tree-search techniques, from minimax to Monte Carlo Tree Search. Still relevant for game/agent AI (AlphaGo used MCTS for example).

You'd think there would have been 100 "How To Make a Computer Chess Engine in BASIC" books back in the 80s, and continuing to the present day, but I can't find them. Lots of papers and online tutorials, and some stuff in textbooks, but no accessible hands-on books.
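
For what it's worth, the core of minimax fits in a few lines. The sketch below uses a made-up toy game rather than chess: a pile of stones, each player takes 1 to 3, and whoever takes the last stone wins. The game rules and function names are assumptions for illustration only; a chess engine adds move generation, an evaluation function, and depth limits on top of the same recursion.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # assumed rule: take 1-3 stones; taking the last stone wins


@lru_cache(maxsize=None)
def minimax(stones, maximizing):
    """Return +1 if the maximizing player wins with perfect play, else -1."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    scores = [minimax(stones - m, not maximizing) for m in MOVES if m <= stones]
    return max(scores) if maximizing else min(scores)


def best_move(stones):
    """Pick the move whose resulting position has the best minimax value."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: minimax(stones - m, False))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, "stones: value", minimax(n, True), "best move", best_move(n))
```

Running it shows the player to move loses exactly when the pile is a multiple of 4. Monte Carlo Tree Search swaps the exhaustive recursion for sampled playouts, which is what makes it usable when the tree is far too large to enumerate.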

Graphical models might also be something folks might want to consider. Ideas from PGMs are often behind many advances in ML.

The canonical text is by Daphne Koller; a course I took used Martin Wainwright's monograph though - the book is briefer and dives into the math quicker.


Firstly, I don't think you can dive straight into coding without understanding the fundamentals. AI is such a broad and rich field, and there's a lot you need to know before you start.

It also depends on what you're going to focus on. Are you looking to implement a game-playing agent? An object recognition algorithm? More of a logic focus?

If you just want Deep Learning and statistical methods, then Bishop's Pattern Recognition and Machine Learning is a good start. Otherwise, Russell and Norvig's Artificial Intelligence or Patrick Winston's similarly titled book are great starting points. For more big-picture stuff, Marvin Minsky's Society of Mind is great, and Hofstadter's Gödel, Escher, Bach is a classic too. Both are a lot less practical though, and practical seems to be what you're looking for.

A good suggestion seems to be not to read a lot of books, but to put into practice what you read and apply it to problems you need to solve; that way you learn a lot more effectively.

David Rosenberg's Machine Learning course is an excellent intro. Will give you the foundations that you need for anything else. Has a few links for additional resources, but the slides are mostly sufficient. https://davidrosenberg.github.io/ml2015

Sometimes it's best to answer a question with a question.

Are you simply curious or is there something more pressing? For example, do you want some light reading or have you perhaps been asked to implement machine learning for your company?

Most answers here assume you want to jump into the ML swamp and start analyzing your trove of "big data" ASAP. But is that so?

I want to have a basic understanding on the topic, and implement something small, maybe a side project.

If you are developing AI for games, then Programming Game AI By Example by Mat Buckland is one of the best books for that: https://www.amazon.com/Programming-Example-Wordware-Develope...

I can also vouch for this, it's a great introduction to AI in Games, however it is just an introduction as the topic is very broad.

AI: A Modern Approach (Russell, Norvig) and Deep Learning (Goodfellow, Bengio) have been mentioned already.

I'd also recommend:

Gödel, Escher, Bach: An Eternal Golden Braid

by Douglas Hofstadter. Might not be exactly what you're looking for (it's all over the place, touching music theory, math, art, philosophy...), but it's fun and enjoyable to read. Also very dense.

Once you get your feet wet, the first-year PhD course book for good theory is Pattern Recognition and Machine Learning by Bishop.

But it could be a bit too theoretical - it provides a foundational mathematical framework and got me thinking about problems in a better way.

Paradigms of Artificial Intelligence Programming (PAIP)

One of the best books on AI and Programming ever.

Just to inform others, PAIP is not just an AI book, it is also a book on learning Common Lisp from scratch, and uses AI for the domain examples.

Unfortunately, there is not a drop of numerics in that book. It's a good book for learning about symbolic AI. There is, to a solid first order approximation, zero symbolic AI in a system like, say, Google speech recognition.

Not surprisingly, I would say, since I wouldn't count speech recognition as an AI task.

Nothing is counted as an AI task after someone finally manages to do it.

I anticipated that argument, but I wouldn't have considered ASR an AI task even before it was first attempted.

Any particular reason why? I feel that the problem of mapping sensory inputs like sound etc onto internal concepts for reasoning is an important part of AI.

My personal reason is that I don't consider the output of speech recognition to be "internal concepts for reasoning". In a Spoken Dialog System, this task is typically performed by a subsequent component that does natural language understanding (rather than recognition).

The challenge I've found with books is the space has been moving so quickly in the past 10 years. By the time the book is out, the methods described in it are no longer state of the art.

That's why it's important to understand the underlying math.

I'd spend my money on these two:

- The Emotion Machine by Minsky

- Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

Any recommendations for a book focused on AI in game development?

I've got a book called "Game Programming Patterns" by Robert Nystrom which goes into this. It's not advanced, but it's very good.

Bishop, Hastie, Shai Shalev-Shwartz

The Second Machine Age

Just watch the movie with Jude Law. We're close, we're so close!

The Society of Mind and The Diamond Age!

Imagine you want to learn the Roman alphabet. Which book should you use? Any book. There are so many good books and courses that it's almost useless to select. Don't worry about selecting the initial book, just use any course or book in the beginning, and later when you will know exactly what fits your needs, you will be able to fine tune.
