Ever since Euclid listed (collected?) his axioms, the march toward AI became inevitable. People don't realize how much AI "failures" have contributed to computing, programming, and society in general. Some examples include Lisp, functional programming, garbage collection, object-oriented programming, and Walmart.
For functional languages we have the modern motivations in two parts. Well, one really, but one of its sub-parts is so big it deserves a section of its own.
Contributions to AI in the form of automated mathematicians descend from axiomatic set theory and the desire to automate mathematics via rigorous mechanical proofs. From Robin Milner we have the invention of ML, whose actual purpose was as a scripting language to aid in the creation of automated proofs! We also have contributions from Automath, which had a lot of modern cutting-edge techniques, such as dependent types, in the late 1960s. All in all, the original goal of automatic mathematics and reasoning proved impractical, but the tools needed for its pursuit have proven anything but.
Then there is Lisp. Lisp contributed to both functional and object-oriented techniques. Lisp was inspired by the lambda calculus, itself an offshoot of the whole mechanize-mathematics movement. It was invented by John McCarthy - whose interest was in the old AI notion of symbolic computing - as a Turing-complete way of representing algorithms. A very old language, its influence on all subsequent history is undeniable. From Lisp we see that "garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp". We also notice that the AI researchers had concepts of object orientation in the early '60s, and that the later mixins and multiple inheritance had roots in Lisp. The development of multiple inheritance was also motivated by the needs of the old symbolic-reasoning researchers.
As for Walmart? Walmart is a very big user of optimization theory and logistics. Walmart and its ilk also use data mining and rule learning. And optimization is a lot more important than efficiently routing trucks: OR people are actually laying the foundations of AI and are seeing a lot of their work rediscovered. OR people also contributed dynamic programming, for example. This is what I mention in another post about improving synergy between all these very related silos. These topics are essentially subsets of a more general theory that will be pivotal to AI.
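Since dynamic programming comes up here, a minimal sketch of the kind of OR-style recurrence meant (the DAG and its edge costs below are invented purely for illustration):

```python
# Bottom-up dynamic programming for shortest paths in a DAG:
# the classic OR recurrence cost[v] = min over edges (u, v) of cost[u] + w.
edges = {            # node -> list of (successor, edge cost); a made-up toy network
    "start": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("end", 7)],
    "b": [("end", 3)],
    "end": [],
}

def shortest_path_cost(source, target, edges):
    order = ["start", "a", "b", "end"]   # topological order of the DAG
    cost = {v: float("inf") for v in order}
    cost[source] = 0
    for u in order:                      # relax edges stage by stage
        for v, w in edges[u]:
            cost[v] = min(cost[v], cost[u] + w)
    return cost[target]

print(shortest_path_cost("start", "end", edges))  # start -> a -> b -> end = 2+1+3 = 6
```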
I personally think optimization is a very important part of our universe. In a way the principle of Least Action is doing some kind of optimization. I feel that people working in machine learning, in particular the bayesian inference and graphical models part of AI are actually doing a subset of quantum mechanics. Vice versa these techniques will likely prove pivotal in programming quantum systems.
AFAWK (As Far As Wikipedia Knows), operations research as a formal area kicked off in 1937 and had 1000 people working on it in Britain during World War 2. It may in fact be true that some techniques from AI have cross-fertilized but the credibility of the claim that "we owe operations research and supply chain management to AI" is very low.
What's next: Minsky traveled back in time in a LISP Machine and impregnated the Rev. Bayes' mother?
I'm willing to give 'em Prolog (take it, please!) but the confluence of FP and AI might owe more to the fact that AI (especially 'strong AI') was considered a Big Deal early on in computing and as such, ideas that were invented/used contemporaneously tend to be associated with AI.
I never understood the popularity of their AI text - the discussion of topics other than the most basic search methods is uniformly obtuse, and the authors hardly ever make any of the important connections with literature outside their "field", e.g. between reinforcement learning and classical control (see, for example, Russ Tedrake's notes on OCW).
Certainly nothing remotely comparable to SICP exists. From my point of view, people who are interested in these topics are much better off getting some intuition for basic problems in scientific computing and then making the requisite connections themselves. I can wholeheartedly recommend the recent book by Gilbert Strang, "Computational Science and Engineering", and the companion video lectures for MIT courses 18.085 and 18.086.
Yes, but that's robotics, not AI. Highly advanced robotics can exist without much "intelligence" (however you define that). The golden ring in AI is human-level (or better) intelligence which may or may not require robotics.
It goes the other way too. Reinforcement learning and control theory are also very useful for general machine learning. That's part of the problem - engineers, statisticians and computer scientists keep inventing different views of the same things. Now that people are communicating and consolidating things will start to get even more interesting.
Sadly, it's three editions and AIMA is still hopeless regarding stochastic optimization (genetic algorithms, simulated annealing, ant colony optimization, hill climbing, and the like), and gradient-based optimization.
It seems they don't want to be bothered about the significant difference between search and optimization. So they stick all the stochastic optimization stuff into a section called "local search and optimization", and place it underneath the Search chapter (it's not search, and almost none of it is local). And then separate out optimization methods like gradient descent etc., placing them under "local search in continuous spaces", as if (1) they were search and (2) stochastic optimization wasn't applied to continuous spaces.
And if this wasn't muddled enough, their recommended books for stochastic optimization aren't under the Search chapter at all -- they've been placed under the Machine Learning chapter. And it's a strange collection.
I'm pretty disappointed with AIMA's seemingly poor understanding of this area. Well, I guess at least it's better than their cursory treatment of multiagent systems.
How is optimization not a form of search? It's a search through the solution space that tries to find a global minimum or maximum.
Also, of the techniques you list, simulated annealing, hill climbing, and gradient-based optimization are all local search methods (genetic algorithms may or may not be, depending on your emphasis on mutation vs. recombination).
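To make the "local" part concrete, here is a minimal stochastic hill-climbing sketch on a toy one-dimensional objective (the function, step size, and starting point are all invented for illustration). Because it only ever accepts improving neighbors, it happily stalls on the local bump near x = 3 and never finds the better peak near x = 0:

```python
import random

def hill_climb(f, x, step=0.1, iters=1000, seed=0):
    """Stochastic hill climbing: accept a random neighbor only if it improves f."""
    rng = random.Random(seed)
    best = f(x)
    for _ in range(iters):
        candidate = x + rng.uniform(-step, step)
        if f(candidate) > best:          # strictly local: only ever moves uphill
            x, best = candidate, f(candidate)
    return x, best

# Toy multimodal objective: global maximum 1.0 at x = 0, local bump of 0.5 at x = 3.
f = lambda x: 1.0 - 0.1 * x * x if abs(x) < 2 else 0.5 - (x - 3) ** 2

x, best = hill_climb(f, x=3.2)   # starts in the basin of the local optimum
```

Starting at 3.2, the climber converges to the local bump (value about 0.5) rather than the global peak, which is exactly the failure mode a global method is meant to escape.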
I agree with you: most of those methods are local, and optimization is a form of search, typically used for continuous and usually differentiable spaces. One can also say search is a form of optimization.
But based on my reading, genetic algorithms are better known for being susceptible to local minima, while simulated annealing is, in theory, a global method.
As for genetic algorithms: for optimization I prefer Differential Evolution. For exploration maybe genetic algorithms are more fruitful, although I think genetic programming and learning classifier systems are slept on. Speaking of slept on, another under-appreciated but much more widely used method is stochastic gradient descent. Global optima might be overrated.
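For readers who haven't met it, a minimal sketch of the classic DE/rand/1/bin scheme behind Differential Evolution (the population size, F, CR, and generation count below are just common defaults, not anything from this thread):

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, gens=200, seed=1):
    """Minimize f over a box using the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(gens):
        for i in range(pop_size):
            # Mutation: difference of two random individuals added to a third.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][d] + F * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Binomial crossover: mix mutant with the current individual.
            j_rand = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < CR or d == j_rand) else pop[i][d]
                     for d in range(dim)]
            trial = [min(max(t, lo), hi) for t, (lo, hi) in zip(trial, bounds)]
            # Greedy selection: the trial replaces the parent only if it is better.
            ft = f(trial)
            if ft < fitness[i]:
                pop[i], fitness[i] = trial, ft
    best = min(range(pop_size), key=lambda i: fitness[i])
    return pop[best], fitness[best]

# Sphere function: global minimum 0 at the origin.
x, fx = differential_evolution(lambda v: sum(t * t for t in v), [(-5, 5)] * 3)
```

The greedy one-to-one selection is what makes DE feel "exploitative": a parent is only ever replaced by something strictly better.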
> But based on my reading, genetic algorithms are better known for being susceptible to local minima, while simulated annealing is, in theory, a global method.
Both simulated annealing and genetic algorithms are global methods, and both can get caught in local minima. The same goes for pretty much every other global stochastic optimization method out there.
Simulated annealing got this "special" reputation among engineers back in the '80s because the Metropolis algorithm had a formal proof of global guarantees: run it long enough and it's guaranteed to find the global optimum. But nowadays most every algorithm in this category has similar guarantees. Such a guarantee is relatively easy to make, as it turns out.
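For reference, the Metropolis acceptance rule under discussion looks roughly like this (the geometric cooling schedule, step size, and toy objective are my own choices for illustration; the classical convergence proofs assume much slower, logarithmic cooling):

```python
import math
import random

def simulated_annealing(f, x0, temp=5.0, cooling=0.995, steps=5000, seed=0):
    """Minimize f with the Metropolis rule: always accept improvements,
    accept uphill moves with probability exp(-delta / T), cool T geometrically."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for _ in range(steps):
        cand = x + rng.uniform(-1, 1)
        fc = f(cand)
        delta = fc - fx
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x, fx = cand, fc               # move accepted (downhill, or lucky uphill)
            if fx < best_f:
                best_x, best_f = x, fx     # remember the best point ever visited
        temp *= cooling                    # geometric cooling schedule
    return best_x, best_f

# Rastrigin-style 1-D objective: many local minima, global minimum 0 at x = 0.
f = lambda x: x * x + 10 * (1 - math.cos(2 * math.pi * x))
x, fx = simulated_annealing(f, x0=4.5)
```

The uphill-acceptance term is exactly what lets it climb out of the local basins that trap a pure hill climber, which is where the "global" reputation comes from.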
I like differential evolution too, though it's pretty exploitative.
"Local Search" is an unfortunate term -- it's optimization, though the term "local search" is historically common in the hill-climbing community. But I don't think many people would place Genetic Algorithms (GA) in anything other than the "global" category nowadays. Ant Colony Optimization, Particle Swarm Optimization, Genetic Algorithms, Evolution Strategies, Estimation of Distribution Algorithms, Simulated Annealing, etc.: these things are global stochastic optimization techniques.
The terms "search" and "optimization" are often conflated, mostly because "search" was used by two different communities early on to mean two different things back in the 1970s. You still see echoes of it in the optimization community, particularly in single-solution techniques like stochastic hillclimbing, or old terms like "beam search".
So here's how I define them. I believe that "search" typically is applied to problems where you have something to search for. A state or point in space which meets a certain criterion. The most common examples of this are state-space search problems, where the path you took to get there matters, and constraint satisfaction search problems, where the path doesn't matter, only the result. It's also applied in simpler fashions: tree search, binary search, etc.
Optimization, on the other hand, applies to problems where you are trying to find a point which is optimal with regard to some criterion. Unlike search, where it's a find-it-or-not proposition, in optimization you do not necessarily expect to find the globally optimal criterion ever, and in fact you don't necessarily even know that one exists (it could just keep going up and up). You're just trying to find the best thing you can given the time and resources you have.
Optimization algorithms are usually more general, and weaker, than search algorithms. You can typically cast a search problem as an optimization problem -- for example, by using your heuristic as the optimization criterion -- but optimization algorithms were designed for a more general class of problems and are thus expected to be outperformed by a custom-designed search algorithm. (This is a general rule of thumb for pretty much everything in optimization, particularly stochastic optimization -- it's a last-ditch approach.) Similarly, you can use an optimization algorithm to find a decision-tree solution, but I'd get smacked around by colleagues in the machine learning community for calling PSO a machine learning algorithm.
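As a concrete instance of casting a search problem as optimization: take 8-queens (a constraint-satisfaction search problem), use "number of attacking pairs" as the criterion, and minimize it with random-restart hill climbing. A sketch; the restart budget is arbitrary:

```python
import random

def conflicts(cols):
    """Attacking queen pairs; 0 means the original search problem is solved."""
    n = len(cols)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if cols[i] == cols[j] or abs(cols[i] - cols[j]) == j - i)

def solve_queens(n=8, restarts=100, seed=0):
    rng = random.Random(seed)
    for _ in range(restarts):
        cols = [rng.randrange(n) for _ in range(n)]   # random restart
        while True:
            # Steepest descent: make the single-queen move that lowers conflicts most.
            best, move = conflicts(cols), None
            for i in range(n):
                for c in range(n):
                    if c != cols[i]:
                        trial = cols[:]
                        trial[i] = c
                        ct = conflicts(trial)
                        if ct < best:
                            best, move = ct, (i, c)
            if move is None:
                break                                  # stuck at a local optimum
            cols[move[0]] = move[1]
        if conflicts(cols) == 0:
            return cols                                # found a true solution
    return None

solution = solve_queens()
```

Note the asymmetry the comment describes: the optimizer has no idea it is "searching"; it just pushes a number toward zero, and we read success off the criterion.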
So interesting question: is Alpha-Beta Pruning a search algorithm or an optimization algorithm?
At any rate -- Russell and Norvig are entirely muddled about this. They admit (p. 121) exactly what I state above, yet go on to lump the two together without any further explanation, and then make the strange case that local search is global if it is "complete" -- a very logician way of stating it, but not a term in the field so far as I know.
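As a footnote to the alpha-beta question above, here is the algorithm itself in miniature, run on the classic three-branch textbook tree (the leaf values are the standard toy example, not anything from this thread). It reads like search, but alpha and beta are bounds on an optimal value, which is what makes the question genuinely ambiguous:

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning. A node is either a number (leaf)
    or a list of child nodes."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break   # beta cutoff: the minimizer will never allow this branch
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break   # alpha cutoff: the maximizer already has something better
        return value

# Root is a max node over three min nodes; the minimax value is 3,
# and pruning skips the tail of the second subtree.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = alphabeta(tree, float("-inf"), float("inf"), True)
```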
Very well explained. Thanks. And it's how I understand it. I see search as having to do with combinatorial exploration and optimization as having to do with exploring vector spaces. You switch to heuristics in search when the space is too big or complex. But I also think that, for suitable generalizations of the terms, one can view search as optimization or vice versa for reasoning purposes. I think there is something philosophical to be said about the fact that metaheuristics work as well as they do.
Isn't it somewhat historic? Lisp was invented by McCarthy, who also coined the term AI. Perhaps the language AI researchers were acquainted with was Lisp, and it remained a tradition. (There are technical reasons too, but I'm not that qualified to list them. Here are some: http://stackoverflow.com/questions/130475/why-is-lisp-used-f... )
Partially historic, yes. It was faster/easier for students to write their AI programs in Lisp than in any other language for a long time. Between functional programming, a REPL, and macros, you could find yourself doing a lot with a little.
Prolog is also partially historic, but it has the added benefit of being logic-based, which is the direction that AI focused on for several years. Around that time, it was believed that AI could be done with pure symbolic logic, and that's exactly how programming in Prolog works. This approach eventually turned out not to work very well, but Prolog is still used in some places because it's a very easy language for interacting with graphs and decision trees (which are big things in AI).
Lisp and Prolog are still huge in AI, at least in academia, or at least at my university. For instance: ICARUS, CCalc, and answer set programming solvers, all of which are part of active and recent research, use Lisp or Prolog.
Prolog was and still is used precisely because it is so (relatively) easy to specify facts and behaviors as Horn clauses, which is important because Horn clauses are one of the few places I ever hear the phrase "solvable in polynomial time" in KRR.
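To illustrate why Horn clauses earn that phrase: propositional Horn inference can be decided by plain forward chaining to a fixpoint, which runs in polynomial time (linear, with the right bookkeeping). A sketch in Python rather than Prolog, with a made-up toy knowledge base:

```python
def forward_chain(facts, rules):
    """Propositional Horn-clause inference. Each rule is (body, head),
    meaning body_1 AND ... AND body_n -> head. Returns all derivable facts.
    Naive fixpoint loop: polynomial in the number of facts and rules."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)   # rule fires: its head becomes a new fact
                changed = True
    return known

# Toy knowledge base, invented for illustration (ground atoms only).
facts = {"parent(tom, bob)", "parent(bob, ann)"}
rules = [
    ({"parent(tom, bob)"}, "ancestor(tom, bob)"),
    ({"parent(bob, ann)"}, "ancestor(bob, ann)"),
    ({"ancestor(tom, bob)", "ancestor(bob, ann)"}, "ancestor(tom, ann)"),
]
derived = forward_chain(facts, rules)
```

Prolog's resolution strategy is backward chaining over the same clause form, but the tractability argument is easiest to see in this forward direction.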