I wrote an opinion piece a while ago about why Prolog had fallen out of favor (http://www.kmjn.org/notes/prolog_lost_steam.html), and one of the reasons was that other declarative programming approaches, like forward-chaining, SQL with recursive queries, and LINQ-style dataflow variables, have bitten off a simpler but useful subset of what it bought you. In response, Mark Proctor from Drools argued that Drools is well on its way to importing nearly everything important from Prolog, rather than only a useful subset, including backward chaining with unification: http://blog.athico.com/2011/04/backward-chaining-emerges-in-...
This includes a very interesting mashup of the two ideas, "reactive derivation", which does Prolog-style queries, but where the results, like with truth-maintenance systems or SQL "views", update in real time when the underlying data changes: http://blog.athico.com/2011/06/truth-maintenance-over-direct...
Imo, the simultaneous use of Prolog-style derivation and real-time forward-chaining style truth maintenance is both really interesting and a bit mind-bending, so learning Prolog first might be a good foundation.
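To make the "reactive derivation" idea a bit more concrete, here is a toy Python sketch. It is not Drools' API or algorithm: the `FactBase` and `LiveQuery` names and the parent/ancestor example are made up for illustration, and it naively recomputes each query from scratch on every change, where a real truth-maintenance system would update results incrementally.

```python
# Toy sketch only: not Drools' API or algorithm. All names are hypothetical.

class FactBase:
    """Stores parent(X, Y) facts and keeps registered live queries up to date."""

    def __init__(self):
        self.parents = set()      # set of (parent, child) tuples
        self.live_queries = []    # LiveQuery objects to refresh on change

    def ancestors_of(self, person):
        """Goal-directed query: walk parent facts backwards from `person`."""
        found, stack = set(), [person]
        while stack:
            current = stack.pop()
            for parent, child in self.parents:
                if child == current and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    def insert(self, parent, child):
        self.parents.add((parent, child))
        self._refresh()

    def retract(self, parent, child):
        self.parents.discard((parent, child))
        self._refresh()

    def _refresh(self):
        # Naive truth maintenance: recompute every live query from scratch.
        for query in self.live_queries:
            query.refresh()


class LiveQuery:
    """A query whose result set is kept current, like a SQL view."""

    def __init__(self, facts, person):
        self.facts = facts
        self.person = person
        self.results = facts.ancestors_of(person)
        facts.live_queries.append(self)

    def refresh(self):
        self.results = self.facts.ancestors_of(self.person)


facts = FactBase()
facts.insert("alice", "bob")
facts.insert("bob", "carol")

view = LiveQuery(facts, "carol")
print(sorted(view.results))   # ['alice', 'bob']

facts.insert("zed", "alice")
print(sorted(view.results))   # ['alice', 'bob', 'zed']

facts.retract("bob", "carol")
print(sorted(view.results))   # []
```

The query itself is written once, Prolog-style ("who are carol's ancestors?"), but its answer set changes as facts are inserted or retracted, which is the part that behaves like a view or a truth-maintenance system.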
Allegro Prolog is based on Norvig's implementation, although I'm sure they've done a fair bit of optimizing.
I have over half a dozen Prolog books, my favorite being "The Art of Prolog." Bratko's books, "Prolog and Natural Language Analysis," and "Natural Language Processing in Prolog" are also favorites. I started decades ago with C&M (Clocksin and Mellish), which is also a great place to start.
It has been over 5 years since anyone has hired me to do any Prolog development. I am not sure how widely used it is now.