Why is he interested in programming? It must seem cool to him for some reason. I'd try to tap into that in order to find a direction to go in. And for a 14 year old, it would be helpful to find some heroes. Without the cool and the heroes, it may be hard to devote the time for it to be impactful.
Yes. Continuing beyond monads to learn about category theory is mostly because it is interesting. It is definitely not a prerequisite.
The historical relationship is that Eugenio Moggi wrote a paper in 1988 "Computational lambda-calculus and monads" http://bit.ly/1x9uUHQ that used ideas from category theory to handle the semantics of programming language features. His problem was that the lambda calculus itself failed to adequately handle certain "notions of computation" such as side effects and non-determinism. This inadequacy goes way back to the origins of programming language semantics, starting with Peter Landin's 1964 paper "Mechanical evaluation of expressions" http://bit.ly/1rrBit3. He described a virtual machine (SECD) for evaluating lambda expressions, but had to add special constructs for handling assignment and jumps. In case you don't know what semantics is about, imagine that you had two small programs written in two different languages. How would you determine if they were the same? One way would be to demonstrate their equivalence to a common, mathematically sound language. That's what Landin used the lambda calculus for. But it's only good for a subset of what we understand computation to be (particularly what a von Neumann computer is capable of doing).
That's what prompted Moggi to look to a more encompassing branch of mathematics -- category theory -- to describe semantics. In 1991, he wrote "Notions of computation and monads" http://bit.ly/1viXT6z, which is much shorter and more accessible, but essentially the same as the previous paper. Philip Wadler, one of the original authors of Haskell and long-time functional programming researcher and educator, was inspired by Moggi's work to write (several times) a tutorial showing how Moggi's "monads" could be applied to the restrictions of functional purity. For example, to simulate the incrementing of a global variable representing a counter (of some operation), one could use a "state" monad.
The monad of Wadler (and Moggi) is really pretty simple. First, you need a way of representing one of these notions of computation. In a programming language, this is done with types. Or to be more precise, a 'type constructor', since you will want the computation to be general. For instance, if your computation changes state (eg the counter above), you might construct a function type that, given a value, takes a state (eg an integer representing the counter variable) and returns a tuple consisting of the value and some new state. I should point out that my last sentence is what makes monads so confusing and challenging in languages other than Haskell. They don't have a nice way to represent what I just said. To use Wadler's notation:
type T x = S -> (x, S)
The second component of a monad is a function that takes a value and returns a "computation" (ie, the type described above). And the third component is a function that takes one of the computations and a function from a value to a computation and returns a computation. That's a mouthful, and it requires the language to support polymorphic types and higher-order functions. If your language doesn't support those, or doesn't provide good syntax for them, then monads will be an elusive and difficult concept. But as you can see, a monad is just a pattern made up of these three things. That's all ... almost.
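To make this concrete, here is a minimal Haskell sketch of the three components for the state type above, taking the state S to be Int (the counter). The names `unit` and `bind` follow Wadler's papers, and `tick` is my own illustrative example; this is the pattern itself, not the modern Control.Monad.State library.

```haskell
type S = Int
type T x = S -> (x, S)

-- Component 2: wrap a value in a computation that leaves the state alone.
unit :: x -> T x
unit x = \s -> (x, s)

-- Component 3: run a computation, then feed its value to the next one,
-- threading the state through behind the scenes.
bind :: T x -> (x -> T y) -> T y
bind m f = \s -> let (x, s') = m s in f x s'

-- The counter example: return the current count and increment it.
tick :: T Int
tick = \s -> (s, s + 1)

-- Two ticks in sequence; the state plumbing is hidden inside `bind`.
twoTicks :: T (Int, Int)
twoTicks = tick `bind` \a -> tick `bind` \b -> unit (a, b)

main :: IO ()
main = print (twoTicks 0)  -- prints ((0,1),2)
```

Notice that `twoTicks` never mentions the counter explicitly; that is exactly the simulation of a mutable global variable that Wadler was after.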
Because of the relationship to category theory, a monad (consisting of the three components) must also obey certain rules -- the "monad laws" -- for how these operations are expected to behave when combined in certain ways.
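For the curious, the three laws can be checked at sample points. Below is a sketch in Wadler's `unit`/`bind` vocabulary, with the state-monad definitions spelled out in full so the snippet stands alone; `tick` and `f` are arbitrary examples of mine, not anything from the papers.

```haskell
type S = Int
type T x = S -> (x, S)

unit :: x -> T x
unit x s = (x, s)

bind :: T x -> (x -> T y) -> T y
bind m f s = let (x, s') = m s in f x s'

tick :: T Int
tick s = (s, s + 1)

f :: Int -> T Int
f x = \s -> (x + s, s)

-- Left identity:  unit x `bind` f        behaves like  f x
-- Right identity: m `bind` unit          behaves like  m
-- Associativity:  (m `bind` f) `bind` g  behaves like  m `bind` (\x -> f x `bind` g)
main :: IO ()
main = do
  print (bind (unit 3) f 10 == f 3 10)                -- left identity
  print (bind tick unit 10 == tick 10)                -- right identity
  print (bind (bind tick f) unit 10
           == bind tick (\x -> bind (f x) unit) 10)   -- associativity
```

Spot-checking at a point is not a proof, of course; the laws are equations that must hold for all inputs.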
In fact, monads are pretty cumbersome to implement, especially when there isn't good syntactical support in a language. But they provide a general, functional solution to certain kinds of problems.
Seconded. To elaborate, it is essentially a lightly edited transcription of all the lectures of an undergraduate introduction to categories course, complete with diagrams, exercises, and dialogue, and even including students' answers (right and wrong!) to review questions. It's a very unintimidating format, and the book does very well to help a beginner get firm footing with the foundations.
Since I have painstakingly gone through the series of proofs in Landau's book, I feel I need to weigh in here. On the surface, this book does appear to start from sets and convincingly proceed up to real numbers etc. It's interesting that you include the quote from the beginning about "high-school mathematics", which I think is laughable.
I personally believe that Landau was caught up in the spirit of the times and optimistically believed that math could be built up from "first principles". The famous kickstart to this is Hilbert's 1900 presentation. And it certainly continued up through Nicolas Bourbaki.
In fact, Landau's mathematics is presented in a somewhat archaic style and his proofs are extremely hard to follow in spots, as if he is making unstated assumptions. Overall, it is an interesting, but ultimately thankless, task to go through that book. It is mostly a historical curiosity. The same can be said of Hardy's "Course of Pure Mathematics", which was recommended elsewhere in this thread. I find it hard to believe that anyone who recommends these books has actually read them.
To the OP, while I can relate to the goal from personal experience, after decades of going down a similar path, I can tell you that the history of math is very messy. Our textbooks and notation reflect this messiness. My recommendation is to dive into whatever part strikes your fancy, although it may help to start from where you are. For instance, if you program, you might want to get a book on physics in game programming or learn Haskell.
Because it doesn't refer to a key. It's more of an identifier. It would be considerably more confusing to substitute whatever key you think it should be. Cmd? Alt? Opt? Esc?
When I see "M-" in the literature, I know what that means and translate it to the relevant keystrokes. For instance, M-x for me is actually "C-[ x". I've customized it that way. However, we have a lingua franca for communicating commands despite the personalizations.
FWIW, Peter Landin sowed the seeds in "Mechanical Evaluation of Expressions" in 1964 in his description of recursive discriminated union for lists:
A list is either null
or else has a head (h)
and a tail (t) which is a list.
Rod Burstall picked up on this in "Proving Properties of Programs by Structural Induction" (1968) in an extension to Landin's ISWIM that used pattern matching on these. If you look closely, the pattern matching was present in his if-then-else syntax, but he changed it to the more convenient "case".
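Landin's description translates almost word for word into a modern algebraic data type with case analysis. A small sketch (the names `List`, `Cons`, and `len` are mine; neither paper used this exact syntax):

```haskell
-- "A list is either null, or else has a head (h) and a tail (t) which is a list."
data List a = Null | Cons a (List a)

-- Burstall-style case analysis, with structural recursion on the tail.
len :: List a -> Int
len xs = case xs of
  Null     -> 0
  Cons _ t -> 1 + len t

main :: IO ()
main = print (len (Cons 'h' (Cons 'i' Null)))  -- prints 2
```

The `case` here discriminates on exactly the two alternatives in Landin's definition, which is also what makes Burstall's proofs by structural induction go through: one case per constructor.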
I want to add some color to the other answers saying "no". If your medium-term goal is to use FP for business problems (e.g. perhaps you are responsible for technology decisions), then the OP's path of "Scala for the Impatient" -> "Functional Programming in Scala" -> "Learn You a Haskell" will likely provide the easiest transition and expose you to "real world" solutions in FP. Learning F# will give a similar pragmatic path (although its complications are due to .NET rather than the JVM).
However, if you are interested in "learning FP", as in seeing what all the fuss is about, then there are more direct routes. My own recommendations are for Graham Hutton's "Programming in Haskell" and "Real World OCaml" (https://realworldocaml.org). There are many other good (albeit verbose) resources but these two are the shortest path (IMO) to understanding the "common denominator" and historical underpinnings of functional programming languages and functional programming. Dan Grossman's "Programming Languages" course on Coursera was also quite accessible and comprehensive, but I think it may be closed now.