Yes. The main reason to go beyond monads and learn category theory is that it is interesting. It is definitely not a prerequisite.

The historical relationship is that Eugenio Moggi wrote a paper in 1988, "Computational lambda-calculus and monads" http://bit.ly/1x9uUHQ, that used ideas from category theory to handle the semantics of programming language features. His problem was that the lambda calculus itself failed to adequately handle certain "notions of computation" such as side effects and non-determinism. This inadequacy goes way back to the origins of programming language semantics, starting with Peter Landin's 1964 paper "The mechanical evaluation of expressions" http://bit.ly/1rrBit3. He described a virtual machine (SECD) for evaluating lambda expressions, but had to add special constructs for handling assignment and jumps.

In case you don't know what semantics is about, imagine that you had two small programs written in two different languages. How would you determine if they were the same? One way would be to demonstrate their equivalence to a common, mathematically sound language. That's what Landin used the lambda calculus for. But it's only good for a subset of what we understand computation to be (particularly what a von Neumann computer is capable of doing).

That's what prompted Moggi to look to a more encompassing branch of mathematics, category theory, to describe semantics. In 1991, he wrote "Notions of computation and monads" http://bit.ly/1viXT6z, which is much shorter and more accessible, but covers essentially the same ground as the previous paper. Philip Wadler, one of the original authors of Haskell and a long-time functional programming researcher and educator, was inspired by Moggi's work to write (several times) a tutorial showing how Moggi's "monads" could be used to work within the restrictions of functional purity. For example, to simulate incrementing a global variable that counts some operation, one could use a "state" monad.

The monad of Wadler (and Moggi) is really pretty simple. First, you need a way of representing one of these notions of computation. In a programming language, this is done with types. Or, to be more precise, a "type constructor", since you will want the computation to be general. For instance, if your computation changes state (e.g. the counter above), you might want to construct a function type that, for a given value type, takes a state (e.g. an integer representing the counter variable) and returns a tuple consisting of a value of that type and some new state. I should point out that my last sentence is what makes monads so confusing and challenging in languages other than Haskell. They don't have a nice way to represent what I just said. To use Wadler's notation:

type T x = S -> (x, S)

The second component of a monad is a function that takes a value and returns a "computation" (i.e. a value of the type described above). The third component is a function that takes one of these computations, together with a function from a value to a computation, and returns a computation. That's a mouthful, and it requires the language to support polymorphic types and higher-order functions. If your language does not support them, or doesn't provide good syntax for them, then monads will be an elusive and difficult concept. But as you can see, a monad is just a pattern made up of these three things. That's all ... almost.
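To make the three components concrete, here is a minimal sketch in Haskell of the state example. The names `unit`, `bind`, `tick`, `addCounted` and `example` are my own choices for illustration, and fixing the state `S` to `Int` is an assumption to match the counter.

```haskell
-- The state is assumed to be a plain integer counter.
type S = Int

-- Component 1: the type constructor. A computation that yields an x
-- while taking an old state and returning a new one.
type T x = S -> (x, S)

-- Component 2: turn a plain value into a computation that leaves the
-- state untouched.
unit :: a -> T a
unit a = \s -> (a, s)

-- Component 3: run a computation, pass its result to a function that
-- produces the next computation, and thread the state through both.
bind :: T a -> (a -> T b) -> T b
bind m k = \s -> let (a, s') = m s in k a s'

-- Hypothetical helper: increment the counter and return nothing useful.
tick :: T ()
tick = \s -> ((), s + 1)

-- Add two numbers and count the operation.
addCounted :: Int -> Int -> T Int
addCounted x y = tick `bind` \_ -> unit (x + y)

-- Two counted additions in sequence: (1 + 2) + 10.
example :: T Int
example = addCounted 1 2 `bind` \a -> addCounted a 10
```

Running `example 0` (that is, starting the counter at zero) gives `(13, 2)`: the result 13 and a counter showing that two additions happened. No variable was ever assigned; the "global" counter is just an argument and a result passed along by `bind`.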

Because of the relationship to category theory, a monad (consisting of the three components) must also obey certain rules for how these operations are expected to behave when combined in certain ways.
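The rules are usually stated as three laws: left identity, right identity and associativity. As a rough sketch, they can be spot-checked for the state example in Haskell. Computations here are functions, so they can't be compared directly; instead both sides are run from a few sample states. The names `unit`, `bind`, `sameOn`, `m0`, `f` and `g` are hypothetical and chosen for illustration, and a check like this is evidence, not a proof.

```haskell
type S = Int
type T x = S -> (x, S)

unit :: a -> T a
unit a = \s -> (a, s)

bind :: T a -> (a -> T b) -> T b
bind m k = \s -> let (a, s') = m s in k a s'

-- Two computations count as equal here if they agree on every sample state.
sameOn :: Eq a => [S] -> T a -> T a -> Bool
sameOn states m n = all (\s -> m s == n s) states

-- Arbitrary sample computation and functions, made up for the check.
m0 :: T Int
m0 = \s -> (s * 2, s + 1)

f, g :: Int -> T Int
f x = \s -> (x + s, s + 1)
g x = \s -> (x * 3, s + x)

leftIdentity, rightIdentity, associativity :: Bool
-- Wrapping a value and then binding f is the same as just applying f.
leftIdentity  = sameOn [0 .. 5] (unit 7 `bind` f) (f 7)
-- Binding unit onto a computation changes nothing.
rightIdentity = sameOn [0 .. 5] (m0 `bind` unit) m0
-- The grouping of a chain of binds does not matter.
associativity = sameOn [0 .. 5] ((m0 `bind` f) `bind` g)
                                (m0 `bind` (\x -> f x `bind` g))
```

All three values evaluate to `True` for this state monad. A definition of `bind` that, say, dropped the updated state would fail at least one of them, which is the practical point of the laws.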

In fact, monads are pretty cumbersome to implement, especially when a language lacks good syntactic support for them. But they provide a general solution, functional in nature, to certain kinds of problems.