Knowing a little bit of maths, but not much category theory, the wiki entry for Monads(Category Theory) is pretty clear. First sentence: A Monad is an endofunctor (a functor mapping a category to itself), together with two natural transformations required to fulfill certain coherence conditions. Easy.
Knowing a bit of programming, but not much Haskell, reading the entry for Monad (functional programming) or any blog post titled "A Monad is like a ...", it almost seems as if the author is more confused about what a Monad is than me. The first sentences of the Wikipedia article for example are a word-salad. With dressing.
From an outsider's perspective, it's almost as if a monad in functional programming is not a 1:1 translation of the straightforward definition of category theory, leading to an overall sense of confusion.
The formal definitions are straightforward enough, but the definitions alone don't really motivate themselves. A lot of mathematical maturity is about recognizing that a good definition gives a lot more than is immediately apparent. Someone without that experience will want to fully understand the definition, and fairly so -- but a plain reading defies that understanding. That is objectively frustrating.
> As a bit of an aside, it's interesting that the CS (Haskell, mainly) descriptions of a Monad are much more complicated than the math.
I actually do agree with this, though. I feel like monads are much simpler when presented via "join" (aka "flatten") rather than "bind"; and likewise with applicative functors via monoidal product (which I call "par") rather than "ap". "bind" and "ap" are really compounds of "join" and "par" with the underlying functorial "map". That makes them syntactically convenient, but pedagogically they're a bit of a nightmare. It's a lot easier to think about a structural change than applying some arbitrary computation.
Let's assume the reader knows abut "map". Examples abound; it's really not hard to find a huge number of functors in the wild, even in imperative programs. In short, "map" lets us take one value to another, within some context.
Applicative functors let us take two values, `f a` and `f b`, and produce a single `f (a, b)`. In other words, if we have two values in separate contexts (of the same kind), we can merge them together if the context is applicative.
Monads let us take a value `f (f a)` and produce an `f a`. In other words, if we have a value in a context in a context, we can merge the two contexts together.
Applicative "ap", `f (a -> b) -> f a -> f b`, is "par" followed by "map". We merge `(f (a -> b), f a)` to get `f (a -> b, a)`, then map over the pair and apply the function to its argument.
Monadic "bind", `(a -> f b) -> f a -> f b`, is "map" followed by "flatten". We map the given function over `f a` to get an `f (f b)`, then flatten to get our final `f b`.
It's a lot easier to think about these things when you don't have a higher-order function argument being thrown around.