Copied from a comment of mine from a couple of years ago:
"I've recently finished reading Learn You A Haskell For Great Good!, and I think the author's approach to monads works well. I'd read various metaphorical explanations that only served to further cloud the issue, but thanks to LYAH I finally got it.
In short, he has you using monads long before you understand them (Maybe and IO in particular), then slowly introduces monads by first explaining functors and applicative functors.
In retrospect, the mystique seems crazy. Monads are just not that confusing: they're simply values with added context, along with functions that let you interact with those values without losing the context. It's a shame that this powerful idea is so obscured by its supposed difficulty."
I'm not going to give you any more metaphors (a monad is not a burrito). Monads are objects (in the category theory sense) that in some sense "contain" another object, and let you work on both the level of the contained object and the box around it. This can be terribly useful (and necessary in a pure language like Haskell).
It turns out that every monad is also a functor: it obeys the functor laws (just another scary category theory name for a really simple idea) and thus has the function `map` defined on it. What's really cool is that we can map (transform) the boxed data inside the monad without caring whether there's actually anything inside.
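To make "functor laws" concrete, here's a minimal sketch in Scala (the values and functions are just illustrative): mapping the identity function changes nothing, and mapping two functions in sequence is the same as mapping their composition.

```scala
// Illustrative values, using Scala's built-in Option
val boxed: Option[Int] = Some(3)
val empty: Option[Int] = None

// Identity law: mapping the identity function changes nothing
assert(boxed.map(identity) == boxed)
assert(empty.map(identity) == empty)

// Composition law: mapping f then g equals mapping (g compose f)
val f = (n: Int) => n + 1
val g = (n: Int) => n * 2
assert(boxed.map(f).map(g) == boxed.map(g compose f))
```

Notice the laws hold for `None` too: that's exactly the "don't care whether there's anything inside" property.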
Why is this useful? Consider the Maybe monad in Haskell (or Option in Scala). Let's say you want to get data out of a HashMap. You look up a specific key, but there might not be any data there. In most languages this would be handled by returning null or throwing an exception. Both solutions break type safety, so we want something better. Using Maybe, we don't have to care whether the data is present in order to operate on it:
// This is Scala because my Haskell is a bit rusty, but the
// ideas are the same
import scala.collection.immutable.HashMap
val map = HashMap("hello" -> "goodbye")
// HashMap#get returns, in this case, an Option[String]
val opt1 = map.get("hello")   // => Some("goodbye")
val opt2 = map.get("bonjour") // => None
// What happens when we process these two options?
opt1.map(x => x.toUpperCase) // => Some("GOODBYE")
opt2.map(x => x.toUpperCase) // => None
// Note that `map` here is Haskell's `fmap`. It's different
// than a normal map, which takes in something of type A, a
// function from (A -> B), and produces something of type B.
// `fmap` instead takes in something of type M[A], a function
// from (A -> B), and produces something of type M[B].
// (Haskell's ">>=" is Scala's `flatMap`, which takes a
// function from (A -> M[B]) instead.) The powerful thing
// about this construction is that it lets us chain
// processing steps:
opt1.map(_.toUpperCase).map(_ + "!!!").map("!!!" + _)
opt2.map(_.toUpperCase).map(_ + "!!!").map("!!!" + _)
// If we start out with None, we always have None, but
// we don't need to check for it at every step.
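And here is a small sketch of where `flatMap` (Haskell's `>>=`) earns its keep, using hypothetical lookup tables: when each step itself returns an Option, plain `map` would nest the contexts, while `flatMap` flattens as it goes.

```scala
// Hypothetical data: look up a user's id, then that id's email
val users  = Map("alice" -> "id42")
val emails = Map("id42" -> "alice@example.com")

// map gives us a nested Option[Option[String]]:
val nested: Option[Option[String]] =
  users.get("alice").map(id => emails.get(id))

// flatMap keeps a single layer of context:
val email: Option[String] =
  users.get("alice").flatMap(id => emails.get(id))
// => Some("alice@example.com")

// A missing key at any step short-circuits to None:
val missing: Option[String] =
  users.get("bob").flatMap(id => emails.get(id))
// => None
```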
No conditional code checking for nulls, just type-safe processing of possibly unreliable inputs.
And Maybe/Option is about the simplest monad there is. You're probably more familiar with calling map on lists, and it turns out that List is also a monad, with the empty list playing much the same role as None. Things get trickier when we move to the work-horse monads in Haskell, IO and State, but the same principles apply. The IO monad lets us process the results of IO operations without having to care (until we need to) whether those operations have succeeded. It also, crucially, lets us run IO operations in a determined order by chaining them together. The details are too long for an HN comment, so I'll just direct you to Learn You a Haskell if you haven't read it, or Real World Haskell if you have. Both have pretty good chapters on monadic thinking which will hopefully get you over the hump.
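A quick sketch of the list case (illustrative values only): `map` transforms every element, the empty list propagates "nothing here" just as None does, and `flatMap` concatenates the per-element results.

```scala
val words = List("hello", "world")
val none  = List.empty[String]

words.map(_.toUpperCase) // => List("HELLO", "WORLD")
none.map(_.toUpperCase)  // => List()

// flatMap on lists concatenates the per-element results,
// so each element can expand to zero or more outputs:
words.flatMap(w => w.toList.take(2)) // => List('h', 'e', 'w', 'o')
```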
My best advice is: don't be scared by the category theory. Contrary to popular opinion, you don't need a PhD to program in Haskell :).