Monads are essentially a design pattern, an abstraction, for dealing with items of some arbitrary type (call it "a") in some way.
Instead of working with values of type "a" directly, you work at one step removed. You hand the value of type "a" off to the monad, and it hands you back an opaque box (of type "M a") which logically contains your value. From then on, rather than operating on the value yourself, you hand the box a function that's supposed to apply to its contents, and in return you're given a new box holding the result of applying that function to the contents of the old box. That's basically it.
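To make the box picture concrete, here is a minimal sketch in Haskell. The names Box, wrap, applyInBox and peek are made up for illustration and are not part of any library; peek exists only so the result can be inspected, and a real monad need not offer such a way out.

```haskell
-- A hypothetical box type; every name in this block is illustrative only.
newtype Box a = Box a

-- Hand a value to the box and get an "M a" back.
wrap :: a -> Box a
wrap = Box

-- Hand the box a function; get back a new box with the function applied
-- to the contents of the old one.
applyInBox :: (a -> b) -> Box a -> Box b
applyInBox f (Box x) = Box (f x)

-- For demonstration only: lets us look at the contents of the box.
peek :: Box a -> a
peek (Box x) = x
```

With these definitions, peek (applyInBox (+ 1) (wrap 41)) evaluates to 42.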
The interesting bit is that the monad - the box - sits between the function you're trying to apply and the value it's supposed to apply to. That gives it the opportunity to do clever things. For example, the Maybe monad can keep track of whether the box is empty or not, and only apply the function you pass in if there's actually a value there. The List monad lets you apply operations to multiple values as if they were a single value (i.e. it's like an implied map).
The IO monad takes in your function that's supposed to be applied to (e.g.) whatever was read from input, and hands you back a box that appears to contain the results of that operation. The catch is that the box is opaque - you can never look inside it - so there's no way for you to tell, from the outside, whether or not there were any side-effects. The IO monad can also be looked at another way: you can think of it as queuing up all the I/O actions to perform, in effect building up an imperative program inside the box, which only gets opened up and run once your program has finished its whole pure computation. (This is just a mental model for how IO fits into pure functional programming, which has no side-effects.)
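A small sketch of the Maybe and List behaviour described above, using the standard Maybe and list instances. The helper names halveEven, parsedAndHalved and neighbours are invented for this example.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper: halve a number, but only if it is even.
halveEven :: Int -> Maybe Int
halveEven n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- Maybe: halveEven is only applied if parsing actually produced a value.
parsedAndHalved :: String -> Maybe Int
parsedAndHalved s = readMaybe s >>= halveEven

-- List: the function is applied to every element and the results are joined.
neighbours :: [Int] -> [Int]
neighbours xs = xs >>= \x -> [x - 1, x + 1]
```

Here parsedAndHalved "10" gives Just 5, while parsedAndHalved "7" and parsedAndHalved "abc" both give Nothing, and neighbours [1, 10] gives [0, 2, 9, 11].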
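Putting a value into the box is the unit operator, also called return in Haskell. Passing a function to the box and getting a new box back is the bind operator, spelled >>= in Haskell, though do notation usually hides it. Strictly speaking, the function you give to bind returns a box itself (it has type a -> M b), and bind takes care of not nesting one box inside another; applying a plain function of type a -> b to the contents is fmap.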
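As a rough illustration of return, >>= and do notation, here is the same computation written both ways in the Maybe monad. safeDiv, withBind and withDo are hypothetical names chosen for this sketch.

```haskell
-- Hypothetical helper: integer division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Written with explicit return (unit) and >>= (bind).
withBind :: Int -> Int -> Maybe Int
withBind x y = return x >>= \a -> safeDiv a y >>= \q -> return (q + 1)

-- The same computation in do notation, which the compiler
-- translates into a chain of >>= calls like the one above.
withDo :: Int -> Int -> Maybe Int
withDo x y = do
  a <- return x
  q <- safeDiv a y
  return (q + 1)
```

Both withBind 10 2 and withDo 10 2 give Just 6, and both give Nothing when the divisor is 0.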
However, as the burrito explanation makes clear (and the apple thread confuses spectacularly), there are also some extra bits about when two boxes count as the same, usually called the monad laws. It's fairly obvious that these matter: if you can't look into the box directly, you need some idea of what makes two boxes equal and what makes them different.
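The three standard monad laws (left identity, right identity, associativity) can be spot-checked for Maybe as below. This is only a sketch: f and g are arbitrary sample functions made up for the example, and checking a few inputs demonstrates the laws without proving them.

```haskell
-- Hypothetical sample functions, used only to exercise the laws.
f :: Int -> Maybe Int
f x = if x > 0 then Just (x * 2) else Nothing

g :: Int -> Maybe Int
g x = Just (x + 1)

-- Left identity: boxing a value and then binding f is the same as calling f.
leftIdentity :: Int -> Bool
leftIdentity x = (return x >>= f) == f x

-- Right identity: binding return leaves the box unchanged.
rightIdentity :: Maybe Int -> Bool
rightIdentity m = (m >>= return) == m

-- Associativity: how a chain of binds is grouped does not matter.
associativity :: Maybe Int -> Bool
associativity m = ((m >>= f) >>= g) == (m >>= (\x -> f x >>= g))
```

For example, leftIdentity 3, rightIdentity (Just 3), associativity (Just 3) and associativity Nothing all evaluate to True.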
And without some sufficiently abstract description of the thing, it's easy to fail to generalize from seemingly unrelated concepts such as a Maybe type and I/O.