Sure, it's very easy for me to understand monads mathematically, what kind of structure they are. I don't need any pictures for that; the definition suffices for me.
But that doesn't tell me what they are good for. We invent things for a reason, and I don't see the reason. I do know the stated reason, since it's been told to me many times (some way to wrap I/O while preserving functional purity), but I just don't see how that works.
I would love to find a resource that would explain these things to me in a way I'd understand.
You kind of just have to use them to “get it”, but here’s an analogy: if you write a data structure, you naturally want to make it “iterable” in your language’s usual way, so you can take advantage of a library of generic functions for working with iterable things.
Same goes for monads. If we have N data types and M functions, instead of writing N×M implementations, we can write just N monad instances + M generic implementations.
So it’s kind of tautological, but monads are basically useful because lots of useful things happen to form monads—exceptions, loggers, parsers, dependency injection, persistent state operations, continuations, futures, STM transactions, I/O actions, and so on.
If you can write a data type that represents an API, and implement a couple of interfaces, you get a complete, expressive EDSL for free.
With Facebook’s Haxl, for example, you can write ordinary serial-looking I/O code, and with just a few typeclass instances, instantly get concurrent/async data fetching without changing a single line of business logic. You can’t readily do that without the kind of first-class effects that monads provide.
> Same goes for monads. If we have N data types and M functions, instead of writing N×M implementations, we can write just N monad instances + M generic implementations.
Bear with me for a bit, because I still don't get it (although I'm not the OP, I share the same doubts). In OO terms, if you have N types and M functions, with M different behaviours (function code), you aggregate the N types into an inheritance tree that makes you write 1xM functions (one function against the ancestor of the N types). If you have 2M behaviours, you aggregate the types in two different inheritance hierarchies and then write 2M functions. What expressiveness advantages do monads provide against this OO scenario?
You could implement monad as a class that List, Either, Writer and all the rest extended. There are practical issues: you need higher-kinded types to be able to write the monad interface, which not many OO languages have (Scala is the only one I can think of), and writing the functions as instance methods would require "F-bounded polymorphism", which is tricky ( http://logji.blogspot.co.uk/2012/11/f-bounded-type-polymorph... - that link even has an example of how you'd write Monadic as an interface you extend rather than a typeclass, although it doesn't compile in Scala as implemented today; don't be intimidated by the type lambda, it's a Scala wart). You also need a way to associate the "zero" with the type in general rather than with a particular instance of it (i.e. you need a static virtual method - I think maybe C# supports them?). But in principle there's no reason you couldn't do it.
(There are arguments that type classes are a more expressive way to solve these kind of N*M problems than inheritance - see https://en.wikipedia.org/wiki/Expression_problem - but that's orthogonal really)
In OO terms, I’m talking about having N classes—which for the sake of argument are not related by inheritance—one interface, and M functions implemented in terms of that interface. Clearly it’s cheaper to implement the interface once for each class than to implement each function for each class.
That part has nothing to do with monads. The advantage of monads is the actual functionality they provide, of first-class effects and easy EDSL creation.
> In OO terms, if you have N types and M functions, with M different behaviours (function code), you aggregate the N types into an inheritance tree that makes you write 1xM functions (one function against the ancestor of the N types).
In general, it's O(M+N), because you write the M methods in the ancestor that you talk about, and then O(N) class-specific implementations of the underlying functionality on which the M methods rely.
You don't write O(N) class-specific implementations. You write only one function implementation.
You only have to write N class specific implementations if each class shows a different behaviour and thus requires a different implementation. If one function code can handle N different classes that share 1 behaviour, you encode this behavioural information in the class hierarchies: by inheritance/delegation, you inform the type system that the N classes share a common behaviour. You write the code in one class, and have the N classes either inherit from that class or delegate onto that class (inheritance vs delegation is orthogonal to this discussion).
For example, consider a pattern for reading length-prefixed lists: read a number N, then read N numbers, then sum them.
import Control.Monad (replicateM)

main :: IO ()
main = do
  n <- readLn
  xs <- replicateM n readLn
  print (sum xs)
“replicateM” (replicate + M for monadic) does an action N times and accumulates the results. Above we used it in IO, but we can use it in any monad, for example a parser:
import Text.Parsec
import Text.Parsec.String
import Control.Monad

main :: IO ()
main = do
  line <- getLine
  parseTest sumParser line

sumParser :: Parser Int
sumParser = do
  n <- number
  xs <- replicateM n number
  return (sum xs)

number :: Parser Int
number = spaces *> (read <$> many1 digit)
And if we wanted to abstract over this pattern, we could do so trivially:
lengthPrefixedSum :: (Monad m) => m Int -> m Int
lengthPrefixedSum get = do
  n <- get
  xs <- replicateM n get
  return (sum xs)
Now our first “main” becomes:
main = print =<< lengthPrefixedSum readLn
And “sumParser” becomes:
sumParser = lengthPrefixedSum number
There are many such functions in the standard library, such as “forM” which implements “for each” loops, and “when” which implements conditionals. If your data type is a monad, all of these control structures are available to you for free, so you can easily implement very expressive DSLs.
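For instance, here's a minimal sketch (forEachBig is a made-up name, just for illustration): the code below happens to run in IO, but the helper itself works unchanged in any monad.

import Control.Monad (forM_, when)

-- Visit each element with a monadic action, but only act on the ones
-- above a threshold.  Nothing here is IO-specific.
forEachBig :: Monad m => Int -> (Int -> m ()) -> [Int] -> m ()
forEachBig threshold act xs =
  forM_ xs $ \x ->
    when (x > threshold) (act x)

main :: IO ()
main = forEachBig 10 print [5, 20, 7, 42]   -- prints 20 and 42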
> But that doesn't tell me what are they good for. We invent things for a reason, and I don't see the reason. Now I know the reason, it's been told to me many times (some way to wrap I/O while preserving functional purity),
That's an excessively narrow reason. It's certainly a factor in why monads are front-and-center in Haskell, given the goals of the language, but it's not really the reason monads are interesting or useful. Monads (and Functors and Applicatives as well, as more general constructs) are interesting constructs in programming because they are powerful abstractions that unite disparate, useful data types and which, therefore, allow code that works across those data types. They therefore enable library code in circumstances where, in languages without such abstractions, fill-in-the-blanks template code patterns would be required, so they promote code reuse over copy-and-paste coding.
That IO operations are among the things that can be represented by monads is certainly part of their usefulness, but if IO was all they were good for, they wouldn't be all that interesting.
Have you used jQuery? The way that selectors and method chaining work has a monadic flavor which is very convenient. In the alternate reality where Haskell is running in your browser, if you were creating jQuery from scratch you would recognize that the thing you were making was monadic, and you would make use of the large number of useful functions that fall out from Monad here: http://hackage.haskell.org/package/base-4.8.1.0/docs/Control..., and you would also recognize that much of what you needed in your jQuery API is already provided by those functions, so there'd be no need to re-implement things in an ad hoc way and force your users to learn new useless things. You might even find that what you're trying to create is a mashup, or "stack", of two or three different monadic things that have already been defined. You'd know, because your new jQuery structure obeys a couple of simple rules, that it is well-behaved and reasonable in the presence of those functions (and others that don't exist yet), and you may even notice some optimizations you can confidently make.
It's not about I/O. It's about being able to handle cross-cutting concerns like logging, or error handling, or dependency injection, or async, with ordinary refactor-safe functions rather than global variables. http://m50d.github.io/2013/01/16/generic-contexts.html was my effort.
The reason that the Monad structure is interesting, in my view, is that it's simple and neatly captures the notion of a "computation" that we can compose in different ways.
The core useful operator for monads in Haskell is >>=. It has the following type:
m a -> (a -> m b) -> m b
If you squint, this is like function application with a few extra m's thrown in. Here's normal function application for comparison:
a -> (a -> b) -> b
So what is this extra m useful for? It's like a hole where we get to plug in some custom logic. In a sense, it lets us change what it means to "apply" a "function". This turns out to be useful for a whole bunch of things, not just IO. (Honestly, from a pedagogical standpoint, I think IO is a bit of a distraction!)
For example, take the Maybe type. It's Haskell's nullable: a Maybe a means you either have an a or Nothing.
data Maybe a = Just a | Nothing
Remembering the signature of >>= above, what's the most natural way to implement "application"? If we break the function into cases, it becomes pretty straightforward. Here's the specialized function signature we want to implement:
(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b
x >>= f = ...
If x is Nothing then we don't have anything to pass into f so the whole result has to be Nothing. If x has a value, we can just get that value out and pass it into f normally, returning the final result.
Nothing >>= f = Nothing
Just x >>= f = f x
(In case you're not familiar with Haskell syntax, the above is actually a valid definition of (>>=) for Maybe!)
For Maybe, being a monad gives us a standard way of working with values while automatically dealing with Nothing. It abstracts over repetitive null checking and lets us easily build up Maybe values based on other Maybe values.
Other examples of monads are the same in spirit. The list monad, for example, lets us handle any number of inputs in a way that's similar to Maybe. The State monad similarly lets us combine values while carrying along an implicit state internally.
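Here's a minimal sketch of that last point (assuming the mtl/transformers package's Control.Monad.State; tick and threeTicks are made-up names): a counter is threaded through the computation implicitly, with no mutable variable in sight.

import Control.Monad.State (State, get, put, evalState)

-- Return the current counter value and bump it by one.
tick :: State Int Int
tick = do
  n <- get
  put (n + 1)
  return n

threeTicks :: State Int [Int]
threeTicks = do
  a <- tick
  b <- tick
  c <- tick
  return [a, b, c]

-- evalState threeTicks 0  ==>  [0, 1, 2]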
The rest of the Monad structure (namely the return function and the laws) is just a formal way of codifying behavior that's already intuitive.
So how does this all apply to doing input and output?
Well, the problem in Haskell is that it's a language of evaluating expressions at heart: executing effects makes no more sense there than it would in arithmetic. To work with effects we instead have a special, opaque type, IO; normal expressions get evaluated to IO actions, which can then be run to produce the desired effects.
Critically, the IO type does not have to be a monad. It could be completely self-contained and have custom functions for doing one action after the other. We could imagine something like:
after :: IO a -> IO b -> IO b
which would let you run an IO statement then run a second one and only return the value of the last one. It's like an imperative block of code!
However, we would also like some way of using the results of an IO statement, perhaps assigning them to a name. We can't do this normally because the IO statements are run separately from expression evaluation. We'd have to have some sort of function that could take an IO value, unwrap it and do something with it. And how would we express an interface like this? With a normal function!
doSomething :: IO a -> (a -> IO b) -> IO b
Hey, doesn't that look familiar? It's exactly (>>=)!
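To make that concrete, here's a small sketch: the same IO program written with (>>=) directly and with do-notation, which is just syntactic sugar for it.

-- Written with (>>=) directly:
greet :: IO ()
greet = getLine >>= \name -> putStrLn ("Hello, " ++ name)

-- The same thing with do-notation, which desugars to the line above:
greetDo :: IO ()
greetDo = do
  name <- getLine
  putStrLn ("Hello, " ++ name)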
I'm hand-waving a bit again, but the rest of the monad structure comes up when you try to make sure after behaves consistently and intuitively.
So IO being a monad emerges naturally from the desire to be able to compose actions and depend on their results in a way that's separate from normal variable bindings and expression evaluation.
The causation here is important: it's not that IO is a monad, but rather the IO type (which could exist on its own) happens to naturally and usefully form a monad. But it does a lot of other things too, including some specific capabilities (like spawning threads) that are hard to generalize.
So my point, I suppose, is twofold: monads are useful for combining some notion of computation and IO happens to be an interesting example, but the fact that we wrap statements with external effects in a custom type called IO does not inextricably depend on the idea of a monad.
> Well, the problem in Haskell is that it's a language of evaluating expressions at heart: executing effects makes no sense any more than it would in arithmetic.
You know what else doesn't make sense in arithmetic? Computation. I think it is important to wonder why mathematical definitions of computation didn't come about until the 20th century[1], even though math had had most of its big breakthroughs (to date) by then (or, at least, there was no revolution in mathematics in the 20th century on the same scale as there was in physics). And yet, as advanced as math was, computation was only defined very late in the game. The reason is that, perhaps ironically, the concept of computation is absent from most of math, which is "equational".
But computation is the very core of computer science. So, for example, in arithmetic, 4 and 2+2 are the same thing because 4 = 2 + 2. So in arithmetic, equality means "sameness". But in computer science they are most certainly not the same, because the process of moving from one representation to the other is precisely what a computation does. Not only that, the process isn't even necessarily symmetrical, as moving in one direction may entail a much higher computational complexity than going in the other.
Any distinction between calculations and effects (other than actual IO) is, therefore, arbitrary. Whether the particular distinctions Haskell makes (where waiting for one second may or may not be an effect depending on how it is implemented), in the form Haskell makes them -- using pure functions and equational reasoning and describing effects with monads -- are actually useful and beneficial is a question for psychologists. Given Haskell's design, monads are certainly useful in Haskell, but that utility can't be generalized to languages with other designs.
Computation in mathematics is well represented in the constructive and intuitionistic camps. It was thoroughly examined in the 1930s via the invention of things like the lambda calculus and Turing machines. It was also subsequently rejected by most mathematicians, not because it fails to be equational, but simply because they decided that they don't really care.
Including notions of computation makes equational theories hairier. That might be very much what you want, and there are plenty of ways in which mathematics talks about (2 + 2) being unequal to (4). Typically you instead replace equality with weaker notions going all the way down to the weakest (most intricate) examples of actual algorithmic computation.
So why avoid this kind of stuff? Because you want to pick a "cognitive abstraction", in your terms, which is most useful. It turns out that for a lot of purposes inclusion of computation is too hairy and obscures what people care about. It also turns out that in many cases what they care about is somewhat unaffected by the computational substrate involved.
I really can't say I agree that your beef with equational reasoning makes much sense to me, but I think it's just because you're shooting too broadly. You can easily have problems with the particular choice of granularity of equating taken by the Haskell community and language... but that's just a much smaller thing.
That's what I said. Only in the 20th century for something that might seem to an outsider to be fundamental to mathematics (after all, mathematicians sometimes compute, right?)
> So why avoid this kind of stuff? Because you want to pick a "cognitive abstraction", in your terms, which is most useful.
Oh, absolutely, but there are many ways and many forms of reconciling "equational" mathematics and computational mathematics, and PFP is just one of them. As I've shown in another comment, Prolog takes this much further than Haskell because it's truly equational (or, rather, relational), while Haskell gladly allows a direct (one-way) computation if it defines it as pure, i.e. in Haskell 2+2=>4 is still as different from 4=>2+2 as it is in Java. Other ways are the plain procedural style (which simply lets you name and parameterize a computation), and my favorite (although I don't know how practical it is), temporal logic as used in, e.g., Esterel (and TLA+).
My beef is not with equational reasoning as a concept, and not even with Haskell's flavor of it. Certainly "equationality" in programming languages is a spectrum (as computation is inherently non-equational), and finding useful points on it is a matter for lots of empirical study. My "beef" (if that's what you want to call it) is with the claim that Haskell's particular design is somehow more fundamental or mathematical than other designs, and, to bring us back to the topic of the discussion, that (cognitive) monads are somehow essential. The reality is that there are many (good) justifications for Haskell's design, but they've been made by people to match certain human aesthetics and certain human goals[1] and they don't have any transcendent mathematical justification. They are just a point in the design space. Similarly, monads are only (pretty much) essential in Haskell given its design. Some people try to present Haskell (and monads) as the language (and abstraction) God, nay, better -- Math -- has given us, and that's just wrong. It's just a language like many others, with its own design choices. Besides, everybody knows God uses Lisp.

My other "beef" is with the unjustified belief that a certain way of expressing an algorithm is automatically cognitively optimal just by virtue of it being equational (be it either in the Haskell or the Prolog sense).
[1]: One of them is a particular balance between correctness and specification effort, which, again, is also arbitrary and defined by the limitations of HM type inference.
P.S.
To clarify something I wrote in another comment, Haskell's definition of purity, while fundamentally arbitrary (from a computational theory point-of-view), makes a lot of sense if you've chosen that particular design.
Ahhh, well if you're taking aim at Haskell somehow being "right" then I'm not going to say a thing. :)
I think as far as monads go I quite like your distinction drawn elsewhere between abstractions for mathematical sake and abstractions for human cognitive sake. I think ultimately it has a lot to do with someone's goals as a programmer and person as to which abstractions they should hold and realize.
I think, personally, monads have a very high power-to-weight ratio here, even outside of Haskell, but I've never been one to argue that one cannot live without them. Or even shouldn't.
I will add that in general, there is an inherent mismatch between "equational" math and computation, and there have been various approaches to bring the two closer, e.g. Hoare logic, various process calculi, and temporal logic (my personal favorite). Haskell's approach is basically to use lambda calculus[1] to unify some computations, such as 2+2=4 with 2+2 -computes-> 4 [2] (Prolog goes one step further and unifies 2+2=4 with both 2+2 -computes-> 4 and 4 -computes-> 2+2, as equality is a kind of relation and Prolog is relational), and to treat other computations as monads wrapping various "unsafe" operations[3]. It works, and it's quite elegant if LC as the one true expression of computation is your cup of tea. I think it's clear why this approach can be a fine design choice, but it's also clear why it can be not so fine. Personally I think that design doesn't sit too well with most programmers as a cognitive abstraction.
So the answer to "why are monads useful?" is that if for whatever reason (e.g. it makes programming easier for you) you decide to model computation as Haskell does, as LC and "equational reasoning", monads are invaluable as an elegant description of various computations that wouldn't otherwise be convenient to express. Once you use other approaches, monads (as a cognitive abstraction) quickly lose their usefulness and appeal.
[1]: Which is one common definition of computation and one of the "original three" definitions, but not a very good or powerful description of computation, as it lacks good measures of complexity (it also lacks introspection, which makes it less powerful than the Universal Turing Machine, but that is not a very common capability in programming languages other than Lisp).
[2]: In fact, "computational math" doesn't care in the least that 2+2 equals 4. All it cares about is the computation 2+2 -computes-> 4 (or 4 -computes-> 2+2, the latter is completely separate from and unrelated to the former). It is those various approaches of verifying computation that care about the relationship between 2+2=4 and 2+2 -computes-> 4, but those can (and should?) be completely separate from the programming language.
[3]: That split between some computations and others is exactly why a `sleep` operation in Haskell can be implemented to either be "effectful" (using monads) or "pure" (using only "pure" computation). That artificial distinction doesn't really have a rigorous definition in computation theory; rather, the definition of "pure" is a language-level design choice. For example, in Koka a divergent computation (i.e. infinite sleep) is an effect, but not a finite sleep; in Haskell, neither is an effect. In both Haskell and Koka, a stack overflow is not an effect and neither is memory allocation. OTOH, in Haskell spawning a new thread is considered an effect (which seems really weird to me). I hope that at this point it is clear that the distinction between "pure" and effect is arbitrary and not a fundamental concept of computation (aside, perhaps, from IO).
I appreciate your explanation, but again it fails to actually show it being useful outside of the context of working around Haskell's strict type system. What the OP and I would like to see are concrete examples of why this is a better approach than the way something would be done without explicitly caring about monads (I say explicitly because I know there is a tendency to say that some given structure is a monad and we didn't realise it; but that still doesn't really prove that it is all that useful in the general sense).
One problem, I suppose, is that not everyone speaks the abstract language of monads, and so it is not yet all that useful; maybe there is a bit of a chicken-and-egg problem there.
You know how ES6 has this big hoorah coming because they're finally getting something to make callback hell a little less hellish, async/await?
Well. It's a monad. It's about 30s of work to implement it if you know what you're doing.
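Here's a rough sketch of that idea in Haskell terms (all the names, like Async and runAsync, are made up for illustration; real promise/async libraries differ in the details): a computation is "something that eventually hands a value to a callback", and the Monad instance is what lets you chain those without nesting callbacks by hand.

newtype Async a = Async { runAsync :: (a -> IO ()) -> IO () }

instance Functor Async where
  fmap f (Async run) = Async (\k -> run (k . f))

instance Applicative Async where
  pure x = Async (\k -> k x)
  Async runF <*> Async runX = Async (\k -> runF (\f -> runX (k . f)))

instance Monad Async where
  Async run >>= f = Async (\k -> run (\x -> runAsync (f x) k))

-- "await"-style code: sequential-looking, no nested callbacks.
example :: Async String
example = do
  a <- pure "hello"
  b <- pure "world"
  return (a ++ ", " ++ b)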
Of course, you could implement it as a monad in Javascript as well. It'd be a neat trick, but you'd never love it. The reason is simple: you can't benefit from monads unless you use them so pervasively that everything will integrate together.
And if you do, that integrate together bit works fantastically. People often complain about monads not composing---but I think they're often just misinterpreting a rather technical result. Actually, imo, monads compose incredibly well and it's astounding when you get used to it.
When you use them pervasively, monads mean that you get to "pick your own semantics" on the fly, whenever you want. You can mix and match semantics as is interesting and work with your custom mixes as if they were built into the language.
So people, e.g., talk about how it was easy to write STM because of monads.
It wasn't "because of monads". Of course STM is a monad and anything which looks even halfway like it in any language will also be a monad no matter how hard you try to avoid it. They're "just a pattern".
But when you've got a language which allows you to cut into the "STM monad" exactly and whenever you want, when you've got a hold of the root of the semantics of your language, when you've got programmable semicolons, then there's something really special.
And without that you've got a couple of years of bickering between standards committees to fix what end up being trivial-looking problems.
> I say explicitly because I know there is a tendency to say that some given structure is a monad and we didn't realise it; but that still doesn't really prove that it is all that useful in the general sense
Exactly. That is what I call mathematical- vs. cognitive-abstraction. They are not the same thing, and in the case of programming languages, only the latter matters. A mathematical abstraction is simply the naming of a pattern; a cognitive abstraction is a human solving a problem by explicitly thinking of said pattern. The former is either true or false; the latter is either useful or not (and the utility is completely orthogonal to any mathematical considerations).
> I appreciate your explanation but again it fails to actually show it being useful outside of the context of working around Haskell's strict type system
I think it isn't useful as a cognitive abstraction (and the problem isn't Haskell's type system, but its purity from effects). I believe that monads as a cognitive abstraction are foreign and harmful to imperative languages, which have an equally-powerful cognitive abstraction -- the thread -- that fits much better with the rest of the language's abstractions and I think is much more useful for most people.
> For Maybe, being a monad gives us a standard way of working with values while automatically dealing with Nothing. It abstracts over repetitive null checking and lets us easily build up Maybe values based on other Maybe values.
To expand on that a bit, in a way that might work for you (or might not): I've got a function f a that exists. It works. It does exactly what I want. Unfortunately, the data I've got is a Maybe a, not just an a. So I can't use my nifty function f, because it takes an a. You can feel my annoyance/frustration - so near, and yet so far.
But, in fact, I can use my function f, because Maybe is a monad. That lets me apply f to my Maybe a, without having to change either f or a whatsoever. I don't even have to write the magic code to do this - Maybe already has it.
That operation is called fmap, and it's characteristic of a functor. In Haskell it's called liftM for monads, but it's really the same thing (and I think recent proposals are going to iron out this redundancy). All monads are functors, so all monads provide an operation with that signature.
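A minimal sketch of what that looks like (f here is just a stand-in for whatever function you already have):

f :: Int -> Int
f x = x * 2

-- fmap applies f "inside" the Maybe, without touching f or unwrapping by hand.
result1 :: Maybe Int
result1 = fmap f (Just 21)   -- Just 42

result2 :: Maybe Int
result2 = fmap f Nothing     -- Nothing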
I'll try and keep it simple. It helps make code more readable.
When I see a bunch of Maybe monads, for example, I can instantly know that the programmer's intent was perhaps not just to have a type-safe null and avoid null pointer exceptions, but possibly also to make use of Maybe's toxicity.
What I mean by toxicity is this: you know how in some languages you have "not a number", aka NaN? A NaN multiplied by / divided by / added to / subtracted from anything is NaN, which means that if you have a complex formula and you throw a NaN in anywhere, you get back NaN. Maybe works the exact same way, but with all functions, not just arithmetic operators. That's what I mean by its toxicity.
Without Maybe you'd have to explicitly apply that logic everywhere you'd otherwise apply a monadic "join." The reader would then have to study your code much more carefully to understand that intent, and probably look at it with some skepticism that the toxicity works perfectly every time or is in every place you might expect. And of course not everyone values brevity and expressiveness, but when done right it makes code more readable not less, and Monads done right facilitate that.
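A minimal sketch of that toxicity in action (safeDiv and calc are made-up examples): a single Nothing anywhere in the chain poisons the final result, with no explicit null checks in sight.

safeDiv :: Double -> Double -> Maybe Double
safeDiv _ 0 = Nothing
safeDiv x y = Just (x / y)

calc :: Double -> Double -> Double -> Maybe Double
calc a b c = do
  x <- safeDiv a b     -- if this is Nothing...
  y <- safeDiv x c     -- ...none of the rest runs...
  return (y + 1)       -- ...and the whole result is Nothing

-- calc 10 2 5 == Just 2.0
-- calc 10 0 5 == Nothing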
Maybe's toxicity is just an example of pretty much any "magic" you can build into a Monad. You could for example have a Future Monad that will handle asynchronous control flow for you.
As a side matter, it's not just working around Haskell's strict type system that makes Monads a good fit for IO, but also Haskell's non-strict evaluation: monads help things happen in the right order, which is crucial for IO. That's kind of neat when you think about it - the bridge between the mathematical rigor of pure functional code and the real-world devices that make computing useful. Functional purity makes code easier to reason about and less error-prone.
I'd say STM (software transactional memory) is a compelling example. Your computations that mess with shared variables can only mess with shared variables, the type system enforces this, and the only way you can actually run your shared-memory-changing transaction is through a function `atomically :: STM a -> IO a` (this gives a nice effect: `atomically $ do ...`)
Ha, you're quick on the draw! I think the exploration of `after` and `doSomething` highlight good insights that I missed.
The idea that the `IO` type forms a monad (as opposed to `IO` being a monad just because the wizards said so) is also really important and missing from my explanation. Great work.
Whoever told you it has to do with IO and functional purity vastly oversimplified things. That is just one use case.
Monads take care of a very common computation pattern: wrapping and unwrapping data from a container to do stuff with it. This process is error-prone and leads to code bloat if done by hand whenever needed.
I mean, how often have you had to apply a function to every element in a list of values? fmap takes care of that without having to go into the list, take the values out one at a time, and apply the function you want to use.
Rather than have to unwrap your container to use functions, a monad offers the means to transform functions to use a monadic container as input instead. No wrapping or unwrapping by hand required; using bind, fmap, or ap(ply), you transform lots of functions to use the monadic container and build a seamless pipeline for data to traverse.
How often have you had to deal with a value that might or might not exist--e.g. searching a string for a substring's position? If you had that problem, you might say, well, I'm going to return an integer, but a negative number means no match. And then in all code thereafter, I have to check if the integer is negative and do different stuff. A simple Maybe/Optional monad would take care of all that for you.
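For example, a minimal sketch of that substring search returning a Maybe instead of a sentinel value (indexOf is a made-up name):

import Data.List (findIndex, isPrefixOf, tails)

indexOf :: String -> String -> Maybe Int
indexOf needle haystack = findIndex (needle `isPrefixOf`) (tails haystack)

-- indexOf "lo" "hello"  ==  Just 3
-- indexOf "xyz" "hello" ==  Nothing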
How often have you had to write verbose output alongside a computation? You could pepper your code with print statements, but maybe you want that output controlled by some runtime parameter. Would you want to check at every print statement whether that parameter is true? Or you could use a Writer monad, pass along the verbose output through the computation, with an easy means to transform regular functions into using Writers, and then decide what to do with the output at the end.
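A minimal sketch of that with the Writer monad (assuming the mtl package's Control.Monad.Writer; step and pipeline are made-up names): every step logs alongside its result, and the caller decides at the end what to do with the log.

import Control.Monad.Writer (Writer, tell, runWriter)

step :: Int -> Writer [String] Int
step x = do
  tell ["doubling " ++ show x]
  return (x * 2)

pipeline :: Writer [String] Int
pipeline = do
  a <- step 3
  b <- step a
  return (a + b)

-- runWriter pipeline == (18, ["doubling 3", "doubling 6"])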
Sure, it's not convenient to try to use monads if the language doesn't offer it, or if there's not a library to facilitate it (trust me, I know this; I implemented a slew of templates to use monads in C++).
But programmers do this stuff all the time: unwrap data, do stuff with it, pack it back up. They do this over and over, repeating themselves, and such repetitive code is a maintenance trap waiting to happen.
Monads help you avoid that trap, and they help you take your program and reduce it to "one thing in, one thing out." Now maybe those "one things" are containers, but linearizing the flow of data this way only helps make the program's concept easier to understand.
> Monads take care of a very common computation pattern: wrapping and unwrapping data from a container to do stuff with it.
Isn't this what Functor does with `fmap`, not Monad?
Even if Monad did not exist and you only had Functor and Applicative, you'd be able to compose `pure` and `fmap` to get the same effect as bind, right?
No, plain functors, applicative functors, and monads each deal with a different potential use case.
Look at the specific type signatures:
plain functor - fmap: (a -> b) -> (F a -> F b)
applicative functor - ap: A (a -> b) -> (A a -> A b)
monad - bind: (a -> M b) -> (M a -> M b)
So yes, monads are not unique in doing wrapping and unwrapping data to use with functions--all functors do this--but all monads are also functors, and also applicative functors, so they do everything the other two do, and more. You can't get at anything like bind using just fmap, ap, and pure/unit. But you can write fmap and ap merely in terms of bind and unit.
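A minimal sketch of that last claim: fmap and ap fall out of bind and return, but not the other way around.

fmapViaBind :: Monad m => (a -> b) -> m a -> m b
fmapViaBind f mx = mx >>= \x -> return (f x)

apViaBind :: Monad m => m (a -> b) -> m a -> m b
apViaBind mf mx = mf >>= \f -> mx >>= \x -> return (f x)

-- There is no way to write (>>=) generically from fmap, ap and pure alone:
-- they never let an earlier result choose which "container" comes next.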
So I think that expanding your idea of what monads do might help. Monads enforce a sequencing, and let later computations in the sequence depend on the result of an earlier one. Have a look at the definition of the Monad instance for Maybe:
instance Monad Maybe where
  return x = Just x
  (Just x) >>= k = k x
  Nothing  >>= _ = Nothing
Now consider (because it's a contrived but simple example) that you have some `Map` type (from Strings to Strings, just for convenience), a value of that type `myMap :: Map` and a function `lookup :: Map -> String -> Maybe String`. Let's do the equivalent of `myMap[myMap[myMap["a"]]]`:
case lookup myMap "a" of
  Nothing -> Nothing
  Just v  -> case lookup myMap v of
    Nothing -> Nothing
    Just v' -> lookup myMap v'
This pattern of 1. do a thing, 2. check its result, 3. feed the result into the next step of the computation is what's abstracted over by the monad. We can write the same lookup using `do`-notation:
do
  v  <- lookup myMap "a"
  v' <- lookup myMap v
  lookup myMap v'
If this is making sense, I'd suggest repeating the exercise with `Either`, which is often used to pass an error message in its `Left` constructor (bypassing the rest of the computation); actual results are stored in the `Right` constructor. If that makes sense, I would then look at `Reader` (which lets you do computations with some value, like an environmental context, at hand), and then maybe `State`.
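If you want to check your answer, here's a minimal sketch of the Either version, written against Data.Map rather than the hypothetical Map above (lookupE and chain are made-up names):

import qualified Data.Map as Map

lookupE :: Map.Map String String -> String -> Either String String
lookupE m k = maybe (Left ("missing key: " ++ k)) Right (Map.lookup k m)

chain :: Map.Map String String -> Either String String
chain m = do
  v  <- lookupE m "a"
  v' <- lookupE m v
  lookupE m v'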
If all the functional stuff is clear, then I'd look at the `STM` monad, which implements Software Transactional Memory. STM is IO-like in the sense that you are manipulating shared state, but you only have a restricted set of tools to do it with - the type `STM a` means "a transaction that fiddles with some shared memory, then returns a value of type `a`". To actually execute the transaction, you have to turn it into an `IO` action using the function `atomically :: STM a -> IO a` and put it in a side-effecting computation somewhere.
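A minimal sketch of an STM transaction (assuming the stm package's Control.Concurrent.STM): the transfer either happens completely or not at all, and the only way to run it is through atomically.

import Control.Concurrent.STM

transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  balance <- readTVar from
  writeTVar from (balance - amount)
  modifyTVar' to (+ amount)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  atomically (transfer a b 30)
  readTVarIO a >>= print   -- 70
  readTVarIO b >>= print   -- 30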
Hopefully that clears things up a bit: `IO` is just a special case of this sequencing strategy, but the really cool things happen because we can define what sequencing computations means for different data types. I think this is what some haskellers mean when they say "monads let you overload semicolons".
The `Maybe` monad presented here lets you create a language without null references. That's good enough reason for me. Rust doesn't have HKTs like Haskell does, but it still uses the same things that are called monads in Haskell. They just don't want to call them monads.
Phrased another way, in any language Monads work as a design pattern, and Rust has built-in implementations of that pattern; but some languages (Haskell, etc.) support Monads as a construct against which you can program directly, which is a more powerful abstraction than merely implementing it as a design pattern.