The Algebra of Algebraic Data Types
136 points by jasonrdsouza on June 25, 2015 | 40 comments

A few remarks on this stuff.

1. Very closely related is the pure-mathematical technique of generating functions, where you take a sequence (let's say the Fibonacci sequence 0, 1, 1, 2, 3, 5, 8, 13, ...) and use its elements as coefficients in a power series (0 + 1x + 1x² + 2x³ + 3x⁴ + 5x⁵ + 8x⁶ + 13x⁷ + ...), and then notice that (e.g.) shifting the sequence by 1 is the same as multiplying by x. So if F is the series for f(n), then x·F is the series for f(n-1) and x²·F is the series for f(n-2), apart from some corrections for very small n. This gets you F = (x + x²)F + x, or F = x/(1 - x - x²), and now you can factorize that quadratic and write F in the form p/(1-ax) + q/(1-bx), which immediately gets you an explicit formula for the Fibonacci numbers (the one with all the sqrt(5)s in it).

So let's translate this into algebraic data types. We get

```haskell
-- F(x) = x + x F(x) + x^2 F(x)
data FibTree a = Leaf a | Pair a (FibTree a) | Triple a a (FibTree a)
```

and the number of FibTrees with n "a"s in it is exactly the nth Fibonacci number.

(There is a theoretically important data structure called a Fibonacci heap, but I don't believe there is any connection between these and these "Fibonacci trees", other than that in both cases there is something you can count with Fibonacci numbers.)

2. Suppose you consider a rather boring binary-tree type like this:

```haskell
data Tree = Leaf | Pair Tree Tree
```

whose corresponding identity is T = 1 + T², or T² - T + 1 = 0. If you're a mathematician you will quickly see that the solutions (in the complex numbers!) of this are sixth roots of unity; that is, they have T⁶ = 1.
This clearly can't hold in terms of types (it would mean an equivalence between 6-tuples of trees and objects of the "unit type", there being only one such object), but it turns out that T⁷ = T is "true"; you can find an explicit construction for it on page 3 of http://arxiv.org/abs/math/9405205 if you like.

3. It doesn't appear that the author of the OP ever got round to writing about negative and fractional types, but other people have. For instance, this paper http://www.cs.indiana.edu/~sabry/papers/rational.pdf which I have only glanced at (it appears to do this in a more restrictive context where there aren't general ADTs with recursive definitions; negative types seem to involve running computations backwards somehow, and reciprocals are "constraints").

Apologies if I've got my Haskell syntax wrong anywhere.

Just as a breadcrumb for readers who are curious about the generating function stuff: https://en.wikipedia.org/wiki/Combinatorial_species (see also https://www.cis.upenn.edu/~byorgey/pub/species-pearl.pdf for a nice FP-flavoured tutorial)

If you're like me, and you didn't immediately understand why the number of inhabitants for "a -> b" is "a^b", and needed help for the answer, then here's my solution to the author's question ("Why are there eight inhabitants of Tri -> Bool, but nine of Bool -> Tri? It helps to write out each possible function.").

The nine inhabitants of Bool -> Tri: https://gist.github.com/acbart/5d3fdfd8d363af26a59c
The eight inhabitants of Tri -> Bool: https://gist.github.com/acbart/c695355880a706a4ff59

Pedagogical note: If you're going to pose a question in an online presentation of material, try to have a solution (automatically checked or at least statically available). A huge part of learning is having the learner participate (which you did!) and then be assessed. I got help from here; @tikhonj suggested this in another comment: http://chris-taylor.github.io/blog/2013/02/10/the-algebra-of...
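As a quick check of the FibTree claim in the first comment, here is a sketch (the function names are mine) that enumerates all FibTree shapes containing exactly n values and confirms the counts follow the Fibonacci sequence:

```haskell
-- The FibTree type from above, whose equation is F(x) = x + x F(x) + x^2 F(x).
data FibTree a = Leaf a | Pair a (FibTree a) | Triple a a (FibTree a)
  deriving Show

-- All trees holding exactly n copies of ().
trees :: Int -> [FibTree ()]
trees n
  | n <= 0    = []
  | otherwise = [Leaf () | n == 1]
             ++ [Pair () t      | t <- trees (n - 1)]
             ++ [Triple () () t | t <- trees (n - 2)]

-- Counting trees by size recovers the Fibonacci numbers.
fibCounts :: [Int]
fibCounts = map (length . trees) [1 .. 8]  -- [1,1,2,3,5,8,13,21]
```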
It may be helpful to think about this as a table. 9 functions from 2 to 3: how many 2-digit numbers are there in base 3? Each row represents a function from {True, False} to {A, B, C}.

```
False  True
-----  -----
A      A      Function 1   f₁(False)=A  f₁(True)=A
A      B      Function 2   f₂(False)=A  f₂(True)=B
A      C      Function 3   etc.
B      A      Function 4
B      B      Function 5
B      C      Function 6
C      A      Function 7
C      B      Function 8
C      C      Function 9
```

8 functions from 3 to 2: how many 3-digit numbers are there in base 2?

```
A      B      C
-----  -----  -----
False  False  False  Function 1   f₁(A)=False  f₁(B)=False  f₁(C)=False
False  False  True   Function 2   f₂(A)=False  f₂(B)=False  f₂(C)=True
False  True   False  Function 3   etc.
False  True   True   Function 4
True   False  False  Function 5
True   False  True   Function 6
True   True   False  Function 7
True   True   True   Function 8
```

Imagine a collection of a objects (let's call it A) and another collection of b objects (called B). A function is a set of arrows matching each object in A to an object in B. Each object in A has only one arrow originating from it. For each object in A, you have b ways to draw an arrow to an object in B. As there are a objects in A, the total number of ways to draw the arrows is b·b·b·...·b, a times. Hence the number of functions from A -> B is b^a.

> the number of inhabitants for "a -> b" is "a^b"

You mean "b^a". (wz1000 has it right, but he didn't point out that the original statement had them flipped.) Maybe that part would've been easier with Bool and Unit. It's obvious that there is only one function from Bool to Unit, and two functions from Unit to Bool.

Hi acbart, thanks for the suggestion. This is now fixed. https://codewords.recurse.com/issues/three/algebra-and-calcu...

Very cool! Thank you for writing this article, I found it very helpful and extremely interesting. I also found it a very nice read in general.

Thanks - wasn't obvious to me either. Am I missing something though? I only see six inhabitants in your gist for Tri.

You're right, I missed two cases! Thanks a bunch.
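The tables above can also be checked by brute force. A function from a finite domain is determined by its list of outputs, one per domain element, so enumerating functions from a set of size a to a set of size b is enumerating length-a lists over b choices. A minimal sketch (the function name is mine):

```haskell
import Control.Monad (replicateM)

-- replicateM a [1..b] produces every length-a list of values drawn from
-- [1..b], i.e. every "column of outputs" a function could have: b^a lists.
countFunctions :: Int -> Int -> Int
countFunctions a b = length (replicateM a [1 .. b])

-- countFunctions 2 3 == 9   (functions Bool -> Tri)
-- countFunctions 3 2 == 8   (functions Tri -> Bool)
```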
Second pedagogical note: we just demonstrated the value of cooperative learning, which can really help a weak assessment. Similar to pair programming!

To add another way to look at it (thinking of a function as a set of tuples): with tuples (A, B), you're listing As uniquely paired with Bs. With functions (A -> B), you're listing unique sets of (As non-uniquely paired with Bs), so each A can go to any value of B, with replacement.

A nice algebraic law that holds in data types is

```
a -> (b, c) = (a -> b, a -> c)
(b * c)^a = b^a * c^a
```

This is useful as common argument factoring and can optimize repeated function calls. Another is the famous

```
(c^b)^a = c^(b * a)
a -> b -> c = (a, b) -> c
```

which is curry/uncurry, or the "product/exponential adjunction" in category theory. Finally, consider

```
a^b * a^c = a^(b + c)
(b -> a, c -> a) = Either b c -> a
```

which essentially says that given a tuple of functions we can choose which one we want to evaluate. Choose b and c to be unit and notice that `() -> a` is essentially `a` (after all, a^1 = a) and you'll have

```
a * a = a^2
(a, a) = Bool -> a
```

which is the seed of a useful way of memoizing pure functions (look up memo-combinators for many more details).

It's worth noting that if you take the expansion for List:

```
L a = 𝟏 + a × (L a)
    = 𝟏 + a + a² + a³ + ⋯
```

and substitute 𝟏 for a:

```
L 𝟏 = 𝟏 + 𝟏 + 𝟏² + 𝟏³ + ⋯
    = 𝟏 + 𝟏 + 𝟏 + 𝟏 + ⋯
    = 𝟏 + (L 𝟏)
```

you get ℕ:

```
ℕ = 𝟏 + ℕ
```

The zero-power rule (x⁰ = 1) from classic algebra also holds; there is exactly one function from 𝟎 to any type (or, in set-theoretic terms, from the empty set to any set).

There was an interesting paper on "closed semirings" posted here a few days ago. Clearly, the product and coproduct on types form a semiring, and interestingly, List is the closure! The "Analytic Combinatorics" book by Flajolet and Sedgewick is also relevant here.
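Two of the laws above can be witnessed directly as pairs of mutually inverse Haskell functions. A minimal sketch (the function names are mine; curry/uncurry themselves are already in the Prelude):

```haskell
-- (b * c)^a = b^a * c^a: a function into a pair factors into a pair
-- of functions, and back.
factor :: (a -> (b, c)) -> (a -> b, a -> c)
factor f = (fst . f, snd . f)

unfactor :: (a -> b, a -> c) -> (a -> (b, c))
unfactor (g, h) a = (g a, h a)

-- a * a = a^2, i.e. (a, a) = Bool -> a: tabulating a function over Bool
-- as a pair is the seed of memoization.
tabulate :: (Bool -> a) -> (a, a)
tabulate f = (f True, f False)

index :: (a, a) -> (Bool -> a)
index (t, _) True  = t
index (_, f) False = f
```

Composing `index . tabulate` gives back a function extensionally equal to the original, but with both results computed once up front, which is exactly what memo-combinator libraries generalize.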
Not to detract from your point, but you don't need the expansion for that. Directly substituting 𝟏 in the definitional equation works just fine:

```
L a = 𝟏 + a × (L a)
L 𝟏 = 𝟏 + 𝟏 × (L 𝟏)
L 𝟏 = 𝟏 + (L 𝟏)
L 𝟏 = 𝟏 + 𝟏 + 𝟏 + 𝟏 + 𝟏 + 𝟏 + ...
ℕ   = 𝟏 + 𝟏 + 𝟏 + 𝟏 + 𝟏 + 𝟏 + ...
```

The final expansion is worth writing, however, since it notes how there are unboundedly many naturals, each a different component in this infinite disjoint sum.

> Not to detract from your point, but you don't need the expansion for that

I just used the expansion given in the parent. It also makes it apparent that List is the closure.

That's true!

Actually I just realized that the law does not hold; the type `Void -> Maybe Void` has two inhabitants, not one:

```haskell
f1 x = Nothing
f2 x = Just x
```

Both of them are regarded as the same function (ignoring bottom), because there is no way to differentiate between them: you can never supply them with a value of type Void in order to see their result.

They are similar in that they cannot be applied to any non-bottom value, but they are definitely not the same function.

> they are definitely not the same function

This depends on what equivalence you're using, and the only one in which it's true (definitional equality) isn't very interesting. Extensionally, the functions are identical.

Ignoring bottom, the functions arguably have no extensionality, and therefore are only vacuously equivalent in that sense. `Just x` cannot be constructed, since there is no `x` which has type `Void`.

Well, there's the one in context that does! Except this function can never be called in an empty context, so you ultimately are stuck. `Just x` can be constructed given an `x`. The fact that no `x` can be given is somewhat irrelevant to the definition of the function.
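For the record, both definitions typecheck against Data.Void, along with a third written via the canonical empty function. A sketch:

```haskell
import Data.Void (Void, absurd)

-- Both candidate inhabitants of Void -> Maybe Void compile; extensionally
-- they are indistinguishable, since no (non-bottom) argument of type Void
-- can ever be supplied to tell them apart.
f1, f2 :: Void -> Maybe Void
f1 _ = Nothing
f2 x = Just x

-- absurd :: Void -> b is the empty function into any type, so it also
-- gives an inhabitant of this type.
f3 :: Void -> Maybe Void
f3 = absurd
```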
I love that 0⁰ = 1, since you can define

```haskell
voidToVoid :: Void -> Void
voidToVoid = id
```

You have to be careful when using Haskell for this sort of thing, because you can, for example, have a term of type Void:

```haskell
inhabitantOfZero :: Void
inhabitantOfZero = fix id
```

(Side note: the empty type can also be interpreted as the type of non-terminating computations, and in fact typing 'fix id' into ghci will cause it to hang.)

It turns out that 𝟎⁰ = 𝟏 is true, though, and if I'm not mistaken, this is true of semirings in general (where the exponent has a natural interpretation; of course you can define exponentiation this way for any semiring you like). This sole inhabitant is the empty function, which is easiest to explain in set-theoretic terms: if you define a function from A to B to be a subset of A × B such that each a ∈ A appears exactly once (for example, the negation function on Bool would be the set {(true, false), (false, true)}), it's clear that there's exactly one function from the empty set to the empty set, because there is only one subset of ∅ × ∅ (i.e., ∅), and for this set the above property is vacuously true.

For completeness, here's what the empty function looks like in Agda, which treats types much more carefully than Haskell (to understand why, look up intuitionistic logic/type theory and the Curry-Howard isomorphism):

```
module EmptyFunction where

data ⊥ : Set where
  -- No constructors, since ⊥ is empty

inhabitant-of-𝟎⁰ : ⊥ → ⊥
inhabitant-of-𝟎⁰ ()
```

Yeah, things start to break down when you include bottom (and fix id = _|_, since it diverges). I'm not sure exactly how to work with polymorphism in the function-as-subset model you described, but I'm guessing it would be something like this: Let id_A ⊆ A × A s.t. for all a ∈ A, (a, a) ∈ id_A. If A = ∅, then id_A = ∅, which is exactly the result we'd expect, so I don't see any issue.

> the empty type can also be interpreted as the type of non-terminating computations

This is true, in a sense, but also a little bit
misleading, I think, since technically every type in Haskell is the set of terminating values with that type, plus _|_.

> If A = ∅, then id_A = ∅, which is exactly the result we'd expect, so I don't see any issue.

The issue isn't with the example you gave, but with the idea in general that being able to produce a term of a given type in Haskell means that that type is necessarily inhabited.

> This is true, in a sense, but also a little bit misleading I think, since technically every type in Haskell is the set of terminating value with that type plus _|_

Well, ⊥ is empty, so naturally every type is itself plus ⊥. In fact, that leads to an important observation, which is that any computation in Haskell could potentially be non-terminating, not just those with type Void, and so it's really not the inclusion of the bottom type that causes things to break down, but the ability to produce non-terminating computations (more specifically, non-coterminating, i.e. non-productive "lazy" computations).

Unfortunately, the alternative, making it impossible to produce non-terminating computations, is more than anything a matter of rejecting everything the compiler can't prove terminates, which is kind of limiting, termination being undecidable in the general case. On the other hand, the sorts of computations humans design are generally much more well-structured than an arbitrary Turing machine, and so probably much easier to prove (co)termination for.

Worth noting that this isn't why they are called algebraic data types, which was a mistake I made for a while. The "algebraic" in algebraic data types refers to this notion of "algebra": https://en.wikipedia.org/wiki/Initial_algebra#Use_in_Compute...

As someone new to this, I have to say the first paragraph doesn't do a good job of introducing the concepts. I was completely lost in how to read this. Isn't + == "or" and × == "and", like in boolean algebra? What does 'either' even mean here, if not 'or'? So then why is the Bool type 1 + 1 == 2?
Shouldn't it be 1 + 0 == 1 if you want to map the possible states? Same question for "maybe", which seems to work like in Rust or Swift: shouldn't it be a + 0? If not, what does "a or unit" mean? So many questions...

Take a look at this other blog post with the same title, which I think does a better job of explaining the core ideas.

Thank you, that clears lots of things up. Edit: Wow, even currying is explained nicely. This blog entry is pure gold.

> Isn't + == "or" and x == "and" like in boolean algebra?

In a sense, yes. "×" denotes the product of two types, i.e., the type of pairs of one value of the first type and one of the second. "+" denotes the disjoint union of two types: the type of values which are taken from the first type or the second type. Importantly, when you have "a + a", taking a value from the left "a" is distinct from taking a value from the right "a".

> So then why is the Bool type 1 + 1 == 2? Shouldn't it be 1 + 0 == 1 if you want to map the possible states?

"0", "1", and "2" here do not denote states; rather, they denote types. In particular, "0" is the type with zero values, "1" is the type with one value, "2" is the type with two values, etc. The reason that "1 + 1 = 2" is true is that there are two ways to form a value of type "1 + 1": take the one possible value from the left "1", or take the one possible value from the right "1". The Bool type, by definition, has two values, so it needs to be given by the type "2", or equivalently, "1 + 1".

> Same question for "maybe" that seems to work like in Rust or Swift - shouldn't it be a + 0? If not, what does "a or unit" mean?

"a + 0" is, as the algebra suggests, just "a". This is because the only way to form a value of type "a + 0" is to use a value from "a"; there are no values in "0" to choose from. The Maybe type is effectively the underlying type plus a single "None" value. Thus, we call it "a + 1"; there is a "Some" for each value in "a", and a None for the one (i.e., the only) value in "1".
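The counting above can be made concrete by exhaustively listing inhabitants in Haskell (a sketch; the list names are mine):

```haskell
-- 1 + 1 = 2: the two ways to form a value of Either () () mirror the two
-- values of Bool. Taking () from the Left is distinct from the Right.
inhabitantsOfOnePlusOne :: [Either () ()]
inhabitantsOfOnePlusOne = [Left (), Right ()]

-- a + 1: Maybe Bool has |Bool| + 1 = 3 inhabitants, a Just for each
-- value of Bool plus the single Nothing.
inhabitantsOfMaybeBool :: [Maybe Bool]
inhabitantsOfMaybeBool = Nothing : map Just [False, True]
```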
Got it, thank you. The other link posted by tikhonj is also very good at explaining. Your phrase

> "0," "1," and "2" here do not denote states; rather, they denote types. In particular, "0" is the type with zero values, "1" is the type with one value, "2" is the type with two values, etc.

should IMO be included in OP's post in some form or another. Also an explanation for Void would be helpful, as in: do not confuse 'Void' in Haskell with e.g. 'void' in C or 'None' in Python; those rather correspond to 'unit' in Haskell.

Reading the article reminded me of reading about Lisp macros (which I have been doing a bit of recently). The shift from reasoning about values to reasoning about types feels a bit like the shift from reasoning about runtime to reasoning about compile time. The interesting takeaway for me is that smaller data types are going to make it easier to reason about the state space. In turn this reminded me of Perlis:

Epigram 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.

In that the state space is so massive.

(Formal) differentiation also makes sense for grammars, and in particular gives rise to an interesting way to match regular expressions. The paper introducing this concept is J. A. Brzozowski's "Derivatives of Regular Expressions" (https://dl.acm.org/citation.cfm?id=321249), but it was long forgotten. Recently the use of derivatives to implement regular expressions has become popular again.

> Recently the use of derivatives to implement regular expressions has become popular again

And they're not limited to regular languages; with the right computational tricks (lazy evaluation, coinduction, whatever you want to call it), they can be used with context-free languages in general: http://matt.might.net/papers/might2011derivatives.pdf
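Brzozowski's idea is short enough to sketch directly. Below is a minimal derivative-based matcher (the `Re` type and names are mine; it does no simplification of intermediate expressions, so it's a pedagogical sketch rather than an efficient engine):

```haskell
-- A regular-expression AST: empty language, empty string, literal,
-- alternation, concatenation, Kleene star.
data Re = Empty | Eps | Chr Char | Alt Re Re | Seq Re Re | Star Re

-- nullable r: does r match the empty string?
nullable :: Re -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Chr _)   = False
nullable (Alt r s) = nullable r || nullable s
nullable (Seq r s) = nullable r && nullable s
nullable (Star _)  = True

-- deriv c r: the regex matching exactly the strings w such that r matches c:w.
deriv :: Char -> Re -> Re
deriv _ Empty    = Empty
deriv _ Eps      = Empty
deriv c (Chr c') = if c == c' then Eps else Empty
deriv c (Alt r s) = Alt (deriv c r) (deriv c s)
deriv c (Seq r s)
  | nullable r = Alt (Seq (deriv c r) s) (deriv c s)
  | otherwise  = Seq (deriv c r) s
deriv c (Star r) = Seq (deriv c r) (Star r)

-- A string matches if, after differentiating by each character in turn,
-- the residual regex accepts the empty string.
matches :: Re -> String -> Bool
matches r = nullable . foldl (flip deriv) r
```

For example, `matches (Seq (Chr 'a') (Star (Chr 'b'))) "abb"` is True, while the same regex rejects "ba". Adding smart constructors that simplify (e.g. `Alt Empty r = r`) is what makes the technique practical.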