Hacker News new | past | comments | ask | show | jobs | submit login
Category Theory, Syntactically (2016) (hedonisticlearning.com)
137 points by colinprince on April 27, 2017 | hide | past | favorite | 68 comments

I don't understand why anyone thinks step 1 in conveying any concept is definitions. This is not how brains work. Definitions are how you formalize your understanding once you've developed it. If this is meant to be an introduction, it's entirely backwards.

Also, if your examples are made only of the thing you're trying to explain, they're not examples.

Introduction to Cake Making

A cake is a baked good containing one or more edible substances, created by a baker. Cakes are made in a kitchen, using appliances, implements and utensils. An appliance is a powered mechanical device used to augment the baker's abilities, different from an implement and a utensil. Implements and utensils are unpowered tools used by hand to produce batter from ingredients. A batter is a mixture produced in stages, consisting of a base starch, proteins and flavoring agents.

For a concrete example of a cake, consider a cake containing one or more fruits. The fruit may be uniformly distributed among the cake's contents or clustered in toppings. This produces the cake's characteristic flavor and color, though color may also result from artificial coloring agents.

Through prolongued exposure to heat, we can transform batter into a cake. Molds may be used for this. Any bowls, spoons or whisks used in the process may contain residual batter suitable for direct consumption. [1]

In some ways I've stopped just when things were about to get good, but now you should have a solid understanding of the fundamentals.

[1] Licking is actually quite natural in this context.

This is Actually Good, and I'd love more stuff written that way!

No, seriously. Just look what this example does - it recursively provides a short description for each of the important concepts in the process of baking cakes. After reading this, I now feel I have a decent map of the topic - both what the things / processes involve and what they don't involve. I've also been introduced to domain terminology that will help me understand other sources.


As for original question, why definitions - I believe it's good to have precise definitions next to casual explanations of the same things, just so that the reader can quickly connect the casually discussed aspect with the precise definition, and answer some of their own questions (which authors can't predict) through thinking.

No it's terrible, and the fact that you don't notice is exactly the problem. See also: monads. If you understand the basic process of what's going on (preparing food) and what it's meant to do (produce a delicious treat), formalizing your understanding in this way feels like concise and clear insight. If you don't yet know anything, it will be meaningless word salad where each 'answer' produces a multitude more questions.

Note e.g. that I never once cited an actual cake, I merely provided an example of a category of cakes. This too is a problem with mathematics, so in love with their abstractions, they no longer notice the lack of the concrete.

This would actually be a very good introduction for a race of alien visitors who'd never heard of cakes, baking, or kitchens.

No it wouldn't for the same reason the article isn't a good introduction -- it relies on concepts that a naive reader doesn't understand. In your example the aliens are unfamiliar with kitchens, but the text mentions kitchens without ​any explanation. Similarly the article mentions and uses "sort" without any explanation. It seemed to be a fundamental concept but without any idea what a sort was I gave up. Every article on category theory does this.

There are 2 sentences between the first use of 'sort' and its explanation, which is that they are symbols with no interpretation. It looks like he defines function symbol before sort, presumably because function symbols are more important than sorts.

This isn't true.

>Sorts and function symbols are just symbols. Something is a sort if it is in the collection of sorts. Nothing else is required.

If you gave up before reading this, then you didn't even read the 4th paragraph.

If they are Vulcans or androids.

More recipe books should be written by mathematicians ;)

Euginia Cheng, a category theorist, has written Cakes, Custard and Category Theory: Easy Recipes for Understanding Complex Maths, and How to Bake Pi: An Edible Exploration of the Mathematics of Mathematics.

This first title disappeared in favor of the second that is "How to bake Pi" I guess pure marketing move

Indeed, like "Burritos for the Hungry Mathematician" http://emorehouse.web.wesleyan.edu/silliness/burrito_monads....

After the proof that one may substitute slivered almonds for chopped crystalized fruit, I have a lemma that says one may substitute slivered almonds for a subset of the chopped crystalized fruit.

So basically, the Guy Steele approach to cooking


Amazing. I couldn't agree more this is obviously a useless way to learn baking at first.

har! please do keep going

Brains can work that way. It is a skill, and this skill even has a name: mathematical maturity.

Basically, after you've been exposed to a lot of mathematics you develop a skill where you can just accept abstract definitions and go with it.

I would expect no less even in an introductory text. Mathematics is meant to be precise and formal mathematics even more so. Prosaic pedagogy might suit a liberal arts introduction but it won't help the reader to convince themselves that the statements and theorems proposed are true and valid.

The syntactic approach is really useful as a programmer. A Logical Approach to Discrete Math[0] teaches programmers how to develop proofs in this style using logic and syntactic substitution. Once shown how easy it is to achieve rigour it is hoped that work-a-day programmers can formulate their own proofs and develop stronger specifications using mathematical tools.

I'm not a formally trained mathematician and am completely self-taught as a programmer but with some background in set and graph theory I found it approachable. It's certainly a good companion text to learning more practical systems like TLA+ or Lean.

[0] https://www.amazon.ca/Logical-Approach-Discrete-Math/dp/0387...

Great that it works for you, but it won't convince anyone of anything if they don't understand what it means or why they should care.

As a writer or a teacher, if you want to reach a broad audience, give some simple, concrete examples, then explain the abstraction that covers them all. Assuming mathematical maturity won't help you reach your audience.

This makes me think about formalised notation. If you have a mathematical formalised notation, you can express things either more expressively, or with less ambiguity (or both) than without it.

One thing that is required by a formal notation is that before you write a statement in your formalised notation, you need to define the terms and symbols and how they interact with one another or the statement makes no sense to the reader.

I would say this is true even when you're trying to use words rather than some type of syntax, because what you are trying to do is paint a very specific picture (or concept) in peoples minds, and you need these definitions there or people wouldn't understand them.

If you're trying to be concise and to constrain the space of possible interpretations of the words you have written, to only contain the concept that you are trying to explain, and not contain other valid interpretations of the words, that may disagree with the concept you are trying to get the reader to explain, the easiest way to do so is probably with up-front definitions of the terms you are about to use.

TL;DR I posit that an up-front definition of terms simplifies explanation of formal notation (or words) which is necessary to constrain the semantic idea-space to only the valid definition trying to be conveyed.

This is explicitly a syntactic approach. It's treating this as a game you play with symbols. I found it very helpful, so maybe some brains work differently from others.

> Definitions are how you formalize your understanding once you've developed it

You probably already "developed it" if you're looking back on even just a few years of full-time coding, typically "multi-paradigm" anyway (imperative, "functional" when you dig out SQL, denotative/descriptive for the common web DSLs html/css).

You never formalized or looked for existing formalizations of your organically acquired & accumulated intuitions and "instincts" (covering numerous little assimilated-by-osmosis laws and relations and "ittt" implications) because no real need ever arose, nothing in the real world prompted you to, and if the passing thought to do so ever occurred to you briefly, you saw no particular promise in going down that rabbit hole. I think that's the bulk of C.T. (just covering logical/mathy sciences/fields from a higher-vantage-point in a blurrier meta way). Once you would do so, you'd find it also shockingly hard to really interrelate and out-formulate and staking-off-boundaries those, figuring out "the laws of the laws" etc.

Admittedly, this my current assessment of "category theory" may be quite incomplete, pending further future insights & compelling persuasion to reassess it from scratch.

((( NB. imho we current crop of developers (and indeed most other "brainy" disciplines I'd wager) in many ways (besides of our syntaxes, schemas, algos, type systems that form the core "laws of nature") are still quite a bit like our stone/ice-age ancestors: in a way they enjoyed quite a grasp of the various facets (related to survival & thrival) of physics, biology, botanics, some chemistry, some "meteorology", geography and much more --- BUT strictly local, not formalized, formulated, no formal core/base language such as maths, and seeing no need for any of it to put fresh mammoth on the table. Quite an enjoyable state of affairs, too --- until you run out of mammoths and need to invent agriculture, thus cities, thus money, thus maths, thus science, not necessarily in that order =) so even though coding seems rigid and formal, a large part in reality is somewhat primal/instinctive/intuition-from-experience and stuff like C.T. is the "meta" attack on this organically-grown primitive (though again, tremendously enjoyable for the coder himself) state of affairs.. )))

> Also, if your examples are made only of the thing you're trying to explain, they're not examples.

Uh, no? Are you really arguing that (e.g.) a book teaching a programming language has no examples unless some of them are in another language? How on earth do you learn your first one?

Strictly speaking all programming language books have the examples shown in two languages: the programming one and the natural language.

If your examples are made only of the thing you're trying to explain, they're not examples.

That's just recursive definition. Some concepts demonstrably can't be exemplified otherwise.

And... I still understand nothing about category theory.

Are there any more comprehensible primers out there or is this just a field of mathematics that I shouldn't expect to understand without years of prior study in other requisite fields? Or am I better off building something in Haskell and learning by doing? I know some discrete math, but all attempts at broaching category theory have been so far totally unsuccessful for me.

My understanding was made better when I started thinking about utilizing category theory in programming, more in a way of "here are some design patterns stolen from abstract algebra", rather than "man! now I should learn this entire branch of mathematics".

The branch of math itself would be mostly useless anyway, because it is mostly concerned about relations between categories, and as programmers, you usually work just within a single category, category of functions and types in your programming languages :-)

But from the practical, design-pattern viewpoint, I would suggest trying purescript :-)

My journey was working through the purescript book [1] up to the Applicative Validation chapter. And then I played around with the flare backend in try-purescript [2]. And then it clicked for me and I was suddenly able to read and comprehend Typeclassopedia [3] :-)

[1] https://leanpub.com/purescript/read [2] http://try.purescript.org/?backend=flare&session=a06d83dd-d0... [3] https://wiki.haskell.org/Typeclassopedia

I'd say there is not much point to Category Theory if we where to restrict ourselves to a single category. For instance Monoids are categories (with a single object) that is not the category of functions. Another set are the Kleisli Categories[1] that are equivalent to monads.

You mention Applicative Functors, which form another (different) category where arrows from a to b are on the form `f (a -> b)` and composition and identity laws of these arrows are equivalent to the applicative laws.

[1] https://en.wikipedia.org/wiki/Kleisli_category

I got my point about single category from a friend of mine that studied category theory and applied it to problems topology.

Way he explained it, they often were searching for relations between categories that didn't at all intersect.

And way I was learning the applications of category theory in i.e. haskel, I viewed it more as learning about algebraic structures in the category of haskel types and functions.

So right now I have reasonable idea about i.e. applicative functors in haskel. But thinking about applicative functors as their own category is one abstraction step above what I am usually used to :-)

A category abstracts the concept of function composition. You need two things: an identity element, and a composition operation. For functions the identity element is the identity function id(x)=x and the composition operation is function composition compose(f,g) = (x -> f(g(x))). The twist is that you can't compose any two functions; their input and output types have to match up. If g is A -> B and f is B -> C you can compose them to a function A -> C, but the output type of g has to match up with the input type of f. Categories generalise this concept from composing functions to composing any thing with a notion of input and output type. In practice that often ends up being function composition anyway, but with restrictions on the function. For instance if you compose two linear functions you get a linear function, so this forms a category. Or continuous functions, or functions that can be differentiated, etcetera. However, quadratic functions do not form a category, because if you compose a quadratic function with another quadratic function you may get a function with higher degree. For instance if f(x) = x^2 then compose(f,f) is x^4. Polynomials on the other hand do form a category. In general this need not be composition of functions, as long as there's an identity element and a compose operation with a notion of input and output type. Binary relations is the easiest example that I can think of.

So it's more a generalization of Groups right ?

A monoid (weaker than group: doesn't need to have an inverse) is an example of a category, with a single object G (namely the underlying set) and a lot of arrows from G -> G.

If you take the natural numbers N = {0, 1, ...} as a set and + as a binary operator forming a monoid (N, +, 0), then the arrows would correspond to (. + 0), (. + 1), (. + 2) etc. Here (. + 0) can be seen as the identity arrow, and composition of arrows (. + a) and (. + b) yields a new arrow (. + (a+b)).

Yes, roughly. You can think about a category like a group which carries around "type parameters" to say which "elements" you can and can't multiply. Also, "elements" may not be invertible.

For example, for each N, there is a group of invertible N by N matrices. If we want to allow non-square matrices too, we can bundle them up as a category:

- The objects are the natural numbers

- A map from N to M is an M row by N column matrix

- We compose A : N -> M and B : M -> K by multiplying them as matrices to get BA : N -> K. The objects (i.e., the dimensions) are type checking which things we can and can't multiply

There are two differences:

1) Groups have an inverse operation, categories don't.

2) You can compose any two elements in a group, whereas in a category the input and output types have to match up.

The category-like generalization of the group is the groupoid: all arrows are isomorphisms.

I can recomend the video series of Bartosz Milewski (https://www.youtube.com/watch?v=I8LbkfSSR58&list=PLbgaMIhjbm...) I'm at the 4th video and it's the first time that i start to become an understanding of the howl thing.

But expect it to take some time. (10 Videos, most are around 1h, i needed to pause several times and think or try something out)

Those videos are good but still I struggle!

What also helped me understand CT a little bit better was the book "Haskell Programming from first Principles" (http://haskellbook.com/)

This also needs a lot of time and effort to comprehend!

Hm, as someone who's finished that book, I don't know that I'd say Haskell Programming from First Principles really teaches much category theory -- it teaches specific applications of a few things as it exists in Haskell.

I'm currently working through Conceptual Mathematics (an intro category theory book), and while the Haskell book has some overlap, it's been extremely minor. The main thing I can think of at the moment is determining the number of arbitrary maps that exist between sets. There is some overlap of things like monoids and functors, but even those have a different, more nuanced treatment in category theory (for example, the relatively simple fact that endomaps are functorial interpretations of the sum monoid was new to me, coming from that haskell book).

I'd recommend learning a couple of the fields that category theory abstracts over first. It's cool as a generalisation over topology and set theory, for example (and shows you how some theorems from both are "really" the same theorem), but that only works if you know topology and set theory.

I think this is the fundamental roadblock to teaching category theory to people who don't have a relatively diverse math background. If you have one, the early exposition of category theory is a constant stream of formalizing connections that already had a nebulous version in your head (e.g. products, coproducts, kernels, etc.). If not, you're robbed of any kind of a-ha! moment, and the motivation is unclear at best.

The single paper I have understood about this subject is http://www.sciencedirect.com/science/article/pii/01676423879... . It is the foundation of the Caml Light language that latter gave birth to the famous ocaml. This paper is very dense, but everything is explained.

For anyone wondering how this is useful outside of programming and/or looking for an easier introduction, check out Category Theory for the Sciences [1]. There are free digital versions on Github, too [2]. Here's the reasoning given by the author:

"Information is inherently dynamic; the same ideas can be organized and reorganized in countless ways, and the ability to translate between such organizational structures is becoming increasingly important in the sciences. Category theory offers a unifying framework for information modeling that can facilitate the translation of knowledge between disciplines."

[1] https://mitpress.mit.edu/books/category-theory-sciences

[2] https://github.com/mmai/Category-Theory-for-the-Sciences

If you have a math-background, category theory is easy to learn, as it's merely a trivial (in retrospect, but not in prospect) generalisation of existing mathematics. It's basically the abstract theory of binary associative 'thingies' and 'functions' that preserve these 'thingies'. Most mathematicians pick it up by osmosis without really trying. That's because they understand so well what CT abstracts from.

Experience has shown that without a maths background learning CT is extremely difficult. Almost everybody basically fails.

As somebody who taught himself CT without a math background, I suggest to expect between 100 and 300 hours of serious, intensive study before CT begins to make sense.

My recommendation is to bite the bullet and study one of the many good textbooks, and solve every exercise, and rote-learn the definitions until they become second nature. You will know you've understood CT when see why Peter Freyd's quip "[t]he purpose of [category theory] is to show that which is trivial is trivially trivial" is apt. The key concepts to learn are: category, functor, natural transformation, adjunction, universal property, limit and dualisation. Once you understand those, an extremely large number of prima facie separate mathematical concepts can be seen as instances of the same few trivial ideas. But they must be rephrased in this seemingly alien and unnatural language called CT. My favourite example is Lawvere's unification of most known paradoxa (e.g. Cantor, Russell, Goedel, Tarski) as a basically a trivial fixpoint [1]. Note that after Lawvere had realised what most paradoxa have in common using a categorical re-formulation, it then became trivial to get the same insight without CT [2].

Let me illustrate the power of looking at a phenomenon using a new language with the example of computation itself: we have two dominant abstractions of computation:

- Turing machines

- Lambda calculus

Simplifying only a tiny bit: Turing machines gave us the theory of computational complexity but has not been helpful at all in the development of programming languages in general, and typing systems in particular; in contrast, lambda-calculus has been the foundation of most work on programming languages in general and practically all work on types. Yet lambda-calculus has been mostly silent on complexity theory (although [3] is the beginning of a rapprochement).

Finally, let me state my belief that the average programmer has currently no need to understand CT. The main categorical concept that has filtered down to conventional programming are monads, and monads can be explained and understood completely without reference to CT.

[1] F. W. Lawvere, Diagonal arguments and cartesian closed categories.

[2] N. S. Yanofsky, A Universal Approach to Self-Referential Paradoxes, Incompleteness and Fixed Points.

[3] https://arxiv.org/abs/1601.01233

Can you explain the fixed point idea more in a beginner way? I'm curious to understand the result.

A function has a fixed point when for some value of x, x = f(x). This is interesting because it's some change, that doesn't change the result.

in the lingua franca of javascript.

  var timesThree = (x => x * 3)
  //for this function 0 is a fixed point...
  var a = 0
  console.log(a == timesThree(a))
  //7 is not a fixed point
  var b = 7
  console.log(b == timesThree(b))

Thanks for the response. My question is how does fixed point relate to the incomplete theorems and decidability?

The key to e.g. Goedel's results is the Fixed Point Lemma: if ψ is a formula with v free then there is a sentence φ such that, provably, φ <=> ψ(<φ>) [where <..> is the numerical code of .. ].

The proof, if you are interested, is not difficult. Let sub be the function that describes, via codes, substituting the (numeral n_ of the number) n for a free variable v of a formula χ : sub(<χv>, n_) = <χn_>. Then the PROOF is: consider ψ(sub(v,v)), call it θv, let m be <θv> and let φ be θm_. Then, provably, φ <=> ψ(sub(m_,m_)) <=> ψ(sub(<θv>,m_)) <=> ψ(<θm_>) <=> ψ(<φ>). Ta-da!

Goedel used this to get the "formula that says I am not provable", φ <=> not Prov(<φ>).

    explain the fixed point idea
The Yanofsky paper explains is in quite gentle a manner.

The key insight behind Lawvere's abstract framework to paradoxa is the the general existence of certain fixpoints.

Definition. We say that a set B has the fixpoint property if any function f:B→B has a fixpoint (i.e. f b = b for some b in B).

With this convenient definition, we are now ready to state and prove Lawvere's ridiculously simple and yet great theorem.

Theorem (Lawvere). If e:A→(A→B) is a surjective function, then B has the fixed point property.

The proof is quite easy. Let e:A→(A→B) be surjective. We have to show that B has the fixpoint property. That means for every f:B→B there is b∈B such that f(b)=b. Choose a function f:B→B and define the function g:A→B by setting

   a ↦ f (e a a)
As e is surjective, there must be a0∈A such that

   e a0 = g.
But then immediately

   f (g a0) = f (e a0 a0) = g a0
Hence g a0 is a fixpoint of f's.

Now many/most paradoxa are special cases of Lawvere's theorem. But for each paradox, the specific functions involved are a bit different. Let's look at an example.

Theorem (Cantor). There is no surjection e:A→Pow(A).

To see why this is true, note that Pow(A) is isomorphic to A→Bool. But there is a function on Bool that has no fixpoints, for example negation ¬:Bool→Bool, contradicting Lawvere's theorem.

What is the intuition behind Lawvere's theorem? At first worrying about A→(A→Bool) is a bit surprising. What does that have to do with paradoxa and self-reference?

Well, what does it mean that A can speak about itself? To approach an answer, we could maybe first ask a simpler question: what does it mean to speak about A? How about this for an answer: to speak about A means to say something about A's elements. What does it mean to say something about A's elements? Maybe stating whether any given element a∈A has a property of interest? But what's a property? Easy: a property of A's elements is a function

But we don't want just a fixed property, we want arbitrary properties. To do so, we have to consider the function space

And how can we turn this into self-reference? What if each element a in A corresponded to a property over A? In other words, (with a lot of handwaving) self-reference means the existence of a surjective function

The next step is to wonder: why Bool? Why not any old set? Note that Cantor's theorem continues to hold if we replace Bool with a larger set, but does not hold, if B in A→(A→B) has cardinality 1. What Bool and larger sets have in common is that we can rearrange them, i.e. there is a permuation that doesn't have a fixpoint.

Fascinating! A couple of questions:

Are there any non-singleton sets with the fixpoint property?

> a ↦ f (e a a)

Don't you mean

   a ↦ e a a
? Because later you expand (g a0) to (e a0 a0).

No, g is given by a ↦ f (e a a). The expansion you refer to works because e a0 = g.

Honestly, Saunders MacLane's book, "Categories for the Working Mathematician", was very useful for this non-mathematician. It was surprisingly accessible if I started at the introduction and did not move forward until I grokked the content.

I won't pretend I absorbed all (or even most) of it, but I at least understand the "Standard Haskell Joke" now (which is a funny misquote from his book)[1][2]. I also understand much more clearly why Haskell is so associated with Category Theory.

[1]: "A monad is just a monoid in the category of endofunctors, what's the problem?"

[2]: What MacLane actually wrote is:

All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.

nLab has been one of the best sources I've come across. My aim is to express and work with philosophical logic in a more formal way, and nLab is a great source for that as well. Milewski's YouTube lectures gets mentioned a lot, but I couldn't make much sense of them before watching Tom LaGatta's more informal and conversational presentation https://youtu.be/o6L6XeNdd_k Good luck and keep at it.

Learning Category theory should be as easy as A -> B -> C.

A is a category

A -> B is an arrow, often called a morphism, but we're not going to be all highfalutin about this.

Think of A -> B like a function which takes type A, and returns type B. After all, types can be seen as a specific kind of category, and functions can be seen as a specific kind of arrow. It can be helpful to pretend say A is an int, and B here is a string, maybe C is a float but really they can be any type as long as they are not the same.

The rest of it is just stuff you already know. To qualify as a category it must be associative aka (A -> B) -> C == A -> (B -> C) and it must be possible to make arrows that don't change the category such as A -> A, B -> B and C -> C. You can see how if f = A -> B; and g = B -> C we can compose f, and g to make an arrow from A -> C.

Any time you have things (categories), and verbs (arrows) which meet associativity, and identity (which is usually the verb of doing nothing) you can do category theory to it.


A is an object of the category A -> B -> C.

Types can be seen as a specific kind of category object.

Any time you have things (objects) and verbs (arrows) which meet associativity, and identity (which is usually the verb of doing nothing) you can do category theory to it.

I think you mean that A is an object of your category...

Unless the category is, e.g., the category of small categories.

Yeah I thought I could get away with omitting a term but it's too specific and really fails to catch how broad categories are.

They're still objects in that category...

yeah that's correct, otherwise my arrows would be better described as functors, whoobs I'll go fix that.

A simple (but potentially dangerous) question - is there any practical application of category theory?

(I have PhD in quantum physics theory... yet, each time I see category theory, it looks to me as abstract nonsense. The only example when it was for me anyhow insightful is the abstract definition of tensor product; but for more mundane stuff, all examples I know can be implemented with functions (or even: matrices :)).)

The prime purpose of CT is unification, showing that many familiar things are instances of the same idea. In that sense CT is shallow, you can always in principle do without CT what you can do with. But without the abstract view afforded by CT, you may not see similarities.

Here are moderately practical things that came from CT.

- CAML, the predecessor to OCaml had a virtual machine based on category theory.

- Monadic effect handling, which is an approach to modularising computation, is based on category theory.

- Remarkably, Bell's theorem from physics and database theory share a surprising amount of structure, see [1].

- Modern libraries like Scala's Scalaz.

[1] S. Abramsky, Relational Databases and Bells Theorem.

My talk at LambdaConf last year. Practical category theory refactoring couched in the language of middle school Algebra. https://www.youtube.com/watch?v=Mhw4FD0pCU8

We learned sums (a+b) and products (a*b) well. The leap is understanding that exponents count function spaces a->b.

Can someone explain what the symbol that normally means "approximately equal" (≈) means in the article?

a≈b means (a,b) is in the equivalence relation ≈

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact