Hacker News new | past | comments | ask | show | jobs | submit login
A Path to Programming Language Theory (github.com)
174 points by rspivak on Apr 30, 2016 | hide | past | web | favorite | 35 comments

I've been taking a PLT[0] class in school and while the list above is extremely comprehensive and thorough, I'd argue that you do not need to read up all of it to get a good idea of programming languages.

Without even learning all the theory, you can get pretty far ahead (IMHO) in a building your own programming language (if that's the goal). The two resources that I've found invaluable are - the prog lang course[1] by Dan Grossman and PLAI[2] by Shriram. Lastly, there's also a whole bunch of interesting university courses that you can refer to - https://github.com/prakhar1989/awesome-courses#programming-l...

[0] - http://www.cs.columbia.edu/~sedwards/classes/2016/4115-sprin...

[1] - https://www.coursera.org/course/proglang

[2] - http://cs.brown.edu/courses/cs173/2012/book/

There's a difference between learning about programming languages and learning about programming language theory. The class you linked is squarely in the former camp. It's a bit confusing because they use PLT to stand for "Programming Languages and Translators" where I usually see it referring to "Programming Language Theory".

Honestly, "programming language theory" is a bit of a misnomer. It's less a theory for programming languages and more a theory of CS from a language perspective. (Where theoretical CS is a theory of CS from a computational point of view.)

Programming language theory is interesting in and of itself and is fairly distinct from the sorts of things you'd learn in a normal programming languages course or by implementing your own language. I'm not saying either of those is useless—I'm a big fan of doing both!—just that they accomplish fundamentally different things. Your advice is good if you just want to implement a programming language; the linked repository is good if you want to explore PL theory.

PL theory shares a lot with formal logic and the foundations of mathematics. The main areas are things like type theory (a special approach to formal logic) and semantics (akin to proof theory or model theory).

What you'll learn by reading the books linked above is almost completely disjoint from what the class you linked covers.

I can imagine someone being overwhelmed by all of those at first glance (in OP).

thanks , your links look way more reasonable than pure theory and mathematics for new comers who just want to implement simple programming languages and compilers.

I've spent a lot of time on learning these ideas. They are interesting in themselves, but also because they come with a tantalizing promise of usefulness. For me, that promise was never fulfilled. I suspect that's not just bad luck. Creating programming languages is mostly a practical problem, not a research problem. For example, type theory will never tell you that garbage collection is a good idea, or which kind you should use.

My advice to younger folks looking at this field would be to turn back and instead solve some practical problem by creative programming. Git or BitTorrent would be good examples to emulate. Even a smaller project could have more impact than the whole field of PL research combined.

One thing that is sorely missing here is a section on motivation, which is arguably the first step on the journey. Because I can see that learning PLT will be hard. Yet most people are not going to learn PLT just because it is hard. There must be something more that makes it worthwhile. But I struggle to appreciate why any programmer would want to learn PLT.

Why does anyone learn anything? An intellectual pay-off is, to many, just as valuable as the practical (and ultimately, presumably, economic) pay-off that comes from learning a new skill. That might be seen by many others to be "learning for learning's sake", but any research could potentially lead to wider benefits.

That said, for die-hard-pragmatists, it seems reasonable to assume that learning PLT might make one a better programmer.

I was struck by the category theory section, which starts with two fairly easy books (Cheng and Awodey), a book I've never heard of, and then Mac Lane. Mac Lane is a fearsome book, which takes a lot of concerted effort and study for me to read. I have had four years of the Cambridge Maths Tripos, so I am (ahem) probably quite a bit more highly trained in mathematics than most readers of this list, and Mac Lane is just barely on the edge of things I can read. It should come with a health warning for its extreme difficulty and abstraction; there's a reason it's in the "Graduate Texts" series.

I had the same feeling when Ben Pierce's introductory text was tossed casually into the same list as HOTT. I mean... more power to you, but a few lines of text are casually glossing over probably a decade of study for the reader.

I know this might be a stupid question for some, but I have to ask anyway because this will drive me crazy trying to figure this out on my own.

Why is this a 'theory', what makes this different from a documentation different design patterns in programming languages?

It's basically a certain kind of documentation of design patterns in programming languages. PLT emphasizes the use of formal logic and mathematics. It focuses on systems that can be rigorously defined, and theorems that can be proven about them; as opposed to software engineering, which focuses on the issues and patterns that crop up in the practice of developing software, or HCI, which focuses on human factors.

Of course, all of these are overly broad generalizations, and it's not as if there aren't points of intersection between these different ways of looking at programming languages.

But the kind of stuff you'll see in a book like Types and Programming Languages is super-different in approach from the kind of stuff you'd see in, say, the Gang of Four book, both of which are different again from Bret Victor's approach.

> It's basically a certain kind of documentation of design patterns in programming languages.

PLT has become its own thing, and work in the area is not necessarily related to PL. If you scan current papers at POPL, you'll see a trend of applying language-oriented theory to problem X where X has nothing to do with implementing/designing a programming language.

We can apply theory to PL, but many of the factors in a PL design today (or yesterday) are not rooted in theory.

> But the kind of stuff you'll see in a book like Types and Programming Languages is super-different in approach from the kind of stuff you'd see in, say, the Gang of Four book, both of which are different again from Bret Victor's approach.

It is not even just the approach that is different, but the goals and outcomes.

Would you mind describing some of the goals, current problems, and some "must-read" papers on the topic?

I'm in a university so I have access to most journals.

I'm very interested in learning more about this.

PL is vastly huge, so you have to be more specific about "topic." What are you interested in learning? I personally specialize in experience, and could list many (what I consider seminal) papers that are not even on the radar for other PL researchers.

Are there any benefits or contributions made by this method that couldn't be simply done by "doing" it.

IE: Implement a language and use that as an example and improve it.

No, it is a completely egotistical and futile exercise used to produce public funding consuming academics.

This method of doing things offers a sound approach to developing things rather than ad-hoc implementations. It allows properties about the method to be proved with absolute certainty.

In hindsight after you have "done" it, everything seems they were doable.

Monads for example proved to be very useful in functional languages, and as Phil Wadler puts it "it is doubtful that the structuring methods presented here would have been discovered without the insight afforded by category theory. But once discovered they are easily expressed without any reference to things categorical. No knowledge of category theory is required to read these notes." in his now famous paper [0].

But maybe monads are too exotic to provide a good example. Let's talk about higher-order functions instead. Closures are very common to all programming languages these days and they existed long long before we had the first computer thanks to the lambda calculus. It would be nice then we look at lambda calculus to understand use and ramifications of such constructs. What they are good for, if they have limitations, etc.

More importantly, there seems to be a correspondent logic to useful type theories we come up with (or perhaps discover). One famous example is Curry-Howard correspondence [1]. So it might be useful if we understood what has come decades, sometimes centuries before, rather than reinventing the wheel.

[0] http://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/ba...

[1] https://en.wikipedia.org/wiki/Curry–Howard_correspondence

> But maybe monads are too exotic to provide a good example.

Nah, it's fine.

But a monad is just a part of an abstraction. Which also means that they have been used without that specific denomination for ages.

The use of Category theory just allows to think about things out-of-context but it is not necessary nor evident that it "always" leads to interesting results for the everyday programmer.

It is like the difference between applied and pure mathematics.

The concept of a monad in haskell was introduced to solve a specific problem though, if I remember well. And it was not uncontroversial. But people who know better could chime in.

Have there been developments of similar or better constructs accidentally/naturally outside of the efforts put forward by PLT?

If I wanted to see what was being said by the field, are there any seminal works to 'ingest'?

For category theory, I don't think so. But if we take tensor algebra, of course, in physics, especially quantum mechanics .The thing is, you could approach the same physical problem at different levels of abstraction and would therefore use either or. For the study of manifolds, people generally tend to favor tensorial analysis. But some results could be found within the framework of category theory.

In terms of seminal work, I admit I am quite ignorant. I would check with SIAM.

This is exactly what it says on the tin.

A path to enlightenment.

I've read a lot on these subjects already, but I am certainly saving this for two reasons:

* Including papers and sections of books in my weekly reading

* I find it quicker to understand if I can grab a handful of explanations and critique them (often rewriting the confusing parts of several explanations once understood)

If you want an easy list, or a minimal list, this isn't it - and I am glad.

This seems very one-sided, skewing towards type-theory and functional programming.

Is that what is taught these days? No wonder people think that's all there is. :-(

And what would be the omitted part of programming language theory?

mpweiher could mean implementation theory, so things like how to write better compiler analysis and optimisations and things like that, but in the academic and research community I think that just isn't thought of as coming under the term 'programming language theory'. It's not thought unimportant, it's just a different topic, more grouped under 'systems'.

Looking at mpweiher's profile it looks like he might be interested in things like metaprogramming and object protocols - again they're seen as either part of the software engineering or systems disciplines rather than PLT.

I disagree that compiler analysis/optimizations would be considered under 'systems', I would expect that kind of work to be published in PLDI, which is definitely the PL community. Systems is more OSDI/NSDI/SOSP.

Metaprogramming also has a rich history in traditional PL venues like POPL and ICFP, such as the Scheme/Racket work on macros, Template Haskell and friends, staged computation work a-la MacroML.

I think POPL is PLT and PLDI is systems.

I say that I'm a systems person to signify that I want to talk about ASTs, IRs and compiler output rather than core language models and formal semantics.

But you could argue about definitions all day.

The relevant observation is that the reason this list focuses on formalisms like type theory and models we can reason about formally like FP is that this is what we mean by PLT.

You mean, systems in the OSDI/SOSP sense? Systems and PL implementation (compilers) communities have always overlapped quite a bit, maybe more so a decade ago than today (e.g. see Sanjay Ghemawat and Jeff Dean as PL people who did systems who are now doing machine learning...).

Apologies for being a bit brief, it was late.

First, I don't have specific things that were left out (well, maybe a few), it was more a sense of there is so much more, especially that I would want to know and feel I should know when creating a computer language. The important bits.

First and foremost, programming languages are for people and for building systems. Except for CTM, that seems to be almost completely missing (and I had initially missed that it was actually on the list, so my bad).

So anyway, if you want to have theory of programming languages, you have to have something about how people think and work and how that affects programming language design. As an example, I found John Pane's HANDS system very illuminating, or maybe the "Smalltalk: Bits of History, Words of Advice", which talks about iterating on language usability, or Henrik Gedenryd's thesis "How Designers Work". Not saying these are it, but just the flavour I'd be looking for.

On the systems front, I think I'd want to see how problems building systems lead to features in programming languages, what the tradeoffs are etc. Again, without this, what's the point of a theory of programming language? And I do hope features in programming languages are trying to address actual problems building systems. Otherwise, it would be like having theories of engineering without talking about real world objects that we are trying to build such as airplanes, bridges or buildings.

As a tiny and obvious example, what are the tradeoffs between more compile-time checking with longer compile times and more inscrutable errors and a more dynamic approach with much faster cycle times but less checking? And yes, there are tradeoffs.

When you look at Hennessy/Patterson for Computer Architecture, you see (well understood?) tradeoffs that matter in terms of application. Yes, more registers are good, but then you have encode them in the ISA, and your path lengths increase and you need to save/restore more on a context switch, etc.

One thing that should also definitely be on there is HOPL. Histories of people actually designing programming languages, trying to solve specific problems and then figuring out what worked, what didn't and why. Without some sort of reflection like this (=evidence), again, what's the point of even beginning to talk about a "theory" of programming language? Unless it's a definition of the word "theory" that I am not aware of.

any book list that starts with "first you should read TAPL and PFPL, then move on to SF and CPDT and read HOTT and of course these three category theory textbooks" is really out of touch IMO. mostly because i don't think there's anyone alive that has simultaneously read all of those works, not even the authors of each individual work!

I'm 27 and while I'm only about halfway through HOTT - it's tough - I've read all the other named books. TAPL was tough too but invaluable to wrap my head around as an undergraduate.

You're seriously underestimating the amount of work that goes into a phd, or into writing a textbook...

I'm a current PhD student in programming languages, I don't think I am...

Let's talk again in a few years. :)

In all seriousness, though, I'm also just speaking from personal experience. That said, if you are working in type theory, chances are that you will end up skimming almost everything on this list...

okay, I've skimmed everything in that list. I haven't read any of them cover to cover though. that's what I really question, mostly because, who has the time for that? there's especially a lot of shared material between PFPL and TAPL, so why are you reading about the material twice?

the problem is that all of those textbooks are great for an introduction (that's why they're textbooks) and so what is usually more productive is:

1. ask your supervisor or a senior student or postdoc for a 10-30 minute explanation / intro to a topic

1a. skim the relevant chapter(s) from one of the textbooks to make sure you got what they said

2. go read research papers

and frequently, you can just skip straight to 2.

There is much more to PL than these topics, of course, and even PL theory is more dominated by a long tail. But then, when everything is niche, what do you cover in a course?

I (and I'm sure others) would appreciate references to the minimal set of things necessary to absorb to easily participate in the PLT discussion.

This is the opposite of that.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact