A Programming Language Database (github.com/breck7)
205 points by breck on Aug 27, 2022 | 62 comments

The CSV file at https://pldb.com/pldb.csv is served with open CORS headers:

    Access-Control-Allow-Origin: *

This means you can open it in web apps like my Datasette Lite application, so you can run SQL queries against it in your browser:


Here's a more interesting query: https://lite.datasette.io/?csv=https://pldb.com/pldb.csv#/da...

This is awesome Simon! I actually had it on my todo list to explore using datasette with PLDB.

Here's my quick query to see the number of things that are missing a year and also a count of the total things in each year: https://lite.datasette.io/?csv=https://pldb.com/pldb.csv#/da...
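For anyone who would rather try this kind of query locally, here is a minimal sketch using Python's built-in csv and sqlite3 modules. The table name `pldb`, the column names `title` and `appeared`, and the sample rows are all assumptions made up for illustration; check the real header of pldb.csv before relying on them.

```python
import csv
import io
import sqlite3

# Made-up sample rows; the real file is at https://pldb.com/pldb.csv.
# The column names "title" and "appeared" are assumptions.
SAMPLE_CSV = """title,appeared
lang_a,1972
lang_b,1972
lang_c,1993
lang_d,
lang_e,
"""


def load_csv(conn, text, table="pldb"):
    """Load CSV text into a new SQLite table; empty cells become NULL."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [[row[c] or None for c in columns] for row in rows],
    )


def count_by_year(conn, table="pldb"):
    """Count rows per year; rows with no year are grouped under NULL."""
    query = (
        f'SELECT appeared, COUNT(*) FROM "{table}" '
        "GROUP BY appeared ORDER BY appeared"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, SAMPLE_CSV)
counts = count_by_year(conn)
for year, total in counts:
    print(year if year is not None else "(missing year)", total)
```

The rows with a missing year come back as a single NULL group, which is the "missing a year" count; every other row of the result is the total for that year.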

2 pieces of feedback (which I'm sure you're aware of):

1. super slow startup time (30 seconds or so, but then things are instant).

2. autocomplete column names (and syntax highlighting) for the editor?

Yeah the slow startup is because this is using Datasette Lite, which loads an entire Python runtime compiled to WebAssembly into your browser every time you load the page! Server-side Datasette starts a whole lot quicker than that, eg https://scotrail.datasette.io/scotrail/random_apology

The editor has syntax highlighting in server-side Datasette - I've not got that working in Datasette Lite yet.

Column autocompletion would definitely be cool!

> which loads an entire Python runtime compiled to WebAssembly into your browser every time you load the page

Oh wow! (the lengths people will go to to avoid learning Javascript ;) )

> Server-side Datasette starts a whole lot quicker than that, eg https://scotrail.datasette.io/scotrail/random_apology

Is there anyone I can pay to take care of setting that up for PLDB?

Drop me an email at swillison at Google's mail provider.

So the first blog entry I randomly looked at:


is about line comments. C gets credit for the // comments, starting in 1972, but that's not really accurate. BCPL -- which begat B which begat C -- had // comments but they were not included in C until C99. C++ (which isn't included in their top 30 languages) brought back // comments from BCPL sometime between 1979 and 1985 (the first public release of cfront). Many C compilers included // comments as an extension prior to C99 but those were inspired by C++.

This brought back a vivid memory. Sometime in the late 1980s there was a C standards committee meeting in the San Jose area, leading up to the C89 standard.

I was acquainted with one of the committee members and called him to ask if // comments would be included in the standard.

He told me there was no plan for this, but if it was something I felt strongly about, I would be welcome to visit the committee meeting and lobby for it.

I thought about doing this, but didn't bother since all the C compilers I was using already supported // comments.

I also wasn't sure how I would lobby for it.

In hindsight, it would have been a lot of fun to just show up wearing a "sandwich board" with a big // printed on front and back.

Memory is a funny thing. I don't remember the committee member's name or exactly when I called him, but I could tell you exactly where I was sitting and which direction I was facing when we talked. I also think it was before I got married, so that would put the call perhaps sometime in 1987.

Any old-timers here happen to know when that meeting was?

And thus, Lua's source code gained a kilobyte worth of */

It's even worse than that!

(For anyone wondering what we're talking about, Lua is written in a very portable subset of C that stays compatible with pre-C99 compilers, so no // comments.)

I got curious and downloaded the Lua 5.4.4 source and ran this:

  $ cat * | grep -c ' */'

(If there is a better or more correct way to do this, anyone feel free to correct me, as my grep-fu is muy pobre.)

So this is 6386 * 3 = 19,158 characters wasted in this version of the Lua source.

And that's just one Lua version.

Let's assume there are ten million copies of various versions of the Lua source on developer machines and repos around the world. That would make this something on the order of 200 gigabytes. We'd better multiply that by 5 to account for backups and datacenter redundancy.

So my selfish and lazy decision to go to the beach instead of the committee meeting that day has cost the world a terabyte of storage, maybe much more.

And that's the least of it. I didn't account for the extra typing that /* comment */ takes, and how it's just not fun to type all that. So I have annoyed every developer who's had to do this for compatibility reasons. Including myself.

I humbly apologize, and hang my head in shame.
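The back-of-the-envelope numbers above can be checked with a few lines of Python. This is only a sketch of the same arithmetic: the ten million copies and the factor of 5 are the guesses from the comment, and one byte per character with decimal gigabytes is my assumption.

```python
MATCHES = 6386           # count reported by grep -c above
CHARS_PER_MATCH = 3      # the three characters " */"
COPIES = 10_000_000      # guessed number of copies of the Lua source
REDUNDANCY = 5           # guessed multiplier for backups and redundancy

wasted_per_copy = MATCHES * CHARS_PER_MATCH
wasted_all_copies = wasted_per_copy * COPIES
wasted_with_backups = wasted_all_copies * REDUNDANCY


def gigabytes(num_bytes):
    """Convert bytes to decimal gigabytes (assumes one byte per character)."""
    return num_bytes / 1e9


print(f"per copy:     {wasted_per_copy:,} characters")
print(f"all copies:   {gigabytes(wasted_all_copies):.2f} GB")
print(f"with backups: {gigabytes(wasted_with_backups):.2f} GB")
```

That works out to roughly 192 GB before the multiplier and a little under 1 TB after it, which matches the rounded figures in the comment.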

Beach day is more important than 1TB and some chars that are auto-completed by the IDE.

> If there is a better or more correct way to do this, anyone feel free to correct me, as my grep-fu is muy pobre.

  $ cat * | grep -c ' */'

Doesn't that * in your regular expression mean "greedily match as many of the preceding SPACE character as possible, including zero, followed by the /"?

So what you've actually counted is every line that contains a slash. Let's imagine that there are not many divide /, and there are not that many pathname /, so you're double counting the comments and should at least divide your 6386 in half.

You want:

  $ cat * | grep -c ' \*/'

The backslash is protected by the single quotes, so it will get passed to grep, where it will mean "literally *" rather than the regular expression operator *. (Now I'm thinking my \ is going to get swallowed up by HN so I'd better double them? ed: nope, didn't have to double, but I did have to backslash escape the asterisk after literally.)
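To see the difference between the two patterns without downloading anything, here is a small sketch in Python. Its re module treats these two particular patterns the same way grep does, and counting matching lines mirrors what grep -c reports. The sample lines are made up for illustration and are not taken from the Lua source.

```python
import re

# Made-up C-like lines for illustration only.
SAMPLE_LINES = [
    "/* open a file */",
    "int half = total / 2;",
    '#include "lib/util.h"',
    "x = a /* inline */ + b;",
]

UNESCAPED = re.compile(r" */")   # zero or more spaces, then a slash
ESCAPED = re.compile(r" \*/")    # a space, a literal asterisk, a slash


def count_matching_lines(pattern, lines):
    """Count lines containing at least one match, like grep -c does."""
    return sum(1 for line in lines if pattern.search(line))


unescaped_count = count_matching_lines(UNESCAPED, SAMPLE_LINES)
escaped_count = count_matching_lines(ESCAPED, SAMPLE_LINES)
print("unescaped:", unescaped_count)
print("escaped:  ", escaped_count)
```

On this sample the unescaped pattern matches all four lines because each contains a slash somewhere, while the escaped pattern matches only the two lines that actually close a comment.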

CPL (the direct ancestor of BCPL) also had a form of `//` comments. Quoting from the manual:

> A comment is introduced by a double bar and continues up to the end of that line; the whole of this text is ignored.

The double bar was written as `||`.

This information comes from the "CPL Working Papers", July 1966.

Very interesting and helpful context. Thanks ksherlock!

I've updated that post a bit with some of your info: https://github.com/breck7/pldb/commit/fb3a46baee0fe06218884e...

Isn't C++ at place #5 on https://pldb.com/lists/top500.html ?

D'oh, you're absolutely correct. And it's in the blog infographic.

This site reminds me of http://rigaux.org/language-study/ which includes studies of where language concepts originated, the lineage of languages, a wide-ranging syntax comparison (although missing some newer languages) and other interesting things.

Does anyone here remember a website, active about ten years ago, that had languages ranked according to statements like "I feel like I'm not smart enough to program in this language" and "This language has good support for concurrency"? There was another ranking for different martial arts, IIRC. I don't remember the name of the site and can't find any references to it anywhere.

It sounds like you are referring to a website called "The Hammer Principle" formerly hosted at hammerprinciple.com. The website went down shortly after the owner had collected all the data they wanted.

I collected a bunch of the data in a spreadsheet in March 2014 before the website went down. It covered 51 programming languages and 15 statements about them such as:

* Easy to tell what code does

* Language is expressive

* Code is very readable

* Code written is elegant

* I enjoy using this language

* I choose to use this language

* I would like to write more

* I can be sure code is correct

* This is a high level language

* Code is easy to maintain

* Encourages reusable code

* It is easy to write efficient code

* Code tends to be efficient

* Code tends to be reliable

* This language is good for teaching children

Looks like it was online longer than I thought and had a lot more questions added after March 2014.

Yes! That's the one!

Perhaps this one?[0]

It doesn't match the exact description, and I also have a vague memory of something similar, but I'm not able to find it right now. Drew's post does a reasonably snarky job though :)

0: https://drewdevault.com/2019/09/08/Enough-to-decide.html

I'll have to read that. I seem to remember the page having orange elements like HN's, and the language rankings were displayed in tables. I thought it was a really good idea, although I have no idea when or how the initial survey was conducted. The ranking was pretty comprehensive both in the range of questions and the number of languages (although you'd have to wonder how many respondents had any experience with languages like Io or Oz.)

PostScript is both a general-purpose and domain-specific programming language. Can you make that consideration?

Some programming languages with semantic indentation, such as Haskell, allow you to turn that feature off. Considerations more complex than a simple "has" or "doesn't have" are relevant for other features of other programming languages, too.

Some programming languages are domain-specific and might only be used in specific programs because their use is narrow enough, e.g. ZZT, MegaZeux, and Free Hero Mesh; do any of those count? (ZZT is rather limited, but the other two aren't so limited.)

Other features of programming languages to be considered:

- Character encodings


- Byte arrays

- Generalized algebraic data types

- Sigils

There is also consideration of variants of programming languages.

Good feedback, thanks!

> PostScript is both a general-purpose and domain-specific programming language. Can you make that consideration?

Ah you are right, the current categorization as a textMarkup isn't quite a great fit and the ontology in PLDB can be improved. I'm linking your comment to a related issue that I hope to address this week. https://github.com/breck7/pldb/issues/39

Missed opportunity to encode this as s-expressions or a Prolog database!

I don't know enough Prolog to know why you'd want that (please share), but SWI-Prolog has a SQL interface to run Prolog over SQL data, and the author has released the data as CSV and JSON. I know SQLite lets you import CSV data, and other databases probably do too.

JSON to s-exprs should be convertible with simple search-and-replace, or you could just load the file with a JSON library with reader macros so it loads the data into the appropriate data structures. I'm fairly sure both Common Lisp and Clojure let you read JSON directly like that.
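As a rough illustration of the conversion idea, here is a minimal sketch that walks parsed JSON and prints an s-expression. The output conventions are my own assumptions, not any standard: objects become lists of (key value) pairs, true becomes t, and both false and null become nil, Common Lisp style. The sample record is made up.

```python
import json


def to_sexpr(value):
    """Render a parsed JSON value as an s-expression string."""
    if isinstance(value, dict):
        pairs = " ".join(
            f"({json.dumps(key)} {to_sexpr(item)})" for key, item in value.items()
        )
        return f"({pairs})"
    if isinstance(value, list):
        return "(" + " ".join(to_sexpr(item) for item in value) + ")"
    if value is True:
        return "t"
    if value is False or value is None:
        return "nil"   # assumption: false and null both collapse to nil
    # Strings and numbers: JSON's own literal syntax is close enough here.
    return json.dumps(value)


# Made-up record for illustration; not an actual PLDB entry.
record = json.loads(
    '{"title": "Example", "appeared": 2000, "tags": ["pl", "demo"], '
    '"active": true, "website": null}'
)
sexpr = to_sexpr(record)
print(sexpr)
```

Collapsing false and null into nil loses information, so a real converter would need to pick distinct symbols if that distinction matters.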

SWI is full of features and does JSON and SQL on its own very easily.

If you view source, it is encoded in an s-expression equivalent.

The PLDB documents "over 4000 programming languages". I suspect this is well short of the true number.

"... today... 1,700 special programming languages used to 'communicate' in over 700 application areas." -- Computer Software Issues, an American Mathematical Association Prospectus, July 1965.

> Languages with First-Class Functions include JavaScript, Hack

It says there are only 2 languages with first-class functions, not sure how much stock I would put into this data set.

Sorry about that. On this page (https://pldb.com/lists/features.html), you'll only see a percentage if we have that information for at least 100 languages. I should add an "under construction" note or something for the others, to indicate that certain columns are currently very sparse.

The features section is a big part of my focus for the next few weeks, and I hope to fill out that part of the dataset.

While this is a great resource, I was wondering whether there is any programming language comparison repo where we can find the strengths and weaknesses of each language.

Interesting, from the "A Language Without Comments" post:

> JSON is the only popular language in the PLDB without comments.

> JSON is the only popular language in the PLDB without comments.

Some folks actually tried addressing that limitation (and some others) with JSON5: https://json5.org/

Though sadly it never really caught on outside of a few niche projects. At the same time, however, I've seen some projects out there also use JSONC or another non-standard version that adds comments, since clearly people want them: https://komkom.github.io/jsonc/

Perhaps a more accurate statement will end up being "the only formally defined popular language".

Some redditors pointed out that (traditional) CSVs also don't have a mechanism for comments.


very cool

but I just cannot find the type for audio programming language


Interesting! I thought for sure I had a type for audio languages.

We could add one by simply adding one word to this line here: https://github.com/breck7/pldb/blob/59be1839df9888099537662e...

Or perhaps it might be better to add it via a group/tag/paradigm/industry or something like that (created an issue to track this: https://github.com/breck7/pldb/issues/39)

JSON[1] is a programming language?

[1]: https://pldb.com/languages/json.html

This is not only a list of programming languages, but a list of "technologies". To get only programming languages you have to filter on type=pl [1]. JSON is marked with type=dataNotation [2] - so it is a "data notation" language...

[1] https://pldb.com/lists/languages.html?filter=pl

[2] https://pldb.com/lists/languages.html?filter=dataNotation


PLDB is a comprehensive database of programming languages and their features. The focus is on programming languages, but the database also includes other languages and entities one degree away--from popular high level plain text formats to binary specifications and beyond.

So this is one of those 'increasingly ill-named' types of things. I guess it's easier to include everything than to try to define what is and isn't a programming language.

Maybe not pedantically, but it's a language relevant to many programmers. Seems useful to include it. They include other syntaxes such as HTML too.

I'd say so - same as HTML or CSS. It may not be Turing complete but it's a language used for programming computers.

These can only be used to describe and format documents, not to write programs, which is why they are markup languages and not programming languages.

Well, modern HTML + CSS is actually Turing-complete.


You can't have any programming language that is more powerful than that. So technically speaking (HTML + CSS) is a "proper programming language". Just a very awkward one where the user has to "operate the crank on the machine" to execute it.

Turing completeness is a separate axis. There are programming languages that are famously not Turing-complete (for guaranteed halting) and there are (actually, surprisingly many) “things” that are Turing complete, but are not programming languages at all. The latter includes Game of Life, many boardgames and PowerPoint.

Like most human things, programming languages don’t have a specific definition, but as a point of reference we should probably look at whether that “thing” was ever intended to be used as one. Sure, there is a program that can encode loops into HTML tags, but HTML is generally used as a markup language for describing the DOM, not for programming it.

It's Turing complete "so long as you consider user interactions to be part of the “execution” of CSS". So no, it cannot be used to write computer programs.

First comment to accepted answer:

> The formal definition (simplest) of Turing Machine is simply a tuple of states set, symbol set, initial state, accepting states set and a transition function. There is no crank in it. By computation we mean somebody needs to apply the transition function faithfully on the tape which is exactly like the clicking in this case. More formally, a model of computation can be viewed as a set of rules somebody needs to follow to do the computation. In that sense, I think CSS is Turing-Complete. – Shuhao Tan Jun 11, 2018 at 21:32

[Emphasis mine]
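To make the "somebody needs to apply the transition function" point concrete, here is a minimal sketch of a Turing machine where nothing happens unless a caller turns the crank. The class, the method names, and the bit-flipping example machine are all hypothetical and chosen only for illustration.

```python
BLANK = "_"


class TuringMachine:
    """States, symbols, start state, accepting states, transition function."""

    def __init__(self, transitions, start, accepting, tape):
        self.transitions = transitions     # (state, symbol) -> (state, symbol, move)
        self.state = start
        self.accepting = accepting
        self.tape = dict(enumerate(tape))  # position -> symbol
        self.head = 0

    def read(self):
        return self.tape.get(self.head, BLANK)

    def halted(self):
        return self.state in self.accepting

    def crank(self):
        """One turn of the crank: apply the transition function exactly once."""
        self.state, symbol, move = self.transitions[(self.state, self.read())]
        self.tape[self.head] = symbol
        self.head += 1 if move == "R" else -1


# Hypothetical example machine: flip every bit, halt at the first blank.
FLIP_BITS = {
    ("scan", "0"): ("scan", "1", "R"),
    ("scan", "1"): ("scan", "0", "R"),
    ("scan", BLANK): ("done", BLANK, "R"),
}

machine = TuringMachine(FLIP_BITS, "scan", {"done"}, "1011")
steps = 0
while not machine.halted():   # a human clicking plays the role of this loop
    machine.crank()
    steps += 1

result = "".join(machine.tape[i] for i in range(4))
print(steps, result)
```

The machine definition contains no loop of its own; whether the `while` loop is a CPU, a motor, or a person clicking is outside the tuple, which is the point the quoted comment is making.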

The definition of a programming language is a language that can be used to write computer programs. A Turing machine is not necessarily a computer program, and a programming language isn't necessarily Turing complete.

Yes, but how is this relevant to the question whether (HTML + CSS) is a "programming language"?

There is no restriction on the "device" executing a program.¹

Steam-powered devices, as in the case of power looms, are just fine, for example. So is the human brain, in the case of, say, abstractly but formally expressed algorithms (think Ada Lovelace).

Programmable machines that were in fact run by a human spinning a crank are also considered a kind of archaic "computer" (think for example of antique mechanical music devices which could "load" a piece from a kind of card or tape).

(HTML + CSS) can be used to formulate executable algorithms to compute anything computable. You can't be more of a programming language than that! Only the executing device is a little bit awkward, as you in fact need to "spin a crank" manually.


¹ For example there was this nice Lego Turing machine:


If I replaced the electric motors with a hand crank, would anything fundamentally change?

No, of course not! The machine would still be a universal computer. Only one that needs to be operated "semi-manually", exactly like a "computer" built in (HTML + CSS).

JSON Schema has an ‘if’ construct. That gives JSON a standardised way to do flow control.

I’m still not sure I would call it a programming language. But I wouldn’t say it’s not one.

JSON can be used to write an abstract syntax tree of a program, but the specific grammar used and the way of converting it into a program would be the programming language, not JSON itself.
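A tiny sketch of that distinction: below, JSON only carries the tree, and the meaning comes entirely from a made-up evaluator. The grammar (arrays of the form [operator, argument, argument], strings as variable names, an "if" form that treats zero as false) is a hypothetical convention invented for this example, not anything JSON defines.

```python
import json
import operator

# Hypothetical operator table for the made-up grammar.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, env):
    """Evaluate a JSON-encoded expression tree against variable bindings."""
    if isinstance(node, (int, float)):
        return node                      # numeric literal
    if isinstance(node, str):
        return env[node]                 # variable lookup
    op, *args = node
    if op == "if":                       # ["if", condition, then, else]
        condition, then_branch, else_branch = args
        chosen = then_branch if evaluate(condition, env) else else_branch
        return evaluate(chosen, env)
    left, right = (evaluate(arg, env) for arg in args)
    return OPERATORS[op](left, right)


# The same JSON text could mean something else under a different evaluator.
program = json.loads('["if", ["-", "x", 3], ["*", "x", 2], ["+", "x", 100]]')

print(evaluate(program, {"x": 5}))
print(evaluate(program, {"x": 3}))
```

Swap in a different `evaluate` and the same JSON document describes a different program, which is why the grammar plus its interpretation is the language here and JSON is just the container.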

It has a restricted grammar and can be parsed, so it is a programming language as much as any on the list. It just doesn't have a canonical runtime.

Any arbitrary string of Unicode characters would be a programming language given the above definition. Which of course makes no sense.

A syntax without predefined runtime semantics is not a programming language!

Without the interpretation semantics no (or every, depending on standpoint) computation can be described by any syntax. The definition of runtime semantics for some syntax is what creates a programming language in the first place.

Pure syntax OTOH can be given arbitrary interpretations. It is therefore not a programming language on its own. The runtime semantics of some code is what allows us to write and execute programs.

One can't write even the most trivial programs in JSON. No "Hello World" as JSON does not offer any I/O facility. No "fizzbuzz" as there are no control structures or any loops in JSON. No "Fibonacci numbers" as there are no function abstractions in JSON…

JSON is just not a programming language.

(Still a valid candidate to include in a computer language collection, of course, as JSON has its merits regardless of the fact that there is no canonical way to write anything in it that resembles a computer program.)

I doubt many share your opinion that anything with a restricted grammar that can be parsed is a programming language.

I think your definition is closer to, if not identical to, what Wikipedia calls “Computer Language” (https://en.wikipedia.org/wiki/Computer_language)

For me, and I think for most, the more restricted https://en.wikipedia.org/wiki/Programming_language#Definitio... (“A programming language is a notation for writing programs, which are specifications of a computation or algorithm”) is closer to what makes a formal language a programming language.

A programming language is a notation for writing programs, which JSON isn't, no more than any other storage format or markup language.

CSV too?

I understand this is pretty much WIP, but still, it's too unorganized to be anything useful. I thought the most interesting part would be the features page[1], which is nearly empty, and this effort in taxonomy is rather too complicated to be crowd-sourced without supervision. For example, let's take a look at traits[2] and mixins[3]. There are a couple of issues here. First off, why are these 2 different pages? There's no real difference between a trait in PHP and a mixin in… well, no languages except for Racket actually have a syntactic construct called "mixin", but I guess modules in Ruby or Julia are close enough. Scala also has something that's called "traits", and it's also basically the same thing, but with caveats.

On the other hand, D has both "mixins" and "traits", but these are completely different features, and these "traits" have nothing to do with traits in Scala or PHP. So if somebody were to make a comprehensive list of features of D in this DB, should these "traits" appear on the same page as PHP and Scala traits (which are mixins)?

Furthermore, unlike PHP, Scala, Ruby or Julia — Python's "mixins" aren't just mixins with a different name. It's not even clear if it has mixins at all. There's something people call a "mixin" in Python, but these are just classes, so you cannot really say "yes". However, Python has multiple inheritance, which makes "mixins" borderline pointless: classes are (or can be used as) mixins, if you have multiple inheritance! Templates in some languages can be used this way as well.
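For readers who haven't seen the Python convention being described, here is a minimal sketch. The class names are made up; the only point is that the "mixins" are ordinary classes and that multiple inheritance does the combining.

```python
import json


class JsonMixin:
    """An ordinary class that is only a 'mixin' by convention."""

    def to_json(self):
        return json.dumps(vars(self), sort_keys=True)


class ReprMixin:
    """Another plain class; nothing in the language marks it as special."""

    def __repr__(self):
        fields = ", ".join(f"{k}={v!r}" for k, v in sorted(vars(self).items()))
        return f"{type(self).__name__}({fields})"


class Language(JsonMixin, ReprMixin):
    """Multiple inheritance pulls in both behaviours."""

    def __init__(self, name, appeared):
        self.name = name
        self.appeared = appeared


# Made-up values for illustration.
example = Language("Example", 2000)
print(example.to_json())
print(repr(example))
```

Nothing stops anyone from instantiating `JsonMixin` directly or inheriting from `Language` in turn, which is why "does Python have mixins?" is hard to answer with a plain yes or no.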

Which brings us to the next issue: it's not clear whether a language should be marked as having a feature only if it comes built in, explicitly, or also if the feature can be implemented in it. Does every language have a semaphore[4]? I cannot remember any where it couldn't be implemented (that would be weird), but I cannot remember any where it's an explicit language construct either (well, arguably, maybe some SQL extensions?).

All this isn't to say that the current list is bad. All the questions above can be answered in any way, and it's up to a "researcher" which definition to use in order to actually get a useful taxonomy. It's a non-trivial job.

[1] - https://pldb.com/lists/features.html

[2] - https://pldb.com/languages/traits-feature.html

[3] - https://pldb.com/languages/mixin-feature.html

[4] - https://pldb.com/languages/semaphores-feature.html

This is pretty cool! I've thought about making something similar myself. Nice to know I can contribute to something instead of creating it from scratch.

Top programming languages since 2005:

- Go

- Swift

- Rust

- TypeScript

- Clojure

- Kotlin

- Julia

- PowerShell

- Elixir

- Dark

- F#

- CoffeeScript

- Crystal

- Nim

- Elm


- Reason

- Haxe

- Node.js (!)

- fish

- Zig

- Arduino

- Red

- Idris

- Vala

MySQL is a programming language?!
