
The naked truth about writing a programming language (2014) - pcr910303
https://www.digitalmars.com/articles/b90.html
======
Animats
That's a good set of questions for 2014. Questions that have become important
more recently include:

\- Imperative? Functional? Some mixture of both? Mixtures of the two tend to
have syntax problems.

\- Concurrency primitives. The cool kids want "async" now. Mostly to handle a
huge number of slow web clients from one server process. Alternatively, there
are "green threads", which Go calls "goroutines". All this stuff implies some
level of CPU dispatching in compiled code.

\- Concurrency locking. Really hard problem. There's the JavaScript
run-to-completion approach, which reappears in other languages as "async".
Ownership-based locking, as in Rust, is promising, but may lock too much. And
what about atomic operations and lockless operations? A source of hard-to-find
bugs. Get this right.

\- Ownership. Rust got serious about this, with the borrow checker. C++ now
tries to do "move semantics", with modest success, but has trouble checking at
compile time for ownership errors. Any new language that isn't garbage
collected has to address this. Historically, language design ignored this
problem, but that's in the past.

\- Metaprogramming. Templates. Generics. Capable of creating an awful mess of
unreadable code and confusing error messages. From LISP macros forward, a
source of problems. Needs really good design, and there aren't many good
examples to follow.

\- Integration with other languages. Protocol buffers? SQL? HTML? Should the
language know about those?

\- GPUs. Biggest compute engines we have. How do we program them?

~~~
zzo38computer
I would suggest possibly using a different programming language for the GPU than for the CPU.
I like the idea of Checkout (see [0]) for GPU programming, although
unfortunately it is not implemented and the preprocessor is not yet invented.
(The trigonometric functions are also missing. They should probably have at
least sine, cosine, and arctangent, since these functions seem useful to me
when doing graphics.)

[0] [http://esolangs.org/wiki/Checkout](http://esolangs.org/wiki/Checkout)

~~~
dnautics
Julia has done a fair job of mapping high level mathematical primitives to the
GPU (though I disagree with their approach)

------
billconan
I want to implement a toy programming language, but I have questions regarding
the following in that article:

> Context free grammars. What this really means is the code should be
> parseable without having to look things up in a symbol table. C++ is
> famously not a context free grammar. A context free grammar, besides making
> things a lot simpler, means that IDEs can do syntax highlighting without
> integrating in most of a compiler front end, i.e. third party tools become
> much more likely to exist.

Many complex languages, like C++ and Rust, are not context free, so semantic
analysis is needed to parse them.

What programming language features will be compromised if I stick to a
context-free grammar?

Will the end result be as expressive as C++/Rust?

Any programming language that is massively adopted is context free?

BTW, the toy language I want to build will be something similar to JavaScript.
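
To make the C++ example concrete: `a * b;` parses as a pointer declaration if `a` names a type, and as a multiplication expression if `a` names a variable, so the parser must consult a symbol table. A minimal toy sketch in Python (purely hypothetical, nothing like a real C++ parser):

```python
# Toy illustration of why C++ parsing needs a symbol table:
# "T * x;" declares a pointer if T names a type, but is a
# multiplication expression if T names a variable.

def classify(stmt, type_names):
    """Classify a toy statement of the form 'X * Y;'."""
    tokens = stmt.replace(";", " ;").split()
    if len(tokens) == 4 and tokens[1] == "*" and tokens[3] == ";":
        if tokens[0] in type_names:
            return "declaration"  # e.g. "T * x;" where T is a type
        return "expression"       # e.g. "a * b;" where a is a variable
    raise ValueError("unsupported statement")

# The very same token sequence is classified differently by context:
print(classify("T * x;", type_names={"T"}))   # declaration
print(classify("T * x;", type_names=set()))   # expression
```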

~~~
tom_mellior
This point in the article is pretty weird (the rest is good). I don't think
IDEs do a lot of parsing to do syntax highlighting, isn't it all just regex
matching to identify the types of tokens? I'd be interested in real-world
examples of IDEs doing something more complex to achieve syntax highlighting.
And conversely, examples of imperfect syntax highlighting of C++ due to the
undecidability of its input language. I mean, yes, an actual parser for C++
must be able to execute template computations, and that's a pain, but why
would that be relevant to syntax highlighting? It isn't.

Now, there _are_ valid reasons for running a proper language frontend from the
IDE: Error reporting of all kinds, not just syntax but also type errors and
whatever else the compiler likes to complain about. But there the IDE should
_not_ try to replicate the parser. Parsing only a context-free surface
language will not catch type errors, exactly because enforcing a static type
system requires one to "look things up in a symbol table". So the actual
compiler should provide an IDE-friendly mode where it runs its frontend and
reports errors back, and the IDE should not try to roll its own version of
this.

In other words, the IDE is not relevant either way.

> Any programming language that is massively adopted is context free?

If "context free" meant that no semantic analysis or symbol table lookups were
allowed at all, statically typed languages would be out.

And yet we have context-free grammars for all widely adopted statically typed
programming languages. And we also have additional semantic checks for all of
them. The notion of a "context free programming language" being one that has
context free syntax and no semantic checks at all is not a useful notion.
Don't worry about it while designing your language.

~~~
geophile
I'm pretty sure that the JetBrains products (IntelliJ, PyCharm, etc.) have a
pretty deep understanding of the languages they handle. Java is strongly
typed, and IntelliJ does a perfect job (as far as I can tell) of highlighting
and, more importantly, refactoring. Hard to see how they could do that with
just regex matching.

By contrast, Python is more dynamic, and so the refactoring in PyCharm is
pretty weak. It usually catches some of what it needs, but I often need to
find the rest myself. It does appear that PyCharm is doing flow analysis to
infer types - i.e., even PyCharm is doing more than regex matching.

~~~
tom_mellior
> highlighting, and more importantly, refactoring. Hard to see how they could
> do that with just regex matching.

The former is easy, the latter is impossible with regexes only. Which tells
you that they are very different things. For refactoring you need not only a
proper parse, but also the ability to "look things up in symbol tables". Which
is what I said: There are good reasons to run a proper language frontend from
the IDE. But syntax highlighting isn't one of these reasons. And refactoring
isn't possible with only a simple parse without using symbol tables.

------
zzo38computer
In some programming languages, e.g. TeX, Forth, PostScript, etc., you cannot
really syntax highlight the program without executing it. However, syntax
highlighting is a feature that I can do without, and I don't think those
programming languages are bad. In some programming languages, such as C and
Free Hero Mesh, you can read the sequence of tokens even though there are
macros; you need not expand the macros to find where the tokens are (although
I think older versions of the C preprocessor did not work like this).

They mention tools. I do think valgrind is sometimes helpful; I do use that.
However, Git is not the only version control system; some people prefer
others, such as Mercurial, Fossil, etc. I use Fossil.

They also mention error messages. I like the first approach: print one error
message and quit. However, in some cases it might be possible to simply ignore
an error and continue after displaying the error message (and then finally
fail at the end). It might also be possible to skip a lot of stuff and then
continue; e.g. if there is an unknown variable or function, you might display
an error message and then skip the entire statement that mentions it.

They mention a runtime library. This should be kept as small as possible, I
think, and if you can arrange for its functions to be included in the compiled
program only when they are used, that is a good idea, since then you can
provide functions that many people don't need without wasting space and time
on them.

Something they did not mention, but which I think is very good and useful to
have, is metaprogramming and 'pataprogramming.

But I also think that different programming languages can be good for
different purposes, and that some are domain specific, and some are easier to
write than others. This can be done using existing syntax or new syntax, and I
have done both, and so have some others.

Another thing that I can recommend is a discussion system with NNTP; even
Digital Mars themselves have an NNTP-based discussion system. Perhaps even
better is to have the same messages available via NNTP, a web forum, and
mailing lists; again, this is what Digital Mars does.

However, I do think that minimizing keystrokes is helpful. Minimizing runtime
dependencies and runtime memory usage is also helpful.

~~~
ken
> In some programming languages, e.g. TeX, Forth, PostScript, etc you cannot
> really syntax highlight the program without executing it.

Technically true. In practice, however, I've run into _way_ more issues with
syntax highlighters in overly-complex fixed languages than I have with
simpler-but-programmable languages.

------
tanin
What a great article. It's something I needed to read, as I have been working
on my own programming language for a while now. This creating-a-programming-
language thing is so difficult and takes so much effort. (I'm stuck on
designing how strings work right now.)

+1 on writing parser by hand. I've done it once with java
([https://github.com/tanin47/javaparser.rs](https://github.com/tanin47/javaparser.rs)),
and I switched my own programming language from a parser generator to a hand-
written one. Once you know how to write a parser by hand, it's a much better
approach (e.g. easier to test, modularizable).
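
For anyone curious what "by hand" looks like, a minimal recursive-descent parser for arithmetic with precedence can be sketched like this (a generic Python toy, not taken from either project above):

```python
# Minimal hand-written recursive-descent parser for the grammar:
#   expr   -> term (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | '(' expr ')'
# Each nonterminal becomes one method; precedence falls out of the
# call structure (term binds tighter than expr).

import re

def tokenize(src):
    return re.findall(r"\d+|[-+*/()]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok=None):
        t = self.tokens[self.pos]
        assert tok is None or t == tok, f"expected {tok}, got {t}"
        self.pos += 1
        return t

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.eat(), node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            node = (self.eat(), node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        return int(self.eat())

print(Parser(tokenize("1 + 2 * 3")).expr())  # ('+', 1, ('*', 2, 3))
```

Each production is an ordinary function, which is why hand-written parsers are easy to unit-test and modularize.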

My programming language is a bit strange in that I drop the performance
requirement. Performance distorts other aspects. I focus on making the
language featureful and beautiful instead. I hope I'll achieve this dream of
making a usable programming language one day.

------
didibus
After learning s-expression based syntax, I am just baffled why we even bother
with anything else.

When you play around with different Lisps, the syntax is always the same; the
differences between languages become purely semantic.

Other languages put too much emphasis on the syntax, in my opinion. And while
I understand the "popularity" appeal, I've almost never seen someone learn the
s-expression syntax and afterwards not like it.

Basically, I think it would be worth it to push people to learn the
s-expression syntax just so we can stop wasting our time on syntax afterwards.

Also, if I recall correctly, without user macros s-expression syntax is
context free, no?

~~~
chrisseaton
> I've almost never seen someone learning the s-expression syntax and
> afterwards not liking it.

I don't particularly like s-expression syntax - I think s-expressions suit the
computer at the expense of the programmer, which I think is backwards. I think
they're verbose and noisy and that obscures the meaning I want to see in the
text as a person. Yes, they're easier to parse, but I want to make my life
easier, not the computer's. And yes, they're convenient for metaprogramming,
but I don't want to optimise for the meta case.

A concrete example: I'm very happy working with precedence. I've been doing it
since I started school. My five-year-old can understand precedence. Using
precedence I can reduce ceremony and noise in my code, and it allows me to
more naturally look at an expression and take in its meaning. But
s-expressions don't lend themselves to precedence.

~~~
didibus
Do you mean precedence or infix?

The choice of infix vs prefix vs postfix is semantic. What s-expression syntax
imposes is that everything be balanced between brackets. For example, nothing
prevents an s-expression based language from allowing:

(2 + 3)

The only thing is that there are no precedence rules; you must always have
parentheses:

((3 * 2) + 5)
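
Such a fully parenthesized reader is tiny precisely because no precedence table is needed; a hypothetical Python sketch:

```python
# Sketch: evaluating fully parenthesized infix s-expressions such as
# ((3 * 2) + 5). With parentheses mandatory, no precedence rules exist;
# the recursion simply mirrors the bracket structure.

import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def parse(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        lhs = parse(tokens)       # left operand
        op = tokens.pop(0)        # infix operator
        rhs = parse(tokens)       # right operand
        assert tokens.pop(0) == ")"
        return OPS[op](lhs, rhs)
    return int(tok)

def evaluate(src):
    # Pad brackets with spaces so split() can tokenize.
    return parse(src.replace("(", " ( ").replace(")", " ) ").split())

print(evaluate("(2 + 3)"))        # 5
print(evaluate("((3 * 2) + 5)"))  # 11
```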

~~~
chrisseaton
Why are you trying to explain s-expressions to me like I’ve never heard of
them when I’m already in a thread debating their pros and cons for language
design and implementation?

~~~
didibus
I'm sorry that you are getting defensive and feeling like I'm attacking your
expertise. But there are all kinds of people with different experience levels
on HN, so I prefer asking for clarification when I can instead of assuming
things and talking past each other.

If you truly meant precedence, well, I guess I disagree with you. Even in
math, I prefer explicitly parenthesized expressions for readability.

If you meant infix, that's not a concern of s-expressions, so it's irrelevant
to the syntax question.

I know that personal preference is basically the reason for the various
syntaxes, of course. But similar to how there's a movement arguing that a
common standard code format is better than everyone having their own custom
format, you could argue that adopting a common syntax - and I think
s-expressions would make a great common syntax - could also yield an overall
better outcome. And that's my argument here.

Most languages have an s-expression based version: Python has Hy, Lua has
Fennel, etc. So you already see a trend in how s-expressions could easily
cover all those languages, which is natural since they are closer to the parse
tree anyway.

------
chubot
On regexes, he is generalizing from a small number of examples. Some lexers
are straightforward to write by hand, but others are better done with a code
generator.

edit: I should really say that you should consider using _regular languages_
to lex your programming languages, not (Perl-style) _regexes_. Those are
different things:

[https://swtch.com/~rsc/regexp/](https://swtch.com/~rsc/regexp/)

I used re2c for Oil and it saves around 5K-10K lines of "groveling through
backslashes and braces one at a time" (e.g. what other shells do):

[http://www.oilshell.org/blog/2019/12/22.html#appendix-a-oils...](http://www.oilshell.org/blog/2019/12/22.html#appendix-a-oils-lexer-uses-two-stages-of-code-generation)

And the parser is faster than bash overall:

[http://www.oilshell.org/blog/2020/01/parser-benchmarks.html](http://www.oilshell.org/blog/2020/01/parser-benchmarks.html)
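
The lexing style he describes can be sketched in a few lines (a generic Python illustration; Python's `re` engine is Perl-style, but patterns like these are plain regular languages and could equally feed a DFA-based generator such as re2c):

```python
# Sketch: a lexer built from one alternation of token patterns.
# Each token class is a named group; the compiled alternation tries
# them in order, and the match's lastgroup tells us which class won.

import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/=()]"),
    ("SKIP",   r"\s+"),        # whitespace, discarded below
]
LEXER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(src):
    for m in LEXER.finditer(src):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(lex("x = 40 + 2")))
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]
```

A production lexer would also report characters that match no pattern; this sketch silently skips them.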

------
DonaldFisk
You might be doing this as a learning exercise, in which case it doesn't need
to be particularly innovative, and writing another Lisp or Forth
implementation is fine. In fact, I'd recommend beginning along those lines,
followed by developing several different new languages, probably domain-
specific ones. Your first attempts probably won't be worth keeping. (I haven't
kept the object oriented Prolog I wrote about 25 years ago, but I do regularly
use a small Prolog I recently wrote in Lisp.)

The languages at the roots of all programming languages are Fortran, Lisp,
Cobol, Algol60, APL, Snobol, Forth, Prolog, OPS5, SASL, Smalltalk, Prograph.
(If I've missed any out, or any of the above have important predecessors,
please let me know.) A new language is unlikely to be radically different from
any of those languages or their descendants - of those I listed, the most
recent is from the mid 1980s. Within the descendants, which is your favourite?
Can you make big improvements to it? If so, make them. Otherwise, can you
improve another language so that it becomes your favourite?

One heuristic I've found useful: the right design choices at the start might
be unknown, but when I need to make them, they're obvious to me, and often a
particular choice leads to other equally obvious choices. If a choice isn't
obvious, I've probably done something wrong.

My own preference is for pure rather than hybrid languages with simple regular
syntaxes (i.e. one idea expressed well), and very strong static typing.

My favourite language is Common Lisp or rather a subset of it. I have trouble
thinking of ways to make it significantly better. But I also like the idea of
dataflow (a few decades ago I knew some of those working on experimental
dataflow hardware), and graphical programming seems to be the right approach
to dataflow. Prograph was the closest existing language to what I wanted. But
it isn't pure dataflow, it's dynamically typed, it's object oriented (so,
unless it's Smalltalk, it's hybrid), and entering programs in it involves too
many mouse gestures and keystrokes. So I'm implementing a new language which
corrects those deficiencies. It's been particularly difficult because I have
to write an IDE as well as the language, and there aren't any textbooks I can
rely on for help.

I'll have succeeded if I end up using the new language more often than Lisp;
if along the way I get a few more papers (two so far) and some academic
interest, so much the better. I don't expect a wide user base. If that were
important to me, the language would have C syntax and I'd try to get corporate
backing. Your criterion for success might differ from mine.

~~~
lioeters
> I have to write an IDE as well as the language

This part caught my attention. I've developed a few small languages over the
years, with somewhat boring syntax in Lisp or Algol family. The one that has
the most longevity (~7 years) and traction (estimated few thousand users) is
an XML-based (!) domain-specific language. Frankly it's verbose and kind of
ugly, but apparently easy to learn due to its regular, minimal syntax - a
little like Lisp, if parentheses were brackets. It's also "declarative", a bit
like SQL, in that the users describe the result, and magic happens internally
to make it so.

More recently, I've been working on a tree-structured editor for users to
visually build programs/templates. Developing it has been insightful - I'm
realizing that this development environment should have been there from the
beginning, as a fundamental part of the language's design and user experience.

All syntactic constructs, keywords, patterns, defined variables, functions,
etc., can be made available to the user, with auto-complete, suggest, lists of
choices. Rather than the user having to remember commands and type them in
text, the environment can let them select, compose, and build. Ideally, it can
ensure that a program never has a syntax error (conversely, that it makes it
impossible to write/build such a program).

Another aspect I've come to appreciate is instant feedback. Any kind of live
preview, or automatic build/test, makes a big improvement in the flow of
programming and thought process. It becomes like shaping clay.

In an article about Smalltalk, I read how its integrated environment is
indistinguishable from the language itself. In my daily work too, I spend most
of my time in an editor with language servers for syntax highlight,
autosuggest, linting, and other smart features, as well as integration with
terminal, Git, remote edit with SSH.. Typing in text is still the fastest way
for me to express myself, but the environment is constantly supporting it - it
understands what I'm writing (often more than I'm aware), showing me what I
need to know to build with minimal effort.

What I like is a programmable development environment, especially one which is
built on itself.

Well, this has been a ramble without a point. It's an endlessly fascinating
subject, and perhaps it's only natural that a programmer finds delight in
creating one's own language. There's something pure and philosophical about
it, like working with the very material of thought.

\---

A little list of links related to building interactive software visually:

[https://airtable.com/](https://airtable.com/)

[https://bubble.io](https://bubble.io)

[https://www.memberstack.io/](https://www.memberstack.io/)

[https://webflow.com/](https://webflow.com/)

[https://zapier.com/](https://zapier.com/)

~~~
teleforce
A tree-structured editor with built-in language constructs sounds like a
reincarnation of the Cedar/Tioga desktop and programming environment from
Xerox in the 1980s. That environment replaced three distinct programming
environments that were very popular at Xerox at the time, namely Smalltalk,
Lisp, and Mesa, and according to Xerox's staff it was a highly productive
programming environment.

Perhaps someone should re-introduce a modern version of Cedar/Tioga, similar
to Apple's and Microsoft's re-introduction of the pervasive mouse/windowing
user interface originally created by Xerox in the 1970s.

~~~
lioeters
Ah, what a delightful rabbit hole your comment led me down.. I've got about
twenty tabs open now, articles on the history of (and the relationships
between) programming languages and graphical user interfaces. Will enjoy this
food for thought!

My recent reading has included:

Unix: A History and a Memoir (Brian Kernighan) -
[https://www.cs.princeton.edu/~bwk/memoir.html](https://www.cs.princeton.edu/~bwk/memoir.html)

The Development of the C Language (Dennis Ritchie) - [https://www.bell-labs.com/usr/dmr/www/chist.html](https://www.bell-labs.com/usr/dmr/www/chist.html)

JavaScript: The First 20 Years (Brendan Eich) - [https://zenodo.org/record/3710954/files/jshopl-preprint-2020...](https://zenodo.org/record/3710954/files/jshopl-preprint-2020-03-13.pdf) (PDF)

This morning, this little phrase jumped out:

> C was created on a tiny machine as a tool to improve a meager programming
> environment.

What this made me realize is that a language _is_ a user interface. It makes
sense how C evolved with UNIX as an operating system, that a language is
deeply fundamental to a computing environment (and its development).

There's a parallel with JavaScript and Netscape. The language was meant to
provide users with an interface to operate the web browser, create dynamic
documents, and (as it turned out) to let users develop the web environment.

..Which brings me back to the Cedar/Tioga programming environment you
mentioned. While searching around, I happened across another comment of yours
(I see it's a favorite topic! :) and found the good stuff:

Eric Bier Demonstrates Cedar -
[https://www.youtube.com/watch?v=z_dt7NG38V4](https://www.youtube.com/watch?v=z_dt7NG38V4)
(video)

Active Tioga documents: an exploration of two paradigms -
[http://cajun.cs.nott.ac.uk/wiley/journals/epobetan/pdf/volum...](http://cajun.cs.nott.ac.uk/wiley/journals/epobetan/pdf/volume3/issue2/ep030dt.pdf)
(PDF)

The Mesa Programming Environment - [http://www.digibarn.com/collections/papers/dick-sweet/xdepap...](http://www.digibarn.com/collections/papers/dick-sweet/xdepaper.pdf) (PDF)

Fascinating. I saw a phrase, "active document applications", which
interestingly is what the web has become, in its own monstrous way.

It's a good point about "re-introducing" old ideas. I was lucky to have grown
up in the earlier days of computing, to have breathed (a little of) its
different culture and worldview, so that I've learned to appreciate its
history and the wealth of ideas - what made possible our current networked
computing environment, and visions yet to be fully realized.

------
WalterBright
Author here. AMA!

~~~
Rochus
What is your opinion about self-hosting (i.e. writing the parser/compiler in
its own language)? Is that really desirable, or even necessary, or just a
gimmick? (I know what Wirth says; I wonder what you think.)

~~~
mhh__
(Not Walter, but) self-hosting a compiler should be done wherever possible.
Even setting aside the fact that a self-hosted compiler becomes its own test
suite (writing tests is good, but it's quite difficult to find weird
behavioural bugs using unit tests that only test one thing at a time), it
makes it so much easier to work on the compiler (for example, the main D
compiler can build itself in a second or two on my machine; if I had to cart
around some other huge toolchain it would be much slower). Another boon is
that if people who are really good Go programmers (say) want to make the Go
compiler better, they don't have to try to write Go in C++ if the main
compiler is written in C++.

Also, one plan is to rush to an MVP, rewrite the MVP in your language, and go
from there. This has the added benefit of giving you a decent-sized program in
your language at no extra cost; beyond the testing value, it should make you
design a better language as you learn what works and what doesn't in the
"real world".

It's not always possible (please don't write a compiler in JavaScript, or even
C if you can avoid it - pain and lack of abstraction, respectively).

~~~
WalterBright
Yes on all points. I'll add that I invented D because D was the language I
wanted to write code in. The D compiler was originally written in C-with-
Classes, and it became increasingly irritating for me to have to be working in
that language rather than D. Thankfully, the DMD compiler is now 100% in D and
I've been slowly refactoring the code into much more pleasing forms.

------
matheusmoreira
Is it true that optimization follows the 80/20 rule? What are some common
performance issues that new languages and their implementations face? Are
there common optimization techniques that can be applied in order to make the
new language competitive with existing ones?

For example, I know that it's generally better to compile programs into a
linear code structure such as bytecode instead of interpreting a tree
structure directly.
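
The difference between the two interpretation strategies can be sketched like this (a minimal Python toy; the instruction names and encoding are made up for illustration):

```python
# Sketch: the same expression interpreted as a tree vs. as flat bytecode.
# The bytecode loop replaces recursive dispatch through the tree with a
# linear scan over instructions and an explicit operand stack.

def eval_tree(node):
    if isinstance(node, int):
        return node
    op, lhs, rhs = node
    a, b = eval_tree(lhs), eval_tree(rhs)
    return a + b if op == "+" else a * b

def eval_bytecode(code):
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        elif instr == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif instr == "MUL":
            stack.append(stack.pop() * stack.pop())
    return stack[0]

tree = ("+", 1, ("*", 2, 3))                     # 1 + 2 * 3
code = [("PUSH", 1), ("PUSH", 2), ("PUSH", 3),   # same program, flattened
        ("MUL", None), ("ADD", None)]
print(eval_tree(tree), eval_bytecode(code))      # 7 7
```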

~~~
WalterBright
Optimizing code is a book-length topic just for an introduction. It's also
true that knowing how optimizers work can feed back into improving the
language design.

For example, `const` in C++ doesn't mean the data is immutable - it can change
with any assignment through a pointer. No optimizations assuming immutability
will work. That's why D has an `immutable` qualifier, which gives the
optimizer license to assume the data does not change.

For a famous example, Fortran assumes two arrays never overlap. In C/C++ they
can. This is the source of a persistent gap in performance between Fortran and
C/C++. C attempted to fix it by adding the `restrict` qualifier, but this
failed because it is just too arcane and brittle for most users.

In D, I'm working on adding an Ownership/Borrowing system, which will enable
the compiler to figure out the pointer aliasing optimization opportunities.

~~~
matheusmoreira
> That's why D has an `immutable` qualifier, which gives the optimizer license
> to assume the data does not change.

Which optimizations are enabled by immutable data? I can think of constant
folding. Is the data statically allocated in a read-only page?

> For a famous example, Fortran assumes two arrays never overlap. In C/C++
> they can. This is the source of a persistent gap in performance between
> Fortran and C/C++.

What sort of optimizations does this assumption enable? Does it allow the
compiler to freely reorder or parallelize the code?

C assumes that pointers to different types are never equal. This is
incompatible with common systems programming concepts such as type punning.
Even something simple like reinterpreting some data structure as an array of
uint8_t can make the optimizer introduce bugs into the code. Notably, the
Linux kernel is compiled with strict aliasing disabled:

[https://lkml.org/lkml/2003/2/26/158](https://lkml.org/lkml/2003/2/26/158)

[https://lkml.org/lkml/2009/1/12/369](https://lkml.org/lkml/2009/1/12/369)

~~~
MauranKilom
> What sort of optimizations does this assumption enable? Does it allow the
> compiler to freely reorder or parallelize the code?

Consider this code:

    
    
      void foo(int* data, const int& count)
      {
        for (int i = 0; i < count; ++i)
          data[i]++;
      }
    

The compiler has to account for the possibility of `count` referring to an
element of `data`. It has to reload `count` from memory at every iteration.
This also prevents e.g. auto-vectorization:

Compare [https://godbolt.org/z/sNRKv4](https://godbolt.org/z/sNRKv4) vs
[https://godbolt.org/z/iJq5yn](https://godbolt.org/z/iJq5yn).

Similar story if you manipulate data with multiple arrays in play: Unless the
compiler knows (via `restrict`, type-based alias analysis or just seeing the
definition) that two objects are non-overlapping, it has to correctly handle
the worst case. See also this related discussion:
[https://news.ycombinator.com/item?id=20800076](https://news.ycombinator.com/item?id=20800076)

------
tom-thistime
"Wanna know the secret of making [X] fast? [...] Use a profiler."

Still really valuable advice. We know it, but do we do it?

------
xiphias2
> grammar should be redundant. You've all heard people say that statement
> terminating ; are not necessary because the compiler can figure it out.

I'll stay with my non-redundant Julia language, thank you very much. It has
given me a lot of joy in programming, and I don't remember having big problems
with error messages. Even when I do, syntactic errors are trivial to find.

The hard parts of efficient programming are memory management (garbage
collection), controlling vectorizing instructions, the balance between ease of
GPU programming and efficiency, and parallelization - not semicolons at the
end of a line.

All programming languages have to decide on how low level control they give to
the computer resources, and how they abstract them, to make high level
programming possible without sacrificing too much efficiency.

~~~
adimitrov
Whilst you're not wrong about what _you_ value about a programming "language",
I'd say there is an important distinction to be drawn between two things
people tend to conflate: the _language_ and the _runtime_.

Perhaps a counterexample is in order to explain it: Java the language and Java
the runtime (VM, stl) are two different things entirely. Kotlin, for example
makes a lot of different choices in _language design_ (e.g. semicolons) while
reusing the same _runtime_ (GC, threads, etc.)

Of course, you can often not tease these two concerns apart easily and neatly,
but there is merit in giving thought to the ergonomics of the pure _language_
side of things. At the end of the day, a programming language doesn't need an
implementation to be useful (e.g. as a teaching tool) as it's an abstract,
formal concept.

------
somewhereoutth
I am currently developing a language, though more as a research project than
anything practical. Apologies for jumping on this post, but it is a good
opportunity to organize and set out my thoughts (and possibly someone might
find it interesting):

Essentially the language is a pure functional language that takes the untyped
lambda calculus and adds decoration terms as first class citizens in the
calculus. These terms can be used to wrap selected combinators (e.g. Church
numerals). The normal beta reduction rules are extended to handle these
decorations in a useful way.

The decorations allow predicates to be formed that are total over the term
space (e.g. isChurchNumeral?), which means that function call identifiers can
be dynamically dispatched based on their arguments (using a certain amount of
term assembly from the surface syntax). Ad hoc polymorphism can thus be
encoded in a transparent fashion.

This has been sufficient to build a numerical tower up to and including the
Complex numbers, such that the usual operations add, mul, sub, div are defined
within and between Natural, Integer, Rational and Complex numbers. It can also
manipulate strings as if they were lists, whilst retaining their
'stringyness'. All without needing number or string specific features internal
to the runtime (save parsing and formatting on the way in and out).

REPL examples:

    
    
      > (sum [1 -3 2.5 3/2 1+2i])
      > 3+2i
    
      > (reverse "hello")
      > "olleh"
    

Something resembling Haskell's typeclasses naturally arises, with the usual
definitions of Functor, Monad, and Applicative.

It is, needless to say, astonishingly slow.

~~~
tpush
From your description it sounds like these decorators are an alternative to
type annotations?

Mind showing how these decorators look and work? I'm also building a language,
and always interested in seeing novel features like these :).

~~~
somewhereoutth
Possibly an alternative - but operating at the term level as predicates - if
you consider 'type' to be a total function dividing your term space into 'of
the type' and 'not of the type'.

Church numerals:

    
    
      0 = (^Nat (\f \x x))
      1 = (^Nat (\f \x (f x)))
    

The ^Nat is the decoration, and can be recovered with a special predicate
function ?Nat. So:

    
    
      (?Nat (^Nat (\f \x x))) =>Beta (^Bool (\x \y x)) *i.e. true*
    

For this to work the inner part must be reduced first. It does seem to be
consistent with Normal order reduction - the Y combinator etc work as
expected.

~~~
abecedarius
It sounds like the trademarks in
[https://blog.acolyer.org/2016/10/19/protection-in-programming-languages/](https://blog.acolyer.org/2016/10/19/protection-in-programming-languages/)
\-- is that right?

~~~
somewhereoutth
Kind of - but more about dynamic dispatch / overloaded function name
resolution than protection per se.

Furthermore, the decorations work at the lowest possible level, and are bound
up with the extended beta reduction (there is also an unwrap operator with
associated reduction rules).

Of course, any computation can be modelled in the standard lambda calculus,
without needing to hack the reduction, but I believe that this extension is in
some sense _non-trivial_ as it would require the LC to first implement the LC
terms and usual beta reduction within itself, before then implementing the
additional terms and reduction rules.

------
pfdietz
A reason for simple parsing is to make the toolchain for the language easier.
And in particular, you don't want a preprocessor. Want to parse your program
into an AST? You can't in C/C++ in general: you can only parse it for a
particular set of preprocessor defines, which leaves out large chunks that
were deleted during preprocessing.

------
mrlonglong
Zortech C was fabulous, wrote a disk editor in it many moons ago. Kudos!

~~~
WalterBright
Thanks for the kind words! Zortech was indeed a great compiler for its time.

~~~
a9h74j
Years ago IIRC you put out a request for original Zortech C++ disks and
packaging, if anyone had some. I was sorry that I had just disposed of mine
six months earlier... That was my one-and-only C++ in the early '90s, and I
am still programming in D. Sorry I couldn't help!

~~~
WalterBright
No problem. By the kind work of many people, I managed to accrete a fairly
complete collection.

~~~
mrlonglong
I still have Zortech C 1.06 on my old P166. I should boot it up to see if it
still lives!

------
teleforce
As you are probably aware, the author of the article is the designer of the D
programming language.

For background reading on the design of D, [1] is a very interesting and
insightful paper on the history and origins of the language up to 2018.

[1]
[http://erdani.com/hopl2020-draft.pdf](http://erdani.com/hopl2020-draft.pdf)

------
ChicagoDave
I just started down this road. I’ve mucked with scanning text into object
trees but want to learn how to make my own language. The difference is this would
be a very small domain-specific language that controls an in-memory graph data
store targeting interactive stories.

There are great existing tools for this sort of thing, but I have
theories/ideas I want to explore.

I’m glad there are healthy discussions on compilers.

------
bogomipz
>"Context free grammars. What this really means is the code should be
parseable without having to look things up in a symbol table. C++ is famously
not a context free grammar."

Could someone elaborate on why C++ is "not a context free grammar"? Also
how does C++ deal with this fact?

~~~
colejohnson66
Here’s a good Medium post on why C++ is undecidable:
[https://medium.com/@mujjingun_23509/full-proof-that-c-grammar-is-undecidable-34e22dd8b664](https://medium.com/@mujjingun_23509/full-proof-that-c-grammar-is-undecidable-34e22dd8b664)

~~~
bogomipz
This is a great read! It summed things up quite nicely with some good graphics
as well. Cheers.

------
valera_rozuvan
One should consider creating a VM first: it simplifies life a bit, and makes
the language more portable in the future. The implementation task is split in
two, compiler and VM. First, the compiler targets the VM. Second, the VM
targets the CPU. This way, porting to a different architecture is just a
matter of extending the VM to interface with the new arch. A compiler and a VM
definitely are 2 separate projects with their own sets of issues.

Also worth considering is the eDSL (embedded domain-specific language) route:
working at a very high level, re-using existing constructs of the host
language. SICP [1] shows an example of such a project, using Scheme to build
an interpreter. This can easily be done in several days. It's best to avoid
redoing work that has been done many times before.

\----------

[1]
[https://mitpress.mit.edu/sites/default/files/sicp/index.html](https://mitpress.mit.edu/sites/default/files/sicp/index.html)

------
nostrademons
I used to be big into programming language design, but eventually decided that
there just wasn't enough headroom to do something innovative enough to warrant
the switching costs. The world doesn't need yet another syntax on top of basic
C-style or Lisp-style semantic constructs.

Lately I've been wondering if I should reconsider, though, and have had a
bunch of ideas that blur the lines of what should be considered in-scope for a
programming language. Things like:

1\. Universal constructs for serial & parallel execution, but with user-
defined semantics for _where_ the code should execute. So for example, at low-
scale you'd execute in a loop on the CPU. At mid-scale it'd run on the GPU. At
high-scale you could transparently distribute across thousands of GPU boxes.

2\. Profiling and execution statistics built into the language. Write your
code, run it on representative data, and instantly get statistics on how many
times each function is called, what the average size of data structures is,
how much memory is used, etc. Profilers do this, but often not at a
granularity that's particularly useful (ever tried to find out how many
Strings are created within a particular dynamic call-tree?), and most new
languages overlook implementing a profiler, leaving it to third-party vendors.
This is a shame, because a standardized programmatic API to a profiler could
be a very effective compiler tool, leading to...

3\. A way of defining transformations between data representations,
benchmarking the _cost_ of those transformations, and then having the compiler
automatically decide whether it's worth bulk-converting data based on typical
profiler results. The motivating example for this is the AoS-SoA
transformation, but such tradeoffs also feature in common system design questions like
"Do you unpack all of the fields of this wire protocol, or lazily construct an
object only when necessary?" or "Should you zip your data before sending it in
an RPC?" or "Do you do this work at indexing or serving time?"

4\. The ability to dump intermediate data from any function call site to disk,
and then restart program execution from there without re-running the whole
program. Handy when iterating on a complex algorithm.

5\. Drill-down data visualizations for large collections. If you've ever tried
to find the logic bug in a 500,000 element array using a standard IDE
debugger, you'll know about this pain point.

6\. Built-in support for machine-learning. In particular, many ML applications
need extensive preprocessing, and you need to preprocess the data in exactly
the same way for training and serving. Wouldn't it be neat if you could write
your traditional algorithms, dump out your signals at the particular call-site
of the classifier, and then have the development environment automatically
send it to Mechanical Turk or Craigslist for labeling? Then have the system
return the labeled data in a format easily suited for your ML framework of
choice, do your data analysis & training, and import the finished model back
into the call site of the language.

7\. GPU-native, with language constructs that lead to efficient GPU code. This
is becoming increasingly important not just for execution speed, but for
developer velocity, because devs in data-intensive areas can waste a
significant amount of time waiting for their program to run.

~~~
tom_mellior
1\. OpenACC kind of goes into this direction, but the specification is a bad
joke, and it's less ambitious than your goal. Still, might be interesting.

4\. This sounds cool. I wonder how difficult it would be to implement a
version of this using fork. The user would write:

    
    
        if (some_condition_I_want_to_debug) {
            suspend_a_copy();
        }
    

where suspend_a_copy would fork and continue. The forked version would log its
PID somewhere and go to sleep until woken up by an external signal, then
continue from there. Possibly you would attach a debugger first.

------
pbiggar
This advice is quite good for certain types of programming language, if you
look at the world the same way as he does. We're implementing Dark from a
completely different worldview, and from that vantage point, a lot of these
aren't exactly wrong, but different.

> Syntax matters

The approach we took with Dark is that there isn't a syntax, per se, in that
there isn't a parser. There's certainly a textual view of Dark, and so it's
important that the code looks good (it currently looks only OK, in my
opinion). But as a result, we have other options for minimizing keystrokes
(autocomplete built-in), parsing (again, no parser), and minimizing keywords
(you're allowed to have a variant with the same name as a keyword, which isn't
allowed in languages with parsers (well, lexers, but same idea)).

> context free grammars

No parser, no need to have a grammar. His point about IDEs is great - we only
support our own IDE (a controversial decision to be sure!)

> Redundancy

This is an esoteric parsing problem that only applies if you have a parser.
No parser means no syntax errors. We are left with editor errors (how does the
editor have good UX) and run-time errors.

> Implementation

He's right about how hard error messages are, so I feel good about our "not
parsing" approach.

> Compiler speed

Our compilation is instant. The way it's instant is:

\- very small compilation units: you're editing a single function at a time,
and so nothing else needs to be parsed.

\- no parser: The editor directly updates the AST, so you don't have to read
the whole file (there isn't a "file" concept). Even in JS, that means an
update takes a few milliseconds at the most.

\- extremely incremental compilation: making a change in the editor only
changes the exact AST construct that's changing.

> Lowering

This is really about compilation. One thing you can do, which is what we do,
is have an interpreter. Now, interpreters are slow, but we simply have a
different goal with the language, which is to run HTTP requests quickly. We do
run into problems with the limits of the interpreter, but we plan to add a
compiler later to deal with this.

Really what I'm suggesting here is that compiled languages look at having
interpreters in addition to compilers.

> i/o performance

> memory allocation

I think he's approaching this with an implicit goal of "it must be as fast as
possible", which isn't necessarily a goal for all languages.

> You’ve done it, you’ve got a great prototype of the new language. Now what?
> Next comes the hardest part. This is where most new languages fail. You’ll
> be doing what every nascent rock band does — play shopping malls, high
> school dances, dive bars, etc., slowly building up an audience. For
> languages, this means preparing presentations, articles, tutorials, and
> books on the language. Then, going to programmer meetings, conferences,
> companies, anywhere they’ll have you, and show it off. You’ll get used to
> public speaking, and even find you enjoy it (I enjoy it a lot).

Well this is certainly correct!

~~~
tomp
> The approach we took with Dark is that there isn't a syntax, per se, in that
> there isn't a parser.

Can you explain how that works? Unless Dark is a purely visual programming
language (in which case I'd say that there is "syntax", it's just a bit more
abstract, and in any case, it seems that visual languages haven't really
caught on as an idea), I find that hard to believe.

~~~
pbiggar
Sure. You can see it in action at
[http://darklang.com/launch/demo-video](http://darklang.com/launch/demo-video)
(also, we've gone through our waiting list, so we're pretty much adding new
folks who sign up immediately, if you want to try it out).

The main idea is that when you make a change (let's say, you type a key in
the editor), that change happens directly on the AST (the internal set of
objects that represent the program). So for example, if you're seeing:

    
    
        let x = 5
        x + 2
    
    

we represent that internally as `Let ("x", Int 5, BinOp ("+", Var "x", Int
2))`

In a normal programming environment, you type up text, and the parser converts
it to the internal representation. With Dark, we always keep everything in the
internal representation. That means, if it isn't a "syntactically" valid
program, you can't create it in the first place.

In practice, this means that if you type, for example, `z` in the middle of
`let`, in a traditional language you'd get an error like `lzet is not a valid
token` and in Dark, the `z` just wouldn't appear (nothing would happen). So
it's not possible to get a syntactically invalid program.

The overall idea with Dark is that programs are always live (and also safe).
So to make that be true, we need to ensure that you don't spend a period of
time in a state where we can't understand your program (such as when the
syntax is invalid).

~~~
WalterBright
> That means, if it isn't a "syntactically" valid program, you can't create it
> in the first place.

It's an interesting idea. I don't know if it would work for me, though. I
don't write code as a stepwise refinement of a valid program, I massage the
text gradually into a valid program. This is especially true when refactoring.

~~~
pbiggar
Yeah, I think a bunch of people work like that. We've had the feedback that
some people have had to change their workflow to work well with Dark.

People generally use what we call "trace-driven development", where they make
a HTTP request to the code they're editing (typically, people are making APIs
or backends to JS apps), and then they always have "live values" of what
they're working on (similar to a REPL, or to running unit tests on a loop, but
instantly).

------
jakear
Off topic:

Annoying that this is totally unreadable even at 300% zoom on an iPhone 11
Pro.

I feel like a set of people refuse to learn proper HTML/CSS as some sort of
statement, not realizing their laziness renders their work unavailable to the
visually disabled.

HN behaves similarly poorly.

~~~
saagarjha
What's wrong with Hacker News?

~~~
jakear
It doesn't respect iOS font size, and you can't alter the zoom to make the
text actually bigger. It zooms the entire page rather than just the text, so
the text doesn't re-flow. It's pretty much equivalent to just pinching to
zoom, when it should be adjusting the font size.

~~~
catalogia
Why can firefox on the desktop resize the font on HN and reflow the content,
but iOS cannot?

~~~
jakear
Beats me. But usually the onus is on the application developer to make their
content accessible on all platforms. Especially one as common as iOS Safari.

