>Do you have any advice for up-and-coming programmers?
> Learn several different programming languages, and actually try to use them before developing a religious affection or distaste for them.
> Try Scheme, try Haskell, try Ada, try Icon, try Ruby, try CAML, try Python, try Prolog. Don’t let yourself fall into a rut of using just one language, thinking that it defines what programming means.
> Try to rise above the syntax and semantics of a single language to think about algorithms and data structures in the abstract. And while you are at it, read articles or books by some of the language design pioneers, like Hoare, Dijkstra, Wirth, Gries, Dahl, Brinch Hansen, Steele, Milner, and Meyer.
I think this point...
> Try to rise above the syntax and semantics of a single language to think about algorithms and data structures in the abstract.
...is a stage that not many programmers reach (I certainly haven't). Agree or disagree? Or do you think its importance is overstated?
Read CS books about algorithms and data structures that use pseudo-code instead of specific languages.
Lambda calculus, denotational semantics, theory of objects, ...
Then see how those concepts map to the languages you use daily.
A very contrived example: a B-Tree doesn't stop being a B-Tree just because you switched languages. It just might require a different way to map the abstract concept "B-Tree" to the actual hardware, using the specific set of language features available.
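For instance, here's a minimal sketch of that abstract shape in Python (toy code, not a production B-Tree; insertion and splitting omitted). The same structure could be written in C with fixed-size arrays or in Rust with a Vec -- the concept doesn't change:

    # Toy sketch of the abstract shape of a B-Tree node.
    # `t` is the minimum degree; insertion/splitting omitted.
    class BTreeNode:
        def __init__(self, t, leaf=True):
            self.t = t            # non-root nodes hold between t-1 and 2t-1 keys
            self.keys = []        # kept sorted
            self.children = []    # internal nodes: len(children) == len(keys) + 1
            self.leaf = leaf

    def search(node, key):
        # Textbook B-Tree search: find the first key >= `key`, else descend.
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return (node, i)
        if node.leaf:
            return None
        return search(node.children[i], key)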
As a concrete example of thinking outside a particular language: my shell Oil is 24K lines of code that does most of what bash does in 140K lines. It's written not in Python, but in:
1. OPy, a subset of Python 2 -- http://www.oilshell.org/blog/2018/03/04.html#faq
2. Zephyr ASDL -- http://www.oilshell.org/cross-ref.html?tag=zephyr-asdl#zephy...
3. a mathematical dialect of regular expressions (that runs under both Python's re engine and re2c's dialect, via translation).
In other words, it's a composition of DSLs, independent of any language.
I started a translator from OPy (typed with MyPy) to C++. It partially works but doesn't translate the whole codebase yet. If it works then I will have fully achieved this "language abstraction" goal.
It's analogous to how TeX was implemented by Knuth. TeX is not implemented in Pascal and compiled with a Pascal compiler. It's implemented in H-Pascal, an abstract subset of Pascal. Then it's translated to C and compiled with a C compiler!
In case it isn't obvious, this makes the style of code VERY different and more abstract than what you see in every other codebase.
This approach has downsides -- namely that the shell is way too slow right now. Hopefully the translation will fix that. Either way, I can definitely claim that the ideas and architecture are completely separate from any programming language. I've documented a lot of these ideas on the blog.
That is: composing different DSLs to solve a problem. The codebase is separated into parts, and each part may be expressed in a different language.
I tend to call it "metaprogramming", which might be vague but captures the spirit IMO. Metaprogramming is how you implement bespoke DSLs. I'd categorize textual code generation as the most basic form of metaprogramming, i.e. programming where your data is code.
It's one level removed -- rather than talking about the problem, you're talking about the tools/language/construct you're using to solve the problem.
And yes I would say the other downside is that it's easy to go off the deep end :) But I think that certain problems are difficult and you need some leverage to solve them. For example, writing a bash-compatible shell would be extremely repetitive otherwise. It's like a dozen different ways of groveling through backslashes and braces one-by-one.
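As a toy illustration of that "data is code" idea (made-up table and names, not Oil's actual code): instead of hand-writing a dozen near-identical backslash cases, you can generate them from a table.

    # Made-up example: generate repetitive C cases from a table.
    ESCAPES = [('n', 'newline'), ('t', 'tab'), ('r', 'carriage return')]

    def gen_switch(escapes):
        lines = ['switch (ch) {  /* generated -- do not edit */']
        for ch, desc in escapes:
            lines.append("  case '%s': return Escape_%s;  /* %s */"
                         % (ch, ch.upper(), desc))
        lines.append('  default: return Escape_NONE;')
        lines.append('}')
        return '\n'.join(lines)

    print(gen_switch(ESCAPES))

Adding a new escape is then a one-line change to the table, and the generator guarantees every case has the same shape.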
Someday I would like to define the subset of HTML I actually use... HTML is messy with lots of implementation quirks. And plenty of tools operate on it in ad hoc ways.
It would be nice to have some notion of correctness for those tools, and defining a subset seems like the best way to do that.
For now how about "Chop" Programming? "Choose Half" Oriented Programming. Or "Super" programming for "Subtract Unwanted Programming Elements Rigorously".
One benefit of that is aesthetic -- re2c is a 30K line piece of C code itself. The other benefit is practical -- it would be nice to "hoist" it up to the Oil language level, so shell users can use efficiently compiled regexes.
And I filed this issue to try derivatives a while ago:
There is a possible performance win, but I'm not sure if it makes sense. If you see anything interesting there I'd love to chat about it! (contact info in profile)
What I mean by "math" is either automata-based methods or derivatives. In contrast, Perl-style backtracking engines aren't math.
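For a taste of the derivatives approach, here's a minimal Brzozowski-derivative matcher (a toy sketch, not Oil's code; a real engine would add smart constructors and memoization so the derivatives stay small):

    # A regex is a tuple; match by differentiating w.r.t. each input char.
    EMPTY, EPS = ('empty',), ('eps',)
    def char(c):   return ('char', c)
    def alt(r, s): return ('alt', r, s)
    def cat(r, s): return ('cat', r, s)
    def star(r):   return ('star', r)

    def nullable(r):  # does r match the empty string?
        tag = r[0]
        if tag in ('eps', 'star'):   return True
        if tag in ('empty', 'char'): return False
        if tag == 'alt': return nullable(r[1]) or nullable(r[2])
        if tag == 'cat': return nullable(r[1]) and nullable(r[2])

    def deriv(r, c):  # the regex matching what's left after consuming c
        tag = r[0]
        if tag in ('empty', 'eps'): return EMPTY
        if tag == 'char': return EPS if r[1] == c else EMPTY
        if tag == 'alt':  return alt(deriv(r[1], c), deriv(r[2], c))
        if tag == 'star': return cat(deriv(r[1], c), r)
        d = cat(deriv(r[1], c), r[2])  # tag == 'cat'
        return alt(d, deriv(r[2], c)) if nullable(r[1]) else d

    def matches(r, s):
        for c in s:
            r = deriv(r, c)
        return nullable(r)

    r = star(cat(char('a'), char('b')))    # (ab)*
    assert matches(r, 'abab') and not matches(r, 'aba')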
To me, the "magic" of "regular languages" is that they give you non-determinism for free. The equivalence between NFAs and DFAs isn't obvious, and it's useful in practice.
I've taken advantage of this free nondeterminism in my huge shell lexer. (See my other reply for details)
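To make "non-determinism for free" concrete, here's the textbook subset construction in Python (the example NFA is made up): it accepts strings over {a, b} whose second-to-last character is 'a'. The NFA "guesses" which 'a' is second-to-last; the DFA resolves the guess by tracking sets of NFA states.

    # State 1 is the NFA's guess that this 'a' is second-to-last.
    nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0},
           (1, 'a'): {2},    (1, 'b'): {2}}
    start, accepts, alphabet = 0, {2}, 'ab'

    def nfa_to_dfa(nfa, start, accepts, alphabet):
        start_set = frozenset([start])
        dfa, seen, todo = {}, {start_set}, [start_set]
        while todo:
            S = todo.pop()
            for c in alphabet:  # a DFA state is a *set* of NFA states
                T = frozenset(t for s in S for t in nfa.get((s, c), ()))
                dfa[(S, c)] = T
                if T not in seen:
                    seen.add(T)
                    todo.append(T)
        return dfa, start_set, {S for S in seen if S & accepts}

    dfa, d0, d_accepts = nfa_to_dfa(nfa, start, accepts, alphabet)

    def run(s):
        state = d0
        for c in s:
            state = dfa[(state, c)]
        return state in d_accepts

    assert run('ab') and run('aa') and not run('ba') and not run('abb')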
I think it's interesting to generalize dRE to general control flow (not just parsing).
Are you aware of Abstract State Machines? https://en.wikipedia.org/wiki/Abstract_state_machine
Concretely, the first two links in this post show (old versions of) frontend/lex.py and osh-lex.re2c.h. TODO for me: put up the latest versions, as well as the huge C file with state machines that re2c eventually generates.
When Are Lexer Modes Useful? http://www.oilshell.org/blog/2017/12/17.html
It works like this, where ( ) marks a data file and (+) marks a program:
( ) frontend/lex.py (a bunch of Python regexes, has some "metaprogramming") ->
(+) frontend/lex_gen.py ->
( ) osh-lex.re2c.h (re2c input file) ->
(+) re2c ->
( ) osh-lex.h (state machines in C, i.e. DFA as a big switch/goto)
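Here's a toy sketch of the first step -- one abstract token spec rendered both for Python's re module and as a (simplified) re2c block. The names TOKENS and gen_re2c are made up for illustration, and real re2c rule syntax has more to it:

    import re

    # One abstract spec, restricted to a subset valid in both dialects.
    TOKENS = [
        ('Name',   '[a-zA-Z_][a-zA-Z0-9_]*'),
        ('Number', '[0-9]+'),
    ]

    # Rendering 1: Python's backtracking engine, for the interpreted lexer.
    PY_LEXER = [(name, re.compile(pat)) for name, pat in TOKENS]

    # Rendering 2: textual generation of a re2c block, which re2c
    # then compiles to a DFA in C.
    def gen_re2c(tokens):
        lines = ['/*!re2c']
        for name, pat in tokens:
            lines.append('    %s { return Id::%s; }' % (pat, name))
        lines.append('*/')
        return '\n'.join(lines)

    print(gen_re2c(TOKENS))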
The reason I call this "a mathematical dialect" is that the same regular expressions run under Python's re engine (a backtracking engine) and as native code via re2c, an automata-based lexer generator.
If you scroll toward the bottom of this doc there's a useful table:
Regex Theory and Practice http://www.oilshell.org/share/05-31-pres.html
One side is "Perl-style regexes", which Python's engine is based on. The other side is "regular languages". Regular expressions were always mathematical, but the name got taken over by programmers to mean something different, so I call them "regular languages" or this "mathematical dialect".
These articles explain the difference, but they're very long, and a lot of people still don't understand it.
(which is understandable since it mainly comes up in performance corner cases, and when you want to compile regexes, which most people don't do)
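For one of those performance corner cases, here's a classic pathological pattern (my own toy demo, not from the articles). A backtracking engine explores exponentially many ways to split the a's between the branches; an automata-based engine answers in linear time.

    import re, time

    pat = re.compile(r'(a|a)*$')  # ambiguous: each 'a' can use either branch
    s = 'a' * 24 + 'b'            # the trailing 'b' makes every split fail

    t0 = time.time()
    pat.match(s)                  # no match -- but CPython's backtracking
    print('%.2fs' % (time.time() - t0))  # re tries ~2^24 splits: seconds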
I should write about this, but the lexer is one of the more solid pieces of the project. That is, it's "done" for now, and I need help with all the other parts, so I prioritize writing about those parts!!!
More lexing posts: http://www.oilshell.org/blog/tags.html?tag=lexing#lexing
I think this is very insightful. It's not enough to add (some) value. A language has to add enough value to matter to enough people in enough situations, or it doesn't gain any traction.
Never thought about it that way.
Also couldn't help but smirk when I read the answer to "What do you think will be Clojure’s lasting legacy?":
> I have no idea. It would be nice if Clojure played a role in popularising a functional style of
Actually, brainfuck is pretty straightforward: it's basically a very low-level language for programming a Turing-style tape machine, using only 8 primitives.
It's definitely not very human readable, but then, neither is actual machine code -- we use assembly mnemonics to be able to read and understand it.
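To show how small those 8 primitives are, here's a minimal interpreter (a toy sketch; cell width, tape length, and EOF behavior vary between real implementations):

    def bf(code, input_bytes=b''):
        jumps, stack = {}, []         # precompute matching brackets
        for i, c in enumerate(code):
            if c == '[':
                stack.append(i)
            elif c == ']':
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        tape, ptr, pc = [0] * 30000, 0, 0
        inp, out = iter(input_bytes), []
        while pc < len(code):
            c = code[pc]
            if   c == '>': ptr += 1
            elif c == '<': ptr -= 1
            elif c == '+': tape[ptr] = (tape[ptr] + 1) % 256
            elif c == '-': tape[ptr] = (tape[ptr] - 1) % 256
            elif c == '.': out.append(tape[ptr])
            elif c == ',': tape[ptr] = next(inp, 0)
            elif c == '[' and tape[ptr] == 0: pc = jumps[pc]
            elif c == ']' and tape[ptr] != 0: pc = jumps[pc]
            pc += 1
        return bytes(out)

    print(bf('++++++++[>++++++++<-]>+.'))  # b'A': 8*8 + 1 == 65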
Detailed historical material about these is copious.
It's interesting to see interviews like the one with Luca Cardelli of Modula-3.
Shots fired at Rich Hickey and Clojure, then, as they are in the PDF.