
Oberon – The Overlooked Jewel [pdf] - nickpsecurity
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.90.7173&rep=rep1&type=pdf
======
vidarh
Compulsory Michael Franz (author of the linked paper) fanboy-ing:

Prof Franz did his PhD under Niklaus Wirth. His thesis was on Semantic
Dictionary Encoding - a method for storing a compact semantic-tree
representation of a program in what is effectively a prefix code (similar to
Huffman coding). It exploits this beautifully both to compress the tree and to
use heuristics (the same on the encoder and decoder) to generate templated
code fragments, letting the code generator do less work at load time.
Published in '94, it was unfortunately totally overshadowed by Java.

It's still one of my favorite papers, and one day I still want to go down the
route of actually implementing SDE properly. (There are implementations for
Oberon - one of the early remarkable feats was that, because of the more
compact representation, on some hardware MacOberon could load and generate
code from these "slim binaries" faster than it could load a native binary,
because the code generator was fast enough to beat the disk I/O cost of
loading the larger binary...)

One of Franz' PhD students was Andreas Gal, who wrote the paper on trace trees
and applied it to JIT-compiling JavaScript, and later served as CTO of Mozilla.

~~~
chubot
Hm very interesting. But to play devil's advocate, what's wrong with the Java
approach or the v8 approach? Is it just that they are suboptimal in terms of
efficiency, or is there something that SDE can accomplish that they can't?

The Java approach is to do parsing, semantic analysis, and compilation to
bytecode on one side of the network. Then JIT compilation is done on the other
side.

The v8 / "modern JS" approach is to do gzip compression of source code on one
side of the network. Then decompression, lazily parse, and lazily generate
native code on the other side.

I recently looked at v8's parser, and it's fiendishly complex [1] - somewhere
north of 20K lines of code. And yes, that's just the parser.

So there are downsides. SDE might be more elegant and efficient overall(?).
But when you are dealing with networked code, you can't change both sides of
the wire at once, so we are sort of stuck with suboptimal interfaces and brute
force on both sides.

That said, I do think there is a good argument to be made for shipping plain
text source code and using brute force to make parsing and analysis in the
browser fast. (That is, writing ridiculous amounts of hand-tuned C++ code.)

[1]
[https://www.youtube.com/watch?v=Fg7niTmNNLg&t=2s](https://www.youtube.com/watch?v=Fg7niTmNNLg&t=2s)

~~~
vidarh
There's nothing "wrong" with the Java or V8 approach per se. It's not even a
given that SDE would do better today. Note that most of Franz' own work has
since been on the JVM.

I just think SDE is worth more exploration. It lost out not because it failed
to compete on merits with the JVM but because the JVM basically cut short any
hope of it getting mindshare at the time.

It's possible the JVM or V8 approaches are simply superior designs. But we
don't really know, because such huge resources have gone into making the JVM
and V8 approaches work, while not just SDE but most other approaches from that
time became near-instant dead ends.

> The v8 / "modern JS" approach is to do gzip compression of source code on
> one side of the network. Then decompression, lazily parse, and lazily
> generate native code on the other side.

The interesting part of this is that you _could_ treat SDE simply as a way to
short-circuit the parsing and compression step by using it to serialize the
AST and still retain exactly the same code generation. You can lazily generate
native code from it, as long as you apply the heuristics to build the
dictionary on load time (necessary to rebuild the correct tree).

Whether it'd be worth it is an open question, but I wish I had time to explore
it.

~~~
vanderZwan
I'm actually exploring something "spiritually" related: I currently have a
project where I store the state of an app in the URL by compressing trees of
JavaScript objects. I first put them through a pre-compression "filter" step
to make the data more compressible (not going into that now, but see [0]),
then apply JSON.stringify, and finally put the resulting string through
lz-string to create a URI-safe string [1]. lz-string is basically a wonderful,
wonderful hack to Lempel-Ziv compress strings to a bitstream, which is then
turned back into a string.

I'm basically looking at ways to cut out the JSON.stringify step: first turn
a JSON-friendly object directly into a binary representation, which is then
compressed into a bytestream and stored in a URI-safe string.

This might lead to something generally usable, and faster/smaller than lz-
string. I know JSON alternatives like FlatBuffers and MessagePack exist, but
this would be a much simpler, lighter JS library, single-digit kilobytes in
size _before_ gzipping. It would probably also be backwards compatible all the
way to IE6.

[0] [https://github.com/linnarsson-lab/loom-
viewer/blob/master/cl...](https://github.com/linnarsson-lab/loom-
viewer/blob/master/client/js/state-compressor.js)

[1] [http://pieroxy.net/blog/pages/lz-
string/index.html](http://pieroxy.net/blog/pages/lz-string/index.html)
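
For comparison, the serialize -> compress -> URI-safe-encode pipeline can be
sketched in a few lines of Python using only the stdlib (just an illustration
of the shape of the pipeline; the actual project is JavaScript and uses
lz-string, and the function names here are mine):

```python
# Sketch of the state-in-URL pipeline: JSON-serialize, DEFLATE-compress,
# then encode the bytes into a string containing only URI-safe characters.
import base64
import json
import zlib

def pack_state(state):
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw, 9)).decode("ascii")

def unpack_state(blob):
    raw = zlib.decompress(base64.urlsafe_b64decode(blob.encode("ascii")))
    return json.loads(raw)
```

Cutting out the JSON.stringify step would then mean replacing the
json.dumps/json.loads pair with a direct object-tree-to-bytes codec.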

~~~
vidarh
That's interesting. Yes, I'd encourage looking at how SDE functions there, as
SDE is ultimately a Lempel-Ziv-Welch-style scheme that operates directly on a
tree instead of a serialized string, and where the heuristics for what to add
to the dictionary can include templated tree subsets.

That part of it is really independent of the code generation aspect, and could
easily be adapted to serialize arbitrary types of tree structures.

Basically, what you need is a general mechanism that takes a possibly
templated rule and adds it to the dictionary. Then you need a mechanism that
looks at a JS object and, instead of spitting out JSON, matches the current
node to one of the dictionary items, spits out a bit pattern, and then
recursively spits out the bit patterns for the template arguments. The decoder
can basically be very simple recursive descent: identify the dictionary entry,
look it up, find the template rule, recursively call itself to retrieve the
template arguments, and reconstitute the object.

The "magic" then lies in the heuristics, which can be very generic (e.g.
suitable for any JSON) or very specific (e.g. embedding domain knowledge of
what the trees are likely to be shaped like, or at least code to be able to
recognize and encode complex shapes when seen).
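
As a toy illustration of that encode/decode loop, here is a minimal Python
sketch. All the names and the token format are mine, and it is deliberately
simplified: real SDE emits variable-length bit codes and templated rules, not
token tuples and repr() keys.

```python
# LZW-on-trees sketch. Both sides grow the dictionary with the same
# heuristic ("learn every newly seen subtree"), so the dictionary itself
# is never transmitted. Trees are nested tuples: (label, child, child...).

def encode(tree, dictionary, out):
    key = repr(tree)
    if key in dictionary:
        out.append(("ref", dictionary[key]))  # known subtree: one short code
        return
    if isinstance(tree, tuple):               # interior node: label + arity
        out.append(("node", tree[0], len(tree) - 1))
        for child in tree[1:]:
            encode(child, dictionary, out)
    else:                                     # leaf value
        out.append(("leaf", tree))
    dictionary[key] = len(dictionary)         # heuristic: learn it

def decode(tokens, seen):
    """Simple recursive descent over the token stream."""
    tok = tokens.pop(0)
    if tok[0] == "ref":                       # reuse an already-decoded subtree
        return seen[tok[1]]
    if tok[0] == "leaf":
        tree = tok[1]
    else:                                     # ("node", label, arity)
        tree = (tok[1], *(decode(tokens, seen) for _ in range(tok[2])))
    seen.append(tree)                         # `seen` grows in the same order
    return tree                               # as the encoder's dictionary
```

A second occurrence of any subtree then costs a single ("ref", i) token
instead of a full re-encoding, with no dictionary shipped over the wire.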

------
michrassena
For those who are unfamiliar with the project, here are a few small photos of
the custom-designed workstation based on an AMD bit-slice processor that the
Oberon System ran on originally.

[http://www.ethistory.ethz.ch/rueckblicke/departemente/dinfk/...](http://www.ethistory.ethz.ch/rueckblicke/departemente/dinfk/forschung/weitere_seiten/lilith/index_EN/popupfriendly/)

[https://en.wikipedia.org/wiki/Ceres_(workstation)](https://en.wikipedia.org/wiki/Ceres_\(workstation\))

Along with its predecessor, the Lilith, which ran a Modula-2 system.

There was even a Russian clone of the Lilith called Kronos
[https://en.wikipedia.org/wiki/Kronos_(computer)](https://en.wikipedia.org/wiki/Kronos_\(computer\))

Along with Plan 9, I see this project as a starting point for a deep dive into
a world of alternative computing platforms.

(edited for spelling, bit-slice)

~~~
seanmcdirmid
In that vein, the Atari Transputer Workstation with Occam as its language was
the ultimate alternative computing platform.

[https://en.m.wikipedia.org/wiki/Atari_Transputer_Workstation](https://en.m.wikipedia.org/wiki/Atari_Transputer_Workstation)

~~~
michrassena
I knew that sounded familiar. There's a thread here from last year,
[https://news.ycombinator.com/item?id=12995277](https://news.ycombinator.com/item?id=12995277)

Includes some interesting stuff.

------
magoghm
The book "Project Oberon: The Design of an Operating System and Compiler"
[https://www.amazon.com/Project-Oberon-Design-Operating-
Compi...](https://www.amazon.com/Project-Oberon-Design-Operating-
Compiler/dp/0201544288/ref=sr_1_1?ie=UTF8&qid=1366902572&sr=8-1&keywords=project+oberon)
contains a detailed explanation of the design of the Oberon compiler and
operating system. Not only is it well written with clear explanations, it
also includes the full source code for the compiler and operating system in
less than 550 pages!

I learned some interesting compiler techniques from reading that book.

~~~
lboasso
The 2013 edition of that book (updated with the latest source code) is
available at Wirth's website:
[http://people.inf.ethz.ch/wirth/ProjectOberon/index.html](http://people.inf.ethz.ch/wirth/ProjectOberon/index.html)

------
prestonbriggs
Franz mentions Wirth's metric for compiler quality: speed of self-compilation.

I first heard this idea, in a slightly different form, from Vickie Markstein
(part of IBM's 801 team) in early '88. She suggested using this idea as an
acid test for optimizations. For example, if the compiler with value numbering
compiled itself faster than the compiler without value numbering, then she'd
say value numbering was worthwhile.

Given the organization of the 801 team, it's possible the idea was due to John
Cocke or one of the other members. Doesn't matter much, I think.

I've never had the opportunity to test optimizations using this approach, but
I think it would be a fun way to approach the question of adding optimizations
to something fast & simple like the Oberon compiler, the Plan 9 compiler, or
the Go compiler.

Indeed, for a _system_, like Oberon or Plan 9, where everything is compiled
with the same compiler, we could go a step further and try adding the
optimization, rebuilding everything, then timing the new compiler. That way
any benefits the optimization brought to handling I/O and such would be
properly credited and we'd (hopefully) see a steady improvement in the
programmer's experience.
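
The measurement loop itself is trivial; a sketch of the acid test, with
entirely made-up compiler commands as placeholders:

```python
# Time each compiler variant compiling the compiler's own sources, keeping
# the best of a few runs to reduce noise. The command lines shown in the
# comment below are hypothetical, not real tools.
import subprocess
import time

def time_selfcompile(cmd, runs=3):
    """Best wall-clock time (seconds) of running `cmd` over `runs` runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical verdict: value numbering is worth keeping only if
#   time_selfcompile(["./oc-with-vn", "Compiler.ob"])
# beats
#   time_selfcompile(["./oc-without-vn", "Compiler.ob"])
```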

Preston

------
lboasso
For those interested, you can find the latest Oberon system by Prof. Wirth
here:
[http://people.inf.ethz.ch/wirth/ProjectOberon/](http://people.inf.ethz.ch/wirth/ProjectOberon/)

You can run Project Oberon with this emulator (pre-built binaries available):
[https://github.com/pdewacht/oberon-risc-
emu](https://github.com/pdewacht/oberon-risc-emu)

~~~
pinewurst
You can run it from your browser here:
[https://schierlm.github.io/OberonEmulator/](https://schierlm.github.io/OberonEmulator/)

~~~
lboasso
If you just want to play with the Oberon-07 language (the one used to
implement the whole system), there are several compilers available for
different platforms. You can find a list here:
[http://oberon07.com/compilers.xhtml](http://oberon07.com/compilers.xhtml)

This includes my own Oberon-07 compiler targeting the JVM:
[https://github.com/lboasso/oberonc](https://github.com/lboasso/oberonc)

~~~
Koshkin
It seems it would be hard to make Oberon, the language, play well with
"foreign" environments such as the Java VM and its libraries. How does one
implement interfaces, for example?

~~~
lboasso
Yes, a custom foreign function interface is needed. In my compiler I did the
bare minimum to bootstrap the compiler: currently you can create a definition
module that bridges the Java and Oberon worlds.

For example the Oberon definition below:

    
    
        DEFINITION MathUtil;
          PROCEDURE ln(x : REAL) : REAL;
        END MathUtil.
    

exposes the Java MathUtil.ln() static method:

    
    
        public class MathUtil {
          public static float ln(float x) {
            return (float) java.lang.Math.log(x);
          }
        }
        

A client module can now use the Java implementation of ln:

    
    
        MODULE Client;
          IMPORT MathUtil;
          VAR x: REAL;

        BEGIN
          x := MathUtil.ln(2.0)
        END Client;
    

For a complete example see
[https://github.com/lboasso/oberonc/tree/master/examples/fern](https://github.com/lboasso/oberonc/tree/master/examples/fern)

~~~
Koshkin

      > DEFINITION
    

I am not sure, but would a MODULE only containing declarations (and no or
empty body) also work, syntactically?

~~~
lboasso
DEFINITION is an extension in my compiler. It puts the compiler in a
different mode, accepting only declarations in the source file. Your approach
would work, but I prefer a new keyword that shows the intent of the source
file: DEFINITION is for declarations only, to allow Java-Oberon
interoperability.

------
Koshkin
I once tried rebuilding an Oberon compiler and its standard library from
source, using the compiler itself. I will never forget it - the whole thing
took 0.15 seconds.

Having said that, the language goes a bit too far trying to be "as simple as
possible but not too simple", at least to my taste -- in a way that Go seems
to be, only much more so. It also looks verbose - especially compared to C.
So, while I marvel at the language and have a feel for its weird beauty, I
don't think I'd be comfortable using it in my day to day activity - I guess, C
and C++ have spoiled me...

~~~
fiddlerwoaroof
I think if you look at it in the context of the entire Oberon operating
environment, it makes a certain amount of sense. While everything is written
in Oberon, it seems to me that the average user would essentially be using
Oberon for roughly what shell scripts and AppleScript are used for, i.e. for
writing little commandlets that automate frequently used functionality.
~~~
pjmlp
It was great - Oberon had a kind of REPL.

Basically, modules are loaded dynamically, and any procedure that obeys a
specific signature can be called from the UI, via keyboard and mouse, like
the acme editor on Plan 9.

How they get their input depends on the signature: it could be command-line
parameters, the clipboard, or any selected UI element in another application.

------
Xeoncross
I really don't like having to hit the shift key so often on QWERTY. A
language where uppercase keywords are the norm sounds like it would lead to
RSI.

Then again, we would probably get TypeScript- and CoffeeScript-style versions
of Oberon to solve this.

~~~
hsitz
The Pascal language is not case sensitive, nor, as far as I know, are any of
its variants. From my experience with Delphi/Object Pascal back in the day,
nobody capitalized keywords. You can see an example of typical code with
lowercase keywords here:

[https://github.com/MarcoDelphiBooks/ObjectPascalHandbook/blo...](https://github.com/MarcoDelphiBooks/ObjectPascalHandbook/blob/master/01/HelloVisual/HelloVisualForm.pas)

~~~
fasquoika
From "The Oberon Programming Language"[0]: "Capital and lower-case letters are
considered as being distinct....These reserved words consist exclusively of
capital letters and cannot be used in the role of identifiers". So it seems
that keywords _do_ need to be in all-caps, at least if your compiler is
compatible with Wirth's description (the standard?).

[0]:
[https://people.inf.ethz.ch/wirth/Oberon/Oberon07.Report.pdf](https://people.inf.ethz.ch/wirth/Oberon/Oberon07.Report.pdf)

------
eukgoekoko
The article praises some features that really belong to the runtime/standard
library - the file system, the hypertext-based user interface - but what do
they have to do with the language itself? What about the clunky COBOL-esque
keyword set: PROCEDURE, BEGIN/END? What about FOR loops, not implemented in
the original Oberon? Ironically, Lisp dialects predating the Pascal family
are still alive, whereas Pascal/Modula/Oberon are dead as a dodo.

~~~
lobster_johnson
Many people like the clunky syntax of Wirth's languages, and have been
exceedingly productive in those languages.

There are Borland's Object Pascal products, of course (from Turbo Pascal to
Delphi), which were extremely successful for many years and still survive
today. Pascal is nowhere near as popular today as in its heyday, but there are
people still using Delphi, as well as Lazarus, a modern open-source Pascal
implementation that's compatible with Delphi.

One of the most popular languages right now, Go, is heavily influenced by
Modula and Oberon. Go is more complex, and opts for a more C-like syntax, but
many of its concepts come directly from those languages. Another nascent
language with a heavy ObjectPascal influence is Nim (which was created by a
former ObjectPascaler).

Ada is another Pascal dialect that is far from dead.

Granted, I personally wouldn't want to use these languages _today_. Go and Nim
are taking the programming language evolution further. But this article isn't
about using Oberon today.

~~~
AnimalMuppet
Ada is a Pascal dialect? How do you figure? (If you mean "somewhat in the
Pascal flavor", I might be able to see it...)

A more C-like syntax, with a more Pascal-like semantics, might actually be a
reasonable sweet spot. (I think the C syntax won for a reason. It's terser,
without being so terse it's unreadable, and therefore gives you higher
bandwidth. Most peoples' complaints about C are the semantics, not the syntax
- though I'm sure at least some people dislike that too...)

~~~
lobster_johnson
While Ada is not _formally_ a Pascal dialect, it arguably has more influence
from Pascal than from the earlier ALGOL (which also influenced Pascal, and is
where keywords like PROCEDURE, BEGIN and END come from) and Simula. A
Pascal-family developer can more easily pick up Ada than, say, a C programmer
can; the Pascal family already has TYPE, VAR, ranges, records, etc.

From the 1983 Ada reference manual [1]:

    
    
        Another significant simplification of the design work
        resulted from earlier experience acquired by several
        successful Pascal derivatives developed with similar
        goals. These are the languages Euclid, Lis, Mesa,
        Modula, and Sue. Many of the key ideas and syntactic
        forms developed in these languages have counterparts
        in Ada. Several existing languages such as Algol 68
        and Simula, and also recent research languages such
        as Alphard and Clu, influenced this language in
        several respects, although to a lesser degree than
        did the Pascal family.
    

Ada was widely considered as being an evolution of Pascal at the time (e.g.
[2]).

[1]
[http://archive.adaic.com/standards/83lrm/html/lrm-01-03.html](http://archive.adaic.com/standards/83lrm/html/lrm-01-03.html)

[2] [https://academic.oup.com/comjnl/article-
pdf/25/2/248/1080494...](https://academic.oup.com/comjnl/article-
pdf/25/2/248/1080494/250248.pdf)

------
romaniv
One would think that in 2017 we would be using an OS that takes the best ideas
from Xerox PARC/Star, Oberon and Plan 9 - at least on new devices (like
tablets) which don't have to keep backward compatibility.

~~~
Santosh83
How would a large company (or community project) be able to recruit hundreds
of programmers well versed in these languages? The problem is that software is
built on existing knowledge, and in the systems programming domain that is
C/C++ for the most part - that's where you'll get manpower to the extent
needed to build gargantuan OSes like Windows, macOS, Linux and so on.

And usually good software engineering principles were not the predominant
factor in the inception of these systems anyway; they had commercial
motivations and hobbyist beginnings. And once you already have a big body of
code...

~~~
Uehreka
>How would a large company (or community project) be able to recruit hundreds
of programmers well versed in these languages?

I'd take this moment to point out that while there existed _some_ Objective-C
programmers in 2007, approximately 99% of the current Objective-C programmers
in the world (by my wild speculation) learned that bizarre Smalltalk-y
language just to make iOS apps.

~~~
pjmlp
Yep, before Apple bought NeXT, my only contact with Objective-C was porting a
particle visualization framework from a dying Cube to Windows/Visual C++.

------
mark-r
Anybody know when this was written? The latest citation is from 1998, and it
talks about Oberon being 10 years old. It also talks about JIT for Java being
a "current trend".

> I believe that Wirth’s ultimate impact will be felt even stronger ten years
> from now than it is today, and that the legacy of his accomplished career
> devoted to The Art of Simplicity will be enduring.

Unfortunately I don't see any sign of that. The actual trend seems to be in
the opposite direction.

~~~
prestonbriggs
Published in 2000. See
[https://dl.acm.org/citation.cfm?id=645993&picked=prox&cfid=8...](https://dl.acm.org/citation.cfm?id=645993&picked=prox&cfid=829975124&cftoken=96589339)

------
SwellJoe
About a decade ago, I worked with the guy who developed IBM's Oberon-2 system
for OS/2 in the 90s. He'd always get a sort of wistful look about him when he
talked about it. It made me want to give it a look, but there was never time.

~~~
MaysonL
The BlackBox Component Pascal IDE on the old Mac OS (originally Oberon/F) was
the sweetest IDE I've ever used. The Windows version, still pretty good, has
been open-sourced.

------
useranme
From the paper:

"the one and only system that almost all members of the Institute for Computer
Systems (including the secretary!) used on a daily basis for all their
computing tasks"

Even the fonts were designed in-house.

------
k__
The first and only time I heard about Oberon was when a fellow student told me
why he quit his first CS degree:

"They taught us stuff like Oberon? Why not Java or C? I want to get a job
later!"

~~~
_ph_
And I thought the "S" in CS stood for science... A science degree isn't about
training for jobs; it is about learning and understanding the fundamentals of
a field. In that sense the Wirth languages are a great base for understanding
statically compiled, typed languages (the other important thing would be
learning about Lisp...).

Having properly understood the fundamentals, languages like Java and C can be
picked up rather quickly.

~~~
k__
I'm with you there.

Problem is, many tech jobs in Germany require a degree.

------
mwexler
I remember wanting Oberon to be Pascal version 2, and that just wasn't what it
was supposed to be. I wonder if that's really what prevented the massive
growth other languages have had: it had so much "creator halo" that it was
hard for it to have its own identity beyond "the lang from the guy who made
Pascal".

~~~
mkempe
Historically, Modula was Pascal v2. Modula-2 was arguably different enough to
be v3. And so Oberon was v4.

------
orangeshark
That was an interesting read. Are there any similar projects working on a
simply designed system that can be used day to day? Maybe there is still an
active group of Oberon users?

------
tzahola
From time to time I find myself daydreaming about an alternative timeline
where instead of C, it was the Wirth-family of languages that dominated.

~~~
lboasso
Thanks to Robert Griesemer (a PhD student of Prof. Wirth), the Go programming
language includes some design lessons taken from the Oberon language.

~~~
tempodox
I would be interested to know what those design lessons are. Is there a
writeup of that somewhere?

~~~
pjmlp
For example, Go method declarations are taken from Oberon-2.

The unsafe package kind of resembles SYSTEM, except that the Oberon spec
leaves open the possibility of having assembly intrinsics in SYSTEM, given its
goal as a systems programming language.

------
purplezooey
Ah, I thought we were talking about Bell's summer brew.

