
Olin – Defining a new operating primitive for event-driven services - ngaut
https://christine.website/blog/olin-1-why-09-1-2018
======
kardianos
I think fuchsia has it correct: everything is an interface (not a file) that
must be injected into the child process.

[https://fuchsia.googlesource.com/zircon/+/HEAD/docs/fidl/ind...](https://fuchsia.googlesource.com/zircon/+/HEAD/docs/fidl/index.md)

But maybe I'm missing something critical here.

~~~
ofrzeta
What's an "interface"? Obviously "file" is also just an abstraction, but it has
familiar semantics (read, write, etc.).

From cursory reading it seems that an interface is just a specification for
(remote) procedure calls. Also reminds me of microkernel IDLs
[https://www.gnu.org/software/hurd/idl.html](https://www.gnu.org/software/hurd/idl.html)

------
Someone
_”Each Dagger module can only handle one data type. This is intentional. This
forces users to make a separate handler for each type of data they want to
handle.”_

Don’t count on that. What you’ll get instead is a handler that defines a data
type containing some enum field that changes what the rest of the data type
means, or, worse, a data type named “script” containing a command and its
arguments. The latter could give you a separate DSL for every module without
any guarantee of consistency between them (just as Unix doesn’t agree about
how to control verbosity, how to specify an output file, etc.).

Also, think about how you’ll want to control modules. If you intend to shut
them down, query their health, etc. using a handler, that doesn’t leave room
for a handler that does the real work.

~~~
adrianratnapala
> Also, think about how you’ll want to control modules. If you intend to shut
> them down, query their health, etc.

One way around this (which might be useful even if there are multiple handlers
per module) is to have explicit layers to the onion.

That is, make the health checking stuff etc. belong to an outer layer that is
different from where the application exposes its API. The layers can still
share a "grammar" (i.e. message protocol), but they are separate endpoints.

I suspect (but do not know) that this lets the event protocol evolve more
independently of the management framework, which fits with the OP's noble
goal of keeping version 0 minimal. (In fact I think she wants to ignore what I
called the "management framework" entirely.)

~~~
xena
> That is, make the health checking stuff etc. belong to an outer layer that
> is different from where the application exposes its API.

Yep. Ideally people would write handlers to handle both the health checking
and the downtime response, but that largely boils down to implementation
details.

------
akavel
As for "everything is a file", IIUC Plan9 explored this area extensively, and
the community of fans is still alive, so I believe good insights from real-
world experience could be gathered at this early design stage by talking with
Plan9 people and maybe reviewing their manuals, and especially by asking them
for feedback.

------
_wmd
I admire experimentation, so the comments below should be taken as
constructive.

> I want the binary modules any user of Olin would upload today to be still
> working, untouched, in 5 years, assuming its dependencies outside of the
> module still work

There are only 3 program formats that have enjoyed anything like that kind of
longevity - PE, Linux ELF, and JAR, all with significant money behind them. Of
these, only PE has managed anything like ABI and dependency compatibility
across more than 10 years, and that's because of the explicitly obsessive
compatibility concerns baked into the culture of the company behind it.

The next best example after PE is Linux Intel ELF. Despite explicit efforts
(glibc nearly-almost-if-you-squint manages long-term ABI stability, and the
kernel nearly-almost-if-you-squint manages long-term compatibility with
glibc), you can only run a terminal program compiled at the start of the
millennium if you boot with the right kernel command line options, also run it
against an ancient buggy libc, and that program has no other system
dependencies.

Coming from Python land, I can say categorically that any attempt to maintain
feverishly accurate compatibility for a rich set of interfaces across a
community is an almost pointless effort; the options always degrade to running
ancient buggy deps, or updated deps with broken interfaces.

The Linux kernel barely manages it despite a strict policy, and they break it
occasionally due to security concerns. Downstream community projects follow
the whim of whoever is contributing code on that particular day, and their
motivations are rarely if ever "compatibility".

That is not to say a community effort couldn't manage >10 year compatibility,
but such concerns must be a foremost priority (rather than a cute goal) if the
attempt is to have even a fighting chance of succeeding in the long term. It
seems better for the inevitability of incompatibility to be honestly accepted
and _designed for_ rather than hoping things will just work out.

\---

> The core idea is that everything is a file, to the point that the file
> descriptor and file handle array are the only real bits of persistent state
> for the process

Again an admirable goal, but this is Stockholm syndrome in its purest form,
and learning completely the wrong lessons from UNIX land. Files are a shit
abstraction, but even setting them aside, streams are worse still. Very few
elements of a modern application, even at the lowest tiers, lend themselves
well to being expressed as a serialized bytestream.

The minimum primitive (and happy to share many references to support this)
should be something like messages and ports. Streams can be expressed as
messages, but the converse is not true without application-specific protocol
scaffolding, of which there are a thousand examples on every modern UNIX
system as a consequence of UNIX picking streams.

\---

> When a dagger process is opened, the following files are open:

Sequentially numbered transparent handles are a mistake, as are predefined
resources; they allow code to make incorrect assumptions that are difficult to
manage over the life of a system. You say that glibc should be built on these
APIs, yet one of the principal restrictions glibc suffers from is its
inability to open long-lived file handles that might conflict with program
assumptions -- assumptions often made implicitly without any conscious thought
on the part of the programmer. Incremental numbering SUCKS.

A few years back Linux even considered adding a brand new fd namespace just so
glibc could safely open files, all because of predictable numbered handles
with assumed semantics that did not exist.

Try to learn from NT and elsewhere here -- the user receives an opaque word,
no predefined words exist, and an interface is provided to grant access to any
standard resources without introducing global state.

\---

> open, close, read, write, sync

But I thought everything is a file.. why do we need these? Why can't we
express these as files too?

Due to incorrect lessons above, in a real world system this list will
inevitably grow to multiple times its original size. What does sync mean on a
socket? How do I eject a DVD drive? How does an application read the system
time? By opening some "time://" and reading a serialized time representation?
Good luck with nanosecond resolution when microseconds are already lost on
scaffolding and maybe parsing some ASCII format timestamp, and so a "one time
only" specialized time() call appears.

10 years later you'll have hundreds of entries just like /usr/share/man/man2
on a typical UNIX, probably alongside something like ioctl(). But of course
this time will be different.

\---

> the open call (defined later), a file URL is specified instead of a file
> name. This allows for Dagger to natively offer programs using it quick
> access to common services like HTTP, logging or pretty much anything else.

The open call prototype as defined is totally unusable for any kind of network
application. There is no room for setting any kind of meaningful options
outside a single bitfield, and so already if I wanted to make any kind of
custom HTTP request, the supplied interface is useless and I must link a
module with a real client built on top of the tcp:// handler.

There seems to be no long term sense whatsoever in baking a handful of
protocols into the "filesystem" interface, especially not with these APIs.

\---

> I’d like to add the following handlers in the future:

I see they have thought of time:// already! By discarding one of the few
remaining uniformities of the UNIX file API - the namespace it exports. It
already looks like we're coping with a bad base abstraction by shoving
everything we need into magic strings.

What happens if I only read 2 bytes from time://? Do I get pieces of the last
sampled timestamp because some internal buffering now exists, or do I get half
the old timestamp and half the new one, or do I continually get the first 2
bytes of unrelated timestamps, or perhaps we simply return an error because
the buffer wasn't big enough? It's almost like even a timestamp doesn't really
fit the stream-of-bytes file abstraction.

\---

Very happy to see all these new WebAssembly efforts, but this one's present
design fails to learn any lesson from our past suffering.

~~~
xena
> below comments should preferably be considered constructive

Author of the post here. Don't worry, I am taking them as such.

> It seems better that the inevitability of incompatibility is honestly
> accepted and designed for rather than hoping things will just work out.

How would this look? I'm interested in what you mean by this. I'm not sure how
incompatibility with things like reading out sequential memory is going to be
tolerable, though.

> Again an admirable goal, but this is Stockholm syndrome in its purest form,
> and learning completely the wrong lessons from UNIX land. Files are a shit
> abstraction, but ignoring those, streams even moreso. Very few elements of a
> modern application even at the lowest tiers lend themselves well to being
> expressed as a serialized bytestream.
>
> The minimum primitive (and happy to share many references to support this)
> should be something like messages and ports. Streams can be expressed as
> messages, but the converse is not true without application-specific protocol
> scaffolding, of which there are a thousand examples on every modern UNIX
> system as a consequence of UNIX picking streams.

Yes, I eventually want to settle on messages. The reason I have picked file
I/O in the meantime is so I can hack something together that will "just work".
From a generic level, I don't know what memory addresses are safe to write to.
Using the read() function makes the program choose for itself.

Streams are a shitty abstraction, but most of the world is currently built on
them, so I'm using streams in Dagger if only to make Dagger have to be thrown
away in the long run. Streams may be bad in the long term, but for now we can
do HTTP via the filesystem calls:
[https://github.com/Xe/olin/blob/master/internal/abi/dagger/t...](https://github.com/Xe/olin/blob/master/internal/abi/dagger/testdata/http.wat)

> file descriptors are a mistake

Yeah, I'm betting that they are gonna be a mistake in the long term. I just
don't know what abstraction to use that won't be yet.

I guess I sort of am using file streams as messages. I'm gonna see what it
would look like to make messages the primitive.

> But I thought everything is a file.. why do we need these? Why can't we
> express these as files too?

File descriptors are adjectives; system calls are verbs. The verb is the
action you want to do, such as read. The adjective is the source, i.e. the
semantically standard input of the process.

> Due to incorrect lessons above, in a real world system this list will
> inevitably grow to multiple times its original size.

Yep. I'm expecting this design to be wrong in the long term. It really started
as an experiment to see how far minimalism could go.

> The open call prototype as defined is totally unusable for any kind of
> network application. There is no room for setting any kind of meaningful
> options outside a single bitfield, and so already if I wanted to make any
> kind of custom HTTP request, the supplied interface is useless and I must
> link a module with a real client built on top of the tcp:// handler.

[https://AzureDiamond:hunter2@bash.org/244321](https://AzureDiamond:hunter2@bash.org/244321)
is what I usually do with URL's and hard-defined authentication tokens. Works
for me.

> I see they have thought of time:// already! By discarding one of the few
> remaining uniformities of the UNIX file API - the namespace it exports. It
> already looks like we're coping with a bad base abstraction by shoving
> everything we need into magic strings.

Yeah, I'm starting to see that Dagger is WAY TOO MINIMAL. I'm gonna go poke
around POSIX and other minimal OSes to see what syscalls they expose. For what
it's worth, the Go runtime has its own time() call that I expose here:
[https://github.com/Xe/olin/blob/master/internal/abi/wasmgo/a...](https://github.com/Xe/olin/blob/master/internal/abi/wasmgo/abi.go#L97)

> Very happy to see all these new WebAssembly efforts, but this one's present
> design fails to learn any lesson from our past suffering.

The first design always has to be thrown out :D

~~~
_wmd
Picking just the most fun aspect to reply to, because this is too long already
:)

> How would this look? I'm interested by what you mean by this

One area where UNIX fails is its single global flat system interface; it makes
evolutionary change difficult without introducing permanent "variant" system
calls (e.g. openat()) that must be supported in tandem with the original
interface, and it imposes an automatic penalty on any innovation that might
wish to add more, because additions must be supported forever.

To skirt around this, one popular option in UNIX land is instead not to define
any interface: just dump whatever magic is required into some ioctl() and
claim it is device-specific. So we end up with an interface that isn't quite
guaranteed and is definitely unofficial, but that's okay because it's
driver-specific; then a second driver comes along and adds a similar
interface, and now we have 2 options for frobbing a particular device, with no
obviously correct choice, and no guarantee frobbing will be supported that way
forever. In other words, we get the worst of both worlds by trying to ignore
impermanence.

(Just a side note, a real-life example of those ioctls might be e.g. how Linux
used to support asking the filesystem driver for a mapping of unused sectors,
or how you set or query the FAT filesystem's DOS filesystem label)

UNIX exports APIs for all of this, and it's mostly frozen in stone forever:

\- Rewinding tape drives

\- Configuring start/stop bits of serial lines

\- Filesystem operations with fixed semantics designed around a local storage
device, with uid/gid/mode security, and with extended attributes and ACLs
bolted on the side

\- Multiple variants of file locking schemes that are all perfectly useless in
multithreaded apps

\- A security model that's baked in at the API level (e.g. chown()).

Just taking the silly 'rewinding tapes' example: that is an API planned for
permanent compatibility with a problem space that 99.9% of people no longer
have, rising to 100% at some point eventually. It's easy in hindsight to see
that this interface should have been modularized somehow, and preferably made
entirely optional.

How that modularity looks is definitely up for debate. Looking at Windows for
one approach: in COM (and ignoring its baroque usability!), handles export
only one guaranteed interface (IUnknown), whose sole job is to let you
enumerate the contracts the handle (object) supports. For example, here is
DirectShow: [https://docs.microsoft.com/en-
us/windows/desktop/directshow/...](https://docs.microsoft.com/en-
us/windows/desktop/directshow/alphabetical-list-of-directshow-interfaces) --
see how a single input can implement IDvdControl, IAMTVTuner, both, or
neither, and if neither is implemented, the handle is still usable via IPin
and related interfaces that actually control how data flows from the input.

Mapping that to UNIX, a handle might have protocols like IUnixSecurity,
ITapeOperations, IBraindeadLocking, ILinuxSpecificLocking, IStream, all or
none, etc., where a handle has no guaranteed fixed vocabulary except perhaps
for the most fundamental interface expected anywhere that handle is found (in
the case of a file, perhaps that is IStream).

By exporting the impermanence down to consumer code, accessing any particular
interface becomes an explicit affair, because you must always request it
upfront. You can still implement a media player program by wiring up some
filter graph using a handle you received from someplace else, but disable the
rewind and eject buttons if querying those interfaces failed -- this is
literally how some stuff works on Windows:

    
    
        handle = ask_user_for_media_input()
        wire_up_graph(handle)
        if(supported(handle, "dvd player")) {
            enable_eject_button();
        }
    

Meanwhile at the top of the stack, outside of absolutely having to support
some core interfaces that can never go away, the cost of adding and retiring
sundry features is reduced, as the reality of impermanence that has always
existed has been pushed down the stack, where end users can choose to accept
or ignore it.

\---

An ideal interface might resemble a predefined root handle or sentinel value
with the ability to construct new handles bound to some protocol. Imagine a
language like (but not quite) protocol buffer services, where those
read/write/open/close/sync calls are methods of a particular named interface,
and the IDL outputs wrappers to speak it. Probably you can already see room
for splitting 'sync' out into some kind of IBufferedStream optional interface.

Maybe like:

    
    
    #include <os.h>                          // Invoke(handle, char *interface, void *in, void *out) -> bool
                                             // Close(handle) -> void
    #include <os/interface.h>                // GetInterface(handle, char *interface) -> void *
                                             // Require(handle, char *interface) -> void *
    #include <os/services/console.h>         // IF_CONSOLE, Console, Console_write
    #include <os/services/filesystem.h>      // IF_FILESYSTEM, Filesystem, File, Filesystem_open, Filesystem_error
    #include <os/services/stream.h>          // IF_STREAM, Stream, Stream_read, Stream_write
    #include <os/services/buffered_stream.h> // IF_BUFFERED_STREAM, BufferedStream, BufferedStream_sync

    int main(void)
    {
        // Require() fails loudly if the interface is absent;
        // GetInterface() would instead return NULL.
        Filesystem fs = Require(NULL, IF_FILESYSTEM);
        Console console = Require(NULL, IF_CONSOLE);

        File *fp;
        if ((fp = Filesystem_open(fs, "/etc/passwd", O_RDWR)) == NULL) {
            Console_write(console, "open failed: %s", Filesystem_error(fs));
            return 1;
        }

        if (Stream_write((Stream) fp, "foo\n", 4) == -1) {
            Console_write(console, "write failed: %s", Filesystem_error(fs));
            return 1;
        }

        // Optional interface. The underlying Invoke() will fail if it is absent.
        BufferedStream_sync((BufferedStream) fp);

        return 0;
    }
    
    

A real killer feature, beyond simplifying that calling style, would be to
somehow unify it with the calling convention for linked modules, so that by
replacing a handle you could direct an OS-level protocol to a library -- or,
perhaps better, the act of calling an OS service could cause an optional
library to be loaded and bound to the handle you were returned, say in the
case of an ancient ITapeDrive interface that long ago stopped being a core
feature.

But the main point of the example is probably the notion of an OS with only
two core interfaces: Invoke() and Close().

Another place you could look for inspiration of this sort is Objective-C -- a
similar concept is built into its guts, and its dynamic dispatch function,
objc_msgSend(), is optimized to all hell to match!

~~~
walterbell
Another source of inspiration is the 1985 spec for AmigaOS IFF,
[http://wiki.amigaos.net/wiki/EA_IFF_85_Standard_for_Intercha...](http://wiki.amigaos.net/wiki/EA_IFF_85_Standard_for_Interchange_Format_Files)

 _> Customers should be able to move their own data between independently
developed software products. And they should be able to buy data libraries
usable across many such products. The types of data objects to exchange are
open-ended and include plain and formatted text, raster and structured
graphics, fonts, music, sound effects, musical instrument descriptions, and
animation ... The IFF philosophy: “A little behind-the-scenes conversion when
programs read and write files is far better than N x M explicit conversion
utilities for highly specialized formats”._

------
Serow225
Good luck, and keep updating with progress! How do your goals mesh with the
ideas behind Pony (ponylang.org)?

~~~
Serow225
Also, I know you said multiple times in the post that the 'everything is a
file' approach is an intended-to-fail spike, but psychologically people are
going to get stuck on it and bikeshed. Sometimes that's enough to kill
interest in a project, despite how great the underlying intentions are. Humans
¯\\_(ツ)_/¯

------
wemdyjreichert
Looks like it could be a very good replacement (or addition) to node.

------
kuwze
I wonder what the networking performance of WebAssembly code run server-side
is.

------
totallysnowman
Reminds me of Akka on JVM.

------
cascom
Have you run your name by the people at Olin Corp?

~~~
stochastic_monk
As an aside, there’s an Olin Hall at Johns Hopkins.

~~~
pavlov
It’s a fairly common Swedish family name, AFAIK.

~~~
hagbarddenstore
No, it's not. Only 1642 people have Olin as their last name.

~~~
pavlov
I’m apparently lucky to know several then.

