This could've been the introductory chapter to "Why my new operating system won't be like Unix."
Unix provides a very spartan UI (the shell) with very spartan composition primitives (yay pipes? how many of you are still writing real programs in shell? regret not writing that little script that expanded to 5000 lines and now tries to manage multiple processes in a real programming language?) and a programming environment where you implement all the rest for yourself. Everything is a file, except when it's a process or a function or a thread or a file descriptor or an address or a socket or a ... you get it. We might be building lots of impressive products on top of Unix, but when looking at the code, it's more like we're mostly working around Unix (often trying to wish it away altogether by abstracting it out) because it doesn't solve our problems.
Files don't scale and lack transactions (i.e. are full of races, until you dump them and get a real database), pipes don't scale (and are too limited due to unidirectionality), text streams don't scale, shell scripts don't scale (and are full of races). Unix originally didn't even have a mechanism for waiting on multiple fds and then they came up with the hack that is select(2), which doesn't scale, and we still don't have a good API for asynchronous I/O (read the docs for e.g. libev for fun). Unix just makes you cross too many narrow one-way bridges, or build an island for yourself.
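To make the select(2) complaint concrete, here's a minimal sketch of the pattern in Python (the fd names and messages are invented for illustration): you hand the kernel your whole set of fds and it tells you which are readable, rescanning the entire set on every call — which is exactly the O(n) behaviour that keeps it from scaling and led to epoll/kqueue.

```python
# Sketch of the select(2) pattern: wait on multiple fds at once.
import os
import select

r1, w1 = os.pipe()
r2, w2 = os.pipe()
os.write(w1, b"from pipe 1")
os.write(w2, b"from pipe 2")

# select() scans the full fd set each call and returns the readable ones
ready, _, _ = select.select([r1, r2], [], [], 1.0)
messages = [os.read(fd, 1024) for fd in ready]
```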
Your comment about scaling is on point. We're building on a system that not just enables but encourages a large amount of duplicated work because the substrate we've settled on is bytearray-oriented. So all PLs and programs map their structures to 'bytes'. Could be worse - we could have different kinds of bytes (8-bit, 12-bit, etc.). But also could be better - the different processes could map their internal structures onto shared symbols or trees, and the substrate could provide a higher level of abstraction instead of clumps-of-bytes.
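A toy illustration of the "everything maps to bytes" point (the record and layout here are made up): two programs that want to share even a trivial structured value must first agree on an ad-hoc byte layout, and the receiver must re-derive the structure from the clump of bytes.

```python
# One ad-hoc wire layout among many possible: length-prefixed name + u32 score.
import struct

record = ("ada", 42)
name = record[0].encode("utf-8")
wire = struct.pack(f"<B{len(name)}sI", len(name), name, record[1])

# The receiving side re-derives the structure from the byte clump.
n = wire[0]
decoded = (wire[1:1 + n].decode("utf-8"),
           struct.unpack_from("<I", wire, 1 + n)[0])
```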
The interesting question to me is what kinds of composition models should a substrate provide? If we pre-share the notion of bytes, we end up sharing files and writing over sockets. If we pre-share a more structured notion (e.g. a graph) and share that, it's another way. But there's more to it - how does meaning get transmitted via a message to a receiver? If we pre-share a grammar language, perhaps we can send a message that programs the receiver to extract useful info from itself? Whatever the structure of the message, there needs to be some sharing of 'tags' or 'identifiers' within the message for it to be usefully absorbed. How do we share these? Should this also be provided by the substrate?
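A toy sketch of that last question (all the names here are invented): sender and receiver pre-share a tiny vocabulary of tags rather than just bytes, and the message names its own fields, so the receiver can absorb the parts it understands and ignore the rest.

```python
# Pre-shared 'tags' as the minimal common ground between sender and receiver.
import json

SHARED_TAGS = {"temp_c", "ts"}  # the pre-shared vocabulary

message = json.dumps({"temp_c": 21.5, "ts": 1700000000, "vendor_blob": "??"})

def absorb(raw: str) -> dict:
    # Keep only the fields whose tags both sides pre-agreed on.
    return {k: v for k, v in json.loads(raw).items() if k in SHARED_TAGS}
```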
Because it's the lowest common denominator between different implementations of a type. The real question is why every language and subsystem invents its own binary type implementation.
This is very reminiscent of VPRI (and Kay's) efforts to shrink code size using shared abstraction written in OMeta.
Also a lot of attempts have been made around meta editors (eclipse tried, and probably still does with EMF/xText)
Honestly.. I cannot help but think that most programs are parsers. My vue app is a weird GUI to assemble a tree of things by swallowing (reducing) over a stream of events (not unlike a stream of bytes).
The idea of program as an interpreter of an input stream of instructions seems very intuitive to me.
On the other hand pure functional programming with all those IO monad hacks doesn't even seem to map to what we are trying to use computers for.
I would go as far as saying that not only functional paradigm but also Turing machines and lambda calculus are inadequate mathematical models for how we are using computers. In those computational models the computation is bounded - it converts a set of inputs to a result and then dies until somebody runs it again.
Most modern applications of computers are unbounded computations. The machine reads an input stream (e.g. keystrokes and mouse clicks) and produces another stream of outputs (such as instructions to modify pixels on a screen).
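The program-as-interpreter idea above can be sketched in a few lines (the event names are invented): state is folded over an unbounded event stream, and an output is emitted per event, rather than computing one result and halting.

```python
# State folded over an unbounded event stream; one output per input event.
from typing import Iterable, Iterator

def ui(events: Iterable[str]) -> Iterator[str]:
    count = 0                  # the accumulated "document"/state
    for ev in events:          # never assumes the stream ends
        if ev == "click":
            count += 1
        yield f"render: {count} clicks"

outputs = list(ui(["click", "move", "click"]))
```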
Even within a language there can be barriers to composition, as described in What Color Is Your Function [1].
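The "colored function" barrier in miniature (function names invented for illustration): a plain function can be called anywhere, but a sync caller cannot just call an async one — it needs an event loop to bridge the colors, which is why helpers tend to get written twice.

```python
# Sync and async versions of "the same" operation, and the bridging cost.
import asyncio

def fetch_sync() -> str:
    return "data"

async def fetch_async() -> str:
    await asyncio.sleep(0)             # pretend to do async I/O
    return "data"

plain = fetch_sync()                   # fine anywhere
wrapped = asyncio.run(fetch_async())   # a sync caller must bridge colors
```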
When you have a sophisticated type system, it's very tempting to make fine distinctions that prevent mistakes by preventing unlike things from being composed. But whenever there is more than one way to define an API, interfaces can differ in detail, so things can't easily be plugged together even though their interfaces do basically the same thing. (You can have different kinds of strings, for example.) The result is having to write a lot of adapter code. (Rust in particular worries me due to its way of exposing fine-grained detail in public APIs.)
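A sketch of that adapter-code tax with two imagined libraries: both do "the same thing" to text, but one speaks str and the other bytes, so composing them needs glue that adds no behaviour of its own.

```python
# Two hypothetical APIs that differ only in their string type.
def lib_a_shout(s: str) -> str:            # imagined API, text in/out
    return s.upper()

def lib_b_frame(payload: bytes) -> bytes:  # imagined API, bytes in/out
    return b"[" + payload + b"]"

def shout_and_frame(s: str) -> bytes:
    # The adapter: pure plumbing between the two string notions.
    return lib_b_frame(lib_a_shout(s).encode("utf-8"))
```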
Enabling composition in practice means coming up with nice common interfaces and promoting the hell out of them so everyone uses them. This is basically about standardization. Cross-language standardization is pretty hard.
- all commands have uniform introspection
- console contains 'live views', not dead text
- embedded REPL inside apps with commands that mirror the menus and buttons
- seamless jump to source, live edit
Indeed. On the LM, the sort routine used by the OS was the sort routine you called in your program. There was no serialization because everybody agreed on the memory layouts, and the types were tagged at runtime.
You could even change the system sort routine if you really wanted to, on the fly. Not a great idea unless you were a wizard, but nevertheless possible.
> If I write a program in Unix using Java or Python, can I reuse the Unix sort to sort an array of items inside my program?
Yes, if you split your program into smaller programs (one which generates the unsorted data, and one which consumes the sorted data), at which point you could run 'generator | sort | consumer'. Or your program runs 'sort' as a subprocess and communicates over STDIO. In either case you'll need to serialize and deserialize the data in question.
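The subprocess variant of that workaround, sketched with Python's subprocess module: run the Unix sort as a child process, serialize your items to lines of text on the way in, and deserialize the lines coming back.

```python
# Reusing the Unix `sort` from inside a program, over STDIO.
import subprocess

items = ["pear", "apple", "banana"]
proc = subprocess.run(
    ["sort"],
    input="\n".join(items) + "\n",   # serialize: one item per line
    capture_output=True,
    text=True,
    check=True,
)
sorted_items = proc.stdout.splitlines()  # deserialize the result
```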
Maybe rather than composing things we should have compilers that create systems across these barriers.
For example a programming language that compiles to three application servers and a queue. The application itself exists across multiple barriers but the source code is a single project.
The complaint is more about bad interfaces. One of the examples (unix command line tools) has a very clumsy interface if you're trying to use it from within a lower level program: create pipes for input & output, fork, exec, use i/o on the pipes to feed data in and get results out, waitpid, figure out how the tool exited... this is slow, annoying to implement, has many failure modes that would not exist if the interface were a direct function call, and it's inflexible in that you'll be unable to use the core functionality provided by the tool if your data cannot be realized as a simple text stream that still makes sense to the receiving tool. And you can't extend the core functionality e.g. by passing another function to it. That's a huge amount of friction to using code that is already there, because of bad interface. And that's why people reimplement those tools instead of composing programs using those existing tools.
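The low-level dance described above, spelled out with the raw POSIX calls (via Python's os module): pipes, fork, exec, feed data in, read results out, waitpid, decode the exit status. Compare the ceremony to what a direct function call would cost.

```python
# The clumsy interface to a command-line tool, step by step.
import os

in_r, in_w = os.pipe()
out_r, out_w = os.pipe()

pid = os.fork()
if pid == 0:                       # child: wire pipes to stdio, exec the tool
    os.dup2(in_r, 0)
    os.dup2(out_w, 1)
    for fd in (in_r, in_w, out_r, out_w):
        os.close(fd)
    os.execvp("sort", ["sort"])

os.close(in_r)
os.close(out_w)
os.write(in_w, b"b\na\n")          # serialize the data into a text stream
os.close(in_w)                     # EOF so the tool can finish
result = os.read(out_r, 4096)
os.close(out_r)
_, status = os.waitpid(pid, 0)     # figure out how the tool exited
```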
Likewise, FFIs exist, but often it's just easier to reimplement something in your language instead of trying to use the existing implementation from another language.
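The FFI friction in miniature, using the classic example of calling libc's qsort from Python via ctypes: it works, but the ceremony (declaring the comparator's C signature, building a C array) shows why reimplementing is often the path of least resistance.

```python
# Reusing C's qsort through a foreign function interface.
import ctypes

libc = ctypes.CDLL(None)  # the C runtime already loaded into the process

nums = (ctypes.c_int * 4)(3, 1, 4, 1)
CMP = ctypes.CFUNCTYPE(ctypes.c_int,
                       ctypes.POINTER(ctypes.c_int),
                       ctypes.POINTER(ctypes.c_int))

def compare(a, b):
    # qsort hands the comparator pointers into the array.
    return a[0] - b[0]

libc.qsort(nums, len(nums), ctypes.sizeof(ctypes.c_int), CMP(compare))
```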
It's not about interfaces being an impediment but there being too many composition models, none of which seem to scale very well. A composition model is more than an interface. E.g. the way you compose a C program is you write the pieces (functions, structs) that are designed to fit together in certain ways, and then run the compiler which binds these together. The way you compose a web application is you invoke multiple processes that have a pre-shared notion of the protocols and they bind with each other. We're inventing model upon model, stacking them up like a wall of rocks. I'm saying we should look for models that are powerful but compact and can scale up. It should also be something that leads to less reimplementation because composition within it becomes easier. I don't think we have such models and I believe this needs deeper study.