It seems to me that one consequence of the "Theory Building View" is that instead of focusing on delivering the artifact, or the documentation of said artifact, one should focus on documenting how the artifact can be re-implemented by somebody else. Or, in other words, optimise for the "revival" of "dead" programs.
This seems especially relevant in open source, or in blog posts / papers, where we rarely have teams which continuously transfer theories to newcomers. Focusing on documenting "how it works under the hood" and helping others re-implement your ideas also seems like a better way to break down silos between programming language communities.
For example a blog post that introduces some library in some programming language and only explains how to use its API to solve some concrete problems is of little use to programmers that use other programming languages, compared to a post which would explain how the library works on a level where other programmers could build a theory and re-implement it themselves in their language of choice.
I also feel like there's a connection between the "Theory Building View" and people who encourage rewriting your software. For example, in the following interview[0] Joe Armstrong explains that he often wrote a piece of code and the next day threw it away and rewrote it from scratch. Perhaps this is because after your first iteration you have a better theory, and are therefore in a better position to implement it well?
I also believe there's some connection to program size here. In the early days of Erlang it was possible to do a total rewrite of the whole language in less than a week. New language features were added in one work session; if you couldn't get the idea out of your brain and code it up in that time, then you didn't do it, Joe explained[1] (17:10).
In a later talk[2] he elaborated saying:
“We need to break systems down into small understandable components with message passing between them and with contracts describing what's going on between them so we can understand them, otherwise we just won’t be able to make software that works. I think the limit of human understandability is something like 128KB of code in any language. So we really need to box things down into small units of computation and formally verify them and the protocols in particular.”
I found the 128KB figure interesting. It reminds me of Forth, where you are forced to fit your code into blocks (1,024 characters, or 16 lines of 64 characters).
Speaking of Forth, Chuck Moore also appears to be a rewriter. He said[3] something similar:
“Instead of being rewritten, software has features added. And becomes more complex. So complex that no one dares change it, or improve it, for fear of unintended consequences. But adding to it seems relatively safe. We need dedicated programmers who commit their careers to single applications. Rewriting them over and over until they’re perfect.” (2009)
Chuck re-implemented his Forth many times; in fact, Forth's design seems to be centered around being easily re-implementable on new hardware (this was back when new CPUs had new instruction sets). Another example is Chuck's OKAD, his VLSI design tools, about which he comments:
“I’ve spent more time with it than any other; have re-written it multiple times; and carried it to a satisfying level of maturity.”
Something I’m curious about is: what would tools and processes that encourage the "Theory Building View" look like?
> It seems to me that one consequence of the "Theory Building View" is that instead of focusing on delivering the artifact, or the documentation of said artifact, one should focus on documenting how the artifact can be re-implemented by somebody else. Or, in other words, optimise for the "revival" of "dead" programs.
Arguably, this is the entire spirit of academia, which mildly serves as a counterexample, or at least illustrates the challenges with what you are describing: even in a field where the stated goal is reproducibility, you still have a replication crisis. Though to be fair, I think part of the problem there is that, like you said, people focus too much on "documenting the artifact" and not on "documenting how to produce the artifact," but this is because the process is often "merely" technical and not theoretical (and thus not publishable), despite being where most of the hard work, problem solving, and edge-case resolution happened.
Edit: oh, and I would also mention that the kind of comment you've described, which focuses on why some process exists in the form it does in order to better explain how it does what it does, aligns closely with Ousterhout's notion of a good comment in A Philosophy of Software Design.
I couldn't easily count the number of re-writes for my current project, but it keeps getting better, and each new iteration has had an updated architecture allowing for new features. When I re-wrote it as a Literate Program (first a .dtx, now a "normal" .tex) things got much more expressive and easier to work with.
> I’ve seen people regularly struggle to write code that accepts all back compat state + handles it correctly.
From the post:
> In a world where software systems are expected to evolve over time, wouldn’t it be neat if programming languages provided some notion of upgrade and could typecheck our code across versions, as opposed to merely typechecking a version of the code in isolation from the next?
That’s not an answer, because type checking only protects against one class of errors. You can still have a logic bug in your upgrade code, `if (old state) { buggy implementation of old compat } else { implementation for new state }` (or in `convert(old state) -> new state` if the conversion is external). If you transparently just run the old code instead, then you’re not actually transferring state seamlessly, and with long-running sessions you run into the choice of “run N versions of code” vs “terminate sessions”. In any case, I think you start to run into real constraints, and it’s not clear to me how Erlang solves these with its “2 simultaneous versions of the code only”.
> I think you start to run into real constraints and it’s not clear to me how Erlang solves these with its “2 simultaneous versions of the code only”.
There are different ways to handle it, but the 'easy' way is to write your new version so that it upgrades the old state to the new state on first touch, and make sure either you have a no-op message you can send so the state gets updated, or you have some periodic thing that means every state will be updated within X time.
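Here's a minimal sketch of that "upgrade on first touch" pattern (the version tag, state layout, and function names are all made up for illustration; this is not Erlang's actual mechanism):

```python
# Made-up illustration of migrating old state on first touch. The
# version tag and layouts are invented; Erlang does this differently.
def upgrade_state(state):
    if state.get("version", 1) == 1:
        # v1 stored "addr" as "host:port"; v2 splits it into two fields
        host, _, port = state["addr"].partition(":")
        state = {"version": 2, "host": host, "port": int(port or 0)}
    return state

def handle_message(state, msg):
    state = upgrade_state(state)  # first touch: migrate before handling
    # ... handle msg against the v2 layout ...
    return state

def tick(state):
    # a periodic no-op guarantees every state gets touched (and therefore
    # migrated) within some bounded time, as described above
    return upgrade_state(state)
```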
There's also a way to have a 'try_update' that fails without changing anything if there are two versions active already. (Or you can just YOLO it, and anything with the old-old version in the stack gets killed.)
I'm not sure if there's better tooling for it now, but there wasn't anything to help you test transitions when I was doing it. For automated tests, you'd need to build a state with the old version, load the new version, run a test, etc. It's the same hole in testing if you do mixed version frontends against a shared database, or mixed version frontend vs backend; it's just more apparent because it's on a single system.
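To illustrate, a transition test along those lines might look something like this (the stand-in functions below fake the two versions; in a real test you'd build the state with the actual old release before loading the new one):

```python
# Hypothetical transition test. The two functions below stand in for the
# old and new versions of the code; names and state shapes are invented.
def old_new_session(user):  # stand-in for the old version's code
    return {"version": 1, "user": user}

def new_upgrade_state(state):  # stand-in for the new version's migration
    if state.get("version") == 1:
        state = {"version": 2, "user": state["user"], "tags": []}
    return state

def test_upgrade_preserves_session():
    state_v1 = old_new_session("alice")     # state built by the old code
    state_v2 = new_upgrade_state(state_v1)  # migration under test
    assert state_v2["user"] == "alice" and state_v2["version"] == 2

test_upgrade_preserves_session()
```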
> This seems like a problem you can’t solve generically and you always end up making trade offs.
That shouldn't stop us from solving the problem in the cases where it's possible though? We can tackle the corner cases separately with manual overrides.
> This is probably a big reason why most programs use external storage solutions even if they’re less efficient - it centralizes maintenance of state onto a system that has well defined semantics and can handle repair transparently.
This is certainly the case today, what I'm asking is: does it always have to be like that in the future?
I suspect that it’s impossible in the sense that the “possible” space will look like a distributed storage solution and the rest will look similar to graceful handoff of new connections to new version + shutdown of old version after some time (with forceful disconnect of sessions hanging around).
Unfortunately the documentation for Erlang doesn’t really describe pros/cons for anything, and I’m not an expert in it, so I don’t know what the limitations of the Erlang approach are, but there certainly must be some (e.g. if you have long running sessions and do several upgrades, are you running N versions of the code & eating up RAM because the old sessions aren’t complete?).
As I understand it, Erlang/OTP captures the entire state of the program and it’s a feature of the language and VM to accomplish this. It’s not something you can retrofit into any arbitrary language. For example, your JS app or your Python app or your Rust app won’t be able to do the same easily which means it won’t be robust and it will be error prone. Thus I stand by that there’s no “generic” solution you can bolt onto an arbitrary language.
> if you have long running sessions and do several upgrades, are you running N versions of the code & eating up RAM because the old sessions aren’t complete?
I believe Erlang supports two versions running alongside each other. They capped it at two because, back when this was developed, there wasn't enough RAM. Joe Armstrong gave at least one talk where he says he'd have liked to support an arbitrary number of versions and garbage collect them as old sessions complete.
> Thus I stand by that there’s no “generic” solution you can bolt onto an arbitrary language.
The main point of the post is centered around Barbara Liskov saying "maybe we need languages that are a little bit more complete now". I'm not interested in the limitations of current languages, I'm interested in the future possibilities.
There’s no free lunch, and I’m suggesting the trade-offs to support this are not worth it vs simpler approaches of doing a graceful drain & upgrade w/ a timeout for long-running sessions if those may exist (+ if you have a lot of large state to migrate, it could take an insanely long time to complete an upgrade). This is because availability will never be 100% in any scenario anyway, and this kind of transition can easily fit within your failure budget.
> As I understand it, Erlang/OTP captures the entire state of the program and it’s a feature of the language and VM to accomplish this. It’s not something you can retrofit into any arbitrary language. For example, your JS app or your Python app or your Rust app won’t be able to do the same easily which means it won’t be robust and it will be error prone. Thus I stand by that there’s no “generic” solution you can bolt onto an arbitrary language.
I'd say you can do hot loading in any language that supports dlsym/dlopen or eval. I've done it (rather poorly) in Perl and C, and I'm sure others have done it in other languages.
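For instance, in Python you can approximate it with importlib.reload (a crude sketch; `handlers` is a hypothetical module, and note that objects already holding references to the old code keep using it):

```python
# Crude hot loading in Python: re-import a module inside a running process.
# "handlers" is a hypothetical module holding the request-handling code.
import importlib
import handlers  # hypothetical; substitute your own module

def serve_forever(get_request):
    while True:
        req = get_request()
        if req == "reload":
            importlib.reload(handlers)  # swap in the newly deployed code
        else:
            handlers.handle(req)  # dispatches into whatever version is loaded
```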
It's a lot nicer in Erlang, so IMHO, if your use case includes long-running processes with expensive-to-construct or expensive-to-transfer state (such as long-running sockets), it's worth considering Erlang or something else that can do hot loading.
That’s a great link, thanks! It really makes it clear that a) state changes aren’t automatically correct (there’s both a manual and an automated piece, and either can go wrong), and b) while the language makes it possible, there’s still a lot of manual work involved & footguns (e.g. if a contended resource is held while something is being migrated, you’re going to experience degraded availability for other sessions, to the point of downtime).
> The only other place where I see this is useful is competing tasks. One task needs more resources from a thread pool shared by other tasks. A pid controller can allocate existing threads based off of pressure. Allocating more threads to a pool in this case though, still doesn't make sense.
Imagine you've got 16 CPUs/cores and a 4-stage pipeline, and let's say we want to run one thread per CPU/core for max performance (avoiding context switching). Without knowing anything about the workload: what is the best way to distribute the threads over the stages? You can't tell. Even if you tested all the possible ways to spread the threads over the different stages on some particular workload, then as the workload changes you wouldn't have an optimal allocation anymore. A controller can adjust the threads per stage until it finds the optimal allocation for the current workload, and keep adjusting as the workload changes.
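A toy sketch of that kind of rebalancing (everything here, stage names and queue depths included, is made up for illustration):

```python
# Toy rebalancer: with a fixed budget of 16 threads over 4 stages,
# periodically move one thread from the least backed-up stage to the most.
def rebalance(threads, queue_depth):
    busiest = max(queue_depth, key=queue_depth.get)
    idlest = min(queue_depth, key=queue_depth.get)
    if busiest != idlest and threads[idlest] > 1:
        threads[idlest] -= 1   # take a thread from the stage under least pressure
        threads[busiest] += 1  # give it to the stage under most pressure
    return threads

threads = {"parse": 4, "transform": 4, "render": 4, "write": 4}
depths = {"parse": 2, "transform": 40, "render": 5, "write": 1}
print(rebalance(threads, depths))  # {'parse': 4, 'transform': 5, 'render': 4, 'write': 3}
```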
Right, the scenario you describe is exactly WHAT I mentioned: Competing tasks for threads.
This article is NOT exactly about that. It's about allocating more threads to a thread pool. I'm addressing the pointlessness of using PID controllers for allocating new threads.
Not exactly about that? It's literally the example from the motivation. The first thing I do in the plan section is to say "Let's focus on a single stage of the pipeline to make things easier for ourselves." and in the contributing section "We've only looked at one stage in a pipeline, what happens if we have multiple stages? is it enough to control each individual stage separately or do we need more global control?".
The example is about allocating threads to threadpools.
It is not about utilizing existing threads in threadpools.
That's where YOU are mistaken. Your example has threads preallocated with core affinity, which is what I'm saying when I say run all pistons on the engine even when idle.
There are plenty of university-level textbooks on control theory, but there seems to be much less material on how to apply control theory to software problems. Glyn Normington recommended the following book in another comment thread:
Feedback Control for Computer Systems
Philipp K. Janert
330 pages
O’Reilly (2013)
ISBN: 978-1449361693
Good question! At first I thought that maybe I wasn't waiting long enough after the load generator finished, but I just ran an experiment with a longer pause and I still don't see scaling down after the traffic stops!
Perhaps my naive implementation of the PID controller isn't good enough, maybe:
> you'll probably want to put a maximum on your integral accumulator
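For reference, that clamping would look something like this (a sketch with arbitrary names; the gains and clamp limit are up to you):

```python
# One PID step with the integral term clamped (anti-windup), so a long
# burst of traffic can't wind the accumulator up so far that the output
# stays pinned high long after the load has gone away.
def pid_step(kp, ki, kd, setpoint, measured, integral, prev_error, dt, i_max):
    error = setpoint - measured
    integral = max(-i_max, min(i_max, integral + error * dt))  # the clamp
    derivative = (error - prev_error) / dt
    output = kp * error + ki * integral + kd * derivative
    return output, integral, error  # carry integral and error into the next step
```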
I got the idea of scaling thread pools from a paper[0] coauthored by Eric Brewer (of CAP theorem fame and also vice-president of infrastructure at Google according to Wikipedia). The paper was written in 2001, so things might have changed a bit, but I doubt the example is completely irrelevant today.
In the 70s Ericsson programmed their telephone switches in a proprietary language called PLEX. It had hot code swapping, so when Joe Armstrong started working on Erlang to replace PLEX in the 80s, this was a requirement. Dropping a few thousand calls just to do an update simply wasn't an option.
On the other hand, even the first versions of Lisp (as far as I can gather, at least) had `eval`, meaning a running program could accept external input and update itself. And this was in the 60s.
Lisp would also be able to compile code to assembler, run an assembler and load the generated code into the runtime.
I would think (without knowing too much about Erlang's mechanisms) that the mechanisms of Erlang are quite different from what a Lisp runtime typically does. The Lisp runtime is just one process. Erlang is concerned with multiple processes, which are strongly isolated.
[0]: https://vimeo.com/1344065#t=8m30s
[1]: https://dl.acm.org/action/downloadSupplement?doi=10.1145%2F1...
[2]: https://youtu.be/rQIE22e0cW8?t=3492
[3]: https://www.red-gate.com/simple-talk/opinion/geek-of-the-wee...