This is a fun blast from the past. One of my first real programming experiences was working as a (the) programmer on a MUD and I remember learning about this technique somewhere (I think just reading about how forking worked) and then figuring out how to implement it on our MUD.
It felt absolutely magical to be able to hot deploy changes without kicking everyone off of the server - but felt even more magical to have gotten it to work.
The MUD community was a fun early introduction to open source. People (many of them probably "kids" like I was at the time) sharing various patches and features. It felt so cool to release something and have other people use it and provide feedback. Like the author says - at some point the MUD itself became a lot less interesting than the programming.
> The MUD community was a fun early introduction to open source.
They still are these days. There's plenty around, and there's even more MUD-likes (GUI) that are open-source and played/developed by hundreds & thousands.
I was using Ruby for hot reload, though of course that was well after the first MUDs. When everything is a function, you can just reload them. Doing it in TypeScript (for fun) was also possible but quite challenging.
That was an amusing post to see pop up here, as I believe I came up with the "copyover" name when I copied Melvin Smith's (aka "Fusion") idea about "hot reboot" from his "MUD++" code base to the popular Diku-based MERC/Envy etc. bases -- that was 2000 or probably earlier. Whether Melvin originally got the idea from somewhere else I don't know.
That version just used exec and closed all files except the network descriptors of players who were already logged in; the mapping of fds -> login names was saved in a file. When the new copy started up, it would log the users back in on the existing file descriptors. Today, using explicit file descriptor passing (so you don't accidentally keep files open) or a long-running proxy would be preferable.
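For readers who want to see the shape of that exec-based approach, here is a minimal sketch in Python rather than the original C. The file name, the JSON format, and the function names are assumptions made for illustration, not what MERC/Envy actually used.

```python
import json
import os
import sys
import tempfile

# Hypothetical location for the state file; any path both copies agree on works.
STATE_PATH = os.path.join(tempfile.gettempdir(), "copyover.json")


def save_descriptor_table(players, path=STATE_PATH):
    """Write {login name: fd number} for every logged-in connection."""
    table = {name: conn.fileno() for name, conn in players.items()}
    with open(path, "w") as fh:
        json.dump(table, fh)
    return table


def copyover(players, argv, path=STATE_PATH):
    """Save the table, keep the player sockets open across exec, then exec."""
    save_descriptor_table(players, path)
    for conn in players.values():
        # Python marks descriptors close-on-exec by default, so opt these back in.
        conn.set_inheritable(True)
    # Replaces the current process image; does not return on success.
    os.execv(sys.executable, [sys.executable] + list(argv))
```

Everything else stays close-on-exec, which gives roughly the "close all files but the network descriptors" behaviour without an explicit loop over descriptors.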
Back then C/C++ were often used by the developers, and we were at best CS students. There were surprisingly few segmentation faults, but I remember a few mysterious memory corruptions...
according to the patches in the source posted on your website the main work was done 1996-1997 ;-)
i am actually surprised that this didn't happen earlier, given that Diku itself was inspired by LPMuds which could do live code update already in the early 90s. of course the motivation for Diku was to produce something more stable than LPMud, so maybe they didn't think that live updates were a good idea in the first place. (that said, i don't remember LPMuds being unstable myself, but i only played from about 1992 at which time it may have improved)
This kind of reminds me of a story I heard about how they passed around raw function pointers between servers as an RPC mechanism at Yahoo back in the day. They'd disabled ASLR to make that possible. I guess there were few enough segmentation faults :).
I actually run a MUD that still uses this copyover method (primarily C code) for major code changes and migrations. We have about 100 players online at any given time! We've actually built in some more modern automatic copyover triggers and hooks related to recovering from crashes, capturing backtraces, performing database migrations, and sending/receiving systemd signals (among a lot of other modernization upgrades).
Very cool - I thought it would be a forgotten technique by now. Which MUD, and have you written up the details anywhere? I'm very interested to know what a modern version of this looks like, particularly with the systemd integration.
It originated (more than 20 years ago) as a fork of SWReality, which is itself a fork of SMAUG, which comes from DIKU, and so on and so on.
We (myself and a lot of other awesome volunteer coders over the years) have made some pretty major modifications to the codebase. It all runs on PostgreSQL now (instead of flat text files with a custom parser), it has a built-in Lua scripting interpreter for extensions and in-game interactions, it has deep integrations to our Discord community, it has sidecar web services, and much much more...
This is awesome, thanks for sharing! I used to play lots of MUDs and also the browser-based MMORPG life sim Star Wars Combine, so I don't know how I missed this back then.
By the way, you have a broken link on the `Learn More About the Economy` link on the home page. It should just redirect to `/crafting-the-galaxy`.
That's so cool. Have you written about the MUD's history and technical evolution anywhere? If not, would you be interested in collaborating on an article like that?
HuffPo wrote a piece about us like a decade ago, and I've been meaning to write more about the technical evolution of the game... just haven't gotten around to it. Certainly interested in collaborating on something.
This is a pretty interesting approach to replacing your own executable, though it's really akin to a spicy spawn where you (the parent) exec rather than they (the child) exec. Not to minimize the coolness of this approach -- it's certainly a good way to do it, and doubly so on older Unix.
These days you could probably spawn your child process, test-boot it and then mmap bits of it back into your process and then unmap your old code pages once you're certain it boots. Or given how cheap cycles are, test-boot it and then throw that away and spawn the new executable if you're happy with the results.
this is why languages such as LPC were developed. LPMuds allowed developers to code rooms and objects while the game is running and simply reload them whenever they wanted without admin intervention. if loading failed then the old version was kept and you'd just fix your code and try again. it was incredibly robust and powerful. for a player to get access to new versions of objects or rooms they had to drop the objects and pick them up again or reenter the room so that the object references could be updated to the new versions.
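The "keep the old version if loading fails" behaviour can be sketched in a few lines. This is a Python analogy, not LPC, and the `LiveObject` class and its methods are hypothetical names chosen for the example.

```python
class LiveObject:
    """Holds the compiled code of one room or object and can swap it at runtime."""

    def __init__(self, source):
        self.namespace = {}
        if not self.reload(source):
            raise ValueError("initial source failed to load")

    def reload(self, source):
        """Compile and run new source; on any error the old version stays live."""
        fresh = {}
        try:
            exec(compile(source, "<live object>", "exec"), fresh)
        except Exception:
            return False
        self.namespace = fresh
        return True

    def call(self, name, *args):
        """Invoke a function defined by the currently loaded source."""
        return self.namespace[name](*args)
```

A failed `reload()` returns `False` and leaves the previous functions callable, so the builder can fix the code and try again while players keep using the old version.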
even better, LPC has been rewritten into pike, a general purpose programming language that retains the same capability. in the roxen webserver i can write and reload modules at runtime. this works efficiently because the lifetime of any object instance is limited to each http request. when a module is reloaded http requests already in progress are not affected, only new ones.
in the object storage server open-sTeam also written in pike a more advanced method was developed using proxy objects that can update object references and thus allow the updating of code without breaking the references. i am still using that to host my own websites.
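The proxy idea can be illustrated roughly like this. It is a Python sketch of the general pattern, not open-sTeam's actual implementation, and the class names are made up for the example.

```python
class ObjectProxy:
    """Stable handle that forwards attribute access to a swappable target."""

    def __init__(self, target):
        self._target = target

    def replace_target(self, new_target):
        """Point every existing reference at the new implementation."""
        self._target = new_target

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        return getattr(self._target, name)


class SwordV1:
    def damage(self):
        return 5


class SwordV2:
    def damage(self):
        return 7
```

Anything that holds the proxy (an inventory list, say) sees the new code after `replace_target()` without having to drop the object and pick it up again.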
to this day i have yet to find any other language with this power. smalltalk can do it and i believe lisp too, but that's it.
I ran and maintained a CircleMUD for some years but always felt that we should have run LPMud (or MudOS). Those were clearly the superior technology. And playing them was so cool because they _changed_ much more often because it was painless.
The last maintained LPMud fork that I know of is LDMud https://github.com/ldmud/ldmud . Would be cool if there are others.
The BEAM vm has a lot of the functionality you mention from LPC and Pike. Erlang being from the 80s—predating LPC—and also from Sweden, I imagine that Lars Pensjö and later Fredrik Hübinette et al may have taken some inspiration there.
I still maintain the CD LPMud driver for Genesis MUD at https://github.com/cotillion/cd-gamedriver.
There is not very much activity though since most critical issues have been fixed over the years and the game is very stable.
Thanks for bringing this up. I actually had a section on LPMuds in an earlier draft, but took it out because I didn't think I could do it justice. I also wanted to focus more on the Unix stuff. My friends and I tended to play DIKU derivatives, so I was never intimately familiar with LPMuds' capabilities.
I hear that class/skill variety was often greater on LPMuds because it was so much easier to code up custom behaviour. Do you know of any especially good articles on LPMuds class (in the OO sense)/object hierarchies and the software architecture of mudlibs?
I believe when regular users were able to create the world around them in a MUD they were called "wizards". It's as if, in an MMORPG, some players gained the privilege to add mods to the official game world for all players to experience.
that is correct. players who essentially completed the game were promoted to wizards with editing privileges. in the muds where i was active though eventually not all players were interested in that and new challenges were created where the players could continue, while the bar to become a wizard was not raised so that at one point players could choose whether they became wizards or remained players.
The way my mud did it, and I believe most muds (considering I took my implementation from another popular codebase), was not by forking.
Sockets are just files, so all we did was save everything including players, execl() another instance of the MUD, and exit. Same port wasn't an issue with SO_REUSEADDR.
On boot as it's loading players, their file descriptor ID was saved, so it loads it up and resumes talking to the socket. It also loads the ID of the server socket descriptor. No need to send information to a new process as it just loads the previous game state.
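The boot-time half of this can be sketched as follows, in Python for brevity. It assumes the saved state is a mapping of player name to descriptor number; `resume_players` is a hypothetical helper name, and the automatic detection of the socket family from the descriptor needs Python 3.7 or later.

```python
import socket


def resume_players(table):
    """Turn saved {name: fd} entries back into usable socket objects.

    The descriptors are assumed to have survived exec, so the numbers
    recorded before the restart are still valid in the new process.
    """
    return {name: socket.socket(fileno=fd) for name, fd in table.items()}
```

The listening socket can be recovered the same way from its saved descriptor number, which is why nothing has to be sent between processes.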
I spent a fair chunk of the mid-90s to early 2000s on a moderately busy MUD with a group of people who, it turns out, were all about the same age. No idea in retrospect how I juggled this with school/friends/work - I guess kids just have a ton of free time.
Kind of drifted away for a couple decades during college and after, as other things filled up the time.
I came back decades later, after going to a memorial service for a friend who died an untimely death from a serious medical condition, and seeing that a bunch of the people there were from her online community. They talked about how the MUSH she hung out on had been a real lifeline when she was bedbound for immune-system reasons -- it was really, really cool to see, and I went back and checked out my own place and met up with folks again.
During 2020 a TON of people all had the same idea and all logged in to my place again. There was a brief resurgence of activity (from dozens of people online to a hundred+). Very few new players, but very cool to see people who were all, more or less, the same cohort -- just grown up now. Folks have slowly drifted away again in the past couple years, and that's fine too. I'm glad it's there.
It's nice to have these subcritical, human-scale online communities. Not everything has to be a subreddit or even a 10,000+ person discord - you can just hang out on a server!
Burka was known to her fellow players as "ashne" - stylized in lowercase - on Tinymud Classic & Islandia, hosted by CMU prof Jim "Fuzzy" Aspnes, among other servers
A TCP port concentrator, implemented by "leet". This add-on front end multiplexed connections with separate processes, allowing us to overcome the 64 file descriptor hard limit, per process, on Unix. Nearly 256 players could participate!
Na Choon Piaw, while working for Bell Labs Research on the East Coast, added a programming language that resembles Forth, releasing TinyMUCK 1.x and TritonMUCK test bed server.
Piaw worked with a VAX server, 7800 I believe.
also, Jon "Stinglai" Blow from Berkeley designed a string interning method for TinyMUCK 2.x that saved plenty of memory, by deduplicating ASCII strings in memory.
Me? I assumed many interesting names over the years. Primarily "ChupChup" or his plural counterparts, "chupchups", or also "Lucretia", the statuesque goth woman wearing "boots of the deepest black". But in Real Life, (RL), I remain Robert Earl.
I played on a fantastic MUD as a kid. It was based on the code for some other MUD, and the admins were unhappy about some things, so they decided to write their own MUD engine from scratch. They got as far as establishing a network connection to a client - all network code written from scratch in C - before they burned themselves out and quit, which pretty much killed the MUD too.
There’s a lot to learn here. Many developers fall into this trap halfway through their careers. Spolsky would probably call these folks you’re talking about architect astronauts.
As someone who has (for several years) developed, supported, and operated my own dialect of a multiplayer game server engine (which includes an embedded physics engine), that sounds absolutely brutal. It took several years before I saw my first user and it was after I thought my project was long dead. Then it started catching on. You probably won’t see results if you expect more than that.
Hopefully they gained some appreciation for how much work goes into these.
It is also fun to engineer servers which can be upgraded without ever stopping listening, even for a nanosecond on the listen socket. Would be a fun senior C programming job interview question, but unfortunately those don't exist anymore. Only kubernetes pod wranglers do, where the answer is "destroy it all and create it all and hope that the ingress somehow hides the mess".
A big challenge lies in designing a system where every user has creation and modification rights, without breaking the system. Systems based on the https://en.wikipedia.org/wiki/Object-capability_model for access control and those that can monitor and set computational resource limits internally can prove very useful there
MOOs (based on LambdaMOO at least) had various write permission levels, including builder (you can make new instances of existing things like rooms or objects and give them descriptions) and programmer (you can make new programs, written in a neat prototype-based language). Wizards could set these bits on other users and ignore permission checks on other users' objects.
One of the weirdest features of the language in retrospect was that iirc you could use object reference literals in your code, since every object instance had an id in the database (eg #1234). This eventually necessitated a special recycler, implemented inside the MOO, to make sure that IDs got reused when objects were deleted.
To add to this, MOO is just an interpreter and very basic network manager. All of the logic for controlling the actual game or environment is inside the database in interpreted MOO code. (You also have control over the network from inside. The server is just handling TCP bookkeeping, basically.) This means, for instance, that you can implement whatever permission system you want. One of the earlier examples is JHCore, which offers different groups of permissions. Really the only server-enforced permission levels are the wizard, which is essentially root, and the programmer. Everything else is up to you.
I'm not positive on the history of the recycler, but I think you have it backwards. It actually exacerbated the problem of using literals in code because you couldn't guarantee that the object you wrote in the code is the same object 20 years later. This created security concerns (imagine you hard-code an object number into your code and now the object is owned by a completely different person who can write their own code on that object. So your code called #123:innocent_function(), which the new object owner has programmed to do something malicious) at worst and broken code at best.
I assume the reasoning was two-fold: First, huge numbers are a pain to type. And in MOO you do end up typing object references a lot. And second, in the early 90's, memory was at a premium. Even a recycled object is using some memory (it still has a space in the object list).
Anyway, if anybody is interested, MOO is still kicking! There's a fork called ToastStunt (https://github.com/lisdude/toaststunt) that offers some slightly more modern conveniences. And the community has, mostly, converged into the ToastStunt Discord channel.
Sometimes. You may be able to identify a category of customizations that has both a high frequency, and the potential to be represented as declarative data. Changes to the data can be done without a restart. That can often be enough to get by.
But in my experience (which may vary from your own), this takes significant investment and you’d need a very strong incentive or it would need to be a passion project of some sort.
I mean, this technique seems pretty general already?
Get a signal, do the thing (fork, then exec, then copy state, then you're done).
But, there's other options. If you want to keep long running connections, but there's not really shared state, you can just fork and then move the socket to the new server and let the old one drain out and exit when it's done/ after you waited long enough. Using a BEAM language like Erlang or Elixir gives you hotloading as a fundamental tool, and otp has some suggestions for how you could use that. You can do hotloading in C with dlopen and dlsym/dlfunc, but you've got to structure your code carefully to make it work, and you may be stuck with a fixed outer loop, depending on how creative you are at writing weird C. Java has hotloading too, at least in some vms. I've built hotloading in Perl, but I don't think I'd be proud of my code if I looked at it again.
Maybe a dumb question here. Why is a child process needed? Can the parent process not open a pipe, write its own state into the pipe, and then call exec, allowing the new binary to read from the pipe? Will exec destroy the pipe if there is no child process?
A pipe is a pipe, it's not a store. The buffer size is not specified; POSIX only guarantees that writes of up to PIPE_BUF (at least 512) bytes are atomic, and once the buffer fills (64 KiB by default on Linux) further writes block until someone reads.
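If you want to know how much a pipe holds on your own system before a writer with no reader would stall, a rough measurement looks like this. It is a sketch: the 4096-byte chunk size is arbitrary, and the result depends on the OS and its configuration.

```python
import os


def pipe_capacity(chunk=4096):
    """Fill a fresh pipe with non-blocking writes and count the bytes accepted."""
    read_end, write_end = os.pipe()
    os.set_blocking(write_end, False)
    total = 0
    try:
        while True:
            # Raises BlockingIOError once the kernel buffer is full.
            total += os.write(write_end, b"\0" * chunk)
    except BlockingIOError:
        pass
    finally:
        os.close(read_end)
        os.close(write_end)
    return total
```

If the serialized game state is larger than this number, a single process cannot write it all and then exec; something has to be reading the other end, which is one reason a second process is involved.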
(you could probably do the same thing by persisting state into a file, though)
Doing exec() to a new process but keeping the file descriptors relies on none of the file descriptors having the CLOEXEC (close-on-exec) flag. This flag is, IIUC, considered best practice for all file descriptors now, and in Python, all file descriptors are now by default created “non-inheritable” (i.e. with the CLOEXEC flag). A working model would instead be to create one inheritable file descriptor, do the fork() (and exec the new process in the child), and then sendmsg() all the normal file descriptors to the new process.
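A minimal sketch of the descriptor-passing step, assuming Python 3.9+ (where `socket.send_fds` and `socket.recv_fds` exist) and a Unix-domain socket as the one inherited channel. The helper names are hypothetical.

```python
import os
import socket


def send_descriptors(channel, fds):
    """Pass open descriptors over a Unix-domain socket (SCM_RIGHTS)."""
    # At least one byte of ordinary data has to accompany the descriptors.
    socket.send_fds(channel, [b"F"], list(fds))


def receive_descriptors(channel, max_fds):
    """Receive descriptors; the numbers may differ from the sender's."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 16, max_fds)
    return fds
```

In a real handover the old server would keep one end of a `socket.socketpair()`, mark the other end inheritable for the new process, and send each player connection across. Everything not sent explicitly stays close-on-exec.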
That was a lovely short read. I didn't play muds much, I didn't get an internet connection until they were starting to look old hat, but I did play a Discworld one for a while at uni.
It's struck me recently how much I miss that kind of programming. Everything I do now is hidden behind so many layers of abstractions, toolkits, libraries and environments that it all feels a bit divorced from what's actually going on. I need to come up with an excuse to do a bit of low-level coding again.
Spoiled web developer here. Since I have no way to test this, I have some questions! It seems like this whole thing could've been simplified to calling exec. What's with the whole pipe/fork thing if you are just going to kill one process, clone the other (with exec), and then kill the original? Is the pipe crucial to transferring the contents of memory, or something?
Yes, it's about transferring the contents of memory. If you just call exec(), you get a new process with the file descriptors (connections) open, but you have no idea what the game state (e.g., which connection corresponds to which player) was before the exec() call. The fork() is a way of keeping that memory around, and the pipe lets the forked copy send its state to the new code that's restarting.
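To make the roles concrete, here is a small Python sketch of the fork-plus-pipe handoff. It leaves out the exec() itself so it can run as a single function, and the JSON encoding and function name are assumptions for illustration.

```python
import json
import os


def snapshot_via_fork(state):
    """Fork a copy that streams `state` through a pipe, then read it back."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: still holds a copy of the old memory, so it can serialize it.
        try:
            os.close(read_end)
            with os.fdopen(write_end, "w") as out:
                json.dump(state, out)
        finally:
            os._exit(0)
    # Parent: a real copyover would exec() the new binary here and let the
    # new code read from the inherited read end of the pipe.
    os.close(write_end)
    with os.fdopen(read_end) as inp:
        restored = json.load(inp)
    os.waitpid(pid, 0)
    return restored
```

Because the child writes while the other side reads, the state can be larger than the pipe's buffer without anything stalling.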
I don’t see why the fork is strictly necessary, one could also just write it out to a temporary file. That would have the advantage it might be easier to debug if the copyover failed for some reason.
Other options (some of which the article briefly alludes to) include POSIX shm_open, Linux memfd_create, Linux O_TMPFILE. So long as you get either an FD or a filesystem path, which can then be passed over the exec in either argv or envp
If you use a custom memory allocator with its own heap, you could then store that heap in a shared memory segment and then remap it after the exec. I guess the risk is the memory layout may have changed so you can’t remap it at the same address, in which case you either crash or have some code which rewrites all the pointers in the heap (potentially very painful to do reliably…) Or you could just not allow raw pointers in the objects in that custom heap, requiring them all to be offsets from the heap start
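The "offsets instead of raw pointers" variant can be shown with a toy linked list stored in a byte buffer. This is only a sketch of the idea in Python; the node layout and helper names are invented for the example.

```python
import struct

# Each node: a signed value and the offset of the next node (0 means end).
NODE = struct.Struct("<iI")


def new_heap():
    """Reserve a header so that no real node ever lives at offset 0."""
    return bytearray(NODE.size)


def push(heap, head, value):
    """Append a node that links to `head`; return its offset as the new head."""
    offset = len(heap)
    heap.extend(NODE.pack(value, head))
    return offset


def walk(heap, head):
    """Follow offsets from `head`, collecting values until the end marker."""
    values = []
    while head:
        value, head = NODE.unpack_from(heap, head)
        values.append(value)
    return values
```

Since every link is relative to the start of the buffer, the bytes can be copied or mapped at a different address after the exec and the structure is still valid. The struct layout itself must not change between versions, which is the caveat raised above.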
All of this sounds pretty reasonable, but my goal was to document the simplest way I saw it work back in the day.
You can keep the process alive if copyover fails by writing to a temporary file and designing the server so that you get a "rescue console" if it fails to come up. Then you could inspect the dumped file, see what's going wrong, and exec() into another binary to try again. You could even use sqlite as your state file, which would let you interactively explore and edit it during debugging.
A custom allocator sounds like an extremely interesting approach. I'm still not sure whether I'd want to rely on struct layouts not changing, but if you use it diligently you at least make it easier to find everything you've allocated.
You could use a file instead, but `pipe` is the most portable way to not leave clutter on the system if something goes wrong. If you need traffic in both directions, `socketpair` is useful (and I think portable across platforms that support networking as we know it).
Less-portable or less-reliable alternatives include `memfd`, a deleted-but-still-open tempfile, etc.
To avoid problems with failure after `exec`, it is very important to fork+exec the server first just to see if it actually starts. If you don't need the PID to remain the same you can actually just use that instance without the other fork or another exec (this still keeps the same PGID). Note that "failure after `exec`" can be fairly sharply divided into "error in C runtime (usually, library compatibility)" and "error in something `main` calls", which require very different approaches.
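A test boot can be as simple as running the new binary once in a child process and checking its exit status before committing to the real exec. In this Python sketch the idea of a self-test invocation is an assumption; the server would need to support some mode that starts up, checks itself, and exits.

```python
import subprocess


def boots_cleanly(argv, timeout=10.0):
    """Run the candidate server command once; True only if it exits with 0."""
    try:
        result = subprocess.run(
            argv,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, loader failure, or a hang all count as "does not boot".
        return False
    return result.returncode == 0
```

This catches both classes of failure mentioned above: a binary that cannot be loaded at all never returns 0, and neither does one whose `main` bails out early.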
---
Aside: I've had catastrophic failures using the "save after fork" approach (mentioned later in the article) when the child process crashed before the save completed, and the parent just kept trying to have the child save (I also managed to completely crash GDB while investigating this). It is very important for the parent to commit suicide if the child can't do its job, to minimize the amount of lost data.
The other thing I would recommend to anyone working with this kind of thing: use systemd and commit all the way; it will make your life 1000% easier than trying to reimplement it badly yourself.
> `socketpair` is useful (and I think portable across platforms that support networking as we know it)
Windows is a notable platform not supporting socketpair.
Some languages offer socketpair on Windows, e.g. Python, but they are implementing it themselves using TCP loopback connections.
Somewhat irrelevant to this discussion given Windows lacks exec (starting a new executable requires a new process which gets a new PID), and in practice lacks fork too (the low-level NT API has undocumented support for forking, but most of the higher-level APIs get confused by it and break when you use it, making it unusable by the vast majority of applications)
> Somewhat irrelevant to this discussion given Windows lacks exec (starting a new executable requires a new process which gets a new PID),
Well, you can do exec() on Windows if you implement it yourself in user space, see e.g. https://github.com/polycone/pe-loader (a bit of a dated project, 32-bit only, but I believe the same principles apply to 64-bit)
Cygwin uses a hack – exec() creates a new Windows process, but every Cygwin process has two PIDs, a Windows PID and a Cygwin PID, so exec() changes the Windows PID but keeps the Cygwin PID the same – so it looks like an exec to Cygwin processes, but just like an ordinary spawn to non-Cygwin processes
Less insane approach: if you put most of your code in a DLL, you can unload the DLL and then load a new version of it
So they didn't just fire up pods in Kubernetes with new code and shut down the old ones? And they kept state in memory instead of some fancy distributed databases? And it worked?