Transactions (something like fsync) and code upgrades (something like database migrations or hot reload in a debugger) seem like the tricky parts, and I don't see a discussion of how they'll be handled.
Let's assume you get a stream of security patches from somewhere. How do you apply them without a reboot? What if a data structure needs to be migrated?
During early development, it's easy to just throw away all your data and start over after a redesign, but as with production databases or filesystem implementations, once you're storing real user data, you're not allowed to do that anymore. It's helpful architecturally to have a distinction between on-disk structures (which you're careful about either not changing or migrating) and in-memory structures (which you can change freely) rather than trying to freeze or migrate everything.
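To make that concrete, here's a minimal C sketch of what the separation can look like (all names are made up for illustration): the on-disk form is versioned, fixed-width, and pointer-free, while the in-memory form is whatever is convenient right now.

```c
#include <stdint.h>

/* On-disk record: explicitly versioned, fixed-width, pointer-free.
 * Changing this layout means writing a migration, so change it rarely
 * and deliberately. */
struct customer_disk_v1 {
    uint32_t version;        /* bump on incompatible layout changes */
    uint32_t id;
    char     name[64];
    uint64_t balance_cents;
};

/* In-memory representation: pointers, caches, whatever is convenient.
 * Free to redesign at any time, because it's rebuilt from the on-disk
 * form on load and never persisted directly. */
struct customer {
    uint32_t id;
    char    *name;            /* heap-allocated, any length */
    uint64_t balance_cents;
    struct customer *next;    /* intrusive list, never hits disk */
};
```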
> It's helpful architecturally to have a distinction between on-disk structures (which you're careful about either not changing or migrating) and in-memory structures (which you can change freely) rather than trying to freeze or migrate everything.
That's true, but syncing between the database and in-memory data structures is tedious. Imagine building an application and never having to worry about storage: just instantiate and reference objects and you're set. When working with lots of data, the OS would just swap things in on demand. That would save a lot of development time for many types of applications.
But yes, there would be some big challenges regarding application updates and how not to destroy all data with a trivial bug. (A misplaced `customers = []` might be all it takes to permanently wipe all data. Perhaps this idea should be combined with immutable data structures and versioning.)
[Edit] After some more consideration: what I'm talking about feels more like a language feature than something the operating system can/should do.
Serialization doesn't go away, because you still need it for network connections. Data formats useful for sending messages over the network can also be useful for saving things to disk. These formats should be OS- and language-independent and have rules for backward compatibility.
Consider how hard it would be to replicate a git repository if it's just a bunch of ad-hoc classes defined in one language.
Protobufs are one design that addresses these concerns. There are others. There is an impedance mismatch, but maybe this would be better addressed through languages that have better support for serialization?
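For anyone who hasn't looked inside these formats, the backward-compatibility rule mostly comes down to one decoding convention: skip fields you don't recognize instead of failing. Here's a toy tag-length-value decoder in C sketching that idea (the tags and message struct are invented for this example; this is not protobuf's actual wire format, which uses varints):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Known tags; a newer writer may emit tags this reader has never
 * heard of, and that must be fine. */
enum { TAG_ID = 1, TAG_NAME = 2 };

struct msg {
    uint32_t id;
    char     name[64];
};

/* Decode a byte buffer of [tag][len][value...] records. Unknown tags
 * are skipped rather than treated as errors; that single rule is what
 * lets old readers handle messages from newer writers. */
int decode(const uint8_t *buf, size_t len, struct msg *out)
{
    size_t off = 0;
    while (off + 2 <= len) {
        uint8_t tag  = buf[off];
        uint8_t vlen = buf[off + 1];
        const uint8_t *val = buf + off + 2;
        if (off + 2 + vlen > len)
            return -1;  /* truncated record */
        switch (tag) {
        case TAG_ID:
            if (vlen == sizeof out->id)
                memcpy(&out->id, val, sizeof out->id);
            break;
        case TAG_NAME:
            if (vlen < sizeof out->name) {
                memcpy(out->name, val, vlen);
                out->name[vlen] = '\0';
            }
            break;
        default:
            break;  /* unknown tag from a newer schema: skip it */
        }
        off += 2u + vlen;
    }
    return 0;
}
```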
Transactions can be done "easily" by reusing virtual memory support to trap accesses to memory (read/write/execute) and basically running coherence protocols in software. This is also how distributed transactions can be implemented, which enable you to deploy a bunch of threads running in a single address space across separate nodes in a cluster. It's not all that much harder to implement than checkpoint or restore, which they seem to have already.
If I understand this idea correctly, it would have a huge performance impact if every load and store had to be trapped. Or am I missing something here?
Not every access, but those that have to be trapped in order to maintain coherence. The cost of a trap might simply be in the same ballpark as other kinds of synchronization.
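For the curious, the classic userspace version of this trick on Unix is mprotect() plus a SIGSEGV handler: mark pages read-only, and the first store to each page traps exactly once. A minimal C sketch (glossing over async-signal-safety; a real transaction system would copy the old page contents in the handler rather than just counting dirty pages):

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t pagesz;
static volatile int dirty_pages;  /* pages written since protection */

/* First store to a read-only page lands here; unprotect the page and
 * return, and the faulting instruction is retried and succeeds. */
static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(pagesz - 1);
    mprotect((void *)page, pagesz, PROT_READ | PROT_WRITE);
    dirty_pages++;
}

int main(void)
{
    pagesz = (size_t)sysconf(_SC_PAGESIZE);
    char *region = mmap(NULL, 4 * pagesz, PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return 1;

    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    region[0]      = 'x';  /* traps once; page 0 becomes writable */
    region[pagesz] = 'y';  /* traps again on a different page */
    return dirty_pages == 2 ? 0 : 1;
}
```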
For code upgrades, see e.g. multithreaded Common Lisp implementations which achieve this consistently via extremely late binding. Of course, things get a bit more complex if you e.g. need to atomically synchronize upgrading multiple code bodies rather than only one, but that is something that can be done in application logic itself.
This is an interesting idea, much like KeyKOS in the past, which had a heartbeat that saved the entire system state every N seconds, tagging blocks of memory as it went so it didn't have to be instantaneous. It was demonstrated by simply powering the system off, then back on, without letting the OS know it was about to happen. After about a minute, everything was back up and running, like nothing had happened. (Or so I've read)
Supporting capabilities is the one thing I'm looking for in a new daily driver. I don't care how rough it is to use; as long as I can compile and fix things, I can tolerate quite a bit. I lived through MS-DOS and everything since.
Interestingly, KeyKOS's successors ended up removing the transparent checkpointing from the kernel. As I understand it, the reasoning was twofold. 1) Virtual memory page-outs can be implemented cleanly by a user-space process, so pushing them out there gives the kernel a smaller surface area. 2) Checkpointing in distributed environments requires some notion of the checkpoint event being propagated to user space, since there's all sorts of state that could be on other systems. Rolling back is going to get TCP, etc., out of sync. The earlier user space knows it's been rolled back to a checkpoint, the quicker it can recover in practice.
I believe this is related to the Blender file format (.blend).
It is just a dump of the program state. That's why saving and loading is extremely fast.
In short, imagine Java or C++ objects, but they can live in a soup in a persistent database.
Imagine if your programming assignments at school were "open this database, then write a client to talk to the object retrieved." That's what we actually had in PS-Algol in '86. Super cool!
> In short, imagine Java or C++ objects, but they can live in a soup in a persistent database.
AllegroCache still exists today. So at least Lisp people don't have to imagine that. I wonder if that is one of the reasons why CL people in the 1980s came up with UPDATE-INSTANCE-FOR-REDEFINED-CLASS.
Yeah, you could supposedly upgrade all the software on a Lisp system without having to restart anything (assuming the developers implemented the various redefinition protocols correctly).
You can definitely do that in an image-based Smalltalk system, it is necessary to support the code browsers. I imagine the lispm devs had the same reason.
Imagine the user experience of closing the lid on your laptop and reopening it a few hours or a few days later, without using any battery life in between.
You can also imagine it being useful in extremely low power solutions where battery life is critical. Space craft, etc
> Imagine the user experience of closing the lid on your laptop and reopening it a few hours or a few days later, without using any battery life in between.
MacBooks and probably other laptops (?) already do this. You might lose a couple percent over a few days.
Surprisingly, while my Intel-based MBP does that (and can remain sleeping unplugged for several months), my M1-based one doesn't (after a few weeks, there's barely any battery left).
Funnily enough, I have the opposite experience! My M1 can stay closed for weeks without losing more than a few percent, while my Intel-based one tends to drain about 10%-20% a day.
I'm a bit confused. The hibernate function has been available in Windows (and also Linux) for years. In Windows 10 you might need to enable it in Power Options in the Control Panel. I use it all the time. It's very useful for long travels, for example.
Yep, and sometimes it even works. Now imagine hibernate not even being a feature to fiddle with. It just is. Close the lid, zero power draw; open the lid, instant resumption of where you were previously. No wake-on-sleep (and the quirks that come with it), etc.
Classic example of “but you can already do this by configuring X in this settings menu, and then making sure to enable it before you leave, and there’s just this one other trade-off, and yeah, it can be slow and clunky sometimes” vs a solution so native to the platform that it’s not even a thought in the user’s mind.
To be clear: hibernate works without any fiddling, but in Windows 10 this setting is not enabled by default; you need to explicitly turn it on in the Control Panel, and then it works.
I don't know why Microsoft decided not to enable it by default. Maybe they thought the difference between sleep and hibernate is too much to grasp for common users. And sleep works a bit faster (although with modern NVMe drives I don't see as much difference as in the past, when you had to write 8 or 16 GB of memory to an HDD).
Fiddle with?
Simply select what action closing the lid performs.
It works every time. I have gone months without shutting down and just relying on this.
Running a tiny embedded computer in a highly unpredictable environment. Off the top of my head, I know someone running such computers to measure biodiversity through audio recordings. They fail all the time due to extreme weather and, in one instance, a bear pissing on one. I imagine that kind of durability could be useful in such an instance.
As to who would need the OS state surviving reboot, what comes to my mind is things like Mars rovers and other autonomous exploratory vehicles. Currently these use cases are filled by VxWorks[0] among others but a fresh approach might yield some advantages.
I would flip it around and ask "Who wouldn't need this?" Think of it this way: operating systems like Windows have tons of vulnerabilities, which need to be patched, and applying those patches takes a reboot. But people push off those reboots for so long, because they mean closing all applications and interrupting work, that the entire computing ecosystem is more vulnerable and less updated as a result. Microsoft started scheduling updates in the middle of the night just to find a time that was amenable to users, but that doesn't help when the machine is turned off! So I think that feature, if it were deployed everywhere, would be a net win for everyone, not a niche one at all.
But if your application is just restored to its previous state, it is still using the buggy code that was patched on disk. If you page in new code from patched files, the program will not necessarily know that foo() now lives at a different address.
Save/restore across shutdowns is much simpler when the code doesn’t change. When the code changes, you need some form of live patching (à la ksplice) in the mix as well.
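One common shape of the mitigation: never bake direct code addresses into the persisted image, and route calls through pointers that get re-resolved after a patch. A small C sketch using dlopen/dlsym; "libapp.so" and "foo" are hypothetical names, and real live-patching (ksplice-style) is far more involved than this:

```c
/* build with: cc patchdemo.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

typedef int (*foo_fn)(int);

static foo_fn foo;  /* every caller goes through this pointer */

/* Re-resolve foo after a patched libapp.so has been installed. */
static int reload_foo(void)
{
    void *h = dlopen("./libapp.so", RTLD_NOW);
    if (!h)
        return -1;
    foo = (foo_fn)dlsym(h, "foo");
    return foo ? 0 : -1;
}

int main(void)
{
    if (reload_foo() != 0)
        return 1;
    printf("%d\n", foo(42));  /* whichever version is loaded now */
    return 0;
}
```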
There are also concerns when the system interacts with external systems. Suppose the system state was saved just before the million dollar transfer was performed, the transfer was performed, then the cat tripped over the power cord without that being persisted. When the system starts back up, the app may make another million dollar transfer thinking it was the first one.
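The standard fix is to make the external effect idempotent: durably log an intent record with a unique ID before performing the operation, and have the remote side deduplicate by that ID on replay. A rough C sketch, where transfer_money() and the ID scheme are hypothetical:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical external call; the remote side must treat a repeated
 * id as a no-op (deduplication). */
int transfer_money(const char *id, long cents);

/* Durably record that we are about to perform the transfer, so a
 * restart from an older snapshot can see the pending intent. */
static int log_intent(const char *id)
{
    int fd = open("intents.log", O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;
    dprintf(fd, "PENDING %s\n", id);
    if (fsync(fd) != 0) {   /* must hit disk before the side effect */
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

int transfer(const char *id, long cents)
{
    if (log_intent(id) != 0)
        return -1;
    return transfer_money(id, cents);  /* safe to replay with same id */
}
```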
Windows happily applies updates during the night on laptops even when the lid is closed. Then the next day, you open the lid and bam, nothing of what you left open the previous day is there anymore!
macOS does something kind of like it. When your battery dies, you just have to boot the machine and all running apps will sort of start in their previous state.
That really depends on how well the app manages its own state and specifically handles restarts. No mainstream OS can restart without running apps even knowing it.
If you've got a Mac without a battery, you can try writing something in Pages and just yoink the power cable. Upon the next boot, Pages should open with your document and not have lost anything.
It’s possible with virtualization and containers. In those cases, to be fair, the applications are running in an environment that may not be aware it was restarted.
Memory isn’t snapshotted and dumped, then restored by the OS in a way that leaves user space largely unaware.
macOS/iOS has adopted a model of giving apps an opportunity to persist their own state before being terminated (which may be eagerly to reduce memory pressure or power consumption, for example) and then have the application restore its own state in whatever way is appropriate.
Outside of containers, hibernation almost needs to be a system-wide operation, which makes it inappropriate for OS updates, restoring from crashes (which, I suppose, is what Phantom OS is trying to resolve), or hardware changes.
Note that "containers" is basically kernel-implemented namespaces, which is why a containerized app can be checkpointed and restored independently: they can preserve the semantics of kernel resources across runs. You could have just about any arbitrary app running in a "container" with it being none-the-wiser about it: it doesn't necessarily impact userspace support.
I just read some of your posts before replying. Assuming you really are Ukrainian, I'm very sorry about what you and other Ukrainians are going through, and of course I'm 100% on your side on this, but I have to warn you that this attitude is exactly what Russian leaders would like to see: people around the world hating everything Russian indiscriminately, so that they can play victim. You're not helping your cause at all this way.
I assume you just don't know the reality well enough, rather than ignoring it.
In reality, since USSR times there has existed a sort of unwritten deal, in which the Russian leadership is allowed to do anything, even the craziest things, like invading Ukraine, in exchange for a relatively high standard of living for Russians (close to first-world countries) and some civilizational achievements, like a human space program (or whatever their culture considers a great achievement, like the bloodless seizure of Crimea).
This is extremely close to the extreme-left maxim that anything is acceptable in exchange for something weighty enough.
For example, nearly all Russians think it is acceptable to sacrifice tens of millions of people to achieve something like a human flight to Mars, and a high percentage of them would accept sacrificing their own lives, or the lives of their relatives, if a random number generator so decided. Yes, they really think of themselves as just replaceable parts of a huge machine changing the world.
The very few Russians (~1% of the population) who don't agree with such thinking are constantly fleeing Russia and getting residency in other nations, in many cases simply less aggressive ones.
And this is not just my attitude; these are quite rational conclusions, based on decades of observation, reading, mathematical modelling, and testing.
(For example, Algebra of Conscience, a book by Vladimir Lefebvre.)
Yes, it is not easy, but I have tested relationships with Russians myself, and at the moment, even with decades of experience, even with my closest friends, I cannot reliably predict whether a given person will keep behaving like a normal human being in the near term, or switch to being a fully compliant part of the Russian totalitarian machine.
I do not think that Russia wants to play the victim (culturally, it's completely different from that). Plus, they have been the scapegoat for a really long time now. It became a meme: the Russians did it!
They just don't accept the idea that other nations have the right to make their own choices.
To be honest, they divide all nations into great and not great. The rule is simple: Russians have had wars with nearly every nation in Europe (and, after WW2, in other parts of the world), and whoever has, in well-known history, kicked them in the teeth is considered a great nation; all the others are not great, just victims.
I say this is a road to nowhere. These efforts will not lead to anything real, only papers and pseudo-scientific talks. Nothing more.
Oh, sorry, I forgot about the Russian national sport: extreme-sized embezzlement of budgets, which requires super-duper fantastic projects, so that in any case they can say "they just suffer from budget cuts; just a few additional billions could lead to success."
This is really cool, I felt deeply relaxed and curious at the same time just thinking of the implications.
There’s got to be tons of stuff to dig out and rediscover from early CS R&D. Some of it is more advanced than what we have today, but we seem to be obsessed with patching over existing solutions.
The reason why "turn it off and on again" solves most computer issues is that the issues are caused by the system entering unexpected states that weren't properly accounted for at design/coding time. The longer a system runs, the more likely it will end up in such a broken state. So basically, this system is designed to monotonically drift toward being completely broken. How is this an improvement?
I suppose there's a procedure of some kind for rebooting from zero, rather than restoring previous state.
I guess I'd like to be able to specify at startup which processes are to be de-hibernated, which are to be restarted from scratch, and which are to remain dead.
This is really cool, but at odds with SSDs. They don't like all the write cycling, and at the same time, systems with SSDs can boot fast enough not to need persistence, and are low-power enough that people just leave them on all day.
Lack of persistence could almost be seen as a benefit, because as we all know, most problems are solved by a reboot.
But I'm sure somewhere out there is a super niche application for this.
My first thought is that this computer will be impossible to use.
Why? How many bugs exist in computer software where the solution is to turn your computer off and on again? If power loss doesn't affect the state of running programs, then once something is weird, your only option is to reformat the machine. The IT crowd will be out of a job.
Don't Android and iOS apps use a persistent database to store much of their data? So in that case the recovery procedure is to do something like "Clear storage"/"Clear cache".
And most of your work laptops probably have a network-mounted user home directory. Swap a laptop and you still get your home dir. You can always selectively purge part or all of your home dir.
Persistence doesn't make these issues unrecoverable, it just makes the process for recovery different. Although admittedly it's less simple than it was in previous generations.
They can show you the path to successful OS use. But only you can walk it. Mistakes must be accepted, contextualized, and built upon as a path forward... :-)
As an IT OS Coach, it just irks me when people don't understand how to use their OS properly and blame the 'system'; people forget how the before-times were, when you had an OS thrown on your hands and had to waste precious time learning to use it.
We will be out of a job because we will not be able to say "have you tried turning it off and on again?". On a serious note, there are already loads of IT jobs that do not actually have to exist... Soon we will have a job specifically for coming up with the tech stack, picking frameworks and so on. :)
Could Phantom OS’s architecture be a great match for 3D XPoint (persistent memory like Optane) or eventual memristor memory, potentially achieving instant boot times?
I'm glad to see real OS research hasn't stalled. This feels very Plan 9-esque in its boldness and commitment to its ideas. We need to let go of Unixisms.
From the looks of USENIX papers, it is as alive as ever; researchers prefer to focus on distributed-computing OSes, as that's where the research money now is. The classical desktop is done.
Mobile OSes (which can be turned into desktop-like experiences) are where most research activity in managed stacks and improved security is now happening.
I remember this OS project starting about 20 years ago. Since then, it hasn't shipped much.
The idea behind it is to make the whole OS into basically a giant Java heap. Everything is an object, referring to other objects. It gives you some nice properties, because everything can become a capability which can't be forged. Also, everything can automatically be persisted and restored, Smalltalk-style.
The problem here is, of course, garbage collection. When not just your whole RAM but your disk storage is a mesh of pointers, GC becomes a resource-intensive task.
I wish they'd pivoted to building an OS for Optane-style persistent memory; there the approach might have a chance to shine.
This has a POSIX layer so it's not free of 'unixisms' by design.
I agree it's always a great idea to explore new ideas from zero, without any conceptual baggage. I'm sure that Unix-like OSes will still have a lot to offer for some use cases, though.
Well, for starters, some of the machinery is useless and requires constant stepping around.
Some design decisions (fork, signals) have really shown themselves to be mistakes as time goes on.
For what it does on a server system, for example (manage processes), there is a lot of useless genuflecting and papering over problems, remote manageability being a big one.
So no, no one is going to die. But we could do a lot better.
What would you propose instead of fork/signals/process management? There needs to be some way of managing different tasks, whether that's threads or otherwise.
Personally, if I were designing a replacement, I would probably try something like this:
* The lowest level API to create a new process just creates an empty address space with nothing running in it.
* All kernel APIs for manipulating processes take an extra parameter to specify which one.
* Together, these replace "fork, exec" with "create a new address space, map these pages into it, create a thread running in it at this address." (With the right APIs on the side for custom page fault handling and synchronization, you can even reimplement fork+exec yourself on top of this if you want- but you don't have to.)
* Signal handlers would never run on an existing thread- this is too fundamentally fragile. Instead, for cases where you do actually want that sort of asynchronous callback, it runs in its own separate context, where it can act like normal multithreaded code. (Not every use of signals actually needs this, though- a lot of process management signals get routed to threads that explicitly asked for them, and that sort of API probably still makes sense.)
Arguably you don't even need "threads" at the kernel level. You just need to allocate CPU time somehow. I might try to make the API for that look like a callback into userspace (similar to the replacement for signal handlers above) that is then responsible for doing any further scheduling on its own, e.g. among threads in that process.
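To make the shape of that concrete, here's what such a syscall surface might look like as C declarations. Every name and signature here is hypothetical, just illustrating the create/map/start split described above:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical syscall surface: every call names its target process
 * explicitly, and "fork + exec" becomes create -> map -> start. */
typedef int proc_t;  /* handle to an (initially empty) address space */

proc_t proc_create(void);                     /* empty address space  */
int    proc_map(proc_t p, uintptr_t vaddr,    /* map file/anon pages  */
                size_t len, int prot, int fd, size_t offset);
int    proc_start(proc_t p, uintptr_t entry,  /* first thread         */
                  uintptr_t stack_top);
int    proc_kill(proc_t p);

/* Spawning a program then looks roughly like:
 *   proc_t p = proc_create();
 *   proc_map(p, ...executable segments from its file...);
 *   proc_map(p, ...a fresh stack...);
 *   proc_start(p, entry_point, stack_top);
 * with no implicit inheritance from the parent at all. */
```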
process_create(executable) would be a nice start. The current model of cloning the current process creates a bit of a mess about what the parent and the child should share, and has overhead even in the exec case. Now that we have threads, is there any utility in keeping the old fork() model?
Not saying anything novel here; just design a sane and uniform API.
There should be a way for the kernel to do a more controlled upcall into process space. That would scrape out quite a bit of rotten async implementation.
Yes, macOS is now a certified UNIX; however, just like its NeXTSTEP predecessor, its UNIXisms are meant to bring people into the platform, not to get them out of it.
Every single user experience that actually matters to Apple customers lives outside UNIX, relying on Objective-C frameworks, and nowadays Swift ones as well.
This is the mindset that those who buy Apple products as shiny UNIX machines keep failing to grasp, and then they start blogging all those posts about going back to BSD/Linux, which is what they should have done in the first place, sponsoring BSD/Linux hardware OEMs.
Could you (or someone) summarize some important points? I greatly appreciate the link but I can't read 360 pages right now on a topic I have almost no background in.
I think you should properly read it and see if you still agree.
Granted, it’s all very dated, so the idea of poor stability and taking up a lot of RAM is no longer an issue, but if you see what came before it, and the comparisons, you might wonder how we ended up where we are across all commercial operating systems.
One simple reason: too similar, & then everything gets ported over, & the OS becomes Yet Another Annoying OS Because It's Familiar But Feels Wrong. Yes, it's not great to have to make new, basic tools, but having tools that fit the environment is better than shoehorning in tools & demanding that compromises be made in the environment so that the old tools work more easily.
> On the application code (VM bytecode) level, OS shutdown (either manual or caused by failure) is not even 'seen' - applications and their data 'never die'; they continue their work after the next OS boot-up as if no shutdown ever happened.
Perhaps a good thought exercise on compassion. Substitute American for your people.
“Would it make sense not to promote American projects until Americans are out of (Iraq/Afghanistan/Africa/etc)? On the other hand, the war is American government's doing and isolating American population only pushes them away into Biden’s/Trump’s hands.”
How would the above, targeted toward your people, make you feel?
I feel bad for Ukrainians who have to starve and fight for their country, who are prevented by the Russian government from exchanging ideas, presenting their projects and research, and collaborating.
Ukrainians who are defending their country and are called Nazis by Russians; Ukrainians whose country is being turned into ruin by the Russian army.
Government cannot exist without people. Implying that all Russian people are victims of circumstances and should get a pass is a bit of a stretch. Inaction is still action.
I worked during my entire life to create peace & prosperity for myself, my family, & all. Now self appointed officials who I have no influence over want to start a conflict, raise my food & gas prices, cause food scarcity, milk me dry financially, steal my freedoms. I want nothing to do with any wars, sanctions, killing people for political & financial reasons, etc. If a politician, business leader, social influencer, or person wants a conflict, they should fly over to Ukraine with a rifle, & fight on the front lines. Red team vs Blue team. Fight! Leave everyone else out of it.
I guarantee that many people in all countries feel the same way.