I was curious about the state of the project and whether any source was available. Thought others might feel the same way.
In order for this to fly, it requires a ground-up rewrite of the fundamental principles of hardware.
For example, how on earth would paging work in such a world? Who or what would be in charge of swapping memory?
How would a system admin go about killing an errant process?
How would a developer code on such a machine, which allows no inspection of remote application memory?
Beyond that, it seems like instead of decreasing the amount of trust needed to function, it increases it with every hardware manufacturer, all while making security holes harder to fix (since they are now part of the hardware itself).
Imagine, for example, your RAM having a bug that allows you to access memory regions outside of what you are privileged to access.
It's a nice idea, but I think Google's Zircon microkernel approach is far more practical while being nearly as secure and not requiring special-purpose hardware.
The easiest way to implement transparent paging would be to have a paging block that exposed the memory manager API (allocate, free, read, write) and would proxy requests to either RAM or disk as appropriate. But since I was targeting embedded devices, it was assumed that you would explicitly allocate buffers in on-die RAM, off-die RAM, or flash as required. The architecture was very much NUMA in nature.
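A minimal Python model of that paging-block idea, purely illustrative (the class and method names are mine, not from the thesis): the proxy exposes the allocate/free/read/write API and transparently moves pages between resident RAM and a backing store.

```python
class PagingProxy:
    """Toy model of a paging block that proxies the memory-manager API
    to RAM or backing store as appropriate. Illustrative only."""

    PAGE_SIZE = 4096

    def __init__(self, ram_pages):
        self.ram = {}            # page number -> bytearray (resident pages)
        self.disk = {}           # page number -> bytearray (swapped-out pages)
        self.ram_pages = ram_pages
        self.next_page = 0

    def allocate(self):
        page = self.next_page
        self.next_page += 1
        self._make_resident(page)
        return page

    def free(self, page):
        self.ram.pop(page, None)
        self.disk.pop(page, None)

    def read(self, page, offset):
        return self._make_resident(page)[offset]

    def write(self, page, offset, value):
        self._make_resident(page)[offset] = value

    def _make_resident(self, page):
        if page in self.ram:
            return self.ram[page]
        # RAM full: evict an arbitrary resident page to backing store.
        # (A real design would pick a victim with LRU or similar.)
        if len(self.ram) >= self.ram_pages:
            victim, data = self.ram.popitem()
            self.disk[victim] = data
        self.ram[page] = self.disk.pop(page, bytearray(self.PAGE_SIZE))
        return self.ram[page]
```

Callers never see whether a page was in RAM or on disk; the proxy faults it in on demand, which is the "transparent" part.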
The prototype made for my thesis only allowed the "terminate" request to come from the process requesting its own termination. It would be entirely plausible to add access controls that allowed a designated supervisor process to terminate it as well.
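The policy described could be sketched as a one-line check (hypothetical function, not from the prototype): the prototype's rule plus the proposed supervisor extension.

```python
def may_terminate(requester, target, supervisor=None):
    """Allow a terminate request only from the process itself (the
    thesis-prototype rule), or from an optionally designated
    supervisor process (the proposed extension). Illustrative only."""
    return requester == target or (supervisor is not None and requester == supervisor)
```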
Remote memory can be inspected, but only with explicit consent. I had a full gdbserver that would access memory by proxying through the L1 cache of the thread context being debugged. (Which, of course, had the ability to say no if it didn't want to be debugged).
The goal was to have a system with a very small number of trusted hardware blocks, each doing only one thing, then build an OS on top of it microkernel-style - except that all you have is unprivileged servers; there's no longer any privileged software on top.
The point wasn't to move everything into silicon, it was to move just enough into silicon that you no longer needed any code in ring-0.
As an example, the memory controller's access control list and allocator was a FIFO of free pages and an array storing an owner for each page. Super simple, very few gates, hard to get wrong, and easy to verify.
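A minimal Python model of that allocator (the class name and thread IDs are made up for illustration): a FIFO of free page numbers plus an owner array, with access granted only to the owning thread.

```python
from collections import deque

class PageAllocator:
    """Toy model of the memory controller's allocator: a FIFO of free
    pages and an array recording each page's owner. Illustrative only."""

    def __init__(self, num_pages):
        self.free = deque(range(num_pages))   # FIFO of free page numbers
        self.owner = [None] * num_pages       # owner[page] = thread ID

    def allocate(self, thread_id):
        if not self.free:
            return None                       # out of free pages
        page = self.free.popleft()
        self.owner[page] = thread_id
        return page

    def free_page(self, thread_id, page):
        # Only the owning thread may free a page.
        if self.owner[page] != thread_id:
            return False
        self.owner[page] = None
        self.free.append(page)
        return True

    def check_access(self, thread_id, page):
        # Access is granted only to the owning thread.
        return self.owner[page] == thread_id
```

The point of the real design is that this entire state machine fits in a handful of gates, which is what makes it verifiable.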
I deliberately went with a very simple CPU (2-way, in-order, barrel scheduler with no pipeline forwarding, no speculation or branch prediction) to minimize opportunities for things to go wrong and keep the design simple enough that full end-to-end formal verification would be tractable in the future. Spectre/Meltdown are a perfect example of an attack class that is entirely eliminated by such a simple design.
I was targeting safety critical systems where you're willing to give up some CPU performance for extreme levels of assurance that the system won't fail on you.
What happens if the user doesn't care which CPU the process runs on?
This is certainly an interesting execution model. I think this should be adopted for IoT; given the simplicity of devices there, I think it's a perfect fit. I do worry that this may increase the BOM & make this model only cost-effective for small runs. Code is generally a "fixed" upfront cost amortized over the number of units you sell (the more units you reuse the software across, the cheaper that software is per unit), whereas more complex hardware is a constant % increase for each unit (ignoring the costs of building that HW).
I think traditional CPUs/operating systems may pose additional obstacles to adoption, & I'm interested to hear the author's thoughts on this. Ignoring the ecosystem aspect (which is big, but let's assume this idea is revolutionary enough that everyone pivots to making this happen), how would you apply security & resource-usage policies?

Let's say I want to access the file at "/bin/ps". Traditionally we have SW that controls your access to that file. Additionally, we can choose to sandbox your process's access to that file, restrict how much of the available HW utilization can be dedicated to servicing it, etc. If we implement it in HW, is the flexibility of these policies to evolve over time fixed at manufacturing time? I wonder if that proves to be a serious roadblock.

In theory you could have something sitting in the middle to apply the higher-level policy, but I think that's basically going to effectively reintroduce a microkernel for that purpose. You could say that it could be implemented, in this example, at the filesystem layer (i.e. the thing that reads, say, EXT4 & is the only process talking to that block device), but resource usage would prove tricky (e.g. if a block device had an EXT4 & an EXT3 partition, how would you restrict a process to only occupy 5% of the I/O time on the block device?).
The assumption was that you'd end up with something close to a microkernel architecture, but without any all-powerful software up top. A purely peer to peer architecture running on bare metal.
I skimmed the paper but I couldn't get a sense of how the CPU scheduler works. What causes the CPU to switch between threads? A common need that I've seen, even in the more limited applications you're describing, is preemption & thread priorities. For example, a driver thread may need to preempt other threads that may be running or we want more of the CPU to be devoted to running some thread even when all threads are runnable. How is this handled?
Re blocking of threads: is the only way they get blocked that they issue an RPC request, & thus the CPU magically knows to switch to a different thread? Is there any other blocking operation that might require switching threads, & if so, how would that be accomplished?
Another thing I've experienced is that not all HW is DMA-able, for cost or space reasons, & thus you frequently find such I/O buses implemented via bit-banging (e.g. IIRC sometimes you might have 2 UARTs & only 1 can be used with DMA, while the other must be bit-banged). Is that compatible with your approach, with all the same guarantees, or does this 100% require every I/O system to be part of the DMA network? Or maybe your DMA design is more cost-effective than how CPUs deal with I/O today?
The advantage of a barrel scheduler is that you can never have more than one pair of instructions in the pipeline from the same thread at a given time, so data hazards are impossible and all of the checking/forwarding logic can be entirely absent from the CPU.
You lose single-thread performance with this vs more conventional hyperthreading, but it's much simpler to implement.
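A toy model of the issue pattern (parameters are made up, not the thesis CPU's): each cycle the scheduler fetches from the next thread in strict round-robin order, so back-to-back pipeline slots never hold the same thread and no hazard between them is possible.

```python
from itertools import cycle

def barrel_issue_order(num_threads, cycles):
    """Return which thread context issues on each cycle under a strict
    round-robin barrel scheduler. Illustrative only."""
    threads = cycle(range(num_threads))
    return [next(threads) for _ in range(cycles)]

# With 4 thread contexts, adjacent issue slots always come from
# different threads, so back-to-back data hazards cannot occur.
issue = barrel_issue_order(4, 8)
```

The single-thread cost is visible here too: each thread only gets one issue slot every `num_threads` cycles.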
Sad choice of name, though. If you're anti-something, the biggest your ideas can reach is the size of what you are against.
One might, for instance, explore the "embedded systems" world and how one wants to distinguish oneself from it as well. I would say, embedded + everything that runs a kernel is a bigger scope already.
So, what do you want to stand for instead of against?
[Update] clearly it wasn't as simple as I thought, thanks!
If malicious code running in thread context 4 tries to reach the same physical address, the packet will come from 0x8004. Since the RAM controller knows that the page starting at 0x41414000 is owned by thread 0x8003, the request is denied.
The source address is added by the CPU silicon and isn't under software control, so it can't be overwritten or spoofed. If the attacker were somehow able to change the "current thread ID" registers, all register file and cache accesses would be directed to that of the new thread, and all memory of their evil self would be gone. Thus, all they did was trigger a context switch.
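A toy model of that flow (class and method names are mine; the addresses and thread IDs mirror the example above): the source field on a memory request is stamped from the hardware thread-context register, never taken from software, so the RAM controller's owner check can't be fooled.

```python
class MemoryBus:
    """Toy model of a CPU-stamped memory request hitting the RAM
    controller's owner check. Illustrative only."""

    def __init__(self):
        # page base address -> owning thread ID
        self.page_owner = {0x41414000: 0x8003}

    def request(self, current_thread, addr):
        # The source field comes from the hardware's current-thread
        # register; software never supplies it, so it can't be spoofed.
        packet = {"source": current_thread, "addr": addr}
        return self._ram_controller(packet)

    def _ram_controller(self, packet):
        page = packet["addr"] & ~0xFFF   # 4 KB page base
        return self.page_owner.get(page) == packet["source"]

bus = MemoryBus()
allowed = bus.request(0x8003, 0x41414123)   # owning thread: granted
denied = bus.request(0x8004, 0x41414123)    # other thread: denied
```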
> Antikernel is designed to ensure that following are not possible given that an attacker has gained unprivileged code execution within the context of a user-space application or service: ... or (iv) gain access to handles belonging to another process by any means.
Even hardware IP blocks don't have the ability to send packets from arbitrary addresses. You can send to anywhere you want, but you can't lie about who made the request. Which allows access control to be enforced at the receiving end.
That "something you are" as well as "something you know" factor prevents any capability from being spoofed, since the identity of the device initiating the request is part of the capability.
What prevents me, an attacker, from making (or simulating) a bunch of hardware with different identities? Perhaps by setting up my hardware to run through multiple different 'self' identities and then trying to talk to other hardware with each disguise.
If the address space is small and cheap, it seems that I could quickly map any decentralized hardware at low cost.
The source address is added to a packet by the on-chip router as it's received. You can send anything you want into a port, but you can't make it seem like it came from a different port without modifying the router.
This model allows you to integrate IP cores from semi-untrusted third parties, or run untrusted code on a CPU, without allowing impersonation.
Let me put it another way: I ran a SAT solver on the compiled gate-level netlist of the router and proved that it can't ever emit a packet with a source address other than the hard-wired address of the source port. Any packet coming from a peripheral into port X will have port X's address on it when it leaves the router, end of story.
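The property described can be modeled in a few lines (names and addresses are illustrative, not the actual netlist): whatever source field a sender writes into a packet, the router overwrites it with the hard-wired address of the ingress port.

```python
class Router:
    """Toy model of the on-chip router's source-address stamping.
    Illustrative only; the real property was proved on the netlist."""

    def __init__(self, port_addresses):
        self.port_addresses = port_addresses   # port index -> hard-wired address

    def receive(self, port, packet):
        # Overwrite any source the sender supplied with the ingress
        # port's hard-wired address; spoofing would require modifying
        # the router itself.
        return dict(packet, source=self.port_addresses[port])

router = Router({0: 0x8000, 1: 0x8001})
# A sender on port 1 tries to claim it is port 0:
evil = {"source": 0x8000, "dest": 0x4000, "payload": b"hi"}
out = router.receive(1, evil)
```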
My conjecture was that this would make up for most of the overhead, but I didn't have time during the thesis to optimize it sufficiently to do a fair comparison with existing architectures.
One question: how does this compare with microkernel OSes? People thought that was the future even before Linux was invented.