Platform that enables Windows driver development in Rust (github.com/microsoft)
339 points by mustache_kimono on Sept 24, 2023 | 142 comments



For those who don't remember: before taking over his boss's job (and becoming a possible successor as the next MSFT CEO), Russinovich owned a software tools and NT kernel consulting company where Redmond sent their engineers to learn NT kernel dev. He also discovered Sony's DRM rootkit and Symantec's rootkit-like file protection, and caught Best Buy pirating ERD Commander.

I'm kind of kicking myself now that I nuked my LinkedIn in 2014, having 2 degrees of separation from the whole Valley and Mark as a 1st-degree connection. Oh well, there's freedom and agility in relative obscurity. What's that reggae song by Desmond Dekker about how it's pointless to talk about how much you did, made, or who you used to know? xD


> he owned a software tools and NT kernel consulting company where Redmond sent their engineers to learn NT kernel dev.

Apple apparently used to do something similar: internal Apple docs were nothing compared to Jonathan Levin's books[1], so Apple would routinely buy those for their OS engineers.

It's pretty interesting to me that companies of that size can't really do these things themselves, despite being in a much better position by having access to the actual source code. These third parties (or at least Levin) had to rely on reverse engineering to a large degree.

[1] https://newosxbook.com


It is a matter of priorities, and walls between business units.

Back in my Nokia days it also wasn't that easy to get access to this kind of information about Symbian and whatnot.


Also because there's no promo or bonus for writing good documentation. It's mostly a thankless task even for internal docs.


Yeah, a consultant can make more than an FTE just by spending a lot of time reading and writing lots of documentation. Especially if they can form a business around it.


How can you mention him and not mention Sysinternals? I rely on that suite to do my job every day! Sysinternals has had more of an impact than perhaps even Azure itself, considering how basically every company using Windows in the world uses at least one Sysinternals tool.


The Sysinternals tools should be part of any Windows installation. They are a must in mine, and are sorely missed whenever I need to fix/debug a friend's computer. Why else did MSFT acquihire them/Mark Russinovich?


You don't need to miss them. You can run them straight from live.sysinternals.com, and you can even mount it as a network folder (\\live.sysinternals.com\tools).

https://www.nextofwindows.com/tip-having-all-the-sysinternal...


He is a good coder, but having read his books I don't think he would be a good CEO.

My take from the first book was that he has an extremely black-and-white view of the world.


> My take from the first book was that he has an extremely black-and-white view of the world.

Things change with age. He may no longer be like that.


Apparently it's good for a comment on HN.


The song is Fu Man Chu, by Desmond Dekker and the ACES.


>and possible succession as the next MSFT CEO

wat


That's Live and Learn off of Double Dekker. It's good stuff.


Is he going to be the CEO? That would be fantastic. He was super technical and super knowledgeable about the Windows kernel. He was a true hacker.


I don't think I'd want a "true hacker" to be a CEO.

"true hacker" feels good when describing people at the top of engineering career paths like fellows, but for a CEO I'd rather have someone who's mix of decent tech skills and unparalleled business skills. I've had an opportunity to see companies where people didnt know how to do business and that was tragedy. That was way, way worse than non-technical manager trying to push solutions on ya.

How is he doing from a business perspective?


Why not? Why can't a true hacker be a CEO? Bill Gates was a hacker. He implemented Altair BASIC when computer languages were still in their infancy. Tim Sweeney was a hacker and the creator of the Unreal Engine. He was the CEO of Epic. Eric Schmidt was a hacker. He wrote Lex. He was CTO of Sun and later became CEO of Novell and Google. I would say Elon Musk was a hacker, as he was super technical.


As much as Paul Allen? Would the latter have been as good a CEO?

Assuming you can't be among the best of both, which is a natural if possibly wrong assumption, I think it's clear where Gates excelled and that was obviously great for Microsoft.


The job of a Chief Executive Officer is to run the company, which requires a distinctly different set of skills and talent from people who like getting their hands dirty at the workbench.


> the job of a Chief Executive Officer is to run the company

That's a very vague job description. An executive's job is to make investments that will maximize shareholder return, and an executive of an engineering company will only be able to make the best investments if they understand the technology they are selling and their customers. People have been noting how engineers are running more and more companies; that's because more and more companies rely on technology to drive their growth.


Why can't a hacker possess both skill sets?

By contrast, I can show you pure business people who failed at the CEO post. E.g., John Sculley ran Apple into the ground.


It depends a lot… Being ”a hacker” does not take away the required skills, even if he has spent his time on other things in the past.

For a software company, it could be useful if the CEO knows what software is; the company might find other business models than the trending ”enshittification” route.


What I’m missing in all this... there’s literally a role in the executive suite for the hacker-leader: CTO.


Absolutely, but we're talking specifically about the CEO role.


I agree he is a true hacker. I have all his books. But he is CTO of Azure, and we know what a s%#%$$t show that is...


The best kernel hacker is not the best manager for people - which the CEO very much is.


Are you saying that being the best kernel hacker categorically makes someone not the best manager?

Or are you saying that being the best kernel hacker does not necessarily make someone the best manager?

I disagree with the former but agree with the latter. But I also want to add that the best leader for a technology company like Microsoft is not just some guy with an MBA, but someone with both deep technical expertise and management/business experience.

The CEOs of Intel, Nvidia, and AMD are all engineers by training.


That's just quite presumptuous.


Backed by statistics. Although Google’s example shows that having a non-engineer-souled guy at the top is not a recipe for preventing enshittification and rot.


Are you saying that it is backed by statistics that good CEOs need to be good managers of people, and also implying that engineers are not good managers of people? I would be interested in seeing those statistics.

It feels wrong to me. In fact, I would argue that in tech the opposite is true. Go watch 'Downfall', the Netflix documentary about Boeing and what happened when they dropped their engineering ethos and let the "people persons" in.


Google is a textbook example of both enshittification and rot.


> ... having 2 degrees from the whole Valley and Mark as 1st.

I'm having trouble parsing this. What does this mean?


Degrees of separation in LinkedIn. 1st degree means you've connected with them, 2nd degree means you've connected with one of their connections.


Oooh okay that makes sense.

Thank you.


What kind of benefits do you think you would derive from being “connected” with him on LinkedIn?




Nice /s, it's not idiomatic:

    pub struct QueueContext {
        buffer: PVOID,
        length: usize,
        timer: wdf::Timer,
        current_request: WDFREQUEST,
        current_status: NTSTATUS,
        spin_lock: wdf::SpinLock,
    }


Almost all of those refer back to FFI integer types, because that's what the Windows driver API uses. The Linux kernel has similar issues when it comes to writing Rust drivers, and so does every other project interoperating with C.

Types like NTSTATUS have a significant amount of documentation attached to them. Unlike in C and C++, you can't just assign an integer value to an enum value. I suppose they could've generated a wrapper type and required you to cast every time, but then you'd get code that's 50% .into()s, and that would probably add overhead to your driver as well.

As far as I can tell, you never set or alter these values yourself, other than maybe the void* for the buffer; you only ever read from them. A match {} block should work just fine on NTSTATUS (see the sketch below). WDFREQUEST is a handle, so you can't really do anything with it other than maybe swap it out for another handle of the same type.

Timer and SpinLock have been converted into idiomatic Rust objects, wrapping unsafe library calls in their bodies. Other than that, I don't really see how you'd make this code idiomatic without adding a significant performance burden.
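
For example, a minimal sketch of that match-based NTSTATUS handling. The constant names are hypothetical and their exact paths in the generated bindings may differ, though the values are the documented Windows ones:

    // NTSTATUS is a 32-bit integer in the Windows headers. Hypothetical
    // constants; the generated bindings may expose them elsewhere.
    const STATUS_SUCCESS: i32 = 0x0000_0000;
    const STATUS_PENDING: i32 = 0x0000_0103;

    fn on_completion(status: i32) {
        match status {
            STATUS_SUCCESS => { /* complete the request */ }
            STATUS_PENDING => { /* I/O still in flight */ }
            _ => { /* treat anything else as an opaque failure code */ }
        }
    }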


I think it would have been nicer if this platform took an enum and did the integer conversion at the FFI level, no? That should all be elided by optimisations.


NTSTATUS is actually a combination of five types of information: https://learn.microsoft.com/en-us/openspecs/windows_protocol...

The first two bits indicate severity, the next bit identifies whether it was OS-generated or not, then there's an indicator for whether it can be turned into an HRESULT, then a bunch of bits indicating the source of the error, and lastly the error code, interpreted in light of the previously set bits.

The Windows status codes could be converted, but the status codes generated by other drivers can't be. Even then, the conversion would be far from free; you'd need to keep a close eye on your optimisations with every compiler version to keep it free.

That also ignores the possibility that programs can define their own status codes, so you'd need some kind of "unconvertible" variant to indicate that you can't reliably parse the facility ID and have to treat the value as a simple integer.
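
As a rough sketch, decoding those fields per the layout in the linked spec (simplified; real code would also consult the customer and reserved bits before trusting the facility):

    // NTSTATUS bit layout: bits 31-30 severity, bit 29 customer flag
    // (set for codes not defined by Microsoft), bit 28 reserved/N bit
    // (used in HRESULT mapping), bits 27-16 facility, bits 15-0 code.
    fn decode_ntstatus(status: u32) -> (u32, bool, u32, u32) {
        let severity = status >> 30;            // 0 = success .. 3 = error
        let customer = (status >> 29) & 1 == 1; // third-party defined?
        let facility = (status >> 16) & 0x0FFF; // source of the status
        let code = status & 0xFFFF;             // the code proper
        (severity, customer, facility, code)
    }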


> Types like NTSTATUS have a significant amount of documentation attached to it.

I don't think Rust developers would have problems with converting NtStatus to NTSTATUS for the sake of searching documentation.


These are automatically generated bindings. Someone could go over every typedef and correct the spelling, but I doubt any Rust developer has a problem with using a few capitals here and there.


could you clarify what you mean?



It doesn't use the Rust naming convention, and it imports Windows' PVOID (void pointer) type, when Rust's whole purpose is not to have unsafe pointers.


I think they would need to rewrite everything to get rid of that.


What were you expecting exactly?


You know what would be even better? Device drivers in a memory managed language like C#!


Microsoft is pushing more and more code into user space. The latest change is that they now insist printer manufacturers use UWP printer apps rather than drivers to add custom functionality to their printers; these apps are usually written in languages like C#.

I agree with Microsoft that the best approach is probably to run less manufacturer code in the kernel, not to make the kernel more easily accessible. Running a garbage collector in the kernel sounds near impossible to pull off without GC bugs, because C# isn't exactly suited for "I want to access some buffer but I can't because it hasn't been paged in and I'm in an interrupt handler".


> The latest change is that they now insist printer manufacturers use UWP printer apps rather than drivers to add custom functionality to their printers; these apps are usually written in languages like C#

It would be nice; however, .NET Native is deprecated, Native AOT doesn't do UWP, and WinDev is pretty bullish on C++.


The GP I was responding to called for drivers written in managed C#, so .NET Native/AOT/C++ are the thing they want to move away from. The example code Microsoft provides (https://learn.microsoft.com/en-us/windows-hardware/drivers/d...) is all plain and simple C# with XAML UI.

I suppose you could write these tools in VB.NET or F# if you wanted to as well, but the goal seems to be to avoid memory and security risks in the printer framework by forcing everything into the nice and safe managed environment. Of course, developers could always introduce vulnerable code into a managed environment if they wanted to, but most people don't intentionally make their programs more insecure, not even printer manufacturers.


C# is always managed, regardless of being AOT or JIT compiled.

GC and language runtime are still there.


I didn't realise that. Then again, I only experimented with AOT executables quite briefly. Perhaps Cosmos[1] stuck in my head more than what I read about AOT dotnet applications.

[1]: https://github.com/CosmosOS/Cosmos


>Native AOT doesn't do UWP

Is there a need for it? Printing isn't performance sensitive.


The UWP execution model, to start with: no JIT code allowed, and it's based on COM.


AIUI the package for the Windows Store contains MSIL, and then the Windows Store figures out what to do with it.


Depends; it can already be compiled to native code, and in the case of still having MSIL, it is compiled by the Windows Store before becoming available for distribution.

From the point of view of the consumer, it is all AOT-compiled code.


Native AOT is not dead; I don't know where you heard that: https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...


Read very carefully what I wrote.


Very helpful reply.


Would that be better, though? I have no experience with driver development, but my feeling is that device drivers, being low-level, fundamental building blocks of a working operating system, don't fit well with garbage collection interrupting execution on its own schedule.


Depends what you’re writing to be honest.


I’d prefer not to have my mouse stutter because the driver paused to run the garbage collector.


Mouse stutter has become more common in Windows. Every time I open a web page with a Spotify embed, the mouse freezes for ~0.5 to 1 second and then continues moving. I haven't found any reason for it; it happens with Firefox, Google Chrome, and Edge.

It's been repeatable ever since I bought a Ryzen 7950X at the beginning of the year, which is why I tried using system profiler tools to find the reason, but as I have little experience with this, it's difficult to find the root cause.


I had similar problems maybe 15 years ago, so I don't remember exactly how I solved it. In my case it was a NIC driver that filled up some queue when I downloaded "Linux ISOs" too fast over BitTorrent.

There are tools available to find the source of the problems, I think I used event tracing as a starting point.

https://learn.microsoft.com/en-us/windows-hardware/drivers/d...


Did you do any overclocking (especially RAM)? I've seen issues like this with either slightly unstable OC, or bad drivers.

I'd also suggest running https://www.resplendence.com/latencymon to see if it's a specific driver causing the stalls, or if it's random.


You can use UIforETW[0] to capture an event trace when the mouse freezes. You may need to install the Windows SDK or Visual Studio to interpret the results, though.

[0] https://github.com/google/UIforETW


Unplug and try a different mouse. Unplug and disconnect any other peripherals except the mouse and try that. Try a new user on Windows (user-space apps). Try a fresh install (system stuff). If you have spinning rust, try another hard drive (an app doing stupid blocking exclusive stuff with reads or writes can pause input).


I noticed it after I upgraded to a 3900X. Not sure if it's an AMD thing, a USB hub thing, or a many-core thing. But it had never happened before. (Microsoft Pro Intellimouse)


It's probably an AMD thing. I had the same with a 5800X until I switched to a PCIe USB card instead of the onboard ports.


Days of lag-free mice are well and truly over...

(Sitting here running wayland, where the mouse feels stuck in molasses!)


(Sitting here running Wayland as well, where the mouse feels just fine.)


I wish I had a problem with the mouse on Wayland, but it doesn't run for me at all.


(Sat here using KDE Plasma on Wayland, where the mouse lagged and glitched every time you opened a program.)

I think later versions fixed this one, though.


Which compositor are you using?

If it's Gnome, the very recent release 45 has improvements running the cursor in a separate thread.

I personally run KDE and agree that you can feel a difference between the X11 and Wayland session but it's not terrible.


Might be worth it to check that you have the "Adaptive" acceleration profile with a positive pointer acceleration value set in your mouse settings.


Check your display refresh rate. I had issues on Linux with it not supporting the maximum HDMI bandwidth properly so it drops the refresh rate to 30Hz which makes mice super laggy.


That's not a Wayland issue. It may be a specific Wayland bug that affects you, but the architecture of the system does not force any extra delays.


Imagine we are halfway through sending a frame to the display.

The cursor is in the bottom half of the screen.

If I move the cursor, that should be represented in the frame currently being output, not the next frame.

Wayland has no support for that, by design.

The end result is that there is always a delay of at least one frame.


There's a big difference between slow as molasses and one frame of 16 ms (which you sacrifice in exchange for no screen tearing). Unless you're really into high-rank competitive shooters, I doubt you can even tell that one frame of delay is happening.

Also, make sure you use the right config - there are actual issues with some compositors which are not Wayland architecture issues. For example: https://github.com/swaywm/sway/issues/4763


There are ways to solve that, and it's not an architectural flaw. You just need prediction in the compositor to adjust the location of the mouse.

But honestly, a frame of latency is ridiculously difficult to feel, and we're talking about half a frame of latency here on average. It's most likely something else, considering the entire end-to-end latency here is closer to 50-100 ms on most systems and still isn't what people would describe as laggy. Usually it's some kind of software bug/hiccup that keeps things from being rendered at a constant tick rate, or something getting hung and not processing events in a timely fashion.


As opposed to a stall induced by reference counting?

Don't allocate => don't need to collect.


It's not that hard to write no-alloc C# code. Happens in Unity dev all the time.


It's not, no. But it's also very easy to write allocating C# code, and people will if you give them the option. That won't be possible in Rust/C.


It's also very easy to write allocating C++ code, no?


This is pretty silly. You have the option to alloc and leak in C too. Rust also just guarantees memory safety, not freedom from leaks, right? In a competition of lazy code, I'd rather have GC pauses than OOM crashes.

But C# isn't ideal for a driver because of the weight of the runtime, not because it's hard to save on allocations.


It is correct that Rust does not solve (nor claim to solve) memory leaks.


Even so, 16.666 ms is a lot of time for just handling mouse input.


Attitudes like this are how we eventually end up with bloated messes like Electron. Performance matters, everywhere. Some of us increasingly also use monitors running at more than 60 hertz.


Yes. 640 usec should be enough for every mouse.


The CPU has other things to do too.


No matter what you do the CPU will always have other stuff to do.


So you agree...


Microsoft tried that already. It failed miserably. I don't think they gave a reason, but I suspect that garbage collection caused too many freezes on early-2000s hardware.


It didn't fail; it was sabotaged by Sinofsky and friends, who then carried on doing exactly the same ideas with COM and C++.

Guess why UWP applications are much slower than regular Win32: app sandboxing and COM reference counting.

But hey, they won.


> it was sabotaged by Sinofsky and friends

Is this described anywhere in detail?


Yes. Basically, when Longhorn started to fail to meet expectations, there were two possible ways forward: do what Google did with Android and push on no matter what to make it work, or force a rewrite into something else, which is what WinDev ended up doing.

"What Really Happened with Vista"

https://hackernoon.com/what-really-happened-with-vista-4ca7f...

"Turning to the past to power Windows’ future: An in-depth look at WinRT"

https://arstechnica.com/features/2012/10/windows-8-and-winrt...


macOS sandboxes apps, which are usually written in Swift or ObjC and thus use reference counting, and it doesn't suffer from the kind of slowness that plagues UWP. So your hypothesis appears to be incorrect.


Maybe I'm wrong but I suspect your parent comment was alluding to that sarcastically.


It is very difficult to write device drivers even in C++ while still maintaining the ability to write pageable kernel drivers. A driver must have very strict control over the availability of memory, because it will likely access memory at times when a page fault is not possible (i.e. during an interrupt handler). The C# language and runtime would have to add features to explicitly accommodate this, at a bare minimum.


Microsoft did have an experimental OS[1] written in a variant of C#, so it doesn't seem completely unrealistic.

[1] https://en.wikipedia.org/wiki/Midori_(operating_system)

My guess is the internal politics would be a bigger problem than any technical issues.


It's not unrealistic, but Midori made a lot of changes to C# to make it possible; it was almost a different language by the end of the project.


Learnings from Midori were partially incorporated into C#.


Starting in C# 7, and still being done in newer versions.


Here's an example [1] of a device driver in C#. It is possible. It is also not recommended.

[1] https://github.com/VollRagm/KernelSharp


I was thinking “this is some project probably written years ago, loading the .NET Framework runtime into kernel space”, but this is actually using Native AOT and the latest features/SDK.


C# doesn't help against data races.

Indeed, pretty much no mainstream language prevents data races at compile time the way Rust does.


Rust only helps in a specific data-race use case: when the memory belongs to the same process and is being accessed by threads.

Rust's type system doesn't help when the data resides in a shared memory segment accessed by multiple processes.


Rust helps against data races in a kernel, as shown by the Rust for Linux project and other kernel projects (Redox, etc).

Of course if you interact with foreign code (including code in other programs, but also code in the same program written in another language), you need unsafe. Or if you write a new synchronization primitive, etc. The unsafe is there to say "I manually checked and this is okay".

What's important is that with Rust, concurrent business logic - in a kernel, things like filesystems and drivers - shouldn't need any unsafe to synchronize correctly. You use unsafe for your lower-level infrastructure, but the code that actually does things can be data-race-free (and thus easier to modify and to assure it's working correctly). And that's incredible.
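
A tiny userland illustration of that split, with std's Mutex standing in for whatever locking primitive the kernel project provides - the unsafe lives inside the primitive, not in the logic that uses it:

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // All the unsafe is buried inside Mutex's implementation; this
        // "business logic" won't compile if it races on the counter.
        let counter = Arc::new(Mutex::new(0u64));
        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    *counter.lock().unwrap() += 1;
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(*counter.lock().unwrap(), 4);
    }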


Filesystems get modified by third parties.

Also, the right way to implement filesystems and drivers in modern OSes is in userspace, which is the route being taken by Apple, Google, and Microsoft across their OSes anyway.


> Rust type system doesn't help when the data resides in a shared memory segment accessed by multiple processes.

This isn't quite true. You can provide a safe abstraction that involves cross-process locking APIs. https://github.com/elast0ny/shared_memory/blob/HEAD/examples... is an example using a mutex guard.

Rust's type system helps more in some cases than others, but you can get at least some help from it almost all of the time.
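
As a sketch of the shape such an abstraction can take - illustrative types, not that crate's actual API - here is a lock word living alongside the data, with the data only reachable through a guard:

    use std::cell::UnsafeCell;
    use std::sync::atomic::{AtomicBool, Ordering};

    // Illustrative: real code would place this struct inside the shared
    // mapping and use a proper cross-process primitive (robust pthread
    // mutex, named Windows mutex, ...) instead of a spinlock.
    #[repr(C)]
    pub struct SharedCounter {
        locked: AtomicBool,
        value: UnsafeCell<u64>,
    }

    // Sound only because `value` is solely accessed under the lock.
    unsafe impl Sync for SharedCounter {}

    pub struct Guard<'a> {
        inner: &'a SharedCounter,
    }

    impl SharedCounter {
        // The data is only reachable via the guard, so cooperating
        // users cannot touch `value` without holding the lock.
        pub fn lock(&self) -> Guard<'_> {
            while self.locked.swap(true, Ordering::Acquire) {
                std::hint::spin_loop();
            }
            Guard { inner: self }
        }
    }

    impl Guard<'_> {
        pub fn bump(&mut self) {
            // OK: constructing a Guard implies holding the lock.
            unsafe { *self.inner.value.get() += 1 }
        }
    }

    impl Drop for Guard<'_> {
        fn drop(&mut self) {
            self.inner.locked.store(false, Ordering::Release);
        }
    }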


Which are worthless on OSes like UNIX, with advisory locking.


I'm not aware that C# has security or reliability advantages over Rust. It may be better integrated into the Microsoft ecosystem but from the point of view of a driver developer who has to support more than one OS, Rust would seem to be a good tradeoff between security, performance, and reliability vs. C/C++.


I'm a big fan of Rust (and a minor contributor), but one can't deny that a garbage collector makes most development much easier. Now, I'm not convinced that a GC makes much sense in driver development, but I could be wrong.


I wrote a bit about this. After years of having move semantics and borrow checking, I don't really agree. I find that GCs are way, way harder to reason about. They make it very hard to know when I'm sharing something or not, when something can be mutated, and when it will be copied. And then I still have to clean up other resources with a whole other, separate system.

At the end of the day I find languages like Java much harder to reason about and actually a bit harder to write.

https://insanitybit.github.io/2023/06/09/Java-GC-Rust
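
A toy illustration of the "who can mutate this" point: in Rust the signature alone tells you, whereas in a GC language any holder of the reference might.

    // The &mut makes the only mutation site explicit; a shared &Vec
    // handed out elsewhere cannot be changed behind our back.
    fn push_twice(v: &mut Vec<i32>, x: i32) {
        v.push(x);
        v.push(x);
    }

    fn main() {
        let mut v = vec![1, 2];
        push_twice(&mut v, 3);
        assert_eq!(v, [1, 2, 3, 3]);
    }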


You're not wrong, but I believe that we're talking about different properties.

Move semantics and the borrow checker shine when you don't want your data structure to be shared and when you want to control mutation. In some domains that brings you an unequaled level of safety, and Rust is the obvious choice.

Move semantics and the borrow checker slow you down when sharing and mutation are properties you don't care about (or at least not enough to prove them to the compiler). In such cases, I'd rather code in OCaml (or Haskell, or TypeScript, ...).

I hope that one day we'll be able to have the best of both worlds, but I don't see a path forward at the moment.


Not wrong. Kernels usually have their own memory management schemes, tied to the lifetime of I/O ops and so on. It would make more sense to somehow integrate that into the driver programming language.


Xerox PARC and ETHZ workstations were fully written in GC-enabled languages, including the device drivers.

Smalltalk, Interlisp-D, Mesa/Cedar, Oberon, Oberon-2, Active Oberon.

Source code is available for some of them; many home-made OSes in non-GC-enabled languages are far from meeting the capabilities of any of those systems.

"Eric Bier Demonstrates Cedar" - Computer History Museum

https://www.youtube.com/watch?v=z_dt7NG38V4

https://people.inf.ethz.ch/wirth/ProjectOberon/Sources/Kerne...


Very interesting, thanks!

Do these include subsystems with fairly high-performance requirements (GUIs, high-volume network services, etc.)?


In what concerns Xerox PARC, there's enough stuff on Bitsavers about how everything ran on top of those graphical workstations, in a distributed computing environment.

Likewise, the ETHZ IT department used several Oberon workstations during the late 1990s.


Thanks!


Don't those systems have substantially fewer capabilities? In particular: swap/paging, virtual memory, multi-processor support, and low-ish latency?

GC is easy if you don't have to worry about those problems.


The Xerox systems were more modern and demanding than UNIX, for their time.


For their time, sure.

But that doesn't help with today's devices. Even $10 boards are multicore now and come with hundreds of megs of RAM.

At least in garbage-collected OS design, the lessons from the 1970s and 1980s are interesting, but they do not directly apply to modern systems.


It makes some things easier, and some things harder or more complex.

So I can deny that most development becomes easier with GC - at least if you also intend to actually run the thing. Running Java means forever fighting the GC, needing both runtime tuning and development work to reduce the impact of inherent GC problems.


You are right that I was talking about delivering a first working version, not about tuning performance. It could be argued that optimizing memory (de)allocation so early in development is a case of premature optimization. Sadly, I don't know a (good) path from GC-based development to safe manual memory management, so I'll keep that counter-argument for the day I see such a path :)


Yeah, the problem with that kind of thing is that GC languages disincentivize thinking about ownership, and tacking ownership on after the fact is basically a rewrite.


Now that Haskell has linear types, perhaps we'll see some new ideas emerge?


But that's a Java issue: Java relies on heap objects way too much. C# and the CLR behind it have much better support for value types and stack allocation, where some optimization can even be done by the JIT compiler.


While I don't have much experience with C#, and I agree that Java is the worst, this at least applies to Go as well, though to a lesser degree.


Rust is memory-managed; it just doesn't have a garbage collector.


It's a static garbage collector


Or a garbage denier


Many common device classes support user-mode drivers. I think that allows writing them in memory-managed languages.


Yeah, this is one of the reasons why I think a microkernel would be excellent for today's computing.


C# is my main language, but this is one of the few cases where it's not the best choice.


How would this work? And why would it be a better idea?


> How would this work?

Don't know exactly what you're asking.

> And why would it be a better idea?

Poorly written device drivers are a significant attack vector. It's one of the reasons Linux is now exploring using Rust for its own device drivers.[0] You may be asking -- why Rust and not some other language? Rust has many of the performance and interoperability advantages of C and C++, but as noted, makes certain classes of memory safety issues impossible. Rust also has significant mindshare among systems programming communities.

[0]: https://rust-for-linux.com


Yeah, okay, but device drivers in C#? That sounds very impractical when you could pick any other language. C, C++, Rust, D, Zig, Objective-C all seem more suitable than C#. I agree with all that stuff you said about Rust; I was asking how drivers in C# would work.


Like this, for a possible example,

https://github.com/WildernessLabs/Meadow.Foundation


Look into the Singularity project.



