t0b1's comments | Hacker News

The bin packing will probably make it slower though, especially in the bool case since it will create dependency chains. For bools on x64, I don't think there's a better way than first having to get them into a register, shift them and then OR them into the result. The simple way creates a dependency chain of length 64 (which should also incur a ~64 cycle penalty), but with a tree-shaped reduction you might be able to get that down to 6 (more like 12 realistically) cycles. But then again, where do these 64 bools come from? There aren't that many registers, so you will have to reload them from the stack. Maybe the Rust ABI already packs bools in structs this tightly so it's work that has to be done anyway, but I don't know too much about it.
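A minimal sketch of what that naive chain looks like (illustrative scalar Rust, not what a compiler would literally emit):

    // Each OR depends on the previous value of `acc`, so the
    // dependency chain is as long as the number of bools.
    fn pack_naive(bools: &[bool; 64]) -> u64 {
        let mut acc = 0u64;
        for (i, &b) in bools.iter().enumerate() {
            acc |= (b as u64) << i; // serial dependency on `acc`
        }
        acc
    }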

And then the caller will have to unpack everything again. It might be easier to just teach the compiler to spill values into the result space on the stack (in cases where the IR doesn't already store the result after the computation), which would likely also perform better.


Unpacking bools is cheap - to move any bit into a flag is just a single 'test' instruction, which is as good as it gets if you have multiple bools (other than passing each in a separate flag, which is quite undesirable).

Doing the packing in a tree fashion to reduce latency is trivial, and store→load latency isn't free either depending on the microarchitecture (and at the counts where log2(n) latency becomes significant you'll be at the IPC limit anyway). Packing vs store should end up at roughly the same instruction counts too - a store vs an 'or', and the exact same amount of moving between flags and GPRs.
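A sketch of that tree-shaped packing, same caveats as above (scalar and illustrative; real codegen would keep the lanes in registers rather than an array):

    // Combine pairs, then pairs of pairs: still 63 ORs total, but
    // only log2(64) = 6 levels on the critical path instead of 64.
    fn pack_tree(bools: &[bool; 64]) -> u64 {
        // The 64 shift ops are independent of each other.
        let mut lanes = [0u64; 64];
        for (i, &b) in bools.iter().enumerate() {
            lanes[i] = (b as u64) << i;
        }
        // In-place pairwise reduction.
        let mut n = 64;
        while n > 1 {
            for i in 0..n / 2 {
                lanes[i] = lanes[2 * i] | lanes[2 * i + 1];
            }
            n /= 2;
        }
        lanes[0]
    }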

Reaching 64 bools might be a bit crazy, but 4-8 seems reasonably attainable from each of many arguments being an Option<T>, where the packing would reduce needed register/stack slot count by ~2.

Where possible it would of course make sense to pass values in separate registers instead of in one, but when the alternative is spilling to stack, packing is still worthy of consideration.


> Reaching 64 bools might be a bit crazy, but 4-8 seems reasonably attainable from each of many arguments being an Option<T>, where the packing would reduce needed register/stack slot count by ~2

I don't have a strong sense of how much more common owned `Option` types are than references, but it's worth noting that if `T` is a reference, `Option<T>` will just use a pointer and treat the null value as `None` under the hood to avoid needing any tag. There are probably other types where this is done as well (maybe `NonZero` integer types?)
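(And yes, the `NonZero` integer types do get it too. A quick check, runnable as-is:)

    use std::mem::size_of;
    use std::num::NonZeroU32;

    fn main() {
        // References can never be null, so None reuses the null pattern:
        assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
        // Same trick for NonZero integers, with zero as the niche:
        assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<NonZeroU32>());
        // Plain u32 has no spare bit pattern, so the tag costs space:
        assert!(size_of::<Option<u32>>() > size_of::<u32>());
    }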


Rust has a thing called the Guaranteed Niche Optimisation, which says that if you make a sum type with exactly one variant that carries no data, plus exactly one variant wrapping a type which has a niche (a bit pattern that isn't used by any valid representation of that type), then it promises that your type is the same size as the type with the niche in it.

That is, if you made your own Maybe type which works like Option, it's also guaranteed to get this optimisation, and the optimisation works for any type which the compiler knows has a "niche" - not just obvious things like references, small enumerations, or the NonZero types, but also e.g. OwnedFd, a type which wraps a Unix file descriptor. Unix file descriptors can never be -1, so logically the bit pattern for -1 serves as a niche for this type.
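For example, a home-grown Maybe over a reference is guaranteed to stay pointer-sized:

    use std::mem::size_of;

    #[allow(dead_code)]
    enum Maybe<T> {
        Nothing,  // exactly one variant with no data...
        Just(T),  // ...plus one variant wrapping a type with a niche
    }

    fn main() {
        assert_eq!(size_of::<Maybe<&u8>>(), size_of::<&u8>());
    }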

I really like this feature, and I want to use it more. There's good news and bad news. The good news is that although the Guaranteed Niche Optimisation is the only such guarantee, in practice the Rust compiler will do much more with a niche.

The bad news is that we're not allowed to make new types with their own niches (other than enumerations, which automatically get an appropriately sized niche) in stable Rust today. In fact, the ability to mark a niche is not only permanently unstable (thus usable in practice only from the Rust stdlib), it's a compiler-internal feature: they're pretty much telling you not to touch it, and it can't and won't get stabilized in this form.

But we do have a good number of useful niches in the standard library: all references, the NonNull pointers (if you use pointers for something), the NonZero types, the booleans, small C-style enumerations, OwnedFd... that's quite a lot of possibilities.

The main thing I want, and the reason I tried to make more movement happen (but I have done very little for about a year), is BalancedIx: a suite of types like NonZero, but missing the most negative value of the signed integers. You very rarely need -128 on an 8-bit signed integer, and it's kind of a nuisance, so BalancedI8 would lose -128, and in exchange Option<BalancedI8> is the same size as i8 and abs does what you expected - two for the price of one!
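To make the -128 wart concrete (this compiles today; BalancedI8 itself is of course hypothetical):

    fn main() {
        let x = i8::MIN; // -128
        assert_eq!(x.checked_abs(), None);     // overflow: +128 doesn't fit in i8
        assert_eq!(x.wrapping_abs(), i8::MIN); // abs wraps back to -128
    }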


Yeah, `NonZero*`, but also a type like `#[repr(u8)] enum Foo { X }`: according to `assert_eq!(std::mem::size_of::<Option<Foo>>(), std::mem::size_of::<Foo>())`, you need an enum which fully saturates the repr, e.g. `#[repr(u8)] enum Bar { X0, ..., X255 }` (pseudocode), before the niche optimization fails to kick in.
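For reference, that check made runnable:

    use std::mem::size_of;

    #[repr(u8)]
    #[allow(dead_code)]
    enum Foo {
        X, // only discriminant 0 is used; 1..=255 are all niches
    }

    fn main() {
        // None hides in an unused discriminant, so no extra byte:
        assert_eq!(size_of::<Option<Foo>>(), size_of::<Foo>());
        assert_eq!(size_of::<Foo>(), 1);
    }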


Oh, good to know!


Maybe people don't because they see, for example, Valve (a billion-dollar company) struggling to get GNOME to implement drm-leasing for VR headsets. IIRC they've been at it for multiple years, too.

Or maybe it's because the compositor developers are not exactly concerned about ease of development. To quote a GNOME dev[0] about support for the aforementioned drm-leasing protocol:

> I honestly don't have a problem with forcing clients to implement the portal if they want to work on mutter.

I wouldn't blame the people who choose to simply not engage with that process, especially those who work on these things in their free time.

[0]: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2759


> exotic things like take a screenshot

I'm not sure but is this meant ironically? Because taking a screenshot is a thing people do very often. Capturing your screen is, too. Even by third-party programs. And yet they never put it in the protocol but just said to go to dbus.

> GNOME, KDE, and wlroots have each implemented a different protocol

Is that true? My understanding was that GNOME never put forth a protocol but always did their dbus thing. I know that wlroots proposed a bunch of protocol extensions but those never went anywhere since, well, GNOME didn't want to implement them[1]. Nowadays I think every compositor(?) just implements the xdg-portal stuff.

[1]: https://gitlab.freedesktop.org/wayland/wayland/-/issues/32#n...
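For the curious, the portal call itself is a single D-Bus method. A rough sketch using the zbus crate (API details from memory, so treat it as illustrative; the real flow then waits for a Response signal on the returned request handle, omitted here):

    use std::collections::HashMap;
    use zbus::blocking::Connection;
    use zbus::zvariant::Value;

    fn main() -> zbus::Result<()> {
        let conn = Connection::session()?;
        let options: HashMap<&str, Value<'_>> = HashMap::new();
        // org.freedesktop.portal.Screenshot.Screenshot(parent_window, options)
        // returns an object path; the URI arrives later via a Response signal.
        let reply = conn.call_method(
            Some("org.freedesktop.portal.Desktop"),
            "/org/freedesktop/portal/desktop",
            Some("org.freedesktop.portal.Screenshot"),
            "Screenshot",
            &("", options), // empty parent window handle, empty options dict
        )?;
        println!("request handle: {:?}", reply);
        Ok(())
    }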


I think to the Wayland folks, anything other than using Windows is an exotic use case. They’re not entirely wrong, but if we wanted to use Windows, we … would be using Windows. The great thing about Linux was that we could do more than Windows or macOS permitted.


Weird take. Windows and macOS both let you do things Wayland doesn't currently have widely adopted protocols for. Only thing Xorg can do that any of the others can't is let a client application manage windows, and it's only macOS that really doesn't like that.


> I'm not sure but is this meant ironically? Because taking a screenshot is a thing people do very often.

Yes, I was intending to mock Wayland for failing, after 15 years of development, to have a single way of taking screenshots that works in all environments. I understand wanting to make things modular and flexible and make as much optional as possible[0], and I understand wanting to ship the core protocol first and then follow up with drafting other things and building consensus, but the fact that this thing was allowed to hit end-users with screenshots this broken - and that it still hasn't been fixed, and AIUI never will be fixed because GNOME seems to think that the situation doesn't need fixing - is a fairly damning indictment of Wayland as a whole.

> Is that true? My understanding was that GNOME never put forth a protocol but always did their dbus thing. I know that wlroots proposed a bunch of protocol extensions but those never went anywhere since, well, GNOME didn't want to implement them[1]. Nowadays I think every compositor(?) just implements the xdg-portal stuff.

I think it's true, though it's honestly hard to tell. Most of the screenshot tools I could find are using the wlroots protocol; KDE's official tool notes,

> Spectacle is a screenshot taking utility for the KDE desktop. Spectacle can also be used in non-KDE X11 desktop environments.

which doesn't really explicitly say much, and in fact the only tool I could find that claimed to be able to support everything was ksnip, which seems to work fine with wlroots but beyond that https://github.com/ksnip/ksnip#known-issues outlines the situation well enough; KDE is at least only temporarily broken, but GNOME isn't going to improve because GNOME did that on purpose. Now, that readme says you can use xdg-desktop-portal, but I have a GNOME+Wayland machine on hand, and I couldn't get it to actually work. I think what's supposed to happen is that every time I do a screenshot it prompts for permission, which I wanted to verify so I could complain that that was totally unreasonable, but what actually happens is that it just fails, which is... not better. Oh, and while searching for solutions to that I found flameshot, but that just refuses to even run. So... maybe someday the portal solution will work; in the meantime, I feel comfortable describing the situation as Wayland not having a uniform working way of taking screenshots.

[0] In particular, so we can avoid the situation from X11 where a load of drawing primitives are baked in that nobody has any use for anymore.


That's a very nice writeup, I like the style. The mix of text and illustrations/memes was really pleasant. I have my reservations about the RISC/CISC nomenclature but I guess that's "to each their own" >.>

As someone who has spent some time figuring out how parts of these kernels work, I can sympathize with the pain it probably was (but well worth it given the article imo).

For NT, I think that Windows Internals covers a lot of the stuff one wants to know, and Microsoft's documentation is also not bad (certainly better than Linux's kernel docs imo); it's a really good starting point.

For more info about Windows I can recommend gamehacking forums/resources. There's a lot of filtering needed but they are sometimes a pretty good source of info for niche things.

As a last note, I noticed that the font of some code blocks is pretty large when viewed on my smartphone, making them hard to read (e.g. Ch. 6/main.c).

P.S.: > If you are a teenager and you like computers and you are not already in the Hack Club Slack, you should join right now

Way to remind me that I'm getting old lol


How so? The bootloader already runs in long mode with UEFI and it also takes care of bringing up other cores so this is not really a problem in my experience.


UEFI doesn't really take care of bringing up the other cores. There's a way to get access to them, but you have to give them back to UEFI before exiting boot services, so that doesn't really help your kernel. You still need to do the SIPI into x86-16 code to really take control of the AP cores.

Even the new way (that isn't actually implemented AFAIK) just SIPIs into long mode code, it doesn't use the UEFI multicore stuff.


Linux still needs to use a Startup IPI to bring up other CPUs, which starts them running in 16-bit mode at an address under 1MB.

Having a fully 64-bit way to bring up other CPUs would simplify that.
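For flavor, the classic INIT-SIPI-SIPI dance on xAPIC looks roughly like this (hedged sketch in Rust; it assumes the LAPIC MMIO page is identity-mapped at the default base and a 16-bit trampoline was copied to physical 0x8000, and it omits the required delays and error checks):

    const LAPIC_BASE: usize = 0xFEE0_0000; // xAPIC default MMIO base
    const ICR_LOW: usize = LAPIC_BASE + 0x300;
    const ICR_HIGH: usize = LAPIC_BASE + 0x310;

    unsafe fn send_ipi(apic_id: u32, value: u32) {
        // Destination APIC ID goes in ICR high bits 24..31;
        // writing the low half triggers the IPI.
        core::ptr::write_volatile(ICR_HIGH as *mut u32, apic_id << 24);
        core::ptr::write_volatile(ICR_LOW as *mut u32, value);
    }

    unsafe fn start_ap(apic_id: u32) {
        send_ipi(apic_id, 0x0000_4500); // INIT, assert
        // ...wait ~10 ms...
        // SIPI with vector 0x08: the AP starts in 16-bit real mode
        // at physical address 0x08 << 12 = 0x8000.
        send_ipi(apic_id, 0x0000_4600 | 0x08);
        // ...wait, then send a second SIPI per the MP spec...
    }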


That reminds me of a somewhat funny story. I was listening to a presentation about a startup of about 10 people, as you mentioned, doing hospital digitalization (the gist of it seemed to be sending patient data from one doctor to another). They explained that they wanted a "robust" architecture, so they chose microservices (Kubernetes and all). I thought that was a bit odd for something hosted locally at a hospital, especially because they had like 7 or so services (and maybe even two databases, but I'm not sure on that one). Well, we later asked them how much data they were even handling and (after a bit of side-stepping) they said it was around 200 MB excluding images. That was my "but why?" moment. Apparently you need 7 services to have a web frontend for 200 MB of data, most of it probably never accessed, and hospitals can suddenly get three new floors overnight.


CERN at least seems to be using a solution that runs over Ethernet[1], but with custom hardware that is probably fairly expensive. They use a single time source and then measure the delay between each switch/node. Though this is limited by needing to be able to run a cable between each node, so idk how your definition of a distributed experiment fits.

[1]: http://white-rabbit.web.cern.ch/


Depends on the university - and course. I'm currently at TUM (CS) and you don't need to submit any homework assignments to be admitted to an exam, though there are some courses where part of or the complete grade is made up of homework assignments. Cheating is still a problem nonetheless, and from personal experience and from being involved with grading, it seems to have gotten bigger with online examinations (obv), closed or open book.

One interesting approach I saw one course take to handle online exams is an unsupervised open-book exam with some randomization in the problem statements, followed by inviting a random subset of students to an oral re-examination. Basically just a 10-20 minute conversation in which you should explain how you went about answering the questions. The idea seems to be that if you do that often enough, you will find the cheaters, who will be unable to explain how they derived their answers. I wonder if that's an approach that could work generally, and maybe even for homework assignments, though the effort becomes quite considerable then.

