FreeBSD replaces bubblesort with mergesort on SYSINITs (twitter.com/cperciva)
339 points by gslin on Aug 21, 2023 | 297 comments



To sum up what I learned from the comments:

Is FreeBSD stupid for having used bubblesort to begin with? NO. It worked well for decades before surfacing as a problem in this extreme and unanticipated use case.

Is optimizing this a waste of time? NO. The use case revolves around super frequent bootups to power lambda and for this type of thing every ms matters.


yes, it was stupid to use bubble sort to begin with; there is never a good reason to use bubble sort

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 1; j < n - i; j++) {
            if (a[j-1] > a[j]) {
                item_t tmp = a[j];
                a[j] = a[j-1];
                a[j-1] = tmp;
            }
        }
    }
when you could use insertion sort

    for (size_t i = 1; i < n; i++) {
        for (size_t j = i; j > 0; j--) {
            item_t tmp = a[j];
            if (tmp > a[j-1]) break;
            a[j] = a[j-1];
            a[j-1] = tmp;
        }
    }
which is no more complicated (both are 17 amd64 instructions with -Os), but twice as fast in the random case, and enormously faster in the sorted and almost-sorted cases

(above code only minimally tested: https://godbolt.org/z/43dqz9Gq8)

this is not much of a criticism of freebsd; if you polished your codebase until there was nothing stupid in it, you'd never ship anything. freebsd's code quality is quite high, but probably we can all agree that this was stupid

of course it wouldn't be nearly as fast as mergesort on large inputs, but mergesort is more complicated

(also mergesort is slower on small inputs than insertion sort or bubble sort, but in the case where you're only sorting something once at startup time, that probably isn't important. it might matter if you were sorting a million lists of ten items or something)


> there is never a good reason to use bubble sort

I used to believe this but then a friend pointed out a real use case for bubble sort: when you have an array that is always mostly sorted, and real-time performance is more important than perfectly correct ordering. Then running one pass of bubble sort each cycle around the main loop is somewhat satisfyingly ideal.

One pass of insertion sort would be too expensive, and bubble sort is guaranteed to make incremental progress each pass.
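
A minimal sketch of that pattern (`one_bubble_pass` is a hypothetical name; `item_t` as in the snippets elsewhere in the thread):

```c
#include <stdbool.h>
#include <stddef.h>

typedef int item_t;

/* One forward pass of bubble sort: swap adjacent out-of-order pairs.
 * Returns true if no swaps were needed, i.e. the array is sorted,
 * so the caller can skip further passes. */
static bool one_bubble_pass(item_t *a, size_t n) {
    bool sorted = true;
    for (size_t j = 1; j < n; j++) {
        if (a[j - 1] > a[j]) {
            item_t tmp = a[j];
            a[j] = a[j - 1];
            a[j - 1] = tmp;
            sorted = false;
        }
    }
    return sorted;
}
```

Each trip around the real-time main loop calls this once; the array drifts toward sorted order at a bounded, predictable per-frame cost.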

This is sort of like how a linked list is never the right solution... unless it's one of those cases where intrusive linked lists are ideal. Except while intrusive linked lists intrude in space, intrusive bubble sort kind of intrudes in time.


I remember a website that "raced" different sorting algorithms visually, but I don't remember which website, or which oddball algorithm stood out to me as being especially good for "continuous partial ordering".

Most sorts are designed to "really sort something". However, I'm very interested in, e.g., ranking preferences of movies, and in cases like that there may never be "the one true ordering" ... but the mechanism for getting things "closer to sorted" is incredibly close to the mechanism you mentioned: multiple passes, and continual, incremental progress.

My imaginary GUI for interactive sorting in this manner is something like starting with a few "A vs. B" of a generally accepted "top 10/100 vs generic v pretty bad", and then morph to a "drag this new movie in between a few other movies either above, between, or below"

There are sorting algorithms for computers (eg: compare + swap), but fewer sorting algorithms for humans (eg: bucket sort / postman's sort - https://en.wikipedia.org/wiki/Bucket_sort#Postman's_sort ) which can take advantage of "not just compare and swap" to get things "close to correct" quicker.
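
For illustration, the degenerate form of the bucket/postman's idea (one bucket per key value, i.e. a counting sort) needs no pairwise comparisons at all. A sketch over a small key range, with names of my own choosing:

```c
#include <stddef.h>

/* Degenerate bucket sort for keys in [0, NKEYS): tally each item into
 * its bucket in one pass, then read the buckets back out in order.
 * No compare-and-swap anywhere. */
#define NKEYS 10

static void bucket_sort(unsigned *a, size_t n) {
    size_t count[NKEYS] = {0};
    for (size_t i = 0; i < n; i++)
        count[a[i]]++;                  /* drop each item in its bucket */
    size_t k = 0;
    for (unsigned key = 0; key < NKEYS; key++)
        while (count[key] > 0) {        /* emit buckets smallest-first */
            a[k++] = key;
            count[key]--;
        }
}
```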


Have you read Judea Pearl's paper on Theoretical Bounds on the Complexity of Inexact Computations? It's an old paper from 1976. The abstract is as follows:

Abstract: This paper considers the reduction in algorithmic complexity that can be achieved by permitting approximate answers to computational problems. It is shown that Shannon's rate-distortion function could, under quite general conditions, provide lower bounds on the mean complexity of inexact computations. As practical examples of this approach, we show that partial sorting of N items, insisting on matching any nonzero fraction of the terms with their correct successors, requires O(N log N) comparisons. On the other hand, partial sorting in linear time is feasible (and necessary) if one permits any finite fraction of pairs to remain out of order. It is also shown that any error tolerance below 50 percent can neither reduce the state complexity of binary N-sequences from the zero-error value of O(N) nor reduce the combinational complexity of N-variable Boolean functions from the zero-error level of O(2^N/N).


Never in a million years, but that's exactly the angle I'm approaching it from. I'll look it up, but intuitively it sounds similar to what I'm thinking. "Linear-time Sort-ish..." and I guess the "extra" work is performing the comparison for the new item to insert and validating the comparison (implicitly) by confirming the order of the "extra" items you're compared against.

When humans are tasked with the comparison, it doesn't have to be strictly pairwise and can instead devolve (kind of) into the various kinds of "sorting networks" that minimize compare-swaps given a limited subset of the list.

Thanks for the reference!


maybe you could use a multi-pivot quicksort or an insertion sort where you drag and drop each new item into any position in an already-sorted subset


(upon further investigation that's what the cited "bucket sort" is actually)


good point, i never thought of that

i feel like one pass of insertion sort (inserting one item into the sorted run) or selection sort (selecting one minimum or maximum from the unsorted run) would work tho


I would trust myself more to implement bubble sort than to implement insertion sort. Not that I couldn't implement insertion sort, I've done it before, but if I was in a context where I had to implement a quick and dirty ad-hoc sorting algorithm which I thought would only ever have to sort a few items, I would probably go with bubble sort.

If I suspected the algorithm would have to sort a decent number of items, I wouldn't consider using an O(n^2) sort at all and would take the time to find a good implementation of (or implement) one of the O(n log n) algorithms.


my subjective experience writing the above comment as documented in https://news.ycombinator.com/item?id=37206998 may be a relevant data point; note that the insertion-sort code is subtly buggy, and the bubble-sort code isn't


Pratt–Shell is only mildly more complex than insertion sort and gives a good worst-case performance of O(n log(n)^2). It may have a place sometimes.
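
A sketch of what that looks like (Shellsort driven by Pratt's gap sequence, i.e. every product 2^p * 3^q below n; only lightly tested):

```c
#include <stddef.h>

typedef int item_t;

/* Shellsort with Pratt's gap sequence, applied from largest gap to
 * smallest.  This is the variant with the O(n log^2 n) worst case. */
static void pratt_shell_sort(item_t *a, size_t n) {
    size_t gaps[512];                     /* plenty for any realistic n */
    size_t ngaps = 0;
    for (size_t p2 = 1; p2 < n; p2 *= 2)  /* enumerate 3-smooth numbers */
        for (size_t g = p2; g < n; g *= 3)
            gaps[ngaps++] = g;
    for (size_t i = 0; i < ngaps; i++)    /* order the gaps descending */
        for (size_t j = i + 1; j < ngaps; j++)
            if (gaps[j] > gaps[i]) {
                size_t t = gaps[i]; gaps[i] = gaps[j]; gaps[j] = t;
            }
    for (size_t k = 0; k < ngaps; k++) {  /* one gapped insertion sort per gap */
        size_t gap = gaps[k];
        for (size_t i = gap; i < n; i++) {
            item_t tmp = a[i];
            size_t j = i;
            while (j >= gap && a[j - gap] > tmp) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = tmp;
        }
    }
}
```

The final pass (gap 1) is plain insertion sort, by which point the earlier passes have left very little disorder.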


Personally I find insertion sort easier to reason about once you've internalized the idea of a memmove.
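
That framing can be made literal: each iteration finds the slot, memmoves the gap open, and drops the item in. A sketch (`item_t` as in the snippets above):

```c
#include <stddef.h>
#include <string.h>

typedef int item_t;

/* Insertion sort phrased as "find the slot, memmove the gap open". */
static void insertion_sort_memmove(item_t *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        item_t tmp = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > tmp)   /* locate tmp's slot in a[0..i) */
            j--;
        memmove(&a[j + 1], &a[j], (i - j) * sizeof a[0]);
        a[j] = tmp;
    }
}
```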


To me, it seems like bubble sort is almost inherently easier to reason about.

One iteration of bubble sort is: find two elements in the wrong order and swap them.

One iteration of insertion sort is: find an element out of order, then find the location in the array where it should've been, then move all the items between those two slots one slot ahead, then insert the out-of-order element into the now free slot where it belongs.

Admittedly, insertion sort is pretty intuitive, since it reflects more or less how a human would instinctively try to sort a list; take an out-of-order element and move it to where it belongs. But when translated to code, I think it's very hard to beat "find two items in the wrong order and swap them".


A lot of people look at sorting algorithms with the mental model of a 1970s-era PDP-8. They analyze scalar instructions as if they are being executed sequentially.

Once you admit superscalar execution (multiple load/store pipes), out-of-order execution, and caches into the picture (as well as the possibility of almost-sorted arrays as input), the picture is never as simple.

That said, I’m curious whether insertion sort is categorically better than bubble sort. I thought they might have been equivalent for the almost sorted case. I’ve heard both sides argued on this thread.


the vast majority of computers still aren't superscalar, much less ooo; for every i7 there are 100 cortex-m0+s, which don't even have caches. gpus aren't superscalar or ooo either, just heavily multithreaded, simt, and simd

even if insertion sort reliably beats bubble sort on your ssd ftl chip and your wristwatch, though, it seems like there are cases on modern high-performance hardware where insertion sort is worse than bubble sort (cf. https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubb...) which is very surprising to me


Per the wiki:

> Adaptive, i.e., efficient for data sets that are already substantially sorted: the time complexity is O(kn) when each element in the input is no more than k places away from its sorted position

In fact it’s often a sub-sort of hybrid sorts, like timsort:

> If a run is smaller than this minimum run size, insertion sort is used to add more elements to the run until the minimum run size is reached.


i think that adaptive time complexity bound holds for both bubble sort with early exit and insertion sort


I trust myself most to implement selection sort; it's the most intuitive to me: just keep finding the next smallest element in the array and placing it in its position.

I can never remember the loop bounds of bubble sort, always have to look it up.
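
For reference, the selection-sort loop bounds really are hard to get wrong (a sketch, in the style of the other snippets in this thread):

```c
#include <stddef.h>

typedef int item_t;

/* Selection sort: repeatedly find the smallest remaining element
 * and swap it into the next position. */
static void selection_sort(item_t *a, size_t n) {
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;
        for (size_t j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        if (min != i) {                  /* one swap per outer pass */
            item_t tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }
}
```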


I just remember something from basic math. Exponential curve < linear for small values.

That said, a lot of the time I have small arrays of objects. Instead of keeping them sorted I just grind through them to find the next largest value. That tends to be hard to get wrong.


Exponential can be better than linear for small n depending on your constant factors, but it's never a given. E.g. `2n` < `10^n + 5` for all n.


If you find yourself having to re-implement a sorting algorithm ad-hoc, you are probably using the wrong language. (Especially these days.)


i think it's pretty common for a specialized reimplementation of a standard library type or function to outperform it by an order of magnitude or so in some particular case. chris wellons gave an excellent example of this with hashing in golang in https://nullprogram.com/blog/2023/06/26/

commonly (as in wellons's post about the two-sum problem) there's a variation on the standard algorithm that can help significantly, but which the standard library function doesn't have an option for

also i think the cost of reimplementing simple 'algorithms from the book' like insertion sort or even quicksort is pretty low; as you saw in my comment above, insertion sort is six lines of code and only moderately bug-prone. we aren't talking about fibonacci heaps or red-black trees or fm-indexes or rader's algorithm or something. the cost is low enough that it's easy to justify in a lot of cases


What language should they have used instead of C?

And given that C's stdlib has a quicksort function, how would using a different language have helped?


Back then C might have been the best bet.


...or you're implementing a kernel and don't have full access to a libc.


Who says you have to use libc?

Eg Rust has no-std crates that you can use even if you can't use the standard library.


I think we have different definitions of stupid - the original implementation was perfectly fine for the time and requirements it was written for.


But it could have been better for no additional effort. Doing something worse when the better option is just as easy suggests incompetence.


Yeah, but it was working.

And maybe it wasn't 'no additional effort' because they were more confident with quickly writing a bubble sort.


Doing something better when you already have something that works suggests a lack of time management xP


At some point there was no sort, and then somebody implemented the sort. That's the point in time that I'm talking about.


> But it could have been better for no additional effort.

Not if it was already written.

I don't know anything about this particular codebase, but one thing I have learned over the years is that when you are tempted to say "such and such decision was stupid" in a production codebase, you are almost always missing something.


Never underestimate incompetence. It usually succeeds despite itself.

It also could have been apathy: knowing there was a better solution and just not caring what was implemented. I feel like this is more common than incompetence, especially in larger projects or organizations. Nature tends to take the path of least resistance.


i wouldn't go so far as to say incompetence

even the best of us do stupid things sometimes


Then the requirements were terrible?


I don't think the 90's developers could have foreseen that 2ms on boot would be huge for cloud serverless compute like Lambda.


True, but a lesson we frequently have to learn over and over again is that once the code is in software in widespread use, the only valid values for collection sizes are 0 and N -> ∞.


> the only valid values for collection sizes are 0 and N -> ∞.

Of course there are valid values below that...

If they were designing to a collection size of N -> ∞ then they shouldn't have used Insertion Sort either then, because it is O(n²) in the worst case.


True, but make that valid value at least something like 1 million.

It's trivial to start with something that takes human input, and then someone, somewhere decides to plug the thing into something else that generates that input.

And we all know that if computers are good at something, it's doing a lot of small things millions of times.


On an OS whose only gimmick is that it's fast and stable? Idk, maybe they could've.

Besides, there's little reason for any codebase of any kind to ever contain bubble sort (except for an academic one, showing it as an example of a bad algorithm). We're talking 20 years ago, not 2000 years ago; the basic sorting algorithms were pretty well researched and established by then.


For reference, the original code was a bit more complicated than just sorting integers, which may also have been a reason for the choice.

        /*
         * Perform a bubble sort of the system initialization objects by
         * their subsystem (primary key) and order (secondary key).
         *
         * Since some things care about execution order, this is the
         * operation which ensures continued function.
         */
        for( sipp = (struct sysinit **)sysinit_set.ls_items; *sipp; sipp++) {
                for( xipp = sipp + 1; *xipp; xipp++) {
                        if( (*sipp)->subsystem < (*xipp)->subsystem ||
                            ( (*sipp)->subsystem == (*xipp)->subsystem &&
                              (*sipp)->order < (*xipp)->order))
                                continue;       /* skip*/
                        save = *sipp;
                        *sipp = *xipp;
                        *xipp = save;
                }
        }

https://github.com/freebsd/freebsd-src/blob/2b14f991e64ebe31...


this is actually selection sort; the comment is wrong


That is a pretty interesting observation; I don't think I would have spotted that, especially given that everyone has been saying it was bubblesort.


i probably wouldn't have spotted it without trying to figure out which variant of bubblesort it was: counting up? counting down? does it adjust the inner loop size? early exit?

selection sort has the potential advantage for things that are larger than your key that it can avoid copying the entire item, just doing a single swap at the end of the inner loop, but it's not implemented that way here. that would have looked like this

    size_t m = 0;
    register unsigned int subsystem = (*sipp)->subsystem;
    register unsigned int order = (*sipp)->order;
    for( xipp = sipp + 1; *xipp; xipp++) {
        register unsigned int newsubsystem = (*xipp)->subsystem;
        register unsigned int neworder = (*xipp)->order;
        if( subsystem < newsubsystem ||
            (subsystem == newsubsystem && order < neworder))
                  continue;       /* skip*/
        m = xipp - sipp;
        subsystem = newsubsystem;
        order = neworder;  /* enjoy the sorting */
    }

    save = sipp[0];
    sipp[0] = sipp[m];
    sipp[m] = save;
on the other hand if you were going to put effort into optimizing this you probably would have used mergesort or something


I guess the bigger problem is using a language like C that's almost incapable of providing useful libraries. So everyone is re-implementing everything from scratch all the time.


qsort is part of the ISO C90 standard though.
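
For completeness, using it is a one-liner plus a comparator (all standard C):

```c
#include <stdlib.h>

/* Comparator for qsort; written as a difference of comparisons rather
 * than a subtraction so it can't overflow on extreme values. */
static int cmp_int(const void *pa, const void *pb) {
    int a = *(const int *)pa, b = *(const int *)pb;
    return (a > b) - (a < b);
}

/* usage: qsort(arr, n, sizeof arr[0], cmp_int); */
```

Note that qsort makes no stability guarantee, which matters for SYSINIT-style primary/secondary keys unless both keys go into the comparator.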


When this code was written there was no "amd64" though, so you would have to compare them on the original/typical hardware from then?


> there is never a good reason to use bubble sort

Not true. Bubble sort is excellent for sorting very small arrays.

See "Hoare’s Rebuttal and Bubble Sort’s Comeback (2020)" under "Fallback for sorting short arrays" and the performance results presented therein [0]. Discussion [1].

[0]: https://blog.reverberate.org/2020/05/29/hoares-rebuttal-bubb...

[1]: https://news.ycombinator.com/item?id=23363165


If you find yourself sorting small arrays, may I suggest a sorting network? https://en.m.wikipedia.org/wiki/Sorting_network
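
For example, the optimal five-comparator network for four elements runs the same fixed comparator sequence no matter what the data looks like (a sketch; compilers will often turn the swaps into conditional moves):

```c
typedef int item_t;

/* Compare-exchange: put the pair (x, y) in ascending order. */
static void cswap(item_t *x, item_t *y) {
    if (*x > *y) {
        item_t t = *x;
        *x = *y;
        *y = t;
    }
}

/* Optimal 5-comparator sorting network for 4 elements: the comparator
 * sequence is fixed and data-independent. */
static void sort4(item_t a[4]) {
    cswap(&a[0], &a[1]);
    cswap(&a[2], &a[3]);
    cswap(&a[0], &a[2]);
    cswap(&a[1], &a[3]);
    cswap(&a[1], &a[2]);
}
```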


remarkable, thank you

all of the bubble sorts i tried compiling in this thread ended up being compiled to code with either conditional branches or (on arm thumb-2) a couple of conditionalized stores. but conditional stores don't require a pipeline flush, and it is of course perfectly correct that insertion sort depends on its inner loop having an unpredictable branch to get better performance on almost-sorted data, which bubble sort does not


after thinking about it some more, i was skeptical that this really implements bubble sort, but after trying it, it does

python version of the inner loop

    max_ = arr[0]
    for j in range(1, i):
        y = arr[j]
        arr[j - 1] = min(max_, y)
        max_ = max(max_, y)

    arr[i - 1] = max_


Neither of your code snippets is working, or is an actual implementation of bubble sort.

Since when is insertion sort faster than bubble sort on sorted data?


i tested both of them on some input data and they both sorted it correctly, though of course they could contain bugs (if so i'd be grateful if you pointed them out)

admittedly i probably added the godbolt link showing this after you wrote your comment

perhaps you can elaborate; i couldn't understand your last sentence or why you think the bubblesort function above doesn't bubble-sort


Because it misses a crucial step: early exit if there were no swaps. That way it would be extremely efficient in the case of already sorted data.

You should probably check wikipedia page. Here: https://en.wikipedia.org/wiki/Bubble_sort.


i think i wrote that page originally 22 years ago, before the switch from usemodwiki to mediawiki; while i've often been referred to wikipedia to correct my errors, i think this is the first time i've been referred to a wikipedia page that i wrote in the first place :)

it's true that, if your data is exactly sorted, or has a small number of items that are far too early (e.g., 43 3 4 5 6 7 8 9 10), you can modify bubble sort to notice this and exit early, at the expense of making it more complicated and even slower in the average case

however, this doesn't help if instead you have a small number of items that are far too late (e.g., 4 5 6 7 8 9 10 43 3), in which case bubble sort takes the same number of passes as it normally would, while insertion sort takes linear time. of course you can complicate bubble sort further into "shaker sort" to handle this, which makes it even more complicated but at least doesn't slow it down any further
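
for concreteness, shaker sort looks something like this (a sketch, only minimally tested): alternating forward and backward bubble passes, so stragglers move quickly in both directions

```c
#include <stdbool.h>
#include <stddef.h>

typedef int item_t;

static void swap_items(item_t *x, item_t *y) {
    item_t t = *x;
    *x = *y;
    *y = t;
}

/* cocktail shaker sort: bubble passes in alternating directions,
 * exiting early once a full round makes no swaps */
static void shaker_sort(item_t *a, size_t n) {
    size_t lo = 0, hi = n;
    bool swapped = true;
    while (swapped && lo + 1 < hi) {
        swapped = false;
        for (size_t j = lo + 1; j < hi; j++)   /* forward pass */
            if (a[j - 1] > a[j]) {
                swap_items(&a[j - 1], &a[j]);
                swapped = true;
            }
        hi--;                                  /* max is now in place */
        for (size_t j = hi - 1; j > lo; j--)   /* backward pass */
            if (a[j - 1] > a[j]) {
                swap_items(&a[j - 1], &a[j]);
                swapped = true;
            }
        lo++;                                  /* min is now in place */
    }
}
```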

but all of this is sort of futile because it's still half as fast as insertion sort in the case of unsorted input, and even without the tweaks, no simpler



well, that's just the first mediawiki revision of the page; i was working on it earlier than that, which is why i was editing the subpage link syntax in that version, but the usemodwiki history has been lost

someone else might have written the original version, i can't remember any more, but i think it was probably me


I do not have access to the historic definition of bubble sort, but all the modern implementations I've seen so far have a check for whether there were no swaps. I absolutely cannot wrap my mind around how exactly that would make it slower in the average case (I'm almost certain it won't), or how exactly it would make the code significantly more complex to any meaningful extent.


we're talking about six lines of code if we don't count the curly braces (which can be omitted)

bubble sort with early exit is

    for (size_t i = 0; i < n; i++) {
        bool sorted = 1;
        for (size_t j = 1; j < n - i; j++) {
            if (a[j-1] > a[j]) {
                item_t tmp = a[j];
                a[j] = a[j-1];
                a[j-1] = tmp;
                sorted = 0;
            }
        }
        if (sorted) break;
    }
which is three more lines of code, which is in some crude sense a complexity increase of 50% over both the simple bubble sort in my earlier comment and over insertion sort

if you compile it https://godbolt.org/z/jdzov8cPn you can see that the inner loop goes from 10 instructions to 11. that doesn't guarantee that it's 9% slower (that depends a lot on the platform, and in particular on a big superscalar cpu it probably just uses a little more power but the same number of clocks) but it does tend in that direction. of course occasionally it will save you a pass or two over the array, which will tend to compensate, but it won't usually save you 9% of the passes

if some item is in position n and needs to be in position m, where m < n, it will require n - m passes of bubble sort to get there. a little thought suffices to show that, for randomly shuffled input, where m = 0 the item has to move about half the size of the array, where m = 1 it has to move about half the size of the array minus one half, where m = 2 it has to move about half the size of the array minus one; but it's the maximum of these three numbers which determines the number of passes after which all three of those positions will have the correct item, and that maximum is heavily skewed to be almost the entire size of the array. the chance that none of those three items is in the last 25% of the array originally is only .75^3 ≈ 42%. and as the value of three increases, it gets worse. so in general early exit doesn't buy you much

as for modern implementations of bubble sort, they're all either didactic classroom exercises like this one or careless mistakes like the original freebsd code; because insertion sort always beats it, there aren't like a community of bubble sort implementors who have annual bubble sort conferences. so you can expect some variation in precise terminology and shouldn't worry about it

(fwiw clrs 3ed. gives bubblesort without an early termination test, as i did; their definition is in exercise 2-2 on p. 40. but they're counting in the opposite direction, so everything i said above about items moving forward quickly and backward slowly is reversed for the clrs version of bubblesort)


Your original claim that it is more complex or slower is unsubstantiated. Yes, it is trivially more complex to add a flag, but that does not actually make it any more difficult to understand or error-prone. Your argument is way too pedantic. The claim about performance is equally unsubstantiated. To make a proper proof you would need to show that, on average, all N outer passes still execute before we hit the no-swaps situation, which is almost certainly never true, especially with non-random data.

Although I certainly agree that bubble sort is almost always a bad choice, it is still used in computer graphics, or in other situations where you have only one or two misplaced pairs of elements. In this case it will outperform the other algorithms in the average case while maintaining functionality in the less-than-ideal situation.

Besides, it is entirely irrelevant what purposes the algorithm is used for today (didactic, deliberate, or poor judgement); the de facto standard way of doing it today is to check for the early exit. Your argument about poor performance on already sorted data is a straw man that you are stubbornly defending. Well, OK, if you believe so...


if you aren't convinced that adding the early exit generally hurts performance, well, it's not surprising; measuring performance is difficult and tricky, and you're faced here with claims you don't really have the knowledge to evaluate properly, so it's entirely proper that you are dubious about them, even if you're confused about what a proper proof would consist of

it might help your intuition to think of it this way:

1. with random input, the first ten positions of output are pretty likely (65%) to have an item that was originally in the last 10% of the input, and almost certain (97%) to have an item that was originally in the last 30% of the input. so those ten items alone account for needing 95% of the passes you'd need without an early exit, minus about five, in 65% of cases.

2. the first 20 positions are pretty likely (64%) to have an item that was originally in the last 5% of the input, and almost certain (96%) to have an item that was originally in the last 15% of the input. so those 20 items account for needing 97% of the passes you'd need without an early exit, minus about ten, in 64% of cases.

3. the first 30 positions are pretty likely (64%) to have an item that was originally in the last 3.3% of the input, and almost certain (96%) to have an item that was originally in the last 10% of the input. so those 30 items account for needing 98.5% of the passes you'd need without an early exit, minus about 15, in 64% of cases.

of course, if you're only sorting 30 items, the first ten positions of output are actually a third of all the positions, so "minus five" is pretty significant. and probably if you're sorting more than about 30 items at a time you ought to use a linearithmic sort or a linear-time radix sort instead of a quadratic sort

similarly, it's reasonable to argue that complexity (in the sense of containing many parts, not in the sense of asymptotic computational work required) is subjective, and to disagree about it

but i think these are sort of moot points

insertion sort still beats bubble sort in the situations where you have only one or two misplaced pairs of elements, and generally replaces it as soon as somebody notices

as for 'the de facto standard way of doing it today', i suggest you argue with clrs, not with me


Count the number of outer passes of insertion sort (it is N) vs bubble sort with early exit (it is 2) if you have only a single pair misplaced.


insertion sort only ever makes one outer pass; it does almost exactly half as many comparisons as bubble sort (n rather than 2n-2) in that case, and the same number of swaps (1)

the wikipedia article has a cleverer early-exit version of bubble sort that keeps track of the last pair it had to swap; it will do 1.5n-1 comparisons in that case on average (because the swapped pair is on average in the middle of the array)

above i linked to working code on godbolt (modulo the extra unnecessary swaps on duplicate keys that you were kind enough to point out earlier); edit a comparison counter and a swap counter into it and you'll see
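
the last-swap variant looks roughly like this (a sketch, minimally tested): each pass only needs to scan up to the position of the previous pass's last swap

```c
#include <stddef.h>

typedef int item_t;

/* bubble sort that remembers where the last swap happened: everything
 * past that point is already in its final position, so the next pass
 * can stop there */
static void bubble_sort_lastswap(item_t *a, size_t n) {
    while (n > 1) {
        size_t last = 0;
        for (size_t j = 1; j < n; j++) {
            if (a[j - 1] > a[j]) {
                item_t tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
                last = j;   /* index of the later element of the last swap */
            }
        }
        n = last;   /* if no swaps happened, last == 0 and we're done */
    }
}
```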


The early exit can be done without an extra flag, but increased code size.


If you look at the actual change in FreeBSD (https://cgit.freebsd.org/src/commit/?id=9a7add6d01f3c5f7eba8... ) you'll see that the implementation of bubble sort used in the FreeBSD kernel was not doing early exits.

  /*
   * Perform a bubble sort of the system initialization objects by
   * their subsystem (primary key) and order (secondary key).
   */
  TSENTER2("bubblesort");
  for (sipp = sysinit; sipp < sysinit_end; sipp++) {
          for (xipp = sipp + 1; xipp < sysinit_end; xipp++) {
                  if ((*sipp)->subsystem < (*xipp)->subsystem ||
                      ((*sipp)->subsystem == (*xipp)->subsystem &&
                       (*sipp)->order <= (*xipp)->order))
                          continue;       /* skip */
                  save = *sipp;
                  *sipp = *xipp;
                  *xipp = save;
          }
  }
  TSEXIT2("bubblesort");


i very much appreciate you and erk posting this because we can also now see that it wasn't actually bubble sort but selection sort


meh, an optimization, hardly "not working"


It is not "an optimization", it is in the definition of Bubble Sort. What you have written is not a bubble sort, and therefore, by the fact of being not a bubble sort, is not a working bubble sort.


That's not part of the definition of bubble sort.

It's also not an improvement over the insertion sort. If you look at kragen's code, you'll see that running it on a list of e.g. 6 elements in ascending order will mean that you make 5 comparisons, swap 0 elements, and exit. (You do have to count from 1 to 6 as you do this.)

The insertion sort is implemented as a bubble sort, interestingly enough, but one that doesn't consider the entire array at every iteration. Given that it's equally correct, it's not difficult to understand why it will always be faster than a bubble sort that does consider the entire array at every iteration.


It is in the definition in Wikipedia and most other sources I have checked. All the implementations I have seen so far, check if any swaps have happened, and stop if they have not.


not clrs


In any case, you should retract your claim that bubble sort is slower in the already-sorted case. It is CS101 that bubble sort is one of the fastest in this case.


after i posted the below comment to say 'oh yeah, you're right, thanks' someonefromca, perhaps inadvertently, edited the parent to say something completely different from what i was agreeing with. it previously said that there was a bug in the insertion sort code because it would swap things repeatedly, which is true and is a bug. my reply to their original comment is below

oh yeah, you're right, thanks, it does do useless extra work if there are equal values in the list

it should say

    if (tmp >= a[j-1]) break;
however, this doesn't make it fail to terminate, which is a possible reading of your bug report


> above code only minimally tested

There you have your problem: any programmer can write a bubble sort while half asleep and inebriated at the same time, but for an insertion sort you actually have to think a little. There's much more chance of errors.


i'm not sure; i made a few mistakes in each of them as i was writing them, but they were both working by the time i first tested them. i do feel like the insertion sort was maybe a little more challenging?

edited to add: someonefromca spotted a bug in the insertion sort; where it says

    if (tmp > a[j-1]) break;
it should say

    if (tmp >= a[j-1]) break;
which is both a performance bug and also breaks the stability property the algorithm otherwise has

this is some evidence in favor of your point that insertion sort is more error-prone; even if i could have made the same error in the bubble sort, i didn't, and possibly that was because the rest of the algorithm required less mental effort?


fwiw that comparison was where I went "wait, is that right?" in my reading.

Maybe a fair judge of mental difficulty would say "you gotta write at least the loop invariant" (termination being too pedantically obvious) -- though admittedly I leave it out more than I used to.

When I was introduced to sorting as a new programmer, via bubblesort, I thought "why that way?" and came up with selection sort as a method that made more obvious sense. It seems to me like bubblesort's alleged intuitiveness is post hoc from the shortness of the code in terms of all-ascending for-next loops -- the intuitive reason it worked was just that the inner loop did move the next max into place just like selection sort, but with extra swaps along the way.


You should edit the code in your original comment, someone is bound to copy paste the code somewhere…


it's only editable for two hours, for better or worse


This has gotta be the most bullshit justification for writing bad kernel code that will be reviewed and tested by 200 people before it reaches production. Literally anything written as low level as this has the potential to crash the system on a dime and needs to be thoroughly tested.


Old code in BSD wasn't thoroughly tested except by virtue of being old; there's no QA department and traditionally UNIX code was tested by thinking about it really hard rather than automated testing.

(FreeBSD got a test suite in 10.1, which came out in 2014, not that long ago in UNIX terms.)


Not just in BSD... GNU wasn't much better.

People would be shocked by how little the core tools we use were tested automatically. Some still aren't.


Quite.

Thankfully the practice of just randomly shelling out to 'random' programs (like the core tools) has been stemmed somewhat, so perhaps we're mostly safe from RCE via that vector. Downloaded files might still be a vector, tho. Maybe I'm an optimist even though I sound very pessimistic.


If your testing infrastructure is so anemic that a sorting algorithm (of all things) is hard to subject to e.g. property-based/fuzzing tests then it'd probably be worth it to expand that before doing anything else.

I dunno what FreeBSD's test infrastructure is, but I'm just saying that "it needs testing" is not a good reason not to change to a better sort algorithm in this case.
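To be concrete, a property check against libc's qsort fits in a couple dozen lines. (A sketch with made-up function names, not FreeBSD's actual test harness; `sort_under_test` here is just a placeholder insertion sort.)

```c
#include <stdlib.h>
#include <string.h>

/* the function under test -- substitute the kernel's sort here */
static void sort_under_test(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        for (size_t j = i; j > 0 && a[j] < a[j-1]; j--) {
            int t = a[j]; a[j] = a[j-1]; a[j-1] = t;
        }
}

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* property: on random inputs the output must match qsort exactly;
   returns 1 if every trial passes */
static int sort_matches_qsort(void)
{
    for (int trial = 0; trial < 1000; trial++) {
        size_t n = (size_t)(rand() % 64);
        int a[64], b[64];
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] = rand() % 16;  /* small range forces duplicates */
        sort_under_test(a, n);
        qsort(b, n, sizeof b[0], cmp_int);
        if (memcmp(a, b, n * sizeof a[0]) != 0)
            return 0;
    }
    return 1;
}
```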


if only the people working on bsd in 01993 had had access to your wisdom, think how much more successful they would have been


have you ever been bitten by typing a year in a C/C++ program and getting it inadvertently converted to octal?


heh


This is about changing the sorting algorithm. No snark needed. It should be trivial to add a test for the current algorithm (whenever it was introduced) and to replace it afterwards. Heck, even keep running both for bisimulation testing.


probably colin was more interested in booting faster than in improving testing infrastructure

if your interests run the other way, that's great, and hopefully you will succeed in improving freebsd's testing infrastructure, but it's hardly a reason for colin not to speed up booting

but i don't think i saw anyone claiming that the change should not be made because it wouldn't be adequately tested?


I don't think you fully read the post I was replying to. I was just saying that that poster's argument against changing the sorting algorithm was misguided.

Quote:

> will be reviewed and tested by 200 people before it reaches production

I was only saying that this is a bad argument for changing to any simple-ish sorting algorithm. (Because: add tests if you need assurance)

Granted, if you're doing super-complicated sorting algos with fallbacks for different sizes, heuristics for detecting semi-sorted input, etc. you might want something more sophisticated than trivial property-based checking.


i read that as an argument against having done it that way 30 years ago, not against changing it


No worries!


One of the truths I've come across over my years is that almost without fail whenever code is mentioned, someone will come across to show how it could be done better. I haven't seen it in this thread, yet, but generally there will be someone who will maximize how badly it could also be done, usually using Java and invoking "Enterprise coding" memes.


Why didn't they just use qsort, which is in libc?


It’s kernel code, so no libc?


Yeah, obviously. I mean copy the source.

There _could_ be arguments against some sorts, e.g. less predictable memory or performance. But qsort is pretty well behaved.
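For reference, the userland version is just a comparator plus one call. (A sketch: `struct init_entry` is a made-up stand-in, the real SYSINIT record has more fields. One real caveat: qsort makes no stability guarantee, which matters if entries can compare equal.)

```c
#include <stdlib.h>

/* hypothetical stand-in for a SYSINIT record; the real kernel
   struct (subsystem, order, function pointer, ...) differs */
struct init_entry {
    unsigned subsystem;  /* primary key */
    unsigned order;      /* tie-breaker within a subsystem */
};

static int init_cmp(const void *x, const void *y)
{
    const struct init_entry *a = x, *b = y;
    if (a->subsystem != b->subsystem)
        return a->subsystem < b->subsystem ? -1 : 1;
    return (a->order > b->order) - (a->order < b->order);
}

/* usage: qsort(entries, nentries, sizeof entries[0], init_cmp); */
```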


As is often said, O(n^2) (or even 3) is fine until it stops being fine.


Yeah, and it almost always stops being fine. The problem with an O(N^2) solution is that you rarely put in an assert that N < 100 or something like that to guard your assumption that big N "doesn't happen", because after all it's better that the process eventually completes than that it crashes. And if you don't dare to put that assert in, you perhaps shouldn't use O(N^2).

N^2 or N^3 is fine for things that are fixed size and very unlikely to stop being fixed size, or at least unlikely to grow a lot. But any time you think "oh it's just a few files" when there is nothing to stop it from being 1000 files in ten years, or at one rare user: don't use an O(N^2) solution.
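Something like this (a sketch, `SMALL_N_LIMIT` being whatever bound you're actually assuming) at least makes the assumption explicit:

```c
#include <assert.h>
#include <stddef.h>

/* make the "N stays small" assumption explicit: if the input ever
   outgrows the guard, tests fail loudly instead of the program
   silently degrading to quadratic time years later */
#define SMALL_N_LIMIT 100

static void sort_small(int *a, size_t n)
{
    assert(n < SMALL_N_LIMIT && "quadratic sort called with large n");
    for (size_t i = 1; i < n; i++)
        for (size_t j = i; j > 0 && a[j] < a[j-1]; j--) {
            int t = a[j]; a[j] = a[j-1]; a[j-1] = t;
        }
}
```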


Here it took almost 30 years for it to stop being fine, which is a pretty good run.

And it was a blatant O(n^2) algorithm, easily swapped out, it’s not like they were calling sscanf in a loop and whoopsie turns out C strings are shit so now parsing a 4 characters decimal number is O(buflen^2).
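(For anyone who missed that story: many libc sscanf implementations call strlen on the whole remaining buffer on every call, so the usual token-by-token parse loop goes quadratic; advancing a pointer with strtol avoids the rescan. Roughly, as a sketch:)

```c
#include <stdio.h>
#include <stdlib.h>

/* accidentally quadratic: each sscanf call may strlen() the whole
   remaining buffer before parsing a single number */
static int parse_slow(const char *buf, int *out, int max)
{
    int n = 0, used;
    while (n < max && sscanf(buf, "%d%n", &out[n], &used) == 1) {
        buf += used;
        n++;
    }
    return n;
}

/* linear: strtol reports where it stopped, so nothing is rescanned */
static int parse_fast(const char *buf, int *out, int max)
{
    int n = 0;
    while (n < max) {
        char *end;
        long v = strtol(buf, &end, 10);
        if (end == buf)
            break;  /* no more numbers */
        out[n++] = (int)v;
        buf = end;
    }
    return n;
}
```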


One should indeed be forgiven when getting 30 years out of a code base, any code base. It's hard to predict that you'd need to shave milliseconds of cloud/VM startups or from some embedded device, if you can't imagine Clouds, VMs or computers the size of watches.


> Yeah and it almost always stops being fine.

No, it’s just a non-issue in the 99% of cases where it really does stay at small inputs. I come across configuration code all of the time that is stupidly inefficient like this and it’s almost always irrelevant. Tons of stuff works on inputs of <10.


It's an observation with heavy selection bias: in 100% of the cases where you had to deal with a perf problem it had stopped being fine... Since it also almost invariably is accidentally quadratic and not deliberately quadratic as with BubbleSort, you also have a hard time estimating how many quadratic algorithms you have that haven't caused problems, and perhaps never will.

Adding complexity because of an anticipation that some input will grow to an unreasonable size (at the time of writing) may or may not be a good idea. But using a different off the shelf function or data structure is usually not a high cost for making sure it's not a problem down the line. Of course, that also assumes that there is a good and efficient standard lib of methods and data structures so that "making a more clever choice" isn't also a burden or risk in terms of code size, complexity, risk of bugs.


The problem is that it is an issue in 1% of cases, and it is hard to know in advance which 1%. So either you do a proper analysis to see how large your inputs can really get, or you use an algorithm that scales.

For sorting, getting an N log N algorithm that doesn't break down is usually just a matter of calling a library, or writing <100 lines of code if you really can't use a library. It usually doesn't have much of a negative impact when the number of inputs is small, and it is often simpler than doing a proper analysis. So, when in doubt, never use an N^2 sorting algorithm.


> Yeah and it almost always stops being fine.

Properly predicting if it will be fine or not is a sign of wisdom. But something like this is a fine place to use an algorithm whose complexity blows up: the sort happens in one clear place; early boot is a little hard to debug, but still. It's much worse when you put something N^3 in a place where, when it blows up, the time spent is diffuse.


Old colleague had some wisdom on this which stuck with me. Thanks Ben! Spoken at separate times.

- it's constant time or it's crap

- constant time means your best case is as slow as your worst case

In this case, constant time would be sorting it before the program runs.


> - constant time means your best case is as slow as your worst case

Which is in fact important in crypto code to prevent timing attacks.
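E.g. the standard trick (a sketch, similar in spirit to FreeBSD's timingsafe_bcmp) is to accumulate differences instead of exiting early:

```c
#include <stddef.h>
#include <stdint.h>

/* constant-time equality: does the same amount of work whether the
   inputs differ at byte 0 or not at all, unlike memcmp, which returns
   at the first mismatch and so leaks the match length through timing */
static int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= a[i] ^ b[i];
    return diff == 0;
}
```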


It is surprising how common O(N^2) algorithms with simple implementations are, even in extremely popular libraries. E.g. the ICU search method: https://github.com/unicode-org/icu/blob/a7a2fdbcf257fa22c7a2... ICU is used by Python, Windows, and OpenOffice.


For people (like me) who are wondering why a kernel needs to boot in under 28ms: It's for virtual machines that get launched on-demand in services like AWS Lambda. https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecrac...


There was an article about a Linux embedded system handling a car back-camera that had to boot within 3-5 seconds or something along those lines.


Very OT, but I concur. I need a DSP for my car audio system. I considered a rpi3 with a DAC attachment.

I tried the raspbian OS with some moderate overclocking and removing unneeded startup processes. Couldn’t get it under 25 seconds for boot. Kind of a deal breaker (remote start unit could help, but…)

Apparently somebody can start the rpi3 in <3 seconds. I’ll believe it when I try their image myself. Their setup disables USB and networking (maybe audio too who knows) which is a giant pain for configuring my system.

Maybe I’ll try again with an RPi 4 and usb3 nvme. or maybe I’ll just pay hundreds of dollars for a dedicated DSP unit.

In case anyone cares, my main use case is parametric EQ for phon adjusted low bass frequencies. 90dB 20Hz sounds as loud as 70dB 50Hz (give or take) and subwoofers are generally quieter at lower frequencies. Hence, audio processing unit to cut as you go higher than your lowest decently loud frequency.

The lowest cost option for this is about $230USD plus some electrical components (miniDSP 2x4HD). I figure the rpi would cost around $100 including the DAC unit, open source audio software, just need some power source to link the battery to the rpi power cord.

Edit: I don’t want to spam thanks comments for helpful replies, so I’ll just say it here (possibly in advance): thanks!!!


If you make your own buildroot image, you can get VERY quick boot times - here's a random vid I found [1] with 3.5 seconds on a Pi Zero W (which is considerably less powerful than a Pi 3).

I've worked with similarly underpowered boards (licheepi zero, only has 64mb ram and uses the cpu from a dashcam!) and those will still boot and start an SDL app in under five seconds.

If you can identify the bare minimum that you need for your EQ solution, you can probably get similar boot times.

[1]: https://www.youtube.com/watch?v=ZehLyumyMvE


i feel like maybe 3.5 seconds is not VERY quick compared to 28 milliseconds?


28ms is a VM; in hardware you can't get through firmware in 28ms. Apples to oranges.


true if your firmware is shitty enough but plenty of hardware platforms don't have that problem


The use case is different so the comparison doesn't really matter.


3.5 seconds might be tolerable for your car radio i guess


Yep. My head unit has its own startup time for digital audio inputs, around 2-3s.


You could use a ADAU1701 DSP board (cheaper and shittier minidsp) or do it directly on a modern/fast microcontroller, e.g. ESP32. I used an ESP32 to adjust frequency response of a small diy loudspeaker (and added Bluetooth).


It doesn‘t always have to be a RPi, sometimes there is dedicated hardware for that use case that is also kinda cheap.

Check this thing out, it‘s a programmable audio DSP:

https://www.thomannmusic.ch/the_t.racks_dsp_4x4_mini.htm


Search ebay etc... for Texas Instruments DSP evaluation boards.


Can't you just run bare metal on the rpi? Also, a resistor, a capacitor and a relay would work to filter out the extra bass.


Wild that people have accepted a 5000ms delay between starting their car and seeing behind them.

I suppose I'm more sensitive to downtime than most - I recently (unsuccessfully) tried to add a ton of capacitance into my head unit's power supply and acc-signal lines to keep it from turning off for a few seconds when starting the engine.


Up until a couple years ago my car didn't even have a camera. We just looked out the window, which is still an option.


It is a much worse option than it used to be. Visibility out of the rear windows from the driver's seat has gotten worse over the years.


So industrial design with poor usability gets complemented by bad electronic devices. Progress?


> So industrial design with poor usability gets complemented by bad electronic devices. Progress?

https://en.wiktionary.org/wiki/Chesterton%27s_fence

It's for safety in case of roll-over and for better aerodynamics due to increased emissions standards and in the case of EVs, increasing the range.


So cars get safer through more competent structural design and are enhanced by reliable camera systems that allow for better visibility than craning ones neck around 120 degrees.


what are they getting for the poor visibility tradeoff compensated by electronics? I have no idea. If it's safety, then yes, that would be progress.


Safety, and aerodynamics (i.e. energy efficiency).

Dismissing it as "design with poor usability" is very facile. It's trade-offs between different goals, all the way down.


Or it’s just cheaper to build


Cost is yet another trade-off.

Those pillars that hold the roof up are safer if they're sturdy, but sturdy means wider and that obstructs the driver's field of vision a bit, and weighs more. "strong and slender" is possible, but it costs more. It all inter-relates.

(strong, narrow, cheap), pick any 2.

There does come a point where you start to appreciate what Polestar is doing on one of next years models (the Polestar 4): give up on the idea and change direction; no rear window at all, just a camera.


The failure mode is pretty horrible though.

And I'm not only thinking about damaged camera. Humidity, weather etc. often makes these cameras completely useless.

You'd have to have wipers for your camera-lens and probably heating or something as well to even approach the utility of a window - short term.

Because none of that is ever going to fail, and it isn't cheap either.


I don't think that I said that the Polestar 4's lack of rear windows is a slam-dunk sure-fire hit. Rather just that we can appreciate the attempt to side-step the issues with rear-window visibility with a very different design.

It is IMHO a bold move, which will attract fans and haters. Neither of us know for sure how it will play out; if the potential issues outweigh the rest. The implementation details clearly matter. In a couple of years we will see if it's a genius innovation, a terrible misstep or somewhere in-between. Until the vehicle ships and is on the road for some time in some volume, we won't know for sure.


While I agree I kind of think this should be nailed down before such a departure.

After all, on a high-end vehicle a back-camera that hasn't sorted this is just a gimmick.

Maybe they have, all my experiences indicate otherwise though I don't have much experience with new expensive cars.


Lens washdown is rather common now on vehicles.


Trucks drive just fine with no rear view at all other than side mirrors, so cars can do that as well.


On the highway, sure. But generally whenever I see a truck back up in the city, there will be someone outside directing the driver.

You don't see lots of trucks parallel parking unassisted on streets with pedestrians and cyclists, but I do this all the time.


Hmm? They specifically taught me in driving school how to parallel park using just the side mirrors.


Truck drivers are pros, though, while we're just annoying traffic jam fodder.


They can't see what they can't see. Being a pro makes no difference.

Reminds me of the 18 wheeler at the traffic light. He got a red while waiting for his left turn and decided to back up instead of "going for it" once oncoming traffic had ceased.

The minivan behind him got crushed like it was made of sticks. Luckily he heard the noise after a while and stopped backing up. Not a pretty sight. I had the side view so I was like "WTF is he doing?" when he started backing up. Well, he couldn't see the minivan that had snuck up behind him.

Lesson: when behind a truck make sure you can see the drivers face through his mirrors. Then you have a chance that he is going to see you. If you can't see his face, assume that he can't see you. Move away.


Americans don't care about the prices of their cars and neither do American safety regulations.


This is correct. It's easy credit all the way down. Cheap cars (Chevy Spark, Mitsubishi Mirage, Kia Rio) are basically all slowly being discontinued in NA, meanwhile luxury pickups sell like hot cakes.


It's safety. Same with thicker pillars which are worse for visibility but made thicker for rollover protection.


You don't see small kids right behind your car this way. I believe this was the main reason why rear-view cameras were made mandatory on new cars.


Also, because the cars now are gigantic, compared with old cars.

Even the Mini Cooper is huge now.


I wouldn't be surprised if the Mini Cooper is now a light truck. Everything is larger because manufacturers are gaming the footprint regulations to get by with worse emissions and DOT isn't serious about revamping CAFE to stop that behavior.


A large majority of cars in traffic don’t have a back camera. Either because they’re old or because they’re cheap (I don’t think a back camera is usually a standard feature at least where I’m from…)


In the US a backup camera is now required by law


For new cars, I guess.

In the EU, it’s similar. https://ec.europa.eu/info/law/better-regulation/have-your-sa...:

All new vehicles sold from May 2022 must be fitted with advanced safety features, including:

- monitors that detect when a driver has become drowsy or distracted

- emergency stop signal to help prevent rear-end collisions

- rear-view camera or parking sensors

- alcohol interlock system to prevent drunk driving.

These systems will help reduce serious accidents on Europe’s roads.


It seems that the only part that got implemented out of that draft was drowsiness/inattention detection. At least that's what I read from the document in "Commission adoption" section of that page.


> alcohol interlock system to prevent drunk driving.

Wait, what? _All_ new vehicles need to have this? Or do they just need to be constructed such that one can be installed easily?


The car must be built to allow for the installation of a breathalyser, i.e. some standardised interface. Doesn't really affect you unless you're ordered by a court to use one.


Looks like EU law, so considering the wording, probably every new car.

Also, see:

https://road-safety.transport.ec.europa.eu/statistics-and-an...

Edit:

The actual text:

> Article 6 of Regulation (EU) 2019/2144 of the European Parliament and of the Council requires motor vehicles of categories M and N to be equipped with certain advanced vehicle systems, including driver drowsiness and attention warning systems. It lays down in its Annex II basic requirements for the type-approval of motor vehicles with regard to the driver drowsiness and attention warning systems.


I rather have no car at all than go back to that.


That would be better for the planet anyway.

Disclaimer: Not a joke. I've never owned a car.


I agree and in modern countries with proper transportation there is no reason to own a car.


If you have e.g. outdoors activities in sparsely populated areas as a hobby, having a car gives you a lot more freedom than being dependent on public transport. Buses may work if you're only going to locations that are popular enough but that may limit your options. Of course you could also rent but that could become expensive or cumbersome if you do it often enough.

Also, if you live in a sparsely populated area, public transport with good coverage often isn't really economically (or ecologically) feasible, even in countries that have good public transport in cities. A bicycle might be an option if distances aren't too great but relying on it exclusively probably takes more than an average person is willing to take on. Weather conditions could be poor, you might regularly need to transport groceries for an entire family, etc. All of that is entirely doable within a few miles in a bikeable city but it's potentially a lot less so if distances are greater.

I've never owned a car, I bicycle some thousands of kilometres per year (probably more than I drive most years), and I wish people didn't automatically consider owning a car a necessity even when it actually isn't. But I can definitely see reasons for owning one.


Well, there is a reason, but you won't like it.

Convenience :-)


It’s probably less than 5 seconds in perceived terms. Eg the car gets power before the engine turns over, and you then need to select reverse after the engine is running. And you’d need to check your blind spots before checking the camera too. So by the time the driver is ready to check the camera, it might feel like a much smaller delay or even no delay at all.

The delay that I notice the most is the time it takes to connect to my phone via Bluetooth (usually so I can play music from it). I’m often already pulling out of my parking space before it connects.


> Eg the car gets power before the engine turns over, and you then need to select reverse after the engine is running.

I can change gears in well under a second when moving. My car requires the clutch to be depressed when the car starts, meaning that I can actually have the car started in reverse in under a second. Waiting another 4 seconds is crazy.


You’re not going to change gears before turning your engine on though. Which is my point.

And if you’re reversing within 3 seconds of turning your engine on then you’re either driving recklessly or on an open and safe space where you don’t even need your cameras.

To be clear, I’m not suggesting that a 4 second boot time for an embedded device is great. But I do think your claim that waiting 4 seconds “is crazy” is rather hyperbolic.


> You’re not going to change gears before turning your engine on though. Which is my point.

If I've parked my car, I have to have my foot on the clutch to turn it back on again. My foot is already on the clutch, and I can very easily (and often do) put my car in gear _immediately_.

> then you’re either driving recklessly or on an open and safe space where you don’t even need your cameras.

Or I've looked in my mirror and put my seatbelt on before I turned the engine on. This is the equivalent of "you're holding it wrong".


But starting the car takes a couple of seconds (engine turn over isn’t instant) so you would then need to double check your mirrors and blind spots anyway. I’ve been driving longer than most people on here have been alive and a lot can happen in 2 seconds at busy parking spaces. And if it’s somewhere quiet, then you don’t really need to watch your camera like a hawk.

My point being, 5 seconds might sound like an eternity on paper but I’d wager it’s not nearly as impactful once you start factoring real world scenarios.

Personally, I never look at my camera. I just listen to the beeps and watch my blind spot. Not because it’s safer, nor because of any delay in the camera coming on, but simply out of habit. So I’ve never noticed a delay in the camera coming on. Maybe there isn’t even a delay in my particular car — but if there was, I wouldn’t know.


It could very well begin booting the second the car is unlocked


I mean, I back into the parking spot instead of out of it most of the time so it wouldn't bother me.

Also my usual sequence is "start the engine, set up on phone (music, GPS), leave", so it would've booted before I need it.


What was unsuccessful? I did this in a 2012 Miata with a pioneer head unit (capacitor on the Acc line so it discharges while cranking, which keeps the head unit from turning off). Has a fused line directly to the battery for power. It’s not a huge capacitor either… like a 1500 uF or so? Just tried a couple diff cap values until it had around 5 secs of charge to hold the line open. It also makes it so the music doesn’t cut off immediately when the car is turned off as well… it gives it a few seconds to shut down. Feels more premium.


Well if I knew for sure why it didn't work I'd fix it ;)

I think it works better now than before, but it's hard to be sure because I neglected to measure the downtime before/after the modifications. Essentially I had a cap/diode to keep the acc line high during startup for the head unit, but it would still cut off due to the power supply voltage dropping to ~10-11V on the engine turning over (4.8L V8). So then I added a cap/diode to the power supply line, and it works "better...ish...maybe..." now: there are some times when it will stay on across startup, but not always. Perhaps with a larger cap on the PS line (I'm using ~2500uF), but it seemed to be discharging much faster than I'd expect, as if the diode wasn't doing its thing and the line was effectively just at the system's general "+12V". I wondered if there was perhaps some way the power was escaping the diode/cap isolation and making it back into the main system, but I didn't have a good way to test or fix that (I previously diagnosed a similar issue to a bad diode, but this one seemed to measure fine).

And besides all that, the amp is on a separate line in the way back of the car that I didn't bother to tear into because the head unit turns off in the critical stage anyways. So the head unit does stay on for a couple seconds after I turn if off completely, but since the amp is off it doesn't matter.


You're supposed to let your car idle for 30 seconds after startup to let the engine lubricate. It's more important in cold climates.

So delayed startups aren't some new issue.


Probably not necessary in modern cars, whose electronically controlled engines have many internal details not obvious to typical drivers.

If you believe your idea is true, can you tell me what I can expect to see if I don't do this? IE, I've never done this and I have cars that last 10+ years without any engine trouble.


probably because it started off as a 1000ms delay, which they begrudgingly accepted. Then it started to slow to 1100ms....


My 2015 Model S would lose badly in a quarter mile against a good cyclist if you included cold boot time. Worst bank robbery car ever.


3-5 seconds sounds like the latency I see in my 2018 Toyota’s garbage Entune 2.0 system. It’s criminal that things like this are released for general usage.


"Linux Kernel Fastboot On the Way", Linux Plumbers Conference 2019, https://www.linuxplumbersconf.org/event/4/contributions/281/... / https://www.youtube.com/watch?v=A7N_O8pnyTw


With a custom kernel and without initrd/initramfs it is possible to boot in around 1 second.


Might be quicker if you put everything you need into an initramfs that's built into the kernel. Then there's no slow storage needed once the kernel is loaded.


There is an interesting talk from BMW how they did it for their cars: https://youtu.be/_cSTBiwY7HE.


Yes I remember being surprised that Linux containers were used for that purpose.


Very naive question... But why can't you "just" memcpy and exec an image of an already booted system?


You sort of can, but drivers have to be in sync with the (emulated) devices they're controlling. The fastest way to go about this would be to snapshot an already booted virtual system. It could be as simple as instructing the restored snapshot to reseed its RNG and (re-)attach to the network, but you'll probably run into a few more assumptions broken by cloning multiple VMs from the same snapshot. If pre-booting and suspending/resuming each individual VM is good enough, it could look similar to a pre-forking HTTP server like old Apache versions.


Yeah, for real hardware it is important to be flexible.

But these virtual machines can all be exactly the same, except for the network MAC and IP.

It could even be possible to compile executables that bypass the need for a kernel, just have some libraries with the required stuff, and boot to the program, MS-DOS style.


> could even be possible to compile executables that bypass the need for a kernel, just have some libraries with the required stuff, and boot to the program

Or consider: https://github.com/uniqernel/awesome-unikernels


Simplistic answer is because hardware and firmware environment gets configured during boot, even for virtual machines.

Some virtual machine environments do have facilities to snapshot the entire state of a machine and resume it (potentially on another machine), but if you look at what is involved with snapshot metadata it's more than just the state of memory, you have CPU and device state as well.


Wouldn't hibernating the machine and then copying the hibernation data to another machine do the trick? Sure it wouldn't be as instant, but it would skip most of the boot configuration.


Conceptually at a high level, yes (if the hardware was ~identical). Resuming from hibernation on the same machine is basically the "load an image into memory and execute it" that OP asked about, too. But in either the resume or hypothetical hibernation-migration case, you can't just start running where the previous boot left off; you have to go through hibernation resume functions, which basically do the same thing as boot in terms of configuring hardware.

On virtual machines there probably isn't nearly so much real hardware to set up, and hypervisors can optimize virtual device setup. The machine image could also be provided via COW snapshot of memory. So I don't know, possibly at some corners it could be faster to do a resume type operation than a full boot. I haven't played with hypervisor guest boot at these kinds of speeds before.


The usual culprit is CPU feature flags on different CPU generations.


Well then wouldn't you just have one image file for each CPU generation in your datacenter? Or better yet, have one image file on the hypervisor for each individual machine! That machine is probably dedicated to Lambda instances and boots them 1000x per day; if you're spending valuable time that's actually customer-facing latency on each call to reboot the system, then shift that memory<->time tradeoff back towards memory!


Possibly a bit fragile if there is any situation where one day the other machine isn't entirely identical? (just hand-waving here)


There’s things like encryption keys and tokens that need to be unique so it’s not easy, if you do it like that you would run into new, difficult to predict vulnerabilities.

So it’s difficult and dangerous for not a lot of gain. Perhaps an easier approach would be to boot a few spare systems in advance and hibernate them so they can be booted quickly.


meanwhile our new dishwasher takes between two and four SECONDS from pressing the power button to showing something on the display


But you still bought this one and not one of the alternatives that start faster... which is part of the reason things get built this way.


Unfortunately stuff like boot time / time to interactive isn't in the specifications. You have to check user reviews, and not everybody cares about this.


I don't think I've ever read a dishwasher review that talks about time to interactivity. How do I know which of the 242 dishwashers available[0] at my online retailer starts in under a second?

[0] https://ao.com/l/standard_dishwashers/1/21-23/


I doubt this is an absolute requirement. Would I choose the faster booting dishwasher among a set of otherwise identical ones? Sure. Would I pay more for it to boot faster? It depends. €10? Sure. 100? Nah.

Plus, as the sibling poster says, you usually don't know this before you first start to use it, which means it's already been delivered, connected, and everything.


Would you really pay $10 for your dishwasher to boot in 1s instead of 4s? I absolutely wouldn't. Maybe $1, but even that is pushing it.


For something that's going to annoy you daily for the next 10+ years? I'd definitely pay $10. I'd pay another $10 for the dishwasher to remember the program selection instead of always offering the 4+ hours eco program (which I believe is required by some regulation to be the default).

I already paid $100 more for 2dB less, according to the spec. No regrets.


I'd say it depends on how long it takes to boot up. Fortunately, my dishwasher boots up instantly. If it takes a few seconds, sure it's annoying, but you can work around that. Say press the on button, then check the detergent is in place, etc, then press start. If it's some ridiculous length, like several minutes, I'd probably only buy that crap at a steep discount.

> which I believe is required by some regulation to be the default

Huh. This must be recent-ish. My parents' dishwasher has a rotary button to choose the program, which stays put as long as the little grandchildren don't mess with it..


Found it: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:...

> From 1 March 2021, household dishwashers shall provide an eco programme meeting the following requirements:

> (a) this programme shall be: [...] set as the default programme for household dishwashers equipped with automatic programme selection or any function maintaining the selection of a programme, or, if there is no automatic programme selection, available for direct selection without the need for any other selection such as a specific temperature or load;

Ours was bought in late 2020, but manufacturers probably take a head start with implementation whenever they renew their models.


If consumers cared it wouldn't cost anything. They would just do it.


came with the house

it's inferior to our old one, which was from the 90s and still worked, in other aspects too, most annoyingly cleaning performance. i guess it uses less water though


As two persons have already asked:

> When the FreeBSD kernel boots in Firecracker (1 CPU, 128 MB RAM), it now spends 7% of its time running a bubblesort on its SYSINITs.

> O(N^2) can bite hard when you're sorting over a thousand items. Time to replace the bubblesort with something faster.


Related discussion: "FreeBSD spends 7% of its boot time running a bubblesort on its SYSINITs"

https://news.ycombinator.com/item?id=36002574 (381 points | 3 months ago | 358 comments)


Small increments add up. Sure, it's not 100x faster booting; it's 100x less time spent in one fragment of overall boot time. Do another 20 of these and you've made a LOT of difference.


Yep. When work started on boot time, FreeBSD took 30 seconds to boot. It's now down under 10 seconds. Many have contributed, but Colin's done a lot of it [0].

[0]: https://wiki.freebsd.org/BootTime#Past_Performance_Improveme...


How is time measured on things like this? Surely processors have also become faster as well.


From the link,

> An Amazon EC2 c5.xlarge instance is being used as a reference platform and measuring the time between the instance entering "running" state and when it is possible to SSH into the instance.


People, including me, used to think I knew some special sauce about optimization. It's true that I have a bit more mechanical sympathy than most people, due to not only the curriculum at my school but which parts fascinated me.

The big thing is that I'm willing to sit down and earn a 50% improvement in performance by doing ten 4% improvements, one at a time. That's perseverance and foresight. Most of what I know about optimization tricks vs optimization process wouldn't quite fill up the first Gems book.

It's all about budget. If you're trying to double, quadruple the performance of something, you can't look at what the 5 slowest things are. You have to look at the thing taking 5% of the time and ask, "Does this deserve 20% of the CPU?" Because if you get 4x without touching this code, that 5% becomes 20%.

Then you don't work on the tall tent poles first, you work on the hot spots that relate to each other, and the ones you have the time and attention to do well. Because if you get a 20% improvement where a 25% improvement is possible, most bosses will not let you go back for the 2 x 2.5% later if you don't get it the first time. So then you are forever stuck with 50 x 1.5% slowdowns. Which is not a problem if you're profitable, but definitely is if you're bleeding money on hosting. Or if your main competitor is consistently 50% faster than yours.


This is known as the hotspot or bottleneck model of optimization. It has its flaws:

https://lemire.me/blog/2023/04/27/hotspot-performance-engine...


That article is 100% right, but gmm said "do another 20" - that's not really what hotspot people expect to do. They think there's one or two hotspots that they need to optimise and then they'll get a 10x speedup.

You can definitely do 20-100 small optimisations and get a 2-3x speedup.

The best example I've seen of that is Nicholas Nethercote's work on speeding up the Rust compiler. A ton of tiny speed improvements that add up to 2x or 3x speed improvements overall.

I've done similar work on optimising a compiler. You're never going to get 10x without architectural changes but you can get 3x with a lot of work.

Of course the "we'll just optimise the hotspots" people have written their code so badly they need at least 10x.


I compare this to man-to-man versus zone defense. Zone defense, as any parents with three+ children will tell you, is the only way to stay sane.

By looking at a module or concern as a whole, you get to do small-to-medium architectural improvements instead of none-to-small improvements. You get a broader picture, and can pull in more small areas of improvement in the process. The fastest code is code that doesn't run, and 'tall tent-pole' optimizations miss at least half of those opportunities. In some cases they just push them around, like broccoli on a child's plate.

Many teams get to the end of this road and then have absolutely no idea what to do next. I find it infuriating for a bunch of my 'betters' to listen to someone on the business side say, "Our customers are complaining about our speed/leaving for a competitor with a faster app" and then pull up a rectangular flame chart to 'prove' "We've done all we can". You have only done the easy parts, probably not even the best parts. You might have actually broken the best parts.

These apps are 'death by a thousand cuts'. No one part of the app is particularly slow, it's just uniformly mediocre coding and an architecture that squeaks by with 'adequate' (to them, but not the business) attention to capacity planning. Either a force of nature shows up and fixes it, or the company fades away.

The only boss I ever followed to a new job had two sayings that I stole. 1) "If you want to complete a meeting in an hour you need to complete half the meeting in half an hour" and the other was about scaling:

    The first order of magnitude is simple. The second is difficult. The third requires talent (sometimes tacking on "which not all teams have.")
You can interpret that a number of ways, but if your founding team isn't paying homage to the problem of scaling up to 1000 times the users you have at the beginning, you probably don't have the talent to get there (you didn't buy it, and you didn't build it along the way). I don't know what percentage of companies that claim "we did all the right things and still failed" have this problem, but I sincerely suspect it's at least 10%, which is a very big failure mode to ignore.


Nice article. I didn't see it say "don't do it"; I saw it say "try not to have to".


I've wondered for 25 years or so why bubblesort is considered a viable sort at all. Putting aside its terrible complexity and thus performance, it is not even intuitive. For example, if you look at it, it is not how we humans sort things manually.

It is IMO an example of a "following the herd" confusion: some scientist wrote a paper (they must do it to survive, don't they), others mentioned it in a textbook (doesn't hurt to have more pages, huh), others put it in their other textbooks because, well, it was mentioned in the past, and this is how this abomination survives decades..


FWIW: As an instructor, I used to include bubblesort when talking about sorting algorithms.. I stopped, because too many students saw it presented as an algorithm, and thought it was therefore a valid choice. It is not. As others have pointed out, insertion sort and selection sort are just as simple, but significantly faster.

As long as the quantity of data is small, there is nothing wrong with using n^2 algorithms for sorting. They are simple, robust, stable, and easy to implement correctly.

Sure, things may change. In 25 years, you may have 1000 elements instead of 10. If that happens, your successors can change the algorithm to meet the new requirements. That's what software maintenance is all about.
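For comparison, here is a selection sort sketch (illustrative only, not FreeBSD code). It is the same amount of trivial code as the bubble sort it would replace; one caveat to the list above is that this swap-based variant is not stable, while insertion sort is.

```c
#include <stddef.h>

/* Selection sort, illustrative only: the same trivial-code class as
 * bubble sort but with at most n-1 swaps. One caveat: this swap-based
 * variant is NOT stable, unlike insertion sort. */
void
selection_sort(int *a, size_t n)
{
    size_t i, j, min;

    for (i = 0; i + 1 < n; i++) {
        /* Find the smallest remaining element... */
        min = i;
        for (j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;
        /* ...and swap it into position i. */
        if (min != i) {
            int tmp = a[i];

            a[i] = a[min];
            a[min] = tmp;
        }
    }
}
```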


Yeah, great points about insertion and selection sort.

Thank you for the story and for keeping up the quality of education.


To me, bubble sort is intuitive, as far as sorting algorithms or their detailed implementations go.

I absolutely wouldn't sort that way if I were to sort something as a human, but in that case I wouldn't be even trying to figure out an exact systematic logic with its minute details that always gets it right. I'd be thinking in terms of ad hoc problem solving, and possibly some heuristics for speeding it up. When thinking of algorithms in a programming (or CS) sense, getting the exact logic right down to the detail is exactly what you'll need to do. So, to me, those aren't the same kind of intuitive.

As is probably common, a bunch of simple sorting algorithms were given as some kinds of introductory examples of algorithms during my first university programming courses. I think those included selection sort, bubble sort and insertion sort. I don't think I initially considered bubble sort the more intuitive one (I think selection sort was it for me). But for some reason, over the years the basic logic of bubble sort became more obvious to me than those other options. So, at least personally, maybe there's something else to it than following the herd.

Of course I've never actually written bubble sort in any kind of real code, but it's still perhaps the most intuitive one to me, in terms of detailed algorithms rather than in terms of heuristics for human problem solving. (Merge sort is also about as intuitive to me at least as a general idea.)



Does anyone know why they used bubblesort in the first place? It is known for its bad performance, so what other factors came into play? Number of ops, maybe?


According to the Twitter thread linked, at the beginning there were only 30 items, so using bubblesort didn't matter. The number of items has since grown to a thousand.


For 30 items bubblesort might very well have been faster than mergesort.


Bubble/selection sort with early exit can win outright on big lists if they're already almost in order. The textbook example years ago was cheque numbers, which are mostly presented in order.

Know your data. (Which in this case, I don't.)
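A sketch of the early-exit variant being described (illustrative C, not any particular textbook's code): a pass that makes no swaps proves the array is sorted, so nearly-in-order input finishes after a couple of O(n) passes instead of O(n^2).

```c
#include <stddef.h>

/* Bubble sort with early exit, illustrative sketch: a pass that makes
 * no swaps proves the array is already sorted, so the loop terminates
 * after O(n) work on sorted or nearly-sorted input. */
void
bubble_sort_early_exit(int *a, size_t n)
{
    size_t end, j;

    for (end = n; end > 1; end--) {
        int swapped = 0;

        /* One pass: bubble the largest remaining element to a[end-1]. */
        for (j = 1; j < end; j++) {
            if (a[j - 1] > a[j]) {
                int tmp = a[j - 1];

                a[j - 1] = a[j];
                a[j] = tmp;
                swapped = 1;
            }
        }
        if (!swapped)
            break;      /* no swaps: already sorted */
    }
}
```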


But it always loses to insertion sort.


For everyone challenging Colin on this, by all means, but do remember that it’s Colin.

If anyone misses less often, I can't think of who it would be.


Why not sort this in a precompile step and not sort the static list every boot?


Two reasons: First, it's not worth screwing with the linker to save 20 microseconds. Second, you need to merge lists anyway because kernel modules.


> First, it's not worth screwing with the linker to save 20 microseconds.

...yet. Certainly not when boot is 28ms; might be worth it when boot is 1ms.

> Second, you need to merge lists anyway because kernel modules.

Does FreeBSD have the equivalent of a non-modular kernel build, for kernels where you know every piece of hardware it'll ever talk to?


If I get the entire kernel boot down to 1 ms I might take another look at this. ;-)

You absolutely can compile everything you need into the kernel -- in fact for Firecracker you have to since you don't have a boot loader handing you multiple kernel modules. But I needed to write the code to support more than just Firecracker.


> If I get the entire kernel boot down to 1 ms I might take another look at this. ;-)

I look forward to the upcoming boot-time competitions between FreeBSD and Linux. :)

> You absolutely can compile everything you need into the kernel -- in fact for Firecracker you have to since you don't have a boot loader handing you multiple kernel modules. But I needed to write the code to support more than just Firecracker.

Absolutely. But it seems reasonable to have, for instance, link-time sorting support that only works when building a kernel that doesn't support kernel modules.


Maybe. The tradeoff is that having two different ways of doing something makes it more likely that one will break without anyone noticing.


Valid. Especially in early boot.

Another possibility: always use link-time sorting for the kernel, and for individual modules, and then do runtime merging if and only if you have kernel modules. That way, the link-time sorting path is always tested, and the runtime merging is tested anytime you have kernel modules (which sounds like the more common case).

(All of this, though, is gated under "is it worth saving the microseconds yet".)


Why not boot as a compilation step, and save a memory instance to be loaded on actual boots.


Probably because that would assume far too much about the state of the hardware.


No good reason. It would just require something fiddly with the linker / build process. This is good enough.


"1 files changed, 86 insertions, 91 deletions"

Net reduction of 5 LOC, and it's "100x faster".

Nice commit.


You'd think some algorithms would be considered the near-always-wrong solution for the problem, and so never used outside very specific use cases (like a tiny microcontroller), but here we are.


Why do you boot the system anyway? Wouldn't it be faster to dump the memory of an already-booted system and just read that in?


That paradigm used to be a thing for things like home-computer games-- the game might take 5 minutes for an anemic Commodore 1541 drive to load, so you'd use a "freeze" cartridge that snapshotted RAM after the load finished and produced something smaller; I suspect a major side appeal was that you bypassed copy-protection that involved the loading process.

On a more complex system, with external devices active at all times, this gets a lot harder. It would be easy to take a snapshot at the "wrong" time, when something is deadlocked on a timer or an external device that won't be in an appropriate state at the next power-on. A "boot" process ensures everything attached has been roused from its slumber and brought back to a known state.

OTOH, I could imagine if you were designing to a specific enough use case, you could design a system that relied on a minimal support devices and a CPU that bit-banged almost everything, so you knew if you were in the idle loop, everything was safe to snapshot.


Would it?

Read in small amount of code (or execute directly from flash) & use that to initialize hardware, might just be faster than read memory dump of already-initialized system.

Also, the "read memory dump" method would include code & data structures which were used on a previous run but may not be needed for the next run (or only much later). And it would not re-initialize hardware which may have gone flaky or changed configuration in the meanwhile (like a USB port with a different device plugged in vs. the state the memory dump reflects).

Rebooting is just an all around cleaner method to return system to a known state. But of course it all depends on hardware specifics & what type(s) of initialization is done.


The context here is starting a vm really fast on aws firecracker, so those concerns about hardware don't apply.


That's what Windows fast startup does: it starts from a post-boot hibernated image of RAM. Linux Mint on the same laptop boots fully in less time.


Would be insecure because nothing would be randomized, all your systems would have the same ASLR slide, etc.


iOS ~13 seconds to boot

FreeBSD ~10 seconds to boot

As a comparison point, my iPhone 14 Pro (latest phone) running iOS 16.6 (latest OS) boots in 12.7 seconds from when the Apple logo is displayed to when you can see the wallpaper.


"This is good for LLMs"


100x speed on what though? bootup time?


> When the FreeBSD kernel boots in Firecracker (1 CPU, 128 MB RAM), it now spends 7% of its time running a bubblesort on its SYSINITs.

It’s a 100x speed increase on 7% of the boot time, so it should be approximately a 7% decrease in boot time.


EDIT: I misread—please disregard the below comment.

3.5% decrease.

A 100% increase in speed is a 50% reduction in time.

E.g., traveling 60km at 30km/h takes two hours, but at 60km/h takes one hour.


A 100x increase is a 9900% increase, not a 100% increase.


Thank you, that's what I was interested in. Not sure why I got downvoted, it was a legitimate question as I didn't know what parts were adversely affected.


You're misreading 100x as 100%. 100x is 10,000%.


100x not 100%. 3,000 km/h takes 1.2 minutes to go 60km.


Boot time was reduced by 2 ms, from 28 ms to 26 ms. Hardly a 100x speedup.


The mergesort is roughly 100x faster than the bubblesort was. I made no claim in that tweet about the overall speedup.

(But since you mention it, when I started working on speeding up the boot process, the kernel took about 10 seconds to boot, so I have a kernel booting about 400x faster now than I did a few years ago.)


Why does any sorting anywhere use bubble sort at all? Aren't there sorts out there that are strictly better?


Options for sorting algorithms are severely limited when (a) you can't use malloc, (b) you don't have enough stack to recurse, and (c) you want a stable sort.

(The first two limitations are because this happens very early in the boot process.)


Thanks! I completely forgot about the malloc/in-place aspect.


Why the harsh stack limit? I assume this precludes a fixed-size array.


I'm not sure when this sort happens, but early boot may not have a proper virtual memory implementation yet, so the kernel has to use a fixed stack size allocated by the boot loader.


Right, in this case specifically we're running off a limited bootstack allocated in the kernel's .bss (somewhere between 2 to 6 pages, generally); we won't finish initializing VM until shortly after this sort is done (in some of the SYSINITs that we're sorting).


Doesn't mergesort require malloc?


There are in-place array mergesorts. (Nobody uses them because they're horribly complicated, though.)

In this case however I'm putting all of the SYSINITs onto a linked list; mergesorting a linked list is easy.
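In case anyone's curious what that looks like, here's a sketch of the classic bottom-up linked-list mergesort (illustrative only: `struct node` and `key` are stand-ins, not FreeBSD's actual SYSINIT structures). It needs no allocation and no recursion, just relinking, so it works even with a tiny fixed stack:

```c
#include <stddef.h>

/* Stand-in node type for illustration; FreeBSD's SYSINIT list differs. */
struct node {
	int key;
	struct node *next;
};

/* Stable bottom-up mergesort on a singly linked list: no malloc,
 * no recursion. Merge runs of length 1, then 2, 4, ... until only
 * one run remains. */
struct node *
list_msort(struct node *list)
{
	size_t insize;

	if (list == NULL)
		return (NULL);

	for (insize = 1;; insize *= 2) {
		struct node *p = list, *tail = NULL;
		size_t nmerges = 0;

		list = NULL;
		while (p != NULL) {
			struct node *q = p;
			size_t psize, qsize;

			nmerges++;
			/* q starts insize nodes after p (or at the end). */
			for (psize = 0; psize < insize && q != NULL; psize++)
				q = q->next;
			qsize = insize;

			/* Merge the run at p with the run at q. */
			while (psize > 0 || (qsize > 0 && q != NULL)) {
				struct node *e;

				if (psize == 0) {
					e = q; q = q->next; qsize--;
				} else if (qsize == 0 || q == NULL) {
					e = p; p = p->next; psize--;
				} else if (p->key <= q->key) {
					/* "<=" keeps the sort stable. */
					e = p; p = p->next; psize--;
				} else {
					e = q; q = q->next; qsize--;
				}
				if (tail != NULL)
					tail->next = e;
				else
					list = e;
				tail = e;
			}
			p = q;	/* advance to the next pair of runs */
		}
		tail->next = NULL;

		if (nmerges <= 1)	/* a single run: sorted */
			return (list);
	}
}
```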


sure, but bubble sort is trivial to write. they may have figured they'd replace it when they needed to. they may have figured the particular usage was so trivial they would never need to.

grabbed the source and git blamed the file before the patch to see when the original code was from.

    94e9d7c12 (Peter Wemm              1998-10-09 23:42:47 +0000 157)  * The sysinit table itself.  Items are checked off as the are run.
    94e9d7c12 (Peter Wemm              1998-10-09 23:42:47 +0000 158)  * If we want to register new sysinit types, add them to newsysinit.
So the section was written in 98 and had some bits modified in 2001.

The time wasted during boot here was probably irrelevant next to a hundred other things the project has worked on since.

the bubble sort was good enough that 25 years later they replaced it to save only 2ms during boot.

seems reasonable to me.


You'd still think it'd be easier to write an insertion sort though, even accidentally! It's almost the same algorithm conceptually, except cutting out an odd pessimization.


Bubble sort is optimal for sorting data stored on tapes (no random access possible). If you have RAM, insertion sort is strictly superior (both simpler code + better performance).


On tapes (plural ≥ 3), isn’t merge sort the way to go?

I just coarsely checked TAOCP volume 3, and didn’t see bubble sort mentioned in the chapter on external sorting.


Fascinating, thanks!


Because bubble sort is often faster for small sorts and it's easy to write. This is the kernel; there's no libc, so you have to write the sort algo yourself.


It's really because C doesn't have a proper dependency management system like modern languages do. If it did there's no reason you couldn't import a third party sorting algorithm in the kernel. You can do that in Rust for example, although you would just use the built in sort in this case.


It's not because of that. There is countless BSD-compatible code out there, and much of it is taken and used in projects like this, CRC code being an obvious example. There is no reason sort code could not have been grabbed from somewhere; no "dependency management system" required. The FreeBSD kernel already has its own sort libraries in the code too -- https://cgit.freebsd.org/src/tree/sys/libkern/qsort.c?id=9a7...

The real reason is that this is an early boot environment, with exact constraints that are unusual if not unique to FreeBSD, where the facilities to support your general language / runtime environment are not up yet.


> the facilities to support your general language / runtime environment is not up yet.

Rust's [T].sort_unstable() is part of core, that is, the code to sort a slice of arbitrary type T doesn't need the allocator, operating system etc.

Only if you want a fast stable sort do you need to wait for such an environment because their fast stable sort needs an allocator, I doubt that FreeBSD needs a stable sort here.


It does need to be stable, which is why the replacement is mergesort.

The SYSINITs need to bring up the system in a particular order. I dunno why there are both explicit ordering (the sort keys) and implicit ordering (the stability requirement), perhaps cperciva can chime in.

Based on when I last looked at this code, I guess the explicit ordering is to do with startup phases, and the implicit ordering is because of the way SYSINITs are (or were?) implemented using linker sets, which have more guarantees about ordering than __attribute__((constructor)) and other similar functionality.
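For anyone unfamiliar with linker sets, here's a generic sketch of the mechanism for GNU toolchains on ELF (illustrative only: the `my_init_set` section name, `REGISTER_INIT` macro, and `run_inits` walker are my inventions; FreeBSD's real `SYSINIT`/`SET_DECLARE` machinery differs in detail). Objects are dropped into a named section, and because the section name is a valid C identifier the linker automatically provides `__start_`/`__stop_` symbols bounding it:

```c
#include <stddef.h>
#include <stdio.h>

/* An entry in our hypothetical init set. */
struct initfn {
	int order;		/* explicit sort key */
	void (*fn)(void);
};

/* Place an entry into the "my_init_set" section; "used" keeps the
 * compiler from discarding the apparently-unreferenced object. */
#define REGISTER_INIT(name, ord, func)					\
	static const struct initfn name					\
	    __attribute__((section("my_init_set"), used)) =		\
	    { (ord), (func) }

/* GNU ld provides these automatically for identifier-named sections. */
extern const struct initfn __start_my_init_set[];
extern const struct initfn __stop_my_init_set[];

static void hello(void) { puts("hello from a linker set"); }
REGISTER_INIT(hello_init, 10, hello);

/* Walk the set in link order, returning the number of entries run;
 * real code would first sort by ->order (which is exactly the sort
 * this whole thread is about). */
size_t
run_inits(void)
{
	const struct initfn *p;
	size_t n = 0;

	for (p = __start_my_init_set; p < __stop_my_init_set; p++, n++)
		p->fn();
	return (n);
}
```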


> I dunno why there are both explicit ordering (the sort keys) and implicit ordering (the stability requirement), perhaps cperciva can chime in.

In the event Colin is looking at this sub-thread now I'm intrigued too.


There isn't supposed to be an implicit ordering requirement. But a number of bugs have happened in the past because people didn't order properly.

What we need isn't actually "stable" so much as "consistent from one boot to the next" to make sure that ordering bugs don't end up being heisenbugs.


> a number of bugs have happened in the past because people didn't order properly.

This may be impractical, but I think my reaction would be to re-design it so that the ordering was mechanically a full order, even if culturally the people writing these things don't need to specify if they don't want to.

e.g. maybe a macro can turn your existing explicit order into high bits of a value, and then a "noise" factor into low bits, where that "noise" is derived from a date integer like 20230822 or filenames or whatever, and truly order on the entire value.

The idea is, this means any hidden order is the same for everyone, Intel laptop test rig, ARM WiFi cameras at customer site, last week's build, this week's build, Colin's build, the CI system's build, they're all using the same order, because while heisenbugs are strictly worse, "It only happens with my setup" is also extremely frustrating.

When people discover a hidden ordering requirement they can adjust the "real" ordering accordingly, just now there's a deliberate systemic consistency.

Probably not worth all the bother, but I think I couldn't live with the "stable sort to avoid heisenbugs" situation.
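A sketch of what such a composite key might look like (entirely hypothetical: the function name, field widths, and the FNV-1a filename hash are inventions for illustration, nothing like this exists in FreeBSD):

```c
#include <stdint.h>

/* Hypothetical: pack the real, explicit ordering into the high bits
 * and a deterministic per-entry "noise" tiebreak into the low bits,
 * so the total order is identical on every machine and every build.
 * Layout: 16 bits subsystem | 16 bits order | 32 bits tiebreak. */
uint64_t
sysinit_order_key(uint32_t subsystem, uint32_t order, const char *file)
{
	uint32_t noise = 2166136261u;	/* FNV-1a offset basis */
	const char *p;

	/* FNV-1a over the defining file's name: deterministic "noise". */
	for (p = file; *p != '\0'; p++) {
		noise ^= (uint8_t)*p;
		noise *= 16777619u;	/* FNV-1a 32-bit prime */
	}
	return (((uint64_t)(subsystem & 0xffff) << 48) |
	    ((uint64_t)(order & 0xffff) << 32) | noise);
}
```

Comparing keys directly then gives a full order: the explicit fields dominate, and any two builds break ties the same way.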


I'm not sure what part of my post you are addressing.

If you had a kernel written in Rust, you would have to bring up the rust runtime environment as part of the boot process. You don't just magically get it because that's the language you used, whether it is "core" or not. You can't even execute simple functions before you have set up stack, and in this case apparently there is a constraint on stack usage. And evidently they do need a stable sort.

In any case my main point is that it's not anything to do with dependency management, not quibbling about whether or not some language could support this function at this exact point of the boot.


> you would have to bring up the rust runtime environment as part of the boot process

Like the existing C runtime environment, this just isn't a big deal by the time we've got here. If you're the code for initialising the DRAM controller then sure, that's pretty intimidating. But this is all happening much later, we're part of an operating system, we were already loaded from disk.

> And evidently they do need a stable sort.

How so? They appear to even have a deliberate tie-breaking mechanism in their data structure, which suggests if order actually matters you're expected to specify when creating the data, not rely on stability to do what you expected.


> > you would have to bring up the rust runtime environment as part of the boot process

> Like the existing C runtime environment, this just isn't a big deal by the time we've got here. If you're the code for initialising the DRAM controller then sure, that's pretty intimidating. But this is all happening much later, we're part of an operating system, we were already loaded from disk.

I don't know what you're trying to say. The code we are discussing here is in a constrained environment where the regular kernel runtime facilities are not all available. This was cited as one of the reasons for open-coding a very simple algorithm.

> > And evidently they do need a stable sort.

> How so?

From reading comments from patch author here.


> the facilities to support your general language / runtime environment is not up yet.

What facilities does Rust's sort require? It probably just needs a bit of stack but that's it.

That qsort implementation doesn't look like it needs anything either tbh, though I only skimmed it.


Not sure; the qsort is recursive, and stack limitations were mentioned. But the point is that they don't use their library functions because it's a constrained environment.

Maybe they could permit some of their sort library functions to be used in such a constrained environment, sure. Don't need rust to do that, just need to be happy that the implementation is suitable for purpose. Which they would have to do regardless of what language and runtime they were using.


Way to go.

Do you know how fast Linux starts up? I found the number 125ms but that’s a 4 year old number.

Just curious how similar they are. This is the first I’ve heard of Firecracker.


I believe Linux is at 75-80 ms for the same environment where I have FreeBSD booting in 25 ms. That comes from a Firecracker benchmark which I think is an apples-to-apples comparison with what I'm measuring in FreeBSD, but I'm not 100% certain. Once FreeBSD support is merged into mainline Firecracker (it's currently in a feature branch) I'm going to see if I can apply their performance tests to FreeBSD.


I’d purchase a simpler, easier to debug algorithm over a complicated one for 2ms. But that’s me…

The overall speedups are still quite impressive - but for “thousands” of data items a simple O(n^2) algorithm seems fine, doesn’t it?


Totally agree but the title makes it sound like it’s 100x at init. I know you didn’t write the title…


welcome to amdahl's law


Hey what's amdahl's law. Can you explain


Amdahl's law is common sense, but also one of the reasons premature optimization is the root of all evil.

A small speedup on the longest (by time) part of the program is usually better than infinite speedup on a short part of the program.


The Wikipedia article is pretty good, but tldr: "the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used."

https://en.wikipedia.org/wiki/Amdahl%27s_law

Usually used in the parallel computing context, but works here too.
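Putting numbers on it for this thread (a sketch, using the 7% and 100x figures quoted above):

```c
/* Amdahl's law: the overall speedup when a fraction p of total
 * runtime is accelerated by a factor s. */
double
amdahl(double p, double s)
{
	return (1.0 / ((1.0 - p) + p / s));
}
```

With the thread's numbers, `amdahl(0.07, 100.0)` ≈ 1.0745, i.e. about 6.9% less wall time, matching the "approximately a 7% decrease in boot time" computed upthread. And no matter how much faster the sort gets, the whole boot caps out at 1/0.93 ≈ 1.075x.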


Same thing happens when driving - your trip duration usually depends on the slow bits not the faster parts e.g. in NZ driving through small towns will often be the longest part of your trip time.


So if you have something that takes 1 second out of 10 seconds, you can't improve the overall system more than 1 second (by removing it entirely).


That's not true, because in addition to taking 1 second it could be emptying all your caches. Performance isn't just adding up wallclock times.


Something about 5% of things can’t be parallelized which then ends up dominating performance after the 95% of things that can be parallelized are.


In hardware engineering this is called critical path.


That is about latency, but Amdahl's law is about throughput I think.


Amdahl's law is not about latency or throughput, it is about optimizing one part of a program having diminishing returns relative to the part that isn't optimized. It is not really a law, it is more common sense that most people would never assume needs to be explicitly said.


Okay, but what exactly is 100 times faster when measured? Simply the sorting algorithm? If it not be a bottleneck to something then it doesn't matter much.

I assume the entire performance of FreeBSD hasn't improved a hundredfold by this small change.


They explain this further on in the Twitter thread. It's the portion of time spent sorting the list of SYSINITs during boot. This didn't use to matter when it took longer than a second to boot up, but now that boot times are in the sub-second range, the time spent sorting ends up being a notable chunk of the boot.

Might not matter for the things you care about, but some people certainly do care about boot performance.


The author mentions in the thread that the boot time they measured on the device was 28 ms and 7% of it (2 ms) was spent sorting.


On the sorting of SYSINIT (not sysint as the title says) entries during bootup. The actual speedup is only a couple of milliseconds, which means the old method itself took on the order of a couple of milliseconds. That is indeed a lot for a single sorting operation.


100x faster on 7% of Firecracker boot

So a ~2 ms speedup


Well, it's a bit of an embarrassing moment.

Bubblesort has been controversial in many CS programs. The memo even reached the President ten years ago.

[0]https://youtu.be/koMpGeZpu4Q



