
Debugging Your Operating System: A Lesson in Memory Allocation - jsnell
https://lukasa.co.uk/2016/12/Debugging_Your_Operating_System/?hn=1
======
eridius
The article has been updated at this point to have a long header talking
directly to Hacker News commenters, and explicitly asks us to challenge HN's
censoring of political content. This is quite strange, as that experiment has
already ended and HN is no longer censoring political stories.

------
kr7

        void *calloc(size_t count, size_t size) {
            assert(!multiplication_would_overflow(count, size));
    
            size_t allocation_size = count * size;
            void *allocation = malloc(allocation_size);
            memset(allocation, 0, allocation_size);
            return allocation;
        }
    

Should really return (void *)0 on overflow, instead of asserting.

~~~
sp332
I think assert() is usually used by people who are going to just turn them off
during a "production" build (with -DNDEBUG).

~~~
loeg
And GP is right that in that case, you really want calloc to return NULL
rather than an overflowed (short) non-NULL allocation.
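
A minimal sketch of the variant suggested above, where overflow yields NULL instead of tripping an assert. The name `checked_calloc` is hypothetical (chosen so it does not clash with the real calloc), and the explicit division check stands in for the `multiplication_would_overflow` helper, whose definition is not shown in the thread. It also guards the memset against a failed malloc, which the quoted version does not.

```c
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memset */

/* Hypothetical name; a sketch, not any libc's actual implementation. */
void *checked_calloc(size_t count, size_t size) {
    /* count * size overflows exactly when count > SIZE_MAX / size. */
    if (size != 0 && count > SIZE_MAX / size) {
        return NULL;
    }

    size_t allocation_size = count * size;
    void *allocation = malloc(allocation_size);
    if (allocation != NULL) {
        /* Only zero the block if malloc actually succeeded. */
        memset(allocation, 0, allocation_size);
    }
    return allocation;
}
```

Because the check is an ordinary branch rather than an assert, it survives a build with NDEBUG defined, so the caller never receives a short non-NULL allocation.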

------
Someone
I am not sure I would call this a bug. Is _"call calloc, free, and calloc,
without ever touching the memory"_ really a use case one should optimize for?
One could argue that improving performance for this case only is worthwhile if
it can be done in a way that barely impacts anything else.

Edit: apologies for being somewhat ambiguous. The "bug" I refer to is the
Radar issue the writer of the blog created, not the performance issue with the
Python code.

~~~
CalChris
The author had a bug report in front of him. The behavior was repeatable and
certainly unexpected. That is worth tracking down.

I've tracked down corner cases like this myself and been correctly told by
developers that they're not worth fixing. But you don't know that _a priori_.

~~~
eridius
The bug report wasn't caused by calloc, free, calloc being expensive. It was
caused by making a call that did "malloc + memset" when only malloc was
needed. The code in question wasn't even invoking calloc at all, it just ended
up being an excuse to investigate zeroing memory behavior.

Incidentally, the blog post asks "why was the memory being actively zeroed at
all?" but never actually answers it.

------
Animats
This may be over-optimization. Allocating all-zero pages as copy-on-write
doesn't reduce zeroing overhead. It just postpones it until the page is used,
at which time zeroing takes place. Possibly at a higher cost, because you now
have to take a page fault, allocate a page, mess with the MMU, and only then
zero the page. Only for programs which allocate vast amounts of memory which
they then never write is this a win.

Is that common?

~~~
brigade
If you zero upfront, you have to do all of that save the page fault anyway,
but for large allocations you also thrash the cache zeroing stuff that's
immediately evicted to memory. It really is faster in general to zero a page
when the program actually touches it, since that's when the program wants it
in cache (for certain thresholds of "large allocation").
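
One rough way to see the trade-off being argued here is to time a large calloc with and without touching its pages. This is only a sketch: `time_calloc`, `touch_every_page`, and the 4096-byte page size are assumptions made for illustration, and whether calloc defers zeroing at all depends on the allocator and the OS.

```c
#include <stdlib.h>
#include <time.h>

/* Assumed page size; real code would ask the OS (e.g. sysconf on POSIX). */
#define ASSUMED_PAGE_SIZE 4096

/* Write one byte per assumed page so the OS has to back each page. */
static void touch_every_page(volatile unsigned char *buf, size_t len) {
    for (size_t off = 0; off < len; off += ASSUMED_PAGE_SIZE) {
        buf[off] = 1;
    }
}

/* Returns CPU seconds spent allocating len zeroed bytes, plus touching
 * them when touch is nonzero. Returns -1.0 if the allocation fails. */
double time_calloc(size_t len, int touch) {
    clock_t start = clock();
    unsigned char *buf = calloc(1, len);
    if (buf == NULL) {
        return -1.0;
    }
    if (touch) {
        touch_every_page(buf, len);
    }
    clock_t end = clock();
    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Comparing `time_calloc(len, 0)` with `time_calloc(len, 1)` for a large `len` gives a crude picture of how much work is deferred until first touch. Treat the numbers with suspicion: an optimizing compiler may remove an allocation that is never used, and `clock()` has coarse resolution on some platforms.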

~~~
haimez
"Thrashing to memory". Man, we must have different SLOs.

------
nxc18
The article mentioned testing on macOS and Linux implementations. I was
curious and I tried the author's demo program on Windows Subsystem for Linux;
it performs quite well, running almost instantly on a low-end Surface 3.

It's definitely interesting to see these very low-level implementation details
bubble up to the surface from time to time.

~~~
this-dang-guy
I find the optimizations tend to derive from real use cases.

In this case, both Windows and Linux are used for huge server farms. macOS not
so much. Apple probably doesn't spend the energy optimizing this sort of thing
since it's less likely to cause trouble for them.

------
deathanatos
Zeroing a 100MB buffer is pegging the CPU? It takes ~30ms to zero such a
buffer on my machine, which seems to imply that `iter_content` here is
allocating a buffer on each iteration of the loop. But if `iter_content` is
just returning a bytes object, this would make complete sense.

One could perhaps pass in a bytearray object, or even have it return the
_same_ bytearray object each loop, with a very large warning on the tin about
how you will want to take a copy if you want to keep it outside the loop's
body. (i.e., the returned bytearray is owned by the iterator, and you're
merely borrowing it for the loop; this is a potentially confusing API but
would reduce the memory allocation churn significantly: you'd only alloc your
buffer once, instead of O(download size) times.)
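
The borrowed-buffer idea can be sketched in C, the language of the snippet quoted earlier in the thread. This is an analogue of the proposal, not the requests API: `chunk_iter` and its functions are hypothetical names, and a `FILE *` stands in for the download stream.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical iterator: owns one buffer and reuses it for every chunk. */
typedef struct {
    FILE *src;
    unsigned char *buf;
    size_t cap;
} chunk_iter;

/* Allocates the buffer once (cap must be nonzero). Returns 0 on success,
 * -1 if the allocation fails. */
int chunk_iter_init(chunk_iter *it, FILE *src, size_t cap) {
    it->src = src;
    it->cap = cap;
    it->buf = malloc(cap);
    return it->buf != NULL ? 0 : -1;
}

/* Reads the next chunk into the shared buffer and returns its length
 * (0 at end of input or on error). *out is borrowed: it is only valid
 * until the next call, so copy the bytes if you need to keep them. */
size_t chunk_iter_next(chunk_iter *it, const unsigned char **out) {
    size_t n = fread(it->buf, 1, it->cap, it->src);
    *out = it->buf;
    return n;
}

/* Releases the single buffer owned by the iterator. */
void chunk_iter_free(chunk_iter *it) {
    free(it->buf);
    it->buf = NULL;
}
```

Every call hands back the same pointer, so the whole transfer costs one allocation regardless of size. The price is the aliasing hazard described above: a caller that stashes the pointer will see its contents change on the next iteration.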

~~~
AstralStorm
At gigabit Ethernet speed (in pps) setting big chunks of memory per packet
happens to be expensive. At 10GbE you actually may need specialised hardware.

------
bluefox
> It is clear that the original intent of calloc is to allocate heap memory
> for arrays of objects in a safe way

This again? Just 4 days ago there was an article making up the same nonsense.

~~~
qwertyuiop924
>This again? Just 4 days ago there was an article making up the same nonsense.

Seems perfectly reasonable. Care to enlighten me?

~~~
bluefox
Please give historical evidence that the original intent behind calloc was to
allocate heap memory for arrays of objects in a safe way.

~~~
jsjohnst
That was the tiniest of points in this article (unlike the last one). This
article is far more useful and accurate to the average person.

~~~
bluefox
I agree. It's interesting that two seemingly independent sources make the same
unfounded claim about this obscure C function so close in time. Since I
replied in the first case, I was sensitive to the second case. Perhaps someone
should make a third attempt so that we can conclude that it's a conspiracy to
rewrite the history of calloc.

~~~
jsjohnst
I am pretty positive they aren't unrelated. Both mentioned Python's requests
library and PyOpenSSL if memory serves.

If I had to hazard a guess, it felt like the first one was potentially written
by the reporter of the initial bug, the second by the actual developer who
investigated it. This is just a guess, but I bet if you reread both you'll
likely agree with me.

------
blibble
seems to me the more accurate title would be "debugging a function in libc"

------
dang
We merged comments here from
[https://news.ycombinator.com/item?id=13110615](https://news.ycombinator.com/item?id=13110615)
because of an annoying limitation of our software. Feel free to check that
thread if you want the gory details, but let's keep this thread on topic.

