
Android Project Changeset 4f8b683: libc/memset.c - jey
https://review.source.android.com/#patch,sidebyside,14699,1,libc/memset.c
======
pkaler
Forget the typo. This is the slowest, most naive implementation of memset
possible.

Is this actual shipping code? I've followed the changeset up to the top and it
looks like it's in platform/bootable/bootloader/legacy. Hopefully this isn't
actually called anywhere, but the review date says May 13th.

What's the point of optimizing the Java VM when functions like memset, malloc,
memcmp, memcpy, memchr, etc, etc haven't even been optimized for the platform.

EDIT: I understand that 95% of your time will be writing in a language like
Ruby, Python, Javascript or a C based language. But dear god, please add Agner
Fog's Optimization Manuals to your reading list.
<http://www.agner.org/optimize/>

~~~
ars
How would you write this function? Fully portable, so no ASM.

~~~
tptacek
You'd unroll the loop. But knowing the average workload for memset(), it
probably wouldn't be a perceptible win.

~~~
ars
With todays CPU, particularly with branch prediction, and cache, loop
unrolling is not always better.

In fact I bet the performance would be worse, especially because you have to
worry about setting areas of memory that are not a multiple of the loop unroll
size. (I guess you could use duff's device.)

I actually tested it while ignoring that, and the unrolled version took 34.551
seconds vs 34.239 for the regular version (but those number are meaningless
since the variation in time between runs is greater than the difference in
runtime between versions).

And see:
[http://lkml.indiana.edu/hypermail/linux/kernel/0008.2/0171.h...](http://lkml.indiana.edu/hypermail/linux/kernel/0008.2/0171.html)
\- loop unwinding is not worth it anymore.

So I ask you again pkaler: how would you speed this up?

~~~
sshumaker
I would imagine writing the native register size bytes per iteration would be
a win, with a little cleanup for the remaining bytes. For a 32-bit
architecture, you'd typically be doing 1/4 the iterations.

~~~
JoeAltmaier
Far more involved than that. There are alignment issues. Its several machine
cycles faster to do aligned store of a large scalar e.g. 32-bit. So a
responsible implementation would do a head check for alignment, store 1,2 or 3
bytes efficiently, then do aligned stores of 32-bit values (or 64-bit or
whatever native size is appropriate), then a tail check for the small change.
OR better yet switch(p & 3) and handle each case with unrolled code.

------
spolsky
Aren't there are processor-specific assembler versions of memset for common
chips? There are usually assembler versions (for example using REP MOVS on
Intel which is light years faster) or even unrolled C versions (typically
setting 8 or 16 bytes per iteration) that are used. I suspect this might be
"fallback code" that doesn't actually get executed on any chips that anybody
really uses.

~~~
faragon
For this case, the required x86 opcode could be "rep stosX" (just write), and
not "rep movsX" (copy). Anyway, on x86 processors, "rep movsX" was faster from
8086 to 80386, but since the 80486, it is faster to do assembly unrolling for
reducing the jump penalty (as the "rep movsX" doesn't do unrolling, being its
conditional jump a penalty). On CPUs with prefetch hardware support -e.g.
SSE2-, you could still speed it up a bit more, by fetching/ordering future
cache line before needed, avoiding pipeline stalls.

The shown example, that just writes to RAM, is probably just fast enough (it
is harder to optimize read cases), although a bit of unrolling could reduce
jump penalty impact (specially on non superscalar or in in-order superscalar
CPUs -typical ARM included on handheld devices-).

~~~
chadaustin
While we're jumping down the architecture pedant rabbit hole, a simple loop
like that will be trivially predicted, so the branch will be basically free.
In addition, hardware prefetchers do a much better job at predicting linear
memory access than manual prefetch instructions. On Core 2, iirc, if you have
2 or 3 L2 misses at fixed offsets either direction from each other, the
hardware will automatically begin prefetching memory so it's there when you
need. The problem with manual prefetch instructions is there high latency.
They're best for hinting to the processor that you're about to make an
unpredictable load.

------
humbledrone
The thing that shocks me about this bug is that it should have been caught
when the compiler issued a warning about an unused function parameter. Either
the warning was not enabled, or the programmer just ignored it; either of
those options indicates a pretty severe lack of professionalism.

~~~
klodolph
"Pretty severe" lack of professionalism? It's not unprofessional not to turn
the warnings all the way up and sift through each and every one. Some of us
work code inherited from other people with different coding guidelines, and
some of the warnings will generate loads of messages for code that is not only
technically correct but stylistically correct as well. The unused parameter
warning is one of the biggest offenders in this area. And if you have a
hundred warnings for unused parameters, which ones do you pay attention to?

Another potential issue is that the code in the patch might almost never be
called in the first place. Most libc implementations (three that I know of,
where I've examined the "memset" source) have a "memset" like that as a
fallback only for platforms without an assembly version.

I'd say the problem is that there are several different ways this error could
have been detected sooner: warnings, test, static analysis tools, and review,
but none of them worked. But again, it's also possible that the code is never
called because assembly versions are always available.

~~~
queensnake
| The unused parameter warning is one of the biggest offenders in this area.
And if you have a hundred warnings for unused parameters, which ones do you
pay attention to?

Well, so then you're exposed to this error. Do:

    
    
        void foo(int x)
        {
           (void)x;
        }
    

to silence unused parameter warnings.

~~~
tptacek
Go look at Google code search, which includes vast quantities of code written
by professional teams and (often later) open sourced. Very little of it casts
expressions to (void). When I started at Arbor, I got rid of all the (void)
casts. What an eyesore.

Very little C code is const-correct either. Most of the C devs I know make fun
of const-correctness.

~~~
humbledrone
| What an eyesore.

It's only an eyesore if you don't acknowledge the benefits that strict
warnings can bring to the table.

| Most of the C devs I know make fun of const-correctness.

I for one don't mind putting in a tiny bit of upfront effort so that my
compiler can double-check my work. Do the C devs you know also make fun of
assert?

~~~
tptacek
I don't acknowledge the benefits of casting things to (void). Therefore:
definitely an eyesore.

And, no. Nobody makes fun of assert. Which rather makes my point for me.

~~~
ehnus
If assertions disappear in certain optimized builds then what do you do, or
recommend, to deal with warnings related to unused parameters or values if you
don't believe that casting to void has value?

~~~
scott_s
As I pointed out above, in gcc you can explicitly mark variables as
potentially unused.

~~~
ehnus
And what do you suggest for MSVC, which doesn't support attribute unused or
have any comparable declspecs?

~~~
scott_s
If what you say is true, then you have to either put up with the warnings,
compile at a level that doesn't have the warnings, or figure out a hack around
them - I don't like any of those options.

I'm not as dogmatic as some others are in the thread. I just think it's a good
idea to be warned when a variable is unused, enough that I'll mark my very few
that might be #ifdef'ed out. If you're using a compiler that doesn't let you
mark variables as such... well that sucks. You do what you can.

------
niallsmart
Ironic that this makes the homepage the same day as an article bashing unit
tests :)

------
audidude
Interestingly, they have the OpenBSD version that does word-size assignments
in asm for x86/arm. wonder why it wasn't being used.

[http://android.git.kernel.org/?p=platform/bionic.git;a=blob;...](http://android.git.kernel.org/?p=platform/bionic.git;a=blob;f=libc/arch-x86/string/memset.S;h=1059ccc99516e277af6e04ebc7b078c4f37d2e6d;hb=HEAD)
[http://android.git.kernel.org/?p=platform/bionic.git;a=blob;...](http://android.git.kernel.org/?p=platform/bionic.git;a=blob;f=libc/arch-
arm/bionic/memset.S;h=93abe15a2f9eb647c5d282310f5c7cf241457cd6;hb=HEAD)

~~~
fhars
It was. That's why this bug hasn't been caught before: the buggy code was
never actually compiled as part of a production build. Real builds always use
the version optimized for the actual hardware platform.

------
mahmud
Some libc hacker has fond memories of _bzero(3)_ , tsk tsk ;-)

------
jared314
Can someone who lives in the world of C fill in the blanks? If this was a
major typo, I have done things along those lines.

~~~
justinweiss
memset sets a block of memory to a given value. It is often used to initialize
an uninitialized block of memory to 0. So often, in fact, that this bug where
it _always_ initialized the memory to 0 survived so long.

~~~
abstractbill
_So often, in fact, that this bug where it always initialized the memory to 0
survived so long._

Given that the bug went undetected for so long, I think you could almost
reasonably claim that calling memset with a nonzero parameter is a "corner
case" ;)

~~~
Psyonic
It is a "corner case," or close to it, but that's not why it wasn't caught
here. I don't think this code is ever called, as explained above. It's default
code in case there isn't a platform specific version, but there pretty much
always is a platform specific version.

Link: <http://news.ycombinator.com/item?id=1379639>

------
snowbird122
Now this is hacker news. We don't see too many of these types of posts.

~~~
kelnos
No, this is The Daily WTF. "Oh look, someone wrote some really stupid code!"
Who cares? How does it enrich us to know that someone mis-implemented memset()
in a bootloader that possibly never even calls it, or if it does, probably
passes zero as the argument anyway? (Or is dead code because on any arch
anyone uses, memset() is implemented in asm.)

~~~
snowbird122
It enriched me to learn what memset was. I love hacker news, but I wouldn't
mind a few more posts that contain some code.

------
jacquesm
This is a typical cut-and-paste programmer bug, the person took 'memzero',
copied it and tried turning it in to memset.

Instead of special casing memzero as memset(p,0,count);

------
jwr
This bug might actually be difficult to find — 99% of the time memset is asked
to fill the memory with 0.

