
`three = 1` in the linux sourcecode - shark234
https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d25a55da5623399a2644f/fs/ext4/resize.c#L698
======
lifthrasiir
[https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...](https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d25a55da5623399a2644f/fs/ext4/resize.c#L652)

Variables `three`, `five` and `seven` are better described as
`next_power_of_three`, `next_power_of_five` and `next_power_of_seven`. Since
the `ext4_list_backups` function should iterate through 1 (= 3^0 = 5^0 = 7^0),
3, 5, 7, 3^2, 5^2, 3^3, 7^2, ... and 1 should not repeat three times, the
initial value of `next_power_of_three` (or any of others) should be 1 and
those of others should not be 1. The naming is a bit unfortunate (couldn't
they be `threes` etc., for example?) but actually makes sense.

~~~
Syssiphus
A perfect example of a missing code-comment.

~~~
dangrossman
The explanatory comment is 40 lines _earlier_ in the same file where the
function those variables are being passed to is defined. It need not be
repeated every couple lines; "being clear to outsiders linked to a specific
line of a specific file without context" is not a reasonable concern.

~~~
orclev
The function declared earlier is perfectly fine, the comment is sufficient to
explain what it's doing, but functions should be understandable simply by
reading their source and in that respect the linked function fails. This would
be really simple to solve with a simple comment, E.G.

    
    
      // current power of three, init to 3^0
      unsigned three = 1;
    

The fact that this looks like a bug at first glance is a pretty good
indication that there should be some explanation of what exactly it's doing.

~~~
awda
This is actually contrary to Linux kernel "good style". Functions should be
short and understandable. Comments should be on the top of the function,
describing what the function is for. Comments describing variables in
functions are discouraged.

~~~
resu_nimda
Are you supporting that dogma or just stating it?

This brings to mind the Orwell essay on language usage[1]: _Break any of these
rules sooner than say anything outright barbarous._

IMO having "three = 1" with no immediate context qualifies as barbarous. Yes
it would be best to rename the variable, but, failing that, just toss in a
freaking comment for common sense's sake.

[1][https://www.mtholyoke.edu/acad/intrel/orwell46.htm](https://www.mtholyoke.edu/acad/intrel/orwell46.htm)

------
simias
Read the comment above ext4_list_backups right above:

[https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...](https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d25a55da5623399a2644f/fs/ext4/resize.c#L652)

    
    
        /*
         * Iterate through the groups which hold BACKUP superblock/GDT copies in an
         * ext4 filesystem.  The counters should be initialized to 1, 5, and 7 before
         * calling this for the first time.  In a sparse filesystem it will be the
         * sequence of powers of 3, 5, and 7: 1, 3, 5, 7, 9, 25, 27, 49, 81, ...
         * For a non-sparse filesystem it will be every group: 1, 2, 3, 4, ...
         */
    

I'm not sure what's so noteworthy about that. There's ton of arcane code in
any kernel, as long as it's commented correctly there's no issue.

~~~
h2s
See this comment:
[https://news.ycombinator.com/item?id=7296586](https://news.ycombinator.com/item?id=7296586)

Good commenting is no substitute for good naming. For a variable containing
the number 1, "three" is a shitty name.

~~~
simias
Okay? Is that really worthy of being submitted to hn however?

The function being called is directly above this declaration. It's a static
function not used outside of this file. What are the chances that someone
would edit this code without understanding what "three" is used for in this
context? Pretty slim I wager.

It's good that people are auditing the linux source code but if you stumble
upon some weird looking code (which again, is not the case here in my opinion)
the right way to deal with it is not to post it on hacker news. Contact the
maintainer (look at the MAINTAINER file at the root of the kernel) if possible
with a patch to fix the issue.

~~~
orclev
Depends on the intent of submitting it. If his intent was to point out a
potential bug in the Linux kernel then yes, it shouldn't have been submitted.
On the other hand what has resulted has been a rather interesting debate about
the merits of variable naming vs. proper commenting, with some side commentary
about what a shitty name something like three and five make for variables.

------
keypusher
If only there was a system for fixing the code yourself instead of having to
post about it to a widely read tech site.

~~~
VierScar
It's not a bug, as shown by the currently top comment. It is however,
misleading and could be better documented or named. Still, it is at least a
little amusing, worth posting here?

~~~
spinlock
I think its worth posting. It's the first piece of code I've seen on hacker
news in a while. It also makes a great point about good variable names making
code clear.

------
cauliturtle
> There are only two hard things in Computer Science: cache invalidation and
> naming things.

> \-- Phil Karlton

~~~
ProblemFactory
I've always thought that the "naming things" refers to the more subtle problem
of giving things unique identifiers in distributed systems, rather than coming
up with variable names.

"Naming things" includes systems like MAC address allocation, IP address
allocation, DNS, URLs for documents, process IDs, name to inode mapping in
filesystems, autoincrement primary keys in databases, etc.

Any ideas on what Karlton really meant with the quote?

~~~
omh
I always assumed it was the more general "coming up with names" \- for
variables, hostnames, project names etc.

The non-human-readable things like MACs, IPs and auto-increment keys are easy.
But when I have to come up with a short, memorable and non-confusing name for
something like a script that can take me longer than writing the code.

~~~
ProblemFactory
Unique IDs are easy on a single machine, but (just like cache invalidation)
become surprisingly complex in a distributed system.

For example, how to assign unique MAC numbers to network cards _without a
centralised database_ that has to be modified for each card manufactured?
Vendors get address space blocks, and probably split these further into blocks
for factories and production lines.

Generating unique autoincrement row numbers in a distributed database or
process IDs in a cluster is also a complex problem if it has to be _faster_
than any communication between the nodes.

Designing the layout of IP addresses, architecture of DHCP and DNS, and the
very idea of URLs are fantastic pieces of Computer Science work. Easy to use,
but a hard problem for the original designers!

But of course, I have no idea what Karlton had in mind with the quote. "Naming
things" as "assigning unique identifiers" always felt appropriate with cache
invalidation, as both are problems that are easy for single cores/machines,
but very complex in distributed systems.

------
koverstreet
/yawn

[https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...](https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d25a55da5623399a2644f/drivers/md/bcache/bset.c#L406)

~~~
codereflection
The comment on that function is a bit more entertaining than the original
post.

------
jakobe
Lot's of people suggesting to add explanatory comments. I don't think that's a
good idea, it would be better to change the names of the variables:

    
    
        unsigned counter1 = 1;
        unsigned counter2 = 5;
        unsigned counter3 = 7;
    

or use an array:

    
    
        unsigned counter[3] = {1,5,7};
    

No more confusion. That being said, I doubt that the names of these local
variables is actually a real source of confusion and is not a problem that
needs to be fixed.

~~~
jcalvinowens
> or use an array

That's absolutely horrible: you've now introduced the possibility of
accidental out-of-bounds access where there previously was none, not to
mention that you just forced the compiler to put those variables on the stack
instead of using registers.

Contrived example:

    
    
      int array(int lol)
      {
            unsigned counter[3] = {1,5,7};
    
            for (int i = 0; i < lol; i++)
                    counter[i % 3]++;
    
            return counter[0] + counter[1] + counter[2];
      }
    
      int vars(int lol)
      {
            unsigned counter1 = 1;
            unsigned counter2 = 5;
            unsigned counter3 = 7;
    
            for (int i = 0; i < lol; i++)
                    switch(i % 3) {
                    case 0: counter1++;
                    case 1: counter2++;
                    case 2: counter3++;
                    }
    
            return counter1 + counter2 + counter3;
      }
    

The output is quite different (gcc -c -O3 -std=gnu99):

    
    
      Disassembly of section .text:
    
      0000000000000000 <array>:
       0:   85 ff                   test   %edi,%edi
       2:   c7 44 24 e8 01 00 00    movl   $0x1,-0x18(%rsp)
       9:   00 
       a:   c7 44 24 ec 05 00 00    movl   $0x5,-0x14(%rsp)
      11:   00 
      12:   c7 44 24 f0 07 00 00    movl   $0x7,-0x10(%rsp)
      19:   00 
      1a:   7e 5f                   jle    7b <array+0x7b>
      1c:   41 b8 01 00 00 00       mov    $0x1,%r8d
      22:   31 c9                   xor    %ecx,%ecx
      24:   be 56 55 55 55          mov    $0x55555556,%esi
      29:   eb 1f                   jmp    4a <array+0x4a>
      2b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
      30:   89 c8                   mov    %ecx,%eax
      32:   f7 ee                   imul   %esi
      34:   89 c8                   mov    %ecx,%eax
      36:   c1 f8 1f                sar    $0x1f,%eax
      39:   29 c2                   sub    %eax,%edx
      3b:   8d 04 52                lea    (%rdx,%rdx,2),%eax
      3e:   89 ca                   mov    %ecx,%edx
      40:   29 c2                   sub    %eax,%edx
      42:   48 63 c2                movslq %edx,%rax
      45:   44 8b 44 84 e8          mov    -0x18(%rsp,%rax,4),%r8d
      4a:   89 c8                   mov    %ecx,%eax
      4c:   f7 ee                   imul   %esi
      4e:   89 c8                   mov    %ecx,%eax
      50:   c1 f8 1f                sar    $0x1f,%eax
      53:   29 c2                   sub    %eax,%edx
      55:   8d 04 52                lea    (%rdx,%rdx,2),%eax
      58:   89 ca                   mov    %ecx,%edx
      5a:   83 c1 01                add    $0x1,%ecx
      5d:   29 c2                   sub    %eax,%edx
      5f:   39 f9                   cmp    %edi,%ecx
      61:   48 63 c2                movslq %edx,%rax
      64:   41 8d 50 01             lea    0x1(%r8),%edx
      68:   89 54 84 e8             mov    %edx,-0x18(%rsp,%rax,4)
      6c:   75 c2                   jne    30 <array+0x30>
      6e:   8b 44 24 ec             mov    -0x14(%rsp),%eax
      72:   03 44 24 e8             add    -0x18(%rsp),%eax
      76:   03 44 24 f0             add    -0x10(%rsp),%eax
      7a:   c3                      retq   
      7b:   b8 0d 00 00 00          mov    $0xd,%eax
      80:   c3                      retq   
      81:   66 66 66 66 66 66 2e    data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
      88:   0f 1f 84 00 00 00 00 
      8f:   00 
    
      0000000000000090 <vars>:
      90:   85 ff                   test   %edi,%edi
      92:   7e 52                   jle    e6 <vars+0x56>
      94:   31 c9                   xor    %ecx,%ecx
      96:   41 b8 07 00 00 00       mov    $0x7,%r8d
      9c:   41 b9 05 00 00 00       mov    $0x5,%r9d
      a2:   41 bb 01 00 00 00       mov    $0x1,%r11d
      a8:   41 ba 56 55 55 55       mov    $0x55555556,%r10d
      ae:   89 c8                   mov    %ecx,%eax
      b0:   89 ce                   mov    %ecx,%esi
      b2:   41 f7 ea                imul   %r10d
      b5:   c1 fe 1f                sar    $0x1f,%esi
      b8:   89 c8                   mov    %ecx,%eax
      ba:   29 f2                   sub    %esi,%edx
      bc:   8d 14 52                lea    (%rdx,%rdx,2),%edx
      bf:   29 d0                   sub    %edx,%eax
      c1:   83 f8 01                cmp    $0x1,%eax
      c4:   74 09                   je     cf <vars+0x3f>
      c6:   83 f8 02                cmp    $0x2,%eax
      c9:   74 08                   je     d3 <vars+0x43>
      cb:   41 83 c3 01             add    $0x1,%r11d
      cf:   41 83 c1 01             add    $0x1,%r9d
      d3:   83 c1 01                add    $0x1,%ecx
      d6:   41 83 c0 01             add    $0x1,%r8d
      da:   39 f9                   cmp    %edi,%ecx
      dc:   75 d0                   jne    ae <vars+0x1e>
      de:   45 01 d9                add    %r11d,%r9d
      e1:   43 8d 04 01             lea    (%r9,%r8,1),%eax
      e5:   c3                      retq   
      e6:   b8 0d 00 00 00          mov    $0xd,%eax
      eb:   c3                      retq
    

Arrays mean memory - don't use them unless you actually want the compiler to
use memory.

~~~
jakobe
Your example has nothing to do with the actual code in the kernel. In the
original code, the counters are passed by reference to a function. That means
they need to be in memory anyway, right? The compiler can't pass the address
of a register.

~~~
jcalvinowens
> That means they need to be in memory anyway, right?

Actually no, GCC can optimize out pointer indirection like that if it ends up
inlining the function (which it could in the kernel example, since it's
static, relatively short, and has only three callers).

Another contrived example:

    
    
      static int inlineme(int *lol, int lolol)
      {
            *lol += lolol;
            return *lol * 5;
      }
    
      int test(int x)
      {
            int tmp = 5;
            tmp += inlineme(&tmp, x);
            return tmp;
      }
    

... all of which compiles to (with -O2):

    
    
      0000000000000000 <test>:
       0:   83 c7 05                add    $0x5,%edi
       3:   8d 04 bf                lea    (%rdi,%rdi,4),%eax
       6:   01 f8                   add    %edi,%eax
       8:   c3                      retq
    

Note that "tmp" doesn't have to be a constant for that to happen.

As far as the actual kernel function, GCC ends up emitting: (The only
uncompressed kernel binary I had laying around was for my beaglebone black,
sorry)

    
    
      c0136fa0 <verify_reserved_gdb>:
      c0136fa0:       e92d4ff0        push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
      c0136fa4:       e24dd02c        sub     sp, sp, #44     ; 0x2c
      <snip>
      c0136fbc:       e3a03001        mov     r3, #1
      c0136fc0:       e58d301c        str     r3, [sp, #28]
      c0136fc4:       e3a03005        mov     r3, #5
      c0136fc8:       e58d3020        str     r3, [sp, #32]
      c0136fcc:       e3a03007        mov     r3, #7
      c0136fd0:       e58d3024        str     r3, [sp, #36]   ; 0x24
      c0136fd4:       e1a00007        mov     r0, r7
      c0136fd8:       e28d101c        add     r1, sp, #28
      c0136fdc:       e28d2020        add     r2, sp, #32
      c0136fe0:       e28d3024        add     r3, sp, #36     ; 0x24
      c0136fe4:       ebffffd4        bl      c0136f3c <ext4_list_backups>
    

... so in this case it does actually pass by reference.

That doesn't change my argument: an array is the more confusing way to do
this, and in similar situations could prevent the compiler from making the
optimization it did in my example.

------
adobriyan
Why don't you all "deserves better comment" people send a patch?

~~~
_pmf_
> Why don't you all "deserves better comment" people send a patch?

Trying to get a patch into Linux as a new developer without personally knowing
one of the (sub-)lieutenants is a futile exercise.

~~~
apaprocki
I did it (and repeated a few times) and I didn't know anyone. I did it by
reading about the process, formatting my patch correctly, and sending it to
the appropriate list. If it were so difficult, Linux would not be a success.
Now, just like other large projects, maintainers might dislike comment-only
nit patches, but if it objectively increases readability it might work.

~~~
army
It's probably easier to get it in when you're solving a real problem.

------
userbinator
That's a very... awkward of doing quite a simple thing, generating ascending
interleaved powers. I'd probably use one or more arrays of multipliers to
maintain the counters. Simpler and more general -- not often that you can get
a solution with both these two attributes.

------
raverbashing
And here's how the "rainbows and butterflies" world of "code perfectionists"
come crashing down.

Yes, the naming is unfortunate. No, there's probably no way to make it better
(and if you think code reviews suck, wait until you see a kernel code review)
(Well, this came directly from Linus but there's usually some discussion as
well)

But in the end this has shipped and is running. (Well, the double goto from
Apple as well, but I doubt that went through a code review)

------
jheriko
there are many, many problems in this code style which lead to this kind of
problem.

localised comments may help but naming things properly would help more (and
negate a comment being required at all), actually adopting some better
practices would be a better fix going forward.

looking at the called function's comment:

The counters should be initialized to 1, 5, and 7 before * calling this for
the first time.

so how about making a function with the same name, ending in '_first' (maybe
rename the original _next) which initialises these variables internally before
calling the version intended to be called multiple times...?

since the parameters are not well described by their names renaming them would
help too... it is not a pointer to a 3 a 5 or a 7. it might not even be
pointing at multiples... but current_multiple_of_three etc. would be better.
there is some reason why these values are significant in this algorithm (which
I do not know) and the best names would capture that.

the code is moderately difficult to read in the function itself - the method
of swapping around a local copy of a pointer based on the conditions is not
very easy to follow. although makes sense here to avoid excessive code...
comments would help make it clearer what is going on.

------
yusukeshinyama
At line 655: "... In a sparse filesystem it will be the sequence of powers of
3, 5, and 7: ..."

So I think this means three = pow(3, 0);

~~~
rockdoe
Doesn't explain why it's not 1, 1, 1 or 1, 5, 49.

~~~
thaumasiotes
It does, you're just not paying attention. The comment appears over the
function that uses those variables; it's not 1,1,1 because they want to hit 1
only once in the sequence 1,3,5,7,9,25,27,49,81,125, etc.

It doesn't explain why it's not 3,1,7 or 3,5,1, but that's because there is no
reason for that.

~~~
rockdoe
The full comment in the source explains it fine, but that's not what I'm
replying to; who's not paying attention here?

The abbreviated quote I was replying to just confuses things by leaving out
critical parts.

------
chrisBob
Wasn't there a similar thread a few months back about "two" being corrected to
equal 2?

~~~
adobriyan
Speaking of...

    
    
      kernel/sysctl.c
    
      #ifdef CONFIG_LOCKUP_DETECTOR
      static int sixty = 60;
      #endif
      
      static int __maybe_unused neg_one = -1;
      
      static int zero;
      static int __maybe_unused one = 1;
      static int __maybe_unused two = 2;
      static int __maybe_unused three = 3;
      static unsigned long one_ul = 1;
      static int one_hundred = 100;
      #ifdef CONFIG_PRINTK
      static int ten_thousand = 10000;
      #endif

------
Vextasy
Seems like a perfectly reasonable naming scheme to me. Anyone foolish enough
to think that the variables are there to hold the values 3, 5 and 7 is
probably wasting their time on the code in the first place.

Besides, it made you read the code and the comments.

------
blueskin_
Reminds me of 'how to write unmaintainable code':

[http://mindprod.com/jgloss/unmaincamouflage.html](http://mindprod.com/jgloss/unmaincamouflage.html)

------
Trindaz
I finally have an answer to the "what's your favorite line of linux source
code" question.

~~~
virtualSatai
This is a great one too:
[https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...](https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d25a55da5623399a2644f/drivers/md/bcache/bset.c#L393)

------
zamalek
Is this because there is some inconsistency in the GDT documentation?

------
athenagoras
Trinitarian theology?

------
wazoox
That deserves an explanation in the code comments...

~~~
h2s
Better still, it deserves to be replaced with a better name. Comments should
not be our first port of call when we notice an issue like this in code. Fix
the name. If you can't fix the name, you hang your head in shame and _then_
you write a comment.

