Hacker News new | past | comments | ask | show | jobs | submit login
Branch-free FizzBuzz in Assembly (pepijndevos.nl)
47 points by luu on Jan 3, 2015 | hide | past | favorite | 43 comments



This is equivalent code in C if anyone is interested.

  #include <stdio.h>
  #include <stdlib.h>

  const char* table[] = { "%d\n" , "Fizz\n" , "Buzz\n" , "FizzBuzz\n" } ;

  void E( int i )
	{
	    exit( 0 ) ;
	}

  void F( int i )
	{
	        size_t c = !( i%3 ) + !( i%5 )*2 ;
		printf( table[c] , i ) ;
	}

  void ( *func[2] )( int ) = { F , E } ;

  int main( void )
	{
		int p = 1 ;
		while( 1 )
		{
			func[p/102]( p++ ) ;
		}

		return 0 ;
	}
This of course only avoids conditional branches.

EDIT: I just noticed I have undefined behavior in my code.

Bonus internet points for the first one to point it out!


"Unsequenced modification and access to 'p'" for:

    func[p/102]( p++ ) ;
A quick search brought me to http://www.bionoren.com/blog/2013/07/unsequenced-modificatio... : 'Basically, the compiler is free to reorder anything and everything until it hits a “sequence point”. Things like return, if, assignment, variable declaration, etc are all sequence points.'

In the parent code sample, it seems the compiler is free to execute "p++" before the "p/102" since there is no sequence point between them.

I was previously unaware of this undefined behavior. Thanks for this, rertrree!


Have you looked at the generated ASM? I suspect printf contains a lot of branches. But then so might those syscalls I guess.


Yes.

Library calls will have conditionals, but apart from that there are none.


I've been writing a bunch of branch-free code recently, so I took a crack at it. In my world, data-dependent table lookups might as well be branches, so let's eliminate those as well.

Here's a fizzbuzz(char*, int) function that can accept any number up to 99999999, and will put the correct FizzBuzz result into the provided buffer (either the printed number, "Fizz ", " Buzz", or "FizzBuzz"). As promised, it's loop-free, and as a bonus it should be constant-time as well:

Assembly: http://pastebin.com/EnJEuxnp compiled from this C: http://pastebin.com/PCQQQ2cn [edit] generated from this Python: http://pastebin.com/ijr3thE2

Pastebinned since it's about 700 assembly instructions.

Unrolling the loop and printing to the screen are left as exercises...


Oops: 0x00000001000015af <fizzbuzz+47>: cmp edx,0x0

                                        ^^^


It does a cmp+setz, which isn't a branch (and I believe is constant-time). An x86 branch would be cmp+jne or something similar.


Do you mind sharing the python as well?


Sure! It's ripped + modified from a project I'm working on, so it's a little messy:

http://pastebin.com/ijr3thE2


Random fizzy buzzy observation:

Always a little baffled when I see three cases (fizz, buzz, and fizzbuzz) rather than both fizz and buzz handled well such that modulo 15 gets a fizz and a buzz.

The normal fizz buzz is divisible by 3 gives a fizz, divisible by 5 gives a buzz. It so happens that 15 is both, so should exhibit both. But as I see it, handling the 15 right should be emergent behavior of a correctly written fizz and correctly written buzz, not a separate test case.

But what if the business requirement changes that divisible by 4 gives a fizz and divisible by 5 gives a buzz? Now the "3 cases" programmer doesn't just change 3 to 4, and correctly see fizzbuzz emerge on divisible by 20, he's still got a spurious fizzbuzz on the divisible by 15 case. He's got to remember this case, do some manual computation, and change his 15 to 20. Structuring the code in such a way the programmer has to track all ancillary implications of one business rule change seems a recipe for disaster.

So I consider as broken all code that explicitly prints "fizzbuzz" with a third conditional.


It's interesting that you can't imagine code that explicitly prints fizzbuzz with a third conditional. It's as simple as this:

for i in fizzbuzz range fizz = i % 3 == 0 buzz = i % 5 == 0 if fizz and buzz print fizzbuzz else if fizz print fizz else if buzz pritn buzz else print i

If you want to see how they react to changing requirements, then ask them to change it. It's stupid to look at someone's answer to a toy problem and treat it like it's production code.


> It's stupid to look at someone's answer to a toy problem and treat it like it's production code.

There are no toy problems in interviews. The way you approach a toy problem says everything about how you would write production code according to the 'how to hire programmers' manual v 3.3.

Obviously that's not how it should be but if you're going to places where they think 'fizzbuzz' is going to weed out the ones they don't want then you'd better be showing off your capabilities.


How would you handle a case where it's not divisible by either? We need to handle a conditional that evaluates to true when the number is not divisible by either 3 or 5:

    #include <stdio.h>
    void main()
    {
        unsigned int i, is3, is5;

        for (i = 1; i < 101; i++)
        {
            is3 = i % 3 == 0;
            is5 = i % 5 == 0;

            // If both are true,
            // FizzBuzz gets printed
            if (is3) printf("Fizz");
            if (is5) printf("Buzz");

            // Still need to handle
            // the raw number though
            if (!is3 && !is5) printf("%i", i);

            printf("\n");
        }
    }


I consider as broken all interviewers who extrapolate a candidate's decision on this issue to real-world "business requirement changes."


This is one of the worst parts of technical interviews.

"Take 10 minutes to write out code for this simple, abstract, poorly defined problem. Okay, now I'm going to ding you because you didn't write it using the same mindset you'd use when writing a large application tailored towards a specific domain and given a detailed set of constraints."

Maybe we should call this sort of cargo-cultism Schroedinger interviews. If you make your code simple and easily readable in order to solve the problem in the most intelligible way, you get penalized for not making your solution scalable/easily adaptable to "business requirement changes". If you structure your code so that it can easily be extended or modified you get penalized for YAGNI and premature optimization. Of course, exactly what the interviewer is looking for is never actually specified as part of the problem statement, because it exists in a state of quantum superposition until right before the interviewer decides whether or not he or she likes your physical appearance/sense of humor/"cultural fit".


A good interview exercise (not necessarily vouching for fizzbuzz here) allows depth of discussion past the initial answer. It is in this discussion stage where I spend the majority of my time with a good candidate.

In such a discussion we might talk about whether it makes sense to make the /15 case "fall out" naturally. Using your example, there's an implicit assumption that the requirements would change to {4: fizz, 5: buzz, 20: fizzbuzz}, but in real life I've found things have the annoying tendency to change to, e.g. {4: fizz, 5: buzz, 15: fizzbuzz}.


You assumption that a separate case that prints fizzbuzz will break the code is not correct. With correct code, you get three if checks either way, but the readability is higher, code complexity is lower and cases are nicely separated.

  bool is3 = i%3 == 0 ;
  bool is5 = i%5 == 0 ;	
  
  if( is3 && is5 )
  {
  	printf("fizzbuzz\n") ;
  }
  else if( is3 )
  {
  	printf("fizz\n") ;
  }
  else if( is5 )
  {
  	printf("buzz\n") ;
  }
  else
  {
  	printf("%d\n" , i ) ;
  }


And what if there's a is7 woof criteria as well? You going to add another set of special cases for fizzwoof, buzzwoof, and fizzbuzzwoof?

You see the problem, versus just doing fizz, buzz, and woof distinctly?


It is not logical to subvert your code to adhere to some imaginary future requirements.


Well, I'm always a little baffled when I see such great techniques for rejecting potentially awesome candidates.

Why overthink a toy example? The only purpose served by fizzbuzz is to weed out people who can't write any code.

If you want to know anything else, ask a more relevant technical question. It's downright lazy to simply jump to conclusions, and as a result you end up hiring the wrong person.


To be clear, I'm not referring to candidates. I'm referring to the questioners/interviewers suggesting this approach.


Here I have an explicit "FizzBuzz" but only 2 checks. Do you consider this also broken?

   python3 -c '
     for i in range(1,20):
       print([i,"Fizz","Buzz","FizzBuzz"][int(i%3==0) + 2*int(i%5==0)])'


>But what if the business requirement changes...

Are you kidding? This is FizzBuzz! It is just a quick check to verify that an applicant is even a programmer at all. That's all it is. There is nothing else there.


Depends on how the task is communicated and what's implied.

Also recognize that there are excellent programmers who take to heart Dijkstra's dictum to "Always design your programs as a member of a whole family of programs, including those that are likely to succeed it."

Has AmaTwitBookGooSoft ever asked for a generalization of FizzBuzz? I wouldn't put it pass them.


I don't think you know what the word 'broken' means. You sound like the type of person that would make code as terse as possible with all kinds of traps for future maintainers.


Fizz on /3, Buzz on /5, is the actual requirement. Shortcutting /3 & /5 with a hard coded /15 that has to be edited if /3 or /5 changes, that's the trap.


Apparently the Rust compiler produces a slightly different branch-free FizzBuzz[1] involving this delightful snippet:

    static OUT: &'static [u8] = b"\
    1\n2\nFizz\n4\nBuzz\nFizz\n7\n8\nFizz\nBuzz\n11\nFizz\n13\n14\nFizzBuzz\n\
    16\n17\nFizz\n19\nBuzz\nFizz\n22\n23\nFizz\nBuzz\n26\nFizz\n28\n29\nFizzBuzz\n\
    31\n32\nFizz\n34\nBuzz\nFizz\n37\n38\nFizz\nBuzz\n41\nFizz\n43\n44\nFizzBuzz\n\
    46\n47\nFizz\n49\nBuzz\nFizz\n52\n53\nFizz\nBuzz\n56\nFizz\n58\n59\nFizzBuzz\n\
    61\n62\nFizz\n64\nBuzz\nFizz\n67\n68\nFizz\nBuzz\n71\nFizz\n73\n74\nFizzBuzz\n\
    76\n77\nFizz\n79\nBuzz\nFizz\n82\n83\nFizz\nBuzz\n86\nFizz\n88\n89\nFizzBuzz\n\
    91\n92\nFizz\n94\nBuzz\nFizz\n97\n98\nFizz\nBuzz\n";

    ...
1: http://chrismorgan.info/blog/rust-fizzbuzz.html


By my reading that implementation was assembled by humans, not the optimizer.

Although, there's certainly nothing magical that prevents the optimizer from generating such code. Here's something[1] I just threw together that (ab)uses constexpr to do just that.

On my machine, building with:

  $ clang++ -std=c++14 fizzbuzz.cpp -O2 -S
Gives code similar to:

  main:                                   # @main
          .cfi_startproc
  # BB#0:
          pushq   %rax
  .Ltmp0:
          .cfi_def_cfa_offset 16
          movl    $1, %edi
          movl    $_ZL6output, %esi
          movl    $413, %edx              # imm = 0x19D
          callq   write
          xorl    %eax, %eax
          popq    %rdx
          retq
  .Ltmp1:
          .size   main, .Ltmp1-main
          .cfi_endproc
  
          .type   _ZL6output,@object      # @_ZL6output
          .section        .rodata,"a",@progbits
  _ZL6output:
          .ascii  "1\n2\nFizz\n4\nBuzz\nFizz\n7\n8\nFizz\nBuzz\n11\nFizz\n13\n14\nFizzBuzz\n16\n17\nFizz\n19\nBuzz\nFizz\n22\n23\nFizz\nBuzz\n26\nFizz\n28\n29\nFizzBuzz\n31\n32\nFizz\n34\nBuzz\nFizz\n37\n38\nFizz\nBuzz\n41\nFizz\n43\n44\nFizzBuzz\n46\n47\nFizz\n49\nBuzz\nFizz\n52\n53\nFizz\nBuzz\n56\nFizz\n58\n59\nFizzBuzz\n61\n62\nFizz\n64\nBuzz\nFizz\n67\n68\nFizz\nBuzz\n71\nFizz\n73\n74\nFizzBuzz\n76\n77\nFizz\n79\nBuzz\nFizz\n82\n83\nFizz\nBuzz\n86\nFizz\n88\n89\nFizzBuzz\n91\n92\nFizz\n94\nBuzz\nFizz\n97\n98\nFizz\nBuzz\n"
Of course, having now written this I feel like I should retroactively fail my last interview.

[1] https://gist.github.com/anonymous/7818f902a374a953b274


This code snippet really made my night. I never thought of using constexpr to write a FizzBuzz that compiles to one of the most "optimal" solutions that I have ever seen. Awesome!


The last time I saw a FizzBuzz post, I wrote a fairly typical mod and if version, and one that used a mod and lookup table both in C.

It turned out the compiler (clang and I think gcc as well) had converted the branchy version into a lookup table, that ran a tiny bit faster then the lookup table by hand. The assembly looked fairly similar so it was something fairly subtle, but I was out of my depth at that point.


  int

  ret

  int

  ret

  ret

  call

  call

  call

  int

  jmp
No conditional branches. As long as you ignore what happens on the other side of those int instructions! Especially that last one...


Yea, you could probably inline the calls and unroll the loop, but I don't see a way of getting around the syscalls.

I'd be curious to know what those syscalls are doing inside.


Care to elaborate? Isn't `jmp` just a non-conditional jump? How is that branching?


Branch and jump are synonyms in this context.


Oh cool! I always thought of a branch as a conditional jump - I guess I was wrong. It appears that even unconditional branches are still branches!

"A branch is an instruction in a computer program that may, when executed by a computer, cause the computer to begin execution of a different instruction sequence. Branch (or branching, branched) may also refer to the act of beginning execution of a different instruction sequence due to executing a branch instruction. A branch instruction can be either an unconditional branch, which always results in branching, or a conditional branch, which may or may not cause branching depending on some condition."


I wouldn't be surprised if there was some place that used "branch" to indicate conditional-only. Assembly language terminology seems to be wonderfully inconsistent. But typical usage these days seems to be as you quoted.


Intel uses Jcc (Jump conditional) while Motorola uses Bcc (Branch conditional). Additionally, the Motorola 68k has both JMP and BRA (branch unconditionally) and the Motorola 6809 has JMP, BRA and BRN (branch never---useful for a two or four byte NOP, or for changing a conditional branch when patching code).


But generally being branch-free is in the context of avoiding a mispredict and pipeline stall. The processor is perfectly capable of pipelining an unconditional jump


I tried to reimplement it with less code and data (as x86 instead of x86_64 however).

I stopped at 110 bytes .text and 49 bytes .data: http://pastebin.com/bPfrB165


I wonder if CMOVcc instruction on x86 is considered a branch. It is fairly trivial to implement fizzbuz without Jcc but with CMOVcc.

EDIT: Or SETcc, minus 1, and use it as data mask.


I don't get sites that have moving parts in the background at all - this makes the reading experience very annoying.


If there's one thing I love while reading, it's animation! Really makes it easy to focus on the text!


I think it's neat, which is probably what they were going for.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: