

Interviewing programmers: Analysis taster (work in progress) - RiderOfGiraffes
http://www.solipsys.co.uk/Writings/TestsForProgrammers_Part_3.html

======
Groxx
I'd be interested in an analysis of the compiled assembly for those 5 versions
in 2b. A quick test on my end shows them all to be _extremely_ similar.

I know little assembly, and am not going to attempt to pick it apart, so I
can't attest to any speed difference. But as an example the first and the last
compile to the same, though the last has 2 fewer instructions and some -16s
changed to -24s. The first is identical to the second. The third has a bit of
shuffling, but also appears identical to the first. And the last two are
identical to each other (well, obviously).

How much optimization are you looking for in your C-code vs your coder's
productivity? If they can get the last ones in one shot, all well and good,
but if not is refactoring worth _anything_ in this case?

edit: with -O3 optimizations on gcc, they _all_ appear to be literally
identical, the only differences I can see being variable names and the order
of functions (in code, not in call order) between the first three and the last
two.

~~~
RiderOfGiraffes
Indeed - these are issues that I will be discussing, and would discuss with a
candidate once I had their code in hand. I say in 2b:

    
    
        Each of these changes is unnecessary and in some cases
        damage readability. Further, a good compiler will get
        most, if not all, of the efficiency for you without
        making these changes.
    

If changes damage readability and don't change the compiled code, _don't make
them!_

Readability has to be balanced against efficiency, but too often people think
something is more efficient when it isn't. As always, profile before
"improving."

As I say, these are issues for discussion. Let me add at this point that some
of the submitted versions, and some of my reference versions, are
significantly different in both style and efficiency.

There is more to come.

~~~
Groxx
I'll definitely follow along, it's been interesting. And I'd probably come up
with something like the last one after a little thought, because it's pretty
idiomatic of string copying.

I just tend to seriously dislike _code_ refactoring instead of _algorithm_
refactoring, and anything that advocates it. Typically, the compiler is _far_
better than the coder, especially when you turn on optimizations, but the
compiler (nearly) can't swap out your code for a better algorithmic approach.
Thus, you'd be _much_ better off with expansive, seemingly inefficient code
from a smart programmer who knows the be(st|tter) algorithm(s).

------
dkersten
Wow, I was surprised to see mine at the edge of the graph on its own. I wonder
is that a good or bad thing..! :-)

~~~
cperciva
My first two submissions aren't even on the graph. :-)

~~~
RiderOfGiraffes
There's a surprise. I did say I'd excluded outliers.

------
nitrogen
I'm quite familiar with C, but I don't see the difference between the while
and for loops in the trivia box on article 2 of the series.

I would expect the while loop to compile something like this:

    
    
      while(condition)
      {
        // stuff
        i++;
      }
    

becomes

    
    
      loop:
        if(!condition)
          goto end;
        // stuff
        i++;
        goto loop;
      end:
    

I would expect the for loop to do this:

    
    
      for(;condition;i++) {
        // stuff
      }
    

becomes

    
    
      loop:
        if(!condition)
          goto end;
        // stuff
        i++;
        goto loop;
      end:
    

So what's actually going on?

~~~
RiderOfGiraffes
Consider the effect of a "continue" statement in the "stuff" section.

~~~
nitrogen
Ah, thanks. It makes perfect sense now.

------
what
Is there a full size image so I can read the labels and find my submission? :)

~~~
RiderOfGiraffes
It wasn't intended for that purpose, but stand by ...

EDIT: OK, I've uploaded a larger version where the labels are readable, but
I've had to change the layout so the nodes don't overlap. The lengths of the
edges are now not always realted to the distance, although it's approximate.

<http://www.solipsys.co.uk/Writings/x.dot.png>

Note: Yours might still not be on it, as yours might be one of the outliers.
You can email me your ID and I'll try to find time to send you some data. Or
you can wait.

~~~
dpritchett
Yeowch! Mine (g034.d) shows up as the second red dot from the left. I
submitted a quick pseudocode answer without any research in an effort to mimic
interview whiteboarding conditions.

I had no idea my solution would be automatically validated - I was happy
enough to get an in-place O(n) solution. The complete lack of pointers on my
part must be what separated me from the cluster of correct solutions at the
center.

~~~
RiderOfGiraffes
You use _sizeof(test_string)_ which isn't what you want.

As an experiment I fixed that, but you had another problem as well which also
caused it to fail.

------
cperciva
How are you computing your similarity measure between routines?

~~~
RiderOfGiraffes
I'm replacing recognised keywords with their initial letter, other symbols
with "x", removing all spacing, retaining all punctuation, and then using a
Levenshtein distance.

For example, this:

    
    
      void condense_by_removing(
          char *z_terminated ,
          char char_to_remove
          ) {
        char *p_read;
        for (p_read = z_terminated;*p_read;p_read++)
          if (*p_read != char_to_remove)
            *z_terminated++ = *p_read;
    
        *z_terminated = '\0';
      }
    

gets mapped to this:

    
    
      vx(c*x,cx){c*x;f(x=x;*x;x++)i(*x!=x)*x++=*x;*x='\0';}
    

I'm debating inserting braces around every block to assist with the similarity
concept, but that's hard to do automatically without fully parsing the
routine. The above "fingerprint" would then become this:

    
    
      vx(c*x,cx){c*x;f(x=x;*x;x++){i(*x!=x){*x++=*x;}}*x='\0';}

~~~
cperciva
Did you consider tokenizing the inputs and comparing those? Based on
Levenshtein distance alone you're basically saying that "++" and "!=" are
twice as important as "=" or "*", which doesn't seem right to me.

Next question: How are you picking (x,y) coordinates for the graph? You've
explained how you determine the connectivity, but the positioning is a bit
unclear -- edges with the same score often have quite different lengths.

~~~
RiderOfGiraffes
I did consider tokenising the inputs, and probably will. The only reason not
to have done so yet was that this was a no-brainer in terms of getting
something working just to see if produced something useful.

I'm using neato for the layout. Graph layout is hard, and in some cases
unsolved. I'm using this for rough visualisation, then I'll write code to find
true clusters.

~~~
cperciva
_I'm using neato for the layout._

Ok, so you're using all the pairwise distances for computing the layout, even
though you're only showing the tree edges on the graph?

~~~
RiderOfGiraffes
No, I'm generating a tree.

Put every node in its own component. Find the shortest edge that joins two
components, emit that edge, merge the components. Lather, Rinse, Repeat.

Also, braces penalise you twice. Code that is identical except that one
includes, the other excludes, a pair of braces are distance 2 apart. There is
some reason to say they should have distance 0. Fully parenthesised code, and
then ignore the close (or open) brace would fix that.

------
unwind
This one is awesomely embarassing. I submitted a quick response, but it turned
out to (probably) be quite pessimal. Heh.

The challenge was of the category that triggers my "oh, that's easy!" knee-
jerk response, and I _still_ think it's easy. It's just that my knee didn't
jerk towards the solution shown by RiderOfGiraffes, which really feels like
the best one. :)

I think I'll blame job stress for my performance when it comes to this one.

~~~
RiderOfGiraffes
I wouldn't claim my solution is the best. In fact, I think it's definitely not
the best. It is, however, a starting point for comparison.

Given your comment I'd be interested to know which is yours - email me your
submission ID?

------
RiderOfGiraffes
Earlier item in this series: <http://news.ycombinator.com/item?id=1401117>

------
ghfdrvtsyup
whew! I'm glad I passed, though it should be obvious from my submission that
my C is far from fluent! I am looking forward to the analysis - thank you for
doing this RoG!

------
johkra
I seem to have one of the outliers. Well, let's wait for the detailed analysis
to see whether my test cases were good enough. ;-)

