Nor does http://www.solipsys.co.uk/new/ColinWright.html?Personal - though I'm not sure that Colin Wright even wrote that, given I've yet to find a link to the article from the root, and the article doesn't have the author's name ...
I'm one of the HNers that sent in code for the test. The feedback provided by RiderOfGiraffes was useful and the follow-up questions were food for thought.
At the time I'd just failed a coding test for a systems programmer role and felt pretty low, which is why I decided to take this on.
In the interview I failed, I fared well on most of the questions, apart from one set involving heavy binary maths; I'd pretty much forgotten basic binary computation from my undergrad course. That prompted me to buy "Hacker's Delight" by Henry S. Warren. It's a great read if you're into high-performance algorithms: http://amzn.com/0201914654
You're testing whether active participants on HN that volunteer to do a problem are worth talking to?
I've done tests like this in the past couple of years for everyone responding to a job ad, for all sorts of positions. The job ads usually include the problem(s), so I'm getting the same self-selection you are. Combining our results for a Freakonomics-style conclusion:
HN readers are an order of magnitude more skilled than job ad readers.
I'm not really testing them - I'm providing them with information about the tests I do, why I do them, what I do with them, and the discussions that arise from them. If you don't try it for yourself then you haven't given yourself the chance to think of the issues that I then raise in subsequent discussions.
These tests aren't for everyone; they are examples of what I specifically do.
We already know that self-selected from HN means top 0.1%.
OK - I've had a flood of emails trying the challenge, so I'm going to delay the next post about this to let more people have a go. It's not automated so I'm dealing with it by hand, but feel free to have a go. I'll do the next posting when I've received no submissions for 48 hours.
// esi -> asciiz string
// cl = character to remove
// NB cl=0 won't work well ;)
mov edi, esi
cld
loop1:
lodsb // Load the byte from [esi] into al
cmp al, cl
je loop1 // Skip this byte
stosb // Store the byte at [edi] from al, advancing edi
or al, al // Was that the terminating 0?
jnz loop1 // If not, loop round for the next byte
x86 assembly is full of old cruft; that 'cld' there is the sort of thing that, if you've ever forgotten it, has probably cost you a lot of time.
99% of the time the flag is clear... unless it isn't.
So for a seasoned assembly programmer that cld is idiomatic; axod probably typed the instruction reflexively because he knows he can't rely on the state of the direction flag, even though it has nothing to do with the problem per se.
Huh? This is C (well, asm). If you pass in garbage, you get undefined behaviour (a crash by NULL pointer dereference); this is how all str* functions work. Even
assert(str != NULL)
is being overly generous, I think - if only because a debugger would lead you to the source of the crash immediately.
You have not received any specification about the context in which the code operates, therefore the correct course of action is to assume the worst and produce bulletproof code.
If you were operating under constraints in terms of speed that would warrant a more cavalier attitude towards input checking then that would be another thing but since that wasn't specified and there is no context, 'safety first' is the way to approach the issue.
The specification states that "your routine should modify the given zero-terminated string in place". This implies that a zero-terminated string is passed in, and that it is mutable. If this is not the case, it is impossible to perform as specified; any behavior is wrong. An assertion error would be the ideal way to handle things; an immediate segfault is almost as good (and probably the practical thing to do if the string is immutable or not null-terminated). And of course, there are some error situations you simply can't detect - eg, not all cases of non-terminated strings are detectable, because you run into and mutate some other heap garbage before hitting an unmapped or immutable page.
I think it is actually really cool that RoG did not specify that part of the problem because in real life that is exactly how stuff gets specified, and plenty of times that's where the trouble comes from. Asking to revise a spec so that it is explicit is usually a very good course of action.
Not all hardware will segfault when writing to NULL; the immutable case should segfault, and strings that are not null-terminated are indeed hard or impossible to check for (pointer wrap, after a lot of swapping, or a stack overwrite).
Also note that libc string functions generally seg-fault when passed invalid addresses (including NULL). Implicitly expecting more sophisticated error handling in this scenario is silly.
So, you either go back to the source and ask for more info, or you take a stab at what is the right thing to do.
In this case that wasn't possible, so I assumed the worst. I think that simply noting that there is a potential problem there would already score you points with the interviewer, because they'd realize you spotted that the specification was incomplete: it didn't tell you what to do in case of invalid input.
I think this may be a cultural thing. C programmers are very much used to the standard not defining the behaviour of some things, e.g. what happens when you call strcmp(str, NULL). Segfaulting is expected here, and easy to track down.
(I would use some assert()s if a function could cause silent memory corruption; but this particular function either works (valid string), segfaults (NULL) or cannot detect a problem (stray pointer).)
Oh definitely. I agree. I assumed though that the "problem" itself was the main aim here, not testing if someone can write scaffolding error checking code.
How far do you go? Check that the memory isn't mapped as read only? ;) Check that there's actually some memory mapped into that address space?
Obviously you need both skills, but they're pretty related I'd say. If you can solve low level problems you can probably think through any potential issues, invalid inputs, edge cases etc and churn out code to deal with them.
I guess my issue is I don't see what being able to solve this says about someone, apart from "isn't completely terrible at programming".
I'd be really interested to see your original version and your latest version side-by-side, because I'm wondering if your "clean-up" actually changed its behavior.
// esi -> asciiz string
// cl = character to remove
mov edi, esi
cld
loop1:
lodsb // Load the byte from [esi] into al
or al, al
jz alldone
cmp al, cl
je loop1 // Skip this byte, since it's one we want to remove
stosb // Store the byte at [edi] from al
jmp loop1 // Loop around for the next byte
alldone:
xor al, al
stosb // 0 terminate it
The only real behavioral change was that this version would cope with an input of cl=0. My cleaned-up version would not cope with cl=0 and would need a sanity check for that.
Hmm. My memory must be faulty, but I don't remember the first one having the final xor and stosb. I do remember looking at your solution and thinking that you'd forgotten it, that's why I expressed an interest in seeing the original.
heheh my initial version also explicitly added the 0, and then I also found a way to do it in the loop. Seems a common evolution.
But your final solution is clearer than mine. While it uses an extra variable (register), it avoids looking up the same thing twice. So... more efficient and clearer.
BTW: TIL char * a = "hello"; doesn't work here in C (the literal may sit in read-only memory, so you can't modify it in place). I'm actually happy enough that I could solve this at all, not having used C for 20 years ...
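For what it's worth, a minimal illustration of the difference (a sketch; the variable names are just for the example). A string literal assigned to a char * may live in read-only storage, so modifying it in place is undefined behaviour, whereas an array initialised from the literal gets its own writable copy:

#include <stdio.h>

int main(void)
{
    char *p  = "hello";    /* pointer to a string literal - storage may be read-only */
    char  a[] = "hello";   /* array initialised from the literal - a writable copy   */

    a[0] = 'H';            /* fine: the array owns its own storage */
    /* p[0] = 'H';            undefined behaviour - often a segfault */

    printf("%s / %s\n", a, p);   /* prints "Hello / hello" */
    return 0;
}

So for testing an in-place routine like this one, char dt[] = "..." is the form you want.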
Actually, I might just be tired. :-p I don't see anything wrong with axod's code. Just emailed you a fairly verbose C version. I hate writing production code...
Even if you didn't know x86 at all, you should still be able to figure out what it does just by looking at it, given that the specification is only one sentence long.
No doubt, but if my desire is not to see other answers, I'm not going to read the assembly closely enough to understand. Your objection makes more sense with indented programming languages whose indentation structure may give away more information.
I can ignore x86 assembly just like I can ignore French: the graphical structure (i.e., indentation) of an x86 program gives away nothing, just like a paragraph of French gives me no immediate sense of its meaning. In either case, I could probably puzzle out the meaning with intense reading, but the mere presence of the text does not give it away.
My C solution, on the other hand, has a distinct graphical structure which would indicate, even to people for whom C isn't their "native language," a lot of the meaning of the text with merely a glance.
It was dead easy and I'm seriously surprised that anybody on HN that's here not for the 'startup' angle would have a problem implementing this quickly and correctly.
As code challenges come I don't think I've ever seen a more trivial example.
This was my sentiment. I've been asked (and succeeded, as presumably other people have) to write pseudocode to solve more complicated problems than this on a whiteboard, in an interview.
Maybe this seems harder to people unaccustomed to C. I don't know. It seems like it would make sense to aim just a little higher, though. I recognize that it has to be small enough that people won't be put off by it, but most people will be willing to invest the fifteen minutes necessary to solve a mildly complicated problem.
It's not about being arrogant. Unless you have never done any real programming apart from simply calling library functions, the original problem is about as interesting as asking "Write a program to add 47 and 88 together".
I think the issue is that many programmers have never done real programming. They just chain library calls together. It's like comparing a furniture maker who starts with a tree, to a furniture maker who buys parts from IKEA and assembles them.
To be a really good programmer, you need to be really, really at ease with bits and bytes, moving stuff around, and getting your hands dirty with real programming.
I studied applied math instead of computer science, and the exercise took me a good 15 minutes plus some time to adjust to C as I've only worked with C++. Then another 5 to simplify some redundancies. I believe the solution I came up with was elegant, correct, and idiomatic.
Does this mean I'm not a good programmer? If I worked with C and pointers a lot then it might, but I find arrogant the attitude that not being able to slip into that mindset immediately makes you an inept programmer.
I'd say that a good answer is a good answer regardless of language. I wouldn't personally care what language someone wrote the answer in as long as they got the key concepts which in this case are pretty damn simple -
Have a source pointer, have a destination pointer. Copy stuff from src that you want, over to dest. Leave out any bytes of value N. Don't forget to copy/make a new 0 at the end.
And that's it. That's being a programmer. Knowing some particular language's syntax isn't. That's the easy/irrelevant bit IMHO.
So whether you're a good programmer or not really depends on whether that 15 minutes you spent was used to arrive at the 'solution' I wrote out in words above, or to look up the syntax to write it in C.
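To spell that recipe out in C, a minimal sketch (the function and parameter names are purely illustrative, not part of the challenge):

void remove_char(char *str, char c)
{
    char *dest = str;                    /* where the next kept byte goes */
    for (char *src = str; ; src++) {     /* walk the original string      */
        if (*src != c || *src == '\0')   /* keep everything except c...   */
            *dest++ = *src;              /* ...and always keep the 0      */
        if (*src == '\0')
            break;
    }
}

One source pointer, one destination pointer, nothing else.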
I just submitted a solution. I believe that one can tell "has this person written a non-trivial amount of C code?" from 15 seconds glancing at their answer in your peripheral vision.
I agree that it can be solved essentially the same way in many languages, but the idiomatic Erlang solution will look a lot different than the idiomatic C solution.
Edit: After reading some of the code posted here and linked from here, perhaps I don't have the same sense as others of what the idiomatic C solution is either. This should be interesting to read the followup posts by RoG.
I agree that it's a simple problem. I just don't type quite fast enough to do this in two minutes (especially if I count time spent making a main function and testing my solution).
One thing I had to do years ago as part of an interview was to write a C function to find the first bit set in a block of memory, flip it and return the location of the bit. Doing that at a whiteboard while explaining to the bearded chap interviewing me what I was doing.
From what I recall, I actually sort of enjoyed solving it as my C was bit rusty as I'd mostly been programming in Lisp.
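From memory it went something like this (a rough reconstruction, assuming "first" means the lowest-order set bit of the earliest non-zero byte; the exact convention wasn't specified):

#include <stddef.h>

/* Find the first set bit in a block of memory, flip it to 0,
   and return its bit index, or -1 if the block is all zeroes. */
long flip_first_set_bit(unsigned char *block, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (block[i] != 0) {
            int bit = 0;
            while (!(block[i] & (1u << bit)))   /* find the lowest set bit */
                bit++;
            block[i] ^= (1u << bit);            /* flip it off */
            return (long)(i * 8 + bit);
        }
    }
    return -1;                                  /* no set bits found */
}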
Exactly. The comment I replied to was asking for something less trivial, while this is probably all you have to do to identify the candidates who can't code.
The FizzBuzz problem is at least one order of magnitude easier than this, and still is able to rule out a great part of the candidates (by time limits, according to CodingHorror, not exactly by not being able to code it).
The good thing with this test is that you can actually get a glimpse of the ability of the candidate to come up with an efficient solution and identify its complexity.
In that case I completely agree that trivial problems would weed out the candidates that can't program, however who says you can't give something harder than FizzBuzz?
Also, FizzBuzz is now so common on the web that a simple search would produce numerous solutions for those that can't program, allowing them to get through to an interview. Assuming this is your yard-stick.
I'm the inventor of the FizzBuzz problem, and I kept using the question for a long time after it became popular; I didn't see a single candidate (out of maybe ~100) who'd just memorized the answer. I'm guessing the kind of people who can't program are generally the kind who don't do in-depth background research/prep prior to interviews.
That's just really sad, that there are people that would be interviewing for a coding position and it turns out they can't even do that in under a minute or so.
I got fizzbuzzed in an interview before I knew what fizzbuzz was. It took about a minute for me to think "what? you want me to do what? Why? ... Ok?" So I would say 1-10 minutes is acceptable.
I'd have a problem with being interviewed like that because the question is so simple I'd think they're pulling a joke or something.
First thing I'd ask is where is the camera :)
I can see how for a junior coding position this might be an appropriate question, say people fresh out of school, and then 1 to 10 minutes might be acceptable.
But 4 modulo statements (or 3, if you think about the problem a bit longer) and a loop ?
10 minutes ?
That's pretty slow. I understand there is a lot more to programming than coding up a simple solution like this but as problems come it is really a very simple one and if someone would take 10 minutes to put this together I'd be a bit worried about throughput, and probably about experience as well.
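For reference, the straightforward version (each modulo evaluated once per number) is something like:

#include <stdio.h>

int main(void)
{
    for (int i = 1; i <= 100; i++) {
        int div3 = (i % 3 == 0);
        int div5 = (i % 5 == 0);
        if (div3)            printf("Fizz");
        if (div5)            printf("Buzz");
        if (!div3 && !div5)  printf("%d", i);
        printf("\n");
    }
    return 0;
}

If putting that together takes someone 10 minutes, something is wrong.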
When I first sat down to write FizzBuzz for kicks I spent a few minutes trying to think of an elegant or clever solution before writing the obvious answer. I think I would be similarly on guard in an interview situation. I'd say 1 to 10 minutes is easily acceptable accounting for over thinking.
Now if they are actually struggling for 10 minutes, that's another matter.
With regard to FizzBuzz, my follow-up goes a bit like this:
Now suppose the modulo test is really expensive. Suppose we're doing something complicated with large records on disk and we want to do this if that's true, something else if the other is true, etc, etc, just as in FizzBuzz.
How would you restructure your code so it's still obvious to a maintainer what it's doing, but so that it avoids doing more modulo operations than necessary?
You see, all these trivial exercises can be used as starting points for deeper conversations about aspects of coding.
Weird example, but fine: we can reduce the number of modulo operations if that is the one with the 'cost'. That's not very realistic given that we're printing stuff to stdout, but if we take your premise as true and optimize for the cost of the modulo operator, you get silly solutions like this:
#include <stdio.h>

int main(void)
{
    int i;
    int ncounters = 2;
    int counters[2];
    int presets[2] = { 3, 5 };   // fizz every 3, buzz every 5
    int n;                       // number of counters that tripped
    int j;
    char *strings[2] = { "fizz", "buzz" };

    // first time, copy presets to counters
    for (j = 0; j < ncounters; j++) {
        counters[j] = presets[j];
    }
    for (i = 1; i <= 100; i++) {
        n = 0;   // reset number of counters that have zeroed
        for (j = 0; j < ncounters; j++) {
            counters[j] = counters[j] - 1;
            // output relevant string when counter trips
            if (counters[j] == 0) {
                // separate strings by dashes if more than one counter trips
                if (n != 0) {
                    printf("-");
                }
                n++;
                counters[j] = presets[j];
                printf("%s", strings[j]);
            }
        }
        // no counter tripped, just output the number
        if (n == 0) {
            printf("%d", i);
        }
        printf("\n");
    }
    return 0;
}
Forgive the lack of comments and the hardcoded number of strings.
No modulo operations.
By asking the question that way you'd get solutions that you're not really looking for.
To be honest, I didn't believe in the FizzBuzz situation when I read it. I didn't believe that a group of programmers who couldn't write code that simple could exist.
Assuming such a group doesn't exist, I'm all for harder problems, harder than the one presented.
But assuming the group exists, and assuming, as I did, that the problem presented tries to weed out its members, I just thought something that "hard" wasn't needed.
But of course, the point of an interview is to identify the good, not the bad, so harder problems would do just fine as well in ruling out the bad.
But the whole point of the article was a user's doubt in being able to pass a FizzBuzz kind of problem. For that, failing at something harder than the problem presented would probably not be embarrassing at all.
>But of course, the point of an interview is to identify the good, not the bad, so harder problems would do just fine as well in ruling out the bad.
Yes, and no. Hard problems, being hard, mean that you don't expect all the good programmers to pass them, but to see how they think around them. And some people are able to talk the talk without walking the walk.
It's better to have an easy problem and a hard one. The easy one weeds out the awfully bad people; the hard one helps you find the good among the rest.
If you're interviewing and are interested in filtering out the idiots with little work on your part, http://www.ziprecruiter.com/ may be useful. (Disclaimer. I know the people who created it. But it still is a useful piece of work reduction.)
I'm really amazed at the variation I'm seeing in the solutions that just my friends are providing. One said his first try would be quadratic, and his second would require allocation; another used a strange mix of indexing and pointer arithmetic. Even those who commit to either indexing or pointer arithmetic have significant variation in their solutions.
I actually think there's a distinct best solution to this problem, so it's remarkable to me that there's so much variation between people that I consider to be good programmers.
That seems pretty poor. I'm a pretty crappy programmer and haven't written any C in 18 years, but despite tripping over my feet for fifteen minutes, I have a working 4-line function. I despair for what passes for a professional programmer in shops that don't bother to check their people out properly...
Added: In the interests of "put up or shut up": http://gist.github.com/415975 - no idea if it's OK by today's standards but hey, it works
I think these names make it clearer that neither "z_terminated[condensed]" nor "z_terminated[condensed - 1]" are reliable ways to find the end of the original string.
Well, if you don't understand your own code, it's probably not a good idea to write it that way (for-loops are a lot easier to read & verify if the stop condition depends on the loop variable, for example).
But your solution isn't just hard to read, it's also horribly broken. If you cannot figure out why, try printing the value of 'i' inside the loop.
#include <stdio.h>
#include <string.h>

void condense_by_removing (char *z_terminated, char char_to_remove)
{
    int i, index;
    for (i = 0, index = 0; i <= strlen(z_terminated); ++i) {
        if (z_terminated[i] != char_to_remove) {
            z_terminated[index++] = z_terminated[i];
        }
    }
}

int main ()
{
    char dt[] = "Now is the winter of our discount tents";
    condense_by_removing (dt, 'u');
    printf ("%s\n", dt);
    return 0;
}
Points for the <= copying of the termination character :)
You do make two passes over the string (one for strlen and one for the copy loop). You also assume the compiler will cache the return value of strlen across iterations; if it doesn't, your performance will be horrible.
It's actually O(2n), so that works out to O(n) but it does have a 50% penalty, assuming an optimizing compiler.
Bwhite is not all wrong though, on a non-optimizing compiler you are hitting the string every time in the loop because of the strlen, and then it is O(n^2).
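The trivial fix, of course, is to hoist the length out of the loop (a sketch using the same names as the snippet above; needs <string.h> for strlen):

void condense_by_removing (char *z_terminated, char char_to_remove)
{
    size_t len = strlen(z_terminated);   /* measure the string once */
    size_t i, index;
    for (i = 0, index = 0; i <= len; ++i) {   /* <= also copies the terminating 0 */
        if (z_terminated[i] != char_to_remove) {
            z_terminated[index++] = z_terminated[i];
        }
    }
}

Same behaviour, but one pass to measure and one to copy.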
> on a non-optimizing compiler you are hitting the string every time in the loop because of the strlen
In practice, what assumptions should one make here about optimizing compilers?
It seems like it would take a lot of compiler intelligence to figure out that even though we are modifying the string in place, we are never terminating the string, and thus strlen() remains constant. Thus my instinct would be to never use it in a loop like this.
But maybe current compilers are actually this smart? I just compiled with gcc -O3 and looked at the result with objump -S, but I'm not familiar enough with the assembly to figure out how it's handling things here.
By contrast, my instinct was to do this with in a single pass with two incrementing pointers. I'm not sure if the code is clearer (http://gist.github.com/416170), but at least the assembly turns out simple enough to read!
> In practice, what assumptions one should make here about optimizing compilers?
> It seems like it would take a lot of compiler intelligence to figure out that even though we are modifying the string in place, we are never terminating the string, and thus strlen() remains constant. Thus my instinct would be to never use it in a loop like this.
You're completely right; I just tried it on several gcc settings and it is O(n^2) every time. I thought 'n' would be invariant across the execution of the loop, but because we're modifying the string the compiler can't assume that, so it re-runs strlen every iteration.
> By contrast, my instinct was to do this with in a single pass with two incrementing pointers. I'm not sure if the code is clearer (http://gist.github.com/416170), but at least the assembly turns out simple enough to read!
That's pretty much what I sent in as a solution, and I figure there will be a lot of them isomorphic with yours.
Count on it. I am always very hesitant to post code on HN, knowing that if I'm not fresh or haven't tested the code, I'll be mercilessly hacked to little bits. It's fun though; I think my personal record is 5 bugs in 2 lines...
This is exactly why I contribute tens of thousands of lines to F/OSS projects.
Your mettle has never truly been tested until you have hundreds of developers, from around the world, pointing out your idiocy in an excruciatingly specific manner. After your ego has been thoroughly decimated, then you can improve.
This is closer to how I would do it in practice.
void condense_by_removing (char *z_terminated, char char_to_remove)
{
    char *next = z_terminated;
    while (1) {
        while (*z_terminated == char_to_remove) ++z_terminated;
        if (!(*next++ = *z_terminated++)) break;
    }
}
This won't work, if the first char of z_terminated == char_to_remove.
I have a similar solution farther down the page. Not many people seem to like doing it with pointer manipulation.
EDIT: We should both probably add
if(!char_to_remove) return;
at the top since we'll get a seg fault if someone tries to be sneaky and passes the zero-terminator. And then might as well check if z_terminated is NULL while we're at it. Or is there a better way to handle it?
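Something along these lines, perhaps (just a sketch of the guards we discussed; whether silently returning is the right policy really depends on the caller):

void condense_by_removing (char *z_terminated, char char_to_remove)
{
    if (z_terminated == NULL || char_to_remove == '\0')
        return;                              /* nothing sensible to do */

    char *next = z_terminated;
    while (1) {
        while (*z_terminated == char_to_remove) ++z_terminated;
        if (!(*next++ = *z_terminated++)) break;
    }
}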
Apropos of your first example, it really was O(n^2) all the time; I just ran a bunch of tests on various optimization settings and strlen gets called on every iteration of the loop.
For extra performance, use strchr to find the first char_to_remove.
#include <string.h>   /* for strchr */

void condense_by_removing (char *z_terminated, char char_to_remove)
{
    char *src, *dest;

    /* Find the first character to remove */
    /* This doubles the performance when there are no
       remove characters, I assume because it uses a
       builtin and because I don't do unneeded writes. */
    src = strchr(z_terminated, char_to_remove);
    if (src == NULL)
        return;

    /* src is the position in the old string, with
       the possible characters to remove.
       dest is the end position of the new string,
       where non-removed characters are added */
    dest = src++;
    for (;;) {
        /* Skip additional remove characters */
        while (*src == char_to_remove)
            src++;
        /* Copy (including copying the terminal NUL) */
        if (! (*dest++ = *src++)) {
            /* copied a NUL - end of string */
            break;
        }
    }
}
Note: I read the entire thread several hours before making this attempt so my results don't represent a good lab environment. Please let me know if I made any major errors.
Done! Thanks for taking the time to read this. My hope in posting the pastebin link was that other commenters might chime in if they saw glaring errors.
I'm not afraid, or proud, to admit I couldn't get past the n^2 solution until I talked it out with a friend. Guess it's time to go back and read my K&R...
uh, so not a single person has mentioned memory allocation issues yet?
The pointer to the 0-terminated string could have been to a statically allocated array, or to memory allocated from the heap using malloc. If you just shorten the string without reallocating the memory, the extra bytes that got condensed are still marked as allocated and are wasted.
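To be concrete about the heap case (not what the spec asks for, and only valid if the string really did come from malloc; the function name here is just illustrative):

#include <stdlib.h>
#include <string.h>

/* Give back the slack after condensing a malloc'd string. */
char *shrink_to_fit(char *heap_str)
{
    char *smaller = realloc(heap_str, strlen(heap_str) + 1);
    return smaller ? smaller : heap_str;   /* realloc may fail or move the block */
}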
That's pretty obvious, but since that was the requirement that's what you do.
For all you know the length of the original string is kept around somewhere for future reference when the buffer is re-used; it's not uncommon for buffers used like this to be limited to some fixed size larger than the maximum expected string length (+1 for the terminating nul char), and to have a string copied into the buffer for processing.
You're in deep trouble when you try to insert stuff into strings like that though...
The bytes that got condensed and are wasted would be wasted even worse in the way you describe: if you re-allocate the memory then you have to free the original, leaving the larger portion sitting unused instead of just a few bytes...
Sure, that could be re-used by some other call to the allocator by a different part of the program, but the most important issue here is that that was not what was specified.
Whoever spec'd it needs it like that, so that's what gets built.
I worked it out by converting z_terminated to a list inside condense_by_removing and then returning "".join(z_terminated), but yeah, it does change the problem.
I suppose you could first convert the string to a list of characters and use integers as a replacement for pointers. You could then modify the list in place.
I submitted the python solution. I changed the problem to accept a list. I was guessing this preserved the integrity of the problem, although I don't really know C so I couldn't be certain. I explained as much when I emailed my solution.
If you want to try this you're welcome, but I ask:
* Please no HTML emails - it's easier to extract from plain text
* Please remember the anti-spam measure - more than one person has been caught in the spam trap.
* Please put BEGIN before and END after your code so I can auto-extract it.
I've had 20 submissions in the last few hours, so these will help a lot.
Thanks.
========================================================
EDIT: And another 10 in the last 12 minutes. Thank you to those doing as I ask - it's making things easier. I'm going off to do some work now - I'm a little concerned about what I'll come back to ... although it is interesting. As I said earlier, I'll collect submissions until there's a 48 hour gap, then I'll start processing them.