Hacker News new | past | comments | ask | show | jobs | submit login
Interviewing programmers: coding test results. (solipsys.co.uk)
81 points by RiderOfGiraffes on May 27, 2010 | hide | past | favorite | 129 comments



Please can I make a request:

If you want to try this you're welcome, but I ask:

* Please no HTML emails - it's easier to extract from plain text

* Please remember the anti-spam measure - more than one person has been caught in the spam trap.

* Please put BEGIN before and END after you code so I can auto-extract it.

I've had 20 submissions in the last few hours, so these will help a lot.

Thanks.

========================================================

EDIT: And another 10 in the last 12 minutes. Thank you to those doing as I ask - it's making things easier. I'm going off to do some work now - I'm a little concerned about what I'll come back to ... although it is interesting. As I said earlier, I'll collect submissions until there's a 48 hour gap, then I'll start processing them.


> * Please remember the anti-spam measure - more than one person has been caught in the spam trap.

What antispam measure? Neither this page nor http://www.solipsys.co.uk/Writings/TestsForProgrammers_Part_... says anything about spam (except this note).


Nor does http://www.solipsys.co.uk/new/ColinWright.html?Personal - though I'm not sure that Colin Wright even wrote that, given I've yet to find a link to the article from the root, and the article doesn't have the author's name ...


Please check the profile, it mentions the AntiSpam measure (adding HackerNews to the subject): http://news.ycombinator.com/user?id=RiderOfGiraffes


Read his HN profile.


Thanks for this write-up RiderOfGiraffes.

I'm one of the HNers that sent in code for the test. The feedback provided by RiderOfGiraffes was useful and the follow up questions were food for thought.

At the time I'd just failed a coding test for a systems programmer role and felt pretty low, which is why I decided to take this on.

In the interview I failed, I fared well in most of the questions, apart from one set involving heavy binary maths — I pretty much forgot basic binary computation from when I did my undergrad course. Following this, it prompted me to buy "Hacker's Delight" by Henry S. Warren. It's a great read if you're into high-performance algorithms http://amzn.com/0201914654

I can't wait to read the next part!


You're testing whether active participants on HN that volunteer to do a problem are worth talking to?

I've done tests like this in the past couple of years for everyone responding to a job ad for all sorts of positions. The job ads usually include the problem(s), so I'm getting the same self-selection you are. Combining our results for a freakonomics style conclusion:

HN readers are an order of magnitude more skilled than job ad readers.


I'm not really testing them - I'm providing them with information about the tests I do, why I do them, what I do with them, and the discussions that arise from them. If you don't try it for yourself then you haven't given yourself the chance to think of the issues that I then raise in subsequent discussions.

These test aren't for everyone, they are examples of what I specifically do.

We already know that self-selected from HN already means to 0.1%.


OK - I've had a flood of emails trying the challenge, so I'm going to delay the next post about this to let more people have a go. It's not automated so I'm dealing with it by hand, but feel free to have a go. I'll do the next posting when I've received no submissions for 48 hours.

And when I get time.


What a great thing to do, RoG, really, that's a very useful thing.

Maybe a bit less trivial challenge next time ?

(though I can see how in an interview setting this would be all you might be able to do).


It's not trivial till you've done it.

If it really is trivial then it shouldn't take you long.

If it isn't trivial then you'll learn something.

Care to try it before I publish Part II ??

<grin>


It seems very trivial :/

I don't really do much C, but here's x86:

  // esi -> asciiz string
  // cl = character to remove
  // NB cl=0 won't work well ;)

  mov edi, esi
  cld
  loop1:
      lodsb      // Load the byte from [esi] into al
      cmp al, cl
       je loop1  // Skip this byte
      stosb
      or al, al
       jnz loop1

Did I miss the "gotcha"s?

edit: condensed code a bit


Not much experienced with x86 asm, but you don't seem to be incrementing esi anywhere...

What am I missing?


stosb = store sequential byte increments the destination pointer, just like lodsb increments the source pointer.


Ah, thanks.

Also, eww. It decrements or increments dependent upon a flag. Not nice.


x86 assembly is full of old cruft, that 'cld' there is a thing that if you've ever forgotten it probably cost you a lot of time.

99% of the time the flag is clear... unless it isn't.

So for a seasoned assembly programmer that cld is idiomatic, axod probably typed the instruction reflexively because he knows he can't rely on the state of the direction flag, even if it has nothing to do with the problem per-se.


Please don't post the solution, people would like to think for themselves.


The worrying thing is that you're suggesting there may be people on hackernews who can't bang this out in a few seconds :/

Does it even really require much thinking? I'm not trying to be arrogant here, but it seems like a good definition of "trivial".


> The worrying thing is that you're suggesting there may be people on hackernews who can't bang this out in a few seconds :/

My sentiment exactly.

And by the way, you do have a bug in that code.


Really? give me a clue...?


You are making some scary assumptions about the validity of that pointer...


Huh? This is C (well, asm). If you pass in garbage, you get undefined behaviour (a crash by NULL pointer dereference); this is how all str* functions work. Even

    assert(str != NULL)
is being overly generous, I think - if only because a debugger would lead you to the source of the crash immediately.


No Huh, really.

You have not received any specification about the context in which the code operates, therefore the correct course of action is to assume the worse and produce bullet proof code.

If you were operating under constraints in terms of speed that would warrant a more cavalier attitude towards input checking then that would be another thing but since that wasn't specified and there is no context, 'safety first' is the way to approach the issue.

Your assertion would have done the job just fine.


The specification states that "your routine should modify the given zero-terminated string in place". This implies that a zero-terminated string is passed in, and that it is mutable. If this is not the case, it is impossible to perform as specified; any behavior is wrong. An assertion error would be the ideal way to handle things; an immediate segfault is almost as good (and probably the practical thing to do if the string is immutable or not null-terminated). And of course, there are some error situations you simply can't detect - eg, not all cases of non-terminated strings are detectable, because you run into and mutate some other heap garbage before hitting an unmapped or immutable page.


I think it is actually really cool that RoG did not specify that part of the problem because in real life that is exactly how stuff gets specified, and plenty of times that's where the trouble comes from. Asking to revise a spec so that it is explicit is usually a very good course of action.

Not all hardware will segfault when writing to NULL, the immutable should segfault and strings that are not null terminated are indeed hard to impossible to check for (pointer wrap..., after a lot of swapping, or a stack overwrite).


I didn't include all the boring error checking code, I included just the 'meat' ;) If you call it with valid inputs, it produces valid outputs.

IMHO That's not a 'bug'. That's a lack of boring gruntwork error checking.


Also note that libc string functions generally seg-fault when passed invalid addresses (including NULL). Implicitly expecting more sophisticated error handling in this scenario is silly.


So, you either go back to the source and ask for more info, or you take a stab at what is the right thing to do.

In this case that wasn't possible, so I assumed the worst, and I think that simply noting that there is a potential problem there would already score you points with the interviewer because they'd realize you spotted that the specification was incomplete, it didn't tell you what to do in case of invalid input.


I think this may be a cultural thing. C programmers are very much used to the standard not defining the behaviour of some things, e.g. what happens when you call strcmp(str, NULL). Segfaulting is expected here, and easy to track down.

(I would use some assert()s if a function could cause silent memory corruption; but this particular function either works (valid string), segfaults (NULL) or cannot detect a problem (stray pointer).)


Yes, that's more or less the problem with lots of code.

I'm not saying your code is invalid or that it won't work, but 'defensive' programming says:

- assume your input is total garbage

- handle all correct situations correctly

- handle all garbage gracefully or throw an error depending on what the circumstances dictate

I'm also missing the stackframe handling, but as you said, this is the 'meat', that meat needs a bit of scaffolding to work.


Oh definitely. I agree. I assumed though that the "problem" itself was the main aim here, not testing if someone can write scaffolding error checking code.

How far do you go? Check that the memory isn't mapped as read only? ;) Check that there's actually some memory mapped into that address space?

Obviously you need both skills, but they're pretty related I'd say. If you can solve low level problems you can probably think through any potential issues, invalid inputs, edge cases etc and churn out code to deal with them.

I guess my issue is I don't see what being able to solve this says about someone, apart from "isn't completely terrible at programming".


Defensive programming is not always the correct assumption though.


any assumption is wrong, but when given the choice between two assumption (safe vs unsafe) safe is best.

I did note in my 'answer' that the problem wasn't specified fully.


The result of the program should always be a valid string. ;) Tried to be as ambiguous as I could, not really sure of another way to say it.


"valid string" ? :/ can you give me an example of an invalid string?

It correctly tacks a 0 byte on the end if that's what you mean, of course cleaning up the unused space would be outside of the task at hand.

Maybe I'll look like a complete idiot soon if it transpires I've completely missed the 'gotcha' ;)


I'd be really interested to see your original version and your latest version side-by-side, because I'm wondering if your "clean-up" actually changed its behavior.


Original code:

  // esi -> asciiz string
  // cl = character to remove

  mov edi, esi
  cld
  loop1:
      lodsb      // Load the byte from [esi] into al
      or al, al
       jz alldone
      cmp al, cl
       je loop1  // Skip this byte, since it's one we want to remove
      stosb      // Store the byte at [edi] from al
      jmp loop1  // Loop around for the next byte
  alldone:
      xor al, al
      stosb      // 0 terminate it
The only real behavioral change was that this version would cope with an input of cl=0. My cleaned up version would not cope with cl=0, but would need a sanity check for that.


Hmm. My memory must be faulty, but I don't remember the first one having the final xor and stosb. I do remember looking at your solution and thinking that you'd forgotten it, that's why I expressed an interest in seeing the original.

Still, likely my memory is at fault.


heheh my initial version also explicitly added the 0, and then I also found a way to do it in the loop. Seems a common evolution.

But your final solution is clearer than mine. While it uses an extra variable (register), it avoids looking up the same thing twice. So... more efficient and clearer.

BTW: TIL char * a = "hello"; doesn't work in C (I think it doesn't allocate memory). I'm actually happy enough that I could solve this at all, not having used C for 20 years ...old


Actually, I might just be tired. :-p I don't see anything wrong with axod's code. Just emailed you a fairly verbose C version. I hate writing production code...


It's not that it requires much thinking. It's just that it's not every day that most of us have to code something like this in C.


From another I made on this page:

> If you don't find any egregious bugs, I hereby claim it really is trivial.

Still, it may be a nice exercise.


Are you saying that some people couldn't do the problem, but could actually read that Assembly?


How many HNers do you think know x86 assembly so well that simply seeing the structure of the code gives away the answer? :)


Even if you didn't know x86 at all you should still be able to figure out what it does by looking at it given the fact that the specification of what it does is only one sentence long.


No doubt, but if my desire is not to see other answers, I'm not going to read the assembly closely enough to understand. Your objection makes more sense with indented programming languages whose indentation structure may give away more information.


It's 8 instructions. It's a single loop. It's moving stuff from one buffer to another and ignoring specific bytes.

What are you talking about? What's to understand?


I can ignore x86 assembly just like I can ignore French: the graphical structure (i.e., indentation) of an x86 program gives away nothing, just like a paragraph of French gives me no immediate sense of its meaning. In either case, I could probably puzzle out the meaning with intense reading, but the mere presence of the text does not give it away.

My C solution, on the other hand, has a distinct graphical structure which would indicate, even to people for whom C isn't their "native language," a lot of the meaning of the text with merely a glance.


edit: condensed code a bit

Well well, the other version seemed to work ;-)


Wasn't quite as elegant though, and 10 instructions vs 8.


Elegance is in the eye of the beholder. I found your first version easier to grasp.


I think it's trickier than FizzBuzz, but not by a whole lot.


I think it is actually simpler.

And the fizz-buzz problem has two structurally different solutions.


Yeah, I tried it too. If you don't find any egregious bugs, I hereby claim it really is trivial.

EDIT: although the point is to weed out the idiots, not to provide a challenge.


Ok, mail in your inbox, do I get my cookie now ?


Sure, will mail you.


Less trivial? It wasn't that difficult, although I wasn't able to complete it in 2 minutes like the colleagues of RiderOfGiraffes did!


It was dead easy and I'm seriously surprised that anybody on HN that's here not for the 'startup' angle would have a problem implementing this quickly and correctly.

As code challenges come I don't think I've ever seen a more trivial example.


This was my sentiment. I've been asked (and succeeded, presumably so do other people) to write pseudocode to solve more complicated problems than this on a whiteboard, in an interview.

Maybe this seems harder to people unaccustomed to C. I don't know. It seems like it would make sense to aim just a little higher, though. I recognize that it has to be a small enough that people won't be put off by it, but most people will be willing to invest the fifteen minutes necessary to solve a mildly complicated problem.


are you this arrogant in real life?


It's not about being arrogant. Unless you have never done any real programming apart from simply calling library functions, the original problem is about as interesting as asking "Write a program to add 47 and 88 together".

I think the issue is that many programmers have never done real programming. They just chain library calls together. It's like comparing a furniture maker who starts with a tree, to a furniture maker who buys parts from IKEA and assembles them.

To be a really good programmer, you need to be really really at ease with bits bytes, moving stuff around, and getting your hands dirty with real programming.


I studied applied math instead of computer science, and the exercise took me a good 15 minutes plus some time to adjust to C as I've only worked with C++. Then another 5 to simplify some redundancies. I believe the solution I came up with was elegant, correct, and idiomatic.

Does this mean I'm not a good programmer? If I worked with C and pointers a lot then it might, but I find the attitude that not being able to slip into that mindset immediately equivalent to being an inept programmer arrogant.


I'd say that a good answer is a good answer regardless of language. I wouldn't personally care what language someone wrote the answer in as long as they got the key concepts which in this case are pretty damn simple -

Have a source pointer, have a destination pointer. Copy stuff from src that you want, over to dest. Leave out any bytes of value N. Don't forget to copy/make a new 0 at the end.

And that's it. That's being a programmer. Knowing some particular languages syntax isn't. That's the easy/irrelevant bit IMHO.

So it really depends on if that 15 minutes you spent was to arrive at the 'solution' I wrote out in words above, or if it was checking up syntax to write it in C as to wether you're a good programmer or not.

Just my 2c.


I just submitted a solution. I believe that one can tell "has this person written a non-trivial amount of C code?" from 15 seconds glancing at their answer in your peripheral vision.

I agree that it can be solved essentially the same way in many languages, but the idiomatic Erlang solution will look a lot different than the idiomatic C solution.

Edit: After reading some of the code posted here and linked from here, perhaps I don't have the same sense as others of what the idiomatic C solution is either. This should be interesting to read the followup posts by RoG.


That's very well possible, but it is also possible that it really was a simple problem.

As for my arrogance: http://jacquesmattheij.com/Mistakes+I%27ve+made,+and+what+yo...


I agree that it's a simple problem. I just don't type quite fast enough to do this in two minutes (especially if I count time spent making a main function and testing my solution).


One thing I had to do years ago as part of an interview was to write a C function to find the first bit set in a block of memory, flip it and return the location of the bit. Doing that at a whiteboard while explaining to the bearded chap interviewing me what I was doing.

From what I recall, I actually sort of enjoyed solving it as my C was bit rusty as I'd mostly been programming in Lisp.


A neat trick for clearing the lowest bit is subtracting one and and'ing that with the original:

    x_cleared = x & (x - 1);
You can get the lowest bit from the difference.


Isn't this easier and faster, both type- and performance-wise?

  x_cleared = x & ~1;


The intension is to clear the rightmost 1-bit.


I'm pretty sure that doesn't do what I was asked to do.


Wasn't the big discussion all about how easy the task was, and how unable were the perfect-CV people to complete it?


In RoG's write-up he explains that this is a task candidates would have to pass to gain an interview.


Exactly. The comment I replied to was asking for something less trivial, while this is probably all you have to do to identify the candidates who can't code.

The FizzBuzz problem is at least one order of magnitude easier than this, and still is able to rule out a great part of the candidates (by time limits, according to CodingHorror, not exactly by not being able to code it). The good thing with this test is that you can actually get a glimpse of the ability of the candidate to come up with an efficient solution and identify its complexity.


In that case I completely agree that trivial problems would weed out the candidates that can't program, however who says you can't give something harder than FizzBuzz?

Also, FizzBuzz is now so common on the web that a simple search would produce numerous solutions for those that can't program, allowing them to get through to an interview. Assuming this is your yard-stick.


I'm the inventor of the fizzbuzz problem and I kept using the question for a long time after it became popular and I didn't see a single candidate (out of maybe ~100) who'd just memorized the answer. I'm guessing the kind of people who can't program are generally the kind that don't do indepth background research/prep prior to interviews.


That's just really sad, that there are people that would be interviewing for a coding position and it turns out they can't even do that in under a minute or so.

How do they expect to do their jobs ?


I got fizzbuzzed in an interview before I knew what fizzbuzz was. It took about a minute for me to think "what? you want me to do what? Why? ... Ok?" So I would say 1-10 minutes is acceptable.


I'd have a problem with being interviewed like that because the question is so simple I'd think they're pulling a joke or something.

First thing I'd ask is where is the camera :)

I can see how for a junior coding position this might be an appropriate question, say people fresh out of school, and then 1 to 10 minutes might be acceptable.

But 4 modulo statements (or 3, if you think about the problem a bit longer) and a loop ?

10 minutes ?

That's pretty slow. I understand there is a lot more to programming than coding up a simple solution like this but as problems come it is really a very simple one and if someone would take 10 minutes to put this together I'd be a bit worried about throughput, and probably about experience as well.


When I first sat down to write FizzBuzz for kicks I spent a few minutes trying to think of an elegant or clever solution before writing the obvious answer. I think I would be similarly on guard in an interview situation. I'd say 1 to 10 minutes is easily acceptable accounting for over thinking.

Now if they are actually struggling for 10 minutes, that's another matter.


With regards FizzBuzz, my follow-up goes a bit like this:

Now suppose the modulo test is really expensive. Suppose we're doing something complicated with large records on disk and we want to do this is that's true, something else if the other is true, etc, etc, just as in FizzBuzz.

How would you restructure your code so it's still obvious to a maintainer what it's doing, but so that it avoids doing more modulo operations than necessary.

You see, all these trivial exercises can be used as starting points for deeper conversations about aspects of coding.


Weird example, but fine, we can reduce the number of modulo operations if that is the one with the 'cost', but that's not very realistic given that we're printing stuff to stdout, but if we take your premise as true and we optimize for the cost of the modulo operator you would get silly solutions like this:

  main()

  {
	int i;
	int ncounters = 2;
	int counters[2] ;
	int presets[2] = { 5, 3};
	int n; // number of counters that tripped
	int j; 
	char * strings[2] = { "fizz", "buzz" };

        // first time, copy presets to counters

	for (j=0;j<ncounters;j++) {
		counters[j] = presets[j];
	}

	for (i=1;i<=100;i++) {
		n = 0;	// reset number of counters that have zeroed

		for (j=0;j<ncounters;j++) {
			counters[j] = counters[j] - 1;

                        // output relevant string when counter trips

			if (counters[j] == 0) {
                                // separate strings by dashes if more than one counter trips
				if (n != 0) {
					printf("-");
				}
				n++;
				counters[j] = presets[j];
				printf("%s",strings[j]);
			}
		}
                // no counter tripped, just output the number
		if (n == 0) {
			printf("%d",i);
		}
		printf("\n");
	}
  }
Forgive the lack of comments and the hardcoded number of strings.

No modulo operations.

By asking the question that way you'd get solutions that you're not really looking for.


The short version

  int c3 = 1;
  int c5 = 1;
  for ( int i = 1; i <= 100; i++, c3++, c5++ ){

	if ( c3 == 3 ){ printf( "FIZZ" ); c3 = 0; }

	if ( c5 == 5 ){ printf( "BUZZ" ); c5 = 0; }

	if ( c3 && c5 ){ printf( "%d", i ); }
  }


You're missing the needed printf("\n"); at the end of the loop.

Elegant solution, but fails to meet the problem specification.


Hehe, compact.


Or you could just ask "Do you know what a lookup table is", and "Can you write 15 entries into an array and cycle through them" ;)

I agree it's a good starting point, but surely it's still just getting rid of the ridiculously bad programmers rather than anything else.


I'm guessing a lot of these people end up working by doing a lot of copy & paste and a trial & error approach to coding.


To be honest, I didn't believe in the FizzBuzz situation when I read it. I didn't believe that it was possible to exist such a group of programmers who couldn't do a simple code like that.

Assuming such a group doesn't exist, I'm all for harder problems, harder than the one presented.

But assuming the group exists, and assuming, as I did, that the problem presented tries to weed out its members, I just though something that "hard" wasn't needed.

But of course, the point of an interview is to identify the good, not the bad, so harder problems would do just fine as well in ruling out the bad.

But the whole point of the article was a user's doubt in being able to pass a FizzBuzz kind of problem. For that, something harder than the problem presented may probably not be embarrassing at all to fail at.


>But of course, the point of an interview is to identify the good, not the bad, so harder problems would do just fine as well in ruling out the bad.

Yes, and no. Hard problems, being hard, mean that you don't expect all the good programmers to pass them, but to see how they think around them. And some people are able to talk the talk without walking the walk.

It's better to have an easy problem and a hard one. The easy one weeds awfully bad people, the hard one helps you find the good among the rest.


I always thought that you should use different words to prevent easy googling? The test remains the same, but with, say, BaaLamb or CowTip.


If you're interviewing and are interested in filtering out the idiots with little work on your part, http://www.ziprecruiter.com/ may be useful. (Disclaimer. I know the people who created it. But it still is a useful piece of work reduction.)


I'm really amazed at the variation I'm seeing in the solutions that just my friends are providing. One said his first try would be quadratic, and his second would require allocation; another used a strange mix of indexing and pointer arithmetic. Even those who commit to either indexing or pointer arithmetic have significant variation in their solutions.

I actually think there's a distinct best solution to this problem, so it's remarkable to me that there's so much variation between people that I consider to be good programmers.


That seems pretty poor. I'm a pretty crappy programmer and haven't written any C in 18 years, but despite tripping over my feet for fifteen minutes, I have a working 4 line function. I despair for what passes as a professional programmer in shops that don't bother to check their people out properly..

Added: In the interests of "put up or shut up": http://gist.github.com/415975 - no idea if it's OK by today's standards but hey, it works


The program also seems to work if, in the loop header, you replace “i <= strlen(z_terminated)” with “z_terminated[j] != 0”.

I don’t understand why I have to put j in the index there, rather than j-1: after the string-ending null is copied, j is incremented, right?


I don't think 'j' nor 'j-1' would make sense here. 'i' would make sense, if you also terminate the condensed string.

Maybe it would be clearer if the names of the index variables better described their purpose?

http://gist.github.com/416298 (no peaking if you're submitting!)

I think these names make it clearer that neither "z_terminated[condensed]" nor "z_terminated[condensed - 1]" are reliable ways to find the end of the original string.


Well, if you don't understand your own code, it's probably not a good idea to write it that way (for-loops are a lot easier to read & verify if the stop condition depends on the loop variable, for example).

But your solution isn't just hard to read, it's also horribly broken. If you cannot figure out why, try printing the value of 'i' inside the loop.


#include <stdio.h> #include <string.h>

void condense_by_removing (char *z_terminated, char char_to_remove) { int i, index;

		for (i = 0, index = 0; i <= strlen(z_terminated); ++i) { 
			if(z_terminated[i] != char_to_remove) {
				z_terminated[index++] =	z_terminated[i];
			}	
		}
	}

	int main () 
	{
		char dt[] = "Now is the winter of our discount tents";
		condense_by_removing (dt, 'u');
		printf ("%s\n", dt);
	}


Points for the <= copying of the termination character :)

You do make two passes over the string (one for strlen and one for the copy loop), also you assume the compiler will optimize the caching of the return value of the strlen function, if it doesn't your performance will be horrible.

No points for posting the solution...


This is an example of code that is unnecessarily O(n^2).


No. It isn't. There is only one iteration. It's O(n).


It's actually O(2n), so that works out to O(n) but it does have a 50% penalty, assuming an optimizing compiler.

Bwhite is not all wrong though, on a non-optimizing compiler you are hitting the string every time in the loop because of the strlen, and then it is O(n^2).


on a non-optimizing compiler you are hitting the string every time in the loop because of the strlen

In practice, what assumptions one should make here about optimizing compilers?

It seems like it would take a lot of compiler intelligence to figure out that even though we are modifying the string in place, we are never terminating the string, and thus strlen() remains constant. Thus my instinct would be to never use it in a loop like this.

But maybe current compilers are actually this smart? I just compiled with gcc -O3 and looked at the result with objump -S, but I'm not familiar enough with the assembly to figure out how it's handling things here.

By contrast, my instinct was to do this with in a single pass with two incrementing pointers. I'm not sure if the code is clearer (http://gist.github.com/416170), but at least the assembly turns out simple enough to read!


> In practice, what assumptions one should make here about optimizing compilers?

> It seems like it would take a lot of compiler intelligence to figure out that even though we are modifying the string in place, we are never terminating the string, and thus strlen() remains constant. Thus my instinct would be to never use it in a loop like this.

You're completely right, I just tried it on several gcc settings and it is O(n^2) every time. I thought the 'n' is invariant across the execution of the loop but it might be changed because we're modifying the string and the compiler realizes that and re-runs strlen every time.

> By contrast, my instinct was to do this with in a single pass with two incrementing pointers. I'm not sure if the code is clearer (http://gist.github.com/416170), but at least the assembly turns out simple enough to read!

That's pretty much what I sent in as a solution, and I figure there will be a lot of them isomorphic with yours.


My original snippet contained: "int len = strlen(z_terminated);"

I cut it before posting as I was going for compact code, not efficiency.

Nits have been throughly picked. :)


> Nits have been throughly picked. :)

Count on it, I am always very hesitant to post code on HN, realizing that if I'm not fresh or have tested the code that I'll be mercilessly hacked to little bits. It's fun though, I think my personal record is 5 bugs in 2 lines...


This is exactly why I contribute tens of thousands of lines to F/OSS projects.

Your mettle has never truly been tested until you have hundreds of developers, from around the world, pointing out your idiocy in an excruciatingly specific manner. After your ego has been throughly decimated, then you can improve.

This is closer to how I would do it in practice.

        void condense_by_removing (char *z_terminated, char char_to_remove) 
	{
		char *next = z_terminated;

		while (1) {
			while (*z_terminated == char_to_remove) ++z_terminated;
			if (!(*next++ = *z_terminated++) ) break;
		}
	}
EDIT: Fixed bug


This won't work, if the first char of z_terminated == char_to_remove.

I have a similar solution farther down the page. Not many people seem to like doing it with pointer manipulation.

EDIT: We should both probably add

  if(!char_to_remove) return; 
at the top since we'll get a seg fault if someone tries to be sneaky and passes the zero-terminator. And then might as well check if z_terminated is NULL while we're at it. Or is there a better way to handle it?


> After your ego has been throughly decimated, then you can improve.

Aye... and that's when you learn. I've worked for a bit for a guy that had learned C in the 70's, best school I've had... also very painful at times.


Apropos your first example it really was O(n^2) all the time, I just ran a bunch of tests on various optimization settings and the strlen gets called every iteration of the loop.

I think you owe 'bwithe' a beer.


You're right, I should have left in 'int len = strlen(z_terminated);'


This is closer to how I would do it in practice.

        void condense_by_removing (char *z_terminated, char char_to_remove) 
	{
		char *next = z_terminated;

		while (1) {
			while (*z_terminated == char_to_remove) ++z_terminated;
			if (!(*next++ = *z_terminated++) ) break;
		}
	}


Seems simple enough (formatted a little funny due to comment quirks):

void condense_by_removing(char *z_terminated, char char_to_remove) {

    int i;
    int write_index = 0;

    for (i = 0; i < strlen(z_terminated); i++) {
        if (z_terminated[i] == char_to_remove) {
            continue;
        }

        z_terminated[write_index] = z_terminated[i];
        write_index++;
    }

    z_terminated[write_index] = '\0';
}


The strlen would make this slow. Rather check if z_terminated[i] != 0.


How does that solution compare to something like this?

  char *copy_to = z_terminated; 
  while(1) { 
      if(*z_terminated == char_to_remove) 
          z_terminated++; 
      else if((*copy_to++ = *z_terminated++) == '\0') 
          break; 
  }
Yeah, I know, it will break if char_to_remove is the zero-terminator or z_terminated is NULL.


You don't need all that logic in there.

  for (i = 0; i <= strlen(z_terminated); i++)
    if (z_terminated[i] != char_to_remove)
      z_terminated[write_index++] = z_terminated[i];
That even removes the need to add the \0 later on.


For extra performance, use strchr to find the first char_to_remove.

  /* Find the first character to remove */
  /* This doubles the performance when there are no
     remove characters, I assume because it uses a
     builtin and because I don't do unneeded writes. */
  src = strchr(z_terminated, char_to_remove);
  if (src == NULL)
    return;

  /* src is the position in the old string, with
       the possible characters to remove.
     dest is the end position of the new string,
       where non-removed characters are added */
  dest = src++;

  for (;;) {
    /* Skip additional remove characters */
    while (*src == char_to_remove)
      src++;

    /* Copy (including copying the terminal NUL) */
    if (! (*dest++ = *src++)) {
      /* copied a NUL - end of string */
      break;
    }
  }


I haven't touched C in the better part of a decade but the algorithm seemed straightforward to me. I'll fake it with some pseudo-C:

http://pastebin.com/UpLNgxKC

Note: I read the entire thread several hours before making this attempt so my results don't represent a good lab environment. Please let me know if I made any major errors.


Please email it using the instructions - I've got too many submissions to go chasing them by hand.

Thanks.


Done! Thanks for taking the time to read this. My hope in posting the pastebin link was that other commenters might chime in if they saw glaring errors.


I'm not afraid, or proud, to admit I couldn't get past the n^2 solution until I talked it out with a friend. Guess it's time to go back and read my K&R...


uh, so not a single person has mentioned memory allocation issues yet?

the pointer to the 0 terminated string could have been to a statically allocated array or memory allocated from the heap using malloc. If you just shorten the string without reallocating the memory, the extra bytes that got condensed are still marked as allocated and are wasted.


That's pretty obvious, but since that was the requirement that's what you do.

For all you know the length of the original string is kept around somewhere for future reference when the buffer is re-used, it's not uncommon for buffers used like this to be limited to some fixed size larger than the maximum expected string length + 1 for the terminating nul char, and to have a string copied in to a buffer for processing.

You're in deep trouble when you try to insert stuff into strings like that though...

The bytes that got condensed and are wasted would be wasted even worse in the way you describe, if you re-allocate the memory then you'll have to free the original one, leaving the larger portion sitting unused, instead of just a few bytes...

Sure, that could be re-used by some other call to the allocator by a different part of the program, but the most important issue here is that that was not what was specified.

Whoever spec'd it needs it like that, so that's what gets built.


Interesting that so many solutions posted thus far use strlen().

My C is beyond rusty, but this is what I came up with; do/while is so underappreciated these days. (HN needs a "spoiler" formatting tag, methinks. :)

  void condense_by_removing(char *z_terminated, char char_to_remove) {
      char *pos = z_terminated;
      do {
          if (*z_terminated != char_to_remove)
              *pos++ = *z_terminated;
      } while (*z_terminated++ != 0);
  }


With standard C style my solution is 12 LOC.


I am curious to what the Python solution was, since the C requirement was to do it "in place".


I had the same thought. You would have to be passed a list I guess but then that kind of changes the problem.


I worked it out with converting z_terminated to a list inside condense_by_removing, and then returning "".join(z_terminated), but yea it does change the problem


I suppose you could first convert the string to a list of characters and use integers as a replacement for pointers. You could then modify the list in place.


I submitted the python solution. I changed the problem to accept a list. I was guessing this preserved the integrity of the problem, although I don't really know C so I couldn't be certain. I explained as much when I emailed my solution.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: