#include <stdio.h> #include <string.h> void condense_by_removing...

jacquesm · on May 27, 2010

Points for the <= copying of the termination character :)

You do make two passes over the string (one for strlen and one for the copy loop), also you assume the compiler will optimize the caching of the return value of the strlen function, if it doesn't your performance will be horrible.

No points for posting the solution...

bwhite · on May 27, 2010

This is an example of code that is unnecessarily O(n^2).

mcu · on May 27, 2010

No. It isn't. There is only one iteration. It's O(n).

jacquesm · on May 27, 2010

It's actually O(2n), so that works out to O(n) but it does have a 50% penalty, assuming an optimizing compiler.

Bwhite is not all wrong though, on a non-optimizing compiler you are hitting the string every time in the loop because of the strlen, and then it is O(n^2).

nkurz · on May 27, 2010

on a non-optimizing compiler you are hitting the string every time in the loop because of the strlen

In practice, what assumptions one should make here about optimizing compilers?

It seems like it would take a lot of compiler intelligence to figure out that even though we are modifying the string in place, we are never terminating the string, and thus strlen() remains constant. Thus my instinct would be to never use it in a loop like this.

But maybe current compilers are actually this smart? I just compiled with gcc -O3 and looked at the result with objump -S, but I'm not familiar enough with the assembly to figure out how it's handling things here.

By contrast, my instinct was to do this with in a single pass with two incrementing pointers. I'm not sure if the code is clearer (http://gist.github.com/416170), but at least the assembly turns out simple enough to read!

jacquesm · on May 27, 2010

> In practice, what assumptions one should make here about optimizing compilers?

> It seems like it would take a lot of compiler intelligence to figure out that even though we are modifying the string in place, we are never terminating the string, and thus strlen() remains constant. Thus my instinct would be to never use it in a loop like this.

You're completely right, I just tried it on several gcc settings and it is O(n^2) every time. I thought the 'n' is invariant across the execution of the loop but it might be changed because we're modifying the string and the compiler realizes that and re-runs strlen every time.

> By contrast, my instinct was to do this with in a single pass with two incrementing pointers. I'm not sure if the code is clearer (http://gist.github.com/416170), but at least the assembly turns out simple enough to read!

That's pretty much what I sent in as a solution, and I figure there will be a lot of them isomorphic with yours.

mcu · on May 27, 2010

My original snippet contained: "int len = strlen(z_terminated);"

I cut it before posting as I was going for compact code, not efficiency.

Nits have been throughly picked. :)

jacquesm · on May 27, 2010

> Nits have been throughly picked. :)

Count on it, I am always very hesitant to post code on HN, realizing that if I'm not fresh or have tested the code that I'll be mercilessly hacked to little bits. It's fun though, I think my personal record is 5 bugs in 2 lines...

mcu · on May 27, 2010

This is exactly why I contribute tens of thousands of lines to F/OSS projects.

Your mettle has never truly been tested until you have hundreds of developers, from around the world, pointing out your idiocy in an excruciatingly specific manner. After your ego has been throughly decimated, then you can improve.

This is closer to how I would do it in practice.

        void condense_by_removing (char *z_terminated, char char_to_remove) 
	{
		char *next = z_terminated;

		while (1) {
			while (*z_terminated == char_to_remove) ++z_terminated;
			if (!(*next++ = *z_terminated++) ) break;
		}
	}

EDIT: Fixed bug

what · on May 27, 2010

This won't work, if the first char of z_terminated == char_to_remove.

I have a similar solution farther down the page. Not many people seem to like doing it with pointer manipulation.

EDIT: We should both probably add

  if(!char_to_remove) return;

at the top since we'll get a seg fault if someone tries to be sneaky and passes the zero-terminator. And then might as well check if z_terminated is NULL while we're at it. Or is there a better way to handle it?

jacquesm · on May 27, 2010

> After your ego has been throughly decimated, then you can improve.

Aye... and that's when you learn. I've worked for a bit for a guy that had learned C in the 70's, best school I've had... also very painful at times.

jacquesm · on May 27, 2010

Apropos your first example it really was O(n^2) all the time, I just ran a bunch of tests on various optimization settings and the strlen gets called every iteration of the loop.

I think you owe 'bwithe' a beer.

mcu · on May 27, 2010

You're right, I should have left in 'int len = strlen(z_terminated);'

mcu · on May 27, 2010

This is closer to how I would do it in practice.

        void condense_by_removing (char *z_terminated, char char_to_remove) 
	{
		char *next = z_terminated;

		while (1) {
			while (*z_terminated == char_to_remove) ++z_terminated;
			if (!(*next++ = *z_terminated++) ) break;
		}
	}