lambdajones's comments

lambdajones · on Aug 20, 2009

it means why is probably in a remote, hidden mountain village in Colorado.

lambdajones · on July 5, 2009

I don't disagree. I tried to tell the product people that on several occasions, but it fell on deaf ears. So might as well do it right, eh?

lambdajones · on July 3, 2009

b52 maps each short url code to a unique number, 0-(52^5)

gojomo · on July 3, 2009

There's a shorter way without a bunch of mysterious constants:

  // decode a 5-digit base52 number
  unsigned long b52(const char *c) {
    unsigned long factor = 1;
    unsigned long result = 0;
    for(int i = 0; i < 5; i++) {
      result += factor * b52map[c[i]];
      factor *= 52;
    }
    return result;
  }

(52^5 < 2^32 so an 'unsigned long' is plenty; 52^11 < 2^64 so you could get an 11-digit base52 number into an 'unsigned long long'. I'd be tempted to generalize the function to handle 0-11 digits, by requiring the parameter to be null-terminated or passing in a length.)

lambdajones · on July 3, 2009

Yeah, a ull is overkill. This code comes from a furious couple weeks of coding, an attempt at being clever and going for speed. I'm not sure if I accomplished either. The thinking at the time was that the constants might cache better. I know, don't be clever.

lambdajones · on July 4, 2009

Got around to profiling it last night. My code is faster, if you accept the cost of mystery.

http://lambdajones.com/b52why

gojomo · on July 5, 2009

How could that possibly make a difference for the intended application? Is the server CPU-maxed and never faces IO waits where more calculation is free?

If speed were a valid goal (and it's not), then:

(1) You would only use overkill 64-bit integers if on a 64-bit processor and profiling had proven them to be faster.

(2) Precalculating the factors would speed the clearer code, possibly making it faster:

  const unsigned long factors[5] = 
    { 1, 52, 52*52, 52*52*52, 52*52*52*52 };
  // decode a 5-digit base52 number
  unsigned long b52(const char *c) {
    unsigned long result = 0;
    for(int i = 0; i < 5; i++) {
      result += factors[i] * b52map[c[i]];
    }
    return result;
  }

...or unroll the loop...

  // decode a 5-digit base52 number
  unsigned long b52(const char *c) {
    unsigned long result = b52map[c[0]];
    result += (52) * b52map[c[1]];
    result += (52*52) * b52map[c[2]];
    result += (52*52*52) * b52map[c[3]];
    result += (52*52*52*52) * b52map[c[4]];   
    return result;
  }

(Here I'm trusting the compiler to precalculate the powers of 52, but the code is still clearly 'parsing a base' rather than 'mysterious multiplications and divisions'.)

(3) You would strongly consider using a Base64-based approach, so all conversions involve bit-shifts instead of multiplication. If certain characters can't be used -- eg vowels -- you might still prefer large gaps in your ID series. (If you're using a SQL database, it won't care -- but then again, if there's a SQL database anywhere in the system, its operations will make these micro-optimizations even more ludicrous than they are already.)