Hacker Newsnew | past | comments | ask | show | jobs | submit | lambdajones's commentslogin

it means why is probably in a remote, hidden mountain village in Colorado.


I don't disagree. I tried to tell the product people that on several occasions, but it fell on deaf ears. So might as well do it right, eh?


b52 maps each short url code to a unique number, 0-(52^5)


There's a shorter way without a bunch of mysterious constants:

  // decode a 5-digit base52 number
  unsigned long b52(const char *c) {
    unsigned long factor = 1;
    unsigned long result = 0;
    for(int i = 0; i < 5; i++) {
      result += factor * b52map[c[i]];
      factor *= 52;
    }
    return result;
  }
(52^5 < 2^32 so an 'unsigned long' is plenty; 52^11 < 2^64 so you could get an 11-digit base52 number into an 'unsigned long long'. I'd be tempted to generalize the function to handle 0-11 digits, by requiring the parameter to be null-terminated or passing in a length.)


Yeah, a ull is overkill. This code comes from a furious couple weeks of coding, an attempt at being clever and going for speed. I'm not sure if I accomplished either. The thinking at the time was that the constants might cache better. I know, don't be clever.


Got around to profiling it last night. My code is faster, if you accept the cost of mystery.

http://lambdajones.com/b52why


How could that possibly make a difference for the intended application? Is the server CPU-maxed and never faces IO waits where more calculation is free?

If speed were a valid goal (and it's not), then:

(1) You would only use overkill 64-bit integers if on a 64-bit processor and profiling had proven them to be faster.

(2) Precalculating the factors would speed the clearer code, possibly making it faster:

  const unsigned long factors[5] = 
    { 1, 52, 52*52, 52*52*52, 52*52*52*52 };
  // decode a 5-digit base52 number
  unsigned long b52(const char *c) {
    unsigned long result = 0;
    for(int i = 0; i < 5; i++) {
      result += factors[i] * b52map[c[i]];
    }
    return result;
  }
...or unroll the loop...

  // decode a 5-digit base52 number
  unsigned long b52(const char *c) {
    unsigned long result = b52map[c[0]];
    result += (52) * b52map[c[1]];
    result += (52*52) * b52map[c[2]];
    result += (52*52*52) * b52map[c[3]];
    result += (52*52*52*52) * b52map[c[4]];   
    return result;
  }
(Here I'm trusting the compiler to precalculate the powers of 52, but the code is still clearly 'parsing a base' rather than 'mysterious multiplications and divisions'.)

(3) You would strongly consider using a Base64-based approach, so all conversions involve bit-shifts instead of multiplication. If certain characters can't be used -- eg vowels -- you might still prefer large gaps in your ID series. (If you're using a SQL database, it won't care -- but then again, if there's a SQL database anywhere in the system, its operations will make these micro-optimizations even more ludicrous than they are already.)


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: