Ask HN: how can I generate youtube style id?
4 points by jacktang 2890 days ago | 18 comments
Given the sample URL http://www.youtube.com/watch?v=xqvObyU2ITs, is there any available algorithm to generate a YouTube-style video ID (xqvObyU2ITs)? Thanks!


# Build a random string of len characters drawn from a-z, A-Z and 0-9.
def generate_code(len = 5)
  chars = ("a".."z").to_a + ("A".."Z").to_a + (0..9).to_a
  (1..len).map { chars[rand(chars.size)] }.join
end

That rocks dude.

First thought is that they're generating a globally unique video ID number and base-62 encoding it to keep the URL shorter. Maybe not though?

You could do that easily by just base-62 encoding your table's auto-generated primary key.
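
To make the base-62 idea concrete, here is a minimal Python sketch of encoding an integer primary key and decoding it again. The alphabet order (digits, then lowercase, then uppercase) and the function names are my own assumptions; nothing here is confirmed to be what YouTube does.

```python
import string

# Assumed alphabet order: digits, lowercase, uppercase (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(n):
    """Encode a non-negative integer (e.g. a primary key) as base-62 text."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(chars))


def base62_decode(s):
    """Inverse of base62_encode: turn base-62 text back into an integer."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Note that IDs produced this way are only as long as the key requires, so small keys give very short strings, and consecutive keys give visibly consecutive IDs.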

Here's a very quick and dirty one for Ruby:

require 'zlib'

base = "whatever"
salt = "whatever"
Zlib.crc32(base + salt).to_s(36)

This will generate 6-7 character strings. Not sure how likely collisions are, but they should be rare enough that a simple check/regenerate should work.
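
On how likely collisions are: CRC32 has only 2^32 possible outputs, so a rough estimate comes from the birthday approximation. The sketch below assumes the checksums behave like uniformly random 32-bit values, which is an idealization, and the function name is made up for illustration.

```python
import math


def collision_probability(num_ids, space_size):
    """Birthday approximation: chance that at least two of num_ids
    uniformly random values drawn from space_size possibilities coincide."""
    return -math.expm1(-num_ids * (num_ids - 1) / (2.0 * space_size))


CRC32_SPACE = 2 ** 32  # number of distinct 32-bit checksums

for n in (1000, 10000, 100000):
    print(n, collision_probability(n, CRC32_SPACE))
```

Under that assumption the chance of at least one collision is about 1% at 10,000 IDs and well over half at 100,000, so the check/regenerate step matters once the table grows.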

What's wrong with starting with '0' for the first item and incrementing up from there? Guaranteed uniqueness.

a) It's impossible to do in a distributed fashion; b) it may leak information about your operations to competitors. Of course, OP may not care about these things.

I understand (b), and in fact I considered it a potential weakness, but why does (a) matter if you're selling from a single site? Even if you got multiple simultaneous orders, storing the last license generated in a DBMS should make it multiprocess-safe.

uh, in python:

  import string, random
  ''.join(random.sample((string.ascii_letters+string.digits), 12))
I hope that's what you were asking for. If not, you might want to clarify your question.

In the morning's light, that won't do what you think. random.sample() gives you a unique sampling, so no character will be repeated. Try this instead:

    alphanum = string.ascii_letters + string.digits
    ''.join(random.choice(alphanum) for _ in range(12))

You're quite right; I should have caught that.

You want to use an ID that can predictably be unique for something like this. You shouldn't use a random string.

What do you mean by "predictably unique"? Do you mean "guaranteed unique"?

As a way to generate unique ids this isn't horrible. 62^12 is about 3.2 × 10^21, or roughly 71 bits. That is fewer random bits than a version 4 UUID (122), so collisions are more likely than with a UUID, but still very unlikely.
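
For anyone who wants to check the arithmetic: 12 characters drawn independently from 62 symbols gives 62^12 possible IDs. A rough Python sketch, assuming each character is picked uniformly at random (the function names are hypothetical):

```python
import math

ALPHABET_SIZE = 62  # a-z, A-Z, 0-9
ID_LENGTH = 12


def id_space(alphabet_size=ALPHABET_SIZE, length=ID_LENGTH):
    """Number of distinct IDs of the given length."""
    return alphabet_size ** length


def entropy_bits(alphabet_size=ALPHABET_SIZE, length=ID_LENGTH):
    """Bits of randomness in one uniformly random ID."""
    return length * math.log2(alphabet_size)


def ids_for_collision_chance(p, space_size):
    """Approximate number of random IDs at which the chance of at least
    one collision reaches p (birthday approximation)."""
    return math.sqrt(2.0 * space_size * -math.log1p(-p))


print("possible IDs:", id_space())
print("bits per ID:", entropy_bits())
print("IDs for a 50% collision chance:", ids_for_collision_chance(0.5, id_space()))
```

This works out to a little over 71 bits per ID, and on the order of tens of billions of IDs before a collision becomes as likely as not.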

Guaranteed uniqueness is preferred, yes. But the level of effort needed to guarantee uniqueness across a large application / dataset / etc is much higher than "unique enough", just as it's a lot more expensive to prove a number is prime than to generate a number that is 99.9999% probably prime.

Well, even at 99.9999%, we should handle the 0.0001% exception ;) Do I understand correctly that if a collision occurs, we just generate the ID again?

That would be guaranteeing uniqueness. Don't bother guaranteeing uniqueness. The 99.9999% was an understatement. With 62^12 (about 3.2 × 10^21) possible IDs, the chance that a new ID collides with any of a billion existing ones is about 3 in 10^13. You are more likely to be killed by a meteorite.

:) thanks

Hi, how do I keep the string/ID unique? In other words, how do I deal with an ID conflict?

It's generating a big (BIG) random number. The odds of a conflict are far lower than the odds of getting hit by a meteorite.
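
If you do want to handle a conflict anyway, the check/regenerate approach mentioned earlier in the thread could look roughly like this. It is only a sketch: the `existing` set stands in for whatever store you really use, the function names and `max_attempts` are made up, and I have swapped in Python's `secrets` module in place of `random` so the IDs are harder to guess.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols


def random_id(length=12):
    """Return `length` characters chosen independently from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def unique_id(existing, length=12, max_attempts=5):
    """Return an ID not already in `existing`, and record it there.

    `existing` is a plain set used as a stand-in for a real store.
    """
    for _ in range(max_attempts):
        candidate = random_id(length)
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("could not find an unused ID; is the ID space too small?")


used = set()
print(unique_id(used))
```

With a real database, checking first and inserting second can race between processes, so a unique constraint on the ID column plus a retry on violation is probably the safer version of the same idea.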

