
Ask HN: Why can't we just use this idea for clean URLs without IDs? - myf01d
I'd like to ask about, and test the validity of, this idea:

Most sites have URLs such as:

example.com/object/{ID}/{SLUG}

or

example.com/object/{SLUG}-{ID}

where {ID} identifies the object to fetch from the database. The assumption
is that the database is too big to query by slug (in some cases the slug even
needs some computation to be reconstructed, so it's impossible to look up by
{SLUG} directly). So {SLUG} does practically nothing except help SEO.

Why can't we just use the cleaner

example.com/object/{SLUG}

and, when the URL is requested, hash the slug with a 32-bit or 64-bit hash
algorithm and look up the row by that hash, using an additional integer column
in the SQL table that stores the corresponding hash value?

What is wrong with this idea?
======
grzm
There's no reason you can't do that. Even more to the point, there's no reason
to look up using a hash of the text. You can look it up using the slug
directly.

People generally want (permanent) URIs to resources such as articles to be
unchanging as you don't want links to break. This means that once you decide
on a slug, you can't change it. If you want some correspondence between titles
and slugs, this also constrains changing the title.

Including an arbitrary, context-independent ID in the URI means you can use
that to look up the article, and you're free to change the slug, because the
slug is not part of the key used to look up the article in the database.

This also explains why using a hash of the slug would be effectively the same
as using the slug itself, and with the same constraints.
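
To make the ID-plus-slug point concrete, here is a minimal sketch, assuming the
{SLUG}-{ID} URL shape from the question. The `ARTICLES` dict, `slugify`, and
`resolve` are hypothetical names standing in for a real database and router.
Only the ID acts as the lookup key, so a stale slug is redirected to the
current one rather than breaking.

```python
import re

# Hypothetical stand-in for a database table: id -> current title.
ARTICLES = {42: "Clean URLs without IDs"}


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def resolve(path: str):
    """Resolve '/object/{SLUG}-{ID}' to (status, canonical_path)."""
    # The slug part is optional and ignored for lookup; only the trailing ID matters.
    match = re.fullmatch(r"/object/(?:.*-)?(\d+)", path)
    if match is None or int(match.group(1)) not in ARTICLES:
        return 404, None
    article_id = int(match.group(1))
    canonical = f"/object/{slugify(ARTICLES[article_id])}-{article_id}"
    # A stale or missing slug gets a permanent redirect to the current URL.
    return (200, canonical) if path == canonical else (301, canonical)
```

With a slug-only URL (or a hash of the slug) there is no such stable key, so
renaming the article would change the key itself.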

~~~
myf01d
If I query the SQL table by slug (let's say a slug is between 10 and 100
characters), isn't that overkill for performance compared with just looking
up an int or bigint, as with an ID or a hash?

Maybe it won't make a difference for something like Redis (i.e. O(1) for any
key), but it will be a performance hit for a SQL table containing hundreds of
thousands of rows, am I right?

~~~
grzm
If you have an index on the column, it shouldn't be a huge issue. You'd have
similar issues if you had no index on the integer column.

Of course, if you're really concerned about it, you can easily put together a
test. I suspect other latencies in your app would be larger than those due to
the difference between integer lookup and text lookup. But like I said: if
you're highly sensitive to performance, you'll want to be measuring against
your application, your stack, regardless of advice you might read about on the
internet.
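
A test along those lines could look like the sketch below. It uses Python's
built-in `sqlite3` with an in-memory database purely as a stand-in, so the
numbers will not transfer to Postgres or any other stack; the `works` table,
the slug pattern, and the function names are all made up for illustration.

```python
import sqlite3
import timeit


def build_db(rows: int = 100_000) -> sqlite3.Connection:
    """Create an in-memory table with an integer key and an indexed text slug."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE works (id INTEGER PRIMARY KEY, slug TEXT NOT NULL)")
    conn.execute("CREATE UNIQUE INDEX works_slug_idx ON works (slug)")
    conn.executemany(
        "INSERT INTO works (id, slug) VALUES (?, ?)",
        ((i, f"some-fairly-long-artwork-title-number-{i}") for i in range(1, rows + 1)),
    )
    return conn


def by_id(conn: sqlite3.Connection, work_id: int):
    return conn.execute("SELECT id, slug FROM works WHERE id = ?", (work_id,)).fetchone()


def by_slug(conn: sqlite3.Connection, slug: str):
    return conn.execute("SELECT id, slug FROM works WHERE slug = ?", (slug,)).fetchone()


def compare(rows: int = 100_000, lookups: int = 10_000) -> tuple[float, float]:
    """Return total seconds spent on `lookups` ID lookups and slug lookups."""
    conn = build_db(rows)
    work_id = rows // 2
    slug = by_id(conn, work_id)[1]
    t_id = timeit.timeit(lambda: by_id(conn, work_id), number=lookups)
    t_slug = timeit.timeit(lambda: by_slug(conn, slug), number=lookups)
    return t_id, t_slug
```

Calling `compare()` and printing the two timings gives a rough first
impression; a meaningful measurement would run against the real database, with
realistic data and through the application's own access path.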

If beautiful URIs are a priority, you can use a separate mapping table or
service (memcache? Redis?) that maps the URI to a database ID. Of course, this
adds latency/overhead as well. This could also give you flexibility in
supporting multiple slugs for a single story.
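
As a rough sketch of the mapping-table variant (again using in-memory SQLite
as a stand-in, with hypothetical `stories` and `story_slugs` tables), several
slugs can point at the same story ID:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- One row per slug; many slugs may map to the same story.
    CREATE TABLE story_slugs (
        slug     TEXT PRIMARY KEY,
        story_id INTEGER NOT NULL REFERENCES stories (id)
    );
    INSERT INTO stories VALUES (1, 'Clean URLs without IDs');
    INSERT INTO story_slugs VALUES ('clean-urls', 1), ('clean-urls-without-ids', 1);
""")


def story_for_slug(slug: str):
    """Return (id, title) for a slug, or None if the slug is unknown."""
    return conn.execute(
        "SELECT s.id, s.title FROM story_slugs m "
        "JOIN stories s ON s.id = m.story_id WHERE m.slug = ?",
        (slug,),
    ).fetchone()
```

Both `story_for_slug("clean-urls")` and
`story_for_slug("clean-urls-without-ids")` resolve to the same row, so an old
slug keeps working after a newer one is added.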

------
jepler
1\. A 32-bit hash isn't enough no matter what (birthday paradox), 64-bit might
be if you don't have adversaries choosing {SLUG} strings and it's the
truncation of a cryptographic hash. (compare with 32-bit public key IDs in
PGP)

2\. Are you going to really enforce {SLUG} being unique for the lifetime of
your system?
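
For point 1, the birthday bound can be estimated with the standard
approximation 1 - exp(-n(n-1)/(2N)) for n slugs and N possible hash values.
The sketch below assumes the hash behaves like a uniform random function and
that nobody is choosing slugs adversarially; `collision_probability` is just
an illustrative helper name.

```python
import math


def collision_probability(n_slugs: int, hash_bits: int) -> float:
    """Approximate P(at least one collision) among n uniformly random hashes."""
    space = 2 ** hash_bits
    # Birthday approximation; expm1 keeps precision when the result is tiny.
    return -math.expm1(-n_slugs * (n_slugs - 1) / (2 * space))


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        p32 = collision_probability(n, 32)
        p64 = collision_probability(n, 64)
        print(f"{n:>9} slugs: 32-bit {p32:.3g}, 64-bit {p64:.3g}")
```

Under these assumptions a 32-bit hash already has about a 1% chance of some
collision at 10,000 slugs and roughly 69% at 100,000, while a 64-bit hash
stays far below one in a million even at a million slugs.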

~~~
myf01d
Thanks for the reply.

1\. You're right that 32-bit isn't enough for tens of thousands of slugs
(maybe even thousands?). I just need a performant and economical URL design,
like the one letterboxd.com uses.

2\. Yes, my project is about artistic works, so the work's name won't change,
and consequently neither will the slug.

Since I use Postgres which has only 32-bit and 128-bit integers, is there a
considerable performance hit if I use the latter with 64-bit or 128-bit
algorithm so I reduce the probability of collision?
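
On the storage side, PostgreSQL's documented integer types include `bigint`,
a signed 8-byte integer, so a 64-bit hash can live in an ordinary integer
column. A minimal sketch of the truncated cryptographic hash mentioned above,
assuming SHA-256 and a hypothetical helper name `slug_hash64`:

```python
import hashlib


def slug_hash64(slug: str) -> int:
    """Truncate SHA-256 of the slug to 8 bytes, read as a signed 64-bit integer."""
    digest = hashlib.sha256(slug.encode("utf-8")).digest()
    # signed=True keeps the value within the range of a signed 64-bit column.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)
```

Since collisions are unlikely rather than impossible, it would still make
sense to store the slug next to the hash and compare it after the lookup.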

------
jepler
3\. do you ever need to change {SLUG} but keep old references? e.g., to track
a headline/title change as a story develops? {ID}-{OLDSLUG} redirects to
{ID}-{NEWSLUG} is the same as {OLDSLUG} redirects to {NEWSLUG} though I guess.

------
10rogues
The slug could change (when based on the name or title) and then a visitor
could be cleanly redirected to the new correct url.

