I still think you should try not to exceed 255 characters, at least for the part before the hash sign. A good URL should not change, and anything this large is almost bound to change rather soon.
The data URI scheme does not really apply here since it's never sent to any server. If a browser understands data URIs, it should logically also allow such long URLs.
I use long URLs because I want unique URLs that will never change.
Specifying what item(s) should be targeted in a pool of possibly millions, with endless possible combinations is bound to require some kind of precise pointer.
In the example URL you gave, the base64-decoded content of the URL:
({:project-id "505a125e44ae42e05a750c97", :object-instance "2", :object-type "0", :device-id "1234"} {:project-id "505a125e44ae42e05a750c97", :object-instance "1", :object-type "0", :device-id "1234"} {:project-id "505a125e44ae42e05a750c97", :object-instance "0", :object-type "0", :device-id "1234"})
seems like it would be better stored on the server in redis or something (or, at least if leaving it in the URL, a more compact deduplicated format might be worthwhile)
Yeah, I'm still wondering if I should gzip the whole thing (I'm already base64 encoding anyway).
However, deduplication would only really pay off with a large number of objects.
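For what it's worth, here's a quick sketch of what gzip-then-base64 does to a payload like the one above (the string is copied from the example; `urlsafe_b64encode` keeps the result URL-legal):

```python
import base64
import gzip

# Payload from the example URL: three near-identical EDN maps.
payload = (
    '({:project-id "505a125e44ae42e05a750c97", :object-instance "2", '
    ':object-type "0", :device-id "1234"} '
    '{:project-id "505a125e44ae42e05a750c97", :object-instance "1", '
    ':object-type "0", :device-id "1234"} '
    '{:project-id "505a125e44ae42e05a750c97", :object-instance "0", '
    ':object-type "0", :device-id "1234"})'
).encode()

plain = base64.urlsafe_b64encode(payload)
packed = base64.urlsafe_b64encode(gzip.compress(payload))

print(len(plain), len(packed))  # gzip wins here because the maps repeat
```

With only three objects the absolute saving is modest; the duplication only really starts to hurt as the list grows.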
By the redis reference, I suppose you mean a uniquely created key each time a user requests a possible combination. Something like /short-url/abcd, where abcd would be a key matching {:project-id "505a125e44ae42e05a750c97"... ?
That's what I had in mind when talking about a URL-shortening scheme. It requires more work, but the final URL would indeed be sexier.
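A minimal sketch of that shortener idea, with a plain dict standing in for redis (the function names are mine; a real version would use redis with something like SETNX, and probably an expiry policy):

```python
import secrets

# In-memory stand-in for redis.
store: dict[str, str] = {}

def shorten(payload: str, key_len: int = 4) -> str:
    """Store payload under a fresh random short key and return the short path."""
    while True:
        key = secrets.token_urlsafe(6)[:key_len]  # random URL-safe key
        if key not in store:                      # retry on a (rare) collision
            store[key] = payload
            return f"/short-url/{key}"

def resolve(path: str) -> str:
    """Look the payload back up from a short path."""
    return store[path.rsplit("/", 1)[-1]]

short = shorten('{:project-id "505a125e44ae42e05a750c97", :device-id "1234"}')
```

The trade-off is exactly the one discussed below: the URL becomes opaque, so you need the store to interpret it.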
> Specifying what item(s) should be targeted in a pool of possibly millions, with endless possible combinations is bound to require some kind of precise pointer.
Since you're using base 64 there, let's think for a minute. How many characters would you need to uniquely identify over a million objects? log_64(1,000,000) is about 3.3. With 4 characters, you could represent over 16 million objects. If you just store all of the objects that you need to reference along with an incrementing primary key, you wouldn't have to use more than 4 characters until you had more than 16 million objects in your database.
Have a billion objects? That's just five characters. Still not enough? With 7 characters, you could index more than 4 trillion.
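A quick way to sanity-check those figures (a sketch; `chars_needed` is just a name I made up):

```python
import math

def chars_needed(n_objects: int, alphabet: int = 64) -> int:
    """Smallest number of base-64 characters that can address n_objects ids."""
    return math.ceil(math.log(n_objects, alphabet))

print(chars_needed(1_000_000))      # 4 chars cover 64^4 = ~16.7 million ids
print(chars_needed(10**9))          # 5 chars cover 64^5 = ~1.07 billion ids
print(chars_needed(4 * 10**12))     # 7 chars cover 64^7 = ~4.4 trillion ids
```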
But let's say that you can't actually keep a single database with an incrementing primary key. You have multiple independent processes or people generating objects that need identifiers that will always be stable, you can't rely on manually picked names, and so on. So just use a secure hash: a SHA-2 or SHA-3 hash of the objects. If you used a 256-bit secure hash (44 characters in Base64, including the padding) and had 500 octillion items in your set, you would have about a one-in-a-quintillion chance of an accidental collision. I'll give you a hint: you are never going to have that many items in your data set.
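As a sketch of that idea (assuming the object's serialized form is stable; the payload string here is just the first map from the example above):

```python
import base64
import hashlib

# Any stable serialization of the object works as hash input.
obj = ('{:project-id "505a125e44ae42e05a750c97", :object-instance "2", '
       ':object-type "0", :device-id "1234"}')

digest = hashlib.sha256(obj.encode()).digest()          # 32 raw bytes
identifier = base64.urlsafe_b64encode(digest).decode()  # 44 chars incl. '=' padding

print(len(identifier))  # 44
```

The same input always yields the same identifier, so independent processes agree without any coordination.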
Now, you might object: "what if SHA-2 is broken?" Well, that may happen, though it's fairly unlikely. Most of the ways of breaking a secure hash make it a few orders of magnitude easier to compute a collision. But at 256 bits, you have a substantial safety margin; it would have to be pretty thoroughly broken before anyone could find meaningful collisions. Heck, Git still uses SHA-1, which produces 160-bit hashes and is much closer to being broken.
Anyhow, the point of all of this is that a URL is supposed to be an identifier. It doesn't take that many characters to create an identifier that could uniquely identify each quark in the whole universe. You absolutely don't need long URLs to guarantee uniqueness; if your URLs are long, it's because you're including a lot of redundant information in the URL, or you are actually trying to store a description of the object in the URL, rather than an identifier.
Interesting. I'm reluctant to implement any of this. Why? Because it adds a bunch of complexity where none is required. As I said before, I might add a URL shortener if the need arises.
If my goal were to get the smallest possible URL, you would be correct (well, you still are...). However, for the same reason people prefer a website named "ycombinator" over "zgrrc", even though the latter URL would be shorter, I don't mind not being concise.
If I can read my URL and, without any database, see that it's project X, device Y, object Z, it makes debugging easier.
But you are absolutely right: there are ways to make short, unique urls. I just don't want to use them.
You can also get into trouble if the URL gets mapped onto an underlying file system. Many file systems have a maximum path length that a long URL can easily exceed, so make sure a long URL is never used as a file-system path.