I still think you should try not to exceed 255 characters, at least for the part before the hash sign. A good URL should not change, and anything this large is almost bound to change rather soon.
The data URI scheme does not really apply here since it's never sent to any server. If a browser understands data URIs, it should logically also allow such long URLs.
I use long URLs because I want unique URLs that will never change.
Specifying what item(s) should be targeted in a pool of possibly millions, with endless possible combinations is bound to require some kind of precise pointer.
In the example URL you gave, the base64-decoded content of the URL:
({:project-id "505a125e44ae42e05a750c97", :object-instance "2", :object-type "0", :device-id "1234"} {:project-id "505a125e44ae42e05a750c97", :object-instance "1", :object-type "0", :device-id "1234"} {:project-id "505a125e44ae42e05a750c97", :object-instance "0", :object-type "0", :device-id "1234"})
seems like it would be better stored on the server in redis or something (or, at least if leaving it in the URL, a more compact deduplicated format might be worthwhile)
Yeah, I'm still wondering if I should gzip the whole thing (I'm already base64 encoding anyway).
However, deduplication would only really pay off with a large number of objects.
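For what it's worth, here's a quick sketch of what gzip-then-base64 does to a payload like the one above (the string is copied from the example; `urlsafe_b64encode` keeps the result URL-legal):

```python
import base64
import gzip

# Payload from the example URL: three near-identical EDN maps.
payload = (
    '({:project-id "505a125e44ae42e05a750c97", :object-instance "2", '
    ':object-type "0", :device-id "1234"} '
    '{:project-id "505a125e44ae42e05a750c97", :object-instance "1", '
    ':object-type "0", :device-id "1234"} '
    '{:project-id "505a125e44ae42e05a750c97", :object-instance "0", '
    ':object-type "0", :device-id "1234"})'
).encode()

plain = base64.urlsafe_b64encode(payload)
packed = base64.urlsafe_b64encode(gzip.compress(payload))

print(len(plain), len(packed))  # gzip wins here because the maps repeat
```

With only three objects the absolute saving is modest; the duplication only really starts to hurt as the list grows.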
By the redis reference, I suppose you mean a uniquely created key each time a user requests a possible combination. Something like /short-url/abcd, where abcd would be a key matching {:project-id "505a125e44ae42e05a750c97"... ?
That's what I had in mind when talking about a URL-shortening scheme. It requires more work, but the final URL would indeed be sexier.
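A minimal sketch of that shortener idea, with a plain dict standing in for redis (the function names are mine; a real version would use redis with something like SETNX, and probably an expiry policy):

```python
import secrets

# In-memory stand-in for redis.
store: dict[str, str] = {}

def shorten(payload: str, key_len: int = 4) -> str:
    """Store payload under a fresh random short key and return the short path."""
    while True:
        key = secrets.token_urlsafe(6)[:key_len]  # random URL-safe key
        if key not in store:                      # retry on a (rare) collision
            store[key] = payload
            return f"/short-url/{key}"

def resolve(path: str) -> str:
    """Look the payload back up from a short path."""
    return store[path.rsplit("/", 1)[-1]]

short = shorten('{:project-id "505a125e44ae42e05a750c97", :device-id "1234"}')
```

The trade-off is exactly the one discussed below: the URL becomes opaque, so you need the store to interpret it.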
> Specifying what item(s) should be targeted in a pool of possibly millions, with endless possible combinations is bound to require some kind of precise pointer.
Since you're using base 64 there, let's think for a minute. How many characters would you need to uniquely identify over a million objects? log_64(1,000,000) is about 3.3. With 4 characters, you could represent over 16 million objects. If you just store all of the objects that you need to reference along with an incrementing primary key, you wouldn't have to use more than 4 characters until you had more than 16 million objects in your database.
Have a billion objects? That's just five characters. Still not enough? With 7 characters, you could index more than 4 trillion.
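A quick way to sanity-check those figures (a sketch; `chars_needed` is just a name I made up):

```python
import math

def chars_needed(n_objects: int, alphabet: int = 64) -> int:
    """Smallest number of base-64 characters that can address n_objects ids."""
    return math.ceil(math.log(n_objects, alphabet))

print(chars_needed(1_000_000))      # 4 chars cover 64^4 = ~16.7 million ids
print(chars_needed(10**9))          # 5 chars cover 64^5 = ~1.07 billion ids
print(chars_needed(4 * 10**12))     # 7 chars cover 64^7 = ~4.4 trillion ids
```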
But let's say that you can't actually keep a single database with an incrementing primary key. You have multiple independent processes or people generating objects that need identifiers that will always be stable, you can't rely on manually picked names, and so on. So just use a secure hash: a SHA-2 or SHA-3 hash of the objects. If you used a 256-bit secure hash (44 characters in Base64, including the padding) and had 500 octillion items in your set, you would have about a one-in-a-quintillion chance of an accidental collision. I'll give you a hint: you are never going to have that many items in your data set.
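As a sketch of that idea (assuming the object's serialized form is stable; the payload string here is just the first map from the example above):

```python
import base64
import hashlib

# Any stable serialization of the object works as hash input.
obj = ('{:project-id "505a125e44ae42e05a750c97", :object-instance "2", '
       ':object-type "0", :device-id "1234"}')

digest = hashlib.sha256(obj.encode()).digest()          # 32 raw bytes
identifier = base64.urlsafe_b64encode(digest).decode()  # 44 chars incl. '=' padding

print(len(identifier))  # 44
```

The same input always yields the same identifier, so independent processes agree without any coordination.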
Now, you might object: "what if SHA-2 is broken?" Well, that may happen, though it's fairly unlikely. Most of the ways of breaking a secure hash make it a few orders of magnitude easier to compute a collision. But at 256 bits, you have a substantial safety margin; it would have to be pretty thoroughly broken before anyone could find meaningful collisions. Heck, Git still uses SHA-1, which produces 160-bit hashes and is much closer to being broken.
Anyhow, the point of all of this is that a URL is supposed to be an identifier. It doesn't take that many characters to create an identifier that could uniquely identify each quark in the whole universe. You absolutely don't need long URLs to guarantee uniqueness; if your URLs are long, it's because you're including a lot of redundant information in the URL, or you are actually trying to store a description of the object in the URL, rather than an identifier.
Interesting. I'm reluctant to implement any of this. Why? Because it adds a bunch of complexity where none is required. As I said before, I might add a URL shortener if the need arises.
If my goal were to get the smallest possible URL, you would be correct (well, you still are...). However, for the same reason people prefer a website named "ycombinator" over "zgrrc", even though the latter URL would be shorter, I don't mind not being concise.
If I can read my URL and, without any database, see that it's project X, device Y, object Z, it makes debugging easier.
But you are absolutely right: there are ways to make short, unique urls. I just don't want to use them.
You can also get into trouble if the URL gets mapped onto an underlying file system. Many file systems have a maximum path length that a long URL can easily exceed, so make sure a long URL is never used as a file-system path.