An open standard for shortening URLs, rather than private hash tables of random shortlinks, is a great idea.
Being open and deterministic, there would be no need to specify any particular shortening service, so that part of the URL becomes extraneous.
Of course, a few characters would be needed to indicate which standard encoder was used, but the result would still be shorter than the original URL for this post.
Such links could be reliably decoded, preventing the shortlink rot problem.
Since users could decode the URL themselves, they could preview a shortlink's destination, which is reassuring, improves security, and prevents bait-and-switch tactics where the destination is changed after the fact.
Yes that is actually a rather faithful rendering of the types of ideas and possible future paths or use cases I had in mind. Thank you for saying that.
Unfortunately the longest URLs, the ones you most want to shorten, are the worst candidates for compression: their query strings are full of high-entropy things like UUIDs.
I have an idea to identify the UUIDs and other identifier-like parts and re-encode them as numbers; the file radix_coder.js is working toward this. It covers a few formats (GUID, base64, digits, base36) to get a little more gain.
But my initial experiments suggest it's only a small gain: runs of digits shrink to 68% of their length, base36 to 90%, and then we also have to add in the prefix that indicates we are switching encodings.
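To illustrate the kind of radix re-encoding radix_coder.js is aiming at, here's a minimal sketch (the function names and alphabet are illustrative, not the actual radix_coder.js API): a run of decimal digits is reinterpreted as one big number and written back out in a 64-character URL-safe alphabet.

```javascript
// Illustrative sketch of radix re-encoding, not the real radix_coder.js API.
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Re-encode a string of decimal digits in the denser base64url alphabet.
// Caveat: leading zeros are lost ("007" round-trips to "7"); a real encoder
// would need to record the length or handle them separately.
function digitsToB64(digits) {
  let n = BigInt(digits);
  let out = "";
  do {
    out = B64[Number(n % 64n)] + out;
    n /= 64n;
  } while (n > 0n);
  return out;
}

function b64ToDigits(s) {
  let n = 0n;
  for (const c of s) n = n * 64n + BigInt(B64.indexOf(c));
  return n.toString(10);
}
```

For example, a 15-digit run comes back as roughly 8 base64url characters, and the decoder recovers the original digits exactly.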
That's basically what a Huffman code is: once you have a big list of examples, it encodes symbols so that the more frequent ones get shorter codes.
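As a toy sketch of that idea (illustrative only, not any particular library), here's Huffman code construction over a table of symbol frequencies:

```javascript
// Build Huffman codes from a { symbol: count } frequency table.
// Repeatedly merge the two lightest nodes, then read codes off the tree.
function huffmanCodes(freqs) {
  let nodes = Object.entries(freqs).map(([ch, w]) => ({ ch, w }));
  while (nodes.length > 1) {
    nodes.sort((a, b) => a.w - b.w);          // lightest two first
    const [a, b] = nodes.splice(0, 2);
    nodes.push({ left: a, right: b, w: a.w + b.w });
  }
  const codes = {};
  (function walk(node, prefix) {
    if (node.ch !== undefined) { codes[node.ch] = prefix || "0"; return; }
    walk(node.left, prefix + "0");
    walk(node.right, prefix + "1");
  })(nodes[0], "");
  return codes;
}
```

Feeding it counts gathered from a corpus of real URLs would give frequent characters (like `/`, `.`, `e`, `t`) shorter bit codes than rare ones.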
Using a dictionary that covers a large subset of domain names and common strings found in URLs could reduce the size somewhat more, but the result still seems too long to be viable. What might be more interesting is an Ethereum-style approach to URL shortening, distributed across the Internet.
Yeah, it's a challenge if we want to serve the decoder from a short domain name, prefix compressed strings with that domain, and have the client automatically decompress the path section and redirect.
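A rough sketch of that client-side flow, with the decoder and navigation passed in as parameters so it's testable (`decompress` here is a hypothetical stand-in for whatever decoder the standard would specify):

```javascript
// Take the path of a URL like https://a.co/<compressed>, decode it, and
// navigate to the result. In a browser this would be wired up as:
//   redirectFromPath(location.pathname, decompress, url => location.replace(url));
function redirectFromPath(pathname, decompress, navigate) {
  const compressed = pathname.slice(1); // drop the leading "/"
  navigate(decompress(compressed));
}
```

Using `location.replace` (rather than assigning `location.href`) avoids leaving the intermediate shortlink page in the browser history.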
Even a 1-letter host on a 2-letter TLD like .co, .in, .ws, or .tk adds 7 or 8 characters for the scheme, 4 for the host, and one for the slash. So a minimum of 12 characters of overhead just to put it on a domain.
I haven't decided whether to put it on a domain or to use a scheme prefix like zurl: or something else.
The other possibility is that this isn't really about the shortest string, or about Twitter, but more about transport encoding, an efficient binary format, and something fun.
Thanks for commenting! If you have any ideas how to improve the compression or other things, please submit a PR!
https://news.ycombinator.com/item?id=14245119
If you encode it with this shortener, you get this 35-character string:
mNb:w9iIp7u8di:AKB2xrPUVYUFhfRUWHwA
Assuming you were using this string as a key in a shortening service like this:
https://short.url/mNb:w9iIp7u8di:AKB2xrPUVYUFhfRUWHwA
... you'd end up with a URL longer than the original! So it's not technically a URL shortener :)