
Hashify: what becomes possible when one is able to store documents in URLs? - potomak
http://hashify.me/IyBIYXNoaWZ5CgpIYXNoaWZ5IGRvZXMgbm90IHNvbHZlIGEgcHJvYmxlbSwgaXQgcG9zZXMgYSBxdWVzdGlvbjogX3doYXQgYmVjb21lcyBwb3NzaWJsZSB3aGVuIG9uZSBpcyBhYmxlIHRvIHN0b3JlICoqZW50aXJlIGRvY3VtZW50cyoqIGluIFVSTHM/XwoKIyMgRG9jdW1lbnQg4oaUIFVSTAoKSGFzaGlmeSBpcyBkaWZmZXJlbnQgZnJvbSB2aXJ0dWFsbHkgZXZlcnkgb3RoZXIgc2l0ZSBvbiB0aGUgV2ViIGluIHRoYXQgKipldmVyeSBVUkwgY29udGFpbnMgdGhlIGNvbXBsZXRlIGNvbnRlbnRzIG9mIHRoZSBwYWdlKiouCgpUaGUgYWRkcmVzcyBiYXIgdXBkYXRlcyB3aXRoIGVhY2gga2V5c3Ryb2tlIGFzIG9uZSB0eXBlcyBpbnRvIHRoZSBlZGl0b3IuCgojIyMgQmFzZTY0IGVuY29kaW5nCgpPbmx5IGEgdGlueSBmcmFjdGlvbiBvZiBhbGwgVW5pY29kZSBjaGFyYWN0ZXJzIGFyZSBhbGxvd2VkIHVuZXNjYXBlZCBpbiBhIFVSTC4gSGFzaGlmeSB1c2VzIFtCYXNlNjRdWzFdIGVuY29kaW5nIHRvIGNvbnZlcnQgVW5pY29kZSBpbnB1dCB0byBBU0NJSSBvdXRwdXQgc2FmZSBmb3IgaW5jbHVzaW9uIGluIFVSTHMuCgpUaGlzIHRyYW5zbGF0aW9uIGlzIGEgdHdvLXN0ZXAgcHJvY2VzczogW1VuaWNvZGUgdG8gVVRGLTggY29udmVyc2lvbl1bMl0gYXMgb3V0bGluZWQgYnkgSm9oYW4gU3VuZHN0csO2bSwgZm9sbG93ZWQgYnkgYmluYXJ5IHRvIEFTQ0lJIGNvbnZlcnNpb24gdmlhIFtgd2luZG93LmJ0b2FgXVszXS4KCiMjIyMgRW5jb2RpbmcKCiAgICA+IHVuZXNjYXBlKGVuY29kZVVSSUNvbXBvbmVudCgnw6dhIHZhPycpKQogICAgIsODwqdhIHZhPyIKICAgID4gYnRvYSh1bmVzY2FwZShlbmNvZGVVUklDb21wb25lbnQoJ8OnYSB2YT8nKSkpCiAgICAidzZkaElIWmhQdz09IgoKIyMjIyBEZWNvZGluZwoKICAgID4gYXRvYigndzZkaElIWmhQdz09JykKICAgICLDg8KnYSB2YT8iCiAgICA+IGRlY29kZVVSSUNvbXBvbmVudChlc2NhcGUoYXRvYigndzZkaElIWmhQdz09JykpKQogICAgIsOnYSB2YT8iCgojIyBVUkwgc2hvcnRlbmluZwoKU3RvcmluZyBhIGRvY3VtZW50IGluIGEgVVJMIGlzIG5pZnR5LCBidXQgbm90IHRlcnJpYmx5IHByYWN0aWNhbC4gSGFzaGlmeSB1c2VzIHRoZSBbYml0Lmx5IEFQSV1bNF0gdG8gc2hvcnRlbiBVUkxzIGZyb20gYXMgbWFueSBhcyAzMCwwMDAgY2hhcmFjdGVycyB0byBqdXN0IDIwIG9yIHNvLiBJbiBlc3NlbmNlLCBiaXQubHkgYWN0cyBhcyBhIGRvY3VtZW50IHN0b3JlIQoKIyMjIFVSTCBsZW5ndGggbGltaXQKCldoaWxlIHRoZSBIVFRQIHNwZWNpZmljYXRpb24gZG9lcyBub3QgZGVmaW5lIGFuIHVwcGVyIGxpbWl0IG9uIHRoZSBsZW5ndGggb2YgYSBVUkwgdGhhdCBhIHVzZXIgYWdlbnQgc2hvdWxkIGFjY2VwdCwgYml0Lmx5IGltcG9zZXMgYSAyMDQ4LWNoYXJhY3RlciBsaW1pdC4gVGhpcyBpcyBzdWZmaWNpZW50IGluIHRoZSBtYWpvcml0eSBvZiBjYXNlcy4KCkZvciBsb2
======
rsaarelm
I've sometimes wondered about a system where the URL of a document is an
actual hash, like SHA-1, of the document. That'd change the semantics of
hyperlinks from "link to document at this internet address" to "link to
document with these contents", just like Hashify does, but it could handle
arbitrarily large documents.
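A minimal sketch of that content-addressing idea (Python for illustration; the in-memory `store` and the `put`/`get` helpers are hypothetical stand-ins for whatever distributed retrieval layer would actually be needed):

```python
import hashlib

# Toy content-addressed store: the "address" of a document is the SHA-1
# hex digest of its contents, not a location.
store = {}

def put(document: bytes) -> str:
    """Store a document and return its content hash as its address."""
    address = hashlib.sha1(document).hexdigest()
    store[address] = document
    return address

def get(address: str) -> bytes:
    """Retrieve a document by its content hash."""
    document = store[address]
    # Integrity checking comes for free: the address *is* the checksum.
    assert hashlib.sha1(document).hexdigest() == address
    return document

addr = put(b"Hello, content-addressed web!")
print(addr)       # the same bytes always yield the same address
print(get(addr))
```

Note that, unlike a location-based URL, the address here can be recomputed by anyone holding a copy of the bytes, which is what makes the "a few copies hashed somewhere" longevity argument below work.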

The tricky part with that system would be that you'd also need some new
mechanism to retrieve the files. Instead of the regular WWW stack, you'd need
something like a massive distributed hash table that could handle massive
distributed querying and transfer of the hashed files. Many P2P file sharing
systems are already doing this, but a sparse collection of end-user machines
containing a few hashed files each isn't a very efficient service cloud. If
every ISP had this sort of thing in their service stack or if Amazon and
Google decided to run the service, all of them dynamically caching documents
in greater demand in more nodes, things might look very different.

This would mean that very old hypertext documents would still be trivially
readable with working links, as long as a few copies of the page documents
were still hashed somewhere, even if the original hosting servers were long
gone. It would also make it easy to do distributed page caching, so that pages
that get a sudden large influx of traffic wouldn't create massive load on a
single server.

On the other hand, any sort of news sites where the contents of the URL are
expected to change wouldn't work, nor would URLs expected to point to a latest
version of a document instead of the one at the time of linking. Once the hash
URL was out, no revision to the hashed document visible from following the URL
would be possible without some additional protocol layer. The URL strings
would also be opaque to humans and too long and random to be committed to
memory or typed by hand. The web would probably need to be somehow split into
human-readable URLs for dynamic pages and hash URLs for the static pieces of
content served by those pages.

I'm probably reinventing the wheel here, and someone's already worked out a
more thought out version of this idea.

~~~
mhitza
<http://en.wikipedia.org/wiki/Uniform_resource_name>

~~~
rsaarelm
Right. Turns out my use of 'URL' everywhere in the grandparent comment is a
misnomer, then. I should've used URN or URI.

I'm not quite sure if URN is exactly right for the hash thing either, given
that it both fails to unify things which humans would probably assign the same
URN to, such as two image files of the same picture using different encodings,
and it has the theoretical chance of assigning the same hash to two entirely
different things.

~~~
mhitza
I think these issues are clearly answered by the RFC
<http://tools.ietf.org/html/rfc1737>.

* Global uniqueness: The same URN will never be assigned to two different resources. ((the encoding would be part of the URN))

* Independence: It is solely the responsibility of a name issuing authority to determine the conditions under which it will issue a name. ((a URN wouldn't necessarily be a hash of the resource in question))

The second point makes it pretty clear that the assignment of URNs would be
done by some authoritative parties, which makes sense if you consider that in
their initial conception URNs would have been useful for linking citations and
references in research papers. It's just that the Internet has long since
branched out from that scope.

------
juanre
Very neat idea, but I think the reliance on bit.ly is self-defeating. This
kind of approach would allow people to distribute documents using the web
without having to trust them to a particular server, which can be very
convenient if your target audience is in a country where access to the server
storing your documents can be closed. For this to work you need to be able to
recover the document from the URL locally.

Some years ago a friend and I wrote <http://notamap.com>, a very similar idea
for sharing/storing/embedding geotagged notes fully encoded in a URL, without
having to rely on a server. Looking at it now, I wish we hadn't put in all the
crazy animations. Maybe I should revive it and simplify the UI.

~~~
rapala
_Very neat idea, but I think the reliance on bit.ly is self-defeating. This
kind of approach would allow people to distribute documents using the web
without having to trust them to a particular server, which can be very
convenient if your target audience is in a country where access to the server
storing your documents can be closed._

I just can't see the gain here. You need _a_ server to distribute the URLs in
any case. You are just moving the data from the server that served the
document to the server that serves the URLs. It is still the same data, just
in a different form.

 _For this to work you need to be able to recover the document from the URL
locally._

How about saving the document?

~~~
seanp2k2
Right, exactly. It's only really cool for magnet links because the law is
different for "linking to content" vs "hosting content" right now.

One step further in this direction would just mean "including all the data for
one or more documents on one page". You've just invented...a document,
possibly with more than one media type.

Think about it this way: you have a 10K document which contains 200 bytes of
link data - a URI to another hypertext document that is 5K in size - versus a
15K document which contains 5K of "link data": the other document itself.

------
aurelianito
I don't get it. Why do they claim that this is in any way better than a
data:// url? (<http://es.wikipedia.org/wiki/Data:_URL>)

~~~
sp332
This lets you use the data as a _piece_ of a URL, so you can pass it as a CGI
query string to another web page.
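That query-string usage might look something like this (a hypothetical sketch in Python; `example.com` and the `doc` parameter name are invented):

```python
import base64
from urllib.parse import urlencode, parse_qs, urlparse

document = "ça va?"

# Base64-encode the document so it survives as a query-string value.
payload = base64.urlsafe_b64encode(document.encode("utf-8")).decode("ascii")
url = "https://example.com/view?" + urlencode({"doc": payload})

# The receiving page can recover the document with no server-side storage.
query = parse_qs(urlparse(url).query)
roundtrip = base64.urlsafe_b64decode(query["doc"][0]).decode("utf-8")
print(roundtrip)  # ça va?
```

A `data:` URL, by contrast, is a complete resource in itself and can't be embedded as one parameter among several in another page's URL.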

~~~
aurelianito
I still don't get it. Passing data as pieces of URLs is what normal parameters
are for, and data URLs are for generating a "virtual file", i.e. a link that
contains all the information of the file linked.

Between those two things, everything should be covered.

~~~
nerfhammer
Some limitations. You can't redirect to a data:// url, e.g. <img
src=<http://tinyurl.com/44c8ctt> > or <a href=<http://tinyurl.com/44c8ctt>
>link</a> gets stopped by Chrome as an injection attack. So you can't use
data:// to (ab)use a link shortener as a CDN.

------
mmahemoff
I looked at URL shortener limits some time ago and found these approximate
limits by trial-and-error:

* TinyURL 65,536 characters and probably more, but requests timed out; there isn’t an explicit limit apparently

* Bit.ly 2000 characters.

* Is.Gd 2000 characters.

* Twurl.nl 255 characters.

This was 2.5 years ago, so I'm not sure how many of these have changed (other
than bit.ly, which the linked article confirms is now 2048, probably the same
as when I tested it).

<http://softwareas.com/the-url-shortener-as-a-cloud-database>

~~~
eli
2000 is roughly the maximum length of a URL that IE can handle, incidentally.

------
choffstein
Repost: <http://news.ycombinator.com/item?id=2464213>

------
mattvot
Boiling it down, it's a new file format with a built-in viewer. You still need
to find a way to store the data.

Interesting, but I can't think of any practical application, apart from the
service provider not having to worry about storage (maybe that's key ... more
thinking needed).

~~~
irrumator
It would be, except it's not new; it's a dupe of a post from not too long ago
that had a large number of points:
<http://news.ycombinator.com/item?id=2464213>

------
sgdesign
I really like this approach; in fact, that's what I used for
<http://www.patternify.com/>

This way the whole tool can be 100% client-side JavaScript, without the need
for any back-end.

------
Jare
The original version of Mr Doob's GLSL Sandbox at
<http://mrdoob.com/projects/glsl_sandbox/> used the same approach, but
increased the maximum possible size of the document by doing LZMA compression
before base64.
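The compress-before-encode step can be sketched like so (Python's `lzma` module here as a stand-in for whatever JavaScript LZMA library the sandbox used; the sample text is invented):

```python
import base64
import lzma

# Shader source is repetitive, so it compresses well.
text = ("A GLSL shader tends to repeat keywords like vec3, float, "
        "uniform, and gl_FragColor. ") * 20

raw = text.encode("utf-8")
plain = base64.urlsafe_b64encode(raw).decode("ascii")
packed = base64.urlsafe_b64encode(lzma.compress(raw)).decode("ascii")

print(len(plain), len(packed))  # compressed-then-encoded is far shorter

# Decoding simply reverses both steps.
restored = lzma.decompress(base64.urlsafe_b64decode(packed)).decode("utf-8")
assert restored == text
```

Compressing before base64 matters because base64 itself inflates the payload by a third; compressing first claws that back and then some.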

The project later moved to <http://glsl.heroku.com/> with an app-driven
gallery, and that particular feature went away. I think that's a pretty
natural evolution of any such idea, so I'm not convinced of hashify's
longevity, but hey, sometimes simple really is enough.

------
samgranger
Cool to see, but a stupid idea - who in their right mind would use this in
production?! By using such a "technology" you lose SEO strength due to urls-
not-being-like-this.html, and even worse, what's to stop me from publishing a
fake press release on their site, or spamming porn, and getting that URL
indexed? And what are the benefits? To also bring SOPA into this: couldn't I
share copyrighted material on someone's site like this? How could they control
that, besides blocking each URL manually? Just seems dumb. As a concept, cool,
but for production... yikes?!

~~~
samgranger
Obviously you could also keep a database of all the content URL strings you
publish - but that makes this technology worth nothing at all.

------
Angostura
"Internet Explorer cannot display the webpage" is what happens here (IE 8).

~~~
tkellogg
Not really a good reply, but I think that hashify.me's potential for an IE
audience was probably small to start with. But consider this: if this idea
took off, wouldn't this press MS into keeping IE more modern?

~~~
brudgers
My mobile browser of choice does not support cross-origin resource sharing,
according to the article... or rather the error message I get in lieu of the
article.

~~~
Freaky
Same with Opera and Chrome on my desktop. Firefox works at least.

~~~
tkellogg
Yeah, his message was 5548 characters (so the URL is at least 7380), which is
way over the generally accepted 2000-character limit. So this particular
protocol could use some enhancements - maybe a feature to break messages up
into several parts, each represented by another bit.ly address. That would
keep URL length under control.
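The multi-part idea could be sketched roughly like this (Python for illustration; the chunk size and the helper names are hypothetical):

```python
# Split an encoded document into chunks that each fit comfortably
# under a ~2000-character URL limit.
CHUNK = 1500  # leave headroom for the scheme/host portion of each URL

def split_message(encoded: str, size: int = CHUNK) -> list[str]:
    return [encoded[i:i + size] for i in range(0, len(encoded), size)]

def join_message(parts: list[str]) -> str:
    return "".join(parts)

encoded = "A" * 7380  # roughly the size mentioned above
parts = split_message(encoded)
print(len(parts))                      # 5 chunks
assert join_message(parts) == encoded
```

Each chunk would then be shortened separately, with the first part (or a manifest) listing the bit.ly addresses of the rest.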

------
dools
I took a similar approach with <http://cueyoutube.com>, and recently found
Snapbird, which provides extended Twitter search capabilities. The URL
contains the playlist and Twitter becomes the database, so I just tweet my
playlists and they're "saved". You can see all the lists I've created by
searching the account iaindooley for the term cueyoutube in Snapbird.

------
cobychapple
What becomes possible? The entire internet could effectively get rid of
hosting account providers, with each page in every site being contained in a
hashify URL, and with each page linking to other pages using other hashify
URLs.

Trouble is, there might be a DNS-like system needed to match hashify URLs to
more human-readable strings (or a way for existing DNS to resolve to hashify
style URLs).

Neat idea.

~~~
friggeri
The real trouble is that when you link to a hashified URL, you are actually
embedding in your web page (an encoding of) the content of the page you are
linking to. Think matryoshka.

~~~
LeafStorm
Not to mention that makes link cycles impossible.

~~~
aurelianito
Nah! I am pretty sure that a quine can be made using this!
<http://en.wikipedia.org/wiki/Quine_(computing)>

------
feralchimp
Clever? Yes.

But URL shortening services are a public good, and hacking one to be your
personal cloud storage platform is kind of a dick move.

~~~
jheriko
agreed

------
jroseattle
This is cool, but I wouldn't use it for any real documents. I care about
versioning, edit history, etc.

------
tony_le_montana
Great idea, this. But it shortens on every edit, so it's likely to hit
bit.ly's rate limit :(

------
orclev
This is an _ancient_ idea. I read a 2600 article back in the early 2000s or
possibly late 1990s that did essentially this same thing using a bash script
and one of the first URL shortening services available at the time.

~~~
eternalban

       What has been will be again,
       what has been done will be done again;
       there is nothing new under the sun.
    
       - Ecclesiastes 1:9

------
markkum
Check out <https://neko.io/> ... we scramble/encrypt messages into URLs which
you can then share on Facebook, Twitter, or wherever.

------
djbender
Older Hacker News Post: <http://news.ycombinator.com/item?id=2464213>

------
GICodeWarrior
<http://www.semicomplete.com/projects/tastydrive/>

~~~
vidoss
This is very cool. Not the tastydrive, your "keynav"

<http://www.semicomplete.com/projects/keynav/>

------
7952
Maybe this would work well in an email? Especially if you want to get content
past filtering.

~~~
icebraining
Yes, but so would any pastebin website. You could say the advantage is that
the content is not available to the server (since it's transferred in the URL
itself), but then it is available when you actually read it, so it's not any
more private.

------
nerfhammer
ought to gzip the string before it goes to base64 while you're at it

~~~
gresrun
We actually do this exact thing to send dynamic parameters to a chart-
generating backend server. It works great; you get a surprising amount of
compression using gzip (2-4x space savings) and the URLs are naturally cached
by proxies without any magic!
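A rough sketch of that gzip-then-base64 trick (Python for illustration; the chart parameters are invented, and the real savings depend on how repetitive the payload is):

```python
import base64
import gzip
import json

# Hypothetical chart parameters: mostly-ASCII, repetitive JSON,
# which is exactly the kind of payload that compresses well.
params = json.dumps({"series": [{"label": f"s{i}", "color": "#336699",
                                 "points": list(range(50))}
                                for i in range(10)]})

plain = base64.urlsafe_b64encode(params.encode()).decode()
packed = base64.urlsafe_b64encode(gzip.compress(params.encode())).decode()

print(f"{len(plain)} -> {len(packed)} chars after compressing first")
```

Because the compressed bytes are opaque anyway, nothing is lost by base64-encoding them, and the 2-4x savings mentioned above translates directly into shorter URLs.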

~~~
nerfhammer
If it's mostly ASCII, compression is something you should almost
automatically think about.

There's also snappy <http://code.google.com/p/snappy/>

------
d2fn
Some observations:

* when content changes, every hyperlink to that content must change along with it.

* pass by reference (URL) is no longer possible.

------
zalew
The first use case that comes to mind is anonymity.

~~~
mattvot
Solutions like pastebin.com provide the same, with a small url.

~~~
tkellogg
These messages are SOPA-proof. They can never be "taken down" since they don't
actually reside on the server.

~~~
VMG
They _do_ however reside on the server hosting the link.

~~~
tkellogg
No, that's not entirely true. I don't have to use a service like bit.ly to
send one of these messages, and furthermore I could just as easily use _any_
of _many_ services. Since the technology is fundamentally a browser-to-browser,
distributed kind of concept, it's only the URL shortening that isn't SOPA
compliant.

There are also several ways to blunt the impact of SOPA on the URL shortening
anyway. For instance, if several services use the same hash algorithm for
representing URLs, they can be used interchangeably (if you post the URL to
all of them). Furthermore, you can always set up your own temporary shortening
service.

~~~
uxp
* No, that's not entirely true. I don't have to use a service like thepiratebay.org to send one of these files. And further, I could just as easily use _any_ or _many_ trackers inside my torrent. *

Altered to convey another point. Naturally, it would be quite difficult to
"embed" a feature-length movie into a single URL, but if one were to split the
file into chunks, as torrent transfers do, or simply into a multi-part rar as
newsgroups still do, each chunk becomes more manageable.

I do agree with you, though. A service like this, if adapted to be
user-friendly for file sharing rather than just document sharing, would avoid
many of the pitfalls a torrent tracker (for example) faces when its DNS lookup
is blocked (which aren't many). That's because SOPA is written as if IP
addresses and DNS names were statically tied together and slow to change,
ignoring that I can have a new domain name resolving to my existing server in
a matter of minutes. Even more so if the final URL encoding were nothing more
than a common, known algorithm like base64, which one could easily plug into a
basic desktop app to get the same result.

------
michaelfeathers
Well, for one thing, you can't tweet a link.

~~~
samgranger
Link shortening service?! Twitter automatically shortens it for you :)

------
tronicron
Damn, so cool!

------
ColdAsIce
Fuckify this _ify trend to emulate or ride the spotify (anti)fame.

Is it just me or does anyone else also just back away whenever there is a
project which turns nouns into verbs with _ify? Spotify is a sockpuppet of

~~~
sp332
-ify is a pretty common suffix in English. It means to turn something into something else. <https://en.wiktionary.org/wiki/-ify> and [https://en.wiktionary.org/wiki/Category:English_words_suffix...](https://en.wiktionary.org/wiki/Category:English_words_suffixed_with_-ify)

