Hashify.me - store entire website content in the URL (hashify.me)
554 points by kevinburke on Apr 19, 2011 | 123 comments



I see this as a remarkable answer to the problem of needing to view a cached version of the website.

For example, what if a URL posted to Hacker News were followed by ?hashifyme=THEHASH, where THEHASH is the hash of the linked website?

This way, if the URL could not be loaded because the server load was too high, you could just forward the URL to Hashify.me, and the cached plain text of the website would still be readable.

Boom, instant cache of the website content stored right in the URL!!!!


You can also instantly search for dupes. Keep the hashes in a database and just check before accepting the submission.

(This has always been an idea I was thinking about; nice to see it implemented)
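
A minimal Python sketch of the duplicate check described above, assuming submissions are compared by a SHA-256 digest of their whitespace-normalized text. The function names and the in-memory set standing in for the database are hypothetical, not anything Hashify or Hacker News provides.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Return a fixed-length digest of the text, ignoring whitespace differences."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_duplicate(text: str, seen: set) -> bool:
    """Report whether this text was already submitted; remember it if not."""
    fingerprint = content_fingerprint(text)
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False
```

As the replies point out, this only helps if the hashed text is stable, so dynamic parts of the page would have to be stripped before fingerprinting.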


it wouldn't work: dynamic content. even comment counters below posts would change the hash.


Actually, you can run it through an article scraper, then hash it.


it wouldn't be 'entire website content'


include dynamic content via javascript


Clever use. But do you think it would scale? Doesn't it rely on bit.ly and/or hashify.me being able to convert the URL into a readable page?


I am missing something. How is this dramatically different from passing around the data: URIs ?

data:text/html;charset=utf-8,However,%20data%20URI%20does%20the%20same%20without%20the%20server.

Here is a self-contained cached document; for simple text it is probably more efficient than base64.

EDIT: ah. The shorteners barf if you try to "shorten" the data URIs
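
Whether percent-encoding beats base64 depends on how many characters need escaping. A rough Python sketch for comparing the two forms of a data: URI; the helper names are hypothetical, and every reserved character is escaped for simplicity.

```python
import base64
from urllib.parse import quote

PREFIX = "data:text/html;charset=utf-8"


def percent_data_uri(text: str) -> str:
    """Data URI whose payload is percent-encoded UTF-8."""
    return PREFIX + "," + quote(text, safe="")


def base64_data_uri(text: str) -> str:
    """Data URI whose payload is Base64-encoded UTF-8."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return PREFIX + ";base64," + payload


for sample in ("plainasciitext" * 3, "é" * 30):
    # Compare the total URI length of both encodings for each sample.
    print(len(percent_data_uri(sample)), len(base64_data_uri(sample)))
```

Unescaped ASCII letters cost one character each when percent-encoded, while each escaped byte costs three, so text with many spaces or non-ASCII characters tips the balance toward base64.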


Actually tinyurl will shorten it but your browser may not allow the redirect. Chrome throws up an error page for security reasons: http://tinyurl.com/3maue6t


Ha, nice - I tried only goo.gl and bit.ly

With http://preview.tinyurl.com/3maue6t it works, btw.

Though, tinyurl still barfs when I try to shorten the data URI that http://software.hixie.ch/utilities/cgi/data/data gave to me when I let it grab the index of HN.


I assume hashify.me stores the value associated with the hash.

I guess for this to work well, don't you need Hashify to provide a service where you pass in the hash code and it returns the HTML to you?

You just wrap the call with good old JSONP and you are good to go.


> I assume hashify.me stores the value associated with the hash.

Not so. hashify.me has no server-side component (beyond nginx serving three static files). _bit.ly_ kindly provides the hash table. ;)


I tried to make a self-referencing page, but you need to be able to generate the hash of the page url that includes its own hash...

Not having the time to do so, I leave the challenge here: http://bit.ly/ialoWI


That's not possible because the base 64 encoding is longer than the plaintext. There are 4 characters of text in the URL for every 3 characters of text entered.
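
The 4-to-3 ratio can be checked directly. A small Python sketch using the standard `base64` module; `b64_length` is a hypothetical helper name.

```python
import base64


def b64_length(n_bytes: int) -> int:
    """Length of the padded Base64 encoding of n_bytes bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


for n in (1, 2, 3, 300, 3000):
    encoded = base64.b64encode(b"x" * n)
    # The encoded form is never shorter than the input.
    print(n, len(encoded), b64_length(n))
```

Because the encoded text is always longer than what it encodes, an uncompressed page cannot contain its own full URL.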


If you're willing to jump through hoops and js-compress your content, you can deal with this issue, unless your compressor uses something like crc. That way, you can grow the plaintext faster than the b64 - though you'd still need to do clever things with the url.


That's essentially what you're doing when you are trying to mine Bitcoin :)



Until you hit the API limit :D Why not hot-swap any shortener?


bit.ly is the only URL shortening service to support cross-origin resource sharing, as far as I'm aware.

The _right_ thing to do would be to build a shortening service designed to handle URLs of arbitrary length. I'm not sure that I'm willing to take on that responsibility, though.


Especially if used in conjunction with a text-extraction API such as Alchemy: http://www.alchemyapi.com/api/text/

That way you only get the text of the page, stored in a cache.


Oh dear. This is like some kind of sick, twisted Rube Goldberg machine...

Take entire text of Bram Stoker's Dracula

Chunk into 123 parts

Data URI encode each part

Generate a TinyURL link for each data uri (thanks for having an API guys)

Embed the TinyURL links in the Hashify Markdown editor using object elements

Curses! Even just the objects takes it over the limit. It has to be done in two parts. Create two Hashify pages.

Part 1:

http://bit.ly/havNYE

Part 2:

http://bit.ly/feoYrR

Works in Firefox and Safari. Chrome, Opera and IE9 don't like it.


Here ya go mate, in one page:

http://tinyurl.com/hashcula

hashify allows style tags :-)


You, my friend, are one sick individual. Very cool!


With a name like 'Hashify', I'm surprised they don't also offer the option of putting the content into the '#fragment' portion of the URL. Then, not even the hashify.me site would need to receive and decode the full URL; they'd just send down a small bit of constant Javascript that rebuilds the page from the #fragment.


> they'd just send down a small bit of constant Javascript that rebuilds the page from the #fragment.

That's _exactly_ what the site does. Everything happens client-side. nginx serves a single index.html for every request. ;)


The OP is saying that with a #fragment, the browser wouldn't even need to send the server the doc contents. (Since they aren't used anyway, there's obviously no use in doing so.)
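
To make the difference concrete: the fragment is not part of what the browser sends in the request line. A Python sketch with `urllib.parse`; the fragment-style URL is hypothetical (Hashify uses the path form), and `request_target` is an illustrative helper.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path-plus-query part that a client puts in the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


path_style = "http://hashify.me/SGVsbG8="       # content travels to the server
fragment_style = "http://hashify.me/#SGVsbG8="  # hypothetical: content stays in the browser

print(request_target(path_style))
print(request_target(fragment_style), urlsplit(fragment_style).fragment)
```

With the fragment form, only client-side script sees the encoded document, so servers and proxies never have to accept a very long request line.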


Then why put the encoded content in the /path-info rather than the #fragment?


It looks nicer.


And gratuitous roundtrips are fun, too :P

(You're uploading the data in order to then convert it client-side. Groxx noticed this too: http://news.ycombinator.com/item?id=2464347)


Oh, Bit.ly's gonna _Love_ you guys.

Seriously, though - awesome hack.


A fantastic abuse of technology. That's one heck of a URL.


I can't recall the last thing that made me giggle as much as realizing what they were doing. I can't see any time I'd really use this, but the audacity is inspiring.


Why abuse?


Because "URI" stands for "Uniform Resource Identifier" (or URL for "Locator"), not "Resource". The intent is for it to be a pointer, not the value at the location. And, if you were to use the URI as the content (instead of chunking and shrinking it via Bitly), you'd be duplicating that content on every page that links to it. And in your browsing history, by merely viewing it.

edit: oooh, another thought: you're essentially uploading the content of the page to view it.


There's an interesting copyright question in there somewhere too. If the URL for my document is the document then sharing the link is infringing my copyright, or something.


there is such a thing as fair use in copyright law (at least, in australia and US and most major western states).


I don't think there's any abuse here (after all, data: URIs exist). A value is just another kind of pointer.

The important thing is that the identified resource is unambiguously identified.


This is basically stealing bandwidth from sites like bit.ly, getting them to host your webpages for you.

There might be legitimate uses for this, right now I can't think of one. Clever hack though.


Is this really the case? Bit.ly would still have to forward the user over to the hashify.me site, where the hash would be decoded server-side and the content would have to be sent back over the wire to the client. That's still eating the same amount of bandwidth on hashify.me, no?


Bit.ly has to store the entire document encoded in base64 as the URL of the destination in their database in order to return to users the value of the given bit.ly URL hash. In essence, yes, bit.ly is storing the entire document on their servers anytime anyone shortens a hashify.me link.

Think of it like the difference between the postal service letting you know there is a package that you can go pick up at the post office, and the postal service giving you a package at your home or work that cannot be opened until you go to the store to buy a box cutter, but you have to bring the package with you.

The first example is cheap, since you only receive a pointer or link to where the package is, but you have to do all the work to get it. The second is not cheap, since if the package was a bed from Ikea (for a random large example), the postal service (bit.ly) has to deliver the package to you, and then you have to go somewhere (hashify.me) while carrying that package in order to see what's inside.


Ok, that makes sense. I thought we were debating on whether or not bit.ly incurred ALL the load and hashify.me incurred NONE, but that doesn't seem to be the case.


No ;-)


Perhaps 'misuse' would have been better. It's certainly that.


Last week on a whim I whipped up a URL shortener that expires the forwarded URL after one week[1]. Using that plus hashify, you can essentially make expiring web pages.

[1] pygm.us


"A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum"

Hashify is not really a hash, is it?


No, Hashify is not really a hash function. Since a hash function takes large data to small data, it is implicitly not invertible. Hashify's URLs obviously are.


End users are given bit.ly URL with a hash.


Splitting hairs because it's closer to a common understanding, but bit.ly URLs aren't hashes; they are just alphanumeric IDs.

The difference is that AFAIK there's no algorithm to take a URL (plus or minus a username) and give you a bit.ly ID, short of looking it up at bit.ly.


And the relationship is this: 'path' element of bit.ly URLs are keys in a key->value mapping (where the value is your target URL), and one of the best implementations of a key->value mapping is a hash-table. (At least, it's good for in-memory implementations... I suppose that on disk something a little more elaborate may be called for?)

Historically, the authors of Perl and Ruby (and WP tells me, Common Lisp?) decided to confuse the interface with the implementation, and use "hash" or "hash table" to refer to the mapping, and not every Perl hacker has a Computing Science degree, so now we live in a world of people who think that "hash" means the thing that bit.ly does for you.

Once again, Larry Wall Ruins Everything. :P


In Common Lisp, it's not an interface for a mapping but really a hash table, as some internal details of the hash table implementation are exposed by the interface.


encode.me & base64.me are taken, but encodify.me is not. Doesn't have the same ring as hashify.me, though.


abuse.me may be a better alternative... ;-D

I love this little hack. Sure, it may have no practical purpose; but it gave me great joy to see this. I'm still smiling.


It's probably being purchased by someone right now anyway.


I hacked together an encrypted (AES-256) read/write "database" once with the bit.ly API as the persistence backend.

However, this site disappoints me, it doesn't seem to do anything other than what a data URL can do, except it's vulnerable to downtime because of a centralized website.

Edit: for those of you unfamiliar with what a data URL is: you can store an HTML or image document using a URL like data:text/html;base64,hashifystuffhere


Markdown > HTML

Just sayin' ;)


Add the trivial step, then. Markdown to HTML, HTML to data URL.

Or just, you know, email people the content if you can give them the same data anyway.


This is pointless. It's impossible to create two pages that link to each other, for one. Also, as noted, most browsers won't allow URLs greater than 2k in size.


This research is a few years old but, hopefully, things should be even better by now: http://www.boutell.com/newfaq/misc/urllength.html .. Safari, Opera, and Firefox all go over 80k. I found another source for Chrome that says they "could not find any limits on Chrome and Safari".

Nonetheless, it's tricky because you have no idea if proxies in the middle will be able to cope, mobile clients, and all sorts of things.. so you're right in the sense that it's pointless (if you want it to be universally acceptable ;-)).


In my own (very imprecise) testing, Chrome seemed to freeze around 215k characters.

I found that my web server started throwing 413s after only about ~30k characters.


Don't know about the browser, but my work proxy won't allow it.


Clearly this is awesome. I'm curious as to what led you to build it? Understanding that you weren't solving a 'problem', but you've created something really compelling here.

Care to give a peek into how you came up with it?


A month ago, all the developers in the office spent two days working on interesting small projects, the idea being that at the end of that time we'd have a bunch of cool shippable features. Though it was encouraged to work on useful, sensible things, this was not a requirement.

Anyway, I felt great until 4pm on the Friday, when we presented our creations. Afterwards, I felt flat (as one often does after meeting a deadline or finishing a series of exams). I didn't want the excitement to end.

As I was walking home I had an idea. For some reason I wanted to share thoughts in 72pt Helvetica. I didn't want to broadcast them (I was melancholic after all), but I felt compelled to express them visually.

I began to think about how this might be done. The Web seemed like the obvious platform. I wondered whether it could be done without a database. I remembered something I had heard on a podcast about a site that allowed one to play musical notes on a computer keyboard, and would encode these in the URL for easy playback.

This seemed a lot more interesting than sharing my moody thoughts, and now that I had something cool to work on I no longer felt the need to do so anyway.

I think I spent 40 hours working on it that first weekend (yes, I was consumed). I truly believed that I could ship it before showering and leaving for work on Monday! Doing so would have been a mistake – I'm pleased that I spent several weeks ironing out the kinks and integrating with bit.ly and Twitter.


So in effect, you're using bit.ly as a webhost. Url shorteners might not be completely useless after all.


Here's a Python shortcut:

Instead of...

    from base64 import b64decode
    b64decode(foo)

You can do...

    foo.decode('base64')

Encoding works too. As well as zip (foo.encode('zip')).


It is worth noting there's a difference in the base64.b64encode function, vs the default MIME Base64 codec on strings[2].

Per RFC2045[1]:

    (Soft Line Breaks) The Quoted-Printable encoding
    REQUIRES that encoded lines be no more than 76
    characters long.  If longer lines are to be encoded
    with the Quoted-Printable encoding, "soft" line breaks

Due to the insertion of these soft-line breaks, encoding is not the same, as you can verify yourself:

    import os
    import base64
    import unittest


    class Base64Test(unittest.TestCase):

        def test_long_string_base64_decoding_and_encoding(self):
            byte_seq = os.urandom(500)
            mime64_encoded = byte_seq.encode('base64')
            self.assertNotEqual(base64.b64encode(byte_seq), mime64_encoded)
            self.assertEqual(base64.b64decode(mime64_encoded),
                             mime64_encoded.decode('base64'))


    if __name__ == '__main__':
        unittest.main()

Decoding, as you can see above, is fine. This makes a difference when encoding a really long string in an HTTP header.

[1] http://www.ietf.org/rfc/rfc2045.txt [2] http://docs.python.org/library/codecs.html#standard-encoding...
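
The test above relies on Python 2 string codecs. A rough Python 3 equivalent, assuming the standard `base64` module, where `base64.encodebytes` plays the role of the line-wrapping MIME codec; `encode_both_ways` is a hypothetical helper name.

```python
import base64
import os


def encode_both_ways(data: bytes):
    """Return (unwrapped, mime_wrapped) Base64 encodings of data."""
    unwrapped = base64.b64encode(data)       # a single line, no newlines
    mime_wrapped = base64.encodebytes(data)  # newline after every 76 output characters
    return unwrapped, mime_wrapped


unwrapped, mime_wrapped = encode_both_ways(os.urandom(500))
# The two encodings differ only by the inserted newlines,
# and both decode back to the same bytes.
print(unwrapped == mime_wrapped)
print(mime_wrapped.replace(b"\n", b"") == unwrapped)
print(base64.b64decode(mime_wrapped) == base64.b64decode(unwrapped))
```

For anything that ends up in a URL or an HTTP header, the unwrapped form is the one to use.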


I wonder if the equivalent of a quine for this is possible.



Doesn't work in latest version of Opera: Connection closed by remote server

Check that the address is spelled correctly, or try searching for the site.


Squid too (with default settings).

ERROR

The requested URL could not be retrieved

While trying to process the request:

    GET http://hashify.me/IyBIYXNoaWZ5CgpIYXNoaWZ5IGRvZXMgbm90IHNvbH... HTTP/1.1
    Host: hashify.me
    Proxy-Connection: keep-alive
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.205 Safari/534.16
    Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: en-US,en;q=0.8
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3

The following error was encountered:

Invalid Request

Some aspect of the HTTP Request is invalid. Possible problems:

Missing or unknown request method
Missing URL
Missing HTTP Identifier (HTTP/1.0)
Request is too large
Content-Length missing for POST or PUT requests
Illegal character in hostname; underscores are not allowed


For some reason i always thought URLs have a max length of 1024. Is there anything about it in the HTTP standard?


There is no max length in the standard. Of course every browser has a max it can handle. For IE it is about 2,000 characters (2,083, to be precise).


What about gzipping the content first?
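
A minimal Python sketch of compress-then-encode, assuming zlib (the DEFLATE format that gzip wraps) and URL-safe Base64. The `pack` and `unpack` names are hypothetical, and nothing here reflects how Hashify itself encodes documents.

```python
import base64
import zlib


def pack(text: str) -> str:
    """Compress the text, then encode it with the URL-safe Base64 alphabet."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii")


def unpack(token: str) -> str:
    """Reverse pack(): Base64-decode, then decompress."""
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")


document = "Hashify stores the whole document in the URL. " * 40
plain = base64.urlsafe_b64encode(document.encode("utf-8")).decode("ascii")
# Compare URL payload sizes with and without compression.
print(len(plain), len(pack(document)))
```

Repetitive or long text shrinks considerably, but very short documents can come out longer because of the fixed zlib header and checksum.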


...and with a large library of switchable preset dictionaries for common document scenarios...


I am just coding the same thing right now (began a week ago). Also had the idea to use bit.ly as shortener (because of its api) and make use of multiple shortened links to store the data. Right before looking at HN I was doing some research for a good js compressing algo.

On the one hand I am a bit disappointed (that I am too late), but on the other hand hashify.me is made far better than I could make it. Great realisation.


It's interesting that you say that. For some reason I felt under immense pressure to get this out quickly, but I wrote that off as paranoia (should've heeded Kurt's warning).

Now that it _is_ out in the wild, I can't wait to see what people do with it slash build upon it.


Well, it really _was_ out before in many ways, but not in such a nice one (formatted HTML etc.). Btw, do you have a base64 online converter and read my code six days ago? Just "kidding".


Embedded images work. What would this be the solution for? http://hashify.me/aW1hZ2Ugb3ZlciBoZXJlIDxpbWcgc3JjPSdkYXRhOm...


What are the differences between this and a data: URI? Just the shortener and that it can use out-of-band Javascript for the editor?


Nice hack, though odd name given that no hashing occurs.


Technically, if you squint hard enough, and assuming that Bit.ly always shortens the same URL to the same code (which on my brief testing it did when I tried the same URL from two different browsers), a one-way hashing function is being created from bit.ly shortened URLs -> the full "hashed" document. Technically. It certainly lacks most of the usual properties we associate with hash functions but the bare skeleton is sort of there. And there's no way to feasibly reverse that hash function algorithmically short of keeping the original input table and querying that, which is how Bit.ly of course works.


bit.ly does not always shorten the same URL to the same code


> Nice hack, though odd name given that no hashing occurs.

When I registered the domain I imagined that state would be stored in a Twitter-style hashbang. Thanks to `history.pushState` and `history.replaceState`, this hack is not required in modern browsers. :)


Base64 is a two-way hash function


"Two-way hash function" is an oxymoron. The fundamental characteristic of a function that makes it a hash function is the inability to (feasibly) reverse it.

Base64 is an encoding.
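
The distinction drawn here can be shown in a few lines of Python, assuming SHA-256 as a representative one-way hash. The helper names are hypothetical.

```python
import base64
import hashlib


def encode(data: bytes) -> bytes:
    """Base64: output grows with the input and can be decoded again."""
    return base64.b64encode(data)


def digest(data: bytes) -> str:
    """SHA-256: output has a fixed size and cannot feasibly be inverted."""
    return hashlib.sha256(data).hexdigest()


short, long_ = b"hello", b"hello" * 1000
# Base64 output scales with the input; the digest length does not.
print(len(encode(short)), len(encode(long_)))
print(len(digest(short)), len(digest(long_)))
# Decoding recovers the original bytes exactly.
print(base64.b64decode(encode(long_)) == long_)
```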


I agree with your final conclusion, that Base64 is an encoding, but it is also a hash function and hash functions can very well be two-way. In fact, the trivial hash function just maps data to itself, which is trivially reversible. However, we do want our cryptographic hash functions to be trapdoors to be of use.


A trapdoor function is one that is easy to calculate in one direction, but difficult to calculate in the other direction without knowing a special bit of information that makes this calculation much easier.

What would a trapdoor value be for SHA1 ?

What I can agree is that I would like the cryptographic hash to be a one-way function, yes. But not trapdoor functions, please :-)

(and can you point to the definition of the hash functions that you described ? I'm curious).


> What would a trapdoor value be for SHA1 ?

Okay, you got me there. It's been long enough since I've actually used the term that I seemed to have remembered it being the same as one-way. Oops.

As for the parenthetical, I'm not sure I take your meaning properly. If you are asking where I learned that hash functions don't have to be one way it seems to be an odd question, but I just checked Wikipedia and it agrees with me, at least.


"Okay, you got me there." - it was not really intended as a trick - sorry for the wording.

As for the trivial hash - now I scrolled down the page on Wikipedia, indeed. I never thought of it this way. (That an identity function on an integer would deserve to be called a hash function :-)

The part that got me was the initial sentence about hash function converting "large, possibly variable amount of data" into a "small datum". As "large" and "small" are implied to be of different sizes, I glazed over a possibility of identity function there.

Thanks!


Oh, I didn't think you intended to trick me. Perhaps some context that'd help you understand my state of mind when I wrote that: while I may be a dev and hacker professionally and as a hobby, I went to university in mathematics specialising in number theory. I'm expected to know silly things like the difference between "one way function" and "trapdoor function", so it's a touch off-putting when I forget.


I know what you mean. For me it was: "Hm. he wrote that with a good confidence. Which part of my knowledge is wrong or incomplete ?" :-) Kind of the same feeling when the significant other garbage-collects a pen you put on the table just a few minutes ago.


"Two-way hash function" is a contradiction. An oxymoron is a figure of speech that combines contradictory terms.


So it's a reimplementation of data URIs, except it depends on two different sites being up and responding to replies, so it lacks even the tiny amount of usefulness data URIs have?

I can't think of a single use case where you would go, "Ah ha! Hashify would work perfectly for this!"


One site; once you have the URL you don't necessarily need Hashify to decode it for you. Actually, it seems like Hashify is simple enough that it could be made to work offline using HTML5 without much work.


A similar technique is used already on the website http://www.wondersay.com Here the URL path is the text to animate and the fragment hash stores the settings. Bitly is also used to hide the contents of the URL (and hence the messages).

This is clever, in that the entire content of the website is not stored in a database, but in external links. Obviously the biggest problem with this technique is having bots crawl your site, so Google's #! convention is used.


would be a good text "host" but needs clones, so that when it disappears in a few years I can still easily convert my urls back into the document therein. that's the one problem these text host sites have. they never last. this gets around this by hosting nothing, merely converting, but still.

and using bit.libya. i dont trust it.

isn't this also somewhat censorship resistant. since the hashify url without its bitly can be put anywhere on the web that is writable, thus making multiple copies available in a covert way.


This is going to break in cases where the request line grows above 8k-16k. Many browsers/proxies implement limits on headers/request lines, for good reasons.

It's a very cool idea though.


This is quite clearly covered in the actual document!

> For longer documents, Hashify splits the contents into as many as 15 chunks. The chunks are then Base64-encoded and sent to bit.ly in a single request. The bit.ly hashes contained in the response are then "packed" into a URL such as http://hashify.me/unpack:gYi2Ie,g4fpte. Finally, this URL is itself shortened.
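
The quoted scheme can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not Hashify's code: the chunk size, the URL-safe Base64 variant, and the in-memory `FakeShortener` standing in for bit.ly are all hypothetical.

```python
import base64

MAX_CHUNKS = 15  # limit taken from the quoted description


class FakeShortener:
    """Hypothetical in-memory stand-in for a URL shortener such as bit.ly."""

    def __init__(self):
        self._table = {}

    def shorten(self, long_url: str) -> str:
        key = "k%d" % len(self._table)
        self._table[key] = long_url
        return key

    def expand(self, key: str) -> str:
        return self._table[key]


def pack(text: str, shortener: FakeShortener, chunk_size: int = 1000) -> str:
    """Split text into chunks, shorten each encoded chunk, and join the keys."""
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if len(chunks) > MAX_CHUNKS:
        raise ValueError("document needs more than %d chunks" % MAX_CHUNKS)
    keys = []
    for chunk in chunks:
        encoded = base64.urlsafe_b64encode(chunk.encode("utf-8")).decode("ascii")
        keys.append(shortener.shorten("http://hashify.me/" + encoded))
    return "http://hashify.me/unpack:" + ",".join(keys)


def unpack(packed_url: str, shortener: FakeShortener) -> str:
    """Expand each key, decode its chunk, and reassemble the document."""
    keys = [k for k in packed_url.split("unpack:", 1)[1].split(",") if k]
    parts = []
    for key in keys:
        encoded = shortener.expand(key).rsplit("/", 1)[1]
        parts.append(base64.urlsafe_b64decode(encoded).decode("utf-8"))
    return "".join(parts)
```

The packed URL holds only the short keys, so its length depends on the number of chunks rather than on the size of the document.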


> as many as 15 chunks

Yes, 15 * 2048 = a 30k limit on document size


python> 'a' * 16000

copy that to hashify, and let me know the length of the url you get.


So basically, the URLs are files, and copying/pasting them is like copying/pasting encoded data. It is the same as data: urls actually, except maybe for browser security, which is pretty irrelevant for this anyway.

Actually, now I am wondering if an iframe src could be a data: url in browsers. If so, that could be interesting! Showing content without hitting the server. Probably not though, because of cross-domain security again. Any ideas?


Alternatively, you can use the Data URI Scheme like so:

  >>> "<h1>Hello, World!</h1>".encode('base64').strip('\n')
  'PGgxPkhlbGxvLCBXb3JsZCE8L2gxPg=='

Paste this into your location bar: data:text/html;base64,PGgxPkhlbGxvLCBXb3JsZCE8L2gxPg==

Works in Chrome 10.0
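
The one-liner above uses the Python 2 `'base64'` string codec. A Python 3 sketch of the same idea with the `base64` module; `html_to_data_uri` is a hypothetical helper name.

```python
import base64


def html_to_data_uri(html: str) -> str:
    """Wrap an HTML string in a Base64 data: URI."""
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return "data:text/html;base64," + payload


print(html_to_data_uri("<h1>Hello, World!</h1>"))
# prints data:text/html;base64,PGgxPkhlbGxvLCBXb3JsZCE8L2gxPg==
```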


What is the purpose of this? The URL is already pointing to the store - the actual site which hosts the page. Instead now we have a shortened URL which stores the document. So they just took away the distributed nature of the URL and put it all in one store (bitly).


Hashify.me seems to be overloaded for the moment. Nevertheless a brilliant and delightful concept!


> Hashify.me seems to be overloaded for the moment.

Here's what I'm seeing in my browser console:

{ "data": [ ], "status_code": 403, "status_txt": "RATE_LIMIT_EXCEEDED" }

I should have included appropriate error handling for this! Everything except shortening continues to function, though.
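
A sketch of the kind of error handling mentioned above, written in Python for consistency with the other examples in this thread (the site itself runs JavaScript). The response shape is taken from the console output shown; treating status code 200 as success is an assumption, and the names are hypothetical.

```python
import json


class RateLimited(Exception):
    """Raised when the shortener reports that the rate limit was exceeded."""


def parse_shorten_response(body: str):
    """Return the 'data' field of a shortener response, or raise on errors."""
    payload = json.loads(body)
    status = payload.get("status_code")
    if status == 403 and payload.get("status_txt") == "RATE_LIMIT_EXCEEDED":
        raise RateLimited("shortener rate limit exceeded; try again later")
    if status != 200:  # assumption: 200 signals success
        raise RuntimeError("shortener error: %s" % payload.get("status_txt"))
    return payload["data"]


try:
    parse_shorten_response(
        '{ "data": [ ], "status_code": 403, "status_txt": "RATE_LIMIT_EXCEEDED" }'
    )
except RateLimited as error:
    # Fall back to showing the long URL instead of failing silently.
    print("Could not shorten:", error)
```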


This reminds me of the old tiny-url file system: http://tech.slashdot.org/story/05/10/25/0350222/TinyDisk-A-F...


Very cool idea, it's something I can think of a bunch of cool uses for. Will definitely be looking into it.

One downside came when I tried bookmarking with Delicious (hit the url length limit, truncation would break it). But great for shorter content.


Bookmark the bit.ly URL, perhaps?


This is sweet. It would actually be possible to create a database using bitly entirely in javascript. It would be read-only for clients and read/write for webservers. You could even make it ACID compliant. I might have a go.


Careful.

Apache responds with an HTTP code 414-Request URI Too Large once the URI reaches around 8K in length.

Default limits exist in several load balancers as well.


Hashify + pen.io and you've got a great service.

Hashify gets a pretty UI. pen.io removes the need for a DB.


I could see this as a very useful implementation for HTML5/Mobile Web sites.

Consider the user experience for the target site on a mobile platform. You have already loaded the site on your mobile device before even taking action, so when you click the link the response is much faster than requesting the site at the click.


At my workplace, this completely freaks out our corporate proxy, so no go :(


The client side shorten request to bitly.com exposes the bitly credentials.


Par for the course in any JSONP/CORS implementation.


Isn't it a "page" or "document", and not a "website"?


How long before the bit.ly namespace is exhausted?


extremely cool


Finally, an easy and quick way to decode base64 hashes.



That's a feature. You can encode an entire webpage, including JavaScript, and, yes, including alerts.

As long as they don't have user accounts or database access or such, XSS doesn't let an attacker do anything meaningful. It's not weak security, it's just how the site works.

Edit: To point out the obvious, your iframe trickery is not necessary. Script tags are not escaped, nor are event attributes: http://hashify.me/PGRpdiBvbmNsaWNrPSJhbGVydCgnVGhpcyBpc25cJ3...


> That's a feature.

Agreed! One can always link someone to a static HTML page with <script>alert('fu')</script> in the body, but no one would tag that "XSS".

Does hashify.me make it easier to send annoying alert messages to your friends? Sure. Annoying, but no more harmful than sending them to the static equivalent.


Fair enough. It's similar to embedding third-party gadgets for things like iGoogle. However, in practice that content is sanitized.

Note there are risks to hosting arbitrary Javascript beyond stealing cookies. For example, you can steal browser history, discover NAT IP addresses, scan intranet ports, etc. Here's a presentation by Jeremiah Grossman covering some of these attacks: http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Gro...

Of course, attackers can host malicious content anywhere they control. I could just as easily send someone a bit.ly link to a malicious site I control.


It's also vulnerable to html injection... by design



