

Broken Links - martey
http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch

======
zemaj
Despite all the FUD around hashbangs, the genuine problem I see with them is
that they optimise for internal page loads, not the entry into a website. For
example, with hashbangs, requests to Twitter when logged in go like this:

1) HTTP GET <http://twitter.com/some_account> [~500ms for me]

2) 302 redirect -> HTTP GET <http://twitter.com/> [~600ms for me]

3) HTML tells browser to download some JS -> HTTP GET bundle.js [~500ms for
me] (concurrently here we start getting CSS)

4) JS reads hashbang & requests actual data -> HTTP GET data.json [~500ms for
me]

... only after about 2 seconds can we start(!) rendering data. Now there's
about another 2 seconds for all the JSON data & CSS calls to complete. It takes
upwards of 4 seconds for a Twitter page to render for me (the Load event lies,
as it fires well before actual data shows. Try it yourself with your favourite
browser inspector).
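
To make step 4 concrete, here is roughly what the bootstrap JS has to do once
it has finally been downloaded and parsed (a sketch only; the endpoint and
renderTimeline are made-up names, not Twitter's actual code):

    // only runs after bundle.js has arrived and been parsed
    window.addEventListener('load', function () {
        var hash = window.location.hash;          // e.g. "#!/some_account"
        if (hash.indexOf('#!') !== 0) return;     // nothing to route
        var path = hash.slice(2);                 // "/some_account"

        // yet another round trip before anything useful can render
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/api' + path + '.json'); // hypothetical data endpoint
        xhr.onload = function () {
            renderTimeline(JSON.parse(xhr.responseText)); // app-specific render
        };
        xhr.send();
    });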

When not using hashbangs, a single HTTP request can get all the data for the
page and start rendering it. One blocking CSS call (possibly cached) is all
that's needed for styling.

Hence when I see an external link with a hashbang it frustrates me (barely
perceptibly) because I know that when I load the page it's going to take
longer than a normal HTTP request. Significantly longer. While subsequent page
loads are faster, it's not these you want to optimise for if you care about
bounce rates. This issue affects every new link you follow into a website, so
it touches an even larger number of requests than bounces alone.

Hashbangs are a good solution to an important problem, but I don't see them as
a tool to build entire websites upon. Fortunately I see the performance issue
as one which will result in people voting with their browsers and choosing
sites which only use hashbangs when they genuinely improve the user experience
- especially since they're easily visible in the url.

~~~
wmf
Basically, NewTwitter isn't a Web site, it's an app and you have to "launch"
it before you can do anything.

~~~
mustpax
But once you do launch it, everything is faster than it would have been if you
were performing full page loads at each step. For sites you "live" in, the
application route makes a lot of sense. This is the way GMail works and people
seem to like it a lot.

Unfortunately, web applications and web pages are growing increasingly
divergent. It is simply not feasible to take the performance of web apps to
the next level without doing away with full page loads. This is why Facebook,
Twitter et al are going the #! route. That's the cold hard truth.

~~~
roc
Wouldn't js click handlers work just as well?

You follow a canonical link to the resource, get a real page back, with real
links, but js click handlers to enable AJAX-goosed speed for those with
javascript enabled? And given that they imply a fallback for those times when
javascript fails, aren't they actually _better_?
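
Something like this, say (just a sketch; the class name and the .json
convention are illustrative, not any particular site's code):

    // markup contains real, crawlable links: <a class="ajax-link" href="/some_account">
    var links = document.querySelectorAll('a.ajax-link');
    for (var i = 0; i < links.length; i++) {
        links[i].onclick = function (e) {
            e.preventDefault();                    // skip the full page load
            var href = this.getAttribute('href');
            var xhr = new XMLHttpRequest();
            xhr.open('GET', href + '.json');       // illustrative data URL convention
            xhr.onload = function () {
                updatePage(JSON.parse(xhr.responseText)); // app-specific render
            };
            xhr.send();
        };
    }
    // with javascript off or broken, the hrefs still work as ordinary links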

And GMail is a bit different than Twitter. It handles inward-facing data;
content that no-one particularly _wants_ crawled and wouldn't benefit much
from caching.

~~~
jbri
Unless you want your URLs to look like twitter.com/someone#!someone_else,
you're going to have to take the multi-step page load at some point when you
transition from the HTML version to the AJAXy one.

~~~
witten
Not if you use the HTML history API: <http://html5demos.com/history>

Admittedly, you need a modern browser for that. But you can always present
full-page-load HTML to users with older browsers and then provide AJAXy
history-ified goodness to everyone else.

------
pilif
With pushState not widely implemented, you have three choices:

1) don't use AJAX in response to actions that alter the page content in a
significant way. This of course forces page reloads and prevents the cool
emerging pattern of not serving dynamic HTML at all, just exposing a REST API
and doing the rendering client side.

2) you do the ajaxy stuff but you don't touch the URL. This leads to a
non-working back button and prevents users from bookmarking or sharing links
to specific views. You can work around this Google Maps style with some
explicit "link to this page" functionality, but I would guess people just
don't get that.

3) you do the fragment-change thing, which allows for ajaxy page content
changes while keeping the back button working and links inherently shareable
and bookmarkable, at the cost of that one redirect, of screen-scrapability,
and of maybe confusing techies (normal people probably don't care either way).

pushState can look like a salvation, but keep one thing in mind: to keep the
page working for browsers without JS (and screen scrapers), you will have to
do your work twice and STILL render dynamic content on the server, which is
something people are now beginning to try to avoid.

Finally, as pushState is yet another not-widely-deployed thing, for the next
five to ten years you would have to do all of this three times: dynamic HTML
generation for the purists, pushState for the new browsers, and fragment
change for IE.
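
In code, the three paths end up being a fork along these lines (hand-wavy;
loadView and the URL shapes are placeholders):

    function navigate(path) {                     // e.g. "/contacts/john"
        if (window.history && history.pushState) {
            history.pushState(null, '', path);    // new browsers: real URL, no reload
            loadView(path);                       // fetch + render client side
        } else if ('onhashchange' in window) {
            window.location.hash = '#!' + path;   // IE & co: fragment change
            // a hashchange listener then calls loadView(path)
        } else {
            window.location.href = path;          // full page load, which needs
        }                                         // the server-rendered HTML anyway
    }
    // visitors without JS never reach this function at all; they follow the
    // plain hrefs and rely on the same server-rendered pages as the last branch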

Personally, I really feel that fragment change is a good compromise, as it
works in current browsers, even IE, while still allowing the nice pattern of
not rendering anything on the server and keeping the URLs shareable.

Maybe this current uproar is based on a) techies not being used to this
(normal people don't notice) and b) badly broken JS that sometimes prevents
views from rendering AT ALL. But that is not caused by an inherent problem
with the technology: if I screw up the server-side rendering, the page will be
just as empty as if I screw up on the client side.

~~~
othermaciej
pushState with non-hash URLs doesn't require you to do server-side HTML
generation. You can just send a stub page which looks at the URL and loads the
right data, just as with hash URLs. To deploy it incrementally, you only
really need one code path with a slight fork depending on whether the current
URL contains a #! and whether the current browser supports pushState.
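
A rough sketch of that stub-page code path (the names are made up for
illustration):

    // stub page bootstrap: work out what to show from the URL
    var path;
    if (location.hash.indexOf('#!') === 0) {
        path = location.hash.slice(2);            // arrived via a legacy "#!/foo" link
        if (window.history && history.pushState) {
            history.pushState(null, '', path);    // quietly upgrade to a clean URL
        }
    } else {
        path = location.pathname;                 // already a real URL
    }
    loadData(path);                               // hypothetical: fetch and render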

~~~
pilif
I know, but browsers without JS need the server-side generated content for
that URL, or the original complaints just arise again (an empty page, albeit
with a different URL now).

------
othermaciej
HTML5 "AJAX History", also known as History.pushState, can solve this problem.
It allows a website to update its contents with AJAX, but change the URL to a
real URL that will actually retrieve the proper resource direct from the
server, while maintaining proper back-forward navigation.

See <http://dev.w3.org/html5/spec/Overview.html#dom-history-pushstate> for
spec details.

It's in Safari, Chrome and Firefox. While Opera and IE don't have it yet, it
would be easy to use conditionally on browsers that support it. I'm a little
surprised that more sites don't use it.

EDIT: This chart shows what browsers it will work in:
<http://caniuse.com/history>

~~~
thomas11
It's really great that in a few years, browsers will support a new AJAX
technology that solves this problem that we wouldn't even have with sane,
traditional URL schemes.

~~~
othermaciej
Maybe you're trying to be snarky, but I'll choose to take your comment
seriously.

The AJAX approach to Web apps does provide a genuine user interface benefit. A
full page load is very disruptive to application flow, and being able to have
new data appear without incurring that penalty is great. Most of the time you
only need to load a little bit of data anyway, and it's wasteful to reload all
the markup that wraps it.

AJAX solves that problem, but it creates the new problem that your address
field no longer properly reflects the navigation you do inside a web app. #!
URLs are one approach to fixing it, and pushState will do even better. At that
point, the user won't even have to notice that the Web app they use is built
in an AJAXy style, other than the navigation being smoother and faster.

~~~
thomas11
"Most of the time you only need to load a little bit of data anyway" - that's
highly questionable as a general statement. In a rich UI like GMail, yes. But
in examples like the new Lifehacker, you load a whole story, yet its locator
is behind the hashbang.

Not every website is a web app. Just show one article or item or whatever the
site is about under one URI.

~~~
robryan
Lifehacker kind of looks nicer only reloading the story and not the whole
page. It gives things an application feel rather than a collection of pages,
and saves a heap of extra processing: why run the code again to generate a
header, footer and sidebars constantly when the version the user is seeing is
perfectly up to date?

~~~
joelanman
The experience is slicker - if you run a search on Lifehacker, you can click
through and browse the results without affecting the rest of the page
(including the list of results). With traditional page refreshes this would
not be possible.

------
bruceboughton
Isn't the underlying problem that web applications are often displaying
combinations of content that don't have a natural URL?

Take New Twitter, for example. If I click on a tweet in my stream, it shows
related tweets. If I drill down a few of those, at some point it becomes
impossible to represent the address of the current state in a sane manner.

I think URLs are particular to the web (desktop apps don't have them) because
the web is traditionally about content. Web applications are increasingly
breaking that. Perhaps web applications and URLs don't go together all that
well.

Don't get me wrong--I love URLs, and it's crazy for content sites like
Lifehacker to break them for so little benefit. But maybe the reason for this
hashbang trend is that URLs aren't expressive enough for some of these sites.

~~~
prodigal_erik
In that case "web application" is a misnomer. If the current state has no
natural URL, it's not a legitimate part of the World-Wide Web. Instead the
authors are tunneling a proprietary protocol over AJAX to carry opaque content
to a single-purpose GUI app, just like all the terrible client/server apps
from the 90s only slower.

~~~
vannevar
_If the current state has no natural URL, it's not a legitimate part of the
World-Wide Web._

But most of the popular content accessible via the web now fits this
description. Look at Google's own homepage: it's a complex JavaScript
application that's completely opaque. @bruceboughton is right; the problem
isn't that people aren't respecting the WWW specification, it's that the
specification is no longer adequate to describe what the Web has become.

~~~
prodigal_erik
Google has competent web developers who practice progressive enhancement.
Their search form and results have stable (even sensible) URLs and are
perfectly usable without trusting their js.

~~~
vannevar
Sure the application has a URL. But once I load it, it constructs what I see
on the page on the fly. I can turn Javascript off and load an alternate page,
but if I leave it on I'm loading an application that's every bit as opaque as
Flash.

------
Isofarro
Ran into another interesting shortcoming of hash-bang URLs last night looking
through my referrer log. Loads of referring URLs of <http://gawker.com/> and
<http://kotaku.com/> to my blogpost. But no mention at all of my blog-post or
a link to it on the homepage.

At first I thought they were referrer-log spamming; then it dawned on me that
fragment identifiers get stripped out of HTTP referers, making hash-bangs
useless as a means of joining up distributed conversations on the web.

Somewhere on those two Gawker media sites there's a conversation going on
about the use of hash-bangs. But nobody outside knows about it. It's a big
black hole.

------
Bockit
Can't it work both ways? Serve the #! links and provide canonical content
located at the (almost) same uri sans #!.

If you visit <http://mysicksite.com/article/1> javascript changes all the
links to the #! format. Then when the user clicks the links they enter #!
land.

Now the user copies a link from their address bar and puts it into the wild.
Someone gets that link, <http://mysicksite.com/#!/article/1>, and visits it.
Rewrite with htaccess or whatever method you employ to serve the content at
<http://mysicksite.com/article/1>, using javascript to change all the links to
the #! format.

I posted this in the reddit thread about the Gawker/lifehacker problems
recently, but was too late for anyone to really give me a response. For those
of you that have worked with these kind of systems before, would this solve
the problem the original link was describing?

EDIT: Ahh, I think I get the problem now, of course right after I post it. The
server doesn't get the data from the part of the URI trailing the #!, I think?

~~~
s0urceror
That is, indeed, the crux of the problem. Anything after the hash is client-
only.

------
jvdongen
[EDIT: never mind, missed this response, similar in style but 2h earlier ...
<http://news.ycombinator.com/item?id=2197064>]

Maybe I'm missing something, but it seems to me that there is a way to have
your cake and eat it too in this case.

Say we have a site with a page /contacts/ which lists various contacts.

On this page there are completely normal links like '/contacts/john/', each
link preceded by/wrapped by an anchor tag - <a href="john"> in this case.

If you visit this site without javascript enabled (e.g. you happen to be a web
crawler), you just follow the links and you get just regular pages as always.

If, however, you've got javascript enabled, an onclick event on each link
intercepts the click, fetches just the information about the contact you
clicked on (using an alternate url, for example /contacts/john.json), cancels
the default action and (re)renders the page.

Then it does one of two things:

- if pushState is supported, it just updates the url

- if pushState is not supported, it adds '#john' to the url

If someone visits '/contacts/#john' with javascript enabled, /contacts/ is
retrieved and then john's data is loaded and displayed.

If someone visits '/contacts/#john' without javascript enabled, he gets the
full contact list, with the focus on the link to john's page, which he can
then click.

By using this scheme:

- search engines and other non-javascript users can fully use the site and see
completely normal urls

- XHR page loads are supported

- XHR-loaded pages don't break the back button

- XHR-loaded pages are bookmarkable

- Bookmarks to XHR-loaded pages are fully shareable if the recipient has
javascript enabled or pushState is supported, and at least not totally broken
if not.
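
A condensed sketch of the whole scheme for the /contacts/ example (the
container id, the .json convention and showContact are this comment's own
illustration, not a library API):

    // the markup on /contacts/ is just normal links: <a href="john/">John</a>
    var links = document.querySelectorAll('#contacts a');
    for (var i = 0; i < links.length; i++) {
        links[i].onclick = function (e) {
            e.preventDefault();
            var name = this.getAttribute('href').replace('/', '');  // "john"
            showContact(name);             // fetches /contacts/john.json, re-renders
            if (window.history && history.pushState) {
                history.pushState(null, '', '/contacts/' + name + '/');
            } else {
                location.hash = '#' + name;
            }
        };
    }

    // a visitor arriving at /contacts/#john with javascript enabled
    if (location.hash) {
        showContact(location.hash.slice(1));
    }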

The only drawback I can see is the 'sharing bookmarks with someone who has no
javascript support' issue - is that a real biggie? There is also, of course,
the 'made an error in javascript, now it all stops working' issue - but that
is something that has not so much to do with the #! debate as with the 'is
loading primary content via XHR a good idea' debate.

To me it seems that current users of the #! technique have just gone overboard
a bit by relying _only_ on the #! technique instead of combining it in a
progressively enhancing way with regular HTTP requests.

------
aamar
The problem in this situation is that you have a smart technical person
arguing for technical purity, while at the same time (seemingly) ignoring the
mostly non-technical considerations of user-experience and economics.

Yes, the old, conservative model of HTML is very simple, but when people use
AJAX well, the user experience is enormously and materially improved. We're
still early in the development of this medium, and many people will do it
wrong. But even the people who do it right will probably seem inelegant and
kludgey _by the standards of the old model._

And yes, you can get both AJAX and clean URLs via (still poorly-supported)
HTML5 History API and/or other progressive enhancement methods, but these may
require a significant amount of additional effort. Maybe worth it, maybe not.

This topic reminds me of when sound was added to movies. "Tight coupling" and
"hideous kludge" sound a lot like the arguments that were made against that
too. The conventional wisdom was to make your talkie such that the story
worked even without sound; one can still sometimes hear that, but it isn't, I
think, a standard that we associate with the best movies being made today.

------
nostrademons
It's not really that bad. The people using hash-bangs are following a spec
proposed by Google to make AJAX webpages crawlable:

<http://code.google.com/web/ajaxcrawling/docs/specification.html>

So when you see the lifehacker URL in the article, you know that there's an
equivalent non-AJAX URL available with the same content at:

<http://lifehacker.com/?_escaped_fragment_=5753509/hello-world-this-is-the-new-lifehacker>

There's no need to execute all the JavaScript that comes back from the server
- if they're following the spec, all you have to do is escape the fragment and
toss it over to a CGI arg.
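
The remapping really is just string surgery on the URL; roughly (the spec also
asks for %-escaping of a few special characters, omitted here):

    // Google's AJAX-crawling scheme: "...#!foo" is requested as "...?_escaped_fragment_=foo"
    function escapedFragmentUrl(url) {
        var i = url.indexOf('#!');
        if (i === -1) return url;                 // no hashbang, nothing to do
        var base = url.slice(0, i);
        var sep = base.indexOf('?') === -1 ? '?' : '&';
        return base + sep + '_escaped_fragment_=' + url.slice(i + 2);
    }

    // escapedFragmentUrl('http://lifehacker.com/#!5753509/hello-world-this-is-the-new-lifehacker')
    //   -> 'http://lifehacker.com/?_escaped_fragment_=5753509/hello-world-this-is-the-new-lifehacker'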

Another option is progressive enhancement, where you make every link point to
a valid page and then add onclick event handlers that override the click event
to do whatever JavaScript you want it to. I think this is a far superior
option in general, but it has various issues in latency and coding complexity,
so a good portion of web developers didn't do it anyway.

~~~
brown9-2
But as Tim says, the spec proposed by Google is only meant to fix some of the
problems (content that can't be searched by search engines) caused by using
this URL scheme. It isn't meant to be a one-size-fits-all approach to making
AJAX content addressable.

In other words the spec treats one of the symptoms, not the original problem.

------
vanessafox
I posted more as a comment on the original story, but I have covered this
issue in depth (from when Google initially proposed it, to when it was
launched) here:

<http://searchengineland.com/google-proposes-to-make-ajax-crawlable-27408>

<http://searchengineland.com/googles-proposal-for-crawling-ajax-may-be-live-34411>

<http://searchengineland.com/its-official-googles-proposal-for-crawling-ajax-urls-is-live-37298>

Of course, a better solution is some type of progressive enhancement that
ensures both that search engines can crawl the URLs and anyone using device
without JavaScript support can view all of the content and navigate the site.

------
rushabh
I can't understand why it would be hard for someone writing a crawler to
replace a hashbang (#!) with _escaped_fragment_.

For developers of AJAX apps it:

1. Improves productivity

2. Improves user experience

3. Is more efficient on the server, as it prevents a lot of initializing code.

I think the old school needs to wake up a bit!

------
siddhant
Facebook had an outage some time back (I think this one -
<http://www.facebook.com/note.php?note_id=431441338919>), and when everything
got back to normal, the hash-banged URLs were gone. Was it related?

~~~
robryan
They are still there in IE, so I guess they have started using pushState
where available.

------
alexkearns
Yet another annoying, pontificating article about hashbangs. Why can't people
accept that there is more than one way of doing things on the web?

Just because you don't like using hashbangs does not mean no-one else can.

Sure, use of hashbangs might make SEO of your site harder. Yes, it might make
it harder for hackers who want to curl your site's pages. But maybe this is
not your aim with your site.

Maybe you want to give your users a slicker experience by not loading whole
new pages but instead grabbing bits of new content.

The web is a place for experimentation and we as hackers should encourage such
experimentation, rather than condemning it because it does not fit with how we
think things should be done.

~~~
andolanra
A while back, there was this pie-in-the-sky idea which was really interesting
but not too practical, called Semantic Web. It didn't really pan out because
it turns out that annotating your sites with metadata is boring and tedious
and nobody really liked to do it, and anyway, search and Bayesian statistics
simulated the big ideas of Semantic Web well enough for most people.

The ideas behind it still stand, though, in the idea of microformats. These
are just standardized ways of using existing HTML to structure particular
kinds of data, so any program (browser plug-in, web crawler, &c) can scrape
through my data and parse it as metadata, more precisely and with greater
semantic content than raw text search, but without the tedium that comes with
ontologies and RDF.

Now, these ideas are about the _structured exchange_ of information _between
arbitrary nodes on the internet_. If every recipe site used the hRecipe
microformat, for example, I could write a recipe search engine which
automatically parses the given recipes and supply them in various formats
(recipe card, full-page instructions, &c) because I have a recipe schema and
arbitrary recipes I've never seen before on sites my crawler just found
conform to this. I could write a local client that does the same thing, or a
web app which consolidates the recipes from other sites into my own personal
recipe book. It turns the internet into _much_ more of a _net_ , and makes
pulling together this information in new and interesting ways tenable. In its
grandest incarnation, using the whole internet would be like using Wolfram
Alpha.

The #! has precisely the opposite effect. If you offer #! urls and nothing
else, then you are making your site harder to process except by human beings
sitting at full-stack, JS-enabled, HTML5-ready web browsers; you are _actively
hindering_ any other kind of data exchange. Using #!-only is a valid choice,
I'm not saying it's always the wrong one—web apps definitely benefit from #!
much more than they do from awkward backwards compatibility. But using #!
_without graceful degradation_ of your pages turns the internet from
interconnected-realms-of-information to what amounts to a distribution channel
for your webapps. It actively hinders communication between anybody but the
server and the client, and closes off lots of ideas about what the internet
_could be_ , and those ideas are not just "SEO is harder and people can't use
curl anymore."

I don't want to condemn experimentation, either, and I'm as excited as anyone
to see what JS can do when it's really unleashed. But framing this debate as
an argument between crotchety graybeards and The Daring Future Of The Internet
misses a lot of the subtleties involved.

~~~
aamar
Very interesting points, but there are a couple of errors which undermine part
of your point:

1. If the application follows Google's proposed convention or similar, the
crawler doesn't need a full-stack JS implementation; it just needs to do the
(trivial) URL remapping.

2. Nothing in this hash-bang approach requires an HTML5-ready browser.

~~~
Isofarro
I tried both curl and wget last night (neither of these are HTML5-ready
browsers), and neither of them could get content using the hash-bang URL. They
both came back with an empty page skeleton.

Also, how do you reassemble the hash-bang URL from the HTTP Referer header?

~~~
wahnfrieden
Neither curl nor wget follow the Google convention for handling hashbangs as
suggested by the parent, so I'm not sure what you're getting at with this
reply.

~~~
Isofarro
Hash-bang URLs are not reliable references to content - that's what I am
getting at. curl and wget are perhaps the most used non-browser user agents on
the web, and both of them are unable to retrieve the content that a hash-bang
URL refers to.

In this context hash-bang urls are broken.

~~~
aamar
I'm sorry if I implied that curl/wget handle this already. However, they could
handle this with a very small wrapper script, maybe 3 lines of code, or a very
short patch if the convention becomes a standard. That's not nothing, but it's
maybe 7 orders of magnitude lighter than a full JS engine, and it's small
anyway compared to the number of cases that a reasonable crawler needs to
handle.

Also, with that wrapper or patch, curl & wget will still not be remotely HTML5
ready, which I hope demonstrates that HTML5 is not a requirement in any way. A
single HTML5-non-ready browser that can't handle this doesn't mean therefore
that HTML5 is a requirement.

------
garrettgillas
The point of mainstream sites indicating that the page has ajax with the URL
path is to tell search engines. I have a feeling that what the author doesn't
get is that it is very hard for search engines to tell the difference between
ajax pages, static pages, and spammy, keyword-stuffed pages.

To me, it seems that Google recommends indicating ajax content in the path in
the same way that our government issues concealed weapon permits: yes, it's
okay to have concealed content that loads on the fly, as long as you are very
clear about your intentions. Once again, this is a usability issue that
wouldn't be an issue if it weren't for spammers.

------
zachbeane
This rant would be more effective and persuasive if also directed at the
Google engineers who made this hashbang style pervasive in Google Groups. I
didn't think it would be possible to get deep links to old articles even
_worse_ than before, but they managed it.

------
il
It's interesting how many upvotes this is getting in a very short time.
However, I don't think the average Twitter user cares about performance and
URL elegance, so I doubt Twitter will change anything.

~~~
jamesjyu
I have seen performance issues and outright broken behavior with Twitter's
hashbang ajax loading scheme. In that respect, regular users will care (they
just won't necessarily know what is causing the issue).

------
zaius
I think people are missing a huge benefit of the hashbang syntax: readable and
copy/paste-able URLs. Without them, it's impossible to have an ajax
application with a decent URL scheme.

~~~
masklinn
You don't need the hashbang for that. You never did. Hashbang only tells
google "munge around this shit to get an actual page".

~~~
zaius
I'm curious what the other solutions are then. The only one I can think of is
History.pushState, and that's only supported in newer browsers.

Let's say I'm writing a web based word processor, and a user clicks on a
document. I want the URL to be a reference to that specific document. The only
way to change the URL to be specific without requiring a whole-page refresh is
to use the hashbang syntax.

~~~
masklinn
> I'm curious what the other solutions are then.

The hash without the bang. It's only been done for about 10 years. You can put
whatever you want after the hash. It's up to your application to decide the
meaning of it.

For an example, see the OS X SDK documentation:
<http://developer.apple.com/library/mac/#documentation/Cocoa/Reference/Foundation/Classes/NSString_Class/Reference/NSString.html>
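
And a minimal plain-hash routing sketch for the word-processor case above (the
fragment format and loadDocument are made up):

    // when the user opens a document, record it in the fragment
    function openDocument(id) {
        window.location.hash = '#doc/' + id;      // e.g. example.com/editor#doc/42
    }

    // render whatever the fragment points at, on load and on back/forward
    function route() {
        var m = window.location.hash.match(/^#doc\/(.+)$/);
        if (m) loadDocument(m[1]);                // hypothetical fetch + render
    }
    window.onhashchange = route;                  // widely supported by now, even IE8
    route();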

------
jcfrei
Just a thought - but could a lot of people complaining about hashbangs still
be browsing the web with lynx?

------
dtby
Hi, HTML/HTTP are the second-worst application delivery platform available.
Try not to be shocked.

Sorry, your other choice was #1.

