
Hash URIs - hardik988
http://www.jenitennison.com/blog/node/154
======
peterwwillis
Does nobody care that this makes the world-wide-web completely pointless from
any perspective that isn't a web browser with JavaScript? I'm going to be
royally pissed if I have to write from scratch an http library in C with
JavaScript support just to make a friggin' scraper script or debug a
website/webserver.

If you want to be annoyingly fancy with the way you deliver content, just do
it to user agents that _support JavaScript_. For any other user agent just
provide the actual content we wanted! To do this, all you have to do is _not
use hash tags_. The URI can remain the same and the application will work just
fine (JS can still check the friggin query uri and do any Ajax trickery it
wants). Sometimes I think webapp designers are just trying to piss hackers
off.

We'll ignore the fact that having to escape the bang is really annoying,
considering the page is useless from the console anyway due to the
aforementioned requirement of JavaScript.
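
A rough sketch of the degrade-gracefully approach being suggested here
(enhanceWithAjax is a hypothetical hook, not anything from the article):

    // Serve the real content at the plain URI for everyone; user agents that
    // actually run JS can layer the Ajax behaviour on top afterwards.
    window.onload = function () {
        if (window.XMLHttpRequest) {
            // hypothetical function that wires up the dynamic loading
            enhanceWithAjax(window.location.pathname + window.location.search);
        }
        // Everyone else (curl, scrapers, text browsers) already has the content.
    };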

~~~
tptacek
You don't need to embed Javascript to solve this problem, which has confronted
web security products since the invention of "Ajax". Yes, the clientside is
evaluating JS to figure out what to load, but once you know how those loads
work, nothing stops you from simply making the "backend" requests directly.
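
For example (a sketch only, with a made-up endpoint -- the real URL is
whatever the page's own XHRs hit):

    // Node.js sketch: skip the JS evaluation entirely and request the JSON
    // the client-side code would have fetched. The endpoint is hypothetical.
    var http = require('http');

    http.get('http://example.com/api/timeline.json?screen_name=someuser',
        function (res) {
            var body = '';
            res.on('data', function (chunk) { body += chunk; });
            res.on('end', function () {
                console.log(JSON.parse(body)); // the data the #! page would render
            });
        });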

We can nerd out till the sun goes down on how feasible it is to automatically
figure out what links to load, but we spend the better part of many people's
time every week dealing with this, and it rarely creates a major problem.
Actually scraping tag soup is _by far_ the more annoying issue.

~~~
chc
I'm not sure reading and fully comprehending the flow of the complete
JavaScript source for every site you might wish to scrape is actually easier
than making a WebKit-based scraper that runs load and click handlers.

~~~
mdaniel
Relevant: <http://code.google.com/p/phantomjs/> is a headless browser based on
WebKit

I don't have any hands-on experience with it, but if one were to go down the
path you just described, that project would likely be a great start on that
journey.
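
Something along these lines, I'd imagine (an untested sketch based on the
PhantomJS examples):

    // Load a #! page in headless WebKit, let its scripts run, then dump the DOM.
    var page = require('webpage').create();

    page.open('http://example.com/#!/some/state', function (status) {
        if (status !== 'success') {
            phantom.exit(1);
        }
        // Crude wait for the Ajax-loaded content to render
        window.setTimeout(function () {
            console.log(page.content); // serialized DOM after scripts have run
            phantom.exit();
        }, 2000);
    });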

------
btipling
At Cloudkick we use the hash bang on our new overview. The hash bang URI
represents the application state or view. It is a UI tool that gives the
user an additional, browser-friendly means to navigate the application. The
information in the hash bang is also private to the account and irrelevant
to the public, which can't access the same information. The URI portion
without the hash bang is simply the application's address. The server
doesn't need to know about the application's UI; all it does is deliver the
application. Many sites make the server part of the user-facing application
with server-side templates and scripts, but we are moving away from that.
The UI will be in the client, and the server doesn't need to know about UI
state. If your server is part of the UI then don't use the hash bang
convention, as the server can't make sense of it.
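
A minimal sketch of that kind of hash-as-UI-state routing (showView and the
#! path scheme here are hypothetical, not Cloudkick's actual code):

    // Route purely on the fragment; the server only ever delivered the app shell.
    window.onhashchange = function () {
        // e.g. http://example.com/app#!/monitors/cpu -> "/monitors/cpu"
        var state = window.location.hash.replace(/^#!/, '') || '/overview';
        showView(state); // hypothetical function that swaps the visible view
    };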

~~~
lzm
I don't understand why Cloudkick uses a hashbang instead of just a hash. Is my
overview being indexed by Google?

~~~
btipling
Your overview isn't, but we've built this into our library of tools that we
use for history management, and maybe we will build something like our
provider portal that would be.

------
MatthewPhillips
It's part of two general trends that have been taking place recently.

1) A move away from server-side programming. The server is now used to serve
static files and JSON data, not to dynamically create pages. Application
logic is now done in JavaScript.

2) The emergence of high-level JavaScript frameworks that do most of the
grunt work. These frameworks are server-language agnostic and hence do all
of the application scripting in the browser. jQuery Mobile and Mobl come to
mind here.

------
davnola
Great article. Recommends building on conventional hash-less URIs and using
progressive enhancement (e.g. onclick handlers) to implement hash-bang URIs.

Three particularly interesting ideas:

* sites should support the _escaped_fragment_ query parameter, the result of which should be a redirection to the appropriate hash-less URI (see the sketch after this list)

* if you’re serving large amounts of document-oriented content through hash-bang URIs, consider swapping things around and having hashless URIs for the content that then transclude in the large headers, footers and side bars that form the static part of your site [can't envisage how this would work]

* "I suspect we might find a third class of service arise: trusted third-party proxies using headless browsers to construct static versions of pages without storing either data or application logic"

~~~
moe
If you're starting on a new app today then I'd suggest embracing the history
API instead of entering the world of pain that is hashbangs.

<http://caniuse.com/#search=history>
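
Roughly like this (a sketch; loadArticle is a hypothetical Ajax loader):

    // Progressive enhancement with pushState instead of #! fragments.
    document.addEventListener('click', function (e) {
        var link = e.target; // real code would walk up to the enclosing <a>
        if (link.tagName === 'A' && window.history && history.pushState) {
            e.preventDefault();
            history.pushState({ path: link.pathname }, '', link.pathname);
            loadArticle(link.pathname); // fetch and render the new content
        }
    });

    // Keep the back/forward buttons working
    window.addEventListener('popstate', function (e) {
        if (e.state && e.state.path) {
            loadArticle(e.state.path);
        }
    });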

Adoption will sky-rocket when Firefox 4.0 rolls out, as that is the last major
browser to not support it yet.

Oh yes, and then there's IE... This is where my suggestion becomes
opinionated: Screw IE and display a polite message to the remaining IE-
visitors.

~~~
Skalman
Well... I don't see how ignoring IE would change that much for those visitors,
if pushState is used only as progressive enhancement. I probably wouldn't
even display a message to them.

On the other hand, if dynamic changes are essential, we'll have to support
hashbangs too.

------
diamondsea
I think the whole method that these sites use to deliver content is completely
backwards, at least from the user perspective.

The goal of going to a web site is ultimately to get the content you are
looking for. If you're going to Twitter, it's to read someone's twitter
stream, etc.

The way these hashbang sites function is to display the least-relevant (to the
user) information first, all the chrome and ads of the page. Then, when all
the stuff you don't actually care about is done rendering, only then does it
go and get the content you came to see.

This drives me nuts as a user. It is particularly annoying on slower devices,
such as EDGE or 3G smartphones, where you see the lag even more strikingly.

From a user-experience, content-oriented standpoint, this entire architecture
should be reversed. The URI should point to the content, not the presentation
as it does with hashbangs. The content should be served first, in a readable
format, and then the JavaScript magic should kick in, download all the chrome
in the background, and update the page once with all the other elements.

If you're going to have your page do a secondary load anyway (as hashbang does
now) you might as well make the content the front-and-center part.

~~~
cabalamat
> _If you're going to Twitter, it's to read someone's twitter stream, etc._

You may be interested in the twitter-like website I'm currently developing.
It currently uses no JS at all (it will eventually, e.g. for writing new
messages, but the entire functionality will work without JS).

------
mckoss
I am hearing a lot of vitriol against the use of the hash URI (or hash-bang
URI) pattern. While I agree that it does make a slightly different set of
assumptions about the browser model, I don't think that application
developers are employing it just because it is a "cool new thing".

In many cases, the hash URI enables a new class of applications and system
architectures that were not previously possible on the web. Naked (hash-less)
URIs require that all state transitions round-trip to the server. This isn't
at all desirable if you want to support offline or high-latency (mobile)
clients.

I am in agreement that we want to retain the link structure of the web. But
we also don't want to freeze the application architecture of the web at 1997.
I think this post had some great recommendations about implementing a
hash-bang based site while still "playing nice" with a diversity of client
assumptions.

~~~
wvenable
> I don't think that application developers are employing it just because it
> is a "cool new thing"

Have you seen the Gawker Media redesign of Lifehacker, Gizmodo, etc.? They
appear to be using the hash-bang for all links for no good reason at all. So
yes, some big sites are employing it because it's a cool new thing.

~~~
mckoss
That may be. But it's pretty hard to ascribe _intention_ simply by looking at
their site. While what they are achieving _could_ be implemented w/o the use
of hash-bang URIs, it may be that they have reasons that are not readily
apparent.

Whether you agree with them or not, there was a lot of thought put into the
Gawker redesign:

<http://lifehacker.com/#!5701749>

I do note that they (sometimes) avoid redrawing the right-hand column when I
switch between Gizmodo articles, since they don't refresh the whole page. Even
though they still have dozens of server round trips for this kind of
transition, they seem to flow in very quickly and the page transitions are
actually quite smooth. This would not have been possible with a standard page
refresh (which would re-anchor the browser view to the top of page, regardless
of the user's current scroll state).

I'm not saying they are taking optimal advantage of the hash-bang pattern, but
it does allow them some user experience optimizations that they could not get
without it.

~~~
wvenable
The links could still be rendered as regular URLs, with onclick handlers used
to dynamically update the page and set the fragment for bookmarks.
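
Something along these lines (a sketch; loadPage is a hypothetical Ajax
loader):

    // Real hrefs in the markup, so crawlers and non-JS browsers get plain
    // pages; JS intercepts the click, loads the content with Ajax, and records
    // the state in the fragment so it's still bookmarkable.
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        links[i].onclick = function () {
            loadPage(this.pathname);              // hypothetical Ajax loader
            location.hash = '#!' + this.pathname; // bookmarkable state
            return false;                         // don't follow the link
        };
    }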

~~~
mckoss
Isn't that what they are doing?

~~~
wvenable
No, all links are hashbanged. It probably simplifies the implementation a bit
because they only need one method of loading pages (Ajax), but I don't think
it's particularly friendly.

~~~
mckoss
I think the two methods result in the exact same thing. Especially since the
crawler won't see any raw HTML pages anyway - all links are hashbanged, and
have to be crawled via the Google parameter rewriting scheme.

~~~
wvenable
Except non-JavaScript (or limited JS) browsers see nothing at all. Graceful
degradation in this situation isn't all that difficult, they just chose not to
do it.

------
svv
_Fragment identifiers are currently the only part of a URI that can be changed
without causing a browser to refresh the page_

This is actually the primary reason for most of the hash and hash-bang URIs
in today's JavaScript-heavy sites. Once all the popular browsers allow
changing the URI without reloading the page, the whole issue will become
irrelevant.
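
The contrast being drawn, in two lines:

    location.hash = '#!/photos/123';            // works everywhere, no reload
    history.pushState(null, '', '/photos/123'); // no reload either, but only in
                                                // browsers with the history API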

I am actually a big fan of HTTP and REST (the real one, with HATEOAS), but
browsers have a long history of making it hard to write RESTful web sites.
HTTP methods had long been restricted to GET/POST; code-on-demand support
before XHR had been very rudimentary (JavaScript was almost useless, and I
don't even want to mention Java applets); browser addressing conventions led
to horrible (from the REST point of view) practices like redirect-after-POST,
etc.

------
rwmj
Good article. Although instead of writing "A browser that doesn’t support
Javascript ..." (implying some failure or deficiency in the browser) he might
have written "For users who selectively disable Javascript ..."

~~~
snorkel
There are also millions of low-end mobile devices that can reach the web but
don't support JavaScript.

------
Sukotto
A lot of visually impaired people disable javascript too since it usually
doesn't play nice with the screen reader.

~~~
jonespen
According to this survey (<http://webaim.org/projects/screenreadersurvey3/>),
98.4% of respondents had JS enabled.

------
jdavid
There are a number of data points these days behind the social wall that a
crawler will never have access to.

------
amitraman1
Keep it simple.

Hopefully they did not use AJAX to serve static content.

