
Browser History (2013) - mattkahl
https://madhatted.com/2013/6/16/you-do-not-understand-browser-history
======
wtbob
I don't get what he doesn't understand: when the user goes back, the browser
is supposed to show the document he saw when he left, as it was when he left
it (so long as that view hasn't expired out of the browser's cache
completely). This is intuitively what a user expects, right?

> Note: if history list mechanisms unnecessarily prevent users from viewing
> stale resources, this will tend to force service authors to avoid using HTTP
> expiration controls and cache controls when they would otherwise like to.
> Service authors may consider it important that users not be presented with
> error messages or warning messages when they use navigation controls (such
> as BACK) to view previously fetched resources. Even though sometimes such
> resources ought not to be cached, or ought to expire quickly, user interface
> considerations may force service authors to resort to other means of
> preventing caching (e.g. “once-only” URLs) in order not to suffer the
> effects of improperly functioning history mechanisms.

I don't get what he fails to understand: if the back function obeyed
expiration headers, then going back could potentially reload the page,
potentially causing a re-GET or even a re-POST.

~~~
Noseshine
As I read and see this, it is not about normal HTML pages. The issue probably
is more important for SPAs.

For example, you post something that increases your total message counter, and
the URL is changed (single page app: the page is not reloaded, only modified
by JavaScript) to show the new message (think "forum"). If the user clicks
"Back" they may get the page state with the previous counter value, which is
confusing. Ideally you would want the counter to be updated.

One can construct similar scenarios for regular web apps/pages where a new
page is loaded, but I think the caching behavior is more out of line for SPAs.
It comes down to the web serving as app platform and not just as "HTML pages
for reading". You don't expect - _in an application_ (compare behavior of a
regular non-web application) - that when you go back the GUI may not reflect
current state, in an app it always does, even if you go back to a previous
form.

I think there will always be misunderstandings because of people who always
think of the original web, a bunch of linked pages, and others who see it as
(also) an application platform. The requirements needed for either often are
fundamentally or sometimes subtly different. Is it ideal that we have a
platform that is supposed to do such extremely different things all in one?
Maybe not, but I think we actually managed surprisingly well to reconcile two
very different concepts and requirements in the "web platform" thus far.

I think it would be valuable for discussion when authors a) state if they are
talking about one or the other kind of web platform use b) consider that the
other one that they are not talking about also exists and has equal value.

~~~
zeveb
I would expect that if one _must_ write a SPA then one will hijack the Back
button, no? So one could implement whatever logic makes sense within the SPA.

But really, one should avoid SPAs like the plague. Write web sites consisting
of web pages. Only if the site or pages need more should one add JavaScript,
and only if it's insufficient to purpose should one write an SPA (as an
example, I have a hard time envisioning Mattermost, IRC or Slack working as
web pages).

~~~
Noseshine

> one should avoid SPAs like the plague

Why? I don't see any basis for such a statement, but I'm more than willing to
learn.

~~~
zeveb
SPAs are basically an exploit, taking advantage of a dynamic document model
and a language intended to layer a little bit of behaviour atop that model,
and using it instead to deliver entire applications. It's a little like
noticing that sed is Turing complete, and thus writing a text editor in sed.

The Web is supposed to be about resources and links between them; that's what
REST is about (in a very real sense, REST is the driving principle behind the
Web). In pretty much every case, those resources can have an attractive,
human-readable HTML representation (but they may of course also have easily-
parsed JSON or high-performance Thrift, flatpack or protobuf representations).
One should be able to use a browser as one's user agent to use a web site
built on REST principles to perform any sequence of operations that web site
makes available.

Now, one might provide quite a lot of JavaScript over top of that REST
interface, in order to provide a more user-friendly experience (one could
imagine HN using JavaScript to allow inline commenting, while still preserving
the ability to POST). In general, one should provide the plain resource-
oriented interface first, and only add JavaScript later. For one thing, this
helps one think clearly about the API; for another, it's a lot easier to take
a clean REST API and use it from JavaScript than it is to take a purpose-built
SPA and try to turn it into a proper REST system.

Now, there do exist some systems which really, _really_ don't make a lot of
sense as primarily REST apps — in my post, I used the example of chat apps.
Certainly, they should _have_ REST APIs, but honestly I can't see most people
wanting to use them that way (although … it may be convenient in order to
avoid distraction). For that sort of app, it's conceivable that one might
properly begin with the SPA. Another example might be the 2048 game, or
similar things. As much as I'd prefer a native gtk+ game, it is true that most
people haven't yet upgraded to Linux; it's also true that as terrible as the
JavaScript privacy & security stories are, they are much better than the
native app privacy & security stories.

All those are good reasons to write an SPA.

But if one is building a brochure site, or a blog, or a magazine, or a trip
planner, or an e-commerce site, or pretty much anything that most of us use,
then starting with an SPA is _wrong_. All of those should be modeled as
resources and state first, with UI added later.

An example would be that I discovered it's impossible to checkout using
Target's website without JavaScript; I was just trying to send a gift
certificate to a friend, and now Target isn't seeing that money, due to a poor
design decision. There's absolutely no reason that Target should require me to
enable JavaScript in order to POST them a credit card number and a quantity.

~~~
Noseshine
You are not giving reasons against SPAs which is what you said earlier, you
are giving reasons for using the right tool for the job. Which nobody
disagreed with from the beginning, not me anyway.

~~~
zeveb
> You are not giving reasons against SPAs

I think being inefficient ('writing a text editor in sed') and impeding doing
things the right way ('it's a lot easier to take a clean REST API and use it
from JavaScript than it is to take a purpose-built SPA and try to turn it into
a proper REST system') are reasons not to write SPAs, no?

And the presence of reasons _to_ write proper HTML/HTTP apps implies why one
shouldn't write improper SPAs, no?

------
zhoujianfu
The way I've always wished the back button worked is as though I had just
opened a new tab (instead of clicking a link/submitting a form), and now I've
just closed that tab and once again see the original tab I'd left.

I assume this is actually a big reason people use (so many) tabs... the back
button doesn't work right!

Sigh!

~~~
KJP191
It used to work like that in an old version of Opera back in the day (perhaps
Opera 5.0? I think it was the same version that introduced gestures), but a
bunch of sites saw it as insecure and blocked Opera users from their sites so
they backed down and switched to the standard method.

------
jimjimjim
I generally want a back button that rewinds time (server side be damned).

Usually I'll right-mouse-button click + open in new tab on any link instead of
just clicking on it.

Eventually, like garbage collection, I'll stop, go back to the start of the
tabs and close a bunch of them, then go back to what I was looking at.

And if I'm on a site that doesn't allow right clicking on a link? well the
site had better be important otherwise it's gonna get closed.

~~~
daurnimator
Use middle click (or for those without a middle mouse button on a computer
they can't reconfigure, use ctrl/cmd + click)

~~~
Dylan16807
Sometimes middle click gets swallowed by bad code.

But more importantly, you can't use either method if the link is a javascript
function.

------
panic
You can use the pageshow event to detect when the browser navigates back in
history to a page in the back/forward cache:
[https://developer.mozilla.org/en-US/docs/Web/Events/pageshow](https://developer.mozilla.org/en-US/docs/Web/Events/pageshow)
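
A minimal TypeScript sketch of that check, assuming a hypothetical `onRestore` callback that re-fetches or re-renders whatever may be stale. The `persisted` flag on the `pageshow` event is the real signal; the `PageShowLike` and `PageShowTarget` interfaces are stand-ins invented here so the snippet type-checks without DOM typings.

```typescript
// Structural stand-ins (assumptions) for the browser's event and window types.
interface PageShowLike {
  persisted: boolean; // true when the page was restored from the bfcache
}

interface PageShowTarget {
  addEventListener(type: "pageshow", listener: (event: PageShowLike) => void): void;
}

// Registers a pageshow listener and calls `onRestore` only when the page
// came back from the back/forward cache rather than from a normal load.
// `onRestore` is a hypothetical hook supplied by the page.
function watchForBfcacheRestore(target: PageShowTarget, onRestore: () => void): void {
  target.addEventListener("pageshow", (event) => {
    if (event.persisted) {
      onRestore();
    }
  });
}
```

In a browser the target would be `window`, and `onRestore` might refresh a counter or re-run a data fetch.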

More generally, you shouldn't make a "single-page app" unless you really need
to. The web is designed for navigating between multiple documents. Browser
features like history and bookmarks will work better if you stick to the
standard behavior.

------
spotman
> Today, the only option for ensuring an XHR request is made when the user re-
> visits a page via the back button is to (1) add an unload handler then (2)
> use cache busting.

I'm sure there are exceptions, but in general, the last thing I want my browser
to do when I press back is to start making requests. I expect the requests to
have been already made.
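
For reference, the "cache busting" in the quote usually means making every request URL unique so that no cached response can match it. A rough sketch, assuming a made-up `_` parameter name and a timestamp token (neither comes from the article):

```typescript
// Appends a unique query parameter so the URL does not match a cached entry.
// The parameter name "_" and the default timestamp token are illustrative
// assumptions; two calls within the same millisecond would share a token.
function bustCache(url: string, token: string = String(Date.now())): string {
  // Keep any #fragment at the very end of the URL.
  const hashIndex = url.indexOf("#");
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const hash = hashIndex === -1 ? "" : url.slice(hashIndex);
  // Use "?" for the first parameter, "&" for any later one.
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  return base + separator + "_=" + encodeURIComponent(token) + hash;
}
```

Every call with a fresh token forces a real request, which is exactly the behavior being objected to here.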

~~~
jacquesc
The result of this thinking is when I press the back button, I have to refresh
the page manually because the data is stale.

My daily use case for this is Github Issues. Simple repro: Click an issue,
make a change, press back.

Now the list view is out of date and doesn't show my changes. So I have to
refresh.

I think a background XHR request is the best approach (until browsers fix this
issue for real). On page load: pull latest from server (if network is down,
then no-op). If no changes, then don't change. If there are changes then
replace inline without a new page reload.
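
The flow described above could look roughly like this. It is only a sketch: `Issue`, `planRefresh`, and the convention that `null` means "request failed" are illustrative assumptions, not how GitHub or any browser works.

```typescript
type Issue = { id: number; title: string };

type RefreshAction =
  | { kind: "keep" } // offline, or nothing changed: leave the page alone
  | { kind: "replace"; issues: Issue[] }; // swap the list in place

// Decides what to do after the background request made on page load.
// `latest` is null when the request failed (for example, network down).
function planRefresh(shown: Issue[], latest: Issue[] | null): RefreshAction {
  if (latest === null) {
    return { kind: "keep" };
  }
  // Cheap comparison for small plain-data lists; sensitive to key order.
  if (JSON.stringify(latest) === JSON.stringify(shown)) {
    return { kind: "keep" };
  }
  return { kind: "replace", issues: latest };
}
```

The caller would fetch the list, pass `null` on failure, and re-render only when the result is `replace`.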

~~~
daurnimator
No!

If I click back I want to see what I saw last time. e.g. I might have seen 3
news articles I was interested in, I click through and read the first one. When
I click back I want to see the exact same list as before, and be able to read
the next article I saw; not _new_ articles that have been written while I was
reading the last one.

You can s/news article/github issue/ and my response is the same.

------
mixonic
Author here, happy this popped up and to see the HN community thinking through
it. A few people have brushed off what I sketched as uninteresting and don't
see any issues. I'll try to explain it another way (with three years
reflection to help).

Single page applications are now quite popular. _Most single page apps use a
different definition of "back" than browsers do_, and there are times when the
two treatments conflict.

Many, or most, use a local in-memory database to keep track of information
without going to the server. They update that in-memory store as you make
changes. For example you see a list of names: Mary, Robert, John. You click
Robert and edit the name to "Rob", the name auto-saves. Then you click "back".

Because single-page apps control "back" when in the SPA, they do what most
developers want. They return to a semantically correct page, showing Mary, Rob
(just edited), John. Tons of apps do this. _This is not what the browser
does_. The browser, if following the "back" specs, would show the out-of-date
names of Mary, Robert, and John.
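
To make the two definitions of "back" concrete, here is a toy model. It is not a browser or any particular framework; `NameStore` and the variable names are invented for illustration.

```typescript
// Toy model: the SPA keeps one live in-memory store and re-renders from it,
// while a snapshot-style "back" shows the view exactly as it was left.
class NameStore {
  private names: string[];

  constructor(initial: string[]) {
    this.names = initial.slice();
  }

  // Replaces every occurrence of `from` with `to` (the "auto-save").
  rename(from: string, to: string): void {
    this.names = this.names.map((n) => (n === from ? to : n));
  }

  // Returns a copy, so callers hold a snapshot rather than live state.
  list(): string[] {
    return this.names.slice();
  }
}

const store = new NameStore(["Mary", "Robert", "John"]);
const snapshotOnLeave = store.list(); // what the list view showed before leaving it
store.rename("Robert", "Rob"); // edit on the detail view

const spaBack = store.list(); // SPA "back": render from the live store
const snapshotBack = snapshotOnLeave; // snapshot "back": restore the old view

console.log(spaBack.join(", ")); // Mary, Rob, John
console.log(snapshotBack.join(", ")); // Mary, Robert, John
```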

The theoretical conflict can also become practical. Think through this flow:

* Visit /names

* AJAX for GET /api/names

* See Mary, Robert, John

* Edit Robert's name to Bob, autosave

* AJAX for POST /api/name/4 with the new name Bob

* See Mary, Bob, John

* Click on a link, let's say to Mary's website URL

* Mary's website, new domain, loads.

* ...click back

The SPA loads up, and attempts to GET /api/names. However, the bfcache is at
play since the native "back" behavior is running. So the stale API response,
with the original names Mary, Robert, John is returned. _The list of names on
the screen is DIFFERENT than what the user saw after they edited_.

Additionally most SPA apps presume AJAX calls return accurate data, however
here the names are _not_ the names currently in the database. They are only in
the bfcache. You can imagine, with more complex data, ways this can cause
complex and unforeseen failures.

This is a very poorly understood corner of JavaScript development even today.

[edit]: formatting

~~~
Manishearth
> So the stale API response, with the original names Mary, Robert, John is
> returned.

This seems like a bug -- if you click back, it should take you to the page you
saw before, not an earlier version of it.

AIUI the bfcache doesn't "remember and replay" API requests; it just caches
the entire DOM and JS state.

Do you have something that can demonstrate this behavior?

> Because single-page apps control "back" when in the SPA, they do what most
> developers want. They return to a semantically correct page, showing Mary,
> Rob (just edited), John. Tons of apps do this. This is not what the browser
> does.

This is not necessarily what users want (as evidenced by discussions on this
post). Many people want the old page, especially if there's information there
(form fields, or other state) that might have gotten lost by a misclick. As
someone else noted here, the "back = reload page" behavior can be emulated in
the bfcache world by back+reload, but if you don't have a bfcache you can't
emulate the "don't lose state" behavior that the bfcache gets you.

It seems like a new meaning is being shoehorned into the "back" button, and
then you're complaining it doesn't work.

~~~
mixonic
Sorry :-( You are correct. What I described is not the behavior of the bfcache,
it is the network behavior described under the heading "In practice". And
there is a link there to a server that can help you play with the behavior.

Apologies for using the wrong term and causing confusion.

> This is not necessarily what users want (as evidenced by discussions on this
> post).

HN not being representative of an average user aside, I don't disagree. My
point is that there are two different expectations of what should happen and
they can conflict and cause errors.

> you're complaining it doesn't work

I'm really sad you got that impression. I'm fascinated and think this is an
architectural problem of the web. My post is an attempt to describe the issue
and raise awareness.

~~~
Manishearth
> it is the network behavior described under the heading "In practice". And
> there is a link there to a server that can help you play with the behavior.

Right, except requests aren't being made there, it's just that the devtools
seem to say that they are. The scenario you gave as an example (with the
/names API call and whatnot) isn't possible AFAICT. Maybe I misread something?

> My point is that there are two different expectations

Yeah, agreed, there are two expectations here.

APIs that let you explicitly invalidate bfcache entries (something on
pushState() maybe?) or detect bfcache loads would be interesting, and would
let SPAs deal with this problem, perhaps.

------
rosstex
On the desktop, I like having the option to see what a page just looked like,
without reloading any data.

If I hit the back button, it's usually because something on the previous page
caught my attention and I want to find it again. If I want to reload a page,
I'll refresh or click on the site logo (in the case of getting to the root of
the site).

~~~
fredleblanc
I hate when I go back and things change. I went back for a reason!

But I'm also of the school that wants View Source to view the source that was
just downloaded, not re-download a new copy of the source.

~~~
cmg
I've never understood why View Source would do anything but show me the source
of the page I'm looking at - not the source of the page as it is "now," which
could be seconds, minutes or hours later. If I'm looking at a page, the
browser has or could have a copy of the original stream sent from the server,
why not just display that?

~~~
userbinator
What browser(s) are you using that have such behaviour? Is this a new
"feature" in the very latest versions? I don't think I've ever seen View
Source make another request on Chrome, Firefox, IE, or Opera.

~~~
cmg
Chrome does.

"Yes, when you "view source", you're really opening a new tab that opens the
page again and displays the source rather than renders the page."

[https://bugs.chromium.org/p/chromium/issues/detail?id=4650#c...](https://bugs.chromium.org/p/chromium/issues/detail?id=4650#c2)

------
Manishearth
Note that the behavior described in the post is per spec -- the cache spec is
a red herring and doesn't apply here. The web specs override the HTTP spec in
various spaces.

The relevant spec is at
[https://html.spec.whatwg.org/multipage/browsers.html#the-ses...](https://html.spec.whatwg.org/multipage/browsers.html#the-session-history-of-browsing-contexts)

Note:

> An entry with persisted user state is one that also has user-agent defined
> state. This specification does not specify what kind of state can be stored.

> User agents may discard the Document objects of entries other than the
> current entry that are not referenced from any script, reloading the pages
> afresh when the user or script navigates back to such pages. This
> specification does not specify when user agents should discard Document
> objects and when they should cache them.

and from
[https://html.spec.whatwg.org/multipage/browsers.html#travers...](https://html.spec.whatwg.org/multipage/browsers.html#traverse-the-history)

> If entry no longer holds a Document object, then navigate the browsing
> context to entry's URL

and

> If entry has a different Document object than the current entry,

> ...

> Make entry's Document object the active document

\------

Browsers try to treat the back button as if the user had never left the page.
So the XHR requests aren't re-made because the page simply isn't reloaded,
it's just made active.
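
A toy model of the two traversal branches quoted above may help. `SessionEntry`, `traverseTo`, and the `load` callback are hypothetical names, and the string `document` field merely stands in for a retained Document object; this is a reading of the quoted text, not the spec's algorithm.

```typescript
interface SessionEntry {
  url: string;
  document: string | null; // null once the user agent has discarded it
}

type TraversalOutcome = "reactivated" | "navigated";

// If the entry still holds its document, it is simply made active again
// and nothing is fetched. Otherwise the URL is loaded afresh.
// `load` is a hypothetical stand-in for fetching and parsing the page.
function traverseTo(entry: SessionEntry, load: (url: string) => string): TraversalOutcome {
  if (entry.document === null) {
    entry.document = load(entry.url);
    return "navigated";
  }
  return "reactivated";
}
```

In the "reactivated" branch `load` is never called, which matches the observation that the page's XHR requests are not re-made.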

The fact that Chrome says "from cache" might be a bug here, but what the
devtools show isn't visible to JS/etc, so this isn't a compatibility issue.
AFAICT Chrome and Firefox (and presumably Safari) behave the same here, apart
from a difference in how the bfcache is invalidated (Chrome seems to
invalidate when the domain changes).

I'm not clear why all of this is a problem though. If the page is reloaded,
it's reloaded. If it's loaded from the bfcache, it's as if it was never
unloaded (almost the same as the user switching a tab and coming back, except
of course JS was suspended). Both behaviors seem ... fine for a webapp?

------
jlarocco
It's hard to take this seriously because the page itself does not work
correctly w.r.t. the back button.

For example, if I click through to the http 1.1 spec in the first paragraph,
then hit "Back" I see the scrollbar shrink as new content is loaded and
uBlock's block count increases as new content loads.

If I scroll to the bottom of the page, click a link, and then click back, I
don't even go back to the same spot - I'm at the end of the article and
content loads below it...

My expectation as a user, regardless of the spec, is that I should see exactly
what I saw when I was just on the page. Render to a bitmap, and when I click
"back" display the bitmap. If going back to the page requires any network
requests then the page is doing it wrong.

The exception that proves the rule would be streaming content.

------
gayprogrammer
This isn't an all-or-nothing issue. The author generalizes that developers and
users today use browsers/html differently than they did before, but what I'm
hearing is actually a new way to use the word/button for "(go) back".

The back button has not changed functionality--it still works as expected on
all non-webapp webpages.

From my perspective, I think the author should be asking for browsers to
implement a new button that follows his desired "load previous URL" behavior.

------
eridius
This article no longer reflects the behavior of Safari at least as of version
10 (the version that ships with macOS Sierra). In my tests, the second version
of the page that attempts to bust the bfcache behaves identically to the first
version, i.e. the bfcache is not in fact busted.

------
rob74
bendmebackmeanywayyouwantme? That rang a bell... but according to
[http://www.azlyrics.com/lyrics/garbage/ithinkimparanoid.html](http://www.azlyrics.com/lyrics/garbage/ithinkimparanoid.html)
it should actually be bendmebreakmeanywayyouneedme ;)

------
ashitlerferad
[2013]

