Ask HN: Why does 'View Source' issue a new HTTP request?
173 points by koolba on Dec 19, 2016 | 93 comments
I've noticed that both Firefox and Chrome issue a new HTTP request when you view the source for a web page that you've already loaded. It's particularly annoying when the page itself is slow to load or if it won't load at all.

Why is that? Wouldn't they have the existing source for the originally received page cached already? Is it based on Cache-Control headers?

This has been on my mind for a while (usually comes up when looking at what's behind slow web apps) and came up again with Piwik on the front page[1]. Their website was semi-down (HN hug of death) but eventually loaded. I wanted to see what their GA equivalent tracking code looks like but the page failed to load as rather than showing the cached copy, it tried to fetch a new one.

[1]: https://news.ycombinator.com/item?id=13210195




It's a bug: https://bugzilla.mozilla.org/show_bug.cgi?id=307089

Or, it's a memory-saving feature. To implement "View source from cache" requires keeping around the raw page HTML, which you might not otherwise need after parsing - except you probably will for all the developer tools to work, so this should probably just be considered a bug.


I wouldn't mind having to activate it in the developer tools and disable it when I don't need it anymore, if that makes a difference I can feel. Definitely easier than having to open a separate program like Fiddler.


How much memory is saved here? http://www.httparchive.org/interesting.php?a=All&l=Dec%202%2... 52 _kilobytes_? Really?


The browser would have to save the source text for EVERY tab it opens, regardless of whether you view source or not. That seems more onerous than making a separate request for the source code every once in a while.


Am I in some parallel universe here?

We are talking about hundreds of KBs of memory total, which is nothing, and if it's a worry for mobile devices we have the disk. It's not a feature that demands instant performance.


> It's not a feature that demands instant performance.

Hence it's acceptable to request the document from the server when opening 'View Source'.


No, because when you want to view the source you mean the source of this very document, not the source you'd get by asking to reload the page; the two are often different.


If you want to see the page as it's currently being rendered, looking in the developer tools seems more relevant anyway.


If they wanted that, they'd have looked at the developer tools to begin with.

Many of you folks are completely missing the point. The world wide web took off in large part because it was incredibly easy to learn HTML, because with every webpage if one wanted to know how it worked one could just look at the source code.

How the page is currently being rendered, what state the DOM might be in... These things do not matter to someone trying to view the source HTML for a page. They're looking to learn about the HTML. They're not going to get that by viewing pre-digested DOM information.


Or a website I was trying to fix: the source in view source and the stuff in dev tools never matched each other, even with JS disabled. In the end, after days, I gave up, and the page is in production, crazy bugs and all, because I can't figure out why on both Chrome and Firefox the webpage ends up with lots of random "strong" and "span" tags that don't exist anywhere in the original source. The tags aren't even closed properly: some are never closed, some are closed multiple times.


That's just it...

There are edge cases where a bug is intermittent... and is masked by something on the client side, especially possible with browser plugins. In fact, plugins were the cause of 2 of these for me... where some issue on page load was causing a bug, but then JS was changing the source away from what caused the bug, and a refresh wasn't guaranteed to have the same information (this was a fast-changing log viewer, for one of them)...

So you end up not being able to capture the init state of the page... but the bug wouldn't show up without JS enabled, because the error is in the JS...

Not a common case... but it seems like fetching from the server is MORE work for no reason when the data is already there...


No it's not, and it's particularly inappropriate for any document being viewed as a result of a POST request.


But who does this feature help exactly? A _tiny_ number of users. Chrome is currently using over 3GB on my machine. Anything they can do to trim that without affecting performance is worth it to me.


3GB is how much memory the _rendered/executed_ source code takes. The source code is nothing. Store it LZ4 compressed in memory and it's less than a rounding error.
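For a rough sense of scale (a sketch using Python's zlib as a stand-in, since LZ4 is not in the standard library; the page content here is made up):

```python
import zlib

# Roughly 100 KB of repetitive markup, standing in for a typical page.
html = '<div class="row"><span>item</span><a href="/x">link</a></div>\n' * 1700
raw = html.encode()

# Compress once at a moderate level, as a browser cache might.
packed = zlib.compress(raw, 6)
print(f"{len(raw)} bytes raw, {len(packed)} bytes compressed")
```

Markup this repetitive shrinks by orders of magnitude; even on less redundant real-world pages, compressed HTML is tiny next to the rendered page's memory footprint.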


Yes, I know that, but all of these little inefficiencies add up, and for what? A tiny portion of users who need the developer tools? Not worth it.

This kind of thinking is exactly what leads to our software being as slow or slower than that of two decades ago while running on machines hundreds or thousands of times more powerful.


Yes, but if you have more than enough memory, then that's no problem.

So, the browser should load the HTML page into a part of memory that can be discarded by the OS if the OS needs more memory. Actually I think it is strange that no such memory API exists in Unix.

EDIT: Anyway, storing it in the browser's file cache would also do the trick, I suppose :)


Isn't that one of the issues nowadays, that people have the attitude "we have enough memory" and thus don't tend to care about optimizing for memory usage anymore.

Your suggested solution, about discarding when memory is needed, solves this problem. But it solves a problem that would be created by needlessly storing a lot more data than needed. So it's a solution to a problem that was not a problem to begin with.

But I know, if it's a few kb, it will not make a huge difference.


No, this is really not a case of thinking we have enough memory. This is the realization that the static source code of a page is a rounding error compared to the memory overhead of a rendered page. Modern browsers are memory hogs; I would rather they focus on that.

It's also functionally broken, since I asked for the source of the current page, not of a reload.


100kb * 100 tabs is 10MB. I'll take my chances.


So hide it behind a feature flag?


> "we have enough memory"

Your phone is in disagreement.


We're talking about a default feature which is disabled 'for average users'. Average users only have a few tabs open in their browser, not 100. Average users also don't have 3 GB RAM, nor do they use view source on a mobile phone, and neither do average users actually use the feature. So the default setting makes sense, even though a user who is using the feature may end up using several MB due to a reload (not cool on a metered plan).

If you are saving 57 kB per tab open, that'd be ~5.7 MB with 100 tabs open. But if you have 100 tabs open on a mobile phone (!!), you have a bigger problem, and all those tabs are causing swapping already anyway. In that sense, enabling the feature by default makes sense. And don't forget that some people don't have flat-rate internet.


My phone has a quite insane 3 GB of RAM.


My phone also has the insane 3 GB of RAM, but last week Chrome killed Spotify and vice versa (music stopped playing) as soon as I switched apps. Sadly, 3 GB is not enough for Android and today's apps.


The most common $150 phone from last year only has 1GB, half of which is reserved for the OS.


Similar "tab discarding" feature already exists in Chrome, by the way:

https://developers.google.com/web/updates/2015/09/tab-discar...


> So, the browser should load the HTML page into a part of memory that can be discarded by the OS if the OS needs more memory. Actually I think it is strange that no such memory API exists in Unix.

It is trickier than you would think to determine what "needed" memory means. Does the OS need the disk cache? Or the content of memory-mapped files?

It's not unlikely that you currently have some process active which has memory mapped a huge file. Does it need the content? Who knows.


You are talking about an OS-provided cache for the browser to store its own cache in? There's little sense in that.


Moving memory management into the OS often makes sense, because the OS has the big cross-application picture, knows the system-wide memory pressure, and most importantly, already manages the memory of applications, by swapping between RAM and disk.

For the same reason, OS X and Android Linux both have systems for OS managed caches, and AFAIK Firefox already uses these: https://bugzilla.mozilla.org/show_bug.cgi?id=748598

The status on mainline Linux is a bit more nebulous (it seems Android's ashmem has been upstreamed, but it's not directly usable on GNU/Linux systems?), and other efforts have stalled: https://lwn.net/Articles/602650/

For some more thoughts about memory management on OS level vs. application level, I can recommend this "random outburst" from the designer of the Varnish HTTP cache: https://www.varnish-cache.org/docs/trunk/phk/notes.html


But the amount of memory required for plain text source is very small compared to the memory allocated to render the page.


What everyone is missing here is that one in ten thousand users needs this, so why optimize for such a tiny minority?


> requires keeping around the raw page HTML

Isn't it already in the browser's cache in that state anyway?

And if the cache is small, the preservation doesn't even have to be "for all tabs": if the last few pages can be retrieved from the cache, nobody would complain that older ones behave as they do now. You typically don't wonder "what was the source of the page from yesterday" in some old tab, and even if you do, you wouldn't be surprised that yesterday's source has to be requested again. So I imagine the fix as just: if it's in the cache, get it from there, else request it.
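A sketch of that cache-or-fetch idea, with a hypothetical fetch callback standing in for the network fallback and a small bound on how many raw pages are kept:

```python
from collections import OrderedDict

class SourceCache:
    """Keep the raw HTML of the last few pages; fall back to the network."""

    def __init__(self, max_pages=5, fetch=None):
        self.pages = OrderedDict()          # url -> raw HTML, newest last
        self.max_pages = max_pages
        self.fetch = fetch                  # network fallback (hypothetical)

    def store(self, url, raw_html):
        self.pages[url] = raw_html
        self.pages.move_to_end(url)
        while len(self.pages) > self.max_pages:
            self.pages.popitem(last=False)  # evict the oldest page

    def view_source(self, url):
        if url in self.pages:               # "if in the cache, get it from there"
            return self.pages[url]
        return self.fetch(url)              # "...else request"
```

The point is only that the eviction bound caps memory, so "view source without a refetch" works for recent tabs without keeping every page's source forever.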


> Isn't it already in the browser's cache in that state anyway?

Like disk cache as in Cache-Control? In most cases you wouldn't cache the HTML itself, but in cases where you do then your use case should already work as stated, since for the browser to do otherwise would imply the cache is being intentionally ignored for the view-source request.


> In most cases you wouldn't cache the HTML itself

Why not, at least as long as that's the topmost tab? Wouldn't then the view-source-new-request problem be solved by just using the existing features?


With devtools closed it seems not worthwhile, then. The vast majority of users aren't using devtools.


Why keep the original source in memory? Why not save it to disk?


Maybe because it would prematurely age SSDs?


Like anybody gives a damn about that. Spotify for sure doesn't.


Why would you need the original HTML for dev tools to work? Maybe there are ones I've never seen but the ones I use are using the DOM rather than the original HTML string.


You don't typically.

One case where you would View Source in addition to using Dev Tools is debugging how the browser is massaging your source HTML into the DOM, for example by inserting missing <tbody> elements. Validating your HTML mostly addresses this (I'm actually not sure about the <tbody> example, I would hope the validator at least issues a warning), but isn't something you necessarily want or can do with, for example, user-uploaded HTML snippets.


True, though I guess using some HTML validation could improve that. Not sure if <tbody> is mandatory but I think I've seen some tool complain about its absence.


I showed a few people who have no idea about programming whatsoever how to hack websites to download .mp4 or .mp3 files from the source code of the website. How would you show them that using the DOM?


How would you show them that using the HTML source?

You can use dev tools to view the network tab to find whatever URL Pandora is using.

You can use dev tools to find a media element's src attribute.

To see the JavaScript sources... You check the dev tools.

If anything, in the days of JavaScript, the HTML source will be missing a few things.


If there's an audio tag, that element exists in the DOM and has a reference to the url where you can download it. Is this a trick question?


Yes, my mother doesn't know what DOM is or why there are so many buttons in devtools. ctrl+u -> ctrl+f -> mp3 is much easier to explain and repeat.


Right but then you're talking about viewing the HTML source right? In other words "View Source." I'm asking why dev tools would need access to the original source and not the DOM. It doesn't make sense to save the original source in memory to me once it's been parsed.


ctrl+shift+i, ctrl+f "mp4"


Wow first raised in 2005. Nice!


Couldn't you reconstruct the source from the internal dom representation anyway?


No. The parser corrects and changes the source to make the DOM.

So it is usually impossible to find errors. Things like:

    <p><div> </p></div>
the parser will correct, and therefore the DOM will be correct. But what it does to fix this may break your site. I often found the CSS would be screwed up for many reasons (rules didn't match the DOM structure -- due to bugs all over our codebase).
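The mismatch is easy to demonstrate with a tokenizer that, unlike a browser, repairs nothing (a sketch with Python's html.parser; TagLogger is a made-up name):

```python
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Record the raw open/close tag events exactly as they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

p = TagLogger()
p.feed("<p><div> </p></div>")
print(p.events)
# -> [('open', 'p'), ('open', 'div'), ('close', 'p'), ('close', 'div')]
```

The raw token stream keeps the bad nesting, while a browser's tree builder would close the p element early and rebuild the tree, so serializing the DOM can never recover this original source.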


Also, changes made by client-side JS.


oh right.


I wish View Source were more flexible. I would like:

* Raw: a view of the actual body as sent in the response. Although I'm not aware of the current situation, at least some browsers used to subtly alter what was sent, even for View Source (possibly related to validity corrections). I want a guarantee that what I'm viewing is what the server sent.

* View source as it is today (with a good common understanding of what that means), but a bit more powerful. Give me a cursor so I can copy from the keyboard, for crying out loud! Maybe even let me edit the source so I can work with static pages more easily.

* Something in-between View Source and DOM inspector. E.g. the original source, guaranteed to be untouched by JavaScript, but cleaned up for easier reading, given that the source returned by many websites nowadays is practically unreadable (take a look at this page, for example). Reformatting (where possible), maybe automatic expansion of any base href, consistent ordering of attributes, highlighting of errors, etc.
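The reformatting part of that third mode could be approximated with a trivial re-indenter over the raw source. A sketch using Python's html.parser (the void-element list is abbreviated and there's no error highlighting; Reindenter is a made-up name):

```python
from html.parser import HTMLParser

VOID = {"br", "img", "meta", "link", "input", "hr"}  # abbreviated list

class Reindenter(HTMLParser):
    """Re-emit raw HTML one tag per line, indented by nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        a = "".join(f' {k}="{v}"' for k, v in attrs)
        self.lines.append("  " * self.depth + f"<{tag}{a}>")
        if tag not in VOID:                 # void elements never nest
            self.depth += 1

    def handle_endtag(self, tag):
        self.depth = max(0, self.depth - 1)
        self.lines.append("  " * self.depth + f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.lines.append("  " * self.depth + text)

r = Reindenter()
r.feed("<div><p>Hello <b>world</b></p></div>")
print("\n".join(r.lines))
```

Because it works off the raw token stream rather than the DOM, unclosed or mis-nested tags show up as lopsided indentation instead of being silently repaired.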


To prevent a new browser request and to get a formatted version of the raw source in Chrome, you can open up the dev tools, go to sources, find the name of the page, and then click the curly brackets in the bottom left corner of the tools. It's still missing a good cursor implementation, though.


Hit F7 for caret browsing.


Isn't a debugging proxy (like Fiddler) a good choice for the first of those?


Dev tools in most browsers does the first as does Fiddler as mentioned in a sibling comment.


You can always use curl. Chrome allows you to copy paste a full curl command (with all the arguments and cookies) to issue an identical http request.


Identical HTTP requests don't necessarily return identical data.


Provided the content hasn't changed. Sometimes you have a bug that goes away when you refresh the page.


Once the DOM is ready, the browser has no need to keep the original source in memory or cache. For how many page views does the user want to view source? One in a thousand? You're asking the browser to waste space storing something that it will only rarely be asked to display.


It could be garbage collected when the user closes the tab.


They (well, Chrome at least) even do that for a POST, which is highly questionable.


I usually have Fiddler running constantly in the background.

If a page is slow to load and I want to see the HTML, I would find the actual HTTP request in fiddler and grab the HTML that way.

Alternatively, the element inspector doesn't re-issue the request, but it will show the compiled HTML, i.e. after DOM augmentation etc.


Is there anything comparable to Fiddler for macOS? I heard you can run it in VM but that's just too much brute force for me.



Fiddler is available for macOS.


Interesting. I had no idea they had a beta version for macOS. I've gotta try it.


From what I can tell the Telerik version is on mac also


Burpsuite has interception proxy and runs on Java.


Maybe the HTML source is discarded after the browser has built the DOM object tree?


I think it may just be that, because in both cases it opens in a new tab with the view-source: prefix, the browser treats it like you just opened a copy of the page in a new tab?

I believe if you open the inspector instead it does not issue a new request.


> I think it may just be that, because in both cases it opens in a new tab with the view-source: prefix, the browser treats it like you just opened a copy of the page in a new tab?

Interesting. I never paid attention to the resource prefix. Are those standardized at all? Both Firefox and Chrome use the same "view-source:" prefix.

> I believe if you open the inspector instead it does not issue a new request.

The inspector shows the current DOM, not the originally loaded HTML, which could be different. It's probably "good enough", as in most cases the original HTML would be the simpler one (say, a single div for a single-page app) and the live DOM would show what it looks like right now.


It's not the prefix/scheme that's important, it's the fact that it's a new tab.


I seem to recall that it wasn't always the case.


True, and I was shocked that the referenced Mozilla bug[1] is from 2005. Time really flies, because it feels like yesterday I could view source without re-loading it.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=307089


Sweet memories:

http://imgur.com/a/p02cQ


Second that. That wasn't always the default behaviour.


Perhaps pre-DOM, per other comments here?


I once wondered this, but didn't think much of it. Now seeing the question, it seems quite clear. The DOM+CSS+javascript is pretty much like a computer running a (self-modifying) program. If it didn't do a fresh fetch, then what you'd be seeing is the contents of 'memory' after the program has been running some time. This is useful in itself, so we have Inspect. If you want to see the initial state, the program before execution begins, then you want to View its source(s).


I'm not exactly sure why browsers do that, but I suspect that's due to caching.

However, I would suggest that you use "Inspect element" and open up the dev tools - this way you will see the DOM exactly as it is rendered.


Often the DOM has been changed, and you want to know what the server actually sent.


You can still do that with the inspect menu; if you look at the network history in Chrome, you can see the headers sent, the headers received and the body sent / received.


> You can still do that with the inspect menu; if you look at the network history in Chrome, you can see the headers sent, the headers received and the body sent / received.

That only works if you have it open before the page loaded. It doesn't save information about network requests that occurred before the inspector was opened.


It would be handy if it did, though, since I continually forget to open the inspector before the request.

I'd love a "I'm a developer, store all the things!" setting


> I suspect that's due to caching.

Shouldn't caching have the opposite effect? I.e., it's treated as a new resource, but no new HTTP request is needed because it's in the cache.


In Opera up to 12.x it does not.


I suppose in today's web ecosystem, the initial HTML state of the page is useless in almost all cases after the DOM loads. Rarely will you care about the pre-javascript page, and if you do, you're probably debugging a web site, and the extra load shouldn't matter.

Where it could get interesting though, is if the content changes before you view the source. In that case, you're out of luck I guess.


I wouldn't think view-source is a highly used feature that benefits from much optimising, and your scenario sounds rare. If a "view-source:..." URL is entered into the browser directly, the browser needs code to grab the page from scratch anyway, so always doing this makes the logic simpler.


Yeah, that always annoyed me. So far, I don't see any answer on this page. Really curious, though.


The html returned from the already performed HTTP action has been consumed, rendering the view in the browser. If you want to view the raw HTML instead, 'View Source' requests another copy of it, but does not render it.


Same with save page, etc.


Bug? Wow



