The Marvellous Suspender on Chrome and Auto Tab Discard on Firefox will let you ...

EGreg · on March 21, 2021

Here is a counterpoint, and I encourage anyone here to tell me where I am wrong:

Why have tabs at all? Are you really needing to save the state of the vast majority of documents and their JS? The suspender says no.

Why not consider all windows in which you aren’t typing a document to have a very small state, such as scroll position? And even the ones where you ARE typing a document can save the form fields in an encrypted file.

No, what you basically are saving is the already loaded DOM. And what if browsers took a radical approach to it, as they are doing to third-party cookies, and... removed everything except maybe the latest 10 documents?

Yes, the 10 most recently accessed documents would actually be in buffers. The rest would be UNLOADED and browsers would save the state of their textboxes or scrolling, and restore it once the “same” elements appeared. But mostly they’d enable this new API to save state on beforeunload and restore it, and that’s it. It’s not even a new API; you should already be playing nice by using this event and not storing some crazy state. Sure, infinite scrolling thingies would be broken, and the caches of many images would be purged, but so what. Users can MANUALLY mark sites where they really NEED the caches to grow so large.
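A rough sketch of that policy, in case it helps the discussion: keep the N most recently accessed documents loaded, unload the rest down to a tiny saved state, and let the user pin sites that must stay loaded. The names here (`TabPool`, `SavedState`, `pin`) are invented for illustration and do not correspond to any real browser API.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class SavedState:
    """The small state kept for a document (hypothetical shape)."""
    scroll_y: int = 0
    form_fields: dict = field(default_factory=dict)


class TabPool:
    """Keeps the most recently accessed documents loaded; unloads the rest."""

    def __init__(self, max_loaded=10):
        self.max_loaded = max_loaded
        self.loaded = OrderedDict()  # url -> SavedState, least recent first
        self.unloaded = {}           # url -> SavedState kept after unloading
        self.pinned = set()          # sites the user manually marked

    def pin(self, url):
        """Mark a site as one that should never be unloaded."""
        self.pinned.add(url)

    def access(self, url):
        """Load (or re-load) a document and return its restored state."""
        if url in self.loaded:
            self.loaded.move_to_end(url)
        else:
            # Restore the saved state if there is one, else start fresh.
            self.loaded[url] = self.unloaded.pop(url, SavedState())
        self._evict()
        return self.loaded[url]

    def _evict(self):
        # Pinned documents do not count toward the limit in this sketch.
        unpinned = [u for u in self.loaded if u not in self.pinned]
        excess = len(unpinned) - self.max_loaded
        for url in unpinned[:max(excess, 0)]:
            self.unloaded[url] = self.loaded.pop(url)
```

With `max_loaded=2`, accessing "a", "b", "c" in turn unloads "a"; accessing "a" again brings back its saved scroll position and unloads "b" instead.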

Instead, index the text on ALL sites and give the user a way to search the titles and bodies of every site in their history, as easily as they search Google.
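For a sense of how little machinery the core of this needs, here is a minimal inverted-index sketch over page titles and bodies. `HistoryIndex` and its methods are hypothetical names, and a real version would need ranking, stemming, persistence and removal of stale entries, none of which is shown.

```python
import re
from collections import defaultdict


class HistoryIndex:
    """Toy full-text index over visited pages (illustrative only)."""

    def __init__(self):
        self.pages = {}                   # url -> title
        self.postings = defaultdict(set)  # word -> set of urls containing it

    @staticmethod
    def _tokens(text):
        # Lowercase alphanumeric runs; good enough for a sketch.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_page(self, url, title, body):
        """Index the title and body text of a visited page."""
        self.pages[url] = title
        for word in self._tokens(title + " " + body):
            self.postings[word].add(url)

    def search(self, query):
        """Return the sorted URLs of pages containing every query word."""
        words = self._tokens(query)
        if not words:
            return []
        # .get avoids creating empty postings for unknown words.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```

Calling `add_page(url, title, body)` on every page load and `search("some words")` from the address bar would be the whole user-facing surface.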

Every time the user opens a new tab, what they’re really saying is “bookmark this current site”. But why should they even have to make those decisions to bookmark? You should be storing their history locally (and making it searchable and making encrypted backups of it across all their browser sessions on all their personal devices).

That’s what the user really WANTS to do. It’s the same idea as “gmail search” had when Google first launched GMail versus ordering all your mail in hierarchical folders. Think about it!

rakoo · on March 21, 2021

I've been dreaming of a system like this myself. Every time a page is loaded it would be written on-disk in a format that the browser can easily re-render, and can be nicely displayed by any third-party app. So you'd essentially have a local save of each and every single page you've ever visited; no more "this worked when I last visited it" because the browser could switch to this backup in case upstream is down. Also when you close a tab or switch to another webpage you don't really "close" the website, you just put it back to storage so it can be loaded at a later time. It seems to me using a browser should be closer to using a text editor, where you have one resource you're interacting with at the moment and others are in the background ready to replace the current one, but in a manner that loading them is a benign operation.

Tabs, as you say, essentially mean "I want to keep this page in an easy-to-reach place". If you look at gmail in comparison, it's exactly the same as keeping emails in the Inbox: they are important (for varying values of important) and there is something to do regarding them, so keep them there. It's not exactly the same as starring messages because starring is opt-in (needs manual action to mark importance) while the Inbox is opt-out (needs manual action to unmark importance). It seems many people have been using tabs this way and close them when work is done so we can transcribe GTD to browsers: when a tab is open it means I need to do something with it, when I'm done close it. Regarding the previous paragraph it shouldn't be harder to load a file from upstream than from a tab.

For this to work there needs to be a very strong search and history side. For me the best representation of browsing is a directed graph: nodes are websites and edges are clicks with a timestamp. There can be multiple edges between the same 2 nodes, if I clicked multiple times. The problem is such a graph is not only hard to represent efficiently, it's even harder to use it to search in history. But the good side is that as long as this data structure is used, you can represent any history (flat, threaded, ...) as you want.
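Something like the following is what I have in mind for the data structure, as a bare-bones sketch: a directed multigraph where each click is a timestamped edge, from which a flat history can be derived. `BrowsingGraph`, `Click` and the method names are made up for illustration, and efficient storage and search are exactly the hard parts this does not address.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """One edge: a navigation from source to target at a given time."""
    source: str
    target: str
    timestamp: float


class BrowsingGraph:
    """Directed multigraph of pages (nodes) and clicks (edges)."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # source url -> list of Click

    def record_click(self, source, target, timestamp):
        # Repeated clicks between the same two pages add parallel edges.
        self.out_edges[source].append(Click(source, target, timestamp))

    def clicks_between(self, source, target):
        """All clicks from source to target, in recording order."""
        return [c for c in self.out_edges.get(source, []) if c.target == target]

    def flat_history(self):
        """Flat view: every click, ordered by timestamp."""
        clicks = [c for edges in self.out_edges.values() for c in edges]
        return sorted(clicks, key=lambda c: c.timestamp)

    def followed_from(self, source):
        """Threaded view, one level deep: pages opened from source, by time."""
        edges = sorted(self.out_edges.get(source, []), key=lambda c: c.timestamp)
        return [c.target for c in edges]
```

The flat list is just a sort over all edges; a threaded view would walk `followed_from` recursively from a chosen root, with some care taken around cycles.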

exikyut · on March 21, 2021

(See my sibling comment)

I agree with the comparison of inbox-vs-tabs. The people out there that have 30,000 emails in their inboxes... those are the people that lose tabs because their computer forces them to close them.

Hrm, a directed graph. Interesting! I think that's how vim stores edit history?

exikyut · on March 21, 2021

I won't say you're technically or foundationally wrong, but I will say the view presented seems to be looking at solving the problem from the bottom up ("green-field the current implementation, identify the simplest possible alternative architecture that's just complicated enough to solve the majority of use-cases, and let the long tail fix itself"), instead of looking at things from the top down and making tiny/trivial incremental permutations to the bigger picture. Let me explain this counter-counter-argument.

> Why have tabs at all? Are you really needing to save the state of the vast majority of documents and their JS? The suspender says no.

Tumblr, Pinterest and other websites that use infinite scrolling say yes. These sites are large and nontrivial. Tumblr is a major social networking platform. Pinterest is... Uber for Google Image search, or something. Both are, it would seem, not going anywhere. Pinterest's client UI is a labyrinthian mess. Tumblr is more manageable. But both use a rummage-around-in-dev-urandom approach to feed delivery; no page load delivers the same content twice.

I have been bitten by this enough that I actually gave up on the sites a few months ago. Well, the app, actually. I'd load a particular image, get lost in the "related" section (this would be especially problematic for industrial-design collections...), find something related that would catch my eye, my finger would tap the wrong image, I'd go back, and... it's gone. The layout's using a different seed now, the images are 98% the same but the one I was specifically looking for is now no longer in the results. This would happen with alarming regularity: I'd mis-tap something like 60% of the time I was browsing, and about 80% of those mis-taps would end this way.

This is what got me so wound up about Chrome not having tab serialization in the first place (my other comment and the links it points to have some further frustration about tab suspension).

> Why not consider all windows in which you aren’t typing a document to have a very small state, such as scroll position? And even the ones where you ARE typing a document can save the form fields in an encrypted file.

> No, what you basically are saving is the already loaded DOM.

Zooming out somewhat, browsers are not just a DOM, and textboxes and scroll position are not the only bits of state in a page. To be pedantic, there's the DOM, but there's also the CSSOM (the CSS object model) which is built from all the CSS files, JS-injected style tags, and manually-applied JS .style.<blah> manipulations; and over in JS land Service Workers now mean pages are running multiple virtual threads of execution at the same time (not actually sure whether these map to OS threads), and WebAssembly bolts an entire new world onto the end of the JavaScript runtime too.

When I look at the web I don't see a single web browser running some specific set of "pet applications", if I can word it that way; rather, I envisage the terabytes (eep) of JavaScript code keeping the world turning around every day as the focus, and that the web runtime is kind of at the mercy of keeping that eye-wateringly head-spinningly large installed base in the air, while moving the platform forward and making substantive progress noises. This point of view is my only explanation for why things often feel so irritatingly stagnant.

--

> And what if browsers took a radical approach to it, as they are doing to third-party cookies, and... removed everything except maybe the latest 10 documents?

Well, IT support teams around the world would need to add staff to deal with the exponential increase in complaints.

> Yes, the 10 most recently accessed documents would actually be in buffers. The rest would be UNLOADED and browsers would save the state of their textboxes or scrolling, and restore it once the “same” elements appeared. But mostly they’d enable this new API to save state on beforeunload and restore it, and that’s it. It’s not even a new API; you should already be playing nice by using this event and not storing some crazy state. Sure, infinite scrolling thingies would be broken, and the caches of many images would be purged, but so what. Users can MANUALLY mark sites where they really NEED the caches to grow so large.

Google tried almost exactly this with tab discarding a few months ago, working in exactly the way you describe.

It blew up the entire world's workflow, and they had to back it out. :(

--

> Instead, index the text on ALL sites and give the user a way to search the titles and bodies of every site in their history, as easily as they search Google.

[SE YES PLEASE YES PLEASE YES PLEASE YES PLEASE YES PLEA]

I've wanted this FOR SEVERAL UMPTY MILLION YEARS HEY GOOGLE WHY DON'T YOU ACTUALLY USE YOUR 100-EXABYTE OR WHATEVER IT IS BIGTABLE FOR SOMETHING ACTUALLY USEful okay sorry I'll stop with the shouting but serIOUSly this would honestly fix 100% of 100% of my problems (yes, 100% of 100% of my problems) with short term memory loss and trying to remember things online and... [sad violin noises]

In all seriousness, my guess is regulatory restriction. The Wayback Machine is this obscure little dorky project in the corner because it can't be anything else.

1. Malicious user creates Google account

2. User uses newly created account to search for contentious $thing, saves $thing (or maybe it auto-saves), then signs out of account and never uses it again

3. Time passes

4. User re-logs back in and re-views $thing from Google's cache, creating <legal/sociopolitical/military/etc> $problem. Fireworks ensue.

Open challenge: solve the general use case of the external brain (searching the history of pages that I've viewed), _while_ not invoking the above problem.

I don't believe this can be done :'(

--

> Every time the user opens a new tab, what they’re really saying is “bookmark this current site”.

Not quite. Not always.

I can actually say this with some authority: the moment I learned that history is volatile (Chrome caps it at 3 months IIRC) I immediately began using bookmarks as nonvolatile history, "just in case".

About 10 years ago.

I have ~30,000 bookmarks. They're all in Other bookmarks, because Chrome doesn't offer a tagging system that will sync with Chrome on Android.

I have accessed about 3 of them; the other ~5000 times I needed a bookmark I was unable to brave the Tide Of Bookmarks.

:'(

--

> But why should they even have to make those decisions to bookmark? You should be storing their history locally (and making it searchable and making encrypted backups of it across all their browser sessions on all their personal devices).

Yes please. (Imagine that all-caps scroller again)

> That’s what the user really WANTS to do.

Yes it is!

> It’s the same idea as “gmail search” had when Google first launched GMail versus ordering all your mail in hierarchical folders. Think about it!

I don't need to :P

I've been thinking about this for a while now myself. All the existing solutions out there seem to revolve around snapshotting the DOM, or storing the exact requests then replaying them, etc etc. None treat the web as the black box it is.

My alternative idea would unfortunately require participation at the renderer level but would scale to all current and future apps: save the render display lists instead.

In (recent) devtools, in the 3-dot menu at the top-right, select More tools > Layers. Click into the image you see, then click the Paint profiler link that appears. I argue, save _that_. It's the set of Skia operations that drew the page.

My arguments for why this is a good idea:

1. It doesn't perfectly save the entire page, but it *does* mathematically-perfectly save the parts that have been rendered, and which you have read. If you want to save the whole thing you'll need another imperfect solution. But if you just want to remember what you have read, this will *always* work, regardless of future development.

2. This scheme works with infinite-scrolling systems, _and_ with annoying websites that arrange bits of text into overflow:scroll divs that don't scroll the entire page, and which completely foil those page-screenshot extensions that scroll the page in chunks. (Sidenote: those extensions are actually the only correct way to snapshot websites currently; if you hit CTRL+SHIFT+P in the devtools and select "full page screenshot", you'll often crash the renderer for the tab if the page's full height is over 10,000 pixels, _especially_ on sites that use synthetic/virtualized DOMs for giant listviews and such.)
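To make the idea a bit more concrete, here is a toy sketch of recording, saving and reloading a display list. The op names (`draw_text`, `draw_rect`) and the `PaintOp` and `DisplayListRecorder` classes are invented for illustration; they are not Skia's or Chrome's actual API, and a real implementation would have to live inside the renderer.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PaintOp:
    """One recorded drawing operation (hypothetical shape)."""
    kind: str   # made-up op name, e.g. "draw_text" or "draw_rect"
    x: int
    y: int
    args: dict  # op-specific parameters, e.g. {"text": "..."}


class DisplayListRecorder:
    """Collects the paint ops for whatever was actually rendered."""

    def __init__(self):
        self.ops = []

    def record(self, kind, x, y, **args):
        self.ops.append(PaintOp(kind, x, y, args))

    def save(self):
        """Serialize the recorded ops to a JSON string."""
        return json.dumps([asdict(op) for op in self.ops])

    @staticmethod
    def load(data):
        """Rebuild a recorder from a string produced by save()."""
        recorder = DisplayListRecorder()
        recorder.ops = [PaintOp(**d) for d in json.loads(data)]
        return recorder

    def text_seen(self):
        """Text that was actually painted, in paint order."""
        return [op.args["text"] for op in self.ops if op.kind == "draw_text"]
```

Only what was painted gets stored, so the saved list covers exactly the parts of the page that were on screen, and `text_seen()` hints at how such a list could also feed a history search.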

Unfortunately, I don't really have the resources to implement this right now, nor (depressingly) sufficient knowledge of C++ either. Sigh.

I would definitely definitely like this to be simpler...

exikyut · on March 21, 2021

Okay, 8000 is admirable and scary. :D

I used TGS a few years ago, back when I was still limping along on a 32-bit machine. Generally either the browser process would hit ~3GB VIRT and very abruptly terminate, or (back when Chrome would lump all of the open tabs owned by an extension into a single renderer) the renderer would simply thrash so much (because suspending the current tab, or switching between suspended tabs, invoked the mostly-swapped-to-disk renderer process) the browser would effectively become unusable, eg, 30 second stalls switching between tabs or 2 minute stalls opening new tabs.

Chrome's built-in tab discarding actually closes the renderer process, which solves all of those design fails, but then there's the browser process to contend with; a few days ago the browser process was basically sitting on about 2.5-3GB RAM. Apparently the data structures associated with remembering/showing a few tabs require a lot of memory...?!

The only annoyance with TGS and tab discarding without proper tab serialization/dehydration is that page state is thrown out the window. Scrolled 350 pages deep in a tumblr blog or pinterest feed? Permanently gone on restore. I actually actively avoid sites with infinite scrolling where I can ._.

Of course, the real problem is that browsers don't provide good simple mnemonics for "I want to come back to this later" that effectively translate from "this is open and currently a thing" to something that works for squishy brains and finite hardware. Chrome's new reading list feature (which doesn't quite work in Dev yet, clicking the menu item randomly decided to start SEGV/MAPADDRing the other day, glad I wasn't using it lol) will be interesting to watch, but looks about as potent as the tag-less bookmark system, sadly.

This has actually been a problem for some years... https://news.ycombinator.com/item?id=18325632, https://news.ycombinator.com/item?id=16375865, https://news.ycombinator.com/item?id=13537600 make a few references (^F) to 'The Great Suspender' which I've mostly summarized here. (I accidentally locked `i336_` a few years ago - and I also had the English-language equivalent of "more lines of code = more better" back then too, so yeah, if you do have a look I can recommend ^F.)

I've been making "I need to fix this with an extension" noises to myself for years, but the gigantic annoyance is that, at the end of the day, whatever fun system I come up with on desktop will never seamlessly integrate on mobile because of course I can't run extensions there. What on earth is the point of having an external brain if I can't access it without needing to invoke a 30-step process that I have to fully context switch away from whatever I'm doing to perform?!?