In Netscape days, many people would have to pay by the minute to be connected to the internet. In those days, web pages generally contained far more information than they do now, and were less interactive. So you'd connect, load the content you wanted to see, disconnect, and then just sit there and read it for free, instead of bleeding cash.
I would have said UUCP for news but that's showing my age.
These browsers were born in the era of dialup Internet that had per minute charges and/or long distance charges. At the very least you were tying up your family's phone line.
Basically it's like paying for every minute your cable modem is plugged in.
For the feature itself: Netscape had integration with the modem connectivity for the OS and would initiate a connection when you tried to visit a remote page. Offline mode let you disable automatic dialing of the modem.
> Netscape had integration with the modem connectivity for the OS and would initiate a connection when you tried to visit a remote page.
That's not "integration with modem connectivity", that's just going through the OS's socket API (or userland socket stack, e.g. Trumpet Winsock); where the socket library dials the modem to serve the first bind(2). Sort of like auto-mounting a network share to serve a VFS open(2).
Try it yourself: boot up a Windows 95 OSR2 machine with a (configured) modem, and try e.g. loading your Outlook Express email. The modem will dial. It's a feature of the socket stack.
These socket stacks would also automatically hang up the modem if the stack was idle (= no open sockets) for long enough.
My point was that a quiescent HTML4 browser has no open sockets, whether or not it's intentionally "offline." If you do as you say — load up a bunch of pages, and then sit there reading them — your modem will hang up, whether or not you play with Netscape's toggles.
(On single-tasking OSes like DOS — where a TCP/IP socket stack would be a part of a program, rather than a part of the OS — there was software that would eagerly hang up the modem whenever its internal socket refcount dropped to zero. But this isn't really a useful strategy for a multitasking OS, since a lot of things — e.g. AOL's chatroom software presaging AIM — would love to poll just often enough to cause the line that had just disconnected to reconnect. Since calls were charged per-minute rather than per-second, these reconnects had overhead costs!)
> [Netscape's] offline mode let you disable automatic dialing of the modem.
When you do... what?
When you first open the browser, to avoid loading your home page? (I guess that's sensible, especially if you're using Netscape in its capacity as an email client to read your already-synced email; or using it to author and test HTML; or using it to read local HTML documentation. And yet, not too sensible, since you need to open the browser to get to that toggle... is this a thing you had to think about in advance, like turning off your AC before shutting off your car?)
But I think you're implying that it's for when you try to navigate to a URL in the address bar, or click a link.
In which case, would the page, in fact, be served from the client-side cache, or would you just get nothing? (Was HTTP client-side caching even a thing in the early 90s? Did disks have the room to hold client-side caches? Did web servers by-and-large bother to send HTTP/1.0 Expires and Last-Modified headers? Etc.)
The browser shows an “I’m offline. Do you want to proceed?” dialog. If you OK it, it tries to open the socket; upon that request the OS dials the modem and brings up PPP, which takes ~30s, and the PPP session disconnects again after a few minutes of idle. Otherwise it aborts. If the browser is online in the first place, the dialog is skipped.
> Was caching a thing? Did disks have room to hold cache?
Oh it was, and there was room. Alas, 56kbps (~7kB/s) at its theoretical best meant pages were kilobytes to a single megabyte at most. ADSL at ~1Mbps (1/8 MB/s) meant a 256kB page loaded in about two seconds on a good day. Even 200MB of disk space holds 1024 such ~200kB pages, and IE back in those days had a hidden-but-otherwise-ordinary folder deep down in C: holding about that much of a random cache of files it had decided to keep.
Fuck I’m old now
IIRC, some of the browsers, in "offline mode", would actually serve pages from the local cache, not even attempting to fetch them from the remote server. If you attempted to navigate to a page that hadn't been cached, you got an error of some kind.
I was considered mad by friends at the time for upping my cache to a whopping 2MB, but Netscape's cache was highly configurable.
Things like: cache pages but not images, always cache bookmarked pages, cache frame pages (which were often used for navigation), etc. Netscape 4 added CSS to that mix.
The browser cache used to be quite dependable as an offline way to view pages, but this seems to have fallen out of favour in the mid-noughties. I remember how disgusted I was when I realised Safari would no longer let me see a page unless it could contact the server and download the latest version.
I used to have a caching proxy server that would basically MITM my browsing and be even more vigilant than the browser's cache, and it worked quite well. This was back in the 90s, when every bit of your 56kbps max counted, or when you wanted to read something while your Dad or sister also wanted to use the phone.
Anyway, you can no longer take this approach because bad people broke the Internet and now you have to have a great honking opaque TLS layer between you and the caching servers so there's no way for this optimisation to work any more.
Of course it isn't really as important these days because we've got faster connections and interactions with the server are far less transactional and richer. But I would still like a way of tracking my own web usage and being able to go back in time without having to actually revisit each and every site.
These days you have to hack the browser because that's where your TLS endpoint emerges. Kaspersky tried this for their HTTP firewall application and there were ructions over that.
I'll defo take a look at this. Sounds just like what I've been looking for.
> Isn't this how the internet was supposed to work in the first place? I remember Netscape navigator having a 'go offline' icon in the corner.
Thinking back, if you forget about "the web"/HTTP, then yes, this is exactly how Usenet worked, and now I'm remembering that the "go offline" button used to download all your newsgroups along with your email and stuff so you could look at it all offline :-)
If you want something that's like Usenet these days, check out Scuttlebutt.
Nothing in the HTTP spec has changed about this AFAIK. The internet still behaves this way.
> Anyway, you can no longer take this approach because bad people broke the Internet and now you have to have a great honking opaque TLS layer between you and the caching servers so there's no way for this optimisation to work any more.
You could certainly still do this, you just need to import the proxy's CA certificate into your browser. It's just not possible to do to others on your network without consent now. (This is a security feature, not a bug.)
I was under the impression that TLS everywhere breaks the web as it was designed; ergo, though nothing in the spec has changed, the environment itself has.
You could probably do something with homomorphic encryption, but nobody seems that bothered. It's probably of limited value nowadays, as we use the web more as an RPC layer.
Setup will vary by proxy and OS but here's one example (which I haven't used but at a glance seems OK):
So if you have a local web server (like on localhost), you'll still be able to access it, but you won't be able to access non-local IPs.
Then a cronjob runs and puts it into a folder to be processed into a database, which generates a static html index and puts it in my Google Drive.
Then it syncs offline on my chromebook. Which means that without internet, I can put my chromebook in tablet mode and do some nice reading. I've been very pleased so far.
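In case it's useful, here is a minimal sketch of that "static HTML index" step in Node, assuming the saved pages end up as plain .html files in one folder (the folder name and the crude title regex are illustrative, not my actual setup):

    // build_index.js: walk a folder of saved pages and emit a simple index.html
    const fs = require('fs');
    const path = require('path');

    const SAVE_DIR = process.argv[2] || './saved-pages'; // hypothetical location
    const OUT_FILE = path.join(SAVE_DIR, 'index.html');

    const entries = fs.readdirSync(SAVE_DIR)
      .filter(name => name.endsWith('.html') && name !== 'index.html')
      .map(name => {
        const html = fs.readFileSync(path.join(SAVE_DIR, name), 'utf8');
        const m = html.match(/<title>([^<]*)<\/title>/i); // crude title extraction
        return { name, title: m ? m[1].trim() : name };
      });

    const list = entries.map(e => `<li><a href="${e.name}">${e.title}</a></li>`).join('\n');
    fs.writeFileSync(OUT_FILE, `<!doctype html><meta charset="utf-8"><h1>Saved pages</h1><ul>${list}</ul>`);
    console.log(`Indexed ${entries.length} pages into ${OUT_FILE}`);

The cronjob just runs something like this after each batch and drops the output into the Drive folder.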
I'm uncertain what the best mechanism is, there are so many ways to solve it. From filtering to recrawling for new content to enabling more advanced features, there are so many possibilities.
It's not complete, mostly because the frontend is a mess, but the backend is able to save files, pages and links (https://gitlab.com/thebird/snag). I used a Rust backend, with Svelte and JS for the extension (of course).
Also, an update on the binaries. I just pushed a new set of binaries and tested that the Linux and Windows ones work. Hopefully that resolves the binary issues people were having, though I worry some people will still hit problems. If you do, please report them, ideally as an issue.
But can we count on Firefox to stay relevant even if Mozilla fired most of the dev team?
I'm not making any value judgment about the actual tool. It sounds interesting enough. But it should behave better.
That's only for configuration files. The actual data should go in $XDG_DATA_HOME, by default ~/.local/share/22120/. Many languages have a tool (e.g. https://pypi.org/p/appdirs/ ) for finding the appropriate data/config/cache directory for each OS.
The config is actually located in a hidden file, here: https://github.com/c9fe/22120/blob/8b6cc758f14d34f564fd3a838...
const pref_file = path.resolve(os.homedir(), '.22120.config.json');
const server_port = process.env.PORT || process.argv[2] || 22120;
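For comparison, a rough sketch of the XDG-friendly version using the env-paths package (the names are illustrative; this is not what 22120 does today):

    // Resolve per-OS data/config/cache directories instead of a dotfile in $HOME.
    const envPaths = require('env-paths');
    const paths = envPaths('22120', { suffix: '' });

    console.log(paths.config); // e.g. ~/.config/22120 on Linux
    console.log(paths.data);   // e.g. ~/.local/share/22120 on Linux
    console.log(paths.cache);  // e.g. ~/.cache/22120 on Linux

env-paths also picks sensible locations on macOS and Windows, so the change would only be a few lines.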
I guess I can explain my thinking, but I'm not sure it will help you understand or like it. Basically I like the name, and I did consider changing it a few times, but it just stuck for a number of reasons. I can't remember exactly now, but the port number might have come after the name. I think it's a good idea, and cool, to have the server running on the same port as the name, but at the same time I know you might have something else running on that port, so that's why I made it configurable. Someone might also want to run more than one copy... though I'm pretty sure we can't open two Chrome windows with different debugging ports using the same user data directory.
Anyway, that might be more info than you wanted, and it might not help, but hopefully you have a clearer picture at least. I get it if you still don't like it; I'm sorry you didn't have a good experience with this. I did not design it to annoy you.
I wish this was actually (optional) built-in behavior for browsers when bookmarking pages, or at least when adding to a "read later" list like Pocket/Instapaper etc.
Pocket seems to offer something like this, but only in the premium version, so the "permanent archive" ironically seems to go away when unsubscribing.
As a workaround, what if bookmarking a (public) page could actually ping it to archive.org for archival?
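The Wayback Machine's "Save Page Now" endpoint would make that easy to script. A rough sketch in Node (fire-and-forget, no error handling beyond logging, and archive.org rate-limits these requests, so you wouldn't want to trigger it on every page load):

    // Ask the Wayback Machine to snapshot a public URL when it gets bookmarked.
    const https = require('https');

    function pingArchiveOrg(url) {
      // Save Page Now: GET https://web.archive.org/save/<url>
      https.get('https://web.archive.org/save/' + url, res => {
        console.log(`archive.org answered ${res.statusCode} for ${url}`);
        res.resume(); // drain the response so the socket is released
      }).on('error', err => console.error(err.message));
    }

    pingArchiveOrg('https://example.com/that-interesting-blog-post');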
> Both WARC and MHTML require mutilatious modifications of the resources so that the resources can be "forced to fit" the format. At 22120, we believe this is not required
Perhaps the only purer format would be packet captures, say of the full HTTPS session, along with the session key material and connection metadata needed to later extract the verbatim HTTP resources. That'd be interesting, but I doubt that's what this "22120 format" (for which I see no documentation links) does.
(I even wrote this before checking out your link:
Have you heard of https://en.wikipedia.org/wiki/Mozilla_Archive_Format from two or three Internets ago? If so, what are your thoughts on it?)
I am working on a personal project like this. It is in the early stages. I am creating a local search based on my browser history, so it doesn't crawl pages. Also, the fetch happens outside the browser, so authed URLs are not supported out of the box.
I currently have a bookmarklet that lets me "pin" a page. My pinned pages are my new home page. It's how I keep my tabs closed.
I do not do a full archive (but I could). Instead you get an offline view that is stripped of most things. Example: https://raw.githubusercontent.com/sbeckeriv/personal_search/...
Demo of the pin:
A self-hosted version is on the roadmap.
Sideshow Ask HN: Didn't Firefox mobile work like this? I could read the reader view items offline...
Anyone know what's happening with the whole bookmarks/collections situation?
I use Firefox Nightly on Android and the feature disappeared (I think) sometime this year when they redid the whole UI. I can't find an article or bug ticket explaining why.
iOS seems to optimize for temporary offline scenarios; saved pages do not seem to be backed up or synced to iCloud.
The best bookmarking option for archival seems to be the pinboard.in archive plan.
There’s a nice macOS application (sorry, can’t remember the name right now) which gives you a better interface, but... they’re bookmarks. I would like them to be integrated with regular browser bookmarks. And to be usable when the site is down or I’m offline. And to appear when I search... lots of possibilities there.
The README states:
"It runs connected to a browser, and so is able to access the full-scope of resources (with, currently, the exception of video, audio and websockets, for now)"
I wonder what kind of limitations make it hard to intercept those resources like the rest of the content.
In theory you could also use YaCy... but that is intended as a search engine, not an archive.
Edit: while looking into it I found alternatives, and Memex seems interesting.
Edit2: I remember 2 Show HNs. One recorded your entire desktop and made it searchable. Can't remember what that was called, but the AllSeingEye I found 
I currently use Memex, but this is different approach, and I keep looking for a polished experience that can get more mainstream users into archiving/offline browsing.
Looks like I've got some research to do this week.
Also, I don't want to do it if I have to change the way I use the protocol, so I would want pretty much the same methods to be available and to work the same way.
Previous Show HN: https://news.ycombinator.com/item?id=15653206
Gonna check it.
Unfortunately it's only for Chrome. I am very much used to having my favourite set of Firefox plugins. I will have to check whether I can replicate that with Chrome.
I had to switch to Chrome for extensions. Finding a Chrome extension that provides similar functionality to your Firefox ones should be easy.
This seems like something that could be done in a proxy and be browser independent.
Interesting. I find Chrome extensions to be very limited in what they can achieve. Can't do without Tree Style Tab...
What I'd like is to cache the history for each page too (important for news pages).
And, I’ve wanted a _search history first_ plugin for web search to find pages I missed saving, but recall reading.
Since the former takes time and the latter doesn’t exist, I gather I could buy storage and save browsing using this tool.
It would be interesting to see how it works in practice—saving so much data.
For work I’d be interested to know how it works for password protected sites like banking, social media, etc.
If you type: "^ worms" in the searchbar it will search your history for 'worms' and show the results in the dropdown. Typing "* worms" will search your bookmarks instead. The rest of the shortcut symbols are listed on the linked page. Hope that helps!
The amount of disk space it takes up isn’t crazy. It has been very useful for me.
For everyone else, here's some definition of the original terms:
fake victim - when you do something and it goes south for you, but you try to shift responsibility for your choices onto someone else. So in this context it's about the standard disclaimer of liability (you won't hold us accountable, etc.).
don't lie - basically people stealing the work and pretending it's their own. Like stealing it and relicensing it, or pretending they are the rightsholders. Basically the standard language about copyright and sublicensing, but in my own lingo.
It seems like the license went something like this:
MIT > Dual GPL/Custom license > No license > AGPL > No License/Commercial only > Custom license > Different custom license > License with a commercial pricing? > AGPL (3 days ago)
Most of this happened over the last few weeks, and the README says it's dual-licensed.
I am too embarrassed to admit that a disproportionate amount of my time is spent looking for a sentence, or god forbid a tweet, I vaguely remember reading last week.
So yes, consider this a vote for that sexy full-text search, please.
Main downside is it's not trying to plug into existing bookmarks.
1. Download the release (npm or binary)
2. Start it up.
3. Go to chrome://bookmarks
4. Click on a folder and go, "open all".
5. Once they've loaded, click through each tab opened to make sure it loaded properly.
6. Check that they've been added to the index (go to http://localhost:22120)
7. Repeat 1-6 for all the folders of your bookmarks that you want to save.
8. Repeat 7 periodically.
I worry this will be too much work for you to feel good about doing regularly, but I want to say that at least it will save everything.
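If clicking through tabs gets old, the same loop could in principle be scripted against the Chrome instance 22120 attaches to. A rough sketch with puppeteer-core and the default 9222 debugging port (bookmarks.txt is a URL list you'd export yourself; this isn't wired into the tool):

    // visit-bookmarks.js: drive the archiver's Chrome through a list of URLs
    const fs = require('fs');
    const puppeteer = require('puppeteer-core');

    (async () => {
      // 22120 talks to Chrome over the DevTools protocol on port 9222
      const browser = await puppeteer.connect({ browserURL: 'http://127.0.0.1:9222' });
      const urls = fs.readFileSync('bookmarks.txt', 'utf8').split('\n').filter(Boolean);

      const page = await browser.newPage();
      for (const url of urls) {
        try {
          await page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 });
        } catch (err) {
          console.error(`Failed to load ${url}: ${err.message}`);
        }
      }
      browser.disconnect(); // leave Chrome (and the archive server) running
    })();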
I think the use case is good. I considered it in the past (automatically caching from bookmarks). I'm sorry this lets your use case down.
It's interesting, what do you mean by that?
> Can I use this with a browser that's not Chrome-based?
Note that a (rather similar) thing I participated in back in 2002 was browser-neutral.
but it might not happen. I'm only considering it and will investigate. I'm sorry you can't use it now.
I ask because the dev tool that our company creates occasionally (okay, very rarely) gets a question about offline mode, and when I prod, it's usually just out of curiosity, not because they actually need it in real life.
Browsers' history sucks. I don't know if this project does this, but I would absolutely love to be able to do SQL queries on my browsing history.
I have 'lost' many websites I remember visiting, but for which I didn't remember anything in the title.
Also, obviously, websites change sometimes, and the web archive might not have cached the website you visited. Although from what I can tell, this project doesn't version websites, it just caches the latest, so you would probably just overwrite the previous version accidentally.
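For Firefox you can already get part of the way there: history lives in places.sqlite, which is plain SQLite. A quick sketch with better-sqlite3 (copy the file out of your profile directory first, since Firefox keeps it locked while running; the profile path varies by OS):

    // Query Firefox history straight out of its SQLite store.
    const Database = require('better-sqlite3');
    const db = new Database('./places.sqlite', { readonly: true });

    const rows = db.prepare(`
      SELECT url, title, visit_count,
             datetime(last_visit_date / 1000000, 'unixepoch') AS last_visit
      FROM moz_places
      WHERE url LIKE ? OR title LIKE ?
      ORDER BY last_visit_date DESC
      LIMIT 50
    `).all('%keyword%', '%keyword%');

    console.table(rows);

It only gives you URLs, titles and visit times though, not page content, which is exactly the gap a full-content archive like this could fill.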
Increasingly ≠ totally.
Even though I'm a developer, pre-pandemic I would have to spend a day or three offline several times a year while working. This would be useful for that.
I know an IT guy who works in mines. He loves anything that works offline.
It's a concern I have every time I find a particularly interesting independently hosted blog post or article.
The Internet Archive goes a long way towards making me worry about this less, though. (Let's just hope they don't go away!)
You can have archives from x day or x week or whatever, organized yourself, versioned with git, as you like, to save web content through version changes, removal or vanishing.
And even more cool: If one could browse one's friends' sites, while everyone was offline (if their privacy / sharing settings allowed), just a local net in maybe a rural village
Edit: roadmap: "Distributed p2p web browser on IPFS" -- is that it? :-)
For future work on this project, consider a search engine built on top of the downloaded files! Also, gzip your JSON.
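The gzip part is nearly free with Node's built-in zlib. A sketch (index.json is just a guess at how the archive's index might be named):

    // Compress a JSON index; zlib.gunzipSync() reverses it on read.
    const fs = require('fs');
    const zlib = require('zlib');

    const raw = fs.readFileSync('index.json'); // hypothetical index file
    fs.writeFileSync('index.json.gz', zlib.gzipSync(raw));

    const pct = (fs.statSync('index.json.gz').size / raw.length * 100).toFixed(1);
    console.log(`index.json.gz is ${pct}% of the original size`);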
My older project for anyone interested: https://github.com/CGamesPlay/chronicler
Having the ability to see the last week or so of my browsing history would have come in handy on more than one occasion.
I thought of some evil ones, like I know some people at my company who would love to be able to browse an employee's past browsing history at their convenience.
It doesn't really need to be "offline" for that to work, but I can see that playing into calling it a security procedure rather than blatant overstepping.
Unfortunately it didn't work when I just tried installing it now (macOS 10.13.6, node v14.8.0).
MacBook-Pro:Desktop peter$ npx archivist1
npx: installed 79 in 8.282s
Preferences file does not exist. Creating one...
Args usage: <server_port> <save|serve> <chrome_port> <library_path>
Updating base path from undefined to /Users/peter...
Archive directory (/Users/peter/22120-arc/public/library) does not exist, creating...
Cache file does not exist, creating...
Index file does not exist, creating...
Base path updated to: /Users/peter. Saving to preferences...
Running in node...
Attempting to shut running chrome...
There was no running chrome.
Removing 22120's existing temporary browser cache if it exists...
Launching library server...
Library server started.
Waiting 1 second...
(node:33988) UnhandledPromiseRejectionWarning: Error: connect ECONNREFUSED 127.0.0.1:9222
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1144:16)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:33988) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:33988) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:33988) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'writeFileSync' of undefined
at ae (/Users/peter/.npm/_npx/33988/lib/node_modules/archivist1/22120.js:321:14209)
at Object.changeMode (/Users/peter/.npm/_npx/33988/lib/node_modules/archivist1/22120.js:321:8088)
at s.handle_request (/Users/peter/.npm/_npx/33988/lib/node_modules/archivist1/22120.js:128:783)
at s (/Users/peter/.npm/_npx/33988/lib/node_modules/archivist1/22120.js:121:879)
at p.dispatch (/Users/peter/.npm/_npx/33988/lib/node_modules/archivist1/22120.js:121:901)
at s.handle_request (/Users/peter/.npm/_npx/33988/lib/node_modules/archivist1/22120.js:128:783)
at Function.v.process_params (/Users/peter/.npm/_npx/33988/lib/node_modules/archivist1/22120.js:114:3436)
at b (/Users/peter/.npm/_npx/33988/lib/node_modules/archivist1/22120.js:114:2476)
(node:33988) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 3)
^CCleanup called on reason: SIGINT
To possibly remedy this, and diagnose further, try
$ export DEBUG_22120=BLEHMEHEKTAA
$ npx archivist1@latest