
Webrecorder: Make an interactive copy of any web page that you browse - pcr910303
https://webrecorder.io/
======
boredgamer2
Very cool. This seems like such a "Duh. Users want this" feature. I wish it
had been integrated into Firefox years ago. I bookmark some sites and then come
back years later, but perhaps they hijacked the URL or later changed the URL's
parameters.

And then when I return, I get a 404. Instead of bookmarking, I'd love to
"capture" the current info, divs, and graphics.

~~~
MasterYoda
You could do that with the Firefox extension "SingleFile" [1]. Its main purpose
is to save a full web page as one single HTML file, with images etc. encoded in
the HTML file (no more messy HTML file + folder clutter), very neat. There is
also an HTML/ZIP hybrid variant called SingleFileZ [2] that produces smaller
saved files.

In the settings you can configure it so that sites you bookmark are saved, and
also link the bookmark to the locally saved file if you want to.

[1] [https://addons.mozilla.org/en-US/firefox/addon/single-file/](https://addons.mozilla.org/en-US/firefox/addon/single-file/)
[2] [https://addons.mozilla.org/en-US/firefox/addon/singlefilez/](https://addons.mozilla.org/en-US/firefox/addon/singlefilez/)

~~~
gardaani
That looks great! Firefox should have something like SingleFile built-in.
Instead of just copying features from Chrome, they should lead the innovation
and add useful features like this.

~~~
renewiltord
I used this in like IE5 (I think) with MHTML. Though after looking a little, it
looks like SingleFile is more featureful (it gets lazy-loaded content, etc.).

------
BiteCode_dev
For a KISS version of this, there is the SingleFile add-on:

[https://addons.mozilla.org/fr/firefox/addon/single-file/](https://addons.mozilla.org/fr/firefox/addon/single-file/)

It will save any page as a standalone HTML file, including inlined external
resources.

~~~
abnry
I love this extension! I use it in place of bookmarks. I run a cron job to move
.html files from my downloads folder to a bookmarks folder, then generate
thumbnails and an HTML index to easily browse my bookmarks. I feel so much more
relaxed knowing I can save, with one click, any good information I find on the
web. Eventually I want to add NLP keyword extraction and categorization, and an
internal search feature.
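A minimal Python sketch of that move-and-index workflow (the paths and the index format here are hypothetical, not the commenter's actual script):

```python
import shutil
from pathlib import Path

# Hypothetical locations; adjust to your own setup.
DOWNLOADS = Path.home() / "Downloads"
BOOKMARKS = Path.home() / "bookmarks"

def collect_saved_pages(downloads: Path = DOWNLOADS, bookmarks: Path = BOOKMARKS) -> Path:
    """Move SingleFile-style .html saves out of downloads, then rebuild an index page."""
    bookmarks.mkdir(parents=True, exist_ok=True)
    # Snapshot the listing before moving, so we don't iterate a changing directory.
    for page in sorted(downloads.glob("*.html")):
        shutil.move(str(page), str(bookmarks / page.name))
    # Rebuild a simple index linking every saved page.
    entries = sorted(p.name for p in bookmarks.glob("*.html") if p.name != "index.html")
    links = "\n".join(f'<li><a href="{name}">{name}</a></li>' for name in entries)
    index = bookmarks / "index.html"
    index.write_text(f"<html><body><ul>\n{links}\n</ul></body></html>")
    return index
```

Run it from cron (e.g. every few minutes) and the index stays current without any manual filing.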

~~~
heavyset_go
Check out Shiori if you haven't, it does everything you want + archival.

------
Diederich
The problem here is quite real; just because one has access to a remote
resource now doesn't mean that access will remain, either in the short or long
term.

I'm not crazy about using a 3rd party service for it though.

I've half-considered wiring up something with OBS to just record my web
browser all day, but besides the intense storage requirements, indexing and
searching would be more than painful.

~~~
greglindahl
Webrecorder is open source, and you can run your own instance if you like.

~~~
mellosouls
Yes - it's also a digital preservation project from a not-for-profit arts
organisation.

[https://webrecorder.io/_faq](https://webrecorder.io/_faq)

~~~
veenusav
I second this.

------
dang
Surprisingly little prior discussion, but
[https://news.ycombinator.com/item?id=10838985](https://news.ycombinator.com/item?id=10838985)
was related (2016).

------
bacondude3
Just want to plug HTTrack [1] here as well. Not nearly as slick, but it's
worked extremely well for me when Webrecorder couldn't do the job. Being usable
from the command line also makes it useful for some projects WR can't handle.

[1]: [http://www.httrack.com/](http://www.httrack.com/)

~~~
causality0
I came here to say the same thing. HTTrack is incredibly handy, especially when
it comes to old pages. My personal use case is archiving small sites to make
sure they don't go dark, for example webcomics. Having a single folder with
all the comic image files is great.

------
ernsheong
I am the author of [https://www.pagedash.com](https://www.pagedash.com), which
is a service that simply saves the entire page statically, and also makes an
attempt at a dynamic save (allowing all initial JS to run). No network
interaction is captured. Try it if you want something simpler. We've been
around since 2017. See previous Show HN
[https://news.ycombinator.com/item?id=15653206](https://news.ycombinator.com/item?id=15653206)

------
fareesh
Slightly unrelated - but how would one dump the entire state of JS objects and
the DOM onto the disk and then retrieve it at a later date?

Hypothetical example - I'm playing some HTML5 game and I want to turn off my
pc and return to it exactly as it was.

~~~
busymom0
I am mostly a noob to JS but could this work?

[https://stackoverflow.com/questions/2934787/view-list-of-all...](https://stackoverflow.com/questions/2934787/view-list-of-all-javascript-variables-in-google-chrome-console)

------
mdaniel
I can't tell which spawned which, but there's a related discussion in
r/DataHoarder about an extension that does that:
[https://old.reddit.com/r/DataHoarder/comments/ggyzoy/is_ther...](https://old.reddit.com/r/DataHoarder/comments/ggyzoy/is_there_a_firefoxchrome_extension_that/)
in which Webrecorder was mentioned yesterday.

------
amelius
> Webrecorder creates an interactive copy of any web page that you browse,
> including content revealed by your interactions such as (...) clicking
> buttons, and so forth.

How can they guarantee this if the code may run on the server?

~~~
netsharc
It's probably a replay of what you did. I'm trying to think how the internals
would be built using JavaScript, but it's probably a case of recording changes
to the DOM structure ("After 3 seconds, the password input field had the value
'hunter2'. After 5 seconds, a div appears with the text 'Incorrect
password'", etc.).
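A toy Python sketch of that record-then-replay idea (a hypothetical model, not how any actual tool implements it): store each observed change as a timestamped event, then reapply the events in order at playback time instead of rerunning the page's code.

```python
import time
from typing import Callable

class ChangeLog:
    """Record timestamped state changes during a session, then replay them in order."""

    def __init__(self) -> None:
        self._start = time.monotonic()
        self._events: list[tuple[float, str, str]] = []

    def record(self, element: str, new_value: str) -> None:
        """Note that `element` took on `new_value` at the current offset."""
        self._events.append((time.monotonic() - self._start, element, new_value))

    def replay(self, apply: Callable[[str, str], None], speed: float = 0.0) -> None:
        """Reapply changes in recorded order; speed=0 skips the real-time delays."""
        last_t = 0.0
        for t, element, value in self._events:
            if speed:
                time.sleep((t - last_t) / speed)
                last_t = t
            apply(element, value)
```

So `record("input#password", "hunter2")` followed later by `record("div#error", "Incorrect password")` would replay the same sequence against a fresh copy of the page.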

~~~
amelius
Makes sense, but I wouldn't call it an "interactive copy".

~~~
jrochkind1
I am also just interpreting the copy in the OP, but I don't think it's just a
replay -- I imagine they record all HTTP requests and responses, so can echo
back the recorded response for any given JS query to server, when it's
triggered.

The page linked says:

> Webrecorder takes a new approach to web archiving by capturing ("recording")
> network traffic and processes within the browser while you interact with a
> web page. Unlike conventional crawler-based web archiving methods, this
> allows even intricate websites, such as those with embedded media, complex
> Javascript, user-specific content and interactions, and other dynamic
> elements, to be captured and faithfully restaged.

and:

> Webrecorder creates an interactive copy of any web page that you browse,
> including content revealed by your interactions such as playing video and
> audio, scrolling, clicking buttons, and so forth.

I think they record whatever HTTP transactions you trigger, so they can be
played back when triggered in the replay.
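A toy Python model of that capture-and-echo-back approach (hypothetical; tools like Webrecorder persist the captured traffic in WARC files rather than an in-memory dict):

```python
class ReplayArchive:
    """Record (method, URL) -> response pairs at capture time; echo them back at replay time."""

    def __init__(self) -> None:
        self._responses: dict[tuple[str, str], bytes] = {}

    def record(self, method: str, url: str, response: bytes) -> None:
        """Store the response the live server gave for this request."""
        self._responses[(method.upper(), url)] = response

    def replay(self, method: str, url: str) -> bytes:
        """Serve the recorded response instead of hitting the live server."""
        key = (method.upper(), url)
        if key not in self._responses:
            # Requests never triggered during capture can't be served later.
            raise KeyError(f"not captured: {method} {url}")
        return self._responses[key]
```

The failure mode falls out of the sketch too: any request the page makes at replay time that was never triggered at capture time has no stored answer.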

~~~
amelius
This wouldn't work for websites using more advanced techniques where ordering
of requests matters.

~~~
jrochkind1
Right, it would not and presumably does not work universally. Hopefully it
works on a large enough majority of sites to be useful.

~~~
amelius
The problem is that at save-time you want to know if it works.

It would be very annoying if you thought you had saved something but it turned
out not to be the case.

------
NelsonMinar
Wow, neat!

Is there a reliable screenshot version of this I can install on my own Linux
system? Some sort of headless browser, I imagine. I realize this interactive
system is much more powerful but it's overkill for an application I have in
mind, a best effort single screenshot is fine. I've seen various attempts at
doing this over the years and none have really worked reliably.

I'd consider paying a service to do this but it seems like self-installable is
better.

~~~
asab
Firefox can screenshot the whole page - it's built in. I believe it can be
done in headless as well.

[https://support.mozilla.org/en-US/kb/firefox-screenshots](https://support.mozilla.org/en-US/kb/firefox-screenshots)

~~~
Nux
Can it screenshot lazy loading pages?

~~~
williamdclt
I think it can be smart-ish about this (scrolling for you, though maybe I'm
thinking of an extension). I've never found anything that handles virtualised
lists/tables though, like screenshotting an entire Slack thread.

------
chrischen
I feel like the “Make website browsable offline” feature of yesteryear has
been neglected. Somehow people assume everyone always has internet
connectivity... but I need to save websites, docs, etc. for long flights or
camping trips without signal.

Safari has a reading list feature that claims to cache the web page, but
annoyingly about 50% of the time it doesn't. As with all Apple cloud services,
there is just no way to explicitly sync.

~~~
saurik
Since when is the Safari Reading List a cloud service? It might be that I just
have most of iCloud disabled and that makes the system work as expected, but
for me it is definitely 100% local (at which point I would presume failures
are due to the mechanism it uses to save things not working on all kinds of
web pages: I honestly only use it for simple document sites, and so wouldn't
know if it fails a lot on rich web app sites).

~~~
saagarjha
I think perhaps they're annoyed that their Reading List isn't syncing between
their devices and Apple hasn't put a button that forces the sync to occur.

~~~
chrischen
Yes, it is advertised as being able to make pages available offline. But if you
save something to the reading list on your iPhone or your Mac, there's no
guarantee it's available on the other device, and often it even stops working
on the very device I added it to the reading list on.

------
A4ET8a8uTh0
This is neat. More and more stuff is hidden behind a login screen. Odd
question: how does the Internet Archive handle those types of pages these days?

edit: types of

------
jcahill
We use webrecorder for some interactive work at my workplace, a web archival
nonprofit.

If you're new to web archival, expect a learning curve.

------
seph-reed
This seems like something that would work really well at the Pi-hole level.

~~~
jcahill
A pi isn't going to cut it here.

~~~
edoceo
A Pi could for sure do the capture and save; it can run full or headless, and
all the things. For personal use it'd be fine, not at scale though.

~~~
dependenttypes
You would have to intercept HTTPS connections and add your own CA to every
computer that uses the Pi-hole, which is a pain.

~~~
seph-reed
Hmmm...

[https://security.stackexchange.com/questions/8145/does-https...](https://security.stackexchange.com/questions/8145/does-https-prevent-man-in-the-middle-attacks-by-proxy-server)

I need to read more about CAs to figure out why the Pi couldn't fake it.

------
eastendguy
Technically interesting, but why would I use this over one of the many full
page screenshot or "Print to PDF" browser extensions? That is what I use when
I want to archive something.

~~~
heinrichhartman
Which browser and extension are you using?

I tried a few, but could not get good results with any of them. With print to
PDF, the output always looked terrible. Full-page screenshot took ages to
scroll through the page, hijacking my viewport. Those captures should happen in
the background.

~~~
eastendguy
For manual capture I use [https://chrome.google.com/webstore/detail/full-page-screen-c...](https://chrome.google.com/webstore/detail/full-page-screen-capture/fdpohaocaechififmbbbbbknoalclacl)
and Chrome "Print to PDF". Layout issues are not important for my use case.

For scheduled captures I automated this workflow with kantu:
[https://chrome.google.com/webstore/detail/uivision-rpa/gcbal...](https://chrome.google.com/webstore/detail/uivision-rpa/gcbalfbdmfieckjlnblleoemohcganoc)

> Those captures should happen in the background.

Yes, it would be nice if the Chrome extension API allowed full-page screen
captures to happen instantly. Currently all extensions need to scroll up/down.

~~~
gildas
> Currently all extensions need to scroll up/down.

SingleFile does not. It can save lazy-loaded content without scrolling.

------
CGamesPlay
Webrecorder is quite good, but the necessity of using a separate program to
record and to replay made this less desirable for me. I ended up making an app
that integrated the two (conceptually, no technical relationship), which is
rather outdated now but available here, should anyone be interested:
[https://github.com/CGamesPlay/chronicler](https://github.com/CGamesPlay/chronicler)

------
mosfets
I have been using this add-on in Chrome and think it's nice:
[https://chrome.google.com/webstore/detail/huula-web-clipper/...](https://chrome.google.com/webstore/detail/huula-web-clipper/holcfbedaepgjadjangfbmpdjnhkhdoa?hl=en)

------
sudhirkhanger
There's Chrome's history, saved pages in Pocket, bookmarks, possibly separate
history on my mobile, etc. But the main issue is how I get those results back
when I want them, in a seamless manner.

How do I get them when I do a regular search using Google? Retrieval is the
main problem.

------
jl6
I would love just a simple one-click print-current-page-to-PDF button that
managed to capture the whole page, and did something intelligent about
infinite scroll sites.

------
boromi
Can't we do this with OBS? What's the difference? Edit: I was going to download
it, then saw it was an Electron app... no thanks.

~~~
xfer
It's an HTTP Archive (HAR) recorder + indexer, not a screen capture/MP4 video
recorder. You can use it with your web browser; you don't need the app. You can
self-host an instance as well.

------
elil17
Is there a reason you can't just use archive.org for this?

~~~
CGamesPlay
Well, at least two. For one, it might be desired to have a copy of the data
yourself, rather than relying on some hosted service. Also, webrecorder would
allow you to store authenticated pages.

------
imvetri
Feedback on the video: it looks machine-recorded. Suggestion: during playback,
smooth out the mouse movement. A smoother video will have more design appeal
and sell your product better.

