
Show HN: Mummify – Preserve web content, fight link rot - zek
https://www.mummify.it
======
donpdonp
It's worth mentioning [http://archiveteam.org/](http://archiveteam.org/),
which does a lot of important web archival work.

They use the Web ARChive format (WARC,
[http://fileformats.archiveteam.org/wiki/WARC](http://fileformats.archiveteam.org/wiki/WARC)),
which I hope mummify and other such services will standardize on.

------
nine_k
The approach is nice, the problem being solved is real. Hopefully paying
customers will flock in.

The question: is mummify.it itself going to go under some day?

So I'll wait for an open-source analog of this service, to run on my own tiny
server. (Or carve some time to write it myself, of course.)

~~~
toomuchtodo
There already is an open source analog:

[http://www.archiveteam.org/index.php?title=Wget_with_WARC_ou...](http://www.archiveteam.org/index.php?title=Wget_with_WARC_output)
+ [http://archive.org/web/web.php](http://archive.org/web/web.php)

~~~
nine_k
Wget is definitely a good start. What's needed is a shiny and usable UI, ways
to catalog / tag / search inside your saved data, and a painless installation
process. This could use some work.

~~~
devcpp
I think Evernote was also intended to solve this. Unfortunately, it is also
proprietary. So, there is definitely demand for this.

------
lesinski
How is Mummify going to get around the copyright implications of ripping off
the now removed piece of content which they don't own?

Also wondering: what happens if the publisher redirects the old URL to a new
place -- maybe an "update"... or maybe a useless hub page?

------
arb99
Weird pricing plan. Even the most expensive plan ($15/mo) allows only 50
'mummies' a month. Seems like an arbitrarily low number.

~~~
hk__2
There are free alternatives, like Peeep.us [1] and Archive.is [2].

[1]: [http://www.peeep.us/](http://www.peeep.us/)

[2]: [http://archive.is/](http://archive.is/)

~~~
alok-g
From Peeep FAQ:

How long will Peeep keep my data?

Virtually forever. Nevertheless, we retain a right to remove content which has
not been accessed for a month.

~~~
RyanMcGreal
Trust us to save your content, except for the stuff that is least likely to be
saved anywhere else.

------
desireco42
I don't get the number of free/paid mummifications. It seems very low, given
that space and bandwidth are abundant.

I think, with respect to the original developer, that this is more a feature
than an app, and it would probably help to develop it further to target a more
specific problem/group.

Having said that, I wish you the best.

------
pasbesoin
The OP depends upon Javascript for any and all content delivery/rendition.
Given the topic at hand, I find this more than a bit ironic.

I gather from the comments that this is some sort of online storage of a copy.
That may serve some use cases over the short term.

If you really want to avoid loss or "link rot", maintain your own copy on your
own equipment.

I've been around to observe everything from personal interest changes, death,
corporate policy changes, ownership transfers, deliberate manipulation... etc.
-- you get the idea -- affect the ability to pull even what were formerly
considered very stable and long-standing, aka "permanent", resources.

If you want to ensure you have access, save your own copy onto hardware that
you own. End of story.

------
AznHisoka
50 mummies per month for the highest plan is way way way too low. It should be
10,000 or even 50,000. Think about it this way: the people who are willing to
pay you for this type of service are probably huge publishers who would use
this service to share their URLs with their followers on Twitter, Facebook, etc.

~~~
zackbloom
Doesn't help them much if a huge publisher is paying them $15 a month. That
type of sale would require an SLA too; they really just need a contact email
address.

~~~
AznHisoka
Oh, the price should increase too. Right now OP is targeting an imaginary
customer. Who would pay $15 a month just for 50 mummies?

------
MWil
and if Mummify goes down...

Would it be a better option if the "permanents" were shared across
p2p/bittorrent and every unique item had at least 10 shares distributed across
the globe, maybe a max of 20? When one share host goes down, it just picks up
a replacement.
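
A minimal sketch of that replacement rule, assuming each item tracks the set of hosts holding a copy and that some health check (hypothetical here) reports which hosts are reachable. The names `rebalance`, `holders`, and `live_hosts` are invented for illustration; no such system is described in the thread.

```python
import random

MIN_SHARES = 10  # floor suggested in the comment above
MAX_SHARES = 20  # ceiling suggested in the comment above


def rebalance(holders, live_hosts, rng=random):
    """Return the new set of hosts holding one archived item.

    holders    -- hosts that held a copy before this check
    live_hosts -- hosts currently reachable (hypothetical health-check result)
    """
    # Drop holders that have gone offline; their copies count as lost.
    alive = set(holders) & set(live_hosts)
    # Never keep more than MAX_SHARES copies.
    if len(alive) > MAX_SHARES:
        alive = set(sorted(alive)[:MAX_SHARES])
    # Recruit replacements until the floor is met or no candidates remain.
    candidates = sorted(set(live_hosts) - alive)
    missing = max(0, MIN_SHARES - len(alive))
    alive.update(rng.sample(candidates, min(missing, len(candidates))))
    return alive
```

Run periodically per item, this keeps the copy count between the two bounds as long as enough hosts stay online. Actually transferring the data to a newly picked host is out of scope for the sketch.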

------
acdha
Just to second what donpdonp said
([https://news.ycombinator.com/item?id=6509604](https://news.ycombinator.com/item?id=6509604)),
I think a service like this needs to offer a standard-format WARC
([http://archive-access.sourceforge.net/warc/](http://archive-access.sourceforge.net/warc/))
download.

The whole point of a service like this is long-term access and that really
requires a data checkout option which can be used with other tools (e.g.
[https://github.com/alard/warc-proxy](https://github.com/alard/warc-proxy)).

------
subpixel
I use Safari web archives for a similar task.

But I wonder...isn't it safe to assume that, eventually, browser rendering
engines* will change to the degree that something I saved 4+ years ago is
essentially unreadable?

And doesn't that same potential problem apply to a hosted service as well?

*I'm using that vague term to describe everything I don't understand about how browsers render pages, markup, and javascript, which is a lot.

~~~
dangero
I agree. I think the safest bet would be to keep a static image as well.
Without a static image you will always question whether the rendering engine
has even slightly changed the look.

~~~
hexley
Maybe save as a PDF if it's the content you're most interested in.

~~~
subpixel
I save screenshots for that, but when I save an archive it's usually b/c I'm
interested in referencing non-static elements. Like animation, transition,
responsive behavior, etc.

------
jboynyc
Interesting in light of this discussion:
[https://news.ycombinator.com/item?id=6504331](https://news.ycombinator.com/item?id=6504331)

But to trust something like this to make a _permanent_ copy of stuff I'm
linking to, I'd need to know a bit more about them. Else this is effectively
like using a link shortener -- a single point of failure.

~~~
zek
Hey, creator here. We realize there is a bit of a trust issue; that's why we
have the paid plans. Our costs are low enough that, even if you were our only
paying customer left, we would be able to keep the service running just for
you.

~~~
alok-g
What Web services are you using in the back-end (that may potentially shut
down)? I need my data accessible for a low number of decades.

------
vijucat
I have been using the Scrapbook add-on (see screenshots and manual here : [1])
in Firefox [2] for many years for this; it saves the web page to your local
hard disk, and there are several types of annotations that you can perform on
the saved page. One trick I use is to first run Readability on the page to get
a clean version, and then save that to Scrapbook. With full-text and
comments-only search, this add-on, all by itself, kept me with Firefox even
during the dark period when Chrome came in and thrashed Firefox on performance :-)

I used to use diigo.com, which does the job quite well, too, before I
discovered Scrapbook.

[1] [http://amb.vis.ne.jp/mozilla/scrapbook/](http://amb.vis.ne.jp/mozilla/scrapbook/)

[2] [https://addons.mozilla.org/En-us/firefox/addon/scrapbook/](https://addons.mozilla.org/En-us/firefox/addon/scrapbook/)

------
lingben
I use Evernote premium ($45/year), which is much cheaper and includes a tonne
of extra features.

Compare this to mummify's pricing plan at $15/month, or $180/year. For that
price I could buy a pair of hard drives with several terabytes of capacity and
copy/paste the whole webpage, code, files and all, using httrack.

------
junto
One point to note is that a DMCA takedown targeted at Mummify.it will remove
the content just as it would from nytimes.com.

If I manually save that content to disk, then a DMCA takedown doesn't affect
the content stored on my local hard disk.

~~~
toomuchtodo
If it's something personal, I use [http://archive.is](http://archive.is) with
their bookmarklet in Chrome (free; unlimited archiving). It immediately
renders the page, saves a copy with a unique url, and gives me a .zip link to
download the archive.

If it's something I want to submit to the Internet Archive, I use wget with
WARC extensions
([http://www.archiveteam.org/index.php?title=Wget_with_WARC_ou...](http://www.archiveteam.org/index.php?title=Wget_with_WARC_output)),
submit the archive to the IA and notify them it needs to be merged in, and
keep the .tar.gz archive.
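
For reference, a hedged sketch of what such a wget call might look like, built from Python so it can be scripted later. Only `--warc-file` and `--page-requisites` are used; the URL and output name are placeholders, and the exact options used in the workflow above are not stated in this thread.

```python
import shlex


def wget_warc_command(url, warc_name):
    """Build (but do not run) a wget invocation that records a WARC file."""
    return [
        "wget",
        "--page-requisites",          # also fetch images, CSS, and scripts
        f"--warc-file={warc_name}",   # base name for the WARC output
        url,
    ]


# Placeholder URL and name, for illustration only.
cmd = wget_warc_command("http://example.com/article", "article")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` would perform the actual crawl; that step is left out so the sketch has no network side effects.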

Eventually I'll webapp/one-click the whole thing, with an archive to S3 and/or
Glacier.

Disclaimer: archiveteam participant

~~~
kseistrup
+1

------
gabemart
Most of the time, one will not know which of the pages one links to will disappear in
the future. This service therefore only seems really useful if you use it with
every link you make. That would make the biggest plan short on "mummies" by a
couple of orders of magnitude, at least.

~~~
contextual
Not every page one visits is worth storing. The whole point is to filter out
(and save) the good from the rest.

An aside: Mummify needs browser plugins for all the major browsers to make
saving as seamless as possible.

~~~
gabemart
> Not every page one visits is worth storing. The whole point is to filter out
> (and save) the good from the rest.

If this is a tool to "fight link rot", then we're not talking about every page
one visits, we're talking about every page one links to. Presumably, every
page worth linking to is also worth saving.

------
newrenowhore
Really like this, nice work. Is there a way to automatically mummify every
link on an entire website or directory? Your method/service could be an
interesting, easy to use way to recall a read-only version of a site after it
goes down.

------
d0m
Do you rip video on the page too? If so, I would definitely use this service.
I've got videos with my company brand a bit everywhere and I can't find a good
way to download them or link to them so that they stay there permanently.

------
bjackman
Just a thought: the "Link Rot" link should really be a Mummified link. I know
Wikipedia isn't disappearing any time soon, but it just seemed silly to me not
to "realise" the use case!

Anyway, very sexy site design IMO.

------
ChuckMcM
This is pretty cool, I'd be happy with a service that mummified it to storage
I control, since I've had pages that I've pointed to at archive.org vanish
after the current robots.txt was changed.

------
lopsae
The examples on the page don't work. It would be nice to have some working
examples right at hand to see how the redirecting or the mirroring works in
each case.

------
SmackCat
Your logo looks like a dog with a boner. Cool concept though.

~~~
eniacpx
Thanks. Fedex and Tom Cruise's center tooth.....

~~~
SmackCat
Ha! I didn't realize the Tom Cruise Tooth one until this:
[http://www.buzzfeed.com/lyapalater/you-may-never-be-able-to-...](http://www.buzzfeed.com/lyapalater/you-may-never-be-able-to-look-at-tom-cruise-the-same-way-aga)

------
Nux
Let's centralise the interwebs!

It seems like a bad idea, but they do have a point. Maybe when referencing a
link also make a small note to one of these archive sites?

------
nixpulvis
Maybe also offer a callback with the contents of the cached site, so
developers can handle 404 redirects themselves.
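
A rough sketch of what such a hook could look like on the developer's side, assuming the service hands over the HTTP status of the original page together with the cached copy. The function and parameter names (`resolve`, `on_missing`) are hypothetical; Mummify exposes no such API as far as this thread shows.

```python
def resolve(status, live_body, cached_body, on_missing=None):
    """Pick what to serve for a mummified link.

    status      -- HTTP status returned by the original URL
    live_body   -- content fetched from the original URL
    cached_body -- content of the archived copy
    on_missing  -- optional developer callback, called with cached_body
    """
    if status != 404:
        # The original is still reachable: serve it unchanged.
        return live_body
    if on_missing is not None:
        # Let the developer decide how to present the cached copy.
        return on_missing(cached_body)
    # No callback registered: fall back to the raw cached copy.
    return cached_body
```

For example, `resolve(404, "", page, on_missing=lambda c: "[archived] " + c)` would return the cached page with a banner prepended.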

------
Pxtl
Things like this should be required when posting links in Stack Overflow and
the like.

~~~
toomuchtodo
Rant: I absolutely hate when someone posts an answer in Stack Overflow and it
points to their blog, which then stops working at some point. What if
Wikipedia was just a collection of links to blogs or other datasources
(citations aside)? Put your damn answers in Stack Overflow, people!

------
Noelkd
Do you get 10 additional mummies per month? Assuming this site stays open, it
could be a great help to sites like Stack Overflow.

------
Kiro
What about SEO?

~~~
Axsuul
A 301 redirect should pass link juice fine.

------
contextual
This is an example of how great branding can help explain a product. Love the
name, love the look, the copy is clever... but I don't like the low number of
mummifications you get per plan.

I suggest adding more value or lowering the monthly price.

Overall, very cool.

------
itry
First I had to make an account.

Then it doesn't work. Stuck at "caching page".

I hate you.

~~~
zek
Sorry you are having trouble with it. Sometimes the caches take a bit to
generate. Please feel free to shoot us an email at help@mummify.it if your
problem persists.

~~~
tombrossman
FYI, for those of us with Firefox's NoScript extension, your site is completely
broken. Nothing loads; it is just a blank page with a list of scripts you
intend to run.

I wasn't curious enough about your service to whitelist your site, but I will
leave a comment asking that you consider providing alternate content.

It is rare to have a site completely fail to load, even with all scripts
blocked (though I accept I will have reduced functionality).

~~~
theg2
You're running an extension that purposely disables what makes the majority of
the web work. Unless you disable it, you can't really complain about sites not
working correctly for you.

~~~
tombrossman
HTML is what makes the majority of the web work, with additional functionality
(optionally) provided by scripts. When I see a blank page I wouldn't call that
'not working correctly' I would call it 'not working at all'.

Maybe we agree to disagree now, and you go on blindly trusting all websites to
execute random code on your machine. I am perfectly content to have new
websites look a little funny the first time I visit.

~~~
zackbloom
"HTML is what makes the majority of the web work" - Yes, ten years ago.

The modern internet is JavaScript. The code is executing in a secure sandbox.
If you can get it to do something random on your machine make sure to let
Google know, they'll send you a pretty big check.

~~~
icebraining
It's not just doing something funny with the machines, it's also doing funny
stuff with other websites. Plenty of websites are still vulnerable to XSS and
CSRF.

