
Archiveis: simple Python wrapper for the archive.is capturing service - ingve
https://github.com/pastpages/archiveis
======
jwilk

      domain = "http://178.62.195.5"
      save_url = urljoin(domain, "/submit/")
    

Huh? What's this IP? Why does it send stuff in cleartext?

~~~
cowholio4
That used to be one of archive.is's IPs, but not since February. Definitely
worth a pull request.

[edit] adding screenshot from passive dns lookup
[https://m.imgur.com/a/uW9eL2h](https://m.imgur.com/a/uW9eL2h)
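A hypothetical patch along those lines would just swap the stale IP for the
public hostname over HTTPS (endpoint path taken from the snippet above):

```python
from urllib.parse import urljoin

# Hypothetical fix: use the public hostname over HTTPS rather than a
# stale hardcoded IP over cleartext HTTP.
domain = "https://archive.is"
save_url = urljoin(domain, "/submit/")
print(save_url)  # https://archive.is/submit/
```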

------
jaytaylor
Maybe of interest:

I recently open-sourced an archive.is package and command-line client written
in Go:

[https://jaytaylor.com/archive.is](https://jaytaylor.com/archive.is) (aliased
to
[https://github.com/jaytaylor/archive.is](https://github.com/jaytaylor/archive.is))

    go get jaytaylor.com/archive.is

So far I've been using it and it's worked reasonably well :)

One cool thing: I actually used this Python package as a starting point for
figuring out how to automate archive.is submissions.
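For anyone curious what the automation boils down to, here's a rough
stdlib-only sketch, assuming the public `/submit/` endpoint accepts a POSTed
`url` field and returns the snapshot location in a `Refresh` header (the
archiveis package handles submit tokens and edge cases more carefully):

```python
from urllib import parse, request

def parse_refresh(header):
    # Pull the snapshot URL out of a header like "0;url=https://archive.is/abc".
    return header.split("url=", 1)[1] if "url=" in header else None

def capture(target_url, base="https://archive.is"):
    # Hypothetical submission flow; real clients also fetch a submitid
    # token from the front page before posting.
    data = parse.urlencode({"url": target_url}).encode()
    with request.urlopen(base + "/submit/", data=data, timeout=30) as resp:
        return parse_refresh(resp.headers.get("Refresh", "")) or resp.url
```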

---

RE: archive.is: The person running archive.is deserves a lot of credit; it is
a remarkable system. It may not be immediately clear how challenging it is
to capture and bottle up (safely and _reliably_) the contents of arbitrary
URLs until you actually try to build such a thing. The archive.is maintainer
has implemented it at scale, and plans to keep the content available
indefinitely [0], all on their own dime.

Mad props.

[0] [https://archive.is/faq](https://archive.is/faq)

~~~
m-p-3
Yet I still can't shake the idea of a potential hidden agenda behind the
effort (not saying there is, just curious).

Why go through so much trouble and expense to run it, when doing so doesn't
seem to bring the maintainer anything in return over time?

~~~
stevekemp
For a long time access to archive.is was blocked to the whole of Finland,
which was frustrating.

------
iopuy
I've been a part of the archiving community for a number of years. Is
archive.is now considered the de facto standard for snapshotting sites?

~~~
app4soft
> Is archive.is now considered the de facto standard for snapshotting sites?

No way!

 _web.archive.org_ is the de facto standard for snapshotting[0] sites[1]!

[0]
[http://web.archive.org/save/https://news.ycombinator.com/ite...](http://web.archive.org/save/https://news.ycombinator.com/item?id=16952954)

[1]
[http://web.archive.org/web/*/https://news.ycombinator.com/it...](http://web.archive.org/web/*/https://news.ycombinator.com/item?id=16952954)
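The two URL patterns above are simple enough to build programmatically; a
small sketch (these patterns are taken from the links above, not from any
official API):

```python
def wayback_save_url(url):
    # Ask the Wayback Machine to capture a page right now.
    return "https://web.archive.org/save/" + url

def wayback_snapshots_url(url):
    # List every snapshot the Wayback Machine holds for a page.
    return "https://web.archive.org/web/*/" + url
```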

------
mintplant
I wrote something similar some years back for a bot I used to run. Given an
input URL, it concurrently attempted to create snapshots across as many
archive sites (archive.is, Wayback Machine, etc) as possible, with caching,
retries, smart backoff, and continuous updating of the destination as HTTP
responses arrived. Always meant to release it as a standalone library, but
never got around to doing so. I wonder where the code is now...
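For what it's worth, the core of that design (concurrent submission with
retries and backoff) can be sketched in a few lines of Python; the service
callables here are stand-ins, not real archive clients:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def with_retries(fn, attempts=3, base_delay=0.5):
    # Retry fn with exponential backoff: 0.5s, 1s, 2s, ...
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

def archive_everywhere(url, services):
    # services maps a name to a callable(url) -> snapshot URL. Submit to
    # every archive concurrently and collect results as responses arrive.
    results = {}
    with ThreadPoolExecutor(max_workers=max(len(services), 1)) as pool:
        futures = {
            pool.submit(with_retries, lambda fn=fn: fn(url)): name
            for name, fn in services.items()
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one dead service shouldn't break the rest
                results[name] = exc
    return results
```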

~~~
robryk
Gwern has released such a piece of software:
[https://github.com/gwern/archiver-bot](https://github.com/gwern/archiver-bot)

See also [http://www.gwern.net/Archiving-URLs](http://www.gwern.net/Archiving-URLs)
for a description of their usage of it.

~~~
mintplant
Sort of. Looks like it archives to each service serially, doesn't retrieve the
resulting snapshot URL, and doesn't do any caching, retries, etc. If one
service goes down, starts responding slowly, or blocks your requests, it'll
break. This is like the first version of my own implementation; it took a lot
more work from there to make it robust.

------
colejohnson66
Why is everyone using archive.is now? What happened to web.archive.org?

~~~
toomuchtodo
Archive.is handles JavaScript a bit better than the Wayback Machine, and it
ignores robots.txt
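For context, the robots.txt check that robots-respecting crawlers perform
looks roughly like this (ia_archiver is the user-agent token the Wayback
Machine has historically used; archive.is simply skips this step):

```python
from urllib import robotparser

def allowed_to_fetch(page_url, robots_txt, agent="ia_archiver"):
    # Parse a site's robots.txt and ask whether this crawler agent is
    # permitted to fetch the given page.
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, page_url)
```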

~~~
OskarS
Yeah, but archive.org is run by a reputable non-profit with strong governance
and support, which we can be fairly confident will remain operational for the
foreseeable future. Who runs archive.is? How certain are we that they will be
around 1, 5, 10 years into the future? How certain are we that the links will
still work?

This is archiving that we're talking about, after all.

