I built two CloudFormation templates that let you easily spin up a ton of these things across multiple availability zones. They use the Amazon Linux AMI that exists in each region instead of the ones listed, and build the dependencies and the application on the fly. You can run either template in any region and it should just work.
For those having trouble with the EC2 instructions: I think the Archive Warrior[0] (which is much easier to get up and running on your laptop, etc.) running over my tethered cellphone is my most performant client.
Yahoo don't appear to rate limit mobile devices / IP blocks as aggressively as everything else (probably because cellular providers tend to have many customers behind one IP).
When your spot instance gets killed mid-download, does the Warrior system handle that and re-assign it to someone else immediately?
Or was your spot instance assigned some URLs to download with the assumption that your Warrior would be reliable, and now they won't get reassigned until they check them all at the end, which may be too late?
I always hate it when companies remove user-generated content from the internet. Why doesn't Yahoo just send some DVDs with the content to Archive.org?
...then clearing it with legal and getting approval within the org to actually /do/ this. AFAIK, this would be unprecedented, but would probably win Yahoo! a lot of fans when they need it most. I have no real hope for them or anyone in a similar position to provide user data back to community custodians.
Several groups tried to salvage as much of Geocities as possible before Yahoo killed it. They got most of it (about 1 terabyte) and you can fix most geocities.com links by changing them to point at reocities.com instead. The main reason is that it's user data and deleting it is rude. Second is that you can do interesting analysis on a terabyte of user accounts during the boom of the internet. http://contemporary-home-computing.org/1tb/archives/3297 The third is that it's history! NSFW example from geocities http://contemporary-home-computing.org/1tb/archives/2736
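The link fix mentioned above (pointing geocities.com links at reocities.com) could be sketched like this. It's a minimal sketch that only swaps the hostname; the function name `to_reocities` is made up for illustration, and it assumes the mirror keeps the same paths and that subdomains such as `www.` carry over unchanged, which I haven't verified.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_DOMAIN = "geocities.com"
NEW_DOMAIN = "reocities.com"


def to_reocities(url: str) -> str:
    """Rewrite a geocities.com URL to the same path on reocities.com.

    URLs on any other host are returned unchanged. Assumption: the mirror
    uses the same path layout, so only the hostname needs to change.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased by urlsplit
    if host != OLD_DOMAIN and not host.endswith("." + OLD_DOMAIN):
        return url
    # Keep any subdomain prefix (e.g. "www.") and swap only the domain.
    new_host = host[: -len(OLD_DOMAIN)] + NEW_DOMAIN
    # Preserve an explicit port if the original URL had one.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(to_reocities("http://www.geocities.com/Area51/1234/index.html"))
```

Matching on the parsed hostname rather than doing a plain string replace means a URL that merely mentions geocities.com in its path is left alone.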
When bits are essentially free, there isn't much of a reason not to. Could you imagine how fascinating it would be to be able to dive into the everyday culture of 100, or even 1000 years ago? The anthropological impact of archiving day-to-day life is huge. For once, history may be written by facts, rather than the victors.
That may be a bit naïve, but who cares if they aren't hurting anyone.
Download one of these files:
With a keypair (so you can log in to the host): http://files.wordsaboutbytes.com/yahoo-messages-save.cf.txt
Without a keypair (can’t log in locally, but it will run): http://files.wordsaboutbytes.com/yahoo-messages-save-nokeypa...
Then:
1. Open the console ( https://console.aws.amazon.com )
2. Go to CloudFormation
3. Give your stack a name
4. Browse and select the file you downloaded from above
5. Click Next.
6. Fill in the parameters (number of instances, the nick you want to be tracked with on the Archive Team site, the spot price you are willing to pay, and optionally a keypair if you picked the keypair template).
7. Check the box at the bottom acknowledging that the template will create IAM resources (used by the host to bootstrap).
8. Click Continue.
9. Add tags if you want, or just click Continue.
10. Review. Click Continue.
11. Close.
This will launch however many instances you told it to, as t1.micros, at the spot price you set. When you want to stop, just delete the stack in this console and everything should go away.
I'm running this right now in US-West-2, spread across all 3 AZs there, about 90 instances total, and it's cranking through things.
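If you'd rather not click through the console, the same steps could probably be scripted with the AWS CLI. Here's a rough sketch that just builds the `aws cloudformation create-stack` and `delete-stack` command lines in Python. Big caveat: the parameter keys (`InstanceCount`, `Nick`, `SpotPrice`, `KeyName`) are guesses for illustration, not the real names from the templates, so open the template and check its Parameters section first. The stack name and values in the example are placeholders too.

```python
from typing import List, Optional


def build_create_stack_command(
    stack_name: str,
    template_path: str,
    instance_count: int,
    nick: str,
    spot_price: str,
    key_name: Optional[str] = None,
    region: Optional[str] = None,
) -> List[str]:
    """Build the argv for `aws cloudformation create-stack`.

    The parameter keys below are HYPOTHETICAL; replace them with the
    names actually declared in the template you downloaded.
    """
    params = {
        "InstanceCount": str(instance_count),  # hypothetical key
        "Nick": nick,                          # hypothetical key
        "SpotPrice": spot_price,               # hypothetical key
    }
    if key_name is not None:
        params["KeyName"] = key_name           # hypothetical key, keypair template only

    cmd = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-body", f"file://{template_path}",
        # Same acknowledgement as the IAM checkbox in the console.
        "--capabilities", "CAPABILITY_IAM",
    ]
    if region is not None:
        cmd += ["--region", region]
    cmd.append("--parameters")
    cmd += [f"ParameterKey={k},ParameterValue={v}" for k, v in params.items()]
    return cmd


def build_delete_stack_command(stack_name: str, region: Optional[str] = None) -> List[str]:
    """Build the argv for tearing the stack (and its instances) down."""
    cmd = ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name]
    if region is not None:
        cmd += ["--region", region]
    return cmd


# Example with placeholder values; prints the commands instead of running them.
create = build_create_stack_command(
    "yahoo-messages", "yahoo-messages-save.cf.txt", 5, "mynick", "0.01",
    key_name="my-keypair", region="us-west-2",
)
print(" ".join(create))
print(" ".join(build_delete_stack_command("yahoo-messages", region="us-west-2")))
```

It only prints the commands so you can look them over; to actually launch, pass the list to `subprocess.run(cmd, check=True)` on a machine with the AWS CLI installed and credentials configured.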