This also goes for Google Drive, Dropbox, and many other websites (if not all)
1 Petabyte (and they have multiple)
S3 - $30,000 a month, $360,000 a year
S3 - reduced redundancy - $24,000 a month, $288,000 a year
S3 - infrequent access - $13,100 a month, $157,000 a year
Glacier - $7340 a month - $88,000 a year
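For anyone who wants to check the arithmetic on those tiers, here is a quick sketch. The monthly figures are the ones quoted above; the per-GB rate assumes a decimal petabyte (1,000,000 GB), which is my assumption and not something the price list states.

```python
# Monthly prices for storing 1 PB, as quoted in the list above (USD).
monthly_usd = {
    "S3": 30_000,
    "S3 reduced redundancy": 24_000,
    "S3 infrequent access": 13_100,
    "Glacier": 7_340,
}

GB_PER_PB = 1_000_000  # assumption: decimal petabyte

# Annual cost is simply twelve monthly payments.
annual_usd = {tier: cost * 12 for tier, cost in monthly_usd.items()}

# Implied price per GB per month for each tier.
per_gb_month = {tier: cost / GB_PER_PB for tier, cost in monthly_usd.items()}

for tier in monthly_usd:
    print(f"{tier}: ${annual_usd[tier]:,} a year, "
          f"${per_gb_month[tier]:.4f} per GB-month")
```

The yearly figures in the list are rounded; the exact products come out a little higher for the last two tiers.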
They could also always use tapes, for something as critical as the data that is the blood of your business.
Imagine if Facebook lost everyone's contact lists, how bad would that be for their business? Backups are cheap insurance.
Same problems with buying things like antivirus software or even IT management utilities; when they're doing their job, there's no perceivable difference. It's only when shit goes sideways that the value is demonstrated.
Hell you could take this a step further for IT as a whole; if IT is doing their job well, they're invisible. Then they can the entire department, outsource to offsite support, and the business starts hemorrhaging employees and revenue because nobody can get anything done.
Yeah, but what exactly IS the benefit? The business doesn't die if something really bad happens? Is that really important though?
Consider the two alternatives:
1) The business spends $x00k/year on backups. IF something happens, they're saved, and business continues as normal. However, this money comes out of their bottom line, making them less profitable.
2) The business doesn't bother with backups, and has more profit. The management can get bigger bonuses. But IF something bad happens, the company goes under, but then what happens to the managers who made these decisions? They just go on to another job at another company, right?
I'm not sure I see the benefit of backups here.
I mean the way management gets on me when we have outages, you'd think that was a significant priority?
Tumblr is apparently fragile and tech-debt laden on the engineering side, stagnant on users, and unprofitable. At a certain point, it's a coherent decision to just say "a few days of downtime would seal our fate, the business can only be saved if everything goes right", and not spend any money on mitigation.
So far as Myspace (or Tumblr apparently) is concerned, it is "somebody else's computer of uncertainty".
1PB is nothing today.
1 petabyte = 125 drives = $17,500 (one-time cost).
It will probably cost more to connect all these drives to some sort of a server. Though 125 is within the realm of what a simple USB should be able to handle (127 devices per controller).
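To unpack the drive math, here is a rough sketch. Only the 1 PB, 125 drives and $17,500 figures come from the comment above; the per-drive capacity and price are derived from them, assuming a decimal petabyte (1000 TB) and no redundancy at all.

```python
# Figures taken from the comment above.
TOTAL_TB = 1000       # assumption: 1 PB = 1000 TB (decimal)
DRIVES = 125
TOTAL_USD = 17_500

# Derived values (not stated in the comment).
tb_per_drive = TOTAL_TB / DRIVES    # capacity each drive must have
usd_per_drive = TOTAL_USD / DRIVES  # implied price per drive
usd_per_tb = TOTAL_USD / TOTAL_TB   # implied price per terabyte

print(f"{tb_per_drive:.0f} TB per drive")
print(f"${usd_per_drive:.0f} per drive")
print(f"${usd_per_tb:.2f} per TB")
```

Note that this is raw capacity only: any mirroring or parity would push the drive count above 125.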
Getting petabytes of storage isn't the problem, transferring the data back and forth is.
Taking a full month to recover a downed social media platform isn't really acceptable, but it's still better than being literally unable to recover it at all. Spending a small fortune to ship hardware to an AWS datacenter and convincing/paying them to load it directly would probably also be worthwhile, when we're talking about simply losing a $500M company. If the claim here about "no backup" is true, it's so profoundly stupid that everything I know about best practices sort of goes out the window. Approaches that any sensible person would consider unacceptably slow and unreliable are still a step up from a completely blank playbook.
(I guess the theory might be that Tumblr is such a trashfire it can't be restored, or would lose so much value in days/weeks of downtime that there's no point in even planning for that. Again, I don't really know how you run cost-benefit analyses when it's not entirely clear the project has benefits.)
That's not even talking about availability, as you are now getting into the realm where it starts to get questionable whether even Amazon has enough backhaul capacity available at those locations so that you can actually max out 50+ 10Gb connections simultaneously.
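For a sense of scale on the transfer problem, a back-of-the-envelope sketch follows. It assumes a decimal petabyte and perfectly saturated links with no protocol overhead, so real transfers would be slower; the function name and the link combinations are mine, chosen for illustration.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # bytes; assumption: decimal petabyte


def transfer_days(size_bytes: float, link_gbps: float, links: int = 1) -> float:
    """Ideal transfer time in days over `links` parallel links.

    Ignores protocol overhead, contention and disk throughput.
    """
    bits = size_bytes * 8
    bits_per_second = link_gbps * 1e9 * links
    return bits / bits_per_second / SECONDS_PER_DAY


# One 1 Gb/s link, one 10 Gb/s link, and fifty 10 Gb/s links in parallel.
for gbps, links in [(1, 1), (10, 1), (10, 50)]:
    days = transfer_days(PETABYTE, gbps, links)
    print(f"{links} x {gbps} Gb/s: {days:.2f} days ({days * 24:.1f} hours)")
```

Under these idealized assumptions a single 10 Gb/s link needs a bit over nine days per petabyte, which is why the number of links you can actually saturate matters so much.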
Remember when Microsoft lost all of the data for their Sidekick users? Basically they were upgrading their SAN and things went badly.
Why is your name green?
(Don't ask for a rigorous definition of "one click away", though.)
but the availability numbers speak for themselves :/
- The mobile and desktop sites are completely separate products with vastly different behavior. Some privacy features (relevant to both) can only be accessed on one, some on the other. Tags are rendered in all-lowercase on mobile, but as written on desktop. Block quotes on desktop render as enlarged-font cursive on mobile, for some awful reason.
- Tumblr support(s/ed) font coloring, with no documentation of that fact. You enable it by using the HTML editor and picking among color tags with Friends-themed names like "Monica Pizazz Orange". Oh, and the preview feature won't honor the tags, but actually posting will.
- NSFW content is flagged even in drafts, but if that content is reviewed and approved, it's automatically posted publicly, not returned to drafts where it started.
- Tumblr's desktop sign up page use(s/d) semi-random images from the site as backgrounds. Yes, they did serve cartoon porn to people trying to make accounts.
- Certain posts were impossible to view. Tumblr accounts can have their own themed pages, or simply be popup sidebars over the main news feed. Tumblr "read more" content hiders took users from the news feed to the poster's account - if that account was in popup format, a readmore opened from the wrong location would simply force a circular redirect.
- All Tumblr links are actually pushed through a site-specific forwarding system to track users. As a result, Twitter and many other sites are inaccessible because they view all link clicks as bot traffic from a "single source".
It looks like I was simply wrong on #2, thank you; I remembered it as something that had been around for ages but was noticed, then publicized. If it was found before a planned announcement, that's different.
#3 was fixed within a few days, but frankly I think "posting people's drafts with no warning" is a "damage done" thing, the same as an email client sending drafts to all listed recipients. There are reasons like the "private post" option that you would draft something and never openly publish it, and even beyond that it's reason to draft anything you might not want to publish as-is offline instead of in the site's draft feature.
#6 is complained about by plenty of other people, and happens to me perhaps 90% of the time. I realize I missed one thing: it's mobile-only. Opening a Twitter link on mobile produces a "you're rate-limited" blocking page which sticks around even if you try again later, but choosing "open in Chrome" to escape the Tumblr app immediately solves the problem. I haven't seen comparable behavior in any other app where I've followed Twitter links. Mobile-specific implies it's not purely the link tracking, granted, but it's very much a real Tumblr-specific issue.
aws s3 rm s3://bucket --recursive
It won’t let you delete the bucket from the console, or delete the stack that created it, if the bucket isn’t empty.
aws s3 sync --delete ./ s3://your-bucket/
Do you think it would be good to extend said argument to say scp / ftp clients?
From the S3 management console user guide:
> You can delete an empty bucket, and when you're using the AWS Management Console, you can delete a bucket that contains objects. If you delete a bucket that contains objects, all the objects in the bucket are permanently deleted.
On the other side, the Yahoo services were so heavily integrated that it was hard to carve out any piece of them. The few times we tried, it was a slow and painful process, because Yahoo’s piece was glitchy and unreliable outside of its home turf, and the Tumblr engineers were defensive and argumentative about everything and not willing to help.
Having worked at Yahoo, I understand this stance.
Dell used to offer an online backup service. It wasn't even running on Dell equipment!
Basically they acquired a company that offered the service, and while it would be "nice" if a Dell company ran on Dell gear, a lot of the time it's simply impractical/expensive to overhaul things.
None of the services I've been backing up (Goodreads, Trakt, DeviantArt, Tumblr) are currently covered by Timeliner, but the extra twist of assembling all your data into a single timeline sounds kinda cool, so maybe it's worth contributing a few data sources.
I've had a thought about a monthly "data preservation day" concept, (on the 11th, because: https://xkcd.com/1140/, and yes, I know: https://drhagen.com/blog/the-missing-11th-of-the-month/)
A multiplatform archival tool would be useful.
If one doesn't trust Google/Dropbox et al. to handle the data, then just use two of them, or use a personal solution and one of them. Just don't rely on your home-rolled backup system alone because you don't trust any of the online storage providers.
Of course, but that is not a bad thing per se. Digital data archiving itself is unknown territory. What should happen to the data you put online is uncertain, even in the short term.
The analogy of a "cloud" is revealing. A cloud is fleeting by nature.
Nothing lasts forever online, no matter what they say.
And that is not mentioning compatibility issues or reading old files made with outdated apps.
Some files that are just above 10 years old can be hard to retrieve today.
Number of my files lost by cloud drive providers: 0
Number of my files lost by me because I'm a dumbass: Tens of thousands, including some of my most cherished family photos
Google was preventing the sharing of the files, but that should have been it.
(I'm a googler, but don't have any direct info on this specific incident)
If you want additional peace of mind, you can spend a few pennies and use restic to do regular incremental backups of the archive to Backblaze B2 to add yet another storage location and version index.
If you have/can get command line access to your mailboxes, mairix is pretty good at indexing and searching.
Which is it?
If you hope it stays forever, you should assume it may vanish in a few seconds.
Preferably coming from Apple as a new Time Capsule. But all these vendors want is cloud services revenue.
Interestingly, there are a few Android handset makers with wireless + backup devices that back up your phone while recharging its battery.
It's not necessarily sites going out of business/losing data either. On Bandcamp, Melora Creager used to have three songs up called The Willow Tree Tryptych. I bought it but it's no longer available for download in my account, though it's listed there. I'll have to pirate it somewhere since I don't seem to have a download of it. I'll have to figure that out later since Spotify atrophied my pirating knowledge.
Always archive your digital purchases when you can.
 There was some kind of deal with Trusonic/GarageBand.com where artists were able to access their tracks uploaded to mp3.com and transfer them to GarageBand.com for about a year from 2004 to 2005. It is unclear how many people actually did the transfer (I had an mp3.com page for an electronic music project and was never notified about this). GarageBand.com in turn closed down in 2010, offering migration to iLike. iLike was acquired by MySpace and rolled into MySpace Music in 2012.
I wish I'd archived the band bios too, because now they're completely out of context, just some band names and song titles which aren't Googleable in any way. If any of the bios listed the musicians' names it'd be interesting to see what they're up to now, 20 years later.
(Ouch, that really was 20 years ago.)
Also heavily used drip.fm. When Kickstarter decided to change the service to Patreon-lite I asked about archiving the site because of all of the extra info (forget about the music, I wanted the metadata). They told me they couldn't do it.
I never made any money from mp3.com, but I do have a couple backpacks that they sent me for no real reason--a glorious reminder of the dotcom era.
Though a laudable effort, it seems as much a single point of failure as any other site.
That, and with more diverse stuff in one place, there's also more diverse interest in keeping it going.
so they're not completely centralised but may actually be trying to decentralise things a bit.
There are two mirrors:
Absolutely, otherwise you might as well just be renting.
Also worth noting for this to be an option, you can’t purchase DRMed media. What good is your backup if you can’t decode, authenticate or play your own media?
Though, Waffles had a sizeable fall in activity in the last couple years—notably coinciding with the closure of What. One would think that What-expats would bring in fresh blood.
In 2013, MySpace suddenly purged most of its users’ content, including blogs, custom profiles, videos, and posts. There was no sunset, no death announcement that would allow active users to round up their data. It was an astonishing and quietly reported loss.
I now have an external HDD in a portable fire-safe that I know could go from 100% working to 0% working at any moment. The thing I liked about optical is you could have some hope of recovering most data as the media degraded, and basically all with judicious use of ECC. It's a shame.
Also, next to no one knows what those terms even mean.
They could be a safe choice if you know exactly the type of media you're recording on. I'd just like to caution people against assuming by default that their old CD-Rs will be stable long-term.
Is there a better source for this claim somewhere?
"As a result of a server migration project, any photos, videos, and audio files you uploaded more than three years ago may no longer be available on or from Myspace. We apologize for the inconvenience. If you would like more information, please contact our Data Protection Officer at DPO@myspace.com."
Either way, the message in that banner makes it sound both more severe (includes photos and videos too) and less certain. It's also strange that this is suddenly getting attention today if that banner's been there for so long.
You know I'm starting to think that MySpace isn't well run...
Screenshot of the banner I see for reference: https://imgur.com/GDrYqST
It looks like one of their triages has been to link to YouTube copies of tracks if available.
The artists' pages were completely blank, barring a few pictures and a description extracted via the Wikipedia API. No music available at all. This was in stark contrast to the original MySpace days when the profile pages would be chock full of streaming songs, tour announcements and interactions with fans.
Alexa Rank in United States 2,079
SimilarWeb Global Rank 5,260
SimilarWeb Country Rank United States 1,644
Total Visits ~ 7.53M
I assume Alexa has moved on from such tracking methods?
"Alexa's traffic estimates are based on data from our global traffic panel, which is a sample of millions of Internet users using one of many different browser extensions. In addition, we gather much of our traffic data from direct sources in the form of sites that have chosen to install the Alexa script on their site and certify their metrics."
"Q: What is the “data panel”?
A: Alexa’s data panel is the sample of global internet traffic that is used to calculate Alexa Ranks and estimate non-Certified metrics. The panel is comprised of millions of internet users using one of over 25,000 different browser extensions."
SimilarWeb works similarly too (I actually like it a bit more):
"We leverage hundreds of sources which we categorize into 4 distinct groups: 1. Global Panel Data from hundreds of millions of desktop/mobile devices 2. Global ISP Data from partners with millions of subscribers 3. Public Data Sources from over a billion sites and app pages every month 4. Direct Measurement Data from hundreds of thousands of sites and apps"
What I found disturbing was the login screen that had more than the usual cookie notice:
> I understand that if I choose to post or share any sensitive data (defined as data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data, data concerning my health, sex life, sexual orientation, or children’s data) on Myspace that Myspace will process that data in connection with making the Myspace services available to me and expressly consent to such processing.
I can understand them wanting to ban my far-right-wing-neo-capitalist ramblings about Brexit, my racist 'jokes' and my Heaven's Gate religious rants but 'trade union membership'? Is collective bargaining in the workplace forbidden already?
What I found even more disturbing was how I was one or two clicks away from getting a prostitute. I don't ever get that on the normal internet I know.
I clicked a few other profiles, seems there are marketing losers who didn't get the memo and that is about it within the several astronomical unit search radius.
There are also the 'stars' on there and you will see the same faces. Katelyn Ryan is like the new "Tom Anderson" but I have no idea if she is a bot or if she wants to have my babies. There is no way of finding meaningful information out for that 'connection'. So to 'connect' with a zombie on myspace you would have to Google them.
It is a very bizarre website. It is built by zombies for zombies. And to think there was a time when I could find people on my street and say 'hi' to them via myspace, for them to be real and communicative.
Why they don't just pull the plug I do not know. Presumably they prefer paying the bills. It is shocking to find such a soul-destroying site when you think how much time and money has gone into it.
I think mySpace is worthy of study. Boring Facebook blue was what people wanted, design and redesign didn't keep them on mySpace. Adverts were another thing, by the time it had to pay the billions the likes of Murdoch had spent on it the adverts had to be laid on way too thick. We get told the failure was in allowing user generated themes but that was truly creative rather than selfie-narcissistic. People were engaged in something, not zoned out scrolling through empty lives.
I didn't get my myspace share button. Which is stupid. How hard can it be for a social media site to have a 'share on service x' button? I only wanted one for retro comedy value - the old Twitter 't' logo instead of the bird, the aol email address etc. - just needed that myspace link to complete the set.