[flagged] PSA: Internet Archive "glitch" deletes years of user data and accounts (gingerbeardman.com)
110 points by reaperducer 39 days ago | 32 comments



While I absolutely understand the frustration of losing years' worth of "preservation-level quality" data, you have to admit that the Internet Archive is a free service we do not deserve. They (the folks behind the Internet Archive) do not owe us anything.


> The Internet Archive serves millions of people each day and is one of the top 300 web sites in the world. A single copy of the Internet Archive library collection occupies 145+ Petabytes of server space (and we store at least 2 copies of everything). We are funded through donations, grants, and by providing web archiving and book digitization services for our partners. As with most libraries we value the privacy of our patrons, so we avoid keeping the IP (Internet Protocol) addresses of our readers and offer our site in https (secure) protocol.

QFT

What an unbelievably pissy and deserving tone from the author. If you want to be an archivist and you've permanently lost data that was stored in a single, external location, hopefully it's a learning experience


According to your quote the data was stored in at least 2 places, but Internet Archive are choosing not to restore it? That's a different type of learning experience.


Random text hunk?

QFT? Quantum Field Theory?

> What an unbelievably pissy and deserving tone from the author.

n.b. they are interlocuting with "Patron Services" :X

Pissy? "My support experience with Internet Archive was frustrating and ultimately futile. They did not adequately address my queries and requests" is a very whitebread version of "pissy"

> If you want to be an archivist and you've permanently lost data that was stored in a single, external location, hopefully it's a learning experience

Yeah, hopefully!


QFT == Quoted for Truth


What were the specific things that were deleted, and is there information in there that is damning, so that a "glitch" was able to cover for whatever needed to be scrubbed?

I wonder whether something in the links/uploads for @gingerbread man may have been censored.


Many accounts were deleted because of this "glitch"; it had nothing to do with the content of the accounts, according to the staff that responded.

I guess somebody made a mistake, yet they are not willing to undo all the damage. And so it goes.


Surely, the loss of some user metadata is a catastrophe on par with the burning of the Library of Alexandria. Perhaps next time, the Internet Archive should consult their vast army of paid engineers and their bottomless coffers to ensure such a calamity never befalls humanity again. After all, what's the point of preserving vast swathes of human knowledge and culture if one can't access their personal bookmarks from 2007?


Give them a break, they do a lot of work, all funded via donations.

If your data is so critical that it cannot ever go down, host it yourself.


That's the point, it's their data too.


Profiles over there have been glitchy for a while now. I gave up on favorites years ago and just bookmarked everything of interest in my browser instead. User metadata is probably a low priority item for their staff. Still one of my top five favorite sites on the web.


Sounds to me like they have some very poorly thought-out database schemas with pretty weird foreign key choices... Provenance is not unimportant, but maybe in the case of the Internet Archive it's a lot less important...

And in the end fixing poorly designed accounts and information linked to them can be hard.


> create new accounts and silently relinking their old uploads only if the new account has same email as the old account

Why can't they just relink all the content and force password resets? Something doesn't add up. Looking at the forums, it seems like someone is doing manual remediation.


Every item is linked to the uploader's email address. If someone changes the email of that account, all of the items disappear. This is because, as far as the system knows, the item's email address is not the same as the new email address.

What I find strange is that the uploader's email address is made public on every item they upload. This is not good, especially if someone uses an alias as their account name but has their full name as their email address. It would be surprisingly easy to scrape every email address from every uploader and cause a moderate leak of information.
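
To illustrate the linking problem described above, here is a minimal sketch, assuming a purely hypothetical data model (the `Account` and `Item` classes and their field names are invented for illustration and are not the Internet Archive's actual schema). It shows why a lookup keyed on a mutable email address loses items after an email change, while a lookup keyed on a stable account ID does not.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: int  # hypothetical stable identifier that never changes
    email: str       # mutable: the user may change it later


@dataclass
class Item:
    title: str
    uploader_email: str  # copy of the uploader's email at upload time
    uploader_id: int     # reference to the stable account identifier


def items_by_email(account, items):
    """Find uploads by comparing the stored email with the current one."""
    return [i.title for i in items if i.uploader_email == account.email]


def items_by_id(account, items):
    """Find uploads by comparing the stable account identifier."""
    return [i.title for i in items if i.uploader_id == account.account_id]


account = Account(account_id=1, email="old@example.com")
items = [
    Item("scan-001", "old@example.com", 1),
    Item("scan-002", "old@example.com", 1),
]

before = items_by_email(account, items)       # both items are found

account.email = "new@example.com"             # the user changes their email

after_email = items_by_email(account, items)  # email no longer matches
after_id = items_by_id(account, items)        # the stable ID still matches
```

In this toy model the items themselves are never deleted; only the link from the profile to the items breaks, which matches the "items disappear" behaviour described above.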


Just re the username thing, from what I've seen in the past even changing an active username just once seems to prevent the same user from reverting back to their original username.

So it could be that their account system is a bit primitive in that regard (given that the official replies to this news make it sound like they're utilizing the existing system for restoring content to new usernames, rather than reverting directly via backups). I don't doubt there will be some official clarification though.


Have they actually lost the data these people contributed? Or just the metadata which links it to their account - such that the data is still accessible if you link to it directly (not via the uploader’s user profile)?

I think the answer is the second not the first, but this blog post isn’t very clear as to what is going on.

Losing archived data - that would be more serious. Losing metadata linking it to the uploader - far from ideal but not as bad.


It was as clear as I could make it with the information available. The answer is somewhere between your two hypotheses.

Lots of data is no longer accessible, and links are broken. Uploads are safe but some other archived data is gone. There's more to an Internet Archive profile than uploads, so I'd say it's a somewhat serious situation.

We don't really know the true extent of the problem but we can easily see what's missing (see the lists in the blog posts)


Apparently they're working on restoring stuff?

also:

> This story is still developing, and the press have been notified

jeez calm down with the self-aggrandizing


You get what you pay for.


The Internet Archive is one of the wonders of the digital age. I get a proud humanistic feeling every time I walk by their building on Funston and Clement, a former temple that totally looks the part.

There was a fire there about a decade ago that took out a small addition and a restaurant next door. That addition had a "9/11 in pictures" window display that was interesting and kind of eerie. Luckily they were able to save the main building, or the internet as we know it would be quite a bit worse, and the internet as we knew it might be gone.


This part is a bit odd, from the uhh... "support":

"and, for the last time, the old user name is no longer available"

As we all know, things in databases are immutable and can never be changed.


Geez, Karen, take a breath. That is sad news, and it really sucks to lose data.

At the same time it is a shoestring budget nonprofit, doing tons of good for the world, thanklessly.

It is not a right, a government service, or even a service you pay for. You have no right to demand anything, just because they are kindly providing you value.

Cut them some slack, and stop acting like they owe you anything. Maybe even gasp thank them for doing so much good, with no support, for everyone in the world, for free.


I donate during their annual fundraisers. I don't imagine for a moment that entitles me to anything. But I'd settle for some common courtesy, peace and love from my fellow humans.


Is there an internet archive archive?


I hear these folks keep a backup of everything: https://en.wikipedia.org/wiki/Utah_Data_Center


Yes:

> Do you backup my files?

> Yes. We duplicate/backup all files at various locations


Contrary to popular belief, it’s not turtles all the way down.


Free service loses data and offers poor customer service! News at 11!

I love how the author unironically links to a donation page but doesn’t mention anything about donating to the Internet Archive.


The Internet Archive, where data is kept safe forever, has deleted a large but unknown number of accounts and refuses to map the favorites of these now nonexistent accounts to new accounts signing up with different emails. They call this a glitch. The press has been informed.


Feels bad man... Hope this all gets fixed.

In other news, welcome to the datahoarder community.


Do better, man with ginger beard.


I don't actually have a ginger beard. It's a long story. I try my best, and hope others do too.



