Hacker News

It seems all activity from the past two days has disappeared -- backup storage is something you never regret paying for.

You've probably all seen it by now, but from @HNStatus: [1]

  Server back up and seemingly stable. Now restoring our latest backup to recover from limited filesystem corruption.
[1] https://twitter.com/HNStatus/status/420179162138021888

Yeah, I lost about 200 karma (what was about 15% of my total) in the crash.

Good thing they're just silly internet points :)

I lost 25% of mine! A whole point!

I lost 50%

I hope at some point things reverse, and instead of accruing karma we shed it. When we reach zero--

Thank you for making my paltry 22 lost karma points look so, well, so paltry. ;-)

edit: to clarify "lost" karma

If this were reddit, you'd be getting tipped in Dogecoin as well.

Here, unfortunately, Internet play money is Serious Business.

Just gave you an internet point for having perspective.

And here I was wondering what I was being downvoted for. Turns out: nothing, a post just disappeared :)

Gave you a point so you can feel special again. Here's looking at you, kid.

Indeed, the first time I got something on the front page my karma skyrocketed from 5 to 80-something, and now I've lost it all.

Just internet points, fortunately :)

Well you're number 1 on bestcomments [0]. Apparently, this is the most effective way to rack up karma.

[0] - https://news.ycombinator.com/bestcomments

So, you had about 1333 points?

It looks like you've already exceeded that now... Well done! The internet gods must favour you.

I suspect the favour of the Internet Gods would be greater if the user had 1337 points ;)

I gave you back an Internet point.

I lost a few comments :(

Now commented again... Hope the thread owners read them and reply.

Here you go, I gave you one back.

Yes I noticed this as well from (the lack of) my own comment activity. I don't comment that often but I had written something yesterday that has disappeared.

On a more general note: if you have backups but aren't regularly testing restores from them, then you really don't have backups! As an added bonus, regular restoration tests let you practice for the "real deal", and you learn how long the entire process takes.
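
A restore drill can be as simple as a scheduled job that unpacks the latest archive into a scratch directory and verifies it against recorded checksums. A minimal sketch, assuming tar-based backups; the paths, checksum file, and function name are illustrative:

```shell
# Hypothetical restore test: unpack a backup into a throwaway directory
# and verify the restored tree against previously recorded checksums.
# Schedule it (e.g. from cron) and alert on a non-zero exit.
restore_test() {  # restore_test <backup.tar.gz> <checksums.sha256>
    backup=$1; sums=$2
    scratch=$(mktemp -d) || return 1
    tar -xzf "$backup" -C "$scratch" &&
        ( cd "$scratch" && sha256sum --quiet -c "$sums" )
    status=$?
    rm -rf "$scratch"
    return "$status"
}
```

For example, `restore_test /backups/latest.tar.gz /backups/latest.sha256`; the same failure modes you would hit during a real restore (unreadable media, stale archive, missing files) then surface on a schedule instead of in the middle of an outage.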

One of the nicest ways I've experienced of making sure your backups are good is to sync up your development machine with them occasionally. Obviously there are situations like HIPAA where you can't do that, but if you can, do. You'll catch problems with your backups long before you actually need to use them.

This. Sometime last century I had to restore an old codebase from a tape backup. Step one: find the correct drive...

We'll never need that old repo again.

IMO data loss is less a symptom of untested backups than it is of developer-managed systems. I wonder if ycombinator has a sysadmin (who isn't also a developer)?

That's a shame; I was really looking forward to the comments for the article below. Unfortunately I had it loaded, but hit Ctrl+R (like I sometimes do) and lost it forever. :/

The google cache got a few comments, but very few.

I missed the original posting of https://news.ycombinator.com/item?id=7015438 and it's right up my alley (now it's tomorrow morning's reading :) )

I've reposted a link to the original here - https://news.ycombinator.com/item?id=7015767

I really don't want credit, I just want the original question and the couple of responses to have the opportunity to see the light of day, despite the HN outage. So please vote up the original post!!!

"backup storage is something you never regret paying for"

Indeed, but the time and CPU it takes to do backups can be a very nasty trade off.

For that matter, as I write this, due to a mistake I made after an 85-minute power outage yesterday, I'm just now doing my daily incremental backup of my home machines to an LTO-4 tape drive. Keeping that drive fed fast enough to prevent "shoe-shining" took some effort: Bacula spools up to 100G at a time to a partition on a single 15K disk on a separate controller. But if I had an LTO-5 drive, from what I've heard there's no single disk in existence that could keep up with it (not counting SSDs, which are a very poor match for this use case).

My feed array to the LTO-5 drive is four 2TB (Hitachi) disks in RAID-10. Backup strategies, much like build systems, are some constant factor harder than they appear.
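
The spooling setup described above maps onto a couple of Bacula directives. A hypothetical fragment (the directive names come from Bacula's documentation; the device names, paths, and sizes are illustrative):

```
# bacula-sd.conf: tape device with data spooling
Device {
  Name = LTO4-Drive
  Media Type = LTO-4
  Archive Device = /dev/nst0
  Spool Directory = /bacula/spool     # partition on the separate 15K disk
  Maximum Spool Size = 100G
}

# bacula-dir.conf: turn spooling on for the backup job
Job {
  Name = "HomeIncremental"            # other Job directives omitted
  Spool Data = yes
}
```

Spooling decouples the slow, seeky reads of the source filesystems from the tape write, so the drive can stream at full speed while each spool chunk is despooled.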

I'd like to migrate to ZFS but have yet to. Still just running EXT4.

HN should be on a replicated data store like Riak. Losing a node or two shouldn't take the system down; it should at least run in a degraded state (read only) until hardware is restored.

Were they not using raid or performing multiple database writes? A mechanical hard drive failure is pretty common and can be mitigated fairly easily.

RAID arrays fail all the time; the system has famously been one server, and the only visible recent scaling work has been front end caching.

Edit: the code has been public for a long time, and there is no database to replicate. The site ran as a single server for years, and it is unlikely the front end caching has changed anything about the "database" components.

Since RAID failures actually are somewhat common, they are probably looking at a higher level replicated storage system now, a la DRBD, or some kind of distributed file system, a la Gluster.

Doesn't RAID usually at least give some warning if you watch the syslogs? (Genuine question; I am not a sysadmin. We have Linux servers with Hetzner on software RAID 1, and a couple have had single-disk issues which we spotted straight away in Zenoss and had Hetzner replace the disk. Am I incorrect in thinking this is normal?)
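
For Linux software RAID specifically, the kernel exposes array health in /proc/mdstat (a missing mirror member shows up as "_" in the status brackets), and mdadm's monitor mode can mail you when a device fails. A small sketch; the awk helper is hypothetical, but the bracket notation is how mdstat reports member state:

```shell
# Scan /proc/mdstat-formatted text for degraded arrays. mdstat marks
# each member as "U" (up) or "_" (missing), e.g. "[2/1] [U_]" for a
# two-disk mirror running on one disk.
degraded_arrays() {
    awk '/^md/ { dev = $1 }
         /\[[U_]*_[U_]*\]/ { print dev }'
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
#
# For proactive alerts, mdadm's monitor mode (with MAILADDR set in
# mdadm.conf) emails on fail/degrade events:
#   mdadm --monitor --scan --daemonise
```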

RAID is a method for surviving hardware failure. If you have a software failure in, say, the VFS layer, RAID will happily accept the order to write garbage all over your inode trees and will carefully store and make sure that all the appropriate disks can return the same garbage every time. And yes, it should warn you when you need to replace a disk which is no longer returning the right garbage.

Similarly, if you rm -rf a vital directory tree, RAID can ensure that it goes away reliably.

Yes, you're right. So replies will now switch to how RAID doesn't stop you from deleting data, because... well, I have no idea why. It seems to just be a law of nature.

DRBD and Gluster are not any more resilient to filesystem corruption than a RAID device is. In this kind of case you hope for either real-time replicated storage on a completely separate physical host or very recent backups.

What are DRBD and Gluster if not real-time replicated storage on completely separate physical hosts?

Filesystem corruption without hardware failure is far rarer in my experience. Have you seen an instance that wasn't a proverbial user error?

You never ran ReiserFS, I see...

Back in ~2004 I watched IT spend a whole day recovering our 60-person startup's main Linux NFS server, due to a software bug in the storage driver. Had to rebuild the whole system from backups.

Yes, I have in fact, in a DRBD configuration. The bug was esoteric, but it happened and was not the result of user error. DRBD and Gluster both allow faults in the VFS layer to propagate to all replicas.

Gluster should, by design I think, avoid replicating filesystem metadata corruption (but it would replicate internal metadata issues in files on top of the filesystem); DRBD won't... At high volumes I still regularly break Gluster, but it'd probably be OK for lower bandwidth/ops use. Not sure what the HN disk usage pattern is, though.

IIRC GlusterFS was the thing that gave me multiple identically-named files in the same directory. Useless.

Or, I dunno... writing to S3? ;-)

Databases? What databases?

HN is persisted to flat files.

I guess he meant having two separate logs: one for production, and a secondary one holding a journal. In that case you restore the original data from backup, then replay the rest from the external log. That's the solution I'm using for really important data where I cannot afford any data loss, even if downtime is acceptable: each commit goes to two separate systems, but the secondary system is only a journal which can be replayed.
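
The dual-log idea can be sketched in a few lines of shell; the store layout, journal format, and function names here are all hypothetical:

```shell
# Minimal sketch: every commit writes to the primary store AND appends
# to a tab-separated journal, so anything written after the last full
# backup can be replayed from the journal alone.

commit() {  # commit <store-dir> <journal-file> <key> <value>
    store=$1; journal=$2; key=$3; value=$4
    printf '%s\n' "$value" > "$store/$key"            # primary write
    printf '%s\t%s\n' "$key" "$value" >> "$journal"   # secondary journal
}

replay() {  # replay <store-dir> <journal-file>  (run after restoring backup)
    store=$1; journal=$2
    while IFS="$(printf '\t')" read -r key value; do
        printf '%s\n' "$value" > "$store/$key"
    done < "$journal"
}
```

The trade-off is one extra write per commit, but since the journal is append-only and sequential it's usually cheap next to the primary write.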

What's odd is that if you look at your submission history, you can up vote your own submissions.

Maybe that's nothing new, but I just noticed it. Seems like a bug.

> What's odd is that if you look at your submission history, you can up vote your own submissions.

It doesn't seem to do anything though.

Several comment threads that I was following when it went down are gone ("No such item."), although their original links are still valid.

Indeed you are right; the original links still work for me. Though the ones I checked so far look exactly the same as the Google Cache version, so I don't know what happened there.
