
Disclosure: I work on Google Cloud.

I’m a little confused by this part:

> While MySQL data backups occur every four hours and are retained for many years, the backups are stored remotely in a public cloud blob storage service. The time required to restore multiple terabytes of backup data caused the process to take hours. A significant portion of the time was consumed transferring the data from the remote backup service. This procedure is tested daily at minimum, so the recovery time frame was well understood, however until this incident we have never needed to fully rebuild an entire cluster from backup and had instead been able to rely on other strategies such as delayed replicas.

At first, I had assumed this was Glacier (“it took a long time to download”). But the daily retrieval testing suggests it’s likely just regular S3. “Multiple terabytes” sounds like less than 10 TB.

So the question becomes “Did GitHub have less than 100 Gbps of peering to AWS?”. I hope that’s an action item if restores were meant to be quick (and likely this will be resolved by migrating to Azure, getting lots of connectivity, etc.).
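
Back-of-the-envelope, with numbers I’m assuming rather than anything GitHub published: even a “small” multi-terabyte restore is dominated by link speed.

    # Hypothetical restore-transfer math (assumed sizes, not GitHub's figures).
    TB = 10 ** 12  # decimal terabyte, in bytes

    def transfer_hours(data_bytes, link_gbps):
        """Hours to move data_bytes over a link_gbps (gigabits/s) link."""
        seconds = data_bytes * 8 / (link_gbps * 10 ** 9)
        return seconds / 3600

    data = 8 * TB  # "multiple terabytes", assumed to be under 10
    for gbps in (1, 10, 40, 100):
        print(f"{gbps:>3} Gbps: {transfer_hours(data, gbps):6.2f} hours")
    #   1 Gbps:  17.78 hours
    #  10 Gbps:   1.78 hours
    # 100 Gbps:   0.18 hours

At 10 Gbps an 8 TB pull alone eats most of a two-hour window, which is why the peering bandwidth question matters.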




We’ve been burned by long restores too, not only from transferring out of a remote cloud but also from the IOPS requirements of moving tons of small files. Now in each DC we keep a day or two of backups on several servers loaded with enterprise PCIe SSDs and 40 Gbps NICs to reduce that issue. Curious how many terabytes were involved (or whether it was more of an IOPS issue)?
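
To illustrate the small-file problem (illustrative numbers only, not measurements): once per-object overhead dominates, total time scales with file count rather than bytes.

    # Sketch of bandwidth time vs. fixed per-object cost (made-up numbers).
    # Assumes serial fetches; parallelism helps, but IOPS limits cap it.
    def restore_seconds(total_bytes, file_count, gbps, per_file_s):
        bandwidth_s = total_bytes * 8 / (gbps * 1e9)
        return bandwidth_s + file_count * per_file_s

    one_tb = 1e12
    # The same terabyte over a 40 Gbps NIC, with 20 ms of overhead per object:
    print(restore_seconds(one_tb, 1, 40, 0.02) / 3600)           # ~0.06 h, one big file
    print(restore_seconds(one_tb, 10_000_000, 40, 0.02) / 3600)  # ~55.6 h, 100 KB objects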


It sounds to me like the time involved was rebuilding MULTIPLE replicas.


Maybe I overestimated the “significant” part:

> A significant portion of the time was consumed transferring the data from the remote backup service.

I get the time to rebuild part, but I’m curious about the download part.


This was an interesting point, but prefixing it with an appeal to authority was both needless and distracting. Why did you include it?


I didn't read the Google Cloud part as an assertion of authority, just as a disclosure - if they're talking about competitors (and especially how the choice of competitor negatively impacted GitHub) I appreciate it.

(disclosure - I work for a competitor, not on cloud stuff)


I also appreciate it. It’s very common for owners/employees to criticize or attack competitors online anonymously. While the GP wasn’t attacking, it’s just nice to know he works for a competitor.


That wasn’t my intent, quite the opposite actually. I don’t want it to read as “S3 is awfully slow, that’s what the problem was. If they were using GCS then < awesome outcome >”. That I work at a competing vendor is useful context in this case.


>I don’t want it to read as “S3 is awfully slow, that’s what the problem was. If they were using GCS then < awesome outcome >”.

You don't mention Google at all outside of the opening statement so who would read it that way?


Because if he doesn't disclose it and we find out, it looks terrible and like he's trying to hide something.


If I know a competitor wrote it, I don’t read it as negative. To me it’s a welcome courtesy, but as you say, I couldn’t assume it if it were absent; so I’d probably have read the same sentence, without the preface, as painting a negative picture of S3.

(Plus, I'm always pleased to see someone not call it a 'disclaimer'!)


Disclosure of interest should be common practice across all industry and web discussions. It provides a lot of additional perspective, yet most places do not adhere to it.

Fortunately HN has kept the culture of disclosure intact.


Google cloud is a significant direct competitor to AWS. Anything criticizing AWS reads that way to some extent.


He mentions Google frequently in his posts for context. It's very useful, and there was nothing disparaging that he said.



