Seems like they have it mostly resolved: the relationships were restored from a backup taken 3 hours prior to the incident, and they are working to restore the data for the 313 (sic.) items still affected: https://gitlab.com/gitlab-org/gitlab/-/issues/348547
Sorry - I meant "as cited", i.e. there were exactly 313 items still requiring attention (presumably the number of links that were created between the last backup and the incident). This is a very low number, bordering on something you could fix by hand, or at least with a small script.
Worst case in 2021 should be a daily full backup plus transaction logs. That's enough to get you point-in-time recovery.
On AWS, EBS snapshots are incremental (I didn't know that until studying for AWS certification), so you could schedule a snapshot every 5 minutes if you wanted.
Looking at the timeline, I'm just gonna repeat something I said in another thread this week: it's easy to underestimate how long it takes to do basically anything with a few hundred GB of important data.
For a full backup of a huge DB a three-hour window is pretty small, but I doubt that is what is happening here.
A far more common setup is regular full backups, often daily though sometimes more or less frequent, with much smaller transaction log backups at high frequency, perhaps every 15 minutes, in between. That way you lose at most what has happened since the last log backup, and you can restore to any point in time between the full backup and the last log backup. Restoring takes more effort: you first restore the full backup, then replay the log backups in sequence up to the point you care about.
Sometimes there is a third layer between: differential backups. These are usually much smaller than a full backup, while also being smaller (and less faff/time to restore) than the log backups for the period they cover, but they don't offer point-in-time recovery on their own.
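The full-plus-log scheme can be sketched in a few lines. This is a hypothetical toy model (a "database" is just a dict, the log stores timestamped writes), not how any real engine lays out its WAL, but the restore logic is the same shape: load the full backup, then replay log entries up to the target time.

```python
def take_full_backup(db: dict) -> dict:
    """Copy the whole database (flat dict, so a shallow copy suffices)."""
    return dict(db)


def apply_write(db: dict, log: list, ts: float, key, value):
    """Perform a write and record it in the transaction log."""
    db[key] = value
    log.append((ts, key, value))


def restore_to(full_backup: dict, backup_ts: float, log: list,
               target_ts: float) -> dict:
    """Point-in-time restore: full backup, then replay log up to target_ts."""
    db = dict(full_backup)
    for ts, key, value in log:
        if backup_ts < ts <= target_ts:
            db[key] = value
    return db
```

Any `target_ts` between the full backup and the last log entry is reachable, which is exactly the point-in-time property described above.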
Out of curiosity, how do transaction logs handle things like created_at fields, or randomly generated UUIDs, which rely on contextual data? Is the server time/RNG seed faked for each replayed transaction?
The transaction log is everything that gets done, not how it is done, so it can be replayed reliably. How values that are random, time sensitive, or otherwise arbitrary, are derived during a transaction is not important. What is logged is the fact that the values x/y/z were recorded in row 123,456 which is in page 987,654,321¹ which means that when the log is replayed you end up with exactly the same state the original database was in at the point the log is run to.
¹ In fact it could just be logged at the page level; the granularity of the log structure will vary between systems. If logged at the row level it may be the case that the physical datafile after restore is not exactly the same, but the data will still be "random" values & all.
In most cases, transaction logs/write-ahead logs will contain the return value of non-deterministic functions like created_at timestamps or random UUIDs, instead of the function call.
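To make that concrete, here's a minimal sketch (toy structures, not any real engine's WAL format): the write path calls `uuid4()` and `now()` exactly once and logs the resulting values, so replay never re-evaluates them.

```python
import datetime
import uuid


def insert_row(table: list, wal: list, payload: str):
    """Insert a row, logging the concrete values - not the function calls."""
    row = {
        "id": str(uuid.uuid4()),  # evaluated once, at original write time
        "created_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "payload": payload,
    }
    table.append(row)
    wal.append(("INSERT", dict(row)))  # the log carries the computed values


def replay(wal: list) -> list:
    """Rebuild the table from the log; no uuid4()/now() calls happen here."""
    table = []
    for op, row in wal:
        if op == "INSERT":
            table.append(dict(row))
    return table
```

Replaying the log any number of times yields the identical table, UUIDs and timestamps included, because those values were fixed at log-write time.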
I was confused by the title. Apparently epics are a way to organize issues by themes across projects. GitHub doesn't have this concept, in case anyone else was wondering.
To add to the confusion of the agile term, the lay term and the company name, I know epics as some part of the React state-management library, Redux Observable...
Huh. While trying to figure out the reasoning behind the name used, TIL that "epic" has acquired a new meaning of "(computing) In software development, a large or extended user story."
The name is an extension of '[user] story' - in literature, an epic is a collection of related stories (as in, The Epic of Gilgamesh), so they decided to call a collection of User Stories an Epic.
Scroll down to "break issues into actionable tasks", and they have an example that's even labeled as an "Epic". They are definitely headed in the same direction as GitLab.