> I fell asleep at my desk. In a dream, the ghost of a butterfly from the future whispered to me "the dollar hit 0 bitcoin". I wake up, people around me dabbing the blood from their eyes. Progress has been made but not enough. I crack my whip -- "we aren't getting to zero like this!". The flogging will continue until morale improves. Expect another update after I make my rounds. I'm sorry I've let you down.
Jan 12, 03:11 UTC
But not before archive.org saved it for posterity: https://web.archive.org/web/20180112034351/https://status.kr...
But definitely not a good moment to make a joke.
I guess it is all hands on deck and there isn't a large enough team to rotate shifts? Or the issue is very complex and you need all your best people around to fix it.
As someone who runs a high availability website, I can feel for them, but it is really bad to be offline this long. I would look at making some changes to that team or addressing an understaffing issue. This should not happen; it was preventable and foreseeable, and a risk mitigation strategy could have been in place.
Bummer because I’m looking to leave Coinbase. Looks like everyone in this space is nerf-grade.
They don't have a single person who can use Photoshop and put up a proper maintenance page? They didn't already have a failover maintenance page? This shit is Web Hosting 101 stuff from decades ago.
This is new levels of amateur idiocy.
Feel free to explain to me how the current result from http://kraken.com/ is at all even remotely professional.
Kraken was doing $711M in daily volume before the maintenance. 
Only the top 8 exchanges have cracked a billion in daily volume. 
Starting an unannounced maintenance in the middle of the day is crazy, even if it was only for the original 2 hours. Over 24 hrs now.
I don't know if this is due to incompetence or intentional.
If they had announced that the site was going down for a major upgrade, the smart thing for users to do would have been to transfer their crypto to their offline wallets, and possibly withdraw their fiat as well. This is what many do for a major fork of a single coin.
Separately, there is for sure going to be a ton of money lost, because Kraken supports margin trading and the market took a 15% dive right around this time. I suspect at least a few traders got a margin call and are screwed. These people would have been able to manage their positions if the service hadn't had extended and unannounced downtime.
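Some back-of-the-envelope numbers on why that matters (purely illustrative; these are not Kraken's actual margin parameters): a 15% move against a leveraged long wipes out a multiple of that from the trader's own collateral.

    # Hypothetical 5x long hit by a 15% price drop
    collateral = 10_000                 # trader's own funds (USD)
    leverage = 5
    position = collateral * leverage    # 50,000 USD of exposure
    price_drop = 0.15

    loss = position * price_drop        # 7,500 USD
    equity_left = collateral - loss     # 2,500 USD, i.e. 75% of margin gone
    print(loss, equity_left)

With the site down, a trader in that spot can neither post more margin nor close the position.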
Going to kraken.com tells you all you need to know about these folks, all bad.
Edit: current status: https://status.kraken.com/incidents/nswthr1lyx72
So basically a software upgrade/bug can now cause a bank run? Interesting times.
That may help stabilize things a bit.
I moved all my coins off of there a few weeks later, because it was becoming impossible to get a buy or sell order in.
In other words, to users of the site, this isn't surprising.
Maybe they got so rich from the high crypto valuations that most of the senior team are on their private islands already?
They need a system fix, I really don't trust them.
Run major updates to production...
Near a weekend. Surely they're not going to complete the go-live later today, are they??
I'm assuming of course all of this has been heavily tested on test networks for a couple of weeks... and of course this is just some deployment issue... (Sarcasm)
If you have the people available, then weekends are best, because it's usually a lower impact for end users.
It wouldn't surprise me if crypto volumes saw a similar trend... though in reality I suspect people account for a minority of volume when compared to bots.
I guess this was some type of breaking change, like a DB restructure that makes rolling back problematic? And then somehow this wasn't tested enough?
Massive schema migrations would be my guess. And I'm willing to bet that their test/QA environment only has a small set of test data and not a replica of production.
1) don't change existing schema (don't alter columns)
2) introduce new schema, side by side (add new columns, all nulls, without defaults)
3) update code to read the new schema first and fall back to the old schema if nothing is found
4) create triggers so that a write to either the new or old schema updates the other
5) have a separate process to sequentially migrate data from old schema to new schema
6) once all records are migrated to new schema, start slowly removing code that reads old schema
7) once nothing reads old schema, delete it
Handling schema changes is a tricky process spread across multiple steps, each of which can be rolled back on its own; a rough code sketch of the steps above follows below.
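A minimal runnable sketch of the expand/contract steps above, using sqlite3 so it works anywhere; the table and column names (users, email, email_normalized) are invented for illustration and are of course nothing like a real exchange schema:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    db.execute("INSERT INTO users (email) VALUES ('Alice@Example.COM'), ('bob@example.com')")

    # Step 2: introduce the new column side by side, nullable, no default.
    db.execute("ALTER TABLE users ADD COLUMN email_normalized TEXT")

    # Step 4: a trigger keeps the new column in sync with writes to the old one
    # (a real migration would also cover inserts and the reverse direction).
    db.execute("""
    CREATE TRIGGER users_sync_email AFTER UPDATE OF email ON users
    BEGIN
        UPDATE users SET email_normalized = lower(NEW.email) WHERE id = NEW.id;
    END
    """)

    # Step 5: backfill old rows in small batches, never locking the table for long.
    def backfill(batch_size=1000):
        while True:
            rows = db.execute(
                "SELECT id, email FROM users WHERE email_normalized IS NULL LIMIT ?",
                (batch_size,),
            ).fetchall()
            if not rows:
                break
            db.executemany(
                "UPDATE users SET email_normalized = ? WHERE id = ?",
                [(email.lower(), row_id) for row_id, email in rows],
            )
            db.commit()

    # Step 3: readers prefer the new column and fall back to the old one.
    def get_email(user_id):
        row = db.execute(
            "SELECT email_normalized, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row[0] is not None else row[1]

    backfill()
    print(get_email(1))  # alice@example.com

Each step ships on its own and can be reverted without losing data that has already been written.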
Edit: an example that comes to mind is deploying a MySQL cluster in Docker with persistent storage. Playing with fire.
They should have put a Kraken 2.0 trade engine alongside the first one, and moved people gradually there. It doesn't matter how confident they were with the upgrade before it happened, it's crypto, everything is new. A few lines of wrong code and you can lock millions of dollars in multi-sig wallets.
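If they had gone that route, the routing layer itself is the easy part. A hypothetical sketch (engine names and percentages invented, not anything Kraken actually runs) of deterministically sending a small, growing slice of users to the new engine while the old one stays the default:

    import hashlib

    def rollout_bucket(user_id: str) -> int:
        # Stable bucket 0-99 per user, so a user never flips between engines.
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return int(digest, 16) % 100

    def pick_engine(user_id: str, new_engine_percent: int) -> str:
        if rollout_bucket(user_id) < new_engine_percent:
            return "engine_v2"   # new trade engine
        return "engine_v1"       # existing, battle-tested engine

    # Day 1: 1% of users; widen only after v2's books reconcile exactly with v1's.
    print(pick_engine("user-42", new_engine_percent=1))

Bucketing on a hash of the user id pins each user to one engine, so open orders never straddle two order books.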
I have most of my funds there because their EUR SEPA transfers worked very well. I really hope they can get back in shape once the site comes back online.
As you say, why didn't they just revert? Are they not able to? What are the steps in their system upgrade? Are they moving to new hardware? What's their setup like? What software are they running (custom written surely, but what language, which database technologies?)
Incidents like this make me curious, and I would love to read the post mortem on something like this.
Not explaining why you're offline for 24 hours doesn't help people trust you.
If a table that connects user accounts to Kraken-owned wallets is corrupted and not recoverable, people will be out millions. For some, that would be the equivalent of your 401(k) provider issuing a post-mortem for losing all of your retirement savings.
If this worst case scenario happened they are likely in severe damage control.
The most likely explanation, though, is that things are just taking longer than expected in upgrading what is by all accounts a very technical and convoluted system.
Whoever their CTO is, is clearly the worst kind of incompetent and the team is the most amateur I have ever seen in my 20 years of Internet systems management.