I'm a PagerDuty employee and am the same individual who made this post on the last HN thread:
Unfortunately, there are some facts in Linode's post that are not correct.
>On July 9 a customer notified us of unauthorized access into their Linode account. The customer learned that an intruder had obtained access to their account after receiving an email notification confirming a root password reset for one of their Linodes. Our initial investigation showed the unauthorized login was successful on the first attempt and resembled normal activity.
This is almost correct. Someone got into our account on the first try; they knew the password and had a valid TOTP token. However, it wasn't Linode's email that notified us, it was our intrusion detection system.
>On July 12, in anticipation of law enforcement’s involvement, the customer followed up with a preservation request for a Linode corresponding to an IP address believed to be involved in the unauthorized access. We honored the request and asked the customer to provide us with any additional evidence (e.g., log files) that would support the Linode being the source of malicious activity. Neither the customer nor law enforcement followed up and, because we do not examine customer data without probable cause, we did not analyze the preserved image.
This is partially correct. We reached out to Linode to inform them that we had seen suspicious activity within their network, and we provided any and all logs we had. We also informed them that we had passed the information on to law enforcement, in case they wanted to proactively preserve the data. They knew we had no further information, and as such didn't ask for anything additional.
>On the same day, the customer reported that the user whose account was accessed had lost a mobile device several weeks earlier containing the 2FA credentials required to access the account, and explained that the owner attempted to remotely wipe the device some time later. In addition, this user employed a weak password. In light of this information, and with no evidence to support that the credentials were obtained from Linode, we did not investigate further.
The story behind the mobile device is totally incorrect. The user did not lose their device; the device had been restored (intentionally wiped) 9 months prior to the compromise. The user got a new device and never set up MFA on the new phone after wiping the old one. The device was, and still is, in the user's possession, and has not been powered on in a long while.
The user who was compromised was no longer in possession of their MFA secret. They deleted it, intentionally, with no backups existing.
If anyone here is going to be at Velocity 2016 in Santa Clara, or at Monitorama PDX 2016, I'll be giving talks on how PagerDuty was compromised back in July. This includes full details of how this happened, including the details of the mobile device referenced above. There are some details in my talk that don't line up with the blog post provided by Linode. :)
I can't even begin to imagine how much money your customers could lose if PagerDuty infrastructure went down.
Keep in mind Linode had a pretty good reputation at the time. I wouldn't second-guess them on the call, as I probably would have made the same one back then. They didn't dig in or lock themselves in with debt, and bailed when the time was right.
Are you able to elaborate on this? I understand you may not want to name specific vendors/products in the name of operational security but it sounds like in this scenario whatever is in place actually did its job.
Most of this is still valid. There may be some differences as we've improved our configuration over time.
We use OSSEC for host-level intrusion detection. This fired off quite a few alerts as the malicious party began to log in as root on the serial console, amongst other things.
We have also supplemented it with other tools, such as an in-house wrapper around nmap, to alert us to hosts that don't match their expected network configuration. So when ports get opened incorrectly, someone is alerted, usually within a minute.
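For anyone curious what that kind of check looks like, here's a minimal sketch of the idea: scan a host, diff the observed open ports against an expected baseline, and alert on any mismatch. This is purely illustrative (PagerDuty's actual wrapper drives nmap; the `scan_ports`/`audit_host` names and the plain-socket scan here are my own stand-ins), but the diff-against-baseline logic is the core of it.

```python
import socket

def scan_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` that accept TCP connections on `host`.

    A real implementation would shell out to nmap for speed and accuracy;
    a bare connect() scan is used here to keep the sketch self-contained.
    """
    open_ports = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:
                open_ports.add(port)
    return open_ports

def audit_host(host, ports_to_check, expected_open):
    """Compare observed open ports against the host's expected baseline.

    Returns (unexpected_open, unexpectedly_closed); a monitoring loop
    would page someone whenever either set is non-empty.
    """
    observed = scan_ports(host, ports_to_check)
    return observed - expected_open, expected_open - observed
```

Run on a schedule (say, every minute, matching the alert latency mentioned above) against a per-host expected-ports config, any non-empty diff becomes an alert.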
How many times has this kind of thing happened to them now?
I have been doing some research in my (limited) spare time to try to find a new provider, but I still have not made the switch from Linode.
We operate all of our datacenters in a multi-master configuration across the WAN, so latency is something we need to be mindful of. We were also in the middle of an emergency situation that required a migration, so we needed a solution that would allow us to evacuate quickly.
In the end we decided that we wanted a provider who supported a VPC-like network configuration and was roughly within the same latency profile as Linode with respect to US-WEST-1 and US-WEST-2.
If timing hadn't been a concern, we may have chosen differently. We felt we didn't have the luxury of time.
Bah. More virt. Throw down some cash on a cage and fire up AWS Direct Connect, man. It's bliss, you get more choices out here than I did with us-east-1, plus you guys can afford it now.
We plopped in physical gear for a couple parts of our AWS infra a few employers ago and cut our Amazon opex by something like two-thirds. Not helpful given you're using Azure as a reliability strategy, but worth thinking about for your write-heavy databases, for example.
/edit: never mind, they are not, as per another reply in this thread