Identical Droplets in the DigitalOcean: Regenerate your Ubuntu SSH Host Keys now (missingm.co)
285 points by jlund on July 29, 2013 | 106 comments

SSH host keys are problematic on cloud servers, not just because of this problem, but also because if the cloud provider does the right thing and generates the SSH host key on the first boot, the key is generated when the system has very little entropy available. The primary sources of entropy on Linux are keyboard/mouse input, disk latency, and network interrupts. There's obviously no keyboard/mouse on a server, and in an SSD environment like DigitalOcean, disk latency is quite uniform and thus useless as a source of entropy.
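To see the first-boot problem on a given guest, you can read the kernel's own entropy estimate. A minimal Python sketch, assuming a Linux guest with procfs mounted; the 256-bit "starved" threshold is an illustrative assumption, and newer kernels have changed how this number is reported, so treat it as a rough indicator only.

```python
from pathlib import Path

# Standard procfs file holding the kernel's entropy estimate, in bits.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text: str) -> int:
    """Parse the contents of entropy_avail (a single integer)."""
    return int(text.strip())


def entropy_bits(path: Path = ENTROPY_AVAIL) -> int:
    """Read the kernel's current estimate of available entropy, in bits."""
    return parse_entropy(path.read_text())


def looks_starved(bits: int, threshold: int = 256) -> bool:
    """Flag estimates below `threshold` bits.

    The 256-bit threshold is an arbitrary assumption for illustration,
    not a kernel constant.
    """
    return bits < threshold


if __name__ == "__main__" and ENTROPY_AVAIL.exists():
    bits = entropy_bits()
    print(f"entropy_avail = {bits} bits", "(low)" if looks_starved(bits) else "")
```

Running it right after a fresh droplet boots, and again after it has been up for a while, is one way to see the difference the comment describes.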

Linux distros mitigate the cold boot entropy problem by saving some state from the RNG on shutdown (on Debian, it's saved in /var/lib/urandom/random-seed) and using it to seed the RNG on the next boot. On physical servers this obviously isn't available on the first boot, and on cloud servers, the provider often bakes the same random-seed file into all their images, so everyone gets the same seed on first boot (fortunately this doesn't harm security any more than having no random-seed file at all, but it doesn't help either). What cloud providers should really do is generate (from a good source of randomness) a distinct random-seed file for every server that's created, but I haven't seen any providers do this.
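The fix proposed here, a distinct seed per server drawn from a good randomness source, could look roughly like this on the provisioning side. A minimal sketch, assuming the provisioning host can mount the new guest's filesystem at some path (`guest_root`, hypothetical) and that a 512-byte seed matches what the distro's save/restore script expects; neither detail comes from the comment.

```python
import os
import secrets
from pathlib import Path

# Assumed seed size: 512 bytes (the traditional 4096-bit Linux input pool).
# Check what your distro's save/restore script actually reads.
SEED_BYTES = 512


def make_seed(nbytes: int = SEED_BYTES) -> bytes:
    """Draw a fresh seed from the provisioning host's CSPRNG."""
    return secrets.token_bytes(nbytes)


def write_seed(guest_root: Path,
               rel_path: str = "var/lib/urandom/random-seed",
               nbytes: int = SEED_BYTES) -> Path:
    """Write a unique seed into a guest filesystem before its first boot.

    `guest_root` (where the new server's disk is mounted on the host) is a
    hypothetical parameter; the default rel_path is the Debian location.
    """
    target = guest_root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(make_seed(nbytes))
    os.chmod(target, 0o600)  # the seed must not be readable by other users
    return target
```

Because each server's seed comes from the host's CSPRNG rather than from the image, servers cloned from one image no longer share the same seed on first boot.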

Regarding entropy sources for guest virtual machines, the host can expose a random number generator device to the VM. This is an option in KVM: http://libvirt.org/formatdomain.html#elementsRng

Modern Intel x86 processors now have a hardware RNG built in, so even if the host boots without any devices, you have a source of entropy.
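Inside a guest, the on-chip RNG is only usable if the hypervisor passes the CPU feature through, so it is worth checking rather than assuming. A small sketch that looks for the `rdrand` flag in /proc/cpuinfo text (x86 layout assumed); on a Linux guest you would call `has_rdrand(open("/proc/cpuinfo").read())`.

```python
from __future__ import annotations


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the feature flags from /proc/cpuinfo text (x86 layout)."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_rdrand(cpuinfo_text: str) -> bool:
    """True if the CPU advertises RDRAND, the x86 instruction for the on-chip RNG."""
    return "rdrand" in cpu_flags(cpuinfo_text)
```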

(Incidentally, KVM defaults to /dev/random which might create a DoS vulnerability if a guest exhausts entropy.)
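For the KVM option linked above, the device is declared in the guest's libvirt domain XML, and the optional `<rate>` child is one way to address the DoS concern by capping how fast a single guest can drain the host's pool. A sketch that builds such an `<rng>` element with Python's standard library, assuming the layout shown in the libvirt documentation (virtio model, `random` backend); the 1024-bytes-per-1000-ms cap is an arbitrary example value.

```python
import xml.etree.ElementTree as ET


def rng_device_xml(source: str = "/dev/random",
                   max_bytes: int = 1024,
                   period_ms: int = 1000) -> str:
    """Build a libvirt <rng> device element as an XML string."""
    rng = ET.Element("rng", model="virtio")
    # Limit this guest to at most `max_bytes` per `period_ms` milliseconds.
    ET.SubElement(rng, "rate", bytes=str(max_bytes), period=str(period_ms))
    backend = ET.SubElement(rng, "backend", model="random")
    backend.text = source  # host device the guest's virtio-rng reads from
    return ET.tostring(rng, encoding="unicode")
```

The resulting element would go inside the domain's `<devices>` section.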

Interesting. Do you know if any cloud providers do this? (DigitalOcean uses KVM so it would be a possibility.)

Exposing hardware RNGs to guest VMs would be great; entropy in guests is pretty lousy in general, not just at first boot (though obviously at first boot it's at its very worst).

We've run into the /dev/random problem when doing heavy SSL traffic. We had to pipe in sources of entropy. It was a janky solution, but wasn't as janky as it might sound.

If you need more entropy for SSL traffic than a headless server that sees very little I/O load has available, I've found http://www.vanheusden.com/te/ useful in the past. The ~800 bits/sec it produced in our tests isn't massive, but it was considerably more than the machines we used it on could generate from the kernel's usual sources. It claims to produce valid results in VMs too, and given the way it works it might be perfectly valid to scale it in some way simply by running more than one copy.

If you want more than that in a cheap Heath Robinson manner (Americans: think Rube Goldberg if you are unfamiliar with Heath Robinson; their work came from very similar inspirations), then many SoC solutions have a built-in RNG of sufficient quality for general cryptographic use, and some expose it easily. For instance, if you enable the relevant module, a Raspberry Pi can provide up to ~550,000 bits/sec when asked to, which, if one-off cost is a factor, beats paying a few hundred dollars or more for a USB device providing the same sort of rate.

Another solution that we, at least, are happy with is http://www.issihosts.com/haveged/

Based on the HAVEGE research http://www.irisa.fr/caps/projects/hipsor/

Are you referring to the hardware RNG in TPMs? I thought they were very low bandwidth in terms of entropy, would that be a problem on first boot?

True, but much less of an issue than just leaving the same version of keys in place. In case you're curious, here's how the cloud-init library generally used for ec2 and ec2 compatible clouds works: https://github.com/number5/cloud-init/blob/master/cloudinit/...

Absolutely - baking the same private key into images is much, much worse (and in my opinion a pretty embarrassing thing for a cloud provider to do by accident).

In my experience, 'haveged' [1] has worked out really well for generating entropy on headless servers. It could be scripted into a provider's base distribution to include the haveged binary and run haveged before generating the SSH host keys.
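The ordering suggested here (get an entropy daemon such as haveged running, then generate keys) can be enforced in a provisioning script by waiting for the pool to fill before calling ssh-keygen. A sketch under stated assumptions: the 1024-bit threshold and 60-second timeout are arbitrary, and `reader`/`sleep` are injectable only so the loop can be exercised without a real kernel.

```python
import time
from pathlib import Path
from typing import Callable

ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy() -> int:
    """Kernel's current entropy estimate in bits (Linux procfs)."""
    return int(ENTROPY_AVAIL.read_text().strip())


def wait_for_entropy(min_bits: int = 1024,
                     timeout_s: float = 60.0,
                     poll_s: float = 0.5,
                     reader: Callable[[], int] = read_entropy,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the estimate reaches `min_bits`; give up after ~timeout_s.

    Returns True if the threshold was reached, False on timeout. Elapsed
    time is approximated by summing the poll intervals.
    """
    waited = 0.0
    while True:
        if reader() >= min_bits:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

A provisioning script would start haveged, call `wait_for_entropy()`, and only then run ssh-keygen; on timeout it could fail loudly rather than generate keys anyway.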

[1] http://freecode.com/projects/haveged

It seems like booting up would actually be a great time for generating a decent amount of entropy, what with all those devices starting up from a relatively unpredictable physical state (temperature etc.), hard drives spinning up, and so forth. I'm kind of surprised that the Linux kernel doesn't exploit that.

> I'm kind of surprised that the Linux kernel doesn't exploit that.

I wouldn't say "surprised", more that I wonder why and if there's maybe a good reason for it--that seems more interesting than the notion of surprise to me, anyway :)

Especially considering pretty much everything I've heard about entropy-generation over the years mentions that getting entropy just after boot is a difficult problem.

I'm assuming there must be some truth to that, so your surprise is my curiosity for the explanation why that probably wouldn't work (yes I'm assuming it won't work, sorry, but I really think otherwise they'd be using it :) ). Does anyone know?

Surprise is a useful emotion for noticing that your model needs updating. There's no need to suppress the emotion - better to notice it. This is a kind of core principle in rationality - sort of the little sister of noticing confusion[1]. I think I'm pretty well trained at noticing surprise and using it as a signal to look for more information, and that was, in fact, the purpose of my comment.

I don't post comments to demonstrate my cleverness (I hope). I post to tell about an interesting experience, make a careful argument, answer a question about which I'm knowledgeable, or solicit feedback. An expression of surprise for me is a solicitation for information. I was hoping someone with more domain knowledge would have some insight that I could integrate into my model. I realize that some people do just post comments to show their cleverness, and there's no way you could have known my intent without knowing me, so that feedback is well taken. I'll try to be clearer in the future about my intent.

I absolutely agree that what I said probably wouldn't work - no need to apologize! The prior probability for P(simple solution not used by experts | simple solution obvious to a non-expert) is low.

[1] http://lesswrong.com/lw/if/your_strength_as_a_rationalist/

Okay, that makes a lot of sense. The way LessWrong uses the word "surprise" there is a little bit different from how it's used in day-to-day language, where it also carries a (subtle) value judgement, as opposed to being an exclusively rational signal that one's model needs updating. Hence my confusion.

Now I doubt we'll get an expert with domain knowledge to jump in after five days, though :)

Even so, the boot sequence of a VM guest is generally straightforward with very little entropy.

It's slow, and manual, but this gives me one more reason to like prgmr, at least for my little side projects. You generate the keys, and send the public key to Luke. Your only access to the control layer is via public_key auth from that key you generated. Presumably on a laptop, with gobs of entropy available.

as much as I appreciate the plug, that key I ask you for? that's to get you into the admin interface on my side. I don't mess with the ssh keys within your image. (host keys are auto-generated on first boot, just like a real server, which has the 'generating keys in a low-entropy environment' problem described above. The image doesn't come with anything in the authorized_keys file; though actually, I could pretty easily adjust the creation script to put the key you sent me in the authorized_keys file of root... but I don't really know the 'right' way to do that; everyone has preferences, so I leave it up to the user.)

We've actually talked a lot about trying to write a xen driver to share a hardware entropy device, but it hasn't gone anywhere. (I mean, xen has 'vtpm' - a virtualized Trusted Platform Module. And nobody uses that, so why not a vrandom?)

I believe you're confusing SSH host keys with the keys used for user authentication. The private SSH host key has to reside on the server, so it's either being generated on the server or you're sending Luke a private key.

But I agree this sounds like a good way to handle user authentication.

I was interested in providing entropy as a service, but I couldn't validate the proposition with the obvious customer segment (i.e. cloud HVM operators, whether public or private).

Essentially, the trust bar for entropy is set so high that no-one who understood the product enough to want it, was willing to trust it enough to actually use it.

(It's not just for SSH keys; random values are required for e.g. SSL connection setup, too)

Sounds like most cloud-providers could just undercut your business by enabling such solutions locally though. I'm not sure how stable the business would be unless you were going to guarantee more entropy than they could locally generate.

Sure, but they're not actually doing it, are they...

Ah, also: by "HVM operators" I didn't mean AWS, Linode, Digital Ocean etc; I meant the admins of virtual machines running on their platforms, plus anyone running a private cloud or even someone with just a bunch of Xen/KVM boxes in a DC.

This is not the last of the problems we'll have with "the cloud", but I guess it's part of what makes it so exciting. :-)

Many people, especially beginners, make the mistake of leaving the same SSH keys in a certain template or in a snapshot of a virtual machine that they later use as a template.

There are a few files that you really, really need to wipe out from a wannabe image template:

- /etc/ssh/*key* (for reasons explained in the parent article)

- /var/lib/random-seed (the seed used to initialise the random number generator. this is the location on CentOS)

- /etc/udev/rules.d/70-persistent-net.rules (so that the VM's new NIC - with a new MAC - can use the same "eth0" name)

People who want to do this more exhaustively can have a look at libguestfs and its program virt-sysprep, which does all of the above and more!
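For a home-grown template, the list above can be turned into a small pre-snapshot cleanup step. A minimal sketch, assuming the template's filesystem is mounted at `root` (a hypothetical path) on the machine running the script; virt-sysprep remains the more thorough option.

```python
from __future__ import annotations

from pathlib import Path

# The files from the list above, relative to the template's root filesystem.
# ssh_host_*key* is slightly narrower than *key*, so only host keys match.
# The random-seed path differs by distro (CentOS vs. Debian/Ubuntu).
CLEANUP_GLOBS = [
    "etc/ssh/ssh_host_*key*",
    "var/lib/random-seed",
    "var/lib/urandom/random-seed",
    "etc/udev/rules.d/70-persistent-net.rules",
]


def files_to_wipe(root: Path) -> list[Path]:
    """List the per-instance files present under a mounted template root."""
    found: list[Path] = []
    for pattern in CLEANUP_GLOBS:
        found.extend(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)


def wipe(root: Path, dry_run: bool = True) -> list[Path]:
    """Remove those files; with dry_run (the default) only report them."""
    targets = files_to_wipe(root)
    for path in targets:
        if dry_run:
            print(f"would remove {path}")
        else:
            path.unlink()
    return targets
```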


I must say, I'm impressed with how this was handled both by the original researcher and DigitalOcean.

Except I informed them of this issue in January of this year


Apparently they ignored it :)

That worries me far more than the actual security issue. Security issues happen to everybody, but so long as too many don't occur, it's the response that shapes my ongoing confidence in that company or product.

What's happening on the left side? Is that you or the rep?

I blocked out my name, the rep is the one with a picture.

edit: actually I didn't realize there was a skype window up when I took the screenshot, thanks for warning me...

Ok, that is disturbing...

Honestly, I figured they would realize how important it would be to fix this so I didn't follow up on it once I fixed my own images.

It's a reality of doing tech support. You get a flood of garbage information ("Hi, I can't access your web page, I get a 404 error. My system has 8 GB of RAM and an Intel 4700K and blah blah blah..."), and have to do your best to sort through and solve the user's problem.

Your ticket had two problems described. The tech probably didn't understand the significance of the first problem, and so just discarded the information. Then she answered your second question. When you've got 100 tickets to sort through in your 8 hour day, you simply have to make some compromises on the thoroughness of your response.

To get her attention, it would have been better to explain a little about what the consequences are, and request that she have a developer follow up. Make it clear that it's a major security failure and could lead to compromised VMs.

Then open a second ticket for your other issue.

I completely agree, after realizing I had asked two questions on one ticket I immediately saw I should've done better. However when I received a response that said they were working on it, I understood that to mean it was in the queue to fix.

Except I don't see an email from DO or a notification when I log in to the admin panel. So if I hadn't checked HN at this exact time and seen this article, I would have no idea.

It's not a huge deal to me, but if Linode did the same thing, you all would be foaming at the mouth. Just thought I would point this out.

Do you have any Ubuntu instances running or saved? If not, then they would have no reason to notify you of the issue.

I have a few Ubuntu instances and didn't receive any notification.

I've had a Ubuntu instance running for three months now; no notification.

running an Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-23-virtual x86_64) for 2 months, no notice yet for me

Responsible disclosure can benefit us all, unfortunately some vendors -- SaaS, PaaS, physical, or otherwise -- use their legal departments as blunt weapons to needlessly attack well-meaning security researchers.

In case anybody is wondering, I'm referring to Volkswagen.


Personally I'm an advocate of full, anonymous, and public disclosure.

That's not a fair comparison. DigitalOcean just needs to update some stuff to fix this issue. VW would need to recall millions of vehicles (tens of billions of dollars). You would do the same thing if you were in their position and had shareholders to worry about.

It's the cost of doing business. They're putting insecure software into hundreds of thousands of cars, and every owner of that car has no control over the software that's running on them.

Imagine if we applied the same logic to phones and other devices -- I wouldn't be surprised if you personally would be offended at the idea that you have little recourse over your phone being hacked remotely and you can't do a damn thing about it because the mobile handset manufacturer locked it down. Thankfully phones are subsidized, ubiquitous, and cheap, so you can take your phone anywhere and get it fixed/replaced.

This is the future folks: locked down devices that you have no control over.

Caveat: we're all plugging our phones into these insecure systems too. Wrap your brain around that for a second to see where I'm going with this.

I would absolutely not do what they have done. But at the same time, I will probably never be the CEO of a large company. I suspect there is at least a weak causal relationship at work here...

No, I certainly would not do the same thing, and to suggest otherwise is an insult. Let's not excuse bad behavior with this misguided idea that we're all equally bad.

So what is VW supposed to do? I'm actually truly curious. This flaw apparently will unlock many expensive cars. These cars' systems cannot be replaced as quickly or cheaply as an sshd binary in a Linux OS. I'm very curious: what other action could they have taken? They need him to be silent so they can figure out how to fix it before he makes it public, right? Is letting the public know the detailed exploit more important than the potential problems of the info being public?

VW needs to come up with an immediate workaround that they can publish to owners or allow dealers to quickly hack in, which buys them some time, and then come up with a permanent fix after that.

The immediate workaround may not be possible. In that case, they're just screwed. A company is not entitled to be able to save themselves from the consequences of their past fuckups in all situations. Sometimes, a mistake costs a lot of money or even kills the company. Perhaps this is one.

I find it unlikely that it's impossible to disable the keyless entry system on the cars in question. Surely there is some fuse or wire that can be pulled to shut it off. But it ultimately doesn't matter. Finding a workaround quickly is what they need to do, and if they can't do it, that's not his problem.

Your new solution of "fuck the company" actually harms the enduser even more. Some of them get the updated lock... then the company goes out of business, and now none of them can get official parts for their vehicles. It seems a solution for an ideal world, not the actual world.

You'll note that I proposed other solutions first. To repeat myself: it is highly unlikely that there is not some possible workaround that temporarily disables the system, even if it's something as brute as snipping a wire.

Even in the absolute worst case that the vulnerability is somehow built into the very fabric of the car, you can still secure it by removing all valuables from the interior and then clamping a wheel with a boot. Remove the boot once you figure out a fix. Inconvenient to the owner to be sure, but not impossible to deal with.

Harm to the end user is not my priority. Harm to society is, and it's clear to me that the long-term chilling effects on academic research far outweigh any temporary harm from VW issuing a recall or even going bankrupt.

The alternative is to say that entity B should suffer from a restriction on their free speech simply because entity A, due to their own negligence, finds it excessively costly.

There are so many different ways this could be handled other than "threaten to throw the researcher in jail if he doesn't shut up". But they are all more inconvenient and costly to VW. One can understand why, then, VW would go for the "threaten" option, if we think of VW as a sort of non-moral profit-optimizing organism. But I certainly can't understand why anyone would defend it, let alone say that we would do the same thing.

> clamping a wheel with a boot.


> Harm to the end user is not my priority. Harm to society is

Wholesale removal of personal transport (even 'while we work something out') is "harm to society".

> the long-term chilling effects on academic research

Are you not overstating the significance of a paper? Does this paper hold the solution to free energy? The impending food crisis? Sure, it's not ideal, but let's not blow it out of proportion.

You're continually making very frustrating assumptions about the situation, assumptions that paint your argument in the best possible light, even though they are not IMO reasonable.

I repeat for the third time: it is highly unlikely that there is no temporary workaround. Clip a wire, pop a fuse, remove a module, or whatever, one of these will get the job done for the moment.

Finally, even if these cars must be disabled in the interim, it's hardly "wholesale", since it's just one brand of many. Alternatives exist.

Security research is important. Does this paper hold the solution to free energy? No, but the precedent set will discourage further research in this area, which could result in leaving the power grid vulnerable to black hats.

You say I'm overstating the significance of this paper. I say you're vastly overstating the significance of this paper, in terms of what would happen to VW, to VW owners, and to society in general, if the information got out.

The chilling effect on security research of these kinds of actions is fairly well established. There are real-world examples of security researchers deciding not to work on a particular project because they fear persecution. That's a loss to society.

On the other hand, there are no real-world examples of chaos resulting from disclosures of automobile security vulnerabilities, even though car security is, in general, quite lax.

So kindly please, stop with the hyperbole and hysteria.

I like how you accuse me of hyperbole and hysteria, but at the same time use both the arguments "cars should be immobilised if there's no other way" (ie: should not be used) and "there are no real-world examples of chaos resulting from disclosures of automobile security vulnerabilities". Even theoretically, why immobilise so many cars if chaos won't result? You want to have your cake and eat it, too.

Regarding 'wholesale', in context the term just means 'non-selective' and didn't mean every brand on the road. This being said, the arrogance of "Alternatives exist" has got to be pointed out: what alternatives? If you immobilise all the VWs, how will those commuters now proceed? Rent another car? Buy another car? Some might be able to catch public transport, but hardly all.

This links back to what I said about ideal vs real world - you think that it's tenable to just take one brand of vehicles off the road, which is clearly nonsense. Even if there were no temporary fix, the real-world response to fixing the issue would be to leave the cars available to the owners. The idea that you'd even contemplate booting as a considered option is just farcical.

What you describe as "have your cake and eat it" is just a reasoned discussion. I first start with what I think is the most likely scenario, but then I also examine a potential worst-case scenario and show how it, too, can be dealt with.

You appear to be interested in an adversarial discussion in which you score as many points as possible, rather than a collaboration in which we enjoy ourselves and learn. I'm not interested in that, so I'll leave you to it.

They can open the specifications of the ECUs[1] and get involved with all car dealerships, maintenance franchises, insurance companies and whoever else. The goal being to educate them all on how to maintain, repair, and replace the ECUs that are at the heart of every automobile being deployed.

Leaving the responsibility to fix these security flaws in the hands of the automobile manufacturers is dangerous and foolish, they simply don't have the means nor the interest to fix these security flaws.

This is one of the most obvious cases where FLOSS shines -- everybody and anybody can fix their broken software because they know what's running on their machines and the machines are open and accessible to those that need it most: end-users.

[1] http://en.wikipedia.org/wiki/Electronic_control_unit

I don't necessarily agree with the OP but here's how I see it. Yes, releasing this information would be harmful to both the company and the customers. But the principle is: "there is no safety net". And in the long run it will incentivize building higher quality products.

e.g. When I golf, I refuse to take mulligans because it keeps me in a state of mind of, "this is my only chance". Whenever I break this rule, the rest of the day my golfing is worse.

And here's one of the reasons why you're not the CEO of one of the largest corporations in the world. The biggest CEOs in the world are amoral (not immoral) a lot of the time.

Also, the researcher wanted to disclose this information without VW having fixed it. DigitalOcean got the opportunity to fix their system. Do you know what VW's next step is?

I too am an advocate of full public disclosure, but Volkswagen's concerns are warranted, and fixes to the issue cannot be deployed as easily as DigitalOcean has done here. I don't agree with the gag order from the UK courts, but it isn't totally unjustified.

They should be using cloud-init or virt-sysprep[1] on new instances. In particular, it is vital that you give your new instances a unique random seed (which virt-sysprep can do). Also that you provide the virtio-rng to guests that support it.

[1] http://libguestfs.org/virt-sysprep.1.html

To avoid this kind of security problem, use providers that use official Ubuntu Cloud images only. If Canonical haven't certified the Ubuntu images you're using, then your provider could have done anything to them. You'll need some other way to determine their competence.

Cowboy images like this are exactly the reason trademarks exist. Commercial providers who don't get certification are in fact violating Ubuntu's trademark by telling you that you are getting Ubuntu, when in fact you are getting a modified image which is possibly compromised (such as in this case).

How do I validate that my provider is actually providing an official Ubuntu Cloud image?

Technically? I'm not sure that's possible. They own the hypervisor, so you have to trust them. This is why the Ubuntu trademark is so important.

If I have to trust them because they own the hypervisor, the fact they claim to deliver a "certified" image buys me nothing in terms of security.

That's demonstrably not the case. In this situation, it was only the modified image that accidentally introduced a vulnerability. The official image does not have this vulnerability. Had you been using an Ubuntu certified cloud, you wouldn't have been vulnerable.

I've got no way to check I am using an official image, other than knowing that at some point in the past, my provider bunged Canonical some cash for the "Ubuntu Certified(tm)" sticker. Without knowing a lot more about how certification works, that's not a particularly strong proof.

This is now one of the first things I check when setting up a new VPS or other VM instance, because it's really common.

I think I need a list of things to check. This sounds pretty scriptable though:

* SSH Host Key

* SSH Authorized Keys

* SSH PermitRootLogin

* Disabling password auth in favour of keys

* Security updates from the distro

* SELinux (maybe?)

Anything else?
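Several of these items live in sshd_config and can be checked mechanically. A rough sketch, assuming the policy is "no root login, no password auth" (the `DESIRED` values are an assumption; adjust them), and ignoring `Match` blocks and `Include` files, which a real audit would have to handle. It relies on sshd_config keywords being case-insensitive and the first value for a keyword being the one that counts.

```python
from __future__ import annotations

# Assumed policy for the checklist items that live in sshd_config.
DESIRED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
}


def parse_sshd_config(text: str) -> dict[str, str]:
    """Map lowercased keyword -> value for top-level directives (first wins)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Accept both "Keyword value" and "Keyword=value".
        parts = line.replace("=", " ", 1).split(None, 1)
        keyword = parts[0].lower()
        if keyword == "match":
            break  # this sketch ignores conditional Match blocks
        if len(parts) == 2:
            settings.setdefault(keyword, parts[1].strip())
    return settings


def audit(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the policy holds."""
    settings = parse_sshd_config(text)
    problems = []
    for key, want in DESIRED.items():
        got = settings.get(key)
        if got is None:
            problems.append(f"{key} not set explicitly (compiled-in default applies)")
        elif got != want:
            problems.append(f"{key} is {got!r}, expected {want!r}")
    return problems
```

Feeding it the contents of /etc/ssh/sshd_config on a fresh instance gives a quick first pass; host keys, authorized_keys, updates and SELinux still need their own checks.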

a) Move SSH off port 22. (Really limited security gain when implementing the rest of the suggestions, but saves spam in your logs.)

b) Software firewall via e.g. IPtables is generally the first thing I turn on after rebooting SSHd with the new settings.

c) (Optional) Consider using an architecture where you have N boxes and SSH only listens on a local interface on N-1 boxes, with the Nth box running nothing but your VPN. (This is also a good architecture choice for admin consoles, folks. www.example.com resolves to a public IP, admin.example.com resolves to a private IP, so even if they're technically speaking on the same box/boxes you won't lose the admin console if someone unwisely uses the same password for a WordPress blog somewhere.)

Consider fail2ban.

It can be configured to automatically add (and later remove) ip addresses to iptables based on login failures (found by running regexes on logfiles).

I've found rules blocking IP addresses with multiple failed ssh or WordPress login attempts for an hour to be very effective. I've still got a bit of "brute force poop" in the logfiles, but much less than before. I've seen a suggestion for adding an additional fail2ban "recursive rule" that drops any IP address with repeated fail2ban lockouts for much longer: if you trigger multiple 1-hour lockouts for ssh auth failures, you might get dropped for a week or more. I haven't felt the need to implement that one yet.
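The mechanism described here (count failures per IP within a time window, ban past a threshold, escalate for repeat offenders) can be modelled in a few lines. This is a toy sketch, not fail2ban's code or configuration: it only tracks state and returns the IP to ban instead of touching iptables, it assumes OpenSSH's usual "Failed password ... from <ip>" log line, and doubling the ban for each repeat offence is an assumption standing in for the "recursive rule". The parameter names borrow fail2ban's maxretry/findtime/bantime vocabulary.

```python
from __future__ import annotations

import re
from collections import defaultdict, deque

# OpenSSH's usual failure line, e.g.
# "Failed password for root from 203.0.113.7 port 4242 ssh2"
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+)")


class Banner:
    """Toy model of findtime/maxretry/bantime with escalating bans."""

    def __init__(self, maxretry: int = 5, findtime: float = 600, bantime: float = 3600):
        self.maxretry, self.findtime, self.bantime = maxretry, findtime, bantime
        self.failures: dict[str, deque[float]] = defaultdict(deque)
        self.ban_count: dict[str, int] = defaultdict(int)
        self.banned_until: dict[str, float] = {}

    def is_banned(self, ip: str, now: float) -> bool:
        return self.banned_until.get(ip, 0.0) > now

    def feed(self, line: str, now: float) -> str | None:
        """Process one log line; return the IP if this line triggers a ban."""
        m = FAILED.search(line)
        if not m:
            return None
        ip = m.group(1)
        if self.is_banned(ip, now):
            return None  # already blocked; the firewall would drop it anyway
        window = self.failures[ip]
        window.append(now)
        while window and now - window[0] > self.findtime:
            window.popleft()  # forget failures older than findtime
        if len(window) >= self.maxretry:
            window.clear()
            self.ban_count[ip] += 1
            # Each repeat ban lasts twice as long as the previous one.
            self.banned_until[ip] = now + self.bantime * 2 ** (self.ban_count[ip] - 1)
            return ip
        return None
```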

My honeypots have been seeing scans on 2022, 2222, 3022, etc. for years now. You should be setting proper ACLs for 22 and not moving to another port.
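An ACL for port 22 amounts to "accept SSH only from these networks, drop the rest". A sketch using Python's `ipaddress` module, with hypothetical documentation-range networks standing in for your own; it only renders iptables commands rather than applying them, since a real ruleset also needs loopback and established-connection rules and some care not to lock yourself out.

```python
from __future__ import annotations

import ipaddress

# Hypothetical allowlist (documentation ranges): an office /24 and one admin host.
ALLOWED_SSH_SOURCES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.10/32"),
]


def ssh_allowed(source_ip: str, allowed=ALLOWED_SSH_SOURCES) -> bool:
    """True if source_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    # A v6 address is never "in" a v4 network; membership is simply False.
    return any(addr in net for net in allowed)


def iptables_rules(port: int = 22, allowed=ALLOWED_SSH_SOURCES) -> list[str]:
    """Render the allowlist as iptables commands: ACCEPT each source, then DROP."""
    rules = [f"iptables -A INPUT -p tcp --dport {port} -s {net} -j ACCEPT"
             for net in allowed]
    rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules
```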

HostGator uses port 2222 for all their shared hosting. I never understood this, given how easy it would be to modify an attack to use the new port.

ACLs as in allow/deny? What if I'm using a dynamic IP on my client? In that case, should I just be using a "trusted" SSH gateway?

You should be using a VPN into a trusted bounce box.

But then you have the same problem on the gateway.

The name for c), assuming you harden the box a bit, is a bastion host.


I've never understood the logic behind these software firewalls. If you have one enabled it's inspecting all incoming packets, exposing a large surface of what's often quite complicated C code.

Any service that has bound to a public-facing port on a machine I run has done so because I wanted it to; if I had a firewall I'd need to add an exception for it. Any service that's only meant to be accessed from the same machine has bound on loopback. Services that are accessed from another "internal" machine are properly authenticated, which means the network doesn't have to be trusted; spoofing packets won't help an attacker at all. Frankly if your defences rely on the idea that all packets from ip xxx.xxx.xxx.xxx are "safe" you're going to get burned.

Where's the value in a software firewall supposed to be?

I've used iptables as a lightweight IDS for years without incident, so I've found the benefits to outweigh any known risks.

The most dramatic example I have is that I manage a very heavy database-driven web application from an outside vendor that must be public to the entire world. Even a simple request of the home page results in dozens of queries, and users that are logged in put an additional load on resources. I've addressed performance with other optimizations, but since we have no need to generate additional traffic, I have iptables rules to detect unwanted crawlers or pen-testers and block them (based on probes to unused ports/IP addresses, user-agent strings, etc.). When we originally deployed the application, crashes were routine during peak cycles. Now, the server barely breaks a sweat. The ability to implement simple logic in a robust, time-tested software firewall plays an important role.

Other examples include mitigating SSH brute force attacks, port knocking, forensic logging and honeypots. My honeypots protect me from many undisclosed vulnerabilities, simply because attackers are poking around in places they shouldn't be and are automatically blocked. It's much nicer to review a simple report of blocked attacks than it is to troubleshoot a compromised server.
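Port knocking, mentioned above, keeps SSH closed until a client touches a secret sequence of ports. A toy state machine that shows the idea, assuming the firewall (or a daemon such as knockd) feeds it (source IP, port, timestamp) events; the sequence and the 10-second gap are hypothetical, and this is not any particular daemon's logic.

```python
from __future__ import annotations


class PortKnocker:
    """Track each source IP's progress through a secret port sequence."""

    def __init__(self, sequence: list[int], gap_s: float = 10.0):
        self.sequence = sequence
        self.gap_s = gap_s
        # ip -> (index of the next expected port, time of the last knock)
        self.progress: dict[str, tuple[int, float]] = {}

    def knock(self, ip: str, port: int, now: float) -> bool:
        """Record a packet to `port`; return True when the sequence completes."""
        index, last = self.progress.get(ip, (0, now))
        if index and now - last > self.gap_s:
            index = 0  # too slow between knocks: start over
        if port == self.sequence[index]:
            index += 1
        else:
            # A wrong port resets progress, but may itself be a first knock.
            index = 1 if port == self.sequence[0] else 0
        if index == len(self.sequence):
            self.progress.pop(ip, None)
            return True  # the caller would now open SSH for this IP
        self.progress[ip] = (index, now)
        return False
```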

What if you want to do something between "everyone can access it" and "only localhost can access it"? For example, I have to run a recursive DNS resolver on a port other than 53 because my ISP intercepts DNS traffic. I've limited access to only my ISP's IPs at the software firewall, so as not to unwittingly take part in a DNS amplification attack.

Or what if I want to run a mail relay but only allow machines I control to access it. Or block a user who's doing something nefarious?

Sure, most services allow you to control access by ip, but I'd much rather manage it all in one place. And the ports show up as closed if you block at the firewall.

>What if you want to do something between "everyone can access it" and "only local host can access it?". For example, I have to run a recursive DNS resolver on a port other than 53 because my ISP intercepts DNS traffic. I've limited access to only my ISPs IPs at the software firewall, so as not to unwittingly take part in a DNS amplification attack.

Hmm. In that specific instance it makes sense, because spoofing the IP is the whole point of the attack. I'd argue that's basically unique to DNS though, in which case the advantage of managing it in the same place as your other services goes away.

>Or what if I want to run a mail relay but only allow machines I control to access it.

Then you use a real authentication mechanism (i.e. SMTP AUTH). Otherwise it would seem perfectly possible for a spammer to spoof one of your IPs and use your relay.

The difference is that with most of those, you can create a snapshot of an instance and then duplicate the configuration, but Host keys are special, since they need to be re-created for each instance.

In any case, there was discussion on those issues not long ago: https://news.ycombinator.com/item?id=5316093

> SSH PermitRootLogin

I assume you mean to disable it? I see that listed in various places, but I don't understand why it matters… I like to kill all passwords on my VMs (so there's nothing to brute force), uninstall sudo and only use ssh keys to authenticate. I would like to know why this is a bad idea.

This is more of an issue on machines with multiple people who can access them. If you disallow logging in as root and use sudo to escalate privileges when needed then there's an audit trail of who did what (or at least a record of someone starting a shell with root privileges at around the time of something bad happening).

A slight correction: if you examine the logs on any SSH server connected to the net, you'll see an absolute flood of Chinese (and other) compromised Windows boxes trying to brute force (sorta) passwords for the root account over SSH.

So if you disable root logins, they have no idea what to use as a username. Oh sure, someone who personally knows which machines you maintain might be able to guess that jonwood is your username or that mine is vlm... but they'll never be able to log in as root, and botnets aren't smart enough to try anything else. Aside from root, I wouldn't make your "primary user name" = "hostname" either.

Thinking back on my occasional examinations of the logs, I don't know if I've ever been scanned by botnets trying anything other than root as a username. I'm sure it happens, but I can't afford to spend 5 minutes per syslog line in my life either...

I have seen remote bots attempting to guess SSH usernames besides root, for instance bob, john, guest, ...
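Rather than eyeballing syslog, a short sketch that tallies which usernames failed logins were attempted for. It assumes OpenSSH messages of the form "Failed password for [invalid user] NAME from ..." as written to `/var/log/auth.log` on Debian/Ubuntu; exact wording can vary between versions and distros, so treat the regex as an assumption to adjust.

```python
import re
from collections import Counter

# Assumed log format: "Failed password for invalid user bob from 198.51.100.7 port ..."
FAILED = re.compile(r"Failed (?:password|publickey) for (?:invalid user )?(\S+) from ")


def attempted_usernames(lines):
    """Count the usernames seen in failed SSH login attempts."""
    counts = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

For example, `with open("/var/log/auth.log") as f: print(attempted_usernames(f).most_common(10))` would show whether anything besides root is being tried.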

What do you mean by 'check' in this case? Check the keys against the keys for all other VM instances that you've provisioned from the same VPS provider?

You don't even have to compare the keys. If they exist at all, they are probably copied from the image itself. The first time you start up sshd, it will create new keys. If you are unable to witness this, assume that they are not trustworthy, delete them and restart sshd (or replace them with keys you've generated locally).
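If you do want to compare across instances, as the parent asks, a sketch that computes OpenSSH-style SHA256 fingerprints (base64 of the SHA-256 of the decoded key blob, padding stripped, as newer `ssh-keygen -lf` prints) and reports any fingerprint shared by more than one host. It assumes you have already collected each host's `.pub` line, e.g. from `/etc/ssh/ssh_host_*_key.pub`; the host names in the usage are hypothetical.

```python
import base64
import hashlib
from collections import defaultdict


def fingerprint(pubkey_line):
    """SHA256 fingerprint of a public key line: "type base64-blob [comment]"."""
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def shared_host_keys(keys_by_host):
    """Map each fingerprint used by more than one host to the sorted host list."""
    seen = defaultdict(list)
    for host, line in keys_by_host.items():
        seen[fingerprint(line)].append(host)
    return {fp: sorted(hosts) for fp, hosts in seen.items() if len(hosts) > 1}
```

An empty result means no two droplets in your inventory share a host key; any entry is a sign the key came from the image.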

The common problem is that sshd was started before creating the final image, so it has keys that are duplicated by provisioning. Always delete the keys before committing the final image, so that sshd will create new keys the first time it runs.
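A sketch of the "delete the keys before committing the image" step, run against the image's root filesystem (or inside the build VM) just before snapshotting. The `root` parameter and function name are hypothetical. Make sure something regenerates keys on first boot: the thread mentions cloud-init, and `ssh-keygen -A` creates any missing default host keys.

```python
import glob
import os


def remove_host_keys(root="/"):
    """Delete SSH host key files under <root>/etc/ssh and return their paths.

    Matches ssh_host_*key* (private keys, .pub files, and any -cert.pub),
    leaving sshd_config and other files alone.
    """
    pattern = os.path.join(root, "etc", "ssh", "ssh_host_*key*")
    removed = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```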

Not correct that they were probably copied from the image itself. See other posts about cloud-init.

Generating fresh keys aside, one thing I do with our AWS setup is whitelist the IPs that can connect to our SSH bastion host. This completely eliminates scripted port scans of the SSH server and makes the auth logs much more manageable.

If our IP address changes (e.g. the ISP assigns a new one for the cable modem) then we just update the whitelist (and remove the old address). It's very infrequent. I could probably count the number of times I've done it on one hand.

It might not be the most scalable setup but at our small size with everybody working from home it works great.

The only slight hitch is updating it when traveling, but even that isn't much of a problem. It takes a minute or two from the AWS console and it's good to go.

I recently took a look at Digital Ocean ($5 servers give me ideas...) but didn't see a firewall option similar to the security group setup in AWS. If it does exist then I highly recommend it.

This is a very good idea. It could be scriptable using the AWS API[1] (though I haven't tried it yet).

[1] http://docs.aws.amazon.com/AWSEC2/latest/APIReference/ApiRef...

[EDIT] The relevant example is 4th in the list.
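A hedged sketch of scripting the whitelist swap with boto3 (the current AWS SDK for Python, not necessarily what the linked docs show). The security group ID, region, and function names are hypothetical. The new address is authorized before the old one is revoked so you are never left without access.

```python
def ssh_permission(cidr, port=22):
    """IpPermissions entry allowing TCP `port` from one CIDR block."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }]


def swap_whitelisted_ip(group_id, old_ip, new_ip, region="us-east-1"):
    """Open SSH to new_ip, then close it for old_ip (if given)."""
    import boto3  # imported lazily so ssh_permission() works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=ssh_permission(new_ip + "/32"))
    if old_ip:
        ec2.revoke_security_group_ingress(
            GroupId=group_id, IpPermissions=ssh_permission(old_ip + "/32"))
```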

With Digital Ocean you'd need to install a software firewall on the servers themselves, there is no API-configurable network-level firewall. I used 'ufw' which was quite easy to get started with (on Ubuntu), and replicated my AWS security group config pretty quickly. I added the ufw config to my host setup scripts so it happens automatically.
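A sketch of generating a security-group-like ufw setup from a list of allowed sources, the kind of thing that could live in a host setup script. It only builds the argument lists; running them (as root) is up to you, e.g. `for argv in ufw_ssh_whitelist(["203.0.113.5"]): subprocess.run(argv, check=True)`. The addresses and function name are hypothetical. Allow rules are added before the default-deny policy and `enable`, to reduce the chance of locking yourself out mid-run.

```python
import ipaddress


def ufw_ssh_whitelist(allowed, port=22):
    """Return ufw commands that open `port` only to the given IPs/CIDRs
    and deny all other inbound traffic."""
    cmds = []
    for src in allowed:
        # Validates the input; a bare IP becomes a /32 (or /128) network.
        net = ipaddress.ip_network(src, strict=False)
        cmds.append(["ufw", "allow", "from", str(net), "to", "any",
                     "port", str(port), "proto", "tcp"])
    cmds += [
        ["ufw", "default", "deny", "incoming"],
        ["ufw", "default", "allow", "outgoing"],
        ["ufw", "--force", "enable"],   # --force skips the interactive prompt
    ]
    return cmds
```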

The problem with the software one is you need a way to modify it when you can't access the instance. With AWS you do it from the admin console or APIs. If it's on the machine itself you'd have to know the IP to open up in advance or have someone at home base do it.

If you're likely to be connecting from different locations then you're probably better off having a VPN in a known location and routing connections to your servers via that VPN, rather than fiddling around with firewall rules every time you're in a new hotel.

That just shifts the problem - the VPN is vulnerable to the original attack.

Digital Ocean (and Linode) provide web console access, so if you lock yourself out by IP, there is the console as a last resort.

One good thing to note is that any VM image using cloud-init (a package for Debian/RHEL systems) should automagically generate a new set of host keys for any new system instance. Basically, if you build a system image for EC2 or any system that uses the EC2 data format (like OpenStack) for host instantiation, then you should install cloud-init. It would prevent something like this.
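For providers that pass cloud-init user-data, a sketch that builds a `#cloud-config` document explicitly asking cloud-init's SSH module to delete baked-in host keys (`ssh_deletekeys`) and generate fresh ones (`ssh_genkeytypes`). Which key types are accepted depends on your cloud-init and OpenSSH versions, so the default list is an assumption. It emits JSON, which YAML parsers accept for a simple mapping like this.

```python
import json


def cloud_config_regenerate_host_keys(key_types=("rsa", "ecdsa", "ed25519")):
    """Return #cloud-config user-data that wipes image host keys and
    generates new ones of the given types on first boot."""
    config = {
        "ssh_deletekeys": True,
        "ssh_genkeytypes": list(key_types),
    }
    # A JSON object is valid YAML, so json.dumps avoids a YAML dependency.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"
```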

Now that you mention it, I did notice something strange once.

I had loaded up an Ubuntu Desktop droplet with the purpose of checking something out through the browser on the node.

The startup page was https://www.americanexpress.com/

Since when is that the default?

Didn't think much of it at the time, but now... whoa.

I suspect this kind of thing happens with other companies, but can only speculate.

Somewhat related: chicagovps gave me a 'fresh' Gentoo VPS, and the default root password it came with was identical to the one from several months ago. I assume it is one Gentoo image with the same password (for all customers)?

Props to the way you handled this. That's how you do responsible vulnerability disclosures!

Just verified this is also the case with at least some AWS-hosted servers. Coupled with the fact that many people simply ignore the MITM warning that SSH throws, this is scary stuff.

The official Ubuntu AMIs generate a new key pair on first boot. The AMIs provided by Amazon do the same.

If you're using AMIs from some other third party you should verify that they do the right thing.

Curious. I just upgraded a small server to a medium, detaching the existing volume and reattaching it to the new instance, and it has given me a new fingerprint.

Unfortunately, devops people (like me) need to ignore these as part of our work.

I need to automate deployment/reprovisioning of 30 Digital Ocean servers. As reprovisioned servers frequently reuse the same IP address, I always run into this. In the end I had to disable the check :(

Part of automating deployment is automating key management. Ignoring host key checks is the worst possible solution.

You're actually better off just baking a shared key into your image. So long as it's a key you generated yourself (not like this Digital Ocean scenario, where the key came from the cloud host), only someone who has already rooted one server can successfully MITM your SSH connections.

Whereas if you ignore host key checks entirely, anyone who gains control of one network hop between you and your servers can own you.

By disabling that check, you are destroying a huge component of the security that SSH provides. Perhaps you could clean up the known_hosts file as part of your teardown script?

I've ended up in this situation too. We migrate between datacenters with duplicate hosts fairly frequently at my primary place of employment, and best practice is to use the CNAME of the service to access the server currently acting as the primary for it (e.g. foo.bar.com as opposed to foo.datacenter.bar.com). That leaves me with a lot of foo.bar.com entries to clean out of known_hosts, or a lot of spurious MITM errors.

What I've done is add a whitelist to my .ssh/config to disable the alerts only for those hosts. The foo.datacenter.bar.com address (which I use often enough, usually when migrating it between datacenters) still alerts.
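The commenter doesn't say which options they used; one plausible way to express such a per-host exception is an ssh_config stanza like the one this sketch generates. `StrictHostKeyChecking`, `UserKnownHostsFile`, and `LogLevel` are real ssh_config options, but the function name and host list are hypothetical, and this gives up MITM protection for exactly those names. Because ssh_config uses the first matching value, the stanza needs to appear before any `Host *` block.

```python
def relaxed_host_block(hostnames):
    """Build an ssh_config stanza that skips host-key alerts for the
    given names only; every other host keeps normal strict checking."""
    lines = [
        "Host " + " ".join(hostnames),
        "    StrictHostKeyChecking no",      # accept unknown/changed keys without prompting
        "    UserKnownHostsFile /dev/null",  # never record keys, so no mismatch warnings
        "    LogLevel ERROR",                # hide the "Permanently added" notice
    ]
    return "\n".join(lines) + "\n"
```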

And yes, I know I'm living in the 90s what with my datacenters and whatnot. They're kind of like regions... what? Oh... why, you... You kids, get off my lawn!

Not managing keys is asking for trouble. Consider signing your SSH host keys with your own certificate authority, so clients validate hosts against the CA instead of relying on known_hosts entries you have to prune by hand.

Spend an afternoon to figure this out.
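A sketch of the two pieces host certificates involve: the `ssh-keygen` invocation that signs a host's public key with your CA (`-s` CA key, `-I` key identity, `-h` host certificate, `-n` principals, `-V` validity), and the `@cert-authority` line clients put in known_hosts. The file paths, key ID, validity, and host pattern are hypothetical; the host also needs a `HostCertificate` line in sshd_config pointing at the resulting `-cert.pub` file.

```python
def host_cert_command(ca_key, host_pubkey, key_id, principals, validity="+52w"):
    """argv for ssh-keygen to sign a host public key with a CA."""
    return ["ssh-keygen", "-s", ca_key, "-I", key_id, "-h",
            "-n", ",".join(principals), "-V", validity, host_pubkey]


def cert_authority_line(host_pattern, ca_pubkey_line):
    """known_hosts line trusting any host certificate signed by this CA
    for hosts matching host_pattern."""
    return "@cert-authority {} {}".format(host_pattern, ca_pubkey_line.strip())
```

With this, a rebuilt foo.bar.com presents a freshly signed key and clients accept it without any known_hosts cleanup.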

I ran into this issue as well. As the person below mentioned, make this part of your teardown script. You can use ssh-keygen -R hostname to remove a host's entries from ~/.ssh/known_hosts.
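Where shelling out to `ssh-keygen -R` isn't convenient, a pure-Python sketch of the same idea for a teardown script. It drops known_hosts lines whose host field names the given host, handling both plain comma-separated entries and hashed `|1|salt|hash` entries (HMAC-SHA1 of the host name keyed with the salt). Wildcard patterns are not interpreted, and non-22 ports must be given as `[host]:port`; the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def _matches(host_field, host):
    """True if a known_hosts host field (plain or hashed) names `host`."""
    if host_field.startswith("|1|"):
        _, _, salt_b64, hash_b64 = host_field.split("|")
        salt = base64.b64decode(salt_b64)
        digest = hmac.new(salt, host.encode(), hashlib.sha1).digest()
        return hmac.compare_digest(digest, base64.b64decode(hash_b64))
    return host in host_field.split(",")


def prune_known_hosts(lines, host):
    """Return the known_hosts lines that do not refer to `host`."""
    kept = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            kept.append(line)          # keep blank lines and comments
            continue
        # Marker lines (@cert-authority, @revoked) put the host field second.
        host_field = fields[1] if fields[0].startswith("@") else fields[0]
        if not _matches(host_field, host):
            kept.append(line)
    return kept
```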

Great find. I come from a heavy security background and moved to SV, where it seems like security is an afterthought. I spent many long days and nights STIGing RHEL boxes, so I can appreciate this find. Also thanks for letting me know about Digital Ocean; their VPS looks promising and I think I might start using it.

> After you have run those commands, simply restart the SSH daemon so it starts up with the new keys in place

I believe if your version of OpenSSH is up to date, sshd will read the host key each time a session is opened and does not need to be restarted.

We ran into similar problems on the hosting side; another surprise can be the debian-sys-maint password configured by the Debian mysql-server package.

So you are the reason I started getting these error messages. I noticed the change on June 2. Great work.

If you are still reviewing Salt, I just wrote a post about salt-cloud and DigitalOcean that you should check out:

Create your own fleet of servers with Digital Ocean and salt-cloud:

