> Based on log data from the CoreOS Linux Update Service roughly 3% of online, auto-upgrading, hosts were affected.
If hosts are configured to auto-update to a release, you're treading a fine line if that release could have issues like wide-open SSH access. Though I'd hope no one is hosting sensitive data on an OS in alpha, a compromise like that could be used as a jumping-off point for a more serious attack.
I guess having a lot of machines running the newest alpha releases is a double-edged sword. It's great to have many users who can report bugs early, but a security issue like this one is fairly serious.
I really hope people who use anything of alpha status know what they're in for. Maybe a bug like this every now and then is required to remind people of the inevitable consequences of doing so.
That is the PR that explains the error and fixes it properly. It is not merged yet; current git master still has the flawed version, IIRC. But releases are separate branches/tags, where the error was reverted in a cruder way: https://github.com/coreos/coreos-overlay/pull/1964
The actual "fix" was to pull those releases completely from the update server, so everybody effectively downgraded to an older unaffected release.
Reminds me of the 2002 bug in Debian Unstable where something in PAM incorrectly allowed password-less ssh logins for all the system accounts: "*" in /etc/passwd was parsed as "empty" instead of "no valid password", so you could ssh in as "nobody" and be greeted with a shell :)
We use CoreOS but I'm still not sold on automatic updates. We've seen too many issues running disparate versions of etcd or docker. To their credit, it seems to be getting better, but it's still not ready for prime time.
Plus, we really want to push for an 'immutable' production model which is counter to automatic updates.
I still wonder (and this is one of many reasons I'm not using CoreOS, because I don't know how to secure it as well as Ubuntu/Debian): how do you install a simple fail2ban (ssh) setup on CoreOS? Are there any best practices for making the server that runs all the containers more secure? Or do I simply rely on the CoreOS team for all that ssh securing and firewall stuff (it seems that's how it is now)? I have the feeling that hardening a CoreOS server is not the main priority for CoreOS.
There's no point in using fail2ban if you use public key authentication. It's a kludge which helps against brute force attacks, which are limited to password login.
Since CoreOS enforces public key authentication, and the SSH server is the only external attack surface, they're doing just fine. Security is definitely a top priority for CoreOS, and they're doing a great job at it - image signing, Docker container signing, ASLR binaries, a clean build system (based on Gentoo)...
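For reference, locking sshd down to key-only auth takes just a few lines in sshd_config. This is an illustrative sketch of that kind of hardening, not necessarily what CoreOS actually ships, so check the config in your image:

```
# /etc/ssh/sshd_config (illustrative excerpt)
# No password logins, keys only
PasswordAuthentication no
# Disable the keyboard-interactive fallback too
ChallengeResponseAuthentication no
# Log in as an unprivileged user (e.g. core), not root
PermitRootLogin no
```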
After rethinking... I also tend to agree. Actually I never use password login; maybe it just felt safer to have brute-force ssh attempts logged and banned. I also think a rate limit wouldn't hurt, especially when there is an SSH security issue like this one ;)
Answering my own question (it's been a while since I used CoreOS), here are two interesting links:
Maybe this advice should be integrated into the official CoreOS docs? Something like "How to harden CoreOS servers". When I first looked into CoreOS I wanted to use familiar tools like fail2ban or ufw there (which is not possible, or at least not easy). I'm no iptables expert (who is? it's not a trivial program to use), and this lack of admin tools made me think CoreOS is hard to secure. It's certainly more complicated than Ubuntu (from an admin perspective).
I tend to agree. One reason to use something like it, though, is that failed attempts clog up the logs. But I'd rather tie it down with iptables and limited IP ranges, and/or just move ssh to another port (the latter as a quick and dirty way to avoid the steady log noise if you can't limit IP ranges for whatever reason, not because I have any illusions it does much for security).
When filtering your logs, filter for only successful attempts. You really don't care how many people failed to login on a server with only pubkey enabled. It's the ones who succeeded you want to know about.
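For example, with standard sshd log lines (the sample entries below are made up), grepping for "Accepted" is all it takes; on a live box you'd pipe something like `journalctl -t sshd` into the same filter:

```shell
# Fake sshd log entries standing in for real journal output
cat <<'EOF' > /tmp/sshd-sample.log
Jul 10 12:00:01 host sshd[311]: Failed password for invalid user admin from 203.0.113.9 port 40022 ssh2
Jul 10 12:00:05 host sshd[314]: Accepted publickey for core from 198.51.100.7 port 51515 ssh2
Jul 10 12:00:09 host sshd[317]: Failed password for root from 203.0.113.9 port 40031 ssh2
EOF

# Show only successful logins, dropping the failed-attempt clutter
grep 'Accepted' /tmp/sshd-sample.log
# -> prints only the single "Accepted publickey for core ..." line
```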
Exactly, but it's nice to be able to follow your logs without all that clutter. Though with journald I mostly look at logs on a per-unit basis anyway (or via e.g. an ELK setup), so it's a slight nuisance rather than something essential.
As another user pointed out, fail2ban is a generic tool. I don't necessarily agree with the way it's implemented, though; there is a "recent" module in iptables that should be enough for most services.
Also, about SSH: it is still better to block the bad actors before you process the request. I've seen occurrences where the network was saturated by bots trying to brute-force.
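The iptables "recent" module can do exactly that blocking in-kernel, before sshd ever sees the connection. A sketch along the lines of the common recipe (the 60-second window and hitcount are arbitrary choices, tune to taste):

```
# Track each source IP that opens a new SSH connection
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
         -m recent --name SSH --set
# Drop sources that have opened 5+ new connections within 60 seconds
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
         -m recent --name SSH --update --seconds 60 --hitcount 5 -j DROP
```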
I believe the point is that you have CoreOS hidden behind a firewall you can control, therefore you only have ssh open to sources you permit, you use a bastion host, etc.
So an Ubuntu server in front of CoreOS works... :D ;) I'm thinking of a small cloud setup, like on DigitalOcean, where you need to ssh into the CoreOS server.
There is a hardening guide in the docs [1] and it would be great if you have ideas/contributions on how to make it more complete. Right now it is mostly focused on reducing remote services.
CoreOS is basically a trimmed down, customized Gentoo. It's not hard installing custom stuff on CoreOS, just be aware that CoreOS blows away a lot of the system directories on update, so you don't want to rely on putting things in /bin, /sbin etc.
/etc/ survives, though, so you can easily find examples of systemd service files to apply iptables rules, sysctl hardening, etc., and I'm sure you can find fail2ban setups too. It is also fairly trivial to put together a service file that runs an arbitrary script, placed wherever you like, to apply additional changes on boot.
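As a sketch, such a unit could look like this; the unit name and script path are placeholders (not anything CoreOS ships), and it survives updates because it lives under /etc/:

```
# /etc/systemd/system/local-firewall.service
[Unit]
Description=Apply local iptables rules on boot
After=network-pre.target

[Service]
Type=oneshot
RemainAfterExit=yes
# Placeholder path: any script that issues your iptables/sysctl commands
ExecStart=/opt/bin/apply-firewall.sh

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable local-firewall.service`, or ship both the unit and the script via cloud-config.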
Systemd will let you apply capability based restrictions etc. to fail2ban too.
CoreOS service files do get replaced, but you can override values in them using the systemd dropin mechanism to e.g. override settings for the ssh server.
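For instance, on CoreOS releases where sshd is socket-activated, moving ssh to another port is a tiny drop-in (the port number here is just an example):

```
# /etc/systemd/system/sshd.socket.d/10-listen.conf
[Socket]
# An empty assignment clears the default port 22
ListenStream=
# Listen on an alternative port instead
ListenStream=2222
```

Then `systemctl daemon-reload` and restart the socket; the drop-in survives CoreOS replacing the original unit file on update.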
Your main limitation is that CoreOS itself is intentionally very sparse. You have two options there: deploy whatever dependencies you like directly, or package up the code you need in a Docker or Rocket container and run it with sufficient privileges if you want it to be able to apply changes to the host. (The container approach works for /etc/hosts.deny-based restrictions, but not for applying changes to iptables directly; though it'd be easy enough to write changes to a file on the host and have a script on the host apply them to iptables.)
To be clear: CoreOS is not effort-free. It makes very explicit value judgements, such as aiming for as much as possible to run in containers, which require extra effort initially. If you are not willing to put in that effort to keep the "outside" host as clean as possible, CoreOS may not be the right choice for you. Most of the return on that investment only becomes noticeable when you're deploying larger numbers of servers/instances and want to update them.
With respect to keeping it secure, note that by default all installed CoreOS servers check in with the CoreOS update servers regularly, download updates, and reboot once the updates have been applied.
This may or may not be what you want (it's easy to change - check the docs), but it means that if you're not paying attention, security issues like in this article will get patched for you. Doesn't mean you shouldn't pay attention, but the window for anything bad to happen is smaller.
If you are paying attention you can change the reboot policies so you control when reboots happen, but if possible I'd recommend against it: you can instead make the system take a lock in etcd so only a limited number of your machines will reboot at any one point in time (or reduce the number of available locks to explicitly prevent reboots at times when they might interfere with something else, or set specific machines to manual updates only).
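Concretely, that behaviour is driven by the update strategy plus the locksmith tool; a sketch from memory (check the docs for your release):

```
# /etc/coreos/update.conf
# Take an etcd lock before rebooting, so machines reboot a few at a time
REBOOT_STRATEGY=etcd-lock
```

You can then adjust how many machines may reboot concurrently with e.g. `locksmithctl set-max 2`, and inspect current lock holders with `locksmithctl status`.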
At least for anything running the alpha I'd very much suggest leaving it set to automatic reboot/updates to ensure stuff gets patched quickly when they find anything serious (not just security stuff).
Since you mentioned DigitalOcean elsewhere, DO's CoreOS images do retain the default of automatic reboots.
I personally don't think iptables or switching ports is a viable solution and always use the following policy:
Internal services should never be exposed to the internet, and should only accept connections from signed packets, using IPsec or OpenVPN with TLS auth.
Interesting approach. I just think that OpenVPN is not more secure than SSH; somewhere there needs to be an entry point, and SSH is pretty well tested for this. It also doesn't solve the problem when using a single CoreOS server, where you'd need to run OpenVPN in a container (and I'm also not sure whether you can access the host from that container).
You can certainly give the container access to the host a number of ways. For something like OpenVPN --net=host passed to Docker will give OpenVPN full access to the host network.
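A sketch of what that invocation might look like; the image name and volume path are placeholders, while `--net=host`, `--cap-add` and `--device` are standard Docker flags:

```
# Run OpenVPN in a container with full access to the host's network stack.
# It also needs NET_ADMIN and the tun device to create its VPN interface.
docker run -d --net=host \
    --cap-add=NET_ADMIN \
    --device /dev/net/tun \
    -v /etc/openvpn:/etc/openvpn \
    example/openvpn    # placeholder image name
```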
I agree with you - ssh is fine. If you have multiple CoreOS boxes somewhere without a secure private network, though, OpenVPN, PeerVPN or similar solution works fine.
If you couple it with Flannel set to use host routing, you can give all containers their own IP addresses on non-colliding IP ranges (Flannel takes care of coordinating that via etcd), and in the host-routing variant Flannel doesn't add extra overhead, as it just adds suitable routes on each server.
You can set this up in a few different ways: CoreOS provides Flannel coupled with an "early" Docker daemon (so you'll have two) to run things that need to start before the "real" Docker daemon, such as setting up a VPN. You could also use Rocket/ACI containers, or run it outside a container.
Alternatively, newer versions of Docker support network plugins, though I've not yet had time to test this with CoreOS, as I already have working VPN setups based on Flannel + early-docker.
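For reference, the host-routing mode (host-gw backend) is selected in Flannel's network config in etcd; a sketch assuming Flannel's default config key, with an example network range:

```
# Flannel reads its network config from etcd (default key shown)
etcdctl set /coreos.com/network/config \
    '{ "Network": "10.1.0.0/16", "Backend": { "Type": "host-gw" } }'
```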
OpenSSH in normal configurations will respond to every request with an open socket, regardless of whether that request is signed. The worst-case scenario is something like an OpenSSH unprivileged remote code execution, or a Heartbleed-style attack.
OpenSSH behind VPN technology like IPsec or OpenVPN (with TLS auth) means that only authorised clients (those in possession of a valid signing key) ever see the open socket.
OpenSSH does have "SSH certificates", but using VPN technology allows you to secure multiple internal services including those that don't support any encryption natively.
99% of systems patched in under 12 hours. I know CoreOS doesn't have a high number of instances in existence, but that's still an insanely fast turnaround.
In other news, it looks like CoreOS is finally adding PAM support, which means we might be able to use it in an LDAP-enabled environment soon. That's great news!
On all the Linux distributions I've used in the last 15 years, NSS has never been responsible for authentication, though. It's only been responsible for resolution actions like getpw* and getgr* (i.e. uid/gid to name), whereas PAM has been responsible for authentication and auditing. You have to configure both to support LDAP, not just one.
I quit using CoreOS a while ago when I realized how unprofessional some of the top devs were. I don't want to name names, but one in particular seems to spend all their time writing blog posts bashing any and all competition, no matter how disingenuous they have to be to get their point across.
Most people seem to use CoreOS for security. When the devs seem more concerned with slinging shit at Canonical and Red Hat, it makes the CoreOS team look unprofessional, and when it comes to security I demand professionals.
Really, the dev in question has been a vocal critic of Canonical and others for a while. It's not much of a reflection on CoreOS beyond their decision to overlook it when hiring the person.
I feel like this being #1 on hacker news is someone's way of trying to say it's damning evidence against CoreOS' security model and why you should steer clear of it.
Of course they found issues in an alpha release. Happens all the time.
Very easy to make such a mistake, and it never made it into a stable release.