Hacker News
Zerigo DNS services down for 6+ hours due to massive DDoS (zerigostatus.com)
39 points by _phred on July 23, 2012 | 56 comments

I can totally buy DDoS flooding network capacity, but I'm befuddled these days by statements saying the servers are "under load", which typically means "out of CPU". It's kind of hard for me to imagine even an i5 not being able to saturate a gigE line with DNS lookups (yes, it is a lot of packets, but it can be done) unless DNSSEC is going on. Even 10gigE, if you can amortize interrupts, seems like it'd not be hard to saturate with today's hardware.

What am I missing here?

There are many types of DDoS. Some max out your CPU, some your network. Given that a DDoS (Distributed Denial of Service) involves potentially thousands of willing or unwilling systems, it's relatively easy to make a server unresponsive.

I have a 100 Mb/s internet connection. Scale that up to 10,000 machines, and you have saturated even the fastest of internet connections.
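As a back-of-the-envelope check on that multiplication, here is a minimal sketch. The 10,000 hosts and 100 Mb/s per host are the figures from the comment above; the 10 Gb/s victim uplink is an assumption picked purely for illustration, not a number anyone in the thread reported.

```python
# Rough aggregate-bandwidth estimate for a distributed flood.
# HOSTS and PER_HOST_MBPS come from the comment above.
# VICTIM_UPLINK_MBPS is an assumed value for illustration only.

HOSTS = 10_000
PER_HOST_MBPS = 100
VICTIM_UPLINK_MBPS = 10_000  # assumption: a 10 Gb/s uplink


def aggregate_mbps(hosts: int, per_host_mbps: int) -> int:
    """Total traffic in Mb/s if every host sends at its full line rate."""
    return hosts * per_host_mbps


total = aggregate_mbps(HOSTS, PER_HOST_MBPS)
oversubscription = total / VICTIM_UPLINK_MBPS

print(f"aggregate: {total / 1_000_000:.1f} Tb/s")
print(f"oversubscription of the assumed uplink: {oversubscription:.0f}x")
```

Real attackers rarely get full line rate out of every host, so treat this as an upper bound rather than a measurement.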

Mitigating a DDoS is not easy. Heck, it's damn near impossible, considering the fact that DNS DDoS attacks are done via UDP, which allows you to spoof the source IP address. Even if you do block the IP addresses of all the attackers, your upstream provider is still impacted by the packets trying to come into your server. Most upstream ISPs will blackhole your server IP to diminish the impact on their network.

I run DNSimple (https://dnsimple.com) and we have a full REST API and support domain registrations, transfers and SSL certificates as well. Plus we have an ALIAS record type that's very useful for pointing your apex to services where they only provide a hostname.

I'll be happy to answer any questions you have regarding our service either here or through our support channels.

I know there's much love for DNSimple, but this is the first time that I can remember when the top comment of an "X is down" post is a competitor essentially posting an advert with no insight on the OP.

I tried to add some insight on another comment, but it's tough to say anything about DDoS that hasn't already been said. DDoS attacks suck; mitigating them requires a multi-pronged approach, proactive monitoring and aggressive banning, and even then you can still be screwed if your bandwidth is saturated.

I feel for the operational folks at Zerigo - dealing with this type of outage is hard. The best thing they can do at this point is get back on Twitter and talk to their customers - the last post was 4 hours ago - that's a lifetime when your system is critical.

How much capacity does DNSimple have, though? It appears as though you are another unicast network:

ns1.dnsimple.com is a server at Slicehost / Rackspace

ns2.dnsimple.com is a server at Linode

ns3.dnsimple.com is a server at prgmr.com / EGIHosting / Hurricane Electric

ns4.dnsimple.com is an EC2 instance on Amazon

How much computing power and attack traffic can those really handle?

If you are going to offer a solution to a massive DDoS, I would think that you would be careful about when you propose it.

Instead of adding another unicast network to the mix, why wouldn't you start using an IP anycast network?

Please explain how much capacity you have.

And DNS is, by definition, critical.

Long time user of DNSimple.

Great product, great support, great team, great price.


Me too, I can only recommend DNSimple (I'm coming from Zerigo - switched when they were acquired by 8x8), their service is awesome!

I signed up for your service earlier this morning because Zerigo were taking their time to solve the issue.

However, what I would like to know is - have you guys implemented any procedures to mitigate any negative effects a DDoS may have on your services? (Assuming your service gets DDoS'd like Zerigo) The last thing I want is more down time and to switch to another provider once again.

The philosophy behind DNSimple is great! A few questions:

* How much DNS traffic can it handle for one customer? Usually DNS-services that charge much more than DNSimple have this capped.

* 99% availability (3.65d/y) sounds a bit too much for a production server DNS SLA, maybe after introducing a secondary DNS a much higher availability could be offered. Any timeline for adding a secondary DNS?

* We don't cap traffic at the moment but we do work with our customers when the traffic is significant, offering suggestions on how to improve DNS caching. If we ever do need to cap we'll make sure to give plenty of notice to any customer that would be affected.

* We don't have a timeline, but we have been working on it. We'd like to roll out support for secondary NOTIFY and AXFR, but perhaps we can find a short term solution without that.

I admit that Zerigo was first to spoil me with a simple interface, but I came to DNSimple from there because of all the extra labor-saving features it has. I figure I get back at least 1-2 weeks of my life every year as a result of being a 100% DNSimple shop.

Another big fan of DNSimple here. Been with them for over a month and the services and support have been great.

Even more impressive is we have directed a couple very non-technical customers to them and they all have been able to get up and running in no time.

I'm not (yet) an actual customer of DNSimple, but I know literally half of the team and I can vouch for them, both personally and technically.

That said, I don't know anything about Zerigo and I have no opinion about them.

Just to second this one: I've used DNSimple for years, and it's an awesome product. I can't imagine doing DNS without it now.

Maybe you should check if your own services are up next time you spam them in a thread where your competitors are down.

Your pricing looks amazing - but it looks like you had an outage this morning too. Can you talk about that?

Yes, we had a short outage this morning. We're still investigating the root cause but the symptoms were essentially a simultaneous slow down across all name servers. It's quite possible this is part of the same DDoS that Zerigo is facing, we're not certain - we've been seeing lots of spikes today from various IP addresses (but we've actually been seeing similar patterns for quite some time, as have other DNS providers).

We'll post a more extensive post-mortem once we have a better understanding of what happened, but our main goal at the moment has been to ensure systems remain stable and responsive.

Two points:

- The downtime was 40 minutes (according to Pingdom), that isn't short for something as critical as DNS.

- It seems off to pimp yourself on a thread about a competitor's downtime, especially when you had a significant one at pretty much the same time.

Instead of adding another unicast network to the mix, why wouldn't you start using an IP anycast network? DNSMadeEasy / Route53 / EasyDNS / etc.. It seems crazy to move from one service that purchases 6 name servers to another one that is only on 6 servers.... DNS Made Easy fought an attack that was over 200 Gbps a few months ago. http://www.facebook.com/photo.php?fbid=10150668694804467&...

Can any unicast provider even really get close to fighting an attack like this?

Let's be serious at some point.... 6 servers... the MOST you can push is 6 Gbps. And most likely they are bound to about 400 Mbps of DNS traffic (based on CPU load). Unless you have hundreds of name servers and multiple locations... are you even really competing in uptime anymore?

http://stats.pingdom.com/qqps0x9eb0at/195711 indicates the outage on NS1 was 26 minutes. It's still a long time and we're pulling the trigger on some changes (read: capital investments) to stop it from happening in the future.

Thanks for your quick response. I'll follow up in an email with some additional questions about moving over my domains.

DNSimple is the best, seriously, no really, seriously.

Going on 8 hours of Zerigo's downtime I've had to move all of our Zerigo DNS to DNSMadeEasy. It's a shame, because I really, really like Zerigo, especially their API.

Shit happens, but 99.9% (8 hours a year of downtime) is completely unacceptable for a DNS provider.
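For anyone who wants to check the SLA arithmetic being thrown around in this thread (99% as 3.65 days a year, 99.9% as roughly 8 hours), here is a small sketch. It assumes a 365-day year and ignores leap years; the function name is just for illustration.

```python
# Convert an availability percentage into allowed downtime per year.
# Assumes a 365-day year; leap years are ignored.

HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


for pct in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% -> {hours:.2f} h/year ({hours / 24:.2f} days)")
```

By this math 99.9% allows about 8.76 hours a year, so a single outage of this length uses up nearly the whole annual budget.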

Add these to your hosts file to access your account: manage.zerigo.com dns.zerigo.com

Source: https://twitter.com/coldclimate/status/227369346891132928

Thank you so much!

Apparently no ETA for restore as of 2 hours ago: https://twitter.com/zerigo/status/227322909230768128

Seems like if you are serious about mitigating this type of issue (as a consumer), you really should be specifying name servers from different providers. Your primary DNS server can be from dnsimple/zerigo/dnsmadeeasy and your secondary can be route53, or you could run your own.

The only problem seems to be keeping them in sync. Seems like you'd have to poll the primary (using whatever API it exposes) to update the secondary.
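A rough sketch of what that polling approach could look like once you have both record sets in hand. Everything here is hypothetical: the `Record` layout and `plan_sync` are made up for illustration, and fetching the records from each provider's API is left out entirely because it differs per provider.

```python
# Sketch of a one-way sync plan from a primary provider to a secondary.
# Records are modelled as (name, rtype, value, ttl) tuples; this layout and
# the function name are hypothetical, not any provider's actual API.

from typing import NamedTuple


class Record(NamedTuple):
    name: str
    rtype: str
    value: str
    ttl: int


def plan_sync(primary: set[Record], secondary: set[Record]):
    """Return (to_create, to_delete) that would make secondary match primary."""
    to_create = primary - secondary
    to_delete = secondary - primary
    return to_create, to_delete


# Example data using documentation-reserved names and addresses.
primary = {
    Record("example.com.", "A", "192.0.2.10", 3600),
    Record("www.example.com.", "CNAME", "example.com.", 3600),
}
secondary = {
    Record("example.com.", "A", "192.0.2.10", 3600),
    Record("old.example.com.", "A", "192.0.2.99", 3600),
}

to_create, to_delete = plan_sync(primary, secondary)
print("create:", sorted(to_create))
print("delete:", sorted(to_delete))
```

Because records are compared as whole tuples, a changed TTL or value shows up as one delete plus one create, which keeps the diff simple at the cost of a brief gap if the two calls are not applied together.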

Mostly thinking out loud, surely someone more experienced could provide better guidance?

Ideally your primary provider would support AXFR and NOTIFY which are part of the DNS zone transfer protocol. It's something we're working on adding to DNSimple, but we're not quite ready to launch it yet. The primary and secondary providers also both need to report the correct authoritative name server delegation details so the primary needs to ensure that that data is in the zone file.

There is another challenge in that we're pushing the envelope a bit by offering features that rely on more than just a DNS record (for example ALIAS and POOL records). These are useful features for some people, but if you're using these types of features then they won't be portable to secondary providers.

I run a DNS hosting service (SlickDNS, www.slickdns.com) and have seen a spike in signups today as a direct result of the Zerigo DDOS attack.

I can't claim that SlickDNS is invulnerable to DDOS attack, but FWIW it does run tinydns name servers which have good performance and excellent security. So if you're impacted by the Zerigo outage, feel free to check out SlickDNS. There's a 30-day free trial with all plans and record updates are pushed through to all the name servers in under 5 seconds.

As you're probably aware, the server software that you use has little impact when the DDoS sends enough traffic to actually saturate your allocated bandwidth. Anycast provides a good way to handle DDoS, along with proactive monitoring and defense mechanisms, but at the end of the day DDoS attacks are still extremely difficult to defend against completely. The downside is that Anycast is expensive and thus you need the capital to build it out and run it - which often raises the cost of systems built using it.

Sounds good... but do you have a REST API? That's the primary reason I chose Zerigo in the first place.

It's a FAQ: https://www.slickdns.com/faq/#api ;)

The REST API is in final testing, and should be released later this week. It will ship with libraries for Python, Ruby and PHP.

Best thing Zerigo could do for their customers at this point is export all zone information and email it to them or make it available for download. I have a feeling this is going to be a long outage. In the meantime, here is a great list of free DNS providers (don't get caught without a secondary DNS provider): http://www.lowendtalk.com/wiki/free-dns-providers

I've been seeing a lot of reflector attacks in the past couple of weeks, where the attacker sends a relatively small query for a valid domain that will return a large reply. The trick is that they spoof the source IP, so the DNS reply goes to the victim.

I ended up hacking something together to firewall any IPs which sent more than 1000 requests in a short period of time.
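This is not the poster's script, just a minimal sketch of the general idea: count queries per source IP over a sliding window and flag anything over a threshold. The 1000-request limit is from the comment; the 10-second window, the class name and the method names are assumptions. The actual firewalling step is left out since it is platform specific.

```python
# Sliding-window query counter per source IP.
# limit=1000 follows the comment above; window_seconds is an assumed value.

from collections import defaultdict, deque


class QueryRateTracker:
    def __init__(self, limit: int = 1000, window_seconds: float = 10.0):
        self.limit = limit
        self.window = window_seconds
        self._seen = defaultdict(deque)  # source IP -> recent timestamps

    def record(self, source_ip: str, now: float) -> bool:
        """Record one query; return True if source_ip is now over the limit."""
        timestamps = self._seen[source_ip]
        timestamps.append(now)
        # Drop timestamps that have fallen out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return len(timestamps) > self.limit


# Tiny demonstration with a low limit so the behaviour is easy to follow.
tracker = QueryRateTracker(limit=3, window_seconds=1.0)
results = [tracker.record("198.51.100.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
print(results)
```

Keep in mind that with spoofed UDP sources the flagged address may belong to the victim of the reflection rather than the attacker, so blocking on this signal alone can hurt legitimate resolvers.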

We've seen the same kind of attack. We ended up limiting our DNS resolvers only to our own prefixes. It's a simple ACL in bind that allows recursion (domains your DNS server is not authoritative for), only to our subnets.
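To make the decision logic concrete, here is a small Python model of "only recurse for our own prefixes". It is not BIND configuration (in BIND this would be expressed with an ACL and `allow-recursion`), and the prefixes below are documentation ranges standing in for whatever your real subnets are.

```python
# Model of a recursion ACL: recurse only for clients inside our own prefixes.
# ALLOWED_PREFIXES uses documentation-reserved ranges as placeholders.

import ipaddress

ALLOWED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def recursion_allowed(client_ip: str, prefixes=ALLOWED_PREFIXES) -> bool:
    """True if the client address falls inside one of the allowed prefixes."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests across IPv4/IPv6 simply evaluate to False.
    return any(addr in net for net in prefixes)


for ip in ("192.0.2.53", "198.51.100.7", "2001:db8::1"):
    print(ip, "->", "recurse" if recursion_allowed(ip) else "refuse")
```

Authoritative answers for your own zones would still be served to everyone; only recursive lookups for other people's domains are restricted.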

Do you mind sharing the script / code to accomplish that? (some gist somewhere) I'm seeing a lot of these sort of things on our servers too..

It really is a disgusting hack, and specific to FreeBSD. It does need to be a bit more sophisticated than "block an IP if it floods me" because as it is now someone can simply spoof the IP of an ISP's DNS server and effectively firewall them, blocking their users from being able to resolve the domain names I'm hosting.

I can give you one tip to get you started: if you're running named, you can enable logging of every query, something like (hope this formats ok) :

  logging {
    channel query_logging {
         file "/var/log/named/querylog"
         versions 3 size 100M;
         print-time yes;                 // timestamp log entries
    };

    category queries {
         query_logging;                  // send query logs to the channel above
    };
  };

Well, that explains why my wife woke me up complaining about half the internet not working. Our ISP is 3 (drei.at) and she was using their DNS, guess there are issues all over Europe.

What are the main advantages of paying for DNS hosting like Zerigo or SlickDNS instead of using the one provided for free with web host companies (E.g. Linode's DNS Manager)?

FWIW the SlickDNS name servers are hosted by Linode so I'm a fan of their server hosting. For DNS management, the Linode interface is fine if you have a handful of domains with simple configurations, but beyond that it's unwieldy IMHO.

I'd say the main reason to use a DNS hosting service is to consolidate your DNS management for all of your domains regardless of registrars or server hosting providers. E.g., I personally have domains registered with 5 registrars and use two server providers. And because they specialize in DNS, DNS hosting providers should have superior interfaces, APIs and support for DNS hosting compared to generalist hosting providers.

The SlickDNS interface has two features in particular that I haven't seen in any other DNS hosting service: automatic management of "alias domains" and mapping IP addresses to named servers. See https://www.slickdns.com/features/ for details.

A dedicated DNS provider will often focus on the experience around managing DNS, including APIs and advanced features. Additionally some folks don't like putting all of their eggs in one basket and prefer to have their registrar be one company, their DNS be another and their hosts be another.

And this is why I use Route 53, I'm a lot more confident in Amazon's abilities to mitigate DDoS attacks.

Which really sucks, DDoS are really hard to combat and Zerigo are an awesome company.

US-Based customer here. Our DNS just started working again.

Running with DNSMadeEasy, is there a way to integrate it with Route 53 through AXFR to have two providers?

This might help: http://route53d.googlecode.com/svn-history/r2/trunk/README

Looks like they are close to getting NOTIFY and IXFR (incremental AXFR) working. It's an interesting approach nonetheless.

Could I integrate DNSimple with DNSMadeEasy via NOTIFY/IXFR/AXFR?

Not right now, we don't operate as a secondary provider (and I'm not sure we will).

Take a look at DynDNS's secondary service, that might work for you: http://dyn.com/dns/secondary-dns/


Days later and what do we have from them? One solitary email and a few half-assed status page updates.

This took down services like FogBugz On Demand.

We've switched our DNS provider and we're waiting for it to propagate. Check http://fogcreekstatus.typepad.com/ for updates.

Looks like this took Trello down, too...

We've switched our DNS provider and we're waiting for the change to propagate. http://fogcreekstatus.typepad.com/2012/07/index.html has all the details.

Comodo's DNS.com appears to be down too.
