Hacker News new | past | comments | ask | show | jobs | submit login
DDoS Attack Against Dyn Managed DNS (dynstatus.com)
1563 points by owenwil on Oct 21, 2016 | hide | past | favorite | 674 comments

Out of curiosity, why do caching DNS resolvers, such as the DNS resolver I run on my home network, not provide an option to retain last-known-good resolutions beyond the authority-provided time to live? In such a configuration, after the TTL expiration, the resolver would attempt to refresh from the authority/upstream provider, but if that attempt fails, the response would be a more graceful failure of returning a last-known-good resolution (perhaps with a flag). This behavior would continue until an administrator-specified and potentially quite generous maximum TTL expires, after which nodes would finally see resolution failing outright.

Ideally, then, the local resolvers of the nodes and/or the UIs of applications could detect the last-known-good flag on resolution and present a UI to users ("DNS authority for this domain is unresponsive; you are visiting a last-known-good IP provided by a resolution from 8 hours ago."). But that would be a nicety, and not strictly necessary.

Is there a spectacular downside to doing so? Since the last-known-good resolution would only be used if a TTL-specified refresh failed, I don't see much downside.

OpenDNS does this: https://support.opendns.com/hc/en-us/articles/227987767-Dyna...

It's called SmartCache.

I do this, too.

It's called HOSTS and djb's cdb constant database.

And one does not need to use a recursive cache to get the IP addresses. Fetching them non-recursively and dumping them to a HOSTS and a cdb file can sometimes be faster; I have a script that does that. Fetching them from scans.io can be even faster.

   [ -c null ]||mknod null c 2 2 
   case $# in
    sed '
    ' /etc/hosts \
        while read a b c d;
        echo +${#b},${#a}:$b-\>$a;
   } \
    |exec awk '!($0 in a){a[$0];print}' \
    |exec cdbmake $0.cdb $0.t||exit
   exec cdbdump < $0.cdb
   test ${#0} = 2 ||
   exec cdbget $1 < $0.cdb >null;
   exec cdbget $1 < $0.cdb;


   usage: $0  
   usage: $0 domainname
First usage compiles and dumps database to screen. Second usage checks for presence of domainname and exits 0 if present otherwise exits 100. Third usage is if $0 is only two characters it will check for presence of domainname and if present print the IP and domainname in HOSTS format.


With all due respect to the enormous reliance on it that has built up over the past decades, DNS is not the internet. It is just a service heavily used for things like email and web. This does not mean, in an emergency, email and web cannot work without DNS. They once did and they still can.

The internet runs just fine without DNS. Some software may refuse to honour HOSTS and rely on solely on DNS. But that is a vulnerability of the software, not the internet. (And in such cases, e.g., qmail, I just serve my own zone via tinydns, which again is just a mirror of HOSTS.)

What you are doing and claiming is ridiculous.

For one, how are you doing to deal with stale records?

Awesome! Is this available as software I can install on my network? Sorry, probably a dumb question.

Nope, just point your machine or router's DNS to use opendns resolvers instead of your regular ones: and

Do you have a link on the opendns web site that refers to those specific Ips?

One can go even further and install DNSCrypt:


Any downsides to using this? I'm tempted to start using it, but I'm not really sure if there's any particular thing I should consider first.

Be aware that some things (Netflix, Comcast, Youtube) expect you to use your local DNS server so that they can route you to the nearest media server. Using a central IP Address like what is mentioned here can result in unsatisfactory video streaming....at least that's what I found with our Apple TV.

OpenDNS sends your "EDNS client subnet" to some CDNs including Google, though maybe not Apple.


Yes, but beware, they (at least used to) resolve unknown names to a page filled with ads.

That's good to know - the ads are the reason I reluctantly switched from OpenDNS to google.

(Reluctantly in that Google already has enough of my data, thanks, through gmail, search, maps, docs and other services, not because it doesn't work well.)

Google DNS doesn't store any identifiable/private data, as far as I understand?


Yea, but it's also plaintext. Super easy to tap, if I understand correctly.

Still, I prefer it to isps snooping.

Anyone know if Google Public DNS does?

It doesn't (first result is openDNS, second is google):

    $ dig -tA twitter.com @

    ; <<>> DiG 9.8.3-P1 <<>> -tA twitter.com @
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63973
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

    ;twitter.com.			IN	A

    twitter.com.		0	IN	A
    twitter.com.		0	IN	A
    twitter.com.		0	IN	A
    twitter.com.		0	IN	A

    ;; Query time: 14 msec
    ;; SERVER:
    ;; WHEN: Fri Oct 21 11:53:40 2016
    ;; MSG SIZE  rcvd: 93

    $ dig -tA twitter.com @

    ; <<>> DiG 9.8.3-P1 <<>> -tA twitter.com @
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 47295
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

    ;twitter.com.			IN	A

    ;; Query time: 13 msec
    ;; SERVER:
    ;; WHEN: Fri Oct 21 11:53:47 2016
    ;; MSG SIZE  rcvd: 29

A shame OpenDNS used to redirect me to some spam webpage every time I tried to resolve a domain that didn't exist--they earned a spot on my black list forever. :(

It's been years since we did that, and they were not spam pages, and easily able to opt-out.

Well, the fact that people still remember goes to show what a truly terrible idea it really was and that it probably did permanent damage to your brand.

I'm not sure what metric you use to judge it as terrible.

I thought it was great. 10,000 companies pay for my service today. 65 million people use my infrastructure today. Cisco bought the company for more than $650m. It continues to innovate on the decades old DNS in secure and useful ways.

So let me know what part is terrible.

The part where you repeated Verisign's mistake in breaking a fundamental protocol.

NXDOMAIN. Kind of a thing, and important to protocols other than HTTP.

The point is that the company did just fine even having made a mistake. Ignoring that is just being difficult.

No, the point that a company doing just fine is somehow an excuse for its actions is just the reason why we can't have nice things.

"I got mine."

I used OpenDNS for a long time. I eventually switched to Google DNS mostly because its IPs are shorter and easier to remember, and I didn't use any of the power user features for OpenDNS. I remember the page full of ads and to be honest I don't begrudge it. We all expect everything given to us for free these days, and then we don't even want the company to make money showing us an ad on the rare occasion that we mistype a URL. It's hard to get paid these days.

Ironically, those unrealistic expectations are probably a significant factor in the growth of data mining and resell; how else is a free-to-use website that doesn't have any ads (or whose users mostly block ads) going to get paid? You may say "not my problem", but it affects you when you leave the company no option but to resell data on the behaviors they observe from you.

That is not an appropriate tone for someone representing OpenDNS to take.

Why not? It's blunt, but to the point, honest, and passionate. Who cares about tone?

And seems very appropriate for the founder of OpenDNS. Pretty authoritative.

Because it's dismissive.

Everyone has preferences, I guess. I far prefer honest and curt to the kind of anodyne, contentless word-payloads pumped out by so many corporate communications departments.

Say, generating corporate communications seems like a promising direction for neural networks. A Markov chain comes close...

They don't do that any more, for what it's worth. I think for a while that was the only revenue stream for what was otherwise a free service. https://www.opendns.com/no-more-ads/

This attitude only promotes the idea that "well we might as well just continue like this then". If you can never forgive a company for doing wrong when they've corrected themselves years ago and now have a track record of doing nothing else that's irked you then what's the point in them ever bothering to make the change?

If what they do is useful to you but have a feature or bug or something else you don't like then you absolutely should forgive them if they then fix that feature or bug to work in a way you like. They may as well never bother fixing things if they can never be forgiven after repenting their internet sins.

If you've since found something that does do what you want then fair play, fill your boots. Otherwise you're being petty for the sake of being petty.

Historically, doing this has been a source of a truly awe-inspiring amount of pain.

Aw, don't leave us hanging like that. What problems did it cause?

Imagine migrating your website to a new host. A month later, you learn that a major ISP has decided that its customers don't need to know about the move, because they hold on to last-known-good records as they like. So half your traffic and business is gone. Or maybe you can't use anything run on Heroku, because the dynamicism there doesn't play nice with your resolver's policies.

That's the kind of world we used to live in when TTLs were often treated as vague suggestions.

The scenario I was describing was one where a last-known-good resolution would be used if and only if a refresh attempt fails after the authority-provided TTL expires.

I believe the scenario you are describing is a rogue ISP ignoring that authoritative TTL wholesale, caching resolutions according to its own preferences regardless of whether the authority is able to provide a response after the authoritative TTL expires.

The rogue ISPs thought they were helping people by serving stale data. After all, better something past its use-by date than failing, right? A low tolerance for DNS response times, and suddenly large chunks of the internet are failing a lot...

Among other problems, this enables attacks. Leak a route, DDoS a DNS provider, and watch as traffic everywhere goes to an attack server because servers everywhere "protect" people by serving known-stale data rather than failing safe.

Be very, very careful when trying to be "safer". It can unintentionally lead somewhere very different.

> A low tolerance for DNS response times, and suddenly large chunks of the internet are failing a lot...

Hang on a second. I feel that you're piling on other resolver changes in order to make a point. I'm not suggesting that the tolerance for DNS response times be reduced. Nor am I suggesting a scenario where the authority gets one shot after their TTL, after which they're considered dead forever. I would expect my caching DNS resolver to periodically re-attempt to resolve with the authority once we've entered the period after the authority's TTL.

> Leak a route, DDoS a DNS provider, and watch as traffic everywhere goes to an attack server because servers everywhere "protect" people by serving known-stale data rather than failing safe.

I think you're suggesting that someone could commandeer an IP and then prevent the rightful owner to correct their DNS to point to a temporary new IP.

Isn't the real problem in this scenario the ability to commandeer an IP? The malicious actor would also need to be able to provide a valid certificate at the commandeered IP. And at that point, I feel we've got a problem way beyond DNS resolution caching. Besides, if what you have proposed is possible, isn't it also possible against any current domain for the duration of their authoritative TTL? That is, a domain that specifies an 8-hour TTL is vulnerable to exactly this kind of scenario for up to an 8-hour window. Has this IP commandeering and certificate counterfeiting happened before?

> Hang on a second. I feel that you're piling on other resolver changes in order to make a point.

Yes. The point I am making is the additional failure modes that need to be considered and the pain they can cause. Historically have caused.

At no point did I ever think you were suggesting that one failure to respond renders a server dead to your resolver forever. Instead, I expect that your resolver will see a failure to respond from a resolver a high percentage of the time, leading to frequent serving of stale data.

> Isn't the real problem in this scenario the ability to commandeer an IP?

You're absolutely right! The real problem here is the ability to commandeer an IP.

However, that the real problem is in another castle does not excuse technical design decisions that compound the real problem and increase the damage potential.

> Instead, I expect that your resolver will see a failure to respond from a resolver a high percentage of the time, leading to frequent serving of stale data.

If this were true, the current failure mode would have end users receiving NX DOMAIN a "high percentage of the time," which obviously is not happening.

{edit: To be clear, I'm reading the quote as you stating that "failure to resolve" currently happens a high percentage of the time, and therefore this new logic would result in extended TTLs more often than the original post would assume they would happen}

> However, that the real problem is in another castle does not excuse technical design decisions that compound the real problem and increase the damage potential.

It's fair to point out that this change, combined with other known issues could create a "perfect storm," but as was pointed out this exploit is already possible within the current authoritative TTL window. Exploiting the additional caching rules would just be a method of extending that TTL window.

On the other hand, where do you draw the line here? If you had to make sure that no exploits were possible most of the systems that exist today would never have gotten off the ground. It seems a bit like complaining that the locks to the White House can be exploited (picked), while missing the fact that they are only supposed to slow someone down before the "men with guns" can react.

Based on the highly unscientific sample of the set of questions asked by my coworkers in my office today, the failure mode of end users receiving NX DOMAIN has happened much more than on most days.

I don't need to make sure no exploits are possible. However, it at all possible, I'd like to help ensure that things aren't accidentally made more dangerous. It's one thing to consider and make a tradeoff. It's quite another to be ignorant of what the price is.

Well, it obviously happens when the resolver is down, but that's the situation that this logic is being proposed to smooth over. The normal day-to-day does not see a high percentage of resolvers failing to respond, or else people would be getting NX DOMAIN for high profile domains much more often.

I'm just trying to make sure we don't wind up making DNS poisoning nastier in an effort to be more user-friendly.

All the attacks mentioned here seem to be of the following shape:

1. Let's somehow get a record that points at a host controlled by us into many resolvers (by compromising a host or by actually inserting a record).

2. Let's prolong the time this record is visible to many people by denying access to authoritative name servers of a domain.

(1) is unrelated to caching-past-end-of-ttl, so you need to be able to do (1) already. (2) just prolongs the time (1) is effective and required you to be able to deny access to the correct DNS server. Is it really that much easier to deny access to a DNS server than it is to redirect traffic to that DNS server and supply bogus reponses?

DNS cache poisoning is currently a very common sort of attack. The UDP-y nature of DNS makes it very easy. There are typically some severe limitations placed on the effectiveness of this attack by low TTLs. It does not require you to deny access to the authoritative server. This attack is also known as DNS spoofing: https://en.wikipedia.org/wiki/DNS_spoofing

Ignoring TTLs in favor of your own policy means poisoned DNS caches can persist much longer and be much more dangerous.

Right now, to keep a poisoned entry one must keep poisoning the cache.

In that world, one can still do that. One can also poison the entry once and then deny access to the real server. You seem to be arguing that this is easier than continuous poisoning. Do I understand you correctly?

You are correct in your assessment of the current dangers of DNS poisoning.

I am in no way arguing about ease of any given attack over any other. I am arguing that a proposed change results in an increased level of danger from known attacks.

I'm arguing that the proposed change at hand, keeping DNS records past their TTLs, makes DNS poisoning attacks more dangerous because access to origin servers can be denied. Right now TTLs are a real defense against DNS cache poisoning, and the idea at hand removes that in the name of user-friendliness.

The way I read your argument, it relies on denying access to be cheaper or simpler than spoofing (X == spoofing, Y == denying access to authoritative NS):

You are arguing that a kind of attacks is made more dangerous, because in the world with that change an attacker can not only (a) keep performing attack X, but can also (b) perform attack X and then keep performing Y. If Y is in no way simpler for the attacker why would an attacker choose (b)? S/he can get the same result using (a) in that world or in our world.

Am I misreading you or missing some other important property of these two attack variants?

I believe you may have failed to consider the important role played by reliability.

X cannot always be done reliably - it usually relies on timing. Y, as we've seen, can be done with some degree of reliability. Combining them, in the wished-for world, creates a more reliable exploit environment because the spoofed records will not expire. The result is more attacks that persist longer and are more likely to reach their targets.

Such a world is certain to not be better than this one and likely to be worse.

Indeed I didn't consider that. Thanks a lot for being patient and enlightening.


I appreciate the support. But FWIW, I don't think Kalium was trolling. Although he (I assume, but correct me if I am wrong) and I disagree on the risk versus reward of extending the time-to-live of cached resolutions beyond the authoritative TTL, I nevertheless appreciated and enjoyed his feedback.

You assume correctly.

I'm afraid we're simply going to have to agree to disagree on this point. I do not share the opinion that this is a good idea with significant upside and virtually no downside. I also do not agree that none of the issues I have raised apply to the original suggestion - I believe they do apply, which is why I raised them.

WRT the second attack, what they're referring to is actually DNS cache poisoning - inserting a false record into the DNS pointing your name at an attacker-controlled IP address. This is a fairly common attack, but usually has an upper time limit - the TTL (which is often limited by DNS servers).

This proposal would allow an attacker to prolong the effects of cache poisoning by running a simultaneous DDoS against un-poisoned upstream DNS servers.

Not sure whether it could be used in a legitimate attack (probably), but it can definitely lead to confusing behavior in some scenarios. You switch servers, your old IP is handed to some random person, your website temporarily goes down - and now your visitors end up at some random website. Would you want that? Especially if you're a business?

Also, "commandeering" an IP of a small hosting might be easier than you think. It depends entirely on how they recycle addresses.

You seem to be continuing to warn against a proposal that isn't the one that was made. What specifically is dangerous about using cached records only in the case of the upstream servers failing to reply?

It doesn't take much of an imagination to attack this.

The older I get in tech the more I realize we just go in circles re-implementing every bad idea over again for the same exact reasons each "generation". Ah well.

TTL is TTL for a reason. It's simple. The publisher is in control, they set their TTL for 60 seconds so obviously they have robust DNS infrastructure they are confident in. They are also signaling with such low TTLs that they require them technically in order to do things like load balance or HA or need them for a DR plan.

Now I get a timeout. Or a negative response. What is the appropriate thing to do? Serve the last record I had? Are you sure? Maybe by doing so I'm actually redirecting traffic they are trying to drain and have now increased traffic at a specific point that is actually contributing to the problem vs. helping. How many queries do I get to serve out of my "best guess" cache before I ask again? How many minutes? Obviously a busy resolver (millions of qps at many ISPs) can't be checking every request so where do you draw the line?

It's just arrogant I suppose. The publisher of that DNS record could set a 30 day TTL if they wanted to, and completely avoid this. But they didn't, and they usually have a reason for that which should be respected. We have standards for a reason.

Assume we serve the last known record after TTL.

Here's the attack:

- Compromise IP (maybe facebook.com)

- DDoS nameservers

- facebook removes IP from rotation

- Users still connect to bad actor even though TTL expired

"We have standards for a reason" is absolutely correct, and we can't start ignoring the standards because someone can't imagine why we need them _at this moment_

Yes, but there's one piece missing.

> Here's the attack:

> - Compromise IP (maybe facebook.com)

- Attacker generates or acquires counterfeit facebook.com certificate.

> - DDoS nameservers

> - facebook removes IP from rotation

> - Users still connect to bad actor even though TTL expired

I understand what you are saying, but this attack scenario is extraordinarily difficult as a means to attack users who have opted to configure their local DNS resolver to retain a last-known-good IP resolution. It involves commandeering an IP and counterfeiting Facebook's SSL/TLS certificate. As I have said elsewhere in this thread, all sites are currently vulnerable to such an attack today for the duration of their TTL window. So if this is a plausible attack vector, we could plausibly see it used now.

You're right! Completely, absolutely, 100% right. If this was a plausible attack vector, we could see it used now. And you know what? We do!

This is why some people are concerned about technical decisions that make this vector more dangerous. Systems that attack by, say, injecting DNS responses already exist and are deployed in real life. The NSA has one - Quantum. Why make the cache poisoning worse?

Kalium, I really appreciate your responses.

If my adversary can steal an IP from Facebook, create a valid certificate for facebook.com, and provide bogus DNS resolution for facebook.com, I feel it's game over for me. My home network is forfeit to such an adversary.

But I get your point. It's about layering on mitigating factors. The lower the TTL, the lower the exposure. Still, my current calculus is that the risk of being attacked by such an adversary is fairly low (well, I sure hope so), and I would personally like to configure my local caching resolver to hold onto last-known-good resolutions for a while.

All that said, I have to hand it to you and others like you, those whom keep the needle balanced between security and convenience.

Now that I think about it more, it's even worse than that. A bogus non-DNSSEC resolution and a forged cert, both of which are real-life attacks that have actually happened, and you're done for. Compromising an IP isn't really necessary if you're going to hang on to a bad one forever, but it's a nice add-on. It removes the need to take out the DNS provider, but we can clearly see that that is possible.

Keeping the balance between security and convenience is difficult on the best of days. Today is not one of them. :/

If you can forge certs for HTTPS-protected sites, this is not what you would use them for.

It's part of what I would use them for. A big, splashy attack distracts a bunch of people while you MITM something important with a forged cert? Great way to steal a bunch of credentials with something that leaves relatively few traces while the security people are distracted.

> - Attacker generates or acquires counterfeit facebook.com certificate.

So you enabled an attack vector that has to be nullified by a deeper layer of defense? And in some cases possibly impacted by a user having to do the right then when presented with a security warning.

Why would you willingly do that?

Also I do find your assumption of ubiquitous TLS rather alarming - facebook is a poor example here, there are far softer and more valuable targets for such an attack vector to succeed.

Edit: Also to keep my replies down...

> I would personally like to configure my local caching resolver to hold onto last-known-good resolutions for a while.

You can! All these tools are open source, and there are a number of simple stub resolvers that run on linux (I'd imagine OSX as well) which you can configure to ignore TTL. They may not be as configurable as you like, but again they are open source and I'm sure would welcome a pull request :)

The policy that caused so much pain before is to take DNS records, ignore their TTLs, and apply some other arbitrarily selected policy instead. I confess, I don't understand how the proposal at hand is different in ways that prevent the previous pains from recurring.

Maybe you can enlighten me on key differences I've overlooked? How do you define "failing to reply"? Do you ever stop serving records for being stale, or do you store them indefinitely?

So if our resolver was on our resolver was on our laptop and had a nice UI that would work great. Now the question is : why is the resolver not in my laptop?

It can be if you want it to be, but it's probably much less interesting than you think.

You likely underestimate the sheer number of DNS records you look up just by surfing the web, and how useful that information would be to 99.99% of users.

Basically the tools exist for you to do this yourself if you are so inclined, but they may not be that user friendly since they aren't generally useful to most.

Seconded. This is a common idea which occurs to people who haven't dealt with DNS before, and it ends with a much better understanding of how many things https is used for and going back to using openDNS as your resolver.

Proper cache invalidation is one of the 2 hard problems of computer science (the other 2 being naming things and off by 1 errors).

Yes, and there are 10 kinds of people in the world: those who understand binary and those who don't.

It'd be nice to have a "backup TTL" included, to allow sites to specify whether and how long they wanted such caching behavior.

Also, that cache would need to only kick in when the server was unreachable or produced SERVFAIL, not when it returned a negative result. Negative results returned by the authoritative server are correct, and should not result in the recursive resolver returning anything other than a negative result.

> Also, that cache would need to only kick in when the server was unreachable or produced SERVFAIL, not when it returned a negative result. Negative results returned by the authoritative server are correct, and should not result in the recursive resolver returning anything other than a negative result.

Precisely. I am not suggesting any change to how a caching resolver comprehends valid responses from the authoritative server for a domain. For example, if the authoritative server says, "No such domain," then the domain is understood to be gone. At that point, the domain being gone is in fact the last-known-good resolution.

This is why TTL is a variable. If you want to have your records last longer, you can. If you want to later shorten them, you can. People screw DNS up enough already, let's not make it worse by adding layers of TTL.

It might be a stretch to use the information, but the SOA RR does contain an EXPIRE field, defined as "A 32 bit time value that specifies the upper limit on the time interval that can elapse before the zone is no longer authoritative." It's an additional request, but the SOA RR does contain the type of information you are asking for.

I've been thinking of adding this exact feature to my DNS framework that I've been working on (if github was resolving): https://github.com/bluejekyll/trust-dns

If you have any feedback, I'd love to hear it.

Sounds promising. I'll want to take a look when GitHub is resolvable again. :)

To be perfectly honest, a "feature" like this has no business being in a safe and secure DNS server. You should fail-safe, rather than serving stale data of unknown safety.

Serving data you cannot verify is a dangerous failure state.

Perhaps. In this case the web was down. I definitely understand the point, stale data with TTLs which have expired, especially on RRSIG records is dangerous.

But I have to wonder about situations like this where DNS has been taken down, what the better good is. If the records can be proven to have been cached as authentic data at some point within some period of time. In this case hours, is it for the better good that stale authentic records are acceptable to serve back? In this case a stale period of some number of hours would have been good.

I'm not so sure which is better in this case.

Would you believe me if I said that the DNS protocol itself has an answer for this? The answer is in the basic design of what a TTL is. It's preferable to serve nothing than to serve known bad data. Stale data is a form of bad data.

As another user put it, we have these standards for good reason.

> Is there a spectacular downside to doing so? Since the last-known-good resolution would only be used if a TTL-specified refresh failed, I don't see much downside.

Because you would keep old DNS records around forever if a server goes away for good. So you need to have a timeout for that anyways.

Yes, but:

1) Memory and disk are cheap. My caching DNS resolver can handle some stale records.

2) I suggested above that this behavior would continue until an administrator-specified and potentially quite generous maximum TTL expires. That is, I could configure my caching DNS resolver to fully purge expired records after, for example, 2 weeks.

> 1) Memory and disk are cheap. My caching DNS resolver can handle some stale records.

The problem is not that it would require storage but that stale records can be outright wrong. That timeout would require configuration and DNS does not provide that.

So sure, a new timeout could be introduced but that currently does not exist in DNS.

> The problem is not that it would require storage but that stale records can be outright wrong.

Again, the scenario is that the authoritative/upstream resolver cannot be reached in order to refresh after the authority-provided TTL expires. Are you saying that in the case of a service having been intentionally removed from the Internet (the domain is deactivated; the service is simply no more), my caching resolver will continue to resolve the domain for a time? Yes, it would. What's the downside though?

> That timeout would require configuration and DNS does not provide that.

Yes. This would be a configurable option in my caching DNS resolver, in the same vein as specifying the forwarders, roots, and so on. But to be clear, this would not be a change to the DNS protocol, merely a configuration change to control the cache expiration behavior of my resolver. I'm not wanting to sound flippant, I'm not sure I understand the point you're trying to make here.

>The problem is not that it would require storage but that stale records can be outright wrong.

But the tradeoff here is a wrong record vs a complete failure to lookup the record. I would rather have the wrong one.

If a server goes away for good, at some point NS records will stop pointing to it. We could serve stale records as long as all of the stale record's authority chain is either still there or unreachable.

I've had an IP address from a certain cloud provider for a month. Some abandoned domain still has its nameserver and glue records pointing to the IP, and i get DNS queries all the time.

The domain expires in January. I hope it's not set to auto-renew. :-)

Note that this is already happening. The only thing my proposal would change is that it would also affect servers that used to be authoritative for subdomains of such abandoned domains. I would expect there to be very few of them: very few domains have delegations of subdomains to a different DNS server and they are larger and thus less likely to be abandoned.

I think what the poster above you is saying is a feature on some software that isn't in an RFC somewhere.

HTTP has a good solution/proposal for this: the server can include a stale-on-error=someTimeInSeconds header in addition to the TTL and then every cache is allowed to continue serving stale data for the specified time while the origin is unreachable. Probably a good idea to include such a mechanism in DNS, too.


I can guarantee you that popular DNS resolvers (think 500b+ transactions a day) do have this feature!

Don't want to say much more due to it being my job, and I don't want to give away too much.

EDIT: https://www.google.com/patents/US8583801

What is the point of this comment?

FWIW, since the comment was a reply to my message above:

It provided value by answering my question concerning serious downsides to providing optional post-TTL last-known-good caching within a DNS resolver. The answer is implicit in that a major DNS resolver provides exactly this functionality.

Thank you :)

A little more information, considering it is public. (I had to double check if it was)


you mean opendns?

No I mean ISP's that run their own DNS resolvers.

i seem to remember that dns has generally been reliable (until recently, i guess), probably nobody has ever thought that to be necessary.

you could write a cron script that generates a date-stamped hosts file based on a list of your top-used domain names, and simply use that on your machine(s) if your dns ever goes down. that's basically a very simple local dns cache.

if you feel like living dangerously, have it update /etc/hosts directly.

> i seem to remember that dns has generally been reliable (until recently, i guess)

Probably because people used to use long TTLs (1 hour, 4 hours, whatever) and now the default behavior in services like Amazon Route 53 is to use 5 minutes.

Try Akamai managed CDN content. 20 seconds !!

The 20 seconds with Akamai is because of their dynamic end user IP mapping technology, Basically they need to map in near real-time based on characteristics of the end user IP, they can't afford a long TTL

It's not illegal to have TTL that short but it certainly feels like violation of some implicit contract between users and provides. Of course the root cause of this is the horrendous hack of using DNS for CDN routing. It doesn't have to be that way... I wrote a recent article about this very issue here


I haven't heard of packetzoom before, I'll definitely take time over the weekend or next week to dig into your approach.

I wouldn't call DNS based IP mapping "horrendous" simply because it doesn't work as well for mobile,I understand you have your own pitch but lets go easy on the hyperbole :)

The fact is that it is still very effective. The major CDNs are quite aware of the mobile shortcoming of DNS based mapping and I am pretty sure it is something they are working to address.

At the end of the day location is just one component involved in accelerating content, there are plenty of other features that various CDNs use to deliver optimal performance.

Regarding the short TTLs, I get your argument, it is indeed like a user's browser is constantly chasing a moving origin. The alternatively however is a non-optimized web, which would be orders of magnitude worse. Remember the benefits of CDNs doesn't just accrue to end users but also to content providers, most origin servers can't handle even the slightest up tick in traffic.

"I wouldn't call DNS based IP mapping "horrendous" simply because it doesn't work as well for mobile"

OK I'll take back the word "horrendous" but it's a hack alright.

> The fact is that it is still very effective. The major CDNs are quite aware of the mobile shortcoming of DNS based mapping and I am pretty sure it is something they are working to address.

No not really. They're certainly trying to patch DNS to pass through enriched information in DNS requests through recursive calls... but it's such a long shot to work consistently across tens of thousands of networks around the world, and requires coordination from so many different entities, that it's clearly a desperation move more than than a serious effort. Regardless, there's no real solution in sight for the web platform.

For mobile (native) apps though, the right way to discover nearby servers is to directly build in that functionality using mobile specific techniques. There's no reason to keep limiting mobile apps to old, restrictive web technologies considering that apps have taken over as majority of traffic around the world. That's the root idea behind a lot of what we're doing at PacketZoom. Not just in service discovery, but also in more intelligent transport for mobile with built-in knowledge of carriers and network technologies etc, automatic load-balancing/failover of servers and many other things. Here's my older article on the topic


Say I want to implement my own dynamic DNS solution on a VPS somewhere - if I set short TTLs am I causing problems for someone? How short is too short?

This company, for example, has a 99.999% uptime SLA. Thats roughly 5 minutes per year.

I think a problem that you might be overlooking is that DNS lookups aren't just failing, they are also very slow when a DDOS attack is underway on the authority servers. This introduces a latency shock to the system which causes cascading failures.

All will break the moment that one of the websites that you access makes a server-side request to another website ( think about logging-services, server-clusters, database servers, etc - they all either have IPs or most-likely some domains. )

I'm not sure I understand what you're saying.

The scenario is that my local network's caching DNS resolver retains resolutions beyond the authority-provided TTL in the event that a TTL-specified refresh at expiration fails. Therefore, my web browser may—in the very rare situation where this arises—make an HTTP request to an IP address of a server that has been intentionally moved by a service provider (let's assume they did so expecting their authoritative TTL to have expired). Since this scenario only arises because my caching resolver wasn't able to reach the authority, I'm not seeing a downside.

But if I understand your reply correctly, you are saying that the web server I've contacted may, in turn, be using a DNS resolver that is similarly configured to provide last-known-good resolution when its upstream provider/authority cannot provide resolution. This would potentially result in that web server making an HTTPS API request to a wrong IP, again only in the rare case where we have defaulted to a last-known-good resolution. I'm not really seeing the problem here except that the HTTPS request might fail (if the expected service was moved and no longer exists at the last-known-good IP), but how is that worse than the DNS resolution having failed? In both cases, the back-end service request fails.

One possible issue is that IPs are re-used in cloud environments. Potentially, your browser could POST sensitive data to an IP address that now belongs to a totally different company.

Yeah, that is definitely possible.

I mean, hopefully it is over HTTPS so they can't do anything with it... but if it isn't then it can definitely happen. Our servers get random web traffic all of the time.

True. I am not aware of any web services POSTing sensitive data over the public Internet that don't use HTTPS. If your service is sending sensitive data over HTTP without TLS, I feel the problem is bigger than a potential long-lived DNS resolution.

Valid point! One would hope that does not happen.

I mean, hopefully it is over HTTPS so they can't do anything with it...

DV certs only rely on you being able to reply to an HTTP request, so if any CA was using such a caching DNS server, you could probably get a valid cert from them.

HTTPS does not protect you against sending data to a host owned by another company.

Yes it does, the cert presented by api.othercompany.com would not pass validation when you're trying to open a connection to api.intendedcompany.com.

Correct, but they wouldn't be able to decrypt the data.

The data doesn't even get there, the handshake kills the connection before that.

I think you understood me. Maybe I can explain more.

If you have a service `log.io` with it's own DNS servers ( running named or djbdns ). And one day you decide to shut them down and rename the service to `loggy.io`.

What will happen is that any DNS trying to query the `log.io` DNS will reach unreachable server, which will lead to serving the last-known IP from the proposed DNS Cache on your machine.

If you don't use forever-failback-cache after the TTL expired you will just reach unreachable server and return back no IP address.

This is the equivalent of retiring the domain name itself. If you stop renewing it anyone can hijack it and serve whatever they like. Not to forget, they will also get email intended for that domain.

Anyone sane will keep the domain name and ns infrastructure and serve a 301 HTTP redirect.

All anyone is proposing here is to override the TTL to something longer (like 48h) if the nameserver is unreachable.

Of course the perfect solution would be to have the recursive nameserver fetch the correct record from a blockchain.

Thanks for the reply.

> If you have a service `log.io` with it's own DNS servers ( running named or djbdns ). And one day you decide to shut them down and rename the service to `loggy.io`. > What will happen is that any DNS trying to query the `log.io` DNS will reach unreachable server, which will lead to serving the last-known IP from the proposed DNS Cache on your machine.

To reiterate the scenario you've put forth as I understand it: I'm a service operator and I've just renamed my company and procured a new domain. I've retired the old domain and expect to fulfill no more traffic sent to that domain. When a customer of my service attempts to resolve my old domain, their caching DNS resolver may return an IP address even though I have since shut down the authoritative DNS servers for the retired domain. They will make an HTTPS request to my servers (or potentially someone else's, if I also gave up my IP addresses), and fail the request because the certificate will be a mismatch.

The customer's application will see a failed request either by (a) DNS failing to resolve or (b) HTTPS failing to negotiate. Either way, my customer needs to fix their integration to point to my new domain.

To be clear, it is up to my customer to decide whether they want their caching DNS resolver to provide a last-known-good resolution in the event that authoritative servers are unreachable. If they prefer failure type (a) over (b), they would configure their DNS resolver to not provide last-known-good resolutions.

When your customer sees that HTTPS error, they may associate it with your company failing at security.

You can install EdgeDNS locally. It does that, among other things.


This can be pretty bad in a world where AWS ELB IP addresses change regularly.

Why? The OP is only proposing using a cached result when there's no updated record available.

Serving wrong records is usually worse than serving no records.

EDIT: It would be fine as long as your site only served HTTPS content and HSTS was enabled for your domain, preventing any sort of MITM attack.

If your DNS server is offline, is the last record it returned when it was online really the "wrong" one? There'd be no right one in that case.

Exactly. You can't know if it is still valid, so you might send clients to an IP that's now controlled by somebody else. Worst case, they know and set up a phishing site. DNS generally has been reliable enough that the trade-off is not worth it.

> Worst case, they know and set up a phishing site.

They'd need to specifically gain access to the last known good IP address, which might be different depending on which DNS resolver you talk to (geodistribution, when the record was last updated, etc). I wouldn't really consider that a realistic attack vector.

Withing a small hosting provider this might be pretty simple. Attacker might lease a bunch of new servers and get the IP that was recently released. Then they could launch a DDoS to force address resolution in their favor. It's a bit far-fetched, but a lot of very successful attacks seem that way until someone figures out a way to pull them off.

Sure, within a small host or ISP, that may be doable; even then, I'd consider it a stretch. But if we were to limit it to those constraints, when will this ever be exploited except as a PoC? No entity large enough to actually want to run this exploit on has an architecture where this would be feasible (due to things like geodistribution and load balancing), nor would be hosted on a provider so small that their IP pool can be exploited in the manner described. If I were an attacker, I'd focus on something with much bigger RoI.

How about DNS cache poisoning? Serving stale data combined with a DDoS on the root resolver makes DNS cache poisoning more dangerous by prolonging it.

I like this idea. Grab a new elastic IP on AWS. Set up an EC2 listening on 80, maybe some other interesting ports, and see if anything juicy comes in. Or just respond with some canned SPAM or phishing attempt. Repeat.

I guarantee you'll get lots of weird traffic. We do all the time on our ELB-fronted app servers.

Yes. You either positively know the right answer, or you return the fact that you don't know the right (currently valid per the spec) answer. The right answer in the situation you posed is "I don't know".

Relevant (or at least a-propos) post by Bruce Schneier, from a month ago: "Someone Is Learning How to Take Down the Internet"


Edit: And to be clear: I don't mean to imply there's any connection :)

Prediction: A massive, sustained attack will occur on key US Internet infra on election night in an attempt to debase the US election results.

That was exactly my thought. This may be unrelated, or it may be a test run. But a large scale attack on Election Day that crippled communications would stir up unrest for a variety of reasons. Although I think that's highly unlikely to change the outcome, unrest after such a contentious election is not good.

If this is a test run, this is an amazing early warning for Twitter and the like to immediately start working on contingency plans for election day.

What can they do? It's not Twitter themselves being DDOS'd, it's a DNS provider. This propagates up the chain to impact both a Tier 1 network and cloud providers, which hits tons of stuff on top of that.

Have a failover DNS provider. Amazon uses Dyn, but also has UltraDNS as a backup, and it's obviously still up. Twitter vs Amazon:

host -t ns twitter.com: ns3.p34.dynect.net, ns4.p34.dynect.net, ns1.p34.dynect.net, ns2.p34.dynect.net.

host -t ns amazon.com: ns3.p31.dynect.net, ns4.p31.dynect.net, ns2.p31.dynect.net, pdns6.ultradns.co.uk, pdns1.ultradns.net, ns1.p31.dynect.net.

Thank you so much for mentioning this. This was my first thought when I heard about all the major enterprise sites affected by the DDoS:

"How do all these major players have singly-homed DNS"?

if you utilize geoip routing features of one provider, it can be difficult to impossible to then ensure repeatable/deterministic behavior on a second provider.

They could distribute instructions for users to follow, E.g. using some permanent working IPs or alternate DNS servers.

Gives them time to test the countermeasures before 11/8

Assuming you're referring specifically to targeting media companies reporting on the results and not the electric grid like someone else mentioned, wouldn't they have to DDoS Google itself for that to work? I don't really see a DDoS of Google being effective.


This comment says literally nothing.

Probably not a great idea. If the internet went down at my work, none of us would be able to do anything, so we'd probably all head out to the polls just because we have nothing better to do. Unintentionally increased turnout.

This is terrifying. Thankfully I don't think much actual voting infra is network reliant. But it could probably delay the results from being finalized for days, and allow Trump to spew further allegations of rigging.

Though if they targeted electric grid, water, and public transport, starting early in the day and choosing the regions by their populations political leaning, it could easily have an effect on the result itself.

You don't need to target voting infrastructure. You target media infrastructure (DNS, streaming, web media) in order to either reduce or shift voter turnout. A candidate ahead in a battleground state? You stomp on media reporting to ensure their opponent's voters aren't dissuaded from heading to the polls.

Control the message, and through that the actual votes cast.

I dont even think you need to necessarily shift voter turn out. I think you need to sow enough confusion in order to cast the results into doubt.

Yeah it just needs to be "The internet was broken so your votes were lost" and then some made up post-hoc explanations that 90% of people don't understand so they can't dispute

Given various fuckups over the years, media won't call a state until the polls are completely closed. Silencing them doesn't change this strategy.

This is not congruent with their behavior during the primaries.

I'm working at the polls in CA, and can verify this; all critical information is moved by sneakernet with a two-person rule on its handling.

Of course, I have no information on the security model of the pre-election preparations and post-election tabulation, but luckily results for each polling place are also posted for the public to inspect - media outlets and campaigns can verify the tabulation themselves with a slight delay.

Hahahaha. For sure it's not supposed to be network reliant. But from my experience working on critical infra, even things like power grids and rail systems, this is almost never the case.

That is a terrifying thought. Sounds like the plot of a potential Neal Stephenson novel.

I think you're right, not much of the voting infra is network reliant, but the more I think about it the more it seems that the "fear" factor of such outages could influence the election. Or, perhaps a curated working set of information sources, thanks to selective DDoS. Regardless, terrifying to be sure.

Is it really important who wins it there are only two candidates that share common view on many problems? And you don't need Internet to count votes anyway.

It's okay. James Comey, the FBI chief, said the US electoral system is such a mess, it would be too hard for an attacker to hack it or damage its integrity in any way. It's all good.


Of course, he said nothing about internal rigging:


Please keep the unfounded conspiracy theories off Hacker News. Thank you.

You have complex situation and you can't understand it, conspiracy theory offers simple answer.

> The Department of Homeland Security told CNBC that it is "looking into all potential causes" of the attack.


Is this par for course for all large DDOS attacks or did something tip them off?

More like for the first time in a long time, serious negative economic impact is occurring. I sincerely wish this was a wake up call, but it won't be.

From what I know of the situation (don't trust me, I'm not going to offer citations or sources), this attack wasn't particularly large in terms of gigabits/second. It was, however, very large in terms of economic impact.

I would assume that when a large number of big enterprise-y things go down, HSI takes notice. When other providers get attacks that are 20x larger (gbit/sec), but have much less widespread impact and impact on less enterprise-y things, they don't care so much.

>We don't know who is doing this, but it feels like a large nation state. China or Russia would be my first guesses.

Why not the USA?

It doesn't make a whole lot of sense for the USA to take down the internet, as they benefit the most from it. A significant fraction of that economy is based on it, much larger than in the cases of China and Russia. It would be like the owner of a coal mine campaigning for a carbon emissions tax: maybe there's something we don't know, but from the information we have it seems unlikely.

Note that this wouldn't rule out the USA as such. First, it could be a longshot preparedness thing, with no expectation that it would ever be used. Second, they could be red-teaming the thing (looking for weaknesses so that they can arrange for them to be shored up).

In either of these scenarios, it's no less likely that the USA would be doing it than anyone else. If you assume that whoever is doing this is planning to use their knowledge, however, the economic argument makes the USA less likely to be involved.

There are many types of actors even within the USA nation-state / government.

For example, if a particular part of the government got wind of a data dump about to be released by another nation-state or independent actor (for example, a leak of some kind) - I think some parts of the USA government that possesses the ability to do so wouldn't hesitate to take down dns to the entire internet to avoid another similar data leak to the Snowden dump.

Be really wary of attributing intent: you do not know who will benefit the most from taking down certain services. To claim that the US benefits from the internet so much that it wouldn't do certain actions to protect itself from certain types of harm is shortsighted.

Even my example could be really wrong, but the idea is that nobody really can say - "oh the internet is too important to xyz, they'll never do anything!"

> wouldn't hesitate to take down dns to the entire internet to avoid another similar data leak to the Snowden dump.

I don't understand how this would change anything unless you're assuming they would take down the Internet permanently

I'm probably wrong, but this is how I see it (not sure about the OP).

News cycles happen fairly rapidly, so if you could take down a number of sites that might be friendly to the dissemination of potentially damaging information just long enough such that it's forgotten about, or the attack is so large the media talks about the attack instead, then you might be able to successfully avoid widespread public knowledge of such information. Though, this would be best aided with collusion or cooperation (intentional or otherwise) from the media. Toss in a few unrelated services as a bonus for collateral damage, and you might be able to avoid scrutiny or, at the very least, shift the blame to an unrelated state actor. It won't prevent the release of information, but that's not the point--you want to prevent the dissemination and analysis of that information by the public at large.

This is all hypothetical, of course, and not likely to work. It also comes with the associated risk that if you were discovered or implicated, public outrage might be even worse than if you allowed the release of the information you hoped to distract from in the first place! As such, I can't imagine anyone would be stupid enough to try.

I'll take my tinfoil hat off now.

If you take them offline before they've managed to disseminate the info, then it can't be forgotten because nobody knew about it in the first place. Which means when the sites come back on, the info is still newsworthy.

What kind of information would be so sensitive as to risk crashing the economy over, yet so trivial that people would forget about it because they couldn't access Twitter for a day? I get that it's more nuanced than that, but I'm really struggling with this scenario; sensitive information tends to get out if it's important enough, even if you're willing to kill a bunch of people.

It renders the server that's hosting a leak unable to broadcast the leak temporarily, while they arrange more conventional measures to seize it. It's a more rapid response than getting a warrant and a police team on location. The broad nature of the attack also avoids tipping off the server owners.

I still give it less than a 5% probability, though.

> the server that's hosting a leak

Honestly, that's fairly thin. WL uses torrents and other means of disseminating data that don't rely on central control structures. Plus, presumably, WL has the ability to quickly shift data into secure hands who are willing to release it when things quiet down.

So, sure, the USA could go send someone to sieze the hard drives of someone who has confidential information. But, I have to imagine one of the first steps when getting that kind of information is to disseminate it to others (at least some of whom are unknown to the states). If they were hit, these people would very quickly take that as a signal to indiscriminately release all the information.

Or they could take it down for enough time for other parts of the government and/or international diplomatic system to do their work.

Remember, a data leak is not just a technical issue. They can resolve it in any number of ways - get a small team incursion into another state's territory for extraction, etc. All the outage needs to do is to hold open that window for enough time for all the different parts of the entire threat response chain to do each part's job.

A lot of technical people think tech is the end, but no - if you get a small team to go knock on the person's door, and get your internet response team to shut down dns, or to get someone on site at the telco to perform certain actions at the router/switch level, etc all portions working together is a powerful way to resolve or to accomplish certain goals.

Think bigger, especially with state actors - the resources are there, and this line of thought is probably really basic stuff that people came up with in the 1960's or 70's (even when the arpanet was being created, there was probably already a team tasked with taking such actions - it only make sense to have 2 teams working on such goals in tandem - one to create the network, the other to take it down)

For either of your cases, I can't imagine the value of having it last this long, or even be this severe. The impact on the economy and trust in a infrastructure company is too high.

I'd be more willing to put my money on someone attacking an entity downstream who is normally immune to DDOS attacks of this size.

It would be like the owner of a coal mine campaigning for a carbon emissions tax

Not necessarily so strange. See https://en.wikipedia.org/wiki/Bootleggers_and_Baptists for instance.

Total and absolute speculation follows: If the US wanted reliable take-down capability, they might want to test it first, and it would be least provocative if they tested here in the US.

As for the length of the "test," they might want to see how the US would react to such attacks in the future, and shake out anything critical. "Oh, these two agencies can't talk to each other. Good to know."

I hate the way modern times makes me look.

The usual thinking goes something like; well, the US created the internet so why would they want to take it down? Yes, NSA spies and all that, but they need the internet up to do that and also as bad as NSA is, it's nowhere near as bad as China or Russia where they ... (ranges from censorship to eating babies alive)

> The usual thinking goes something like; well, the US created the internet so why would they want to take it down?

To pin it on someone else?

"17 Intelligence agencies told me Russia hacked our DNC thing" (Clinton).

So maybe it is now "Oh look they took down the whole internet as well".

>To pin it on someone else?

That's... kind of conspiratorial thinking? Would you cut off your own hand so you could blame it on someone else?

As someone else pointed out this is a classic false flag operation. Look up Gleiwitz incident and Operation Northwood. It can be very effective. With good opsec and anonymity online it can be even easier.

> kind of conspiratorial thinking?

You mean like lizard aliens infiltrating our planet? -No. But in the realm of "shooting down of passenger and military planes, sinking a U.S. ship in the vicinity of Cuba, burning crops, sinking a boat filled with Cuban refugees, attacks by alleged Cuban infiltrators inside the United States, and harassment of U.S. aircraft and shipping and the destruction of aerial drones by aircraft disguised as Cuban MiGs", yes.

It was mostly a reply to "US would have absolutely no reason for doing this" and the reply is there cold be a plausible reason.

It's called a false flag operation, and does happen.

And the ratio of false flag operations to false accusations of false flag operations is about 1:99. Lots of unlikely things "do happen," but if that's the immediate explanation you reach for you're going to be wrong most of the time.

> Lots of unlikely things "do happen," but if that's the immediate explanation

Where did I say it was my immediate explanation and this is _likely_ what is happening?

You might want to take this up with that "rdtsc" guy who wrote "As someone else pointed out this is a classic false flag operation." He seems to disagree with you on those points.

Oh right, I know him! He is a decent fellar. I think he was saying if US is attacking its own infrastructure, then a false flag operation is a plausible explanation. Talking to him a bit revealed he didn't say this US attacking itself.

Not suggesting this is a false flag operation, but pointing out the concept as some commenters seemed unfamiliar with it.

Of course, I just tried to explain how the thinking behind the sentiment that it obviously isn't the U.S., not saying I agree with it.

Russia retaliating for the US taking out the ESA Mars Drone.

I thought, "first actual space battle! Neat!"

Then I felt horrible.

What's this about? Confusing premise. Got any links?

Not a joke, just conjecture.

Some NPR story about US Cybercommand responding to Russian cyber attacks, 'at place and time of our choosing.'

'Some you might hear about. Some you might not.'

FFWD to a couple days ago, NPR story about a botched European and Russian lander.

Today, US Eastern Seaboard is seeing connectivity disruption due to DDoS attacks.


Unwinding the stack, the latest news is these DDoS attacks are not likely state sponsored.

Russia pulling off a coordinated attack that soon after and in response to my theorized US retaliation seems unlikely.

US attacking a joint partnership between Russia and Europe civilian space program seems unlikely.

I assume it was just a joke...

From the context of the paper, because the USA could just send a three-letter-agency agent of some sort to Dyn, a US-based company, and ask what their infrastructure looks like? (Presuming of course some weird scenario where they weren't already tracking it, which seems unlikely.)

Why would they attack themselves?

Because we are not indiscriminate like that.

Who is "we"? The u.s. government is a conglomerate of interests, organizations, individuals... Many of whom are quite indiscriminate. I'm not at all proposing that the U.S. was involved here. I'm questioning the simple identification of the U.S. government with the word "we", and the corresponding assumption that this institution is integrated in a carefully discriminating way...

Why the USA?

With the elections coming, USA would be on the top of my list.


Because America is an Orwellian hellhole, gawd, haven't you read animal house 84?!

I did, as far as I remember these book are all about USSR ;)

the number of people missing the sarcasm in your post is worrying

Because it's being down voted? I think any down votes are more likely because the comment doesn't add substantively to the conversation and is distracting given other recent threads.

um. if you are basing your logic on orwelianness then the England should be your your top candidate. massive monitoring of the populace, severe restrictions on the ability to carry anything that is vaguely pointy or goes bang, poor freedom of speech rules (relative to the US).... I love England but as far as orwellian societies go, you can do notably worse than the US.

As someone living in England, I can confirm that a) we are an extreme surveillance society, which the general population neither really understands, nor cares about b) the vast majority of us are very grateful we don't have (legalised) guns on the streets, and we suffer from a much lower homicide rate as a result

Schneier: "I have received a gazillion press requests, but I am traveling in Australia and Asia and have had to decline most of them. That's okay, really, because we don't know anything much of anything about the attacks.

If I had to guess, though, I don't think it's China. I think it's more likely related to the DDoS attacks against Brian Krebs than the probing attacks against the Internet infrastructure, despite how prescient that essay seems right now. And, no, I don't think China is going to launch a preemptive attack on the Internet."

[1] https://www.schneier.com/blog/archives/2016/10/ddos_attacks_...

If this is supposed to be "taking down the internet", then I'm not impressed. Using cached DNS still gives access to any service. I'm even typing here on HackerNews.

If this is another practice run, then I'm still not impressed. Taking down one provider is not that hard. Good luck finding the resources to do this DDoS to ALL large DNS providers out there.

Maybe it's not really fair to link to that post every time a DDoS with more than average payload happens. Especially since the post doesn't mention any specifics, because well, "protect my sources". It's like the "buy gold now" guy starring in the 2 AM infomercial predicting an economic recession within the next 5 years, without adding what the exact cause is going to be. He is probably going to be right, but that doesn't make him a visionary.

I work as a freelancer and today I didn't get paid and that's just me. Companies probably lost millions today. By just one DDoS to one DNS provider.

Yeah it's not the whole Internet, but how do you define "taking down the Internet" anyway. Is it every connected computer or just a huge amount of interconnected big websites? Because the latter is happening right now.

Let's try to put this DDoS attack in some context aside from the technical part.

As @scrollaway mentioned, 6 weeks ago, Bruce Schneier posted that several companies told him that they're detecting attempts to probe their networks and find ways to bring it down https://www.schneier.com/blog/archives/2016/09/someone_is_le...

Now let's look at the progress of events:

- Hillary Clinton's personal email server was hacked a while ago.

- A lone hacker published a document obtained by hacking the DNC servers. The document includes opposition research on Donald Trump and how Hillary can attack him in the election.

- Wikileaks published emails obtained by hacking the DNC

- US intelligence agencies confirmed that Russia was behind the DNC hack

- It was reported that the CIA is starting a cyber attack against Russian targets. http://www.nbcnews.com/news/us-news/cia-prepping-possible-cy...

- This is happening while the war in Syria and Iraq is growing. The Russians are there to "fight ISIS" but they have deployed an air defense system even though ISIS doesn't have any air force.

- Russia's only air craft carrier is trespassing through UK waters to get to Syria in a show of force that doesn't really add anything to their military capabilities there.



- Finland (yes, Finland) is increasingly worried about Russia. They violated their air space, and they're questioning Finland's independence. Finland shares a long boarder with Russia.


- US ships were attacked near Yemen after they're bombed some targets the belong to the rebels. https://www.theguardian.com/us-news/2016/oct/13/us-enters-ye...

- US election is in 3 weeks and Donald Trump is openly in love with Putin. Trump questioned the benefit of NATO which is the basis for Europe stability after the 2nd world war.

Say Hello to World War III, everybody!

> US election is in 3 weeks and Donald Trump is openly in love with Putin.

He states that he's never met Putin nor has any holdings in Russia. He has stated that he is open to positive relationships with the Russian government.

> Trump questioned the benefit of NATO which is the basis for Europe stability after the 2nd world war.

I believe he stated that he wants NATO to "pay their fare share" in the costs of maintaining the organization.

I'm not a Trump supporter but we shouldn't believe everything we read.

Trump says he never met Putin, now. In the past, he said he did. I just did a search for "trump met putin" and found a bunch of news sites reporting that in a GOP debate a while ago Trump said

“I got to know him very well because we were both on ‘60 Minutes,’ we were stablemates, and we did very well that night.”

Trump was boasting in that debate about nothing, they were on the same episode of 60 minutes but they were not even on the same continent for that episode.


That's not really the point. The point is Trump is now saying he hasn't, but in the past he said he has. Not only is he contradicting his earlier statement, but it also makes him not trustworthy. And of course, if he was boasting about having supposedly met Putin in the past, that means he thought it was a good thing to boast about, which suggests that he is sympathetic to Putin and to Russian interests.

> - Finland (yes, Finland) is increasingly worried about Russia. They violated their air space, and they're questioning Finland's independence. Finland shares a long boarder with Russia.

The Finns actually have still quite good relationship with Russia (better than other neighbors) and nobody's actually questioning Finland's independence. Baltic countries is a different story.

Source: A Finn here.

It's good to know things are clam, I was just quoting the article. Not sure where they got that from

> Finland is becoming increasingly worried about what it sees as Russian propaganda against it, including Russian questioning about the legality of its 1917 independence

Lithuanian here, can confirm. Quite scared of Russia, for (what I hope are) understandable reasons.

> - Russia's only air craft carrier is trespassing through UK waters to get to Syria in a show of force that doesn't really add anything to their military capabilities there.

If they were really gearing up for war why would they move their only carrier away from the mother land. Your article even says it is more of a "show of force" than start of war.

So how did you jump to WW3?

I'm sure that carrier is being followed by multiple NATO submarines as well.

It seems incredibly unlikely that a global war would start over Syria when we've had 60 years of proxy conflict instead. Russia or NATO have absolutely nothing to gain from an open military conflict.

> - Finland (yes, Finland) is increasingly worried about Russia. They violated their air space, and they're questioning Finland's independence. Finland shares a long boarder with Russia.

Finland isn't worried, they have had stable relations for half a century as both sides agreed to not mess with each other. They have even refused to join NATO because it is actually safer for Finland and vice versa.

Cowboys with missiles stationed on Russia's border making hyperbole statements (like you do) - now that would be a real threat. (the same was also true the other way around with the Soviets stationing missiles in Cuba)

> Say Hello to World War III, everybody!

Is this sabre rattling or the prelude to a global conflict? Surely at worst it will (continue to) be a proxy war between NATO and Russia in Syria and nothing more? What motive is there for Russia or NATO to engage in open warfare? I'm not sure that a slow and prolonged lead up to an open war would even be effective in this situation.

Perhaps it should be "Say hello to Cold War v2.2017"?

Hopefully they stick to semver

Your threat can't be parsed due to depreciation of the old rhetoric API. Please upgrade your rhetoric to 2.3.x.

I like this version number better :)

I just want to sell my software, why does everyone have to fight?!

Thank you for these links. I'm trying not to get wrapped up in conspiracies but am increasingly worried by the mounting conflict. I'd love to hear a calm, reasoned response from someone more knowledgable than me on these topics.

The global powers are negotiating with themselves, and consolidating state power at home. Proxy wars for the former, and fear mongering at home for the latter.

The players are the 0.001% who control these states and the rest (we) are the captive (and propagandized) audience. They are being super kind as to at least make it entertaining for us.

Yeah I've gotten a little wrapped up as well looking into all of this mounting tension between the US and Russia. Protip, stay away from /r/the_donald.

My personal opinion is that it is mostly political and I think (hope) that what is happening in Syria won't escalate to direct conflict between the US and Russia.

I stumbled across this little blog article the other day and it helped relieve some of my anxieties.


I don't actually think this will lead to open conflict. My comment at the end was just saying that this is what a world war would probably look like now, and that this back and forth might continue for a while. I'd like to hear from an expert as well, rather than rely on piecing news items together.

I'm not worried about open conflict, I'm worried that the internet will become a battlefield and our businesses will be caught in the crossfire.

> - Russia's only air craft carrier is trespassing through UK waters to get to Syria in a show of force that doesn't really add anything to their military capabilities there.

I read they were passing in international waters. Is that not the case? It's clearly a show of force, but no need for the hyperbole if it is not true.

Foreign vessels are allowed to transit through another nations waters. This happens regularly and is not in any way noteworthy.

Thanks for the info. What was noteworthy to me is that their aircraft carrier is running on diesel and is clearly something from the 80s.

prediction: in some time they will probe Google. That will be fascinating.

I think it's safe to assume they've been probing Google for 15+ years. One documented example: https://en.wikipedia.org/wiki/Operation_Aurora

Really? It's unclear there's any connection between that vague article and Dyn's problems today.

Let's hope they update that page after today's incident.

Maybe they are also using deep learning.

Maybe some Deep Learning experiment actually created Skynet and it is probing us.

I wanted to provide an update on the PagerDuty service. At this time we have been able to restore the service by migrating to our secondary DNS provider. If you are still experiencing issues reaching any pagerduty.com addresses, please flush your DNS cache. This should restore your access to the service. We are actively monitoring our service and are working to resolve any outstanding issues. We sincerely apologize for the inconvenience and thank our customers for their support and patience. Real-time updates on all incidents can be found on our status page and on Twitter at @pagerdutyops and @pagerduty. In case of outages with our regular communications channels, we will update you via email directly.

In addition you can reach out to our customer support team at support@pagerduty.com or +1 (844) 700-3889.

Tim Armandpour, SVP of Product Development, PagerDuty

I had the privilege of being on-call during this entire fiasco today and I have to say I was really really disappointed. It's surprising how broken your entire service was when DNS went down. I couldn't acknowledge anything, and my secondary on-call was getting paged because it looked like I wasn't trying to respond. I was getting phone calls for alerts that wasn't even showing up on the web client, etc. Overall, it caused chaos and I was really disappointed.

"It's surprising how broken your entire service was when DNS went down." lol

How does the service you're responsible for work when DNS stops functioning?

Hopefully you have a nice SLA with them.

I appreciate the update, but your service has been unavailable for hours already. This is unacceptable for a service whose core value is to ensure that we know about any incidents.

Given that a large swath of SaaS services, infrastructure providers, and major sites across the internet are impacted, this seems harsh. Are you unhappy with PagerDuty's choice of DNS provider, or something else they have control over? I don't think anyone saw this particular problem coming.

A company that bills themselves as a reliable, highly available disaster handling tool ought to know better than to have a single point of failure anywhere in its infrastructure.

Specifically, they shouldn't have all of their DNS hosted with one company. That is a major design flaw for a disaster-handling tool.

I'm not using the service, but I'm curious what an acceptable threshold for this company is. Like, if half the DNS servers are attacked? If hostile actors sever fiber optic lines in the Pacific?

I ask because my secondary question, as a network noob, is was anybody prepared / preparing for a DDOS on a DNS like this? Were people talking about this before? I live in Mountain View so I've been thinking today about the steps I and my company could take in case something horrifying happens - I remember reading on reddit years ago about local internets, wifi nets, etc, and would love to start building some fail safes with this in mind.

Two pronged comment, sorry.

I'm not using the service either, but I noticed this comment [1]. It's not the first time that a DNS server has been DDoS-ed, so it has been discussed before (e.g. [2]). At minimum, I would expect a company that exists for scenarios like this to have more than one DNS server. Staying up when half of existing DNS servers are down is a new problem that no one has faced yet, but this is an old, solved one.

[1] https://news.ycombinator.com/item?id=12759653

[2] https://www.tune.com/blog/importance-dns-redundancy/

Re question #2, Amazon uses UltraDNS as a backup and seemed to be relatively unaffected by today's attack.

Re question #1, check out PagerDuty's reliability page here: https://www.pagerduty.com/features/always-on-reliability/

Namely "Uninterrupted Service at Scale - Our service is distributed across multiple data centers and hosting providers, so that if one goes down, we stay available."

It seems fair to expect them to have a backup dns too, but I am not an expert.

> is was anybody prepared / preparing for a DDOS on a DNS like this


I have, personally, been under attack with as-large or larger than todays attacks at my DNS infrastructure and survived.

This is exactly my point.

From the perspective of my service being down, my customers being pissed, and me not being notified.. yes, maybe PD should be held to a higher standard of uptime. Seems core to their value prop.

> I don't think anyone saw this particular problem coming.

Knocking half of the web off the grid because their DNS provider is under attack? It happened recently to DNSimple.


The irony is that I noticed it when dotnetrocks.com went offline, at that time dotnetrocks was sponsored by dnsimple...

Flush your DNS like the parent said.

Flushing DNS wont do shit

pagerduty.com moved to Route53, but the TTL on NS records can be very long. Flushing (restarting, ...) whatever can cache DNS records in your infra will help to quickly pick up the new nameservers.

Not on your laptop. On your local DNS resolver.


Running a redundant DNS provider is expensive as all hell.

While 'expensive' is a relative term, I disagree that it's cost-prohibitive for most firms, as I looked into this specifically (ironically considered using Dyn as our secondary). The challenge isn't coming up with the funds, it's if you happen to use 'intelligent DNS' features; these are proprietary (by nature) and thus they don't translate 1:1 between providers.

In addition to having to bridge the divide yourself, by analyzing the intelligent DNS features and using the API from each provider to simultaneously push changes to both providers, you have to write and maintain automation/tooling that ensures your records are the same (or as close as possible) between the providers. If you don't do this right, you'll get different / less predictable results between the providers, making troubleshooting something of a headache.

Thus in that case the 'cost' in man effort (and risk, given that APIs change and tooling can go wrong) in addition to the monthly fee.

If all you're doing is simple, standard DNS (no intelligent DNS features), it's not as hard, and it's just another monthly cost. Since you typically get charged by queries/month, if you run a popular service you're probably well able to afford the redundancy of a second provider.

Ah so make everything redundant. Double my costs in man hours and in monetary cost. Brilliant!

Ah so make everything redundant. Double my costs in man hours and in monetary cost. Brilliant!

The sarcasm is curious. It's a business decision. Either your revenue is high enough that the monetary loss from a several-hour intra-day outage is potentially worse than the cost of said redundancy, or you don't care enough to invest in that direction (it's expensive).

Making things redundant is exactly a core piece of what infrastructure engineering is. I guess with the world of VPSes and cloud services, that aspect is being forgotten? And yes, redundancy / uptime costs money!

When your service literally says it exists to help provide uptime, redundancy makes sense.

Your automation should be handling creating/modifying records in both providers. Also, if you're utilizing multiple providers you don't need to pay for 100% of your QPS (or whatever metric is used for billing) on every provider, only 50% for two or 33% for three. You can just pay for overages when you need to send a higher percentage of your traffic to a single provider.

A lot of providers do have 'fixed' portions of costs, so, it won't be quite 1/2 or 1/3rd.

It may, at scale, be like 100% (one provider), 55%+55% (two) and 40%+40%+40% (three). Still eminently affordable.


Route53 on AWS is $0.50/zone and $0.40/million queries. API integration is also very easy.

Using something like Route53 as a backup is significantly cheaper than suffering from the current Dyn outage.

That is not helpful if you want vanity name servers

I assume your clients would prefer working nameservers over vanity ones. Especially if you are in a critical business like PagerDuty.

Latest github NS moved to awsdns

        $ dig -tNS github.com @

        ; <<>> DiG 9.8.3-P1 <<>> -tNS github.com @
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 15616
        ;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 0

        ;github.com.                    IN      NS

        ;; ANSWER SECTION:
        github.com.             899     IN      NS      ns-1283.awsdns-32.org.
        github.com.             899     IN      NS      ns-1707.awsdns-21.co.uk.
        github.com.             899     IN      NS      ns-421.awsdns-52.com.
        github.com.             899     IN      NS      ns-520.awsdns-01.net.
        github.com.             899     IN      NS      ns1.p16.dynect.net.
        github.com.             899     IN      NS      ns2.p16.dynect.net.
        github.com.             899     IN      NS      ns3.p16.dynect.net.
        github.com.             899     IN      NS      ns4.p16.dynect.net.

        ;; Query time: 32 msec
        ;; SERVER:
        ;; WHEN: Fri Oct 21 13:01:48 2016
        ;; MSG SIZE  rcvd: 248
But my local copy is still on dynect

        $ dig -tNS twitter.com

        ; <<>> DiG 9.8.3-P1 <<>> -tNS twitter.com
        ;; global options: +cmd
        ;; Got answer:
        ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62729
        ;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 4

        ;twitter.com.                   IN      NS

        ;; ANSWER SECTION:
        twitter.com.            75575   IN      NS      ns3.p34.dynect.net.
        twitter.com.            75575   IN      NS      ns4.p34.dynect.net.
        twitter.com.            75575   IN      NS      ns1.p34.dynect.net.
        twitter.com.            75575   IN      NS      ns2.p34.dynect.net.

        ns3.p34.dynect.net.     54698   IN      A
        ns4.p34.dynect.net.     81779   IN      A
        ns1.p34.dynect.net.     8544    IN      A
        ns2.p34.dynect.net.     54775   IN      A

        ;; Query time: 0 msec
        ;; SERVER: <local>
        ;; WHEN: Fri Oct 21 13:02:14 2016
        ;; MSG SIZE  rcvd: 179

Your local copy is also twitter, instead of github :)

I believe you don't understand DNS. It's probably the most resilient service (granted it's used correctly). There's nothing inherent in the protocol that would prevent them to use multiple DNS providers.

> Running a redundant DNS provider is expensive as all hell.

What makes you think that?

Sorry if this sounds dickish, but renting 3 servers @ $75 apiece from 3 different dedicated server companies in the USA, putting TinyDNS on them, and using them as backup servers, would have solved your problems hours ago.

Even a single quad-core server with 4GB RAM running TinyDNS could serve 10K queries per second, based on extrapolation and assumed improvements since this 2001 test, which showed nearly 4K/second performance on 700Mhz PIII CPUs: https://lists.isc.org/pipermail/bind-users/2001-June/029457....

EDIT to add: and lengthening TTLs temporarily would mean that those 10K queries would quickly lessen the outage, since each query might last for 12 hours; and large ISPs like Comcast would cache the queries for all their customers, so a single successful query delivered to Comcast would have (some amount) of multiplier effect.

That's not how that sound be done. Just use a mix of two providers. Using your own servers and TinyDNS is silly for million/billion dollar companies.

See MaxCDN for example who uses a mix of dns providers (AWS Route53 and NS1):

    ns-5.awsdns-00.com.   ['']   [TTL=172800] 
    ns-926.awsdns-51.net.   ['']   [TTL=172800] 
    ns-1762.awsdns-28.co.uk.   [''] (NO GLUE)   [TTL=172800] 
    ns-1295.awsdns-33.org.   [''] (NO GLUE)   [TTL=172800] 
    dns1.p03.nsone.net.   ['']   [TTL=172800] 
    dns2.p03.nsone.net.   ['']   [TTL=172800] 
    dns3.p03.nsone.net.   ['']   [TTL=172800] 
    dns4.p03.nsone.net.   ['']   [TTL=172800]
Curious, are you the kind of person that runs their own smtp email server and complains about GitHub pricing being too expensive?

No tool is silly as long as it does the job adequately. Are paperclips silly for a billion-dollar company?

If both Dyn and R53 go down, it's exactly when you want a service like PagerDuty work without a hitch.

You're asserting that your (or their) homegrown DNS service will have better reliability than Dyn and Route53 combined. That assertion gets even worse when it's a backup because people never, ever test backups. And "ready to go" means an extremely low TTL on NS records if you need to change them (which, for a hidden backup, you will), and many resolvers ignore that when it suits them, so have fun getting back to 100% of traffic.

Spoiler: I'd bet my complete net worth against your assertion and give you incredible odds.

Golden rule: Fixing a DNS outage with actions that require DNS propagation = game over. You'd might as well hop in the car and start driving your content to people's homes.

Idea: Chaos Monkey for DNS outages

I don't know how big PagerDuty is; IIRC over 200 employees, so, a decent size.

I was giving a bare-minimum example of how this or (some other backup solution) should have already been setup and ready to be switched over.

DNS is bog-simple to serve and secure (provided you don't try to do the fancier stuff and just serve DNS records): it is basically like serving static HTML in terms of difficulty.

That a company would have a backup of all important sites/IP addresses locally available and ready to deploy on some other service, or even be built by hand via some quickly rented servers, is I think quite a reasonable thing to have. I guess it would also be simple to run on GCE and Azure as well, if you don't like the idea of dedicated servers.

Not neccesarily. Granted this is how I would configure a system (two providers), but it is just as sensical to use one major provider which falls back to company servers in the event of an attack like this. It is all in sysadmin preference, while it is smart to relegate low-level tasks to managed providers it is also smart to have a backup solution that is under full control just in case that control needs to be taken at some point in time.

That would be a quick fix similar to adding another NS provider. Of course if dyn is out completely they might not have their master zone anywhere else. Then it's similar to any service rebuilding without a backup.

+1 for using a mix of two providers. That's what we do at my startup. Never had a problem since (knock on wood).

+1 for TinyDNS.

I just wish it scaled to multiple cores :(




You can't comment like this on Hacker News. Please read the guidelines:


"Challenges" is exactly the sort of Dilbertesque euphemism that you should never say in a situation like this.

Calling it a "challenge" implies that there is some difficult, but possible, action that the customer could take to resolve the issue. Since that is not the case, this means either you don't understand what's going on, or you're subtly mocking your customers inadvertently.

Try less to make things sound nice and MBAish, and try more to just communicate honestly and directly using simple language.

Running multiple DNS providers is not actually that difficult and certainly not cost prohibitive. I am sure after this, we will see lots of companies adding multiple DNS providers and switching to AWS Route53 (which has always been solid for me).

How am i meant to see twitter status updates when twitter is down?

Please check our status page as an alternative method for updates. Unfortunately, it's also been encountering the same issue so we're sending out an email with the latest updates.

I didnae get an email

It's still a work in progress. If you have any immediate issues please contact us at support@pagerduty.com or (844) 700-3889.

PagerDuty outage is the real low point of this whole situation. Email alerts from PagerDuty that should have alerted of the outage in the first place, only got delivered hours later after the whole mess cleared out.

The outage started more than eight hours before you posted this message..

To be fair, Hacker News isn't exactly the first place where companies post status messages. I actually applaud him for posting his message here.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact