
You should be very careful with this. Even a relatively low TTL of 300 (5 minutes) can be too long to wait during a major outage. Make sure changing CNAMEs is a very rare thing, and be prepared to tolerate inconsistency.

It's also worth noting that some widely used software (MongoDB, for example) caches DNS lookups in-process, so changing a DNS name is no guarantee that all connected processes will automatically fail over to the new machine unless they are restarted.
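To make the in-process caching problem concrete, here is a minimal Python sketch of a lookup cache that expires entries instead of keeping them for the life of the process. The class name `ExpiringResolver` and the `max_age` parameter are hypothetical, not taken from any of the software mentioned in this thread. Note that `socket.getaddrinfo` does not expose the record's real TTL, so the fixed `max_age` is an assumption standing in for it.

```python
import socket
import time


class ExpiringResolver:
    """Cache hostname lookups, but only for max_age seconds (hypothetical helper)."""

    def __init__(self, max_age=30.0, lookup=None, clock=time.monotonic):
        # max_age is an assumed stand-in for the DNS TTL, which
        # socket.getaddrinfo does not report.
        self.max_age = max_age
        # lookup and clock are injectable so the cache can be exercised offline.
        self.lookup = lookup or self._system_lookup
        self.clock = clock
        self._cache = {}  # hostname -> (expiry time, list of addresses)

    @staticmethod
    def _system_lookup(hostname):
        # Ask the system resolver and keep only the distinct IP addresses.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
        return sorted({info[4][0] for info in infos})

    def resolve(self, hostname):
        now = self.clock()
        entry = self._cache.get(hostname)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: serve the cached answer
        # Expired or never seen: look it up again and restart the timer.
        addresses = self.lookup(hostname)
        self._cache[hostname] = (now + self.max_age, addresses)
        return addresses
```

A long-lived process that calls `resolve()` before each new connection would pick up a changed record within roughly `max_age` seconds, rather than only after a restart.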

Moral of the story: distributed systems are hard.

I have run into this at least twice this week alone with both Nginx and HAProxy pointing at Amazon Elastic Load Balancers. Amazon occasionally rotates out IPs for ELBs, so everything's working fine for weeks and then boom, now we're having a bad day.

As a rule, NEVER EVER use CNAMEs within AWS on Route 53. You should always use their alias A record feature. This provides near-instant changes when you need to adjust a record. It only works for AWS resources, but it is a great feature they built in.

I totally agree. In these two specific cases, R53 + ELB wasn't going to work for us though, because of reasons. (I don't think I can go into specifics, but they're actual reasons.) Our workaround was reloading the configs (which triggers a DNS cache flush on both Nginx[1] and HAProxy[2]) once every couple of minutes.

I know, I know, I hate it too. We're working on it. Just wanted to share the workaround.

[1] http://wiki.nginx.org/CommandLine#Loading_a_New_Configuratio...

[2] http://www.mgoff.in/2010/04/18/haproxy-reloading-your-config...
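A variation on the workaround above, sketched in Python: instead of reloading unconditionally every couple of minutes, re-resolve the upstream name and reload only when the address set has changed. This is not what the commenter describes running; the function names are hypothetical, and the sketch assumes the `nginx` binary is on the PATH and that the script has permission to signal the running master process.

```python
import socket
import subprocess


def resolve_addresses(hostname):
    # Ask the system resolver for the current set of IP addresses.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return frozenset(info[4][0] for info in infos)


def reload_if_changed(hostname, previous, resolve=resolve_addresses, reload=None):
    """Return the current address set, calling reload() if it differs from previous.

    Pass previous=None on the first call to record a baseline without reloading.
    """
    current = resolve(hostname)
    if previous is not None and current != previous and reload is not None:
        reload()
    return current


def reload_nginx():
    # `nginx -s reload` asks the running master process to re-read its config.
    # Assumption: nginx is installed and on the PATH.
    subprocess.run(["nginx", "-s", "reload"], check=True)
```

In use, a small loop would call `reload_if_changed(name, previous, reload=reload_nginx)` every minute or so and carry the returned set forward as `previous`. The check is only as fresh as the resolver the script itself talks to, so a caching resolver in between adds its own delay.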

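The JVM is another one of those things: in older JVMs, and still whenever a security manager is installed, successful DNS lookups are cached by default for the life of the VM instance unless the `networkaddress.cache.ttl` security property is set.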

I remember when Java added this feature. It was because lookups in the JVM were atrociously slow, requiring a native layer to get out of the Java sandbox.

Speculating here, I suspect that an important Sun or IBM customer had an app that did lots of DNS lookups and the performance stunk. So an engineer did a quick 'fix' to cache DNS lookups. Customer was happy, everyone moved on. Some time later this quick fix got ported into the mainline code base. But it appears that nobody did a proper analysis of this quick fix, i.e., whether it respected the TTL on DNS records. Maybe supporting TTL wasn't important because this was back in the early days of Java, when it was trying to win the desktop war and desktop apps weren't really expected to be long-lived processes.

God, how awful. That's the point of TTL!

Beyond that, my system has a perfectly functional resolver (and a daemon to manage name service lookups, using policies that our organization has chosen and assumes are being used).

I understand Java doing 'its own thing', because the goal is to provide consistent behaviour on all platforms, but it shouldn't be stupid behaviour.

Personally, I don't think you should use DNS at all for intra-network service communication. I think the naming scheme described in the article is useful for humans, but for services trying to connect to other services, I would stick with hardcoding or ZooKeeper.
