This is a notice to fellow HN readers.
This morning around 8:00 UTC, Nagios alerted me that the thread count on a number of Apache Tomcat servers I support for a client had started to rise dramatically. I discovered that a library that is part of the application was trying to fetch DTDs such as http://java.sun.com/dtd/properties.dtd from java.sun.com. That server is unreachable, so each request thread was taking 30+ seconds to time out. I worked around the problem by adding an iptables rule for 192.9.162.55 port 80 that rejects the connection immediately, and informed the client that they need to look for a permanent solution.
If the library can't be fixed (I don't know who developed it), I'm planning to put the relevant DTD files on a local HTTP server and redirect all requests for java.sun.com to that virtual host (e.g. through relevant entries in the hosts file).
If any of the software you run processes XML documents, check your application servers. One way to check:
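If the code that creates the XML parser is within reach, an alternative to the hosts-file redirect might be a custom EntityResolver that answers DTD lookups locally, so the parser never opens a connection to java.sun.com. What follows is only a minimal sketch under that assumption: the class name OfflineDtdResolver and its parse helper are made up for illustration, and it returns an empty DTD instead of a real local copy, which is only acceptable for non-validating parsing of documents that do not depend on entities or defaults declared in the DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: answers every external entity lookup with an empty
// document so the parser does not fetch DTDs over the network.
public class OfflineDtdResolver implements EntityResolver {
    // System IDs the parser asked for; handy for auditing which DTDs are referenced.
    public final List<String> requested = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        // Returning a non-null InputSource tells the parser to read from it
        // instead of opening the system ID as a URL.
        InputSource empty = new InputSource(new StringReader(""));
        empty.setSystemId(systemId);
        return empty;
    }

    // Parses the given XML with the resolver installed. Checked exceptions are
    // rethrown unchecked only to keep this sketch short.
    public static Document parse(String xml, EntityResolver resolver) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver(resolver);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

To serve real DTD content, resolveEntity could instead return an InputSource backed by a copy of the file bundled with the application. This does not help when a third-party library builds its own parser internally, which is the case where the hosts-file or iptables approach is still needed.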
$ netstat -ant | grep -E '192.9.162.55:80[[:space:]]+SYN_SENT' | wc -l
184
$
The connections are in SYN_SENT because 192.9.162.55 (java.sun.com) is not responding at the moment. If it becomes reachable again, just s/[[:space:]]+SYN_SENT// .
Relevant URLs for more information:
http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/
http://www.oasis-open.org/committees/entity/spec-2001-08-06.html
And a previous discussion here about the same problem:
http://news.ycombinator.com/item?id=3094075
I want to define whitelists for each environment in which my app will run -- development, QA, production, etc. To which hosts may it connect? Where may it access files? What else might I wish to constrain as a way of avoiding inadvertent dependencies? Particular queues/topics on messaging buses? Database schemas within a particular server (network restrictions are too coarse for this)? When asking this question, I'm not trying to protect myself from rogue developers with malevolent intentions -- I just want to avoid a scenario like the one described by the OP.
Recently, I started up the Java app I am currently working on and watched its network behavior via Microsoft's Wireshark-esque network monitoring tool. It turns out that EHCache now asks one of Terracotta's servers for the most recent EHCache version number so that it can write an out-of-date warning to the logs. Benign and useful, but I still had to spend a few minutes in the EHCache source to make sure that, if Terracotta's servers were down, our app would still start up.
Should one do this at the OS level (jails, perhaps)? I'm not limiting this idea to just Java apps, but I'm really only an expert in the Java space.
I'd also argue that the whitelist would help codify inter-app dependencies in large IT environments. A few years ago, the large IT shop for which I worked did a disaster recovery drill where they literally deployed tens of apps in an IBM-provided datacenter as a dry run. One thing they learned was that a particular production app was erroneously configured to log certain audit events to a server in a QA environment (which was not part of the disaster recovery plan for obvious reasons). A whitelist would have caught this misconfiguration.
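To make the idea a bit more concrete, here is a rough sketch of what an application-level, per-environment host whitelist could look like in Java. Everything in it is hypothetical: the class name EgressAllowlist, the environment names, and the example.com hostnames are placeholders rather than part of any existing library. It also only constrains code that explicitly calls it, so a library that opens its own connections (as in the OP's case) would still need OS-level enforcement.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical per-environment whitelist of hosts the app may connect to.
public class EgressAllowlist {
    private final Map<String, Set<String>> hostsByEnvironment;

    public EgressAllowlist(Map<String, Set<String>> hostsByEnvironment) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.hostsByEnvironment = Map.copyOf(hostsByEnvironment);
    }

    // True only if the host is listed for the given environment.
    // Unknown environments allow nothing.
    public boolean isAllowed(String environment, String host) {
        Set<String> allowed = hostsByEnvironment.getOrDefault(environment, Set.of());
        return allowed.contains(host.toLowerCase(Locale.ROOT));
    }

    // Fails fast so a misconfigured dependency shows up at startup or in tests.
    public void check(String environment, String host) {
        if (!isAllowed(environment, host)) {
            throw new SecurityException(
                "Host " + host + " is not whitelisted for environment " + environment);
        }
    }
}
```

With a production entry listing only something like db.prod.example.com, a call such as check("production", "audit.qa.example.com") would throw. That is roughly the kind of failure that would have surfaced the misconfigured audit logging before the disaster recovery drill did.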