Java.sun.com is down again - breaking bad apps across the land
97 points by zorlem on Apr 29, 2012 | 36 comments
This is a notice to fellow HN readers.

This morning around 8:00 UTC Nagios alerted me that the thread count on a number of Apache Tomcat servers I support for a client had started to rise dramatically. I discovered that a library that is part of the application was trying to fetch DTDs like http://java.sun.com/dtd/properties.dtd from java.sun.com. The servers are unreachable, so each request thread was taking 30+ seconds to time out. I've worked around the problem by adding an iptables rule for 192.9.162.55 port 80 that rejects the request immediately, and informed the client to look for a permanent solution.

In case the library can't be fixed (I don't know who developed it), I'm planning on putting the relevant DTD files on a local HTTP server and redirecting all requests for java.sun.com to that virtual host (e.g. through relevant entries in the hosts file).
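If the library does expose its XML parser, the same idea can be applied inside the JVM rather than through the hosts file. The following is only a minimal sketch, assuming you can hand the parser an org.xml.sax.EntityResolver; the class name LocalDtdResolver and the mapping it is given are hypothetical, not something from the application described above.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: serves known DTD URLs from local files so the
 * parser does not have to contact the remote host for them.
 */
class LocalDtdResolver implements EntityResolver {
    // Maps a DTD system ID (its URL) to a local copy of that DTD.
    private final Map<String, Path> localCopies;

    LocalDtdResolver(Map<String, Path> localCopies) {
        this.localCopies = new HashMap<>(localCopies);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        Path local = localCopies.get(systemId);
        if (local == null) {
            // Unknown DTD: returning null lets the parser use its default
            // behavior, which may still go out to the network.
            return null;
        }
        InputSource source = new InputSource(Files.newInputStream(local));
        source.setSystemId(systemId);
        return source;
    }
}
```

With a javax.xml.parsers.DocumentBuilder this would be wired in through builder.setEntityResolver(new LocalDtdResolver(map)), where map contains an entry such as http://java.sun.com/dtd/properties.dtd pointing at a saved copy of the file.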

Check your application servers and the software you use if it processes XML documents. One way to check:

   $ netstat -ant | grep -E '192.9.162.55:80[[:space:]]+SYN_SENT' | wc -l
   184
   $
SYN_SENT is because 192.9.162.55 (java.sun.com) is not responding ATM. If it becomes reachable again, just s/[[:space:]]+SYN_SENT// .

Relevant URLs for more information:

http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic/

http://www.oasis-open.org/committees/entity/spec-2001-08-06.html

And a previous discussion here about the same problem:

http://news.ycombinator.com/item?id=3094075




More generally, it ought to be easier to constrain apps running on the JVM to declared sandboxes. I once looked at the Java security model and found it to be totally inadequate for such purposes as it seemed to have been designed for ensuring that desktops could not be compromised by rogue applets running in browsers. Specifically, I was surprised by the coarse-grainedness of the security settings. Want to limit access to the network by disallowing use of the Socket class? Done. Want to only allow access to a whitelist of hosts? No dice. No filesystem access at all? Easy. Limit the app to only reading and/or writing to certain directories? Not a chance.

I want to define whitelists for each environment in which my app will run -- development, QA, production, etc. To which hosts may it connect? Where may it access files? What else might I wish to constrain as a way of avoiding inadvertent dependencies? Particular queues/topics on messaging buses? Database schemas within a particular server (network restrictions are too coarse for this)? When asking this question, I'm not trying to protect myself from rogue developers with malevolent intentions -- I just want to avoid a scenario like the one described by the OP.

Recently, I started up the Java app on which I am currently working and watched its network behavior via Microsoft's Wireshark-esque network monitoring tool. It turns out that EHCache now asks one of Terracotta's servers for the most recent EHCache version number so that it can spit an out-of-date warning into the logs. Benign and useful, but I still had to spend a few minutes in the EHCache source to make sure that, if Terracotta's servers were down, our app would still start up.

Should one do this at the OS level (jails, perhaps)? I'm not limiting this idea to just Java apps, but I'm really only an expert in the Java space.

I also argue that the whitelist would help codify inter-app dependencies in large IT environments. A few years ago, the large IT shop for which I worked did a disaster recovery drill where they literally deployed tens of apps in an IBM-provided datacenter as a dry run. One thing they learned was that a particular production app was erroneously configured to log certain audit events to a server in a QA environment (which was not part of the disaster recovery plan for obvious reasons). Whitelists would have prevented this issue.


On several occasions I've used the Tomcat Security Manager and it has not given me any problems. I find it fine-grained enough for my purposes, although not as fine-grained as you wish. I've used it to limit the JVM's access to a specific set of hosts and TCP ports, restrict (RW and RO) access to files with and without wildcards, restrict access to specific methods, properties, classes and what-not. One can't use it to restrict access to specific databases or topics in the message queues as you suggest, but I don't think it's necessary. I think it's out of scope for the VM to restrict access to a specific DB; this should be the responsibility of the specific DBMS. Otherwise I'd imagine the overhead would be quite serious. I haven't heard of any VM manager that would provide such thorough and deep access control. Do you know of any?

If you still need this functionality in a Java security manager, I believe you could build it using the existing hooks; they look quite powerful and flexible.

Now, the real pain I've had with the Security Manager in Tomcat 5.5 was writing the rules for a pre-canned application that was not written with SM in mind. It was quite a tedious process, but all MAC systems are tedious to set up initially. That's life.


I hadn't known about Tomcat's security manager stuff. Interesting, and proof that Java's security manager stuff can be extended for practical purposes.

I had thought about using AspectJ to wrap interesting points in various APIs and then do "stuff". The obvious behavior is to restrict usage based on whitelists. However, it might also be interesting to run one's app in an access logging mode, especially when trying to wrap some controls around a previously unrestricted production application.


Limit the app to only reading and/or writing to certain directories? Not a chance.

Create a non-privileged user. Restrict the account's r/w access to certain directories. Run the app as that user.

Want to only allow access to a whitelist of hosts? No dice.

I have not done this, but I think you can do that with iptables.


From your description, it sounds like you want SELinux.


So, I can understand how the OS could limit access to the network or filesystem, but it can't know that I'm accessing db schema ABC. However, network and filesystem restrictions are probably good enough for most people.


Actually several userspace tools have had SELinux extensions added (with not very much adoption, it has to be said). Here's an article about PostgreSQL + SELinux:

https://lwn.net/Articles/365224/


Couldn't you do this by writing your own security manager?
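As a rough illustration of what such a custom security manager might look like, here is a minimal sketch of a host whitelist built on the checkConnect hook. The class name HostWhitelistSecurityManager is hypothetical, the match is a plain string comparison on whatever host string the JVM passes in (a name or an IP address, so a real whitelist would need both), and every other permission is simply allowed. Note also that java.lang.SecurityManager has been deprecated for removal in recent JDKs, so this only reflects the JVMs of the era under discussion.

```java
import java.security.Permission;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: rejects outbound connections to any host that is
 * not on an explicit whitelist. Not a complete sandbox.
 */
@SuppressWarnings({"deprecation", "removal"})
class HostWhitelistSecurityManager extends SecurityManager {
    private final Set<String> allowedHosts;

    HostWhitelistSecurityManager(Set<String> allowedHosts) {
        this.allowedHosts = new HashSet<>(allowedHosts);
    }

    @Override
    public void checkConnect(String host, int port) {
        // Assumption: exact string match on the host as passed by the JVM.
        if (!allowedHosts.contains(host)) {
            throw new SecurityException(
                    "Connection to " + host + ":" + port + " is not whitelisted");
        }
    }

    @Override
    public void checkConnect(String host, int port, Object context) {
        checkConnect(host, port);
    }

    @Override
    public void checkPermission(Permission perm) {
        // Everything other than connects is allowed in this sketch.
    }
}
```

On JVMs that still permit it, the manager would be installed with System.setSecurityManager(...) early in startup; a per-environment whitelist (development, QA, production) would then just be a different set of host names.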


As a best practice, applications should reference DTDs from the local filesystem. Most sane data centers would have outbound (App->Internet) access locked down, with only the needed hosts/ports allowed after the application developer specifically requests them.


Sadly, if you use Python's batteries-included XML tools, this is virtually impossible to do. See http://bugs.python.org/issue2124 for some discussion.


Those tools suck.

lxml is better.


At the least, the program could use a singleton to fetch and cache the DTDs. To just pull it over the internet every time you need it is, ignoring the practical problems, just flat out wasteful.
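A minimal sketch of that fetch-once-and-cache idea, assuming the actual download is supplied by the caller as a function; DtdCache and its fetcher parameter are hypothetical names used only for illustration. A single shared instance would play the role of the singleton.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical in-memory cache: each DTD URL is fetched at most once,
 * then served from memory for the lifetime of the process.
 */
final class DtdCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Caller-supplied download function (e.g. an HTTP GET with a short timeout).
    private final Function<String, byte[]> fetcher;

    DtdCache(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    byte[] get(String url) {
        // computeIfAbsent invokes the fetcher only when the URL is not cached yet.
        // The cached array itself is returned, so callers should not modify it.
        return cache.computeIfAbsent(url, fetcher);
    }
}
```

This sketch never expires entries; since the DTDs change so rarely that may be acceptable, but a real implementation might also persist them to disk so that a restart during an outage does not hit the network again.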


Does the DTD have the right headers set to allow clients to cache it?


I'm not sure about the situation with java.sun.com, but those provided by the W3C do have a 90-day expiration (according to one of the links I've posted).

In all cases, since the DTDs are more or less versioned through their filenames, with quite a minimal rate of changes, caching them (even if not outright saving them forever) should be the default action.


Around 2005, I was semi-forced to use Xalan/Xerces (the Apache reference implementation of SAX, DOM, XPath, XSLT, etc.) for a project. These libraries were included in the JDK [edited from orig post]

To make sure that these libraries did not attempt to talk to servers outside my company's control, I had to dig through the code and implement "neutered" forms of schema look-up interfaces, etc. I can't recall exact details. The default behavior was promiscuity and presumption, and making sure that these libraries didn't strike up conversations with random servers was not trivial or terribly well documented. So, I'm not surprised by the current state of affairs.
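The exact details are not recalled above, so the following is not that implementation, only a commonly used sketch of the "neutered" idea with the standard JAXP/SAX interfaces: an EntityResolver that answers every external lookup with an empty document, so the parser has nothing left to fetch. The names NoNetworkEntityResolver and parseOffline are hypothetical.

```java
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

/** Hypothetical "neutered" resolver: every external entity resolves to empty content. */
class NoNetworkEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Returning an empty source (rather than null) stops the parser
        // from falling back to fetching the system ID itself.
        return new InputSource(new StringReader(""));
    }

    /** Parses XML text with a non-validating parser that never loads external DTDs. */
    static Document parseOffline(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new NoNetworkEntityResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }
}
```

The trade-off is that anything the DTD would have supplied (entity declarations, default attribute values) is lost, so documents that rely on those will fail or parse differently.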


you can pass -c to grep to get a count, you don't need to pipe it to `wc`


thanks :)

the "| wc -l" was tacked in for the submission :)


virtualbox.org is down as well.


For virtualbox.org it's planned maintenance from April 27th to April 30th. The announcement was on the main page.


Hang on... WTH???

The most popular virtualisation software out there that's full-featured and free to use... is shutting down their website for three days for planned maintenance?

This is something that would have happened in 1993. Maybe. Between this and java.sun.com being offline, it's pretty much the biggest red flag I could imagine to stay as far away from Oracle as possible.


And the coursera compilers class is distributing its dev environment as a virtualbox image. I'm lucky I have a copy already.



Worst case scenario, you'd get it from your package manager.

There are likely a few copies on torrent sites as well.

Since it's FLOSS it should be legal to grab it from torrents anyway.


Could be any number of things, operationally, and could also have a buffer built-in to the maintenance window to avoid unexpected issues.

Take a breath, you have more important things to worry about.


Planned downtime is handled like this? It isn't hard to put up a temporary page. Just taking it down, even with notice, is really poor form.


At times like this I think of Lily Tomlin's Ernestine character: "We don't care. We don't have to. We're Oracle!"

(I've been dealing with Oracle for a few years. It started with just database stuff, but they kept buying applications I supported, now they own Solaris ... anyway.)


I got a job where we don't deal with Oracle at all, life is so much better! I'd recommend it to anyone. Eat your veggies, exercise regularly, and work in an Oracle-free workplace .... this is the secret to happiness!


work in an Oracle-free workplace .... this is the secret to happiness!

In Big Enterprise the alternative [1] to Oracle is Microsoft.

You're darned if you do and darned if you don't.

[1] Don't even mention open source. Not going to fly at BE, in my experience.


When they bought Sun, the quality of Sun service dropped to the point where I can't imagine why anyone still buys their kit. It was remarkable.


Sun support was remarkable. About five years ago I had a critical problem ...

There I was in the data center at 3 a.m., trying to figure out why my mirrored drive server wasn't booting on its surviving disk.

I was groggy as heck, and even basic vi commands required a lot of thought. Actual thinking took more effort. The Support Engineer walked me through even the basic stuff

"Okay, now 'yank-yank put' to copy that line"

And a few minutes later the server booted and all was well.

We're moving as quickly as possible away from Solaris, to Linux. But service quality isn't the driver - it never is.

The problem is cost.


That's what happens when Sun takes over.


especially for 3 days..


They're probably busy switching everything over to an Oracle stack...


I like how Oracle educates developers about the proper handling of DTD's. (They didn't break it by accident for the third time already, right? RIGHT?!)


Of course not. Oracle would never do anything bad to a developer community.

To be honest though, if this has already happened TWICE, do people really have any excuse for relying on a server that goes down a lot for something important?

Why would you NOT just download the stuff on some separate server and at most run some cronjob to keep it up to date?

Or am I just being stupid?


See the link to the w3c - several times they've started delivering 503 HTTP error codes, hoping that the applications would start to break. It didn't have a big effect, either because the application didn't actually use the DTD they've retrieved for anything or because they broke in a non-obvious manner (like with the servers I'm administering). Had the outage been shorter or if I wasn't monitoring the Tomcat JVMs it could've stayed under the radar. That's one of the reasons I've made this submission.



