
Heap CTO here – would love to answer any questions you have.

This was my first exposure to btrace, which is a super useful Swiss Army knife for JVM debugging. That made this a worthwhile adventure for sure.

Do you think Scala already surpassed C++'s ability to obfuscate code, or does it need more improvements to get there?

Great article. I had been hunting down a similar issue in Java 8 with Maven dependencies. I was getting the same error, even though I'd confirmed that the dependent jars were correctly included on the classpath. I eventually gave up and took a different approach that avoided the missing classes, but I think I'll revisit it to see whether a class loader is getting closed somehow.

Ooh, check it out and let me know what you find! It would make me really happy if this post helped someone debug something when they had previously hit a dead end.


I'm curious how you guys operate the Flink cluster. Do you have a single huge shared Flink cluster where people can submit all kinds of jobs for various applications/streams, or multiple smaller clusters for specific use cases? Are they on Mesos/YARN/k8s, or just plain VMs/bare metal?

Which Flink version are you guys on? I'm very excited about Flink 1.6, especially the FLIP-6 work [https://cwiki.apache.org/confluence/pages/viewpage.action?pa...]

Just wondering if you guys have any thoughts on that.

Thank you.

We're running a single Flink cluster. Engineers can run whatever jobs they need, but we aren't writing new Flink jobs that often, so the load is pretty predictable. We have a single-digit number of jobs at the moment.

We are on 1.3.2, running on EC2 VMs.

How much time did your team spend on the debugging effort?

IIRC Ivan (the post author) spent a few days tracking this down. There were some other debugging dead ends that we omitted from this writeup. One red herring was that the issue appeared to happen during the US morning, so there seemed to be a time-of-day component, and we thought it might be a system-load issue.

The fix turned out to be fairly involved too – on the order of a week I think.

Ivan works from Bulgaria so sadly he is asleep right now.

Isn't that a bit odd, your stack trace there in the article?

Usually the stack trace when a NoClassDefFoundError is thrown contains a "Caused by" section clearly showing the name of the class loader that was supposed to know about the class in question, and which then rather obviously failed to load it. If it actually isn't in the real trace, the exception/error logging is probably a bit wonky.

I have seen logging code that doesn't traverse/print the entire cause chain, but only prints the message and the immediate stack. This is unfortunately a rather terrible idea: quite often it excludes the actual cause from the printed trace while simultaneously retaining the error message, which tends to produce liberal amounts of confusion. The pattern of exceptions being thrown with a cause is not uncommon in the JDK, so making sure to log causes is important.
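A minimal sketch of the difference (the helper names logShallow/logWithCauses are illustrative, not from any particular logging library): logging only the top-level throwable drops a buried NoClassDefFoundError entirely, while walking Throwable.getCause() keeps it, which is what Throwable.printStackTrace() does for you.

```java
public class CauseChainDemo {
    // Bad: only the outermost message; the root cause buried
    // underneath never reaches the log.
    static String logShallow(Throwable t) {
        return t.toString();
    }

    // Good: walk Throwable.getCause() all the way down, mirroring
    // what Throwable.printStackTrace() prints as "Caused by:" lines.
    static String logWithCauses(Throwable t) {
        StringBuilder sb = new StringBuilder(t.toString());
        for (Throwable c = t.getCause(); c != null; c = c.getCause()) {
            sb.append("\nCaused by: ").append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Throwable root = new NoClassDefFoundError("com/example/SomeClass");
        Throwable wrapper = new RuntimeException("job failed", root);
        System.out.println(logShallow(wrapper));    // no mention of the missing class
        System.out.println(logWithCauses(wrapper)); // root cause visible
    }
}
```

If a logging framework is in the picture, the equivalent is passing the throwable itself to the log call rather than just its message, so the framework can render the whole chain.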

In general, although it's probably obvious, I would like to mention that having code loaded by class loaders with different lifetimes interact is rife with "interesting" issues. Wherever possible I would recommend serializing messages over any boundary where class loaders have different lifetimes: it prevents the strangest classes of errors and can also lead to a cleaner design. The exception, of course, is when that's prohibitively expensive from a performance perspective.
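A toy sketch of why this matters (IsolatingLoader and Payload are hypothetical names, not from the article): the "same" class defined by two different loaders is two distinct runtime types, so instanceof checks and casts across the boundary fail, which is exactly why passing only serialized, loader-neutral data (byte[], String) across such boundaries is safer.

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class LoaderDemo {
    public static class Payload {}

    // A loader that defines Payload itself instead of delegating to its
    // parent, simulating two independent job class loaders.
    public static class IsolatingLoader extends ClassLoader {
        public IsolatingLoader() {
            super(LoaderDemo.class.getClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(Payload.class.getName())) {
                return super.loadClass(name, resolve); // delegate everything else
            }
            // Read Payload's bytecode and define a second copy of the class.
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) throw new ClassNotFoundException(name);
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) out.write(buf, 0, n);
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (ClassNotFoundException e) {
                throw e;
            } catch (Exception e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> a = Payload.class;
        Class<?> b = new IsolatingLoader().loadClass(Payload.class.getName());
        System.out.println(a == b);              // false: two distinct classes
        Object fromB = b.getDeclaredConstructor().newInstance();
        System.out.println(a.isInstance(fromB)); // false: a cast would throw
    }
}
```

Once one of those loaders is closed or garbage-collected while the other side still holds references into it, the failure modes get even stranger, which is the scenario the post's NoClassDefFoundError came from.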
