Hacker News

> somebody should be building a modern HTCondor

A bug-free, cross-platform, rock-solid HTCondor would be great. Sadly the existing HTCondor is a buggy turd. I've fixed some stuff in it, and a lot of the code is eye-roll worthy. I can practically guarantee you hours of wondering why something that by any reasonable expectation should be working isn't, without getting any meaningful error message.

But like you said, a really solid HTCondor rewrite with the Stork stuff baked in would pretty much be all anyone should need, I think. It's not like you'd even really need to dramatically re-architect the thing. The problems with HTCondor really are mostly just code quality and the user experience.

I use HTCondor every day. I find it to be rock solid with top notch documentation. It is cross-platform today as well, so I'm not sure where you're coming from on this.

Interesting -- does HTCondor solve the same problem as Hadoop? I know it is for running batch jobs across a cluster.

How big are the data sets? How does it compare with the MapReduce paradigm?

I see there is this DAGMan project, which sounds similar to some of the newer Big Data frameworks that use a DAG model. I will make an uneducated guess that it deals with lots of computation on smaller data (data that fits on one machine). Maybe you have 1 GB or 10 GB of data that you want to run through many (dozens?) of transformations. So you need many computers to do the computation, but not necessarily to store the data.
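
For anyone unfamiliar: a DAGMan input file just lists jobs and the dependency edges between them. A minimal sketch (the node names and submit-file names here are made up, not from any real workflow):

```
# diamond.dag -- hypothetical four-node "diamond" workflow
JOB  A  prepare.sub
JOB  B  transform1.sub
JOB  C  transform2.sub
JOB  D  combine.sub

# B and C both wait for A; D waits for both B and C.
PARENT A CHILD B C
PARENT B C CHILD D
```

You'd hand this to condor_submit_dag, and DAGMan releases each node's job only once its parents have finished.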

I would guess another major difference is that there is no shuffle (distributed sort)? That is really what distinguishes MapReduce from simple parallel batch processing -- i.e. the thing that connects the Map and Reduce steps (and one of the harder parts to engineer).
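
To make the distinction concrete, here's a toy in-memory sketch of what the shuffle does, with a hypothetical word-count job driving it (real MapReduce does this across machines, with sorted spill files and network transfers, which is exactly the hard engineering part):

```python
from collections import defaultdict

def map_phase(records, map_fn):
    # Apply the user's map function, emitting (key, value) pairs.
    for rec in records:
        yield from map_fn(rec)

def shuffle(pairs, n_reducers):
    # The "shuffle": route every pair to a reducer partition by
    # hashing its key, then group values by key within each partition.
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        partitions[hash(key) % n_reducers][key].append(value)
    return partitions

def reduce_phase(partitions, reduce_fn):
    # Apply the user's reduce function to each grouped key.
    out = {}
    for part in partitions:
        for key, values in part.items():
            out[key] = reduce_fn(key, values)
    return out

# Hypothetical word-count job exercising the pipeline.
docs = ["a b a", "b c"]
pairs = map_phase(docs, lambda doc: ((w, 1) for w in doc.split()))
counts = reduce_phase(shuffle(pairs, n_reducers=2), lambda k, vs: sum(vs))
# counts == {'a': 2, 'b': 2, 'c': 1}
```

Plain parallel batch processing is just the map_phase part fanned out over nodes; it's the grouping-by-key in the middle that MapReduce adds.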

In MapReduce you often have data much bigger than a single machine, e.g. ~100 TB is common, but you are doing relatively little computation on it. Also MapReduce tolerates many node failures and thus scales well (5K+ machines for a single job), and tries to rein in stragglers to reduce completion time.

I am coming from the MapReduce paradigm... I know people were doing "scientific computing" when I was in college but I never got involved in it :-( I'm curious how the workloads differ.

Do you administer the clusters, or has somebody set everything up for you? Do you use it entirely on Linux, or have you made the mistake of trying to get it to work on Windows? Which universe do you use: java, vanilla, or standard? I can imagine a perception that it's rock solid in a particular configuration and usage pattern, especially one where somebody else went through the pain of setting it up.
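
For readers who haven't used it: the "universe" is set in the submit description you hand to condor_submit. A minimal vanilla-universe sketch (file names and arguments here are hypothetical, just to show the shape):

```
# hello.sub -- hypothetical vanilla-universe submit description
universe   = vanilla
executable = /bin/echo
arguments  = "hello from job $(Process)"
output     = hello.$(Process).out
error      = hello.$(Process).err
log        = hello.log
queue 3
```

The vanilla universe runs unmodified binaries; the standard universe requires relinking against Condor's libraries to get checkpointing and remote I/O, which is where a lot of the cross-platform pain historically lived.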

I admin the cluster, so you're gonna need to try again.


Why are you being so mean? You look like a huge asshole in this thread.
