Saturday, May 14, 2011

Lightweight Job Queuing Engine and GitHub

I wrote a job queuing daemon. It runs on a head node and any number of compute nodes, and it works well on EC2. It's meant as a replacement for Condor, OGE (formerly SGE), TORQUE, etc. So far it's good about crash recovery and very good about latency. Other job queuing systems seem to have such high latency that you never want to use them. And they all spend so much effort on copying I/O files all over the place, when all I ever use is NFS anyway.
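To give a rough idea of the moving parts, here is a toy Python sketch of a head node handing jobs to compute nodes and re-queuing a job when a node is lost. It is not the daemon's actual code: the `HeadNode` class, its method names, and the node names are all made up for illustration, and it keeps everything in memory instead of talking over sockets.

```python
from collections import deque


class HeadNode:
    """Toy in-memory model of a head node (hypothetical, for illustration)."""

    def __init__(self):
        self.pending = deque()  # jobs waiting for a free compute node
        self.running = {}       # node name -> job currently assigned to it

    def submit(self, command):
        """Queue a job; in this sketch a job is just a command string."""
        self.pending.append(command)

    def request_job(self, node):
        """A compute node asks for work; returns None if there is nothing to do."""
        if node in self.running or not self.pending:
            return None
        job = self.pending.popleft()
        self.running[node] = job
        return job

    def job_done(self, node):
        """A compute node reports that its job finished."""
        return self.running.pop(node, None)

    def node_lost(self, node):
        """Crash recovery: put a lost node's job back at the front of the queue."""
        job = self.running.pop(node, None)
        if job is not None:
            self.pending.appendleft(job)
        return job


head = HeadNode()
head.submit("echo one")
head.submit("echo two")
first = head.request_job("node-a")   # node-a picks up the first job
head.node_lost("node-a")             # node-a disappears before finishing
retry = head.request_job("node-b")   # node-b gets the same job again
```

Because input and output files live on NFS in this setup, the sketch only passes a command string around and never copies any files.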

I wrote it because those other systems are administratively difficult and have more features than I need. Some day I may go back to Condor or SGE, but for now it's nice to have a smaller, faster, easier one.

I'm trying GitHub instead of Google Code. The developers and gangla got me to try it out. I think I might like it. Not sure why it makes you "add" and "commit" separately.

I based the socket system on ppcgid, since it has worked so well in the past.

