“The workers are on fire again.”—Us, every day, before cmdstalk.

beanstalkd, PHP workers, fires

99designs pushes four million background jobs through beanstalkd each day. beanstalkd is a fantastic job queue which we’ve used for more than five years, via the pheanstalk client which I wrote in 2008.

Each beanstalkd job has a TTR (time to run): a timer which counts down during job processing. If TTR seconds elapse before the worker finishes the job, beanstalkd assumes the worker is dead and releases the job. Another of our workers takes the job, despite the original worker still churning away. Each iteration of this increases load and reduces the chance of this or any other job finishing. Eventually all the worker processes are stuck, and everything catches fire.

That’s what happened when we began pushing ImageMagick and Ghostscript jobs to rasterize graphics. Some pathological EPS files took longer than the 600-second TTR, causing worker resource starvation.

Increasing the TTR would mitigate the issue, but these EPS files seem subject to the halting problem. That leaves workers vulnerable to slow job saturation.

Interrupting the image operation when the job hits its TTR would be a better solution. But workers need concurrency to watch the TTR during the job. PHP doesn’t do threads, except via an extension that I’m disinclined to use. Using fork() would introduce IPC / signal handling complexity, and prevent processes sharing the beanstalkd connection. PHP feels like the wrong language to attack the problem.

cmdstalk

Lachlan and I decided we could kill N birds with a single stone. One: solve the queue fires. Two: move another piece of our production infrastructure to Go. Three: provide a beanstalkd layer which our PHP, Ruby and Go apps could all use.

cmdstalk set out to harness the beanstalkd semantics we like on one end, and talk standard unix processes on the other. This allows us to write workers in any language. Here’s the basic model:

  • Connect to a beanstalkd server, watch one or more tubes.
  • Pipe each job payload to the command specified by the cmdstalk --cmd=… argument.
  • If the subprocess exits 0, delete the job; done.
  • If the subprocess exits non-zero, release the job for retry (with backoff).
  • If TTR elapses, kill the subprocess and bury the bad job.

Anything that can read stdin and exit(int) can be a cmdstalk worker — no need for beanstalkd knowledge.
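The model above can be sketched from the broker's side in a few lines of Go. This is an illustration under assumptions, not cmdstalk's actual code: the names runJob and Outcome are hypothetical, the command is run through sh -c for convenience, and the beanstalkd calls (delete, release with backoff, bury) are reduced to a returned value.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// Outcome is a hypothetical name for what the broker does with a finished job.
type Outcome string

const (
	Delete  Outcome = "delete"  // subprocess exited 0
	Release Outcome = "release" // subprocess exited non-zero; retry later
	Bury    Outcome = "bury"    // TTR elapsed; subprocess was killed
)

// runJob pipes payload to the command's stdin and enforces ttr as a deadline.
// Running the command via "sh -c" is an assumption made for this sketch.
func runJob(command string, payload []byte, ttr time.Duration) Outcome {
	ctx, cancel := context.WithTimeout(context.Background(), ttr)
	defer cancel()

	// CommandContext kills the process if ctx expires before it exits.
	cmd := exec.CommandContext(ctx, "sh", "-c", command)
	cmd.Stdin = bytes.NewReader(payload)

	err := cmd.Run()
	switch {
	case errors.Is(ctx.Err(), context.DeadlineExceeded):
		return Bury
	case err == nil:
		return Delete
	default:
		return Release
	}
}

func main() {
	// Hypothetical jobs: a success, a failure, and one that outlives its TTR.
	fmt.Println(runJob("cat >/dev/null", []byte("payload"), 2*time.Second))
	fmt.Println(runJob("exit 3", nil, 2*time.Second))
	fmt.Println(runJob("exec sleep 5", nil, 200*time.Millisecond))
}
```

The deadline is watched concurrently with the subprocess, which is the part that was awkward to do from a single PHP worker process.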

Check out the source on GitHub, the docs at godoc.org, or just the usage summary:

$ cmdstalk -help
Usage of cmdstalk:
-address="127.0.0.1:11300": beanstalkd TCP address.
-all=false: Listen to all tubes, instead of -tubes=...
-cmd="": Command to run in worker.
-per-tube=1: Number of workers per tube.
-tubes=[default]: Comma separated list of tubes.

Our app runs cmdstalk under supervisord like this:

cmdstalk -all -per-tube=6 -cmd="/path/to/swiftly/console worker:stdin"

Go

Go has become the go-to language at 99designs for infrastructure components. My only previous Go experience comes from writing go6502, an 8-bit computer emulator. Fascinating, but different to writing concurrent network applications. Despite that, building cmdstalk with Go was a pleasure.

Starting from the cmdstalk entrypoint you’ll see broker and cli packages loaded. cli/options.go demonstrates Go’s flag library for argument parsing. broker_dispatcher.go coordinates broker concurrency across tubes, and broker.go is where the action happens. Broker.Run() is a clear candidate for refactoring, but when workers are burning, software’s better shipped than perfect.
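For readers who haven't used Go's flag package, here is a rough sketch of how the flags in the usage summary could be declared. It is not a copy of cli/options.go: the Options struct, its field names and parseOptions are assumptions, and -tubes is simplified to a comma-split string rather than a custom flag type.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// Options mirrors the flags in the usage summary.
// The struct and its field names are assumptions for this sketch.
type Options struct {
	Address string
	All     bool
	Cmd     string
	PerTube int
	Tubes   []string
}

// parseOptions parses args (without the program name) into Options.
func parseOptions(args []string) (Options, error) {
	var o Options
	var tubes string

	fs := flag.NewFlagSet("cmdstalk", flag.ContinueOnError)
	fs.StringVar(&o.Address, "address", "127.0.0.1:11300", "beanstalkd TCP address.")
	fs.BoolVar(&o.All, "all", false, "Listen to all tubes, instead of -tubes=...")
	fs.StringVar(&o.Cmd, "cmd", "", "Command to run in worker.")
	fs.IntVar(&o.PerTube, "per-tube", 1, "Number of workers per tube.")
	fs.StringVar(&tubes, "tubes", "default", "Comma separated list of tubes.")

	if err := fs.Parse(args); err != nil {
		return o, err
	}
	o.Tubes = strings.Split(tubes, ",")
	return o, nil
}

func main() {
	// Arguments taken from the supervisord example earlier in the post.
	opts, err := parseOptions([]string{
		"-all", "-per-tube=6", "-cmd=/path/to/swiftly/console worker:stdin",
	})
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

The flag package accepts both -cmd and --cmd, so either spelling works on the command line.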

Commit ade6f6b0 introduces a simple -all flag to watch all tubes at start-up. 431ac5fc evolves it to poll for new tubes as they’re created. The latter illustrates how well timers and concurrency come together in Go. Together they show that it’s simple to add functionality that would be complex in other languages.
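To give a flavour of that timer-plus-concurrency pattern, here is a minimal sketch of polling for new tubes with a time.Ticker and select. It is not the code from either commit: newTubes, watchTubes and the fake tube listing are hypothetical, and a real version would ask beanstalkd for its tube list and start workers for each new tube.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// newTubes returns the tubes in listed that have not been seen before,
// and records them in seen.
func newTubes(seen map[string]bool, listed []string) []string {
	var fresh []string
	for _, t := range listed {
		if !seen[t] {
			seen[t] = true
			fresh = append(fresh, t)
		}
	}
	return fresh
}

// watchTubes polls list at every interval and calls start once per new tube.
// It returns when done is closed.
func watchTubes(list func() []string, start func(tube string), interval time.Duration, done <-chan struct{}) {
	seen := make(map[string]bool)
	poll := func() {
		for _, t := range newTubes(seen, list()) {
			start(t)
		}
	}
	poll() // pick up tubes that exist at start-up

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			poll()
		case <-done:
			return
		}
	}
}

func main() {
	// Fake tube listing standing in for a query to beanstalkd.
	var mu sync.Mutex
	tubes := []string{"default"}
	list := func() []string {
		mu.Lock()
		defer mu.Unlock()
		return append([]string(nil), tubes...)
	}

	done := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		watchTubes(list, func(t string) {
			fmt.Println("starting workers for tube:", t)
		}, 20*time.Millisecond, done)
	}()

	// A new tube appears while the watcher is running.
	time.Sleep(50 * time.Millisecond)
	mu.Lock()
	tubes = append(tubes, "images")
	mu.Unlock()

	time.Sleep(50 * time.Millisecond)
	close(done)
	<-finished
}
```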

Tests live alongside the code they’re testing, such as broker_test.go alongside broker.go. They’re regular Go code using if to make assertions, but richer assertion libraries do exist.

Conclusion

cmdstalk applies a unix-process abstraction layer to beanstalkd job processing. Like any abstraction it needs to make itself worthwhile.

If you’re running Rails, you might want to look at Sidekiq or Resque, or maybe even delayed_job. If you’re 100% Python, you could wire together some solid libraries for job processing.

But if you need to process background jobs using several languages, some of them poorly suited to long-running daemons and concurrency, cmdstalk may be for you. Give it a try; feedback and pull requests are welcome.