WRK It! My Experiences Load-Testing with an Interesting New Tool
There are a number of load-testers out there. ApacheBench, aka AB, is probably the best known, though it’s pretty wildly inaccurate and not recommended these days.
I’m going to skim quickly over the tools I didn’t use, then describe some interesting quirks of wrk, good and bad.
Various Other Entrants
There are a lot of load-testing tools out there. I’ll mention a few of them briefly, and explain why I didn’t choose them.
For background, “ephemeral port exhaustion” is what happens when a load tester keeps opening new local sockets until all the ports in the ephemeral range are gone. It’s bad and it prevents long load tests. That will become relevant in a minute.
ApacheBench, as mentioned above, is all-around bad. Buggy, inexact, hard to use. I wrote a whole blog post about why to skip it, and I’m not the only one to notice. Nope.
Siege isn’t bad… but it automatically reopens sockets and has unexplained comments saying not to use keepalive. So a long and/or concurrent and/or fast load test is going to hit ephemeral port exhaustion very rapidly. Also, siege doesn’t have an easy way to dump higher-resolution request data - just a single overall throughput rate. Nope.
JMeter has the same problem in its default configuration, though you can configure it not to. But I’m driving my tests from the command line and/or from Ruby. There’s a gem to make that less horrible, but the experience is still quite bad - JMeter’s not particularly command-line friendly. And it’s really not easy to script if you’re not using Java. Next.
Locust is a nice low-overhead testing tool, and it has a fair bit of charm. Unfortunately, it really wants to be driven from a web console, to run across many nodes and/or processes, and to ramp up slowly on start. For my command-line-driven use case, where I want a fixed, predictable number of load-test connections, it just wasn’t the right fit.
This isn’t anywhere near all the available load-testing tools. But those are the ones I looked into pretty seriously… before I chose wrk instead.
Good and Bad Points of Wrk
Nearly every tool has something good going for it. Every tool has problems. What are wrk’s?
First, the annoying bits:
1) wrk isn’t pre-packaged by nearly anybody - no common Linux or Mac packages, even. So wherever you want to use it, you’ll need to build it. The dependencies are simple, but you do have to build it yourself.
2) like most load-testers, wrk doesn’t make it terribly easy to get the raw data out of it. In wrk’s case, that means writing a Lua dumper script that runs in quadratic time (there’s a sketch of one right after this list). Not the end of the world, but… why do people assume you don’t want raw data from your load-test tool? Wrk isn’t alone in this - it’s shockingly difficult to get the same data at full precision out of ApacheBench, for instance.
3) I’m really not sure how to pronounce it. Just as “work?” But how do I make it clear? I sometimes write wg/wrk, which isn’t better.
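Just to show what I mean by a Lua dumper, here’s a minimal sketch of a done() handler - wrk’s post-run Lua hook - that writes a latency curve out to CSV. The filename, URL and percentile step are placeholders of mine, not anything wrk dictates; the done(summary, latency, requests) signature and latency:percentile() are part of wrk’s documented scripting interface. As far as I can tell, each percentile() call re-walks the recorded samples, which is why a fine-grained dump gets slow on long runs.

```lua
-- Sketch of a wrk "done" hook that writes a latency curve to CSV.
-- The output filename, percentile step and URL are my choices, not wrk's.
-- Run with: wrk -t2 -c20 -d30s -s dump.lua http://localhost:4567/
function done(summary, latency, requests)
   local out = io.open("latency.csv", "w")
   out:write("percentile,latency_us\n")
   -- Walk the distribution in small steps; each percentile() call
   -- re-scans the recorded samples, which is where the slowness
   -- comes from on long runs.
   for i = 1, 999 do
      local p = i / 10.0
      out:write(string.format("%.1f,%d\n", p, latency:percentile(p)))
   end
   out:write(string.format("# requests=%d duration_us=%d\n",
                           summary.requests, summary.duration))
   out:close()
end
```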
And now the pluses:
1) low-overhead. Wrk and Locust consistently showed very low overhead when running. In wrk’s case it’s due to its… charmingly quirky concurrency model, which I’ll discuss below. Nonetheless, wrk is both fast and consistent once you have it doing the right thing.
2) reasonably configurable. The Lua scripting isn’t my 100% favorite in every way, but it’s a nice solid choice and it works. You can get wrk to do most things you want without too much trouble (there’s a small example right after this list).
3) simple source code. Okay, I’m an old C guy so maybe I’m biased. But wrk has short, punchy code that does the simple thing in a mostly obvious way. The two exceptions are its two packaged-in dependencies - an HTTP header parser which is fast but verbose, and an event-model library torn out of a Tcl implementation. But if you’re curious how wrk opens a socket, reads data or similar, you can skip the ApacheBench-style reading of a giant library of nonstandard network operations in favor of short, simple and Unixy calls to the normal stuff. As C programs go, wrk is an absolute joy to read.
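To give you a flavor of that Lua scripting, here’s a minimal sketch that turns every request into a JSON POST with a custom header. The header name, payload and URL are placeholders of mine; the wrk table (wrk.method, wrk.headers, wrk.body, wrk.format) and the request() hook are wrk’s documented scripting interface.

```lua
-- Minimal wrk script: POST a small JSON body with a custom header.
-- The header name, payload and URL are placeholder values of mine.
-- Run with: wrk -t2 -c10 -d15s -s post.lua http://localhost:4567
wrk.method = "POST"
wrk.headers["Content-Type"] = "application/json"
wrk.headers["X-Load-Test"]  = "wrk"

local counter = 0

-- request() is called for every request each connection sends, so it's
-- the place to vary the payload. Each wrk thread has its own Lua state,
-- so this counter is per-thread.
function request()
   counter = counter + 1
   wrk.body = string.format('{"run":"demo","n":%d}', counter)
   return wrk.format()
end
```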
And Then, the Weird Bits
A load-tester normally has some simple settings. It can let you specify how many requests to run, or how many seconds to run for (which is what wrk does), or both, which is nice. It can take a URL, and often options like keepalive (wrk’s keepalive specifically could use some work).
And, of course, concurrency. ApacheBench’s simple “concurrency” option is just how many connections to use. Another tool might call this “threads” or “workers.”
Wrk, on the other hand, has both connections and threads, and doesn’t really explain what it does with them. After significant inspection of the source, I now know - and I’ll explain it to you.
Remember that event library thing that wrk builds in as a dependency? If you read the code, it’s a little reactor that keeps track of a bunch of connections, including things like timeouts and reconnections.
Each thread you give wrk gets its own reactor. The connections are divided up between them, and if the number of threads doesn’t exactly divide the number of connections (example: 3 threads, 14 connections) then the spare connections are just left unused.
All of those connections can be “in flight” at once - you can potentially have every connection open to your specified URL, even with only a single thread. That’s because a reactor can handle as many connections as it has processor power for, not just one at a time.
So wrk’s connections are roughly equivalent to ApacheBench’s concurrency, but its threads are how many OS threads you want servicing those connections. For a “normal” evented library, something like Node.js or EventMachine, the answer tends to be “just one, thanks.”
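To make that concrete with my own numbers - this is a paraphrase of the arithmetic in Lua, not wrk’s actual C - here’s what happens with 3 threads and 14 connections:

```lua
-- Paraphrase of how wrk splits connections across threads:
-- integer division per reactor, remainder silently dropped.
local threads, connections = 3, 14                    -- i.e. wrk -t3 -c14 ...
local per_thread = math.floor(connections / threads)  -- 4 connections per reactor
local opened     = per_thread * threads               -- 12 connections actually opened
local unused     = connections - opened               -- 2 connections never used
print(per_thread, opened, unused)                     --> 4   12   2
```

So if you want every connection you asked for, pick a connection count that’s a multiple of your thread count.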
This caused the JRuby team and me (independently) a noticeable bit of headache, so I thought I’d mention it to you.
So, Just Use Wrk?
I lean toward saying “yes.” That’s the recommendation from Phusion, the folks who make Passenger. And I suspect it’s no coincidence that the JRuby team and I independently chose wrk at around the same time - most load-testing tools aren’t good, and ephemeral port exhaustion is a frequent problem. Wrk is pretty good; most just aren’t.
On the other hand, the JRuby team and I also found serious performance problems with Puma and keepalive as a result of using a tool that barely supports turning keepalive off at all. We also had some significant misunderstandings of what “threads” versus “connections” meant, though you won’t have that problem now. And for Rails Ruby Bench I did what most people do and built my own, and it’s basically never given me any trouble.
So instead I’ll say: if you’re going to use an off-the-shelf load tester at all, Wrk is a solid choice, though JMeter and Locust are worth considering if they match your use case. A good off-the-shelf tester can have much lower overhead than a tester you built in Ruby, and be more powerful and flexible than a home-rolled one in C.
But if you just build your own, you’re still in very good company.