"Wait, Why is System Returning the Wrong Answer?" - A Debugging Story, and a Deep Dive into Kernel#system
I had a fun bug the other day - it involved a merry chase, many fine wrong answers, a disagreement across platforms… And I thought it was a Ruby bug, but it wasn’t. Instead it’s one of those not-a-bugs you just have to keep in mind as you develop.
And since it’s a non-bug that’s hard to find and hard to catch, perhaps you’d like to hear about it?
So… What Happened?
Old-timers may instantly recognize this problem, but I didn’t. This is one of several ways it can manifest.
I had written some benchmarking code on my Mac, I was running it on Linux, and a particular part of it was misbehaving. Specifically, I was using curl to see if the URL was available - if a server was running and accepting connections yet. Curl will return true if the connection succeeds and gets output, and return false if it can’t connect or gets an error. I also wanted to redirect all output, because I didn’t want a console message. Seems easy enough, right? It worked fine on my Mac.
def url_available?
system("curl #{@url} &>/dev/null") # This doesn't work on Linux
end
The “&>/dev/null” part redirects both STDOUT and STDERR to /dev/null so you don’t see it on the console.
If you try it out yourself on a Mac it works pretty well. And if you try it on Linux, you’ll find that whether the URL is available or not it returns true (no error), so it’s completely useless.
However, if you remove the output redirect it works great on both platforms. You just get error output to console if it fails.
Wait, What?
For a while, I wondered if I had found a bug in system(). Like, I added a bunch of print statements into the Ruby source to try and figure out what was going on. It didn’t help that I tried several variations of the code and checked $? to see if the process had returned an error and… basically confused myself a fair bit. I was nearly convinced that system() was returning true but $?.success? was returning false, which would have been basically impossible and would have meant a bug in Ruby.
Yeah, I ran down a pretty deep rabbit hole on this one.
In fact, Ruby winds up passing the same command line on Linux and macOS. And if you run that command line in bash, you’ll get the same return value in bash - you can check by printing out $?, a lot like in Ruby.
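If you want to check how system’s return value and $? line up, here’s a minimal sketch. The exit code 3 and the made-up command name are arbitrary values chosen for illustration, and it assumes there’s an sh on your PATH.

```ruby
# Run a child that exits with status 3 (an arbitrary value for this demo).
# Passing separate arguments means Ruby runs sh directly.
result = system("sh", "-c", "exit 3")
exit_code = $?.exitstatus
succeeded = $?.success?

puts result.inspect     # false: the child ran but exited non-zero
puts exit_code          # 3
puts succeeded.inspect  # false

# A command that can't be started at all makes system return nil instead.
# The command name here is hypothetical and assumed not to exist.
missing = system("definitely-not-a-real-command-xyz")
puts missing.inspect    # nil
```

So system and $?.success? should agree whenever the command actually ran. A nil return means the command never started.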
A Quick Dive into Kernel#system
Let’s talk about what Kernel#system does, so I can explain what I did wrong.
If you include any special characters in your command (like the output redirection), Ruby will run your command in a subshell. In fact, system will do many different things depending on what you pass it.
If your command is just a string with no special characters, it will run it fairly directly: “ls” will simply run “ls”, and “ls bob” will run “ls” with the single argument “bob”. No great surprise.
If your command does have special characters, though, such as ampersand, dollar sign or greater-than, it assumes you’re doing some kind of shell trickery - it runs “/bin/sh” and passes whatever you gave it as an argument (“/bin/sh” with the arguments “-c” and whatever you gave to Kernel#system).
You can also pass the command and its arguments as separate strings for more control - “ls”, “bob”, for instance, will do the same thing as passing “ls bob” into Kernel#system, but with a bit more control - you can make sure it’s not running a subshell, and you don’t have to add a bunch of double-quotes to protect each argument. Don’t wrap them in a single array, though: an array as the first argument is read as [command, argv[0]] rather than as a list of arguments.
# Examples
system("ls") # runs "ls"
system("ls bob") # runs "ls" w/ arg "bob"
system("ls", "bob") # runs "ls" w/ arg "bob", never through a shell
system("ls bob 2>/dev/null") # runs sh -c "ls bob 2>/dev/null"
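To see the shell-or-no-shell difference for yourself, here’s a rough sketch. The capture_system helper and the GREETING variable are hypothetical names I made up for the demo, not part of Ruby. The helper just points the child’s stdout at a pipe so we can look at what it printed.

```ruby
# Hypothetical helper: run system with the child's stdout sent into a pipe,
# then return whatever the child printed.
def capture_system(*args, **opts)
  reader, writer = IO.pipe
  system(*args, **opts, out: writer)
  writer.close
  reader.read
ensure
  writer.close unless writer.nil? || writer.closed?
  reader.close unless reader.nil? || reader.closed?
end

env = { "GREETING" => "hello" } # made-up variable, just for the demo

# One string with a "$" in it: Ruby hands it to /bin/sh, which expands it.
via_shell = capture_system(env, "echo $GREETING")

# Separate arguments: no shell, so echo receives the literal text.
direct = capture_system(env, "echo", "$GREETING")

puts via_shell.inspect # "hello\n"
puts direct.inspect    # "$GREETING\n"
```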
No Really, What Went Wrong?
My code up above uses special characters. So it uses /bin/sh. I tried it on the Mac, it worked fine. Here’s the important difference that I missed:
On a Mac, /bin/sh is the same as bash. On many Linux distributions it isn’t.
Debian and Ubuntu, for example, install a much simpler shell (dash) as /bin/sh, without a lot of bash-specific features. One of those bash-specific features is the ampersand-greater-than syntax that I used to redirect stdout and stderr at the same time. There’s a way to do it that’s compatible with both, but that version isn’t. A plain POSIX shell reads “curl URL &>/dev/null” as two commands: “curl URL &”, which runs curl in the background, and “>/dev/null”, an empty command with a redirect. The empty command always succeeds, so in this specific case /bin/sh winds up returning true even if curl fails.
Oops.
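You can reproduce the lost exit status on either platform by spelling out the two-command form by hand. This is a small sketch that uses the standard false command as a stand-in for a failing curl. That stand-in is my assumption for the demo, not the original benchmark code.

```ruby
# Written with a space, "&" and ">" are two separate things in any POSIX
# shell, bash included: background the command, then run an empty command
# whose stdout goes to /dev/null. The shell reports the empty command's
# status, which is success.
lost_status = system("false & >/dev/null")

# The portable redirect keeps the command in the foreground, so its
# failure is reported.
kept_status = system("false >/dev/null 2>&1")

puts lost_status.inspect # true, even though `false` always fails
puts kept_status.inspect # false
```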
So in some sense, I used a bash-specific command and I should fix that. I’ll show how to fix it that way below.
Or in a different sense, I used a big general-purpose hammer (a shell) for something I could have done simply and specifically in Ruby. I’ll fix it that way too, farther down.
How Should I Fix This?
Here’s a way to fix the shell incompatibility, simply and directly:
def url_available?
system("curl #{@url} 1>/dev/null 2>&1") # This works on Mac and Linux
end
This will redirect stdout to /dev/null, then redirect stderr to stdout. It works fine, and it’s a syntax that’s compatible with both bash and Linux’s default /bin/sh.
This way is fine. It does what you want. It’s enough. Indeed, as I write this it’s the approach I used to fix it in RSB.
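One thing to watch with this syntax is that the order of the two redirects matters. Here’s a minimal sketch of the difference. The stdout_of helper is a hypothetical name of mine, and the brace group is just a stand-in for a command that writes to stderr.

```ruby
# Hypothetical helper: run a shell command line and return whatever
# arrives on its stdout.
def stdout_of(command)
  reader, writer = IO.pipe
  system(command, out: writer)
  writer.close
  reader.read
ensure
  writer.close unless writer.nil? || writer.closed?
  reader.close unless reader.nil? || reader.closed?
end

noisy = "{ echo oops >&2; }" # stand-in command that writes to stderr

# stdout goes to /dev/null first, then stderr follows it there.
silenced = stdout_of("#{noisy} 1>/dev/null 2>&1")

# Reversed: stderr is pointed at the original stdout before stdout moves,
# so the error text still gets through.
leaked = stdout_of("#{noisy} 2>&1 1>/dev/null")

puts silenced.inspect # ""
puts leaked.inspect   # "oops\n"
```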
There’s also a cleaner way, though it takes slightly more Ruby code. Let’s talk about Kernel#system a bit more and we can see how. It’s a more complex method, but you get more control over what gets called and how.
System’s Gory Glory
In addition to the command argument above, the one that can be an array or a processed string, there are extra “magic” arguments ahead and behind. There’s also another trick in the first argument - Kernel#system is like one of those “concept” furniture videos where everything unfolds into something else.
You saw above that command can be (documented here):
A string with special characters, which will expand into /bin/sh -c “your command”
A string with no special characters, which will directly run the command with no wrapping shell
Several strings as separate arguments, which will run the first as the command and pass the rest as args
Several arguments where the first is a two-element array of strings - that will do the same as separate strings, except the first entry is [ commandName, newArgv0Value ]. If this sounds confusing, you should avoid it.
But you can also pass an optional hash before the command. If you do, that hash will be:
A hash of new environment variable values; normally these will be added to the parent process’s environment to get the new child environment. But see “options” below.
And you can also pass an optional hash after the command. If you do, that hash may have different keys to do different things (documented here), including:
:unsetenv_others - if true, unset every environment variable you didn’t pass into the first optional hash
:close_others - if true, close every file descriptor except stdout, stdin or stderr that isn’t redirected
:chdir - a new current directory to start the process in
:in, :out, :err, strings, integers, IO objects or arrays - redirect file descriptors, according to a complicated scheme
I won’t go through all the options because there are a lot of them, mostly symbols like the first three above.
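Here’s a small sketch that combines the environment hash with a couple of those options. The GREETING variable and the temp file name are made up for illustration, and it assumes an sh on your PATH.

```ruby
require "tempfile"

# A scratch file to receive the child's stdout (name chosen for the demo).
log = Tempfile.new("system-demo")

ok = system(
  { "GREETING" => "hello" },              # extra environment for the child
  "sh", "-c", 'echo "$GREETING"; pwd -P', # separate args, so no extra shell
  chdir: "/",                             # child starts in the root directory
  out: log.path                           # child's stdout goes to this file
)

output = File.read(log.path)
log.close! # close and delete the scratch file

puts ok.inspect     # true
puts output.inspect # "hello\n/\n"
```

Passing a path string for :out asks Ruby to open that file for writing on the child’s behalf, which is the same mechanism as redirecting to /dev/null.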
But that last one looks promising. How would we do the redirect we want to /dev/null to throw away that output?
In this case, we want to redirect stderr and stdout both to /dev/null. Here’s one way to do that:
def url_available?
system("curl", @url, 1 => [:child, 2], 2 => "/dev/null") # This works too
end
That means to redirect the child’s stdout (file descriptor 1) to its own stderr, and direct its stderr to (the file, which will be opened) /dev/null. Which is exactly what we want to do, but also a slightly awkward syntax for it. However, it guarantees that we won’t run an extra shell, and we won’t have to turn the arguments into a string and re-parse them, and we won’t have to worry about escaping the strings for a shell.
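If the [:child, 2] form feels awkward, another option is to point each stream at the null device separately. This is a sketch rather than what RSB does: quiet_success? is a hypothetical helper name, and true and false stand in for curl so it runs anywhere.

```ruby
# Hypothetical helper: run a command without a shell, discard its output,
# and return system's result (true, false, or nil if it couldn't start).
def quiet_success?(*command)
  system(*command, out: File::NULL, err: File::NULL)
end

puts quiet_success?("true").inspect  # true
puts quiet_success?("false").inspect # false

# Made-up command name, assumed not to exist on your machine.
puts quiet_success?("definitely-not-a-real-command-xyz").inspect # nil
```

File::NULL is Ruby’s portable name for the null device, so you don’t have to hard-code “/dev/null”.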
Once more, to see documentation for all the bits and bobs that system (and related calls like Kernel#spawn) can accept, here it is.
Here are more examples of system’s “fold-out” syntax with various pieces added:
# Examples
system({'RAILS_ENV' => 'profile'}, "rails server") # Set an env var first
system("rails", "server", pgroup: true) # Run server in a new process group
system("ls *", 2 => [:child, 1]) # runs sh -c "ls *" with stderr and stdout merged
system("ls *", 2 => :close) # runs sh -c "ls *" with stderr closed
Conclusion
Okay, so what’s the takeaway? Several come to mind:
/bin/sh is different on Mac (where it’s bash) and on many Linux distributions (where it’s simpler and smaller)
It’s easy to use incompatible shell commands, and hard to test cross-platform
Ruby has a lot of shell-like functionality built into Kernel#system and similar calls - use it
By doing a bit of the shell’s work yourself (command parsing, redirects) you can save confusion and incompatibility
And that’s all I have for today.