Symbol#to_s Returned a Frozen String in Ruby 2.7 previews - and Now It Doesn’t
How a Broken Interface Getting Fixed Showed Us That It’s Broken
One of the things I love about Ruby is the way its language design gets attention from many directions and many points of view. A change in the Ruby language will often come from the JRuby side, such as this one proposed by Charles Nutter. Benoit Daloze (a.k.a. eregon), the now-lead of TruffleRuby, is another major commenter. And of course, you’ll see CRuby-side folks including Matz, who is still Ruby’s primary language designer.
That bug has some interesting implications, so let’s talk about them a bit, and about how an interface that wasn’t perfectly thought out at the beginning can be hard to fix later. I’m not trying to pick on the .to_s method, which is a fairly good interface in most ways. But all of Ruby started small and has had to deal with more and more users as the language matures. Every interface has this problem at some point, as its uses change and its user base grows. This is just one of many, many good examples.
So… What’s This Change, Then?
You likely know that in Ruby, when you call .to_s on an object, it’s supposed to return itself “translated” to a string. For instance if you call it on the number 7 it will return the string “7”. Or if you call it on a symbol like :bob it will return the string “bob”. A string will just return itself directly with no modifications.
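Here is a quick sketch of those three cases, using only core Ruby. The variable names are mine, purely for illustration.

```ruby
number_text = 7.to_s      # "7"
symbol_text = :bob.to_s   # "bob"

greeting = "hello"
# String#to_s hands back the receiver itself, not a copy.
same_object = greeting.to_s.equal?(greeting)

puts number_text, symbol_text, same_object
```

That last line, the one where a string returns the very same object, turns out to matter quite a bit further down.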
There is a whole family of similar “typecast” methods in Ruby like to_a, to_hash, to_f and to_i. Making it more complicated, most types have two typecast operators, not one. For strings that would be to_s and to_str, while for arrays it’s to_a and to_ary. For the full details of these operators, other ways to change types and how they’re all used, I highly recommend Avdi Grimm’s book Confident Ruby, which can be bought, or ‘traded’ for sending him a postcard! In any case, take my word for it that there are a bunch of “type conversion operators,” and to_s is one of them.
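If you would rather not take my word for it, here is a small sketch of the to_s / to_str difference. The two classes are hypothetical, made up just for this example: string interpolation is happy with to_s, while String#+ wants an object that also answers to_str.

```ruby
# Hypothetical classes, for illustration only.
class ExplicitOnly
  def to_s
    "explicit"
  end
end

class ImplicitToo < ExplicitOnly
  def to_str
    "implicit"
  end
end

puts "value: #{ExplicitOnly.new}"   # interpolation calls to_s
puts "value: " + ImplicitToo.new    # String#+ uses to_str

begin
  "value: " + ExplicitOnly.new      # no to_str, so String#+ refuses
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```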
In Ruby 2.7-preview2, a random Ruby prerelease, Symbol#to_s started returning a frozen string, which can’t be modified. That breaks a few pieces of code. That’s how I stumbled across the change — I do speed-testing on pretty ancient Ruby code regularly, so there are a lot of little potential problems that I hit.
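To see the kind of breakage without installing a preview build, you can simulate it by freezing the string yourself. This is only a sketch: strip_setter_suffix! is a helper name I made up, and it assumes Ruby 2.5 or later, where FrozenError exists.

```ruby
# Returns :mutated if the in-place edit worked, :frozen_error if it raised.
def strip_setter_suffix!(string)
  string.chomp!("=")
  :mutated
rescue FrozenError
  :frozen_error
end

unfrozen = :"bob=".to_s.dup      # explicit copy, always safe to mutate
frozen   = :"bob=".to_s.freeze   # stands in for what the preview returned

p strip_setter_suffix!(unfrozen)
p strip_setter_suffix!(frozen)
```

The first call mutates its argument happily; the second one raises, which is the failure old code ran into.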
But Why Is That a Problem?
When would that break something? When somebody calls #to_s and then messes with the result, mostly. Here’s the code that I had trouble with, from an old version of ActiveSupport:
def method_missing(name, *args)
  name_string = name.to_s
  if name_string.chomp!("=")
    self[name_string] = args.first
  else
    bangs = name_string.chomp!("!")
    if bangs
      self[name_string].presence || raise(KeyError.new(":#{name_string} is blank"))
    else
      self[name_string]
    end
  end
end
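For comparison, here is one way the same logic could be written without mutating the result of to_s. This is my own sketch, not the fix ActiveSupport actually shipped: SketchOptions is a hypothetical stand-in class, and since presence is an ActiveSupport extension, I approximate it with a plain nil-or-empty check.

```ruby
# Hypothetical stand-in for the ActiveSupport class; not the real code.
class SketchOptions < Hash
  def method_missing(name, *args)
    name_string = name.to_s # only read below, never modified in place
    if name_string.end_with?("=")
      self[name_string.chomp("=")] = args.first
    elsif name_string.end_with?("!")
      key = name_string.chomp("!")
      value = self[key]
      # Plain-Ruby approximation of ActiveSupport's `presence`.
      blank = value.nil? || (value.respond_to?(:empty?) && value.empty?)
      raise KeyError, ":#{key} is blank" if blank
      value
    else
      self[name_string]
    end
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

opts = SketchOptions.new
opts.color = "red"
puts opts.color
puts opts.color!
```

The non-bang chomp returns a new string, so this version works the same whether or not to_s hands back something frozen.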
So… Was this a perfectly okay way to do it, broken by a new change? Oooooh… That’s a really good question!
Here are some more good questions that I, at least, didn’t know the answers to offhand:
1. If a string usually just returns itself, is it okay that modifying the string also modifies the original?
2. Is it a problem, optimisation-wise, to keep allocating new strings every time? (Schneems had to work around this)
3. If you freeze the string, which freezes the original, is that okay?
These are hard questions, not least because fixing question #1 in the obvious way probably breaks question #2 and vice-versa. And question #3 is just kind of weird - is it okay to stop this behaviour part way through? Ruby makes it possible, but that’s not what we care about, is it?
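The first and third questions are easy to see in action. A minimal sketch, with variable names of my own choosing:

```ruby
original  = "bob".dup       # dup guarantees we start with an unfrozen string
converted = original.to_s   # String#to_s returns the receiver, not a copy

converted << "!"
p original                  # the "original" changed too

converted.freeze
p original.frozen?          # and now it is frozen as well
```

Both names point at one object, so anything you do to the “converted” string happens to the original.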
I mentioned up at the top of this post that this interface, to_s, was “not perfectly thought out.” This is what I mean. to_s is a limited interface that does some things really well, but it simply hasn’t been thought through in this context. That’s true of any interface - there will always be new uses, new contexts, new applications where it either hasn’t been thought about or the original design was wrong.
“Wrong?” Isn’t that a strong statement? Not really. Charles Nutter points out that the current design is simply unsafe in the way we’re using it - it doesn’t guarantee what happens if you modify the result, or decide whether it’s legal to do so. And people are, in fact, modifying its result. If they weren’t then we could trivially freeze the result for safety and optimisation reasons and nobody would notice or care (more on that below.)
Also, we’ll know in the future, not just for to_s but for conversion methods in general - it’s not safe to modify their results. I doubt that to_s is the only culprit!
Many Heads and a Practical Answer
In the specific Ruby 2.7 sense, we have an answer. Symbol#to_s returned a frozen string and some code broke. Specifically, the answer to “what broke?” seems to be “about six things, some of them old or obscure.” But this is what trying something out in a preview is for, right? If it turns out that there are problems with it, we’re likely to find them before the final release of 2.7 and we can easily roll this back. Such things have happened before, and will again.
(In fact, it did happen. The release 2.7.0 won’t do this, and they’re rethinking the feature. It may come back, or may change and come back in a different form. The Ruby Core Team really does try to keep backward compatibility where they can.)
In the meantime, if you’re modifying the result of calling to_s, I recommend you stop! Not only might the language break that (or not) later, but you already have no guarantee that it will keep working! In general, don’t trust the object you get back from a conversion method to be safely modifiable. It might be frozen, or worse, modifying it might modify the original object… And neither behaviour is guaranteed, nor guaranteed to stay the same if it happens today.
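If you really do need a string you can change, one defensive option is to take an explicit copy first. This is a sketch of my own, not an official recommendation from the Ruby core team, and mutable_copy is a made-up helper name.

```ruby
# Hypothetical helper: always returns a fresh, unfrozen String.
def mutable_copy(object)
  object.to_s.dup
end

name = "bob".freeze
copy = mutable_copy(name)
copy << "!"

p copy   # the copy was modified
p name   # the original is untouched
```

I reach for dup rather than unary plus here because +string returns the receiver itself when it is already unfrozen, which would bring the “modifies the original” problem right back. The cost is an extra allocation, so this trades a little speed for safety.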
And so the march of progress digs up another problem for us, and we all learn another little bit of interface design together.