At my RubyKaigi talk, I suggested that further information on Ruby performance would be forthcoming here -- and it will be.
A gentleman from Engine Yard, however, first asked me: were there any other factors I wished I'd had time to cover in my talk but didn't?
OH YES. This is my response to him:
The short answer is "yes, there are a number of factors and I've written blog posts about several of them."
Garbage collection, for instance, is a huge factor. Between Ruby 2.0 and 2.3, the garbage collector changed enormously. And in a high-concurrency, high-memory-usage scenario like mine, it's fair to ask the question, "was the whole difference a matter of garbage collection?" I wrote a blog post about that, doing a fairly quick assessment: "https://appfolio-engineering.squarespace.com/appfolio-engineering/2017/5/12/has-ruby-helped-rails-performance-other-than-garbage-collection"
There's also a lot more to the specifics of how I gathered the data. You can look at the "for pedants only" section at the end of another blog post to see more of the details there: "https://appfolio-engineering.squarespace.com/appfolio-engineering/2017/4/14/comparing-ruby-speed-differences"
As far as Puma and concurrency settings go... I tested those fairly extensively and wrote about it: "https://appfolio-engineering.squarespace.com/appfolio-engineering/2017/3/22/rails-benchmarking-puma-and-multiprocess". You won't see a blog post about Puma versus Thin, but it turns out that Puma is *significantly* faster for this benchmark as well. So: there are definitely some interesting things there. I still need to contact Hongli Lai about getting a commercial Passenger license for testing to see how it fares against Puma - there could easily be significant differences there too.
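For reference, Puma's concurrency knobs are just a few lines of configuration. These particular numbers are purely an example, not the settings I used in the benchmark:

```ruby
# config/puma.rb -- example values only, not the benchmark's settings.
workers 4            # number of forked worker processes
threads 1, 8         # min and max threads per worker process
preload_app!         # load the app before forking, for copy-on-write memory savings
```

The interaction between `workers` and `threads` is exactly the kind of thing the multiprocess post above digs into.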
A few things have changed in my methodology over time, but you can also see how I originally designed the benchmark and why in another blog post, which was critiqued by a number of Ruby performance folks (Nate Berkopec, Charles Nutter and Richard Schneeman, among others). Here's that post: "https://appfolio-engineering.squarespace.com/appfolio-engineering/2017/1/31/the-benchmark-and-the-rails".
So yeah, there are definitely other factors. I've been working on this a fair bit. And that's ignoring the many and various factors I've explored but *haven't* found time to blog about. I have a list! For instance: my benchmark allows you to set a random seed. That *should* make essentially no difference in the results if I'm using enough requests. But it's straightforward to actually measure whether it makes a difference, and I haven't yet -- I *hope* it won't be worth a blog post, but I haven't actually checked...
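To make that concrete, here's roughly the kind of sanity check I have in mind. The "request times" below are faked with a uniform random distribution -- the real check would use times measured by the benchmark -- but it shows the shape of the question: with enough samples, different seeds should give essentially identical summary statistics.

```ruby
# Fake "request times" (50-70ms, uniform) driven by a seeded RNG.
# The real benchmark would substitute actual measured request times.
def mean_for_seed(seed, n = 100_000)
  rng = Random.new(seed)
  times = Array.new(n) { 50 + rng.rand * 20 }
  times.sum / times.size
end

means  = [1, 42, 31_337].map { |s| mean_for_seed(s) }
spread = means.max - means.min
# With n = 100_000, the spread between seeds should be tiny
# relative to the ~60ms mean.
```

If the spread *isn't* tiny for the real benchmark, that would absolutely be worth a blog post.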
Also, what if I optimize for latency instead of throughput? Is there a significant difference in request variance between running all requests in a single process versus running in multiple processes (which will be *interesting* to measure for warmup reasons)? I could check startup time with Bootsnap. I could check startup and request time with the enclose.io AoT Ruby compiler.
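As a sketch of what that variance comparison might look like -- the per-request timing numbers here are fabricated for illustration; the real ones would come from the benchmark itself:

```ruby
# Nearest-rank percentile over a pre-sorted array of times.
def percentile(sorted, pct)
  sorted[((sorted.size - 1) * pct).round]
end

def summarize(times)
  s = times.sort
  { median: percentile(s, 0.5), p99: percentile(s, 0.99) }
end

# Fabricated per-request times in milliseconds:
single_process = [48, 51, 52, 55, 60, 62, 70, 95, 120, 300]
multi_process  = [50, 52, 53, 54, 55, 56, 58, 60, 63, 80]

# Comparable medians but very different tails -- that's the kind of
# latency-versus-throughput story I'd want to check for real,
# warmup effects and all.
```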
So yes, there are a number of other interesting factors and things to analyze still, no question. If you watch the AppFolio Engineering blog (linked several times in this message), you'll see these things as they come out. That's where I write up my results!
Thanks for asking! It's wonderful when people are interested in my work :-)