Don’t make your users wait for GC

One of the weaknesses of Ruby is garbage collection. Although some improvements have been made (tunable GC settings, lazy sweep, bitmap marking), it is still a simple stop-the-world GC, which can be especially problematic for large applications.

We have a rather large application running on Passenger 3 using REE 1.8.7 with tuned GC. As the application grew, GC performance degraded until GC was averaging 130 ms per request.
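For readers who want a comparable per-request number on their own application, here is a minimal sketch. It is not the instrumentation used for the figure above: it assumes Ruby 1.9 or later, where `GC::Profiler` is available (REE 1.8.7 exposes GC statistics differently), and the class name `GCTimePerRequest` and the `X-GC-Time-Ms` header are made up for illustration.

```ruby
# Hypothetical Rack-style middleware: reports how much time the GC spent
# while the wrapped app was handling a request. Not part of the Passenger patch.
class GCTimePerRequest
  HEADER = 'X-GC-Time-Ms' # made-up header name, for illustration only

  def initialize(app)
    @app = app
    GC::Profiler.enable
  end

  def call(env)
    GC::Profiler.clear # drop GC records left over from earlier requests
    status, headers, body = @app.call(env)
    gc_ms = GC::Profiler.total_time * 1000.0 # total_time is in seconds
    headers[HEADER] = format('%.2f', gc_ms)
    [status, headers, body]
  end
end

# Stand-in for a real application: allocates some objects, then responds.
allocating_app = lambda do |_env|
  Array.new(50_000) { |i| "object #{i}" }
  [200, {}, ['ok']]
end

status, headers, _body = GCTimePerRequest.new(allocating_app).call({})
puts "status=#{status} gc_ms=#{headers['X-GC-Time-Ms']}"
```

This only counts GC runs that happen inside `call`, and it assumes each process handles one request at a time.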

It was time to address our obvious GC problem.

We could have continued to tune the GC and optimize memory usage to reduce the impact of GC on response times. But why have the GC impact the response time at all? Instead, we decided to trigger GC after one request finishes but before the next request starts. This is not a new idea. It is generically referred to as out-of-band work (OOBW). It has been implemented by Unicorn and discussed here. However, it is not supported by Passenger, so we decided to add OOBW support to Passenger.

Our patch allows the application to respond with an additional header, namely X-Passenger-Request-OOB-Work. When Passenger sees this header, it stops sending new requests to that application process and tells the process to perform the out-of-band work. The application registers an oob_work callback for the work it wants done using Passenger’s event mechanism. When the work is done, the application process resumes handling normal requests.

All the application code can be packaged in an initializer:

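PhusionPassenger.on_event(:oob_work) do
  # Runs in the application process while Passenger holds back new requests.
  t0 = Time.now
  GC.start
  # Log the duration (Rails.logger is an assumption; any logger works).
  Rails.logger.info "Out-Of-Bound GC finished in #{Time.now - t0} sec"
end

# Rack middleware that requests out-of-band work on every `frequency`-th request.
class RequestOOBWork
  def initialize(app, frequency)
    @app = app
    @frequency = frequency
    @request_count = 0
  end

  def call(env)
    @request_count += 1
    status, headers, body = @app.call(env)
    if @request_count % @frequency == 0
      headers['X-Passenger-Request-OOB-Work'] = 'true'
    end
    [status, headers, body]
  end
end

Rails.application.config.middleware.use RequestOOBWork, 5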

After experimentation, we decided to trigger GC after every 5 requests. If you set the parameter too high, then GC will continue to occur in the middle of requests. If you set the parameter too low, then application processes will be busy performing their out-of-band GC causing Passenger to spawn more workers.
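To see what the frequency parameter does in isolation, here is a small sketch that runs the middleware against a stub application, with no Rails or Passenger involved. The middleware is restated so the snippet runs on its own; `stub_app` and `flagged_requests` are hypothetical helpers for illustration.

```ruby
# Same shape as the middleware in the post, restated so this runs standalone.
class RequestOOBWork
  def initialize(app, frequency)
    @app = app
    @frequency = frequency
    @request_count = 0
  end

  def call(env)
    @request_count += 1
    status, headers, body = @app.call(env)
    if @request_count % @frequency == 0
      headers['X-Passenger-Request-OOB-Work'] = 'true'
    end
    [status, headers, body]
  end
end

# Hypothetical stand-in for the real application: fresh headers per request.
stub_app = lambda { |_env| [200, {}, ['ok']] }

# Returns the 1-based request numbers whose responses carry the OOB header.
def flagged_requests(app, frequency, total)
  middleware = RequestOOBWork.new(app, frequency)
  (1..total).select do |_n|
    _status, headers, _body = middleware.call({})
    headers.key?('X-Passenger-Request-OOB-Work')
  end
end

puts flagged_requests(stub_app, 5, 20).inspect # prints [5, 10, 15, 20]
```

The counter lives in the middleware instance, so each application process counts only the requests it handled itself. With a frequency of 5, every process asks for out-of-band work after its own fifth request, not after every fifth request across the whole pool.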

With these changes in place, our graphs show a 10x improvement in average GC time per request.

The patch we implemented and deployed to production is for Passenger 3. It is available here.

Passenger 4 is currently under active development. In fact, beta 1 has already been released. We’ve ported our patch to Passenger 4. It is merged and expected to be included in beta 2.


6 thoughts on “Don’t make your users wait for GC”

  1. Hi pkmiec, thanks for contributing this patch, it was a pleasure working with you!

    I’m thinking how much cooler it would be if there were a way to do *delayed GC* on top of out-of-band GC. Whenever the Ruby GC detects the need to trigger a garbage collection, it would set a flag. The code can then invoke the garbage collector later, when needed. This requires patching Ruby though.

    Maintaining REE and keeping quality assurance was an extremely resource intensive task, which was one of the reasons why we had to End-Of-Life it. Such a patch should be accepted by upstream for it to become viable. Maybe they’d accept a patch that introduces GC hooks so that new GC functionality can be implemented in extensions. That would make maintenance and development much easier.

    Just thinking out loud.

    • Thank you for all your help and support on this patch.

      Along the lines of the *delayed GC* idea, I initially tried to implement some sort of heuristic for when to respond with the X-Passenger-Request-OOB-Work header, but could not get it to work in a reasonable way. So I just settled on every nth request.

      I feel a lot of effort has gone into the Ruby GC, but integrating that effort into mainstream Ruby has been very slow. You guys made REE, incorporating the railsbench GC patch and copy-on-write friendliness. Twitter invested in Kiji, which implements a limited version of generational GC. Rubinius, to my understanding, has a much more sophisticated generational GC. Yet MRI 1.9 only included the tunable GC and lazy sweep. Never mind that it wasn’t really usable until 1.9.3 and that we lost tools like memprof. Ruby 2.0 will have bitmap marking, which is great, but it is also disappointing that it took so long to effectively implement the same thing that REE had.

      I totally agree that a pluggable GC would be a huge improvement. I recall this having been proposed on the mailing list at some point. I believe one issue is the lack of a proper way of dealing with C extensions, which Brian Ford talked about at the last RubyConf:

  2. Pingback: Phusion Passenger 4 Technology Preview: Out-Of-Band Work – Phusion Corporate Blog

  3. Pingback: Phusion Passenger now supports the new Ruby 2.1 Out-Of-Band GC – Phusion Corporate Blog

