25 March 2018

Ruby Benchmarking

Dan Mayer
Dan Mayer @danmayer

Ruby Benchmarking

If you work on a large app, publish, gems, or are interested in the Ruby community at some point you will likely want to dig into performance and benchmarking. Often what you think will help performance doesn’t have as much of an impact as you would guess. Which is why being able to set up quick experiments to verify the impact of code changes on performance and memory can be important. Benchmarking is great as it can help prove, your change had the impact you want. Ruby performance and the measurement of it is getting more discussion at the moment because Ruby 3X3 set out a goal to make Ruby 3 three times faster than Ruby 2. Learn more in this talk, Ruby3x3: How are we going to measure 3x?

This post will talk about measuring the performance of Ruby and benchmarking your own code and applications. In a future post, I will dig more into good practices around benchmarking Ruby gems.

image from pixabay

Ruby Benchmarking Projects

A number of projects have been proposed and have support from the community for benchmarking Ruby… These project all will help the community improve the performance of Ruby, and can be used to help you measure the impact or Ruby or Gem changes you would like to try. If you are interested in contributing to Ruby or helping the community with improved performance or better benchmarking I encourage you to check out these projects. A great way to learn more about benchmarking is to dig into projects that are already out available and looking for more contributors.

Ruby Benchmarking Basics

When I need to quickly benchmark something, I look no further than Ruby’s simple Benchmark Module. I don’t have all the usage memorized off the top of my head, but have ended up at this great blog post, timing Ruby code is easy with Benchmark, a number of times over the years. It is a quick and simple post that will help get basics up and running in no time at all. For some additional Ruby benchmarking support also see the resources below.

Learning by Benchmarking A Proposed Ruby Change

I previously covered how to build Ruby from scratch on OS X. I posted that as I was working on a feature, I was hoping to get into Ruby. I proposed a change to Ruby’s Coverage that I hoped would lead to large performance improvements adding Coverage Pause & Resume support. A little discussion with @tenderlove, lead me to try to prove the value of the feature in terms of performance. In the end, I actually proved my feature idea was unnecessary and that Ruby’s Coverage was significantly faster than Ruby’s Tracepoint. Below, I will walk through some of the steps and code, I used while driving towards better benchmarks and understanding of what was really impacting performance while trying to collect the line of code runtime.

First Attempt: A Micro Benchmark

For some specific changes a micro benchmark, that focuses in very narrowly on the specific changes under test can be all that is needed and make for faster and easier iteration while testing changes. Often these are the easiest to setup and be more easily repeatable with various versions and settings to compare a number of changes.

My first attempt was to use a small repository, that I had used to show an issue with Ruby’s coverage, see my Benchmarking Coverage Example. The code was extended it so it could be run in 3 modes:

  • Ruby without Coverage loaded: ruby example.rb
  • Ruby with Coverage as it exists in the current release: COVERAGE=true ruby example.rb
  • Ruby with my suggested Coverage feature put to use: ENHANCED_COVERAGE=true ruby example.rb
require 'benchmark'
require 'coverage'

WITH_COVERAGE = !!ENV['COVERAGE']
WITH_ENHANCED_COVERAGE = !!ENV['ENHANCED_COVERAGE']

Coverage.start if WITH_COVERAGE || WITH_ENHANCED_COVERAGE
require 'bigdecimal/math'
require './app'
require './app_proxy'
Coverage.pause if WITH_ENHANCED_COVERAGE

ITERATIONS = 2_000
UPTO = 1_000
coverage_data = nil

# warm up
AppProxy.process(App, {iterations: 1, up_to: UPTO})

Benchmark.bm do |bm|
  if WITH_ENHANCED_COVERAGE
    bm.report { coverage_data = AppProxy.process(App, {iterations: ITERATIONS, up_to: UPTO, coverage: false, enhanced_coverage: false}) }
  end
  bm.report { coverage_data = AppProxy.process(App, {iterations: ITERATIONS, up_to: UPTO, coverage: WITH_COVERAGE, enhanced_coverage: WITH_ENHANCED_COVERAGE}) }
end

puts "coverage"
puts coverage_data
puts "done"

The initial results, even with a high number of iterations basically couldn’t show a real difference. In the case, I was trying to show, since it wasn’t a dramatic improvement and wasn’t obviously measurable like the number of objects allocated to memory, proved to not be a good case for a microbenchmark.

Second Attempt: Build On Rails Ruby Bench

As I abandoned the idea of a microbenchmark being able to show the differences of the performance impact when using Ruby’s code coverage. I sought a more realistic example of how Ruby code is frequently used in production. I found @codefolio’s project, rails_ruby_bench, which I mentioned previously intends to help measure performance impacts for Ruby 3X3. Unfortunately, I ran into some issues getting this benchmark to run locally and I would need to do a good deal of work to embed the code I wanted under test into this project and Discourse which the project uses as the Rails app under performance testing.

While I didn’t end up pursuing this route, it did help push me towards a good direction to more realistically measure the performance impacts I wanted to see. By setting up a sample Rails application and testing full request cycles, I would have a much more realistic measure of my changes. I plan to follow up on this project more in the future and think it is an ideal way to test many Ruby or Gem changes that would have performance or memory impacts on standard Rails applications.

Third Attempt: Sample Rails App

Combining the learnings from the microbenchmark and the ideas in Rails Ruby Bench, I wanted to be able to quickly test a number of different scenarios by just setting a few environment variables, and then pull meaningful results on exercising the full Rails app with the changes. To measure the impact on the full Rails stack I turned to Apache’s AB, a simple HTTP benchmarking tool I have used many times over the years.

For each test, the Rails application would be run in a different mode, then benchmarked via AB. I created a new repository coverage_rails_benchmark. The project README covers all the steps for anyone to run their own set of benchmarks and records all the results of my performance tests.

The benchmarking script itself is extremely simple, the script below will make 2000 requests making 5 concurrent requests at a time to the endpoint listed. The code can be found in bin/benchmark.rb

puts `ab -n 2000 -c 5 "http://127.0.0.1:3000/posts"`

For example to get the most basic results of Ruby without Coverage loaded, one would follow the two steps below in two different terminals.

  • start the Rails server in basic mode: IGNORED_COVERAGE=true RAILS_ENV=production bin/rails server
  • execute the benchmark: ruby ./bin/benchmark.rb

This would output a bunch of data about the benchmark results.

Benchmarking 127.0.0.1 (be patient)

Server Software:
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /posts
Document Length:        3631 bytes

Concurrency Level:      5
Time taken for tests:   8.391 seconds
Complete requests:      2000
Failed requests:        0
Total transferred:      8572000 bytes
HTML transferred:       7262000 bytes
Requests per second:    238.34 [#/sec] (mean)
Time per request:       20.978 [ms] (mean)
Time per request:       4.196 [ms] (mean, across all concurrent requests)
Transfer rate:          997.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0      11
Processing:     6   21  10.8     19     192
Waiting:        6   20  10.7     19     190
Total:          6   21  10.8     19     192

Percentage of the requests served within a certain time (ms)
  50%     19
  66%     23
  75%     25
  80%     26
  90%     30
  95%     33
  98%     38
  99%     42
 100%    192 (longest request)

For my various comparisons, I ended up caring the most for the mean time per request in the above example 20.978 [ms] (mean) was the starting point baseline I comparing with all other results. My final sample benchmark support 7 distinct modes that could be run in two different distinct settings. Unlike with my microbenchmark performance differences were extremely obvious with a full spread of fasted benchmarking mode running at 17.735 [ms] (mean) vs the slowest mode taking 85.069 [ms] (mean). Showing the worst mode was nearly 5X slower in a simple benchmark.

Benchmarking Conclusions

The various modes and options made it clear that I had initially been optimizing for the wrong thing, and that while there was significant performance improvements that could be made, it wouldn’t be related to my suggested Coverage feature proposal to support pause and resume. Instead, it showed that collecting coverage data was always very fast, but pulling and processing that data could be very slow. Opposed to trying to sample data collection the goal should be to reduce as much as possible the frequency one processes the results and to filter it down to the smallest set of results needed. Let’s take a look at a high-level summary below, or see the full benchmark details.

  • No Coverage Support: 17.735
  • Coverage Running (Ignore Coverage): 18.131
  • Coverage Stopped (Ignore Coverage): 18.268
  • Coverage Paused (Ignore Coverage): 18.717
  • Coverband Coverage (Ignore Coverage): 18.759
  • New Pilot Version Coveraband Coverage (Collect Coverage): 19.227
  • Coverage Running (Collect Coverage, but only into memory): 21.141
  • Coverage Resume (Ignore Coverage): 23.930
  • Coverage Resume (Collect Coverage, but only into memory): 26.720
  • Coverband Coverage (Collect Coverage): 39.421
  • Coverband Tracepoint (Collect Coverage): 46.979
  • Coverband Tracepoint (Ignore Coverage): 47.500
  • Coverage (Collect Coverage, send to Rails.logger): 85.069

In the end, my goal of changing Ruby’s Coverage was to be able to significantly reduce the performance overhead of my gem Coverband’s ability to collect runtime data. The results from the benchmark gave me all the data I needed to see that a different approach, where I dropped TracePoint in favor of Coverage but worked to reduce the frequency of checking the results would lead to far better performance improvements than my initially suggested feature. In the end, I pursued that approach, and was able to reduce the overhead with a sample rate of 100% from 2.5X slower to only being 1.08X slower! The details on that, I will cover in another post with some specifics on how to build performance benchmark testing into a Gem.

Ruby Benchmarking Learnings

I came away with a deeper understanding of Ruby performance benchmarking, and a much faster Gem that I will be able to release shortly. Beyond that some additional thoughts on Ruby benchmarking,

  • For some changes microbenchmarks aren’t helpful
  • There are some existing great projects to help folks benchmark changes to the Ruby language
  • Ruby’s Coverage is significantly Faster than collecting line usage via Ruby’s TracePoint functionality
  • Specific to Ruby’s Coverage library
    • The number of files instrumented with coverage has a big impact on performance, making micro benchmarks useless for that library
    • Calling coverage.peek_results to access the data is the biggest cost, much larger than collecting it.
    • Simply logging, processing, or trying to do anything with data is often slower than collecting it
    • Performance wins will come in reducing processing the data vs collecting it
  • Writing an maintaining good benchmarks can be challenging, I didn’t touch on the issues of running benchmarks with “background noise” and running multiple times to tease that out
  • Without measuring performance, guessing what will have a large impact is often wrong

Let’s Go Faster

via GIPHY

In a follow-up post, I will dig into the details of benchmarking a Gem over time, and the specific changes that helped to reduce the performance overhead of Coverband making it an order of magnitude faster. While there are code changes and using different features of Ruby, large performance wins also can come from a fundamentally different approach to solving the problem, which is the case with the changes needed to improve Coverband.

Categories

Ruby