Exploring the Home/Away Bias for Called Strikes

Introduction

We are all familiar with the home field advantage in baseball. Teams tend to play better at home, players tend to hit better in their home ballpark, and so on. But the rationale behind the home field advantage is less clear. Here’s an interesting article that presents the “facts and the fiction” about home field advantage. The author claims, following research from the book Scorecasting by Moskowitz and Wertheim, that the major reason for the home team advantage is umpire officiating that favors the home team. I generally agree with Moskowitz and Wertheim. In baseball, one of the major ways for umpires to show favoritism is in their calling of balls and strikes.

Retrosheet recently released its play-by-play data for the 2022 baseball season. Given this new data, I thought it would be a good exercise to write a Shiny app that allows the user to explore the home-team bias in calling balls and strikes. One chooses a rectangle about the zone, and the app gives the proportion of called strikes for the home team and for the away team and computes what I call the home bias. This app is easy to create and might be a good way for the interested reader to learn the interface for building Shiny apps.

Downloading the Retrosheet Files

For new readers, I should briefly review the process of downloading the play-by-play files from the Retrosheet site. On this page, I describe the process of downloading the individual files for a particular season and extracting all of the variables using the Chadwick software. We’ve written a special R function for computing the run values for all plays. These can be used for computing the run expectancy matrix for the 2022 season.

The Shiny App

When you launch this app, you see a scatterplot of the locations of 5,000 called pitches where a called strike is colored red and a called ball is colored tan. Since the location of called strikes may vary by the pitching arm and the batter side, you can select the pitching arm and the batter side on the left. In this example we are considering all called pitches thrown by a right-handed pitcher against a right-handed batter.

You select a region about the zone by brushing over the graph. Here I am selecting a region at the bottom of the zone. Since it straddles the bottom of the zone, I would guess the percentage of called strikes to be about 50%. The bottom table gives the number of called balls and called strikes when the home team is batting (bottom of the inning) and when the visiting team is batting (top of the inning). Here we see that the called strike rate is 61.179% for the home team and 62.000% for the visiting team. Since the called strike rate is lower for the home team, that gives them an advantage. We measure this bias by computing

Called Strike Rate (Visiting Team) MINUS Called Strike Rate (Home Team)

We call this the Home Bias — looking at the bottom left of the app, we see the bias is 0.821 percentage points for this selected rectangle.
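As a rough illustration of this calculation, here is a minimal base R sketch. The function name home_bias() and the column names strike and home are my own placeholders for this example, not the names used in the app, and the toy data are made up.

```r
# Hypothetical column names (assumptions, not the app's):
#   strike: TRUE for a called strike, FALSE for a called ball
#   home:   TRUE when the home team is batting (bottom of the inning)
home_bias <- function(pitches) {
  rate_home <- 100 * mean(pitches$strike[pitches$home])
  rate_away <- 100 * mean(pitches$strike[!pitches$home])
  # Home bias = visiting-team rate minus home-team rate, in percentage points
  c(home = rate_home, away = rate_away, bias = rate_away - rate_home)
}

# Made-up data: 6 of 10 called strikes for home batters, 7 of 10 for visitors
toy <- data.frame(
  strike = c(rep(c(TRUE, FALSE), c(6, 4)), rep(c(TRUE, FALSE), c(7, 3))),
  home   = rep(c(TRUE, FALSE), each = 10)
)
toy_result <- home_bias(toy)
print(toy_result)

# The two rates quoted in the text reproduce the reported bias of 0.821
text_bias <- round(62.000 - 61.179, 3)
print(text_bias)
```

A positive value means visiting batters saw a higher called strike rate than home batters in the selected region.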

Using the App

Using this app, I explored the size of the Home Bias for regions about the zone. I decided on using regions with a width of 0.2 feet and selected regions centered about the four sides of the zone where one would expect the called strike rate to be about 50%. I considered the four matchups — R_R is a right-handed pitcher against a right-handed batter, R_L is a right-handed pitcher against a left-handed batter, L_R is a southpaw against a right-handed batter and L_L is a leftie against a leftie. For each case (selected rectangle and pitcher and batter sides), the app gave me the Home bias.
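To make the idea of a region "centered about a side of the zone" concrete, here is a small base R sketch. In the app the rectangles are chosen by brushing, so everything below is an assumption for illustration: the zone boundaries, the helper names edge_region() and in_region(), and the choice to let each rectangle span the full length of its edge.

```r
# Assumed strike-zone boundaries in feet (illustrative values only,
# not taken from the app)
zone <- c(left = -0.85, right = 0.85, bottom = 1.6, top = 3.5)

# Rectangle of the given width centered on one edge of the zone,
# spanning the full length of that edge (hypothetical helper)
edge_region <- function(edge, zone, width = 0.2) {
  half <- width / 2
  switch(edge,
    bottom = c(xmin = zone[["left"]], xmax = zone[["right"]],
               ymin = zone[["bottom"]] - half, ymax = zone[["bottom"]] + half),
    top    = c(xmin = zone[["left"]], xmax = zone[["right"]],
               ymin = zone[["top"]] - half, ymax = zone[["top"]] + half),
    left   = c(xmin = zone[["left"]] - half, xmax = zone[["left"]] + half,
               ymin = zone[["bottom"]], ymax = zone[["top"]]),
    right  = c(xmin = zone[["right"]] - half, xmax = zone[["right"]] + half,
               ymin = zone[["bottom"]], ymax = zone[["top"]]),
    stop("unknown edge: ", edge)
  )
}

# TRUE for pitches whose location (x, z) falls inside the rectangle
in_region <- function(x, z, region) {
  x >= region[["xmin"]] & x <= region[["xmax"]] &
    z >= region[["ymin"]] & z <= region[["ymax"]]
}

bottom <- edge_region("bottom", zone)
# Three made-up pitch locations: on the bottom edge, middle of the zone,
# and at the right height but well off the plate
flags <- in_region(x = c(0, 0, 1.2), z = c(1.65, 2.5, 1.6), region = bottom)
print(flags)
```

Whether a left or right rectangle counts as "Inside" or "Outside" depends on the batter side, so that labeling would be applied after choosing the matchup.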

This table gives the results (each number reported is the Home bias reported as a percentage).

  Location  R_R  R_L  L_R  L_L
1   Bottom 1.21 1.23 1.30 0.84
2      Top 0.47 0.52 0.73 0.72
3   Inside 0.31 1.29 0.55 1.33
4  Outside 1.53 0.36 1.33 0.40

Some takeaways:

  • For all cases, the home biases were positive, ranging between 0.31 and 1.53 percentage points.
  • The home biases at the bottom of the zone were larger than the biases at the top of the zone.
  • Interestingly, the biases at the right of the zone (from the catcher’s viewpoint) were higher than the biases at the left of the zone. The right side corresponds to outside pitches for right-handed batters and inside pitches for left-handed batters.
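These takeaways can be checked directly from the table. The sketch below simply re-enters the sixteen reported values in base R. The grouping into right_side and left_side follows the catcher's-viewpoint description above, and the variable names are my own.

```r
# Home bias values copied from the table (percentage points)
bias <- matrix(
  c(1.21, 1.23, 1.30, 0.84,
    0.47, 0.52, 0.73, 0.72,
    0.31, 1.29, 0.55, 1.33,
    1.53, 0.36, 1.33, 0.40),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Bottom", "Top", "Inside", "Outside"),
                  c("R_R", "R_L", "L_R", "L_L"))
)

# Smallest and largest bias over all sixteen cases
print(range(bias))                             # 0.31 1.53

# Bottom versus top, matchup by matchup
print(all(bias["Bottom", ] > bias["Top", ]))   # TRUE

# Right side of the zone from the catcher's viewpoint:
# Outside for right-handed batters, Inside for left-handed batters
right_side <- c(bias["Outside", c("R_R", "L_R")], bias["Inside", c("R_L", "L_L")])
left_side  <- c(bias["Inside", c("R_R", "L_R")], bias["Outside", c("R_L", "L_L")])
print(c(right = mean(right_side), left = mean(left_side)))  # 1.37 0.405
```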

Some Comments

  • Umpires have a big influence on the outcomes of games by their calling of balls and strikes. This Shiny app documents the umpire bias in favoring the home team: in every region and matchup I examined, the called strike percentage was larger for visiting batters than for home batters.
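  • The size of the bias depends on the location of the pitch. The bias exceeds one percentage point for pitches on the right side of the zone (from the catcher’s viewpoint) and, in three of the four matchups, for pitches at the bottom of the zone.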
  • My summaries of the bias were based on particular rectangles that I selected — the interested reader can choose other rectangles to learn more about the locations where the bias is large or small.
  • I have not explored all of the potential inputs that may affect the size of the home bias. The sides of the pitcher and batter seemed like obvious things to check. I would think that the count may also impact the called strike probability and the size of the bias.
  • Should MLB start using robots to call balls and strikes? We certainly have the technology, other sports like tennis use the technology to replace umpires, and I don’t think robots would have the home team biases that we see here.

R Notes

This Shiny app is the function BrushingCalledPitches() in my ShinyBaseball package. Currently the app is live — you can play with the app at the location

https://bayesball.shinyapps.io/BrushingCalledPitches/

Due to space limitations, I was only able to put about 50% of the 2022 called pitches (150,000 of them) on my GitHub site, so this online app doesn’t use all the data. But I think the pattern of home bias values should be consistent with the results presented here, which use all of the 2022 called pitches.

If you inspect the R code for my app here, you will see that the app isn’t that complicated. Several functions, including construct_plot() and calculate_rate(), do all of the work, and the user interface part of the app is short.
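For readers who want to see the general shape of a brushing app, here is a minimal sketch. It is not the code of BrushingCalledPitches(): the helper summarise_calls(), the column names, the zone boundaries, and the simulated pitches are all placeholders of mine. The Shiny part uses plotOutput() with brushOpts() and brushedPoints(), and it only runs in an interactive session with the shiny package installed.

```r
# Hypothetical helper (not the author's calculate_rate()): counts of called
# balls and strikes and the called strike rate (percent) by batting team
summarise_calls <- function(pitches) {
  side <- factor(ifelse(pitches$home, "Home", "Visitor"),
                 levels = c("Home", "Visitor"))
  strikes <- tapply(pitches$strike, side, sum)
  total   <- tapply(pitches$strike, side, length)
  data.frame(Team    = levels(side),
             Balls   = as.vector(total - strikes),
             Strikes = as.vector(strikes),
             Rate    = as.vector(100 * strikes / total))
}

# Simulated pitch locations; a pitch is a "strike" if it falls inside an
# assumed rectangular zone (illustrative boundaries, no real data)
set.seed(1)
n <- 2000
pitches <- data.frame(
  x    = runif(n, -1.5, 1.5),
  z    = runif(n, 1, 4),
  home = rep(c(TRUE, FALSE), length.out = n)
)
pitches$strike <- abs(pitches$x) < 0.85 & pitches$z > 1.6 & pitches$z < 3.5

if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  library(shiny)

  ui <- fluidPage(
    plotOutput("plot", brush = brushOpts(id = "brush")),
    tableOutput("rates")
  )

  server <- function(input, output, session) {
    output$plot <- renderPlot({
      plot(pitches$x, pitches$z, pch = 16, cex = 0.5,
           col = ifelse(pitches$strike, "red", "tan"),
           xlab = "Horizontal location (ft)",
           ylab = "Vertical location (ft)")
    })
    output$rates <- renderTable({
      req(input$brush)  # wait until a rectangle has been brushed
      # Base graphics plots need xvar and yvar named explicitly
      selected <- brushedPoints(pitches, input$brush, xvar = "x", yvar = "z")
      req(nrow(selected) > 0)
      summarise_calls(selected)
    })
  }

  runApp(shinyApp(ui, server))
}
```

Keeping the rate calculation in a plain function, separate from the server code, makes it possible to check it on a small data frame without launching the app.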
