
Golf Betting and Daily Fantasy: Implementing Expected Strokes Gained to Identify Putting Regression

Putting is a highly volatile aspect of betting on golf or building rosters for daily fantasy golf. Can we leverage data to account for this?

Any data-driven golf fan, bettor, or daily fantasy sport player is likely aware of the term "putting regression."

It refers to the idea that putts can fall some days, and they can lip out some other days.

Those strokes can be all the difference between winning and losing a tournament, finishing in the top 10, or even making the cut.

Putting is extremely important. It just is.

You'll be hard-pressed to find many PGA Tour winners who actively lost strokes putting during their win. By my data (232 ShotLink events since 2015), less than 5.0% of winners lost strokes putting (the actual number is just 3.4%).

But the idea of putting regression doesn't imply that putting is entirely random or that hot putting will never carry over from event to event.

What are we to do?

A deeper look at how strokes gained: putting accumulates might help us avoid the inflated hot streaks and buy low on "cold" or underperforming putters.

If you're not interested in the process, foregrounded dead-ends, or mathematical results, you can skip ahead to the proposed application or the conclusion itself.

The Process

Determining the proper sample size for golf is not exactly easy. But I wanted to start somewhere with a decent number of golfers and rounds accrued.

So, using FantasyNational, I pulled current (as of March 18, 2022) PGA Tour players with a 100-round sample of ShotLink data and broke out their putting stats based on distance splits of five-foot increments (0 to 5, 5 to 10, etc.).

I'll be citing r-squared values, which (for us) show how predictive these stats are of overall strokes gained: putting.
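For anyone who wants to see what an r-squared calculation looks like in practice, here is a minimal sketch using the squared Pearson correlation. The function name `r_squared` and the five sample values are hypothetical and are not drawn from the FantasyNational data; the article does not say which tool produced its r-squared figures.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r-squared is undefined when a series is constant")
    return cov ** 2 / (var_x * var_y)


# Hypothetical per-golfer values, for illustration only (NOT the article's data)
sg_5_to_10 = [0.10, -0.05, 0.20, 0.00, -0.15]   # SG: putting from 5 to 10 feet
sg_total = [0.30, -0.10, 0.45, 0.05, -0.40]     # overall SG: putting

print(round(r_squared(sg_5_to_10, sg_total), 3))
```

A value near 1 would mean the split tracks overall strokes gained: putting closely, and a value near 0 would mean it tells us very little.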

Though I started by seeing how predictive these splits were, the whole thing got a little nebulous because of the number of dead ends I uncovered.

Here are some things I tested but found fruitless.

Foregrounding Some Non-Option Applications

I want to rule out a few ideas I came up with, tested, and found problematic.

Round-to-Round Regression
I pulled a random sample of 20 golfers from our qualified golfer dataset to see if strokes gained: putting and strokes gained: putting from various distance splits were sticky from one round to the next.

They aren't.

While we see some stickiness from inside 10 feet, we're still looking at small samples within a given round. The 2021 PGA Tour average putts per round was 29.01.

Let's just assume the Tour-average frequency (laid out in the next section) over 29 putts in a round.

We'd see a golfer average:
- 14.2 putts from inside 5 feet
- 4.2 putts from 5 to 10 feet
- 2.7 putts from 10 to 15 feet
- 1.9 putts from 15 to 20 feet
- 1.5 putts from 20 to 25 feet
- 4.5 putts from outside 25 feet

It's no wonder these stats aren't stabilizing on a round-to-round basis.
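The per-range counts above are just the Tour-average frequency multiplied by 29 putts. A quick sketch of that arithmetic, using the frequencies from the table in the next section (the names `FREQUENCY` and `expected_putts` are mine, for illustration):

```python
TOUR_AVG_PUTTS_PER_ROUND = 29  # rounded from the 2021 Tour average of 29.01

# Share of all putts by distance range (2021 Tour frequencies from the article)
FREQUENCY = {
    "Inside 5 Feet": 0.492,
    "5 to 10 Feet": 0.146,
    "10 to 15 Feet": 0.093,
    "15 to 20 Feet": 0.065,
    "20 to 25 Feet": 0.050,
    "Outside 25 Feet": 0.154,
}


def expected_putts(frequency, putts_per_round):
    """Expected number of putts per round from each range."""
    return {rng: share * putts_per_round for rng, share in frequency.items()}


putts = expected_putts(FREQUENCY, TOUR_AVG_PUTTS_PER_ROUND)
for rng, count in putts.items():
    print(f"{rng}: {count:.1f}")
```

Because the published frequencies are rounded to one decimal place, a result here can differ from the list above by about a tenth of a putt.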

Month-by-Month (and Half-Year) Regression
I also looked at month-to-month putting regression, which also wasn't sticky.

Samples are still small here, too, and we have to be careful not to put stock in golfers with minimal ShotLink rounds in a month.

I then broke things out into half-year samples, and even that wasn't exactly predictive from an overall strokes gained: putting standpoint.

Basically, we can't really say that a golfer is money from within 10 feet, so he'll always be money from within 10 feet.

We also can't say that a golfer who hits bombs or lags it better than most will repeat that over the next round, month, or half of a year.

Now what?

The Important Data

Here are a few putting stats.

This includes the 2021 PGA Tour make average from each five-foot range, the frequency of putts from that range (i.e. the share of all putts that come from that particular range), the r-squared between strokes gained: putting from that range and a golfer's overall strokes gained: putting from our 100-round sample, and the differential between the r-squared and the frequency.

| Range | PGA Make% | PGA Frequency% | R^2 With Total SG:P | Differential |
|---|---|---|---|---|
| Inside 5 Feet | 96.8% | 49.2% | 46.3% | -2.9% |
| 5 to 10 Feet | 56.8% | 14.6% | 60.2% | 45.7% |
| 10 to 15 Feet | 30.7% | 9.3% | 28.4% | 19.1% |
| 15 to 20 Feet | 18.6% | 6.5% | 22.0% | 15.4% |
| 20 to 25 Feet | 12.6% | 5.0% | 11.5% | 6.5% |
| Outside 25 Feet | 5.6% | 15.4% | 13.1% | -2.3% |


Putts from inside five feet, understandably, are made at nearly a 97.0% clip, and they made up nearly half (49.2%) of all putts from qualified players on the 2021 PGA Tour.

As evidenced by the r-squared value, a golfer's strokes gained: putting from within five feet explains 46.3% of his total strokes gained: putting.

That r-squared value jumps to 60.2% on putts from 5 to 10 feet despite the fact that only 14.6% of putts are from that range. If you want to know who's a good putter, strokes gained data says it's those who convert from 5 to 10 feet.

If we combine these two ranges to look at all putting from within 10 feet, we account for around 63.8% of putts attempted, and the efficiency from there has an r-squared of 77.1%.

What this means: if we had zero idea of how well a golfer was putting from outside 10 feet, we'd still have a pretty good idea of how well he was putting overall.

The farther from the hole (after 10 feet), the less predictive strokes gained: putting is until we get to the true lag ranges of 25-plus feet. Even then, though, the predictive power with overall strokes gained: putting is minimal (13.1%).
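The differential column in the table appears to be the r-squared minus the frequency, which is a rough way of asking which ranges punch above their share of putts. Here is a small sketch of that calculation with the table's values; `ROWS` and `differential` are illustrative names.

```python
# (range, r-squared with total SG:P in %, frequency in %), from the table above
ROWS = [
    ("Inside 5 Feet", 46.3, 49.2),
    ("5 to 10 Feet", 60.2, 14.6),
    ("10 to 15 Feet", 28.4, 9.3),
    ("15 to 20 Feet", 22.0, 6.5),
    ("20 to 25 Feet", 11.5, 5.0),
    ("Outside 25 Feet", 13.1, 15.4),
]


def differential(r2_pct, frequency_pct):
    """Predictive power minus share of putts, in percentage points."""
    return r2_pct - frequency_pct


# Sort ranges from most to least predictive relative to how often they occur
ranked = sorted(ROWS, key=lambda row: differential(row[1], row[2]), reverse=True)
for rng, r2_pct, freq_pct in ranked:
    print(f"{rng}: {differential(r2_pct, freq_pct):+.1f}")
```

A couple of results land a tenth of a point away from the published column, presumably because the table was built from unrounded inputs.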

Basically, it stands to reason that golfers who are gaining strokes putting from 10 to 25 feet over a short span might be great putters.

But they also might be overperforming from a generally unreliable, unstable range.

How can we apply this information?

A Proposed Application

The idea of throwing out putts for being too long and thinning out an already small sample of data is a little worrisome, especially when the goal is to identify small-sample outliers who are bound to regress.

However, we can consider cutting out the least predictive putts (i.e. longer putts) while paring down the sample and still keeping it quite predictive.

We already laid out that 63.8% of putts are from within 10 feet and that strokes gained: putting from that distance explains 73.3% of strokes gained: putting.

But let's not forget that we're then considering roughly 18.5 putts per round (63.8% times the Tour average of 29 putts per round).

We can add putts from 10 to 15 feet to this number and thus look at 73.1% of putts (assuming Tour-average splits here) and explain 80.5% of strokes gained: putting.

That's a lot of the putts (more than 21 of 29) and a lot of the predictive power.
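To make the trade-off concrete, here is a short sketch that accumulates the Tour-average frequencies by range and converts each running share into putts per round. It assumes the Tour-average splits and the 29-putt round used earlier; the variable names are illustrative.

```python
from itertools import accumulate

PUTTS_PER_ROUND = 29  # Tour average, rounded

RANGES = ["Inside 5 Feet", "5 to 10 Feet", "10 to 15 Feet",
          "15 to 20 Feet", "20 to 25 Feet", "Outside 25 Feet"]
FREQ = [0.492, 0.146, 0.093, 0.065, 0.050, 0.154]  # share of all putts

# Running share of putts as each farther range is added to the sample
cumulative = list(accumulate(FREQ))

for rng, share in zip(RANGES, cumulative):
    per_round = share * PUTTS_PER_ROUND
    print(f"Through {rng}: {share:.1%} of putts, about {per_round:.1f} per round")
```

The second and third lines of output correspond to the within-10-feet and within-15-feet samples discussed above.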

I considered adding in lag putting (25-plus feet) because we see more than 15.0% of putts come from that range.

Think of it as something like the dunks-or-threes stat we see in basketball. No dice.

Testing a within-15-feet-plus-lag-putting split lowered the r-squared (to 77.8%). It's still good, but it's not worth skewing the data with a golfer who buried some bombs and thus looks better than he should.

The same goes for applying disproportionate weight to the splits from each five-foot range (80.2%). Just focusing on strokes gained from within 15 feet works well while also stopping before we just account for nearly every putt that a golfer attempts.

Therefore, I posit that we can reduce a lot of putting noise by looking at putting efficiency from within 15 feet to predict expected strokes gained: putting with the following formula:

xStrokes Gained: Putting = (1.718) + (1.204 * [SG:P From 0'-15'])

Based on the testing we've already done, we know that this will explain more than 80% of a golfer's strokes gained: putting and account for around 75% of a golfer's putts.
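As a sketch, the formula translates into a one-line function. The intercept and slope are the ones given above; the function name is mine, and since the article does not spell out the units of the input, the assumption here is that it is expressed the same way as the strokes gained figures the formula was fit on.

```python
INTERCEPT = 1.718  # from the article's formula
SLOPE = 1.204      # from the article's formula


def expected_sg_putting(sg_putting_inside_15):
    """xSG:P from strokes gained: putting on putts of 0 to 15 feet.

    Assumption: the input uses the same units as the data the
    formula was fit on (the article does not state them).
    """
    return INTERCEPT + SLOPE * sg_putting_inside_15


# Illustrative inputs only, not real golfer values
for sg in (-1.0, 0.0, 1.0):
    print(f"SG:P 0'-15' = {sg:+.1f} -> xSG:P = {expected_sg_putting(sg):.3f}")
```

Because the relationship is linear with a positive slope, the formula preserves the order of golfers by putting from within 15 feet, which is why the rankings below can be read directly as ranks from that range.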

Does it pass the eye test, though?

I'd like to think so, yes.

Here is how this formula would rank the top 20 putters from our sample. (I know I'll lose some evergreen-ability with these tables, but it's important.)

| Golfer (Past 100 Rounds as of March 18, 2022) | SG:P Rank | xSG:P Rank | Rank Differential |
|---|---|---|---|
| Bryson DeChambeau | 5 | 1 | -4 |
| Xander Schauffele | 15 | 2 | -13 |
| Chesson Hadley | 17 | 3 | -14 |
| Cameron Smith | 1 | 4 | 3 |
| Louis Oosthuizen | 8 | 5 | -3 |
| Adam Schenk | 51 | 6 | -45 |
| Mackenzie Hughes | 9 | 7 | -2 |
| Denny McCarthy | 7 | 8 | 1 |
| Jason Kokrak | 3 | 9 | 6 |
| Ian Poulter | 4 | 10 | 6 |
| Brian Harman | 46 | 11 | -35 |
| Patrick Reed | 6 | 12 | 6 |
| Alexander Noren | 10 | 13 | 3 |
| Adam Scott | 11 | 14 | 3 |
| Billy Horschel | 24 | 15 | -9 |
| Brendon Todd | 1 | 16 | 15 |
| Adam Hadwin | 18 | 17 | -1 |
| Beau Hossler | 21 | 18 | -3 |
| Lanto Griffin | 29 | 19 | -10 |
| Sam Burns | 12 | 20 | 8 |


I don't think anyone could really hate on this list as far as great putters go.

After all, it's not that different from the actual strokes gained: putting rankings (again, that's expected).

Notably, some outliers on here are Adam Schenk (6th in putting from inside 15 feet but 115th from outside 15 feet) and Brian Harman (11th and 126th, respectively).

That tracks. They aren't getting lucky on deep putts but have been good from a predictive range.

That all applies to Bryson DeChambeau (1st and 85th), Xander Schauffele (2nd and 92nd), and Chesson Hadley (3rd and 74th), too.
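The rank differential in the table is the xSG:P rank minus the actual SG:P rank, so a large negative number marks a golfer the formula likes more than his overall putting stats do. A minimal sketch of flagging those golfers, using a handful of rows from the table; the 10-spot cutoff is a hypothetical threshold of my own, not something tested in this study.

```python
# (golfer, SG:P rank, xSG:P rank), a subset of rows from the table above
ROWS = [
    ("Bryson DeChambeau", 5, 1),
    ("Xander Schauffele", 15, 2),
    ("Chesson Hadley", 17, 3),
    ("Cameron Smith", 1, 4),
    ("Adam Schenk", 51, 6),
    ("Brian Harman", 46, 11),
    ("Brendon Todd", 1, 16),
]

THRESHOLD = 10  # hypothetical cutoff for a "notable" gap, in ranking spots


def rank_differential(sg_rank, xsg_rank):
    """Negative means the expected rank is better than the actual rank."""
    return xsg_rank - sg_rank


# Golfers whose expected rank beats their actual rank by at least THRESHOLD spots
underrated = [name for name, sg_rank, xsg_rank in ROWS
              if rank_differential(sg_rank, xsg_rank) <= -THRESHOLD]
print(underrated)
```

Flipping the comparison (differential of at least +10) would surface the opposite group: golfers whose overall rank is propped up by putts from outside 15 feet.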

Poster boy Cameron Smith's clutch putting was a huge reason why he won THE PLAYERS in 2022 (and was at least partly why I wanted to revisit this study when I did).

Smith ranks 4th from within 15 feet and 36th outside 15 feet. Despite being great from deep, he's still rewarded by this method because he's good where it matters/where it's predictive. He's being rewarded for being great on 75% of his putts (i.e. the rough amount from within 15 feet).

Moving on.

We should expect to see the inverse with the weakest expected putters.

Again, we should want to see the expected ranks mostly match the actual ranks (that's the whole goal here: to predict actual strokes gained while removing some variance), and when they don't, we should anticipate that these golfers are outliers from 15-plus feet.

| Golfer (Past 100 Rounds as of March 18, 2022) | SG:P Rank | xSG:P Rank | Rank Differential |
|---|---|---|---|
| Luke List | 126 | 127 | 1 |
| Kyle Stanley | 127 | 126 | -1 |
| Keegan Bradley | 125 | 125 | 0 |
| Chez Reavie | 124 | 124 | 0 |
| Scott Piercy | 115 | 123 | 8 |
| Lucas Glover | 111 | 122 | 11 |
| Hideki Matsuyama | 120 | 121 | 1 |
| Matthew NeSmith | 122 | 120 | -2 |
| Joel Dahmen | 106 | 119 | 13 |
| Andrew Landry | 106 | 118 | 12 |
| Dylan Frittelli | 117 | 117 | 0 |
| Tyler Duncan | 121 | 116 | -5 |
| Henrik Norlander | 119 | 115 | -4 |
| Cameron Champ | 123 | 114 | -9 |
| Tony Finau | 93 | 113 | 20 |
| Branden Grace | 96 | 112 | 16 |
| Russell Knox | 116 | 111 | -5 |
| Bubba Watson | 112 | 110 | -2 |
| Harold Varner III | 66 | 109 | 43 |
| Tom Hoge | 108 | 108 | 0 |


Anyone who bets or plays daily fantasy golf should know these putters are virtually all massive red flags in the putting department, so I think we're good.

And, notably, there aren't that many true outliers in terms of ranking differential.

After all, if you're losing strokes consistently from within 15 feet, you are losing strokes consistently on nearly three-fourths of your putts (around 73.1%).

We shouldn't want those golfers to appear as good putters just because they've fared well on longer putts. We should actually be staying away from the ones overperforming on some lucky putts while remaining bad on short- and mid-range putts.

The largest differential here belongs to Harold Varner III, who ranks 109th in putting from within 15 feet but 22nd on longer putts.

Tony Finau (sorry, Tony) is 113th in strokes gained: putting from within 15 feet but 19th in strokes gained: putting from beyond 15 feet in this sample.

Conclusion

Initially, I felt confident in the data that simply showed how predictive putting from 5 to 10 feet was of overall strokes gained: putting. But upon realizing just how small a sample that actually would be (again, roughly 4 to 5 putts per round on average), I wanted to press on.

The reality is that round-to-round, event-to-event, and even month-to-month putting is inherently going to be random to a degree. There's a reason I have tried, failed, and canned this study a dozen times before.

But I feel good with this attempt.

And while, yes, we're going to lose some of the appeal of long-range bombers by looking only at putting from within 15 feet, this proposed method does a good job of accounting for the majority of putts (again, roughly 75% of putts will be from within this range) and explains an even greater share (80.5%) of a player's strokes gained: putting.

I'll definitely be applying this to all of my weekly research, but just like with course history, current form, and even recent tee-to-green stats, there is no one solution for analyzing pro golf. This is always just one piece of the larger puzzle.