Predicting Success: FIP and xFIP

FIP, xFIP, and a Potential Fix to their Predictive Power

By Austin Brubaker

Baseball, and the methods used to evaluate players, are constantly evolving. Teams pursue even the smallest details to gain insight into future success, and that pursuit has fueled the rise of analytics in baseball.

Most fans recognize common statistics to analyze the effectiveness of a pitcher. Stats like ERA, WHIP, Walks, and Strikeouts have been used for decades. Many fans will even recognize newer ERA estimators, like FIP, xFIP, and SIERA, which have gained more and more credibility.

While these statistics are fun to see, it’s necessary to understand what they actually are and their purpose. Only then will we get insight from these statistics and see their potential flaws.

Fielding Independent Pitching, more commonly known as FIP, has become the most common alternative to ERA. The theory behind this statistic comes from a very simple premise: pitchers do not control the outcome of balls put in play.

Perhaps a weak grounder is hit to the perfect spot where no fielder is standing. Or, conversely, a line drive hit 110+ MPH could go directly at a fielder. Either way, a pitcher has very little control over the fielders, and thus doesn’t have complete control over their ERA. From this, the statistic known as FIP was created.

FIP directly combats the flaw in ERA by creating an ERA-like estimate (to the same scale as ERA) with only the statistics a pitcher can control: Strikeouts, Walks, Home Runs, and Hit by Pitches. The formula used to calculate FIP is fairly simple and can be found by a quick Google search.1 From these four basic countable statistics, a pitcher can be evaluated exclusively on outcomes they can control.
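For readers who would rather see the arithmetic than search for it, here is a minimal sketch of the commonly published FIP formula. The function name, the sample stat line, and the default constant are assumptions for illustration only: the constant is recalculated for each league season (it is usually a little above 3), so the 3.10 used here is just a placeholder.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """ERA-scaled estimate built only from home runs, walks, HBP, and strikeouts.

    `constant` puts the result on the ERA scale. It changes every season,
    so the default of 3.10 is a placeholder assumption, not an official value.
    `ip` is innings pitched as a true decimal (180 1/3 innings is 180.333...).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line, not a real pitcher.
example_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=180.0)
print(round(example_fip, 2))
```

Home runs carry the heaviest weight (13), walks and hit batters are penalized less (3), and strikeouts are rewarded (-2), which is why a change in home run total moves FIP so much.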

FIP also has another important purpose in predicting future success.

ERA is great at evaluating the performance of a pitcher during a current season; it is not, however, the best at predicting how well a pitcher will do in the future. FIP, because factors outside the control of a pitcher are removed, is a better predictive statistic of future success than ERA. This has been studied through many articles online, which look at the relationship between ERA, FIP, and the following season’s ERA.2 3

Expected Fielding Independent Pitching, or xFIP, takes the assumptions of FIP one step further. FIP uses a pitcher’s actual home run total from a given season, which implicitly assumes that home run rate will hold steady. xFIP treats this assumption as unrealistic.

There are many examples of a baseball just barely scraping over the wall, or pitchers who get plenty of fly balls in pitcher-friendly ballparks. These results vary year to year, so it is unrealistic to assume, when predicting future success, that a pitcher is going to have the same luck with fly balls. Thus, an alternative was needed.

xFIP replaces the pitcher’s actual home run total with an expected total: the pitcher’s fly balls multiplied by that season’s league-average home run-to-fly ball (HR/FB) rate. That is the only difference between FIP and xFIP. Walks, Strikeouts, and Hit by Pitches are still used, but the assumption is the pitcher should have a league-average home run-to-fly ball rate.4 Additionally, xFIP is used as a method to predict future success, as is FIP, and is seen as a more predictive statistic than ERA.2 3
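The substitution can be sketched in a few lines. As before, the names and the sample inputs are illustrative assumptions, and both the league HR/FB rate (0.105 here) and the constant (3.10 here) are placeholders that would need to be replaced with the actual values for the season in question.

```python
def xfip(fly_balls, bb, hbp, k, ip, lg_hr_per_fb=0.105, constant=3.10):
    """FIP with the pitcher's home runs swapped for an expected total.

    `lg_hr_per_fb` (league home runs per fly ball) and `constant` are
    season-specific; the defaults are placeholder assumptions.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Expected home runs if this pitcher's fly balls left the park at the league rate.
    expected_hr = fly_balls * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher who allowed 200 fly balls in 180 innings.
example_xfip = xfip(fly_balls=200, bb=50, hbp=5, k=200, ip=180.0)
print(round(example_xfip, 2))
```

Two pitchers with identical walks, strikeouts, hit batters, and fly balls get the same xFIP no matter how many of those fly balls actually cleared the fence.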

Since FIP and xFIP are so similar to each other and share the same purpose, one may ask which statistic best predicts future success. That is, in the offseason, what should each team be looking at as it attempts to add talent for the upcoming season? The question is not what a pitcher did in 2023, but rather what they will do in 2024. Thus, it is important to understand which statistic (FIP or xFIP) has more predictive power, as a pitcher’s FIP and xFIP can differ drastically.

The research on this is not conclusive.

Studies posted online have reached different conclusions. Some articles claim FIP is better at predicting future ERA,2 while others find xFIP to be the better estimator.3 While it is important to highlight and potentially critique each article’s analysis, I believe there could be a better method that addresses the flaws of each. This method could potentially have more predictive power than FIP or xFIP alone.

To shed some light on my theory, it’s relevant to highlight my background.

During grad school, I was a teaching assistant for an introductory statistics course. One of the topics highlighted was called the “regression effect.” The simple premise behind it can be best seen in an example. Let’s say you have every student’s midterm score for a class, and you want to use those scores to predict a student’s final exam score. Choosing one student at random, let’s say this individual was in the 90th percentile in midterm scores (scored better than 90% of the class). Based on the regression effect, we would predict this individual to score somewhere between the 51st and 89th percentile on the final exam, meaning we would predict this individual would score above the median, but somewhere closer to it.

Conversely, if a student scored in the 10th percentile in midterm scores (scored better than 10% of the class), we would predict this individual would score somewhere between the 11th and 49th percentile on the final exam, meaning we would predict this individual would score below the median, but closer to it. This concept, taught in an introductory statistics course, could potentially be applied to the world of baseball analytics.
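To put rough numbers on the regression effect, here is a small sketch under a textbook assumption that is not part of the example above: midterm and final scores are both roughly normally distributed and linearly related with correlation r. In that setting the predicted final score, in standard units, is r times the midterm score in standard units. The function name and the value r = 0.5 are made up for illustration.

```python
from statistics import NormalDist


def predicted_percentile(midterm_percentile, r):
    """Predicted final-exam percentile under an assumed normal model.

    `r` is the assumed correlation between midterm and final scores.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_midterm = std_normal.inv_cdf(midterm_percentile / 100)
    z_final = r * z_midterm  # the prediction is pulled toward the mean
    return 100 * std_normal.cdf(z_final)


# With an assumed correlation of 0.5, both predictions land between
# the student's midterm percentile and the median.
high = predicted_percentile(90, r=0.5)
low = predicted_percentile(10, r=0.5)
print(round(high), round(low))
```

The weaker the correlation, the harder the prediction is pulled toward the 50th percentile; with r = 1 there is no pull at all, and with r = 0 every student is predicted to land at the median.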

To see how, recall that the key difference between FIP and xFIP is the home run rate used in their calculation. Going back to the midterm and final exam example, FIP represents the percentile on the midterm exam, while xFIP represents the median. It is my theory that neither of these values gives the best predictive value for a player, and the better estimate could be found somewhere in the middle. Let’s look at some anecdotal evidence to highlight some of these concerns.

Among qualified pitchers in 2023, Sonny Gray posted an MLB-low 0.39 HR/9, which was also a career low.5 Thus, his FIP was far lower than his xFIP. I don’t think it is fair to assume this MLB-leading (and career-low) rate is sustainable at this extreme, but it is reasonable to predict he will continue to have a below-average home run rate. Somewhere between FIP and xFIP.

Conversely, as Dodgers fans experienced, Lance Lynn was very home-run prone in 2023, giving up an MLB-leading 2.16 HR/9.5 Thus, his xFIP was much lower than his FIP. While it is likely his numbers would stabilize in 2024 after a career-high home run rate, it is fair to predict Lance Lynn would still have an above-league-average home run rate, just not to the extreme of 2023. Somewhere between FIP and xFIP.

These two examples, highlighting the extremes in home run rate, illustrate a potential flaw in using either FIP or xFIP on its own to predict ERA. While these numbers are not the sole factor in deciding which pitchers to target, perhaps there is a better way to utilize these statistics in predictions. Perhaps, somewhere between FIP and xFIP.

To see if this theory holds any merit, I decided to do some fairly simple data analysis based on recent baseball data. The dataset consisted of the 83 MLB pitchers who threw at least 250 combined innings across 2021 and 2022, together with the 74 MLB pitchers who threw at least 250 combined innings across 2022 and 2023. For each pitcher, the following data was collected: ERA from season 1, FIP from season 1, xFIP from season 1, and ERA from season 2. In addition, a new FIP value (let’s call it newFIP) was created by averaging the values of FIP and xFIP.
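The newFIP calculation itself is a one-liner. The sketch below writes it as a weighted blend so that the simple average is just the special case of a 50/50 weight; the function name, the weight parameter, and the sample values are illustrative assumptions rather than anything from the analysis.

```python
def new_fip(fip, xfip, weight_on_fip=0.5):
    """Blend FIP and xFIP; a weight of 0.5 reproduces the simple average."""
    if not 0.0 <= weight_on_fip <= 1.0:
        raise ValueError("weight_on_fip must be between 0 and 1")
    return weight_on_fip * fip + (1 - weight_on_fip) * xfip


# Hypothetical pitcher whose low home run total pulled FIP below xFIP.
blended = new_fip(fip=3.0, xfip=4.0)
print(blended)
```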

From there, a separate simple linear regression was fit for each predictor (ERA from season 1, FIP, xFIP, and newFIP), with ERA from season 2 being the dependent (or outcome) variable. This is the same analysis run by other articles online examining the predictive power of FIP and xFIP, among other statistics.2 3 To see which model is “the best,” the adjusted R-squared (R^2) value and the root mean square error (RMSE) will be examined, with better models having higher R^2 values and lower RMSE. Below is a table of the output:

Predictor	Adjusted R^2	RMSE
ERA	0.1085	0.9297
FIP	0.1171	0.8217
xFIP	0.1199	0.8191
newFIP	0.1286	0.8109
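For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a one-predictor regression that reports adjusted R^2 and RMSE. It is not the code behind the table above: the function name is hypothetical, the data is made up, and RMSE is computed here as the square root of the mean squared residual (some software divides by n - 2 instead of n, which gives a slightly larger number).

```python
from math import sqrt


def fit_simple_regression(x, y):
    """Ordinary least squares with one predictor; returns fit statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals and total sum of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    rmse = sqrt(sse / n)
    return {"slope": slope, "intercept": intercept,
            "r2": r2, "adj_r2": adj_r2, "rmse": rmse}


# Made-up values: a season 1 predictor and the following season's ERA.
season1 = [3.2, 3.8, 4.1, 4.5, 3.5, 4.9]
season2_era = [3.6, 3.7, 4.3, 4.2, 3.9, 4.4]
result = fit_simple_regression(season1, season2_era)
print(result)
```

Running the same function once per predictor (ERA, FIP, xFIP, newFIP) against the same season 2 ERA values produces one row of a comparison table per call.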

Using this limited dataset containing almost exclusively starting pitchers, the combination of FIP and xFIP created the best model for predicting the following season’s ERA. This does not mean my theory is correct in many or all circumstances; in fact, there are many limitations in this approach, such as not using relief pitchers or pitchers with fewer innings pitched, only looking at recent years, not looking at alternative models, etc. It is also true that the values provided are very small, and the differences in predictive power aren’t huge. In the newFIP model, only 12.86% of the variance in the following season’s ERA is explained by newFIP, which is not very high. What this does illustrate, however, is the possible small advantage of building models on a value somewhere between FIP and xFIP.

This theory could take very simple or complex forms. For example, the easiest solution would be to split the difference between FIP and xFIP, as was utilized above. Or, perhaps, years of experience or park factors should be taken into account when calculating a new predictive formula using FIP and xFIP. Or, there could be an even better value to use between FIP and xFIP. Regardless, analysts (both working for organizations and fans of the game) should always look for innovative ways to analyze players and predict future success. Only then can you truly attain even the smallest advantage that could make the biggest difference.
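One way to look for a better value between FIP and xFIP is to let the data choose the weight. The sketch below is a simplification and not the regression used earlier: it treats the blend itself as the prediction of next season's ERA (no separate slope or intercept) and solves for the weight on FIP that minimizes squared error. The function name and all of the numbers are hypothetical.

```python
def best_weight_on_fip(fip, xfip, next_era):
    """Least-squares weight w for predicting ERA as w*FIP + (1 - w)*xFIP.

    Rewriting the blend as xFIP + w*(FIP - xFIP) gives a closed-form answer.
    The result is clamped to [0, 1] so the blend stays between the two stats.
    """
    gaps = [f - x for f, x in zip(fip, xfip)]
    denominator = sum(g * g for g in gaps)
    if denominator == 0:
        raise ValueError("FIP and xFIP are identical; any weight fits equally")
    numerator = sum(g * (y - x) for g, x, y in zip(gaps, xfip, next_era))
    return min(1.0, max(0.0, numerator / denominator))


# Made-up data, built so that the true weight on FIP is 0.3.
fip_values = [3.0, 4.0, 3.5, 5.0]
xfip_values = [3.5, 3.8, 3.9, 4.2]
era_next = [0.3 * f + 0.7 * x for f, x in zip(fip_values, xfip_values)]
weight = best_weight_on_fip(fip_values, xfip_values, era_next)
print(round(weight, 3))
```

A weight near 0.5 would support simply splitting the difference, while a weight near 0 or 1 would suggest one statistic is doing most of the work. On real data the weight should be chosen on one set of seasons and checked on another, since a weight tuned and tested on the same pitchers will look better than it is.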

Slowinski, P. FIP. FanGraphs. https://library.fangraphs.com/pitching/fip/