Analysis: MLB Power Rankings, A New Statistical Stratification of Analyzing and Projecting Team Performance


I am of the mindset that a weekly power ranking fails to accurately reflect where MLB teams actually stand. It’s useful for showing weekly trends, but it fails to stratify a team against its peers in any meaningful way. Teams get hot, or play poor competition and go on streaks, as the Rays did earlier in the year when they began closing ground on the division-leading Red Sox. Teams can rise or fall significantly from week to week, and while that makes for an interesting read, the weekly rankings don’t mean much. A periodic ranking over quarters of the season paints a more complete picture of what’s happening, and with this in mind, I set out to stratify the league through the first ~60 games.

I am stratifying teams with a new system I’ve created that focuses on statistical analysis rather than simply a team’s win-loss record. This counters the popular argument that ‘you are your record,’ but this is a power ranking, not a simple breakdown of the standings and each team’s trend over its last 10 games. The purpose of this approach is to determine how talented a team truly is. Some teams may be playing well but, because of strength of schedule or catching a streaking opponent, have a record that understates their actual ability. Similarly, a bad team may be overperforming. To say that a team’s record has only a spurious relationship with its talent would be foolish, but I don’t think the record tells the entire story. I predict that the weighted variables I present in these rankings will correlate strongly with a team’s record.

I am borrowing heavily from the propositions of Eric Walker and Bill James, the fathers of sabermetrics, to support my narrative and give context to my weighted categories. I chose five variables to build my power rankings. Surprisingly or not, runs scored, runs against, and wins are not included in the calculation. Instead, I chose variables that, when combined, tell a story about the team: Earned Run Average (ERA), Batting Average (BA), Fielding Percentage (FP), On-base Plus Slugging (OPS), and Walks and Hits Per Inning Pitched (WHIP). Together, these statistics offer a holistic measurement of team performance.
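
For readers who want the five statistics spelled out, here is a minimal Python sketch using their standard textbook definitions. The function and argument names are my own and purely illustrative, and the inputs are assumed to be season totals.

```python
# Standard definitions of the five statistics used in this ranking.
# Function and argument names are illustrative, not from any library.

def batting_average(hits, at_bats):
    """BA: hits per at-bat."""
    return hits / at_bats

def on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP: how often a batter reaches base."""
    reached = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return reached / chances

def slugging_pct(total_bases, at_bats):
    """SLG: total bases per at-bat."""
    return total_bases / at_bats

def ops(hits, walks, hit_by_pitch, at_bats, sac_flies, total_bases):
    """OPS: on-base percentage plus slugging percentage."""
    return (on_base_pct(hits, walks, hit_by_pitch, at_bats, sac_flies)
            + slugging_pct(total_bases, at_bats))

def era(earned_runs, innings_pitched):
    """ERA: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched

def whip(walks, hits_allowed, innings_pitched):
    """WHIP: walks plus hits allowed per inning pitched."""
    return (walks + hits_allowed) / innings_pitched

def fielding_pct(putouts, assists, errors):
    """FP: share of fielding chances handled without an error."""
    return (putouts + assists) / (putouts + assists + errors)
```

Note that OPS already contains on-base percentage, which in turn contains hits, so BA and OPS overlap by construction.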


Now, anyone who is familiar with statistics will understand right away that some of my independent variables influence others, which may be the first flaw in my argument. But each category plays its own role in the narrative and must be included, because each still measures different events in the game. Without running a regression analysis, it’s impossible to determine which of these variables has the most impact (coefficient) on our dependent variable (performance, i.e. wins and losses) and which are actually statistically significant. Fortunately, others who have come before me have done most of the legwork. Enter Eric Walker, Bill James, and others (a good article by Adam Houser argues similar points).

OPS may be the most important offensive stat in modern baseball. As Walker and James put it (and Houser agrees), the best way to generate offense is for a hitter to not make an out, and getting on base is the BEST way to do this. The image of Billy Beane and Sandy Alderson in Oakland yelling at old scouts and telling their minor league managers to make sure their players draw walks drives this point home. The heralded batting average is important because a hit puts a runner on base, but for the purposes of this article it adds only a little to what is already captured by a player’s ability to get on base (measured by OBP, one half of OPS). Walker also stated that the next most important statistic is slugging percentage, which OPS measures as well. To advance and drive in runners, putting the ball in play is absolutely necessary; the batter doesn’t need to be safe at first to generate runs or advance runners into scoring position. These are not novel concepts to anyone who has read his works, or Moneyball, but they are important to the new formula, and as such OPS becomes the most important measurable offensive statistic for this new rankings metric. It will thus be weighted more heavily than the other offensive statistic (BA), which is self-explanatory.

Fielding percentage is often overlooked. Defensive runs saved has become a popular metric for measuring individual players, particularly when valuing fourth outfielders, but fielding is not significant enough to warrant a lion’s share of this metric. Most teams’ fielding percentages differ by less than .01, so although FP confirms that teams are converting the outs they should be making, and is the lone metric here that measures a team’s fielding efficiency, it is weighted less than the other categories.

Houser found WHIP to be the most effective measure of pitching/defense. I similarly find it the most important metric for measuring a pitcher, but ERA offers a more holistic team rating. You might be asking, why have WHIP if you have ERA? Both measure the talent of a pitcher against a batter, but ERA also captures team defense and coaching strategy, whereas WHIP primarily measures the battle between the pitcher and the batter. This may be up for debate and could be argued as the second flaw in my measure, but I’ll try to explain my reasoning. With runners on second and third and one out, a team can concede a run by taking the out at first on a ground ball rather than challenging the runner at home, and that run is earned. This is a defensive strategy decision, not necessarily a product of pitcher vs. batter. The same goes for a manager’s decisions about who pitches to a given batter (e.g. bringing in a lefty specialist, pulling a pitcher prematurely, or leaving one in for too long). The resulting hits, walks, and runs would then be on the manager, something which, unfortunately, is not measured in any coherent data set. In the absence of that, we lump defensive decisions, strategy, and managerial choices together with the pitcher vs. batter duel as part of the narrative for ERA, whereas fielding percentage measures the team’s ability to not commit errors that lead to unearned runs.

WHIP, our last metric, already mentioned as Houser’s most important defensive statistic, measures the average number of walks and hits per inning pitched that a team’s pitchers give up. This is critically important because it is essentially the defensive counter-metric to OPS: it is the ability of a defense and pitcher to prevent runners from getting on base. As such, it will be weighted heavily in this equation. Again, having both ERA and WHIP may seem counter-intuitive, but every team is being stratified against the others on the same terms, and as we have discussed, the two metrics measure different events. So it shouldn’t prevent the end result from being accurate.

Weighted Percents:

Unfortunately, without a regression offering the statistical significance and coefficients, I need to use my narrative to determine the weights. After a few conversations with people about my idea (some of whom liked it as a tool for projecting a team’s success, while others subscribe to win-loss record being the end-all, be-all), I think I’ve come to a balance. I wanted offensive and defensive metrics weighted equally for the sake of the overall rankings, so below you will find my weighted breakdown of each category.

  • ERA: 20%
  • WHIP: 20%
  • FP: 10%
  • BA: 15%
  • OPS: 35%
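
As a quick sanity check on the breakdown above, here is a small Python sketch that confirms the offensive and defensive categories each carry half of the total weight and applies the weights to one team. The category ranks in `example_ranks` are hypothetical, not taken from the data set.

```python
# Weights from the list above, expressed as fractions of 1.
WEIGHTS = {"ERA": 0.20, "WHIP": 0.20, "FP": 0.10, "BA": 0.15, "OPS": 0.35}

DEFENSE = ("ERA", "WHIP", "FP")
OFFENSE = ("BA", "OPS")

defense_share = sum(WEIGHTS[c] for c in DEFENSE)
offense_share = sum(WEIGHTS[c] for c in OFFENSE)

def composite_score(ranks):
    """Weighted average of a team's rank in each category (lower is better)."""
    return round(sum(WEIGHTS[c] * ranks[c] for c in WEIGHTS), 2)

# Hypothetical league ranks for one team, for illustration only.
example_ranks = {"ERA": 1, "WHIP": 2, "FP": 10, "BA": 6, "OPS": 4}
score = composite_score(example_ranks)

print(f"defense {defense_share:.2f}, offense {offense_share:.2f}, score {score}")
```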

Data Set: Accurate as of 6/1/2018

Google Docs view for those that cannot see the data set:


Results – Power Rankings:

Using the weights and the data set above, I created a composite score for each team: the weighted average of the team’s rankings across the five categories. The team with the lowest score, rounded to the nearest hundredth, is ranked highest. Example: Houston scored a 4.3 on my metric and as such is ranked first. To the right of each score, I put the team’s current record as a way to see how well the metric correlates with the record, and whether teams are overperforming or underperforming (based on the metric). Below are the results.
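
To make the mechanics concrete, here is a rough Python sketch of how raw team stats could be turned into per-category ranks and then into an ordered list. The three teams and their stat lines are made up, and the tie handling (input order) is my own assumption, since ties in the real data set may have been treated differently.

```python
WEIGHTS = {"ERA": 0.20, "WHIP": 0.20, "FP": 0.10, "BA": 0.15, "OPS": 0.35}

# For ERA and WHIP a lower value is better; for the rest, higher is better.
LOWER_IS_BETTER = {"ERA", "WHIP"}

# Made-up stat lines for three fictional teams.
STATS = {
    "Team A": {"ERA": 3.10, "WHIP": 1.10, "FP": 0.988, "BA": 0.262, "OPS": 0.780},
    "Team B": {"ERA": 4.20, "WHIP": 1.30, "FP": 0.985, "BA": 0.255, "OPS": 0.740},
    "Team C": {"ERA": 3.60, "WHIP": 1.25, "FP": 0.990, "BA": 0.240, "OPS": 0.700},
}

def category_ranks(stats, category):
    """Rank every team in one category, with 1 being the best."""
    best_first = sorted(
        stats,
        key=lambda team: stats[team][category],
        reverse=category not in LOWER_IS_BETTER,
    )
    return {team: place for place, team in enumerate(best_first, start=1)}

def power_rankings(stats):
    """Return (team, composite score) pairs, lowest (best) score first."""
    ranks = {c: category_ranks(stats, c) for c in WEIGHTS}
    scores = {
        team: round(sum(WEIGHTS[c] * ranks[c][team] for c in WEIGHTS), 2)
        for team in stats
    }
    return sorted(scores.items(), key=lambda pair: pair[1])

rankings = power_rankings(STATS)
for place, (team, score) in enumerate(rankings, start=1):
    print(f"{place}. {team} – {score}")
```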

  1. Houston – 4.3 (36-22)
  2. Boston – 4.35 (39-18)
  3. Chicago Cubs – 6.8 (30-23)
  4. New York Yankees – 8.15 (36-17)
  5. Washington – 8.2 (32-24)
  6. Cleveland – 8.25 (30-25)
  7. Tampa Bay – 8.5 (28-27)
  8. Atlanta – 8.95 (34-23)
  9. Seattle – 9.15 (34-22)
  10. LA Angels – 10.45 (30-27)
  11. Pittsburgh – 10.65 (29-27)
  12. Milwaukee – 11.45 (36-21)
  13. Detroit – 13.9 (27-30)
  14. Philadelphia – 15.35 (31-23)
  15. Arizona – 15.55 (28-27)
  16. St. Louis – 15.9 (30-24)
  17. LA Dodgers – 15.95 (26-30)
  18. Colorado – 16.05 (30-26)
  19. Oakland – 16.4 (29-28)
  20. San Francisco – 16.95 (26-30)
  21. NY Mets – 18.15 (27-27)
  22. Minnesota – 20.5 (22-30)
  23. Kansas City – 20.5 (20-36)
  24. San Diego – 22 (25-33)
  25. Chicago White Sox – 22.1 (16-37)
  26. Cincinnati – 22.55 (20-37)
  27. Toronto – 22.6 (25-32)
  28. Miami – 24.65 (20-36)
  29. Texas Rangers – 24.85 (24-35)
  30. Baltimore – 26.05 (17-40)


  • The Rankings System seems to correlate well with team performance.
  • Including both ERA and WHIP was justified by the variations seen for Cleveland, Colorado, Tampa Bay, Atlanta, the NY Yankees, and St. Louis. These illustrate the value of having both metrics, although, as expected, most teams’ WHIP correlates strongly with their ERA.
  • Tampa Bay Rays are underperforming according to the Rankings System.
  • LA Dodgers are underperforming according to the Rankings System.
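
One way to put a number on the first bullet is to compute the correlation between each team’s composite score and its winning percentage. The sketch below does this in plain Python with a Pearson coefficient, using the scores and records from the list above. Choosing Pearson is my own assumption here; a rank-based measure would be just as reasonable. Because a lower score is better, a strong relationship shows up as a coefficient close to -1.

```python
from math import sqrt

# (team, composite score, wins, losses), copied from the rankings list above.
RESULTS = [
    ("Houston", 4.3, 36, 22), ("Boston", 4.35, 39, 18),
    ("Chicago Cubs", 6.8, 30, 23), ("New York Yankees", 8.15, 36, 17),
    ("Washington", 8.2, 32, 24), ("Cleveland", 8.25, 30, 25),
    ("Tampa Bay", 8.5, 28, 27), ("Atlanta", 8.95, 34, 23),
    ("Seattle", 9.15, 34, 22), ("LA Angels", 10.45, 30, 27),
    ("Pittsburgh", 10.65, 29, 27), ("Milwaukee", 11.45, 36, 21),
    ("Detroit", 13.9, 27, 30), ("Philadelphia", 15.35, 31, 23),
    ("Arizona", 15.55, 28, 27), ("St. Louis", 15.9, 30, 24),
    ("LA Dodgers", 15.95, 26, 30), ("Colorado", 16.05, 30, 26),
    ("Oakland", 16.4, 29, 28), ("San Francisco", 16.95, 26, 30),
    ("NY Mets", 18.15, 27, 27), ("Minnesota", 20.5, 22, 30),
    ("Kansas City", 20.5, 20, 36), ("San Diego", 22.0, 25, 33),
    ("Chicago White Sox", 22.1, 16, 37), ("Cincinnati", 22.55, 20, 37),
    ("Toronto", 22.6, 25, 32), ("Miami", 24.65, 20, 36),
    ("Texas", 24.85, 24, 35), ("Baltimore", 26.05, 17, 40),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

scores = [score for _, score, _, _ in RESULTS]
win_pcts = [wins / (wins + losses) for _, _, wins, losses in RESULTS]

r = pearson(scores, win_pcts)
print(f"Pearson r between composite score and winning percentage: {r:.2f}")
```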

The Way Forward:

  • The next step for this system is to run regressions to weight each category more accurately and to determine whether any other statistics should replace or be added to those in this stratification methodology.
  • Take this stratification methodology and build on it to project teams’ likely overall records and future success.
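
As a very rough illustration of the second bullet, the sketch below fits a straight line from composite score to winning percentage and uses it to project a 162-game win total. To be clear about the assumptions: this is a single-variable least-squares fit on the composite score, not the multi-variable regression described in the first bullet, and it uses only five teams from the list above, far too few to draw real conclusions.

```python
# (composite score, wins, losses) for five teams taken from the rankings above:
# Houston, Tampa Bay, LA Dodgers, Kansas City, Baltimore.
SAMPLE = [(4.3, 36, 22), (8.5, 28, 27), (15.95, 26, 30), (20.5, 20, 36), (26.05, 17, 40)]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def project_wins(score, intercept, slope, games=162):
    """Projected wins over a full season for a given composite score."""
    return round((intercept + slope * score) * games)

scores = [score for score, _, _ in SAMPLE]
win_pcts = [wins / (wins + losses) for _, wins, losses in SAMPLE]
intercept, slope = fit_line(scores, win_pcts)

for score in (5.0, 15.0, 25.0):
    print(f"score {score:>4}: about {project_wins(score, intercept, slope)} wins")
```

A real version would regress winning percentage on the five underlying statistics across all 30 teams (ideally several seasons), which is what would produce data-driven weights.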

Shoutout to the Mercer University Weighted Average Grade Calculator.
