
A new perspective on inequality analysis: Understanding bias in MLB home runs using the Gini coefficient

In this blog, we explain the Lorenz curve and the Gini coefficient, which are used in economics and the social sciences to analyze disparities in income and savings. We then show how to use JMP to draw a Lorenz curve and quantify it with the Gini coefficient. Our example is how unevenly home runs are distributed among the batters on each Major League Baseball (MLB) team.

Indicators of inequality: Gini coefficient and Lorenz curve

The Gini coefficient is a well-known indicator for quantifying bias and inequality in data, and the Lorenz curve is used to represent that inequality visually.

The Gini coefficient is also known as an "inequality index" and is used to express the degree of inequality in various areas, such as income and education.

The figure below shows a Lorenz curve (blue) visualizing income inequality among households. The horizontal axis shows the cumulative share of households, ordered from lowest to highest income, and the vertical axis shows the cumulative share of income. Under perfect equality, with no income disparity, the Lorenz curve coincides with the dotted 45-degree line (called the "line of perfect equality"). The larger the income disparity, the further the Lorenz curve sags below the 45-degree line.

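As a minimal sketch of how the curve is built (plain Python rather than JMP; the function name `lorenz_points` is hypothetical), the units are sorted in ascending order and each cumulative share of units is paired with the cumulative share of the total:

```python
from itertools import accumulate


def lorenz_points(values):
    """Return the (x, y) vertices of the Lorenz curve for non-negative values.

    x is the cumulative share of units (households, players, ...) and y is the
    cumulative share of the total. Units are sorted in ascending order, so the
    curve never rises above the 45-degree line.
    """
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    points = [(0.0, 0.0)]  # every Lorenz curve starts at the origin
    for i, cum in enumerate(accumulate(ordered), start=1):
        points.append((i / n, cum / total))
    return points


print(lorenz_points([10, 20, 30, 40]))
# [(0.0, 0.0), (0.25, 0.1), (0.5, 0.3), (0.75, 0.6), (1.0, 1.0)]
```

Equal values put every vertex on the 45-degree line; the more the total is concentrated in the last few units, the lower the curve sits.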

The relationship between the Lorenz curve and the Gini coefficient

The Gini coefficient is based on the area enclosed between the Lorenz curve and the line of perfect equality. Specifically, it is that area measured relative to the right triangle in the lower-right half of the square (the region below the line of perfect equality), whose area is taken as 1. Because this triangle actually has an area of 1/2, the Gini coefficient can be calculated with the following formula:

Gini coefficient = 1 - 2 × (area under the Lorenz curve)
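In symbols, write A for the area between the line of perfect equality and the Lorenz curve and B for the area under the Lorenz curve, so that A + B = 1/2. When the curve is drawn through points (X_0, Y_0) = (0, 0), ..., (X_n, Y_n) = (1, 1) joined by straight segments, the trapezoid rule gives B exactly:

```latex
G = \frac{A}{A + B} = 2A = 1 - 2B,
\qquad
B = \sum_{k=1}^{n} \frac{(X_k - X_{k-1})(Y_k + Y_{k-1})}{2}

G = 1 - \sum_{k=1}^{n} (X_k - X_{k-1})(Y_k + Y_{k-1})
```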

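Under perfect equality, the Lorenz curve coincides with the line of perfect equality, so the area under it is 1/2 and the Gini coefficient is 0. Under perfect inequality, the Lorenz curve runs along the bottom and right edges of the square, so the area under it approaches 0 and the Gini coefficient approaches 1. The Gini coefficient can therefore be interpreted as follows: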

  • Gini coefficient of 0: perfect equality (everyone has the same income and resources)
  • Gini coefficient of 1: perfect inequality (one person has everything)
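With a finite number of units, the upper end of the scale is never quite reached. Take n units sorted in ascending order, Lorenz vertices (X_k, Y_k) with X_k = k/n, and compute the area B under the curve with the trapezoid rule (G = 1 - 2B); the two extremes then work out as follows:

```latex
\text{Perfect equality: } Y_k = \tfrac{k}{n}
\;\Rightarrow\; B = \sum_{k=1}^{n} \frac{1}{n}\cdot\frac{2k-1}{2n} = \frac{n^2}{2n^2} = \frac{1}{2}
\;\Rightarrow\; G = 0

\text{One unit holds everything: } Y_0 = \cdots = Y_{n-1} = 0,\; Y_n = 1
\;\Rightarrow\; B = \frac{1}{n}\cdot\frac{0 + 1}{2} = \frac{1}{2n}
\;\Rightarrow\; G = 1 - \frac{1}{n}
```

So for a roster of n batters, the Gini coefficient can be at most 1 - 1/n.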

Incidentally, the Gini coefficient of income in Japan has risen over the past three decades, from 0.433 in 1990 to 0.526 in 2003 and 0.570 in 2021.* These figures indicate that income inequality in Japan is widening.

*The figures presented here are the "initial income Gini coefficient," which differs from the "redistributed income Gini coefficient," calculated from income after taxes and social insurance premiums are paid and social security benefits are received.

As mentioned at the beginning, the Lorenz curve and Gini coefficient are often used in economics and the social sciences, but they can also serve as indicators of inequality and bias in other fields.

Below, we show an example of calculating the Lorenz curve and Gini coefficient from MLB data in JMP.

Example of the use of the Gini coefficient: bias in the number of home runs in MLB

We analyzed how the home runs hit during the 2024 MLB regular season were distributed among the batters of the four teams that reached the League Championship Series. Two of them, the Dodgers and Yankees, went on to the World Series.

The histogram below shows the distribution of home runs hit by batters on each team.


If you know anything about baseball, you can tell who the Dodgers' outlier (50-55 home runs) and the Yankees' outlier (55-60 home runs) are, right?

Using the Dodgers example, here are the steps to calculate the Gini coefficient and create a Lorenz curve in JMP:

1. Preparing the data and creating the formulas

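Below is the data behind the histogram above. It contains the columns "Homeruns" (number of home runs) and "N" (number of players with that many home runs). From these, we use formulas to create the columns shown in yellow. For the cumulative columns to trace a Lorenz curve, the rows must be sorted in ascending order of Homeruns.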

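If your source data has one row per player instead, a table of this shape can be produced by counting players per home run total. The following Python sketch assumes a plain list of season totals; the function name `to_frequency_table` and the roster numbers are hypothetical, not the Dodgers data.

```python
from collections import Counter


def to_frequency_table(home_runs_per_player):
    """Collapse per-player home run totals into (Homeruns, N) rows.

    The rows come back sorted in ascending order of Homeruns, which is the
    order the cumulative Lorenz-curve columns require.
    """
    counts = Counter(home_runs_per_player)  # Homeruns -> number of players
    return sorted(counts.items())


# Hypothetical season totals for a small roster (not real MLB data).
roster = [0, 0, 1, 3, 3, 7, 12, 12, 25]
for homeruns, n in to_frequency_table(roster):
    print(homeruns, n)
```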

The following formulas are applied to each of the yellow columns*:

  • Column "Cum_N": Shows the cumulative share of players. Used as the horizontal axis of the Lorenz curve.


  • Column "Cum_Homeruns": Shows the cumulative share of home runs. Used as the vertical axis of the Lorenz curve.


  • Column "trapezoid": The area of the trapezoid between each point on the Lorenz curve and the previous one. Summing these gives the area under the Lorenz curve (the trapezoid rule).


  • Column "GiniCoef": The Gini coefficient, calculated from the sum of the trapezoid areas.

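The arithmetic behind the four yellow columns can be sketched in Python as follows. This illustrates the same calculation, not a transcription of the author's JMP formulas. The function name `lorenz_columns` and the three-row input are hypothetical. Each row is a (Homeruns, N) pair sorted in ascending order of Homeruns, and the first trapezoid starts from the origin (0, 0).

```python
from itertools import accumulate


def lorenz_columns(rows):
    """Compute Cum_N, Cum_Homeruns, trapezoid and GiniCoef from grouped data.

    rows: (Homeruns, N) pairs sorted in ascending order of Homeruns.
    """
    total_players = sum(n for _, n in rows)
    total_homeruns = sum(hr * n for hr, n in rows)

    # Cum_N: cumulative share of players (horizontal axis).
    cum_n = [c / total_players for c in accumulate(n for _, n in rows)]
    # Cum_Homeruns: cumulative share of home runs (vertical axis);
    # a row of N players with hr home runs each contributes hr * N.
    cum_hr = [c / total_homeruns for c in accumulate(hr * n for hr, n in rows)]

    # trapezoid: area between the previous vertex (the origin for the
    # first row) and this vertex.
    trapezoid = []
    prev_x, prev_y = 0.0, 0.0
    for x, y in zip(cum_n, cum_hr):
        trapezoid.append((x - prev_x) * (y + prev_y) / 2)
        prev_x, prev_y = x, y

    # GiniCoef: 1 - 2 * (area under the Lorenz curve).
    gini = 1 - 2 * sum(trapezoid)
    return cum_n, cum_hr, trapezoid, gini


# Hypothetical data: two players with 0 HR, one with 10, one with 30.
cum_n, cum_hr, trapezoid, gini = lorenz_columns([(0, 2), (10, 1), (30, 1)])
print(cum_n)      # [0.5, 0.75, 1.0]
print(cum_hr)     # [0.0, 0.25, 1.0]
print(trapezoid)  # [0.0, 0.03125, 0.15625]
print(gini)       # 0.625
```

As a cross-check, the same four players give 0.625 from the mean-absolute-difference form of the Gini coefficient, Σ|x_i - x_j| / (2n²μ) = 200 / 320, because with one vertex per group the trapezoid rule is exact.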

For the Dodgers, the calculated Gini coefficient is 0.584.

*These formulas can be combined into a single formula, but for the sake of clarity, we have chosen to use multiple columns for the calculation.

2. Creating the Lorenz curve

In Graph Builder, create a line graph from the newly created columns, with "Cum_N" on the X axis and "Cum_Homeruns" on the Y axis.


To display the 45-degree line (black dotted line), right-click the graph, select [Customize] from the menu that appears, and enter the following script.


We performed the same calculations for the other three teams (the Yankees, Mets, and Guardians). The overlaid Lorenz curves for all four teams are shown below.


The further the Lorenz curve sags below the 45-degree line, the greater the bias. The Guardians (red line) show less bias than the other three teams. Ranked by Gini coefficient, the bias is largest for the Dodgers, followed by the Yankees, the Mets, and the Guardians.
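To read "further below" precisely: if one team's Lorenz curve lies on or below another's everywhere, its Gini coefficient is at least as large. If the curves cross, the picture alone does not rank the teams, and the Gini coefficient supplies the ordering. A minimal Python check of that condition is sketched below; the function names and the four-player rosters are hypothetical. Because both curves are piecewise linear, comparing them at every vertex of either curve is enough.

```python
from bisect import bisect_left
from itertools import accumulate


def lorenz_points(values):
    """Lorenz curve vertices (cumulative share of units, cumulative share of total)."""
    ordered = sorted(values)
    n, total = len(ordered), sum(ordered)
    return [(0.0, 0.0)] + [
        (i / n, c / total) for i, c in enumerate(accumulate(ordered), start=1)
    ]


def height_at(points, x):
    """Height of the piecewise-linear curve at x, for 0 <= x <= 1."""
    xs = [px for px, _ in points]
    i = bisect_left(xs, x)
    if xs[i] == x:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def lies_below(curve_a, curve_b, tol=1e-12):
    """True if curve A is on or below curve B at every vertex of either curve."""
    grid = sorted({x for x, _ in curve_a} | {x for x, _ in curve_b})
    return all(height_at(curve_a, x) <= height_at(curve_b, x) + tol for x in grid)


concentrated = lorenz_points([0, 0, 0, 40])  # one player hits everything
even = lorenz_points([10, 10, 10, 10])       # everyone hits the same
print(lies_below(concentrated, even))  # True
print(lies_below(even, concentrated))  # False
```

Identical curves count as lying below each other, so True in both directions means the two distributions share the same Lorenz curve; False in both directions means the curves cross.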

The Dodgers and Yankees, who advanced to the World Series, each had a few players who hit far more home runs than their teammates, which is a major factor behind their larger bias.

In this blog, we have shown how to use the Gini coefficient and the Lorenz curve to visualize and quantify the bias in MLB home run totals. The same method can be applied to data analysis not only in economics and the social sciences but also in sports and other fields.

The JMP data table that calculates the Gini coefficients for the four teams is attached to this blog for your reference (MLB2024_4teams.jmp).

by Naohiro Masukawa (JMP Japan)


This post was originally written in Japanese and has been translated for your convenience. When you reply, your reply will also be translated into Japanese.