Outlier and Clustering Tools: Uncovering Important Weather Factors in the 2021 Texas Power Crisis (2021-US-30MP-819)
Mason Chen, Stanford OHS
The 2021 Texas power crisis was the costliest disaster in the state’s history. This presentation implements JMP outlier and cluster tools to uncover which weather indicators had the greatest effect. Air and dew point temperature were identified as the most crucial factors in the power outages as they directly affected the bursting of pipes. To compare the weather patterns across previous years, cluster sampling was used to collect 12 different variables for each day of February in Houston from 2012 to 2021. Although the quantile range outliers detected that air temperature was the main outlier in 2021, the analysis assumed all weather variables were independent (but humidity depended on both air and dew temperature). The principal component statistical process control chart similarly showed inconclusive results, as the eigenvalues considered the situations spanning all 10 years instead of revealing the main causes specific to this year. Cluster variables helped identify the most representative variables, and dew point (which could not be seen in the outlier analysis) was now observed in the SPC analysis, which was aligned to the scientific research. The heat map and the score plot were further used to assess the difference between the 2021 climate and the past decade.
Speaker | Transcript |
Mason Chen | Hi, everyone. Thanks for tuning in. I'm Mason, and today I'll be presenting a study on important weather indicators, such as humidity, |
temperature and dewpoint levels, in the 2021 Texas power outage in order to help prevent similar disasters from happening in the future. | |
So the 2021 outage crisis was the costliest disaster in Texas history for not only local residents, but also other places in the United States that depended on the natural gas and oil from Texas, | |
which has the largest processing capacity of natural gas, and all the states of the nation were impacted by the crisis. | |
So we wanted to see if we could help prevent future similar disasters. Although there are many political and financial factors that led to the crisis, | |
we also wanted to consider the scientific and environmental factors at play in order to evaluate and approach this event more holistically. | |
To do this, we decided to study Houston's historic weather pattern every year from 2012 to 2021 in February, because Houston was one of the hardest hit cities. | |
We'll be using outlier methods and clustering tools in order to understand how rare the weather situation was during the 2021 season, | |
as compared to past years, and also to help identify the most important weather markers for future potential crises through our understanding of both the scientific and environmental factors involved. | |
So before we jump into our overall framework of the presentation, we wanted to give a bit of a background of the event and the motivation behind this project. | |
So the Texas power outage crisis occurred in February 2021, and many sources attribute the event to the state's failure to prepare their electrical systems beforehand. | |
But this year actually wasn't the first time that Texas experienced major power outages. Ten years ago in 2011, also during February, the Groundhog Day blizzards saw many cities' natural gas power plants completely freeze up. | |
Texas issued rolling blackouts for about three quarters of the state. | |
The North American Electric Reliability Corporation attempted to negotiate plans with the Electric Reliability Council of Texas (ERCOT for short) | |
to upgrade the infrastructure of the state's power plants, which were mostly ignored due to cost considerations. Many oil producers accepted the risk of their power plants freezing | |
up, usually two to three days every year, but the risks and frequencies have become greater due to climate change, which makes weather more unpredictable. | |
When a series of winter storms swept across Texas in February this year, the power plants were not able to withstand extreme freezing conditions because they were not prepared in advance. | |
And many sources of energy stopped working, including renewable energy, like wind power, and nonrenewable ones, like coal and natural gas. | |
In response ERCOT could not deal with equipment problems so they instituted rolling blackouts across the state. | |
In extreme weather, many crops were also destroyed leading to food shortages, and a lot of citizens were left freezing in their homes and suffering from hypothermia with the continued power outages. | |
So here's a very brief timeline of what happened over a span of about two weeks. So there were quite a few storms that hit Texas in early February, one of them winter storm Uri, which plunged many places in Texas into an even more dire situation. Shortly afterwards, | |
demand for electricity hit 69,000 megawatts, a state record, beating the previous record of 65,000 megawatts. | |
And the day after, ERCOT began instituting power outages as a result of both the damage to vulnerable power equipment and the increased power demand. The rolling blackouts finally ended on February 19. | |
Here you see the images of the winter storm from satellite images. The one on the left is taken right before the first wave of blizzards hit Texas. | |
The second was taken during the peak of the situation. So I particularly showed Houston because it has the greatest contrast of all the cities, as it was hardest hit, | |
which is also why we will be focusing on Houston. You can really see the loss of electricity in mid-February as compared to the end of February. | |
So after understanding the far-reaching effects of the situation and possible causes of the outage, we wanted to find out if there were certain weather indicators that would direct us to more | |
specific environmental causes of the outage and could be used in the future as a reference to prepare for crises earlier by paying attention to the important weather indicators early on in the winter season. | |
Like I said, we'll be using Houston's weather information for the project because it was one of the hardest hit cities. | |
First, we will try to understand the science behind these indicators, their differences and what these indicators mean. | |
And then we'll use a variety of outlier and clustering tools to study which of these weather markers were the most different as compared to the past 10 years during February in Houston and also compare how unique the February 2021 climate was. | |
And throughout the presentation we'll be trying to connect statistical analyses and the results to environmental science to help us determine which weather factors contributed the most to the 2021 Texas power outage. | |
One of the most immediate questions that we had was which specific environmental factors contributed to the crisis. So we know that the storms and lack of winterization were general factors, | |
but not all storms lead to freeze-ups of vulnerable power equipment. So we wanted to know what specific weather factors led to the breakdown of the power system, so we can know what to prepare for in the future. | |
We decided to look at this through Houston and compare Houston's weather factors to other cities in Texas that were not hit as hard. | |
We picked Dallas because it was very close in proximity and both use the same power system, so it also serves as a good basis for comparison. | |
On the right are the weather statistics for the month of February in both Houston and Dallas. Interestingly, if you look at any marker of temperature, whether that be maximum, average, or minimum temperature, | |
Dallas always has the lower temperature. | |
So we would expect lower temperatures to increase the risk of equipment freezing up, but that doesn't seem to be the case here. So instead the dewpoint temperature seems | |
to be the main contributor, since we have a higher dewpoint temperature in Houston than in Dallas, and that makes us wonder if dewpoint temperature might be the most important indicator of the risk of freezing. | |
So what is dewpoint temperature, why is it important, and how is it different from our regular measure of temperature? | |
Temperature, specifically air temperature, is a measure of the speed of air molecules, so the faster the molecules move, | |
the greater the kinetic energy and the higher the temperature. Condensation depends on the temperature of the air. So when there's lower air temperature, water molecules move more slowly, so the attractive forces between the water molecules cause them to stick to each other. | |
Now relative humidity, or just humidity, is a measure of how much energy available for evaporation has been used to free those water molecules from each other to resist condensation. | |
So if we had a relative humidity of 25%, that means 25% of the energy available has been used to evaporate water from bodies of water, such as lakes, and 75% of the energy is available to do more evaporation. | |
Now relative humidity is a bit misleading because it's relative to air temperature and dewpoint temperature. It's not itself an independent measure of moisture. | |
So the dewpoint is the temperature at which condensation first begins. So when the air temperature drops below the dewpoint temperature, that's when condensation starts. | |
And unlike humidity, when dewpoint temperature increases, it is only because the amount of moisture in the air increases. So, in other words, dewpoint temperature is an independent measure of how much moisture is in the air. | |
But on the other hand, humidity depends on the difference between air temperature and dewpoint temperature. | |
So when dewpoint temperature is equal to the air temperature, the relative humidity is 100%. | |
The greater the difference between the dewpoint temperature and air temperature, the lower the relative humidity. So if relative humidity changes, it can be because of temperature changes or moisture changes or both. | |
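The relationship just described between air temperature, dewpoint, and relative humidity can be sketched numerically. The talk does not give a formula, so the Magnus approximation and its constants below are an assumption brought in for illustration, and the temperature readings are hypothetical rather than taken from the Houston data.

```python
import math

# Magnus approximation constants (Alduchov-Eskridge values).
# These are an assumption for illustration; the talk gives no formula.
A = 17.625
B = 243.04  # degrees Celsius


def saturation_term(temp_c: float) -> float:
    """Quantity proportional to saturation vapor pressure at temp_c (Celsius)."""
    return math.exp(A * temp_c / (B + temp_c))


def relative_humidity(air_temp_c: float, dewpoint_c: float) -> float:
    """Approximate relative humidity in percent from air temperature and dewpoint."""
    return 100.0 * saturation_term(dewpoint_c) / saturation_term(air_temp_c)


if __name__ == "__main__":
    # Hypothetical readings, not values from the study.
    print(relative_humidity(10.0, 10.0))  # dewpoint equals air temperature
    print(relative_humidity(10.0, 5.0))   # small gap
    print(relative_humidity(10.0, 0.0))   # larger gap, lower humidity
```

Under this approximation, humidity falls as the gap between air temperature and dewpoint widens, which is why a humidity reading alone cannot say whether temperature or moisture changed.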
Now, how do air temperature, relative humidity and dewpoint temperature relate to atmospheric pressure? The dewpoint is independent of atmospheric pressure. | |
??? says that pressure varies directly with temperature. So if you heat a gas, the molecules will move faster and collide more frequently with the walls of the container, increasing the pressure. | |
Humidity is also directly related to pressure, because if you have a fixed amount of vapor in the air and you condense the volume of air, the pressure will increase and overall percentage of moisture in the air, which is relative humidity, will also increase. | |
So based on this information, you would think dewpoint temperature and air temperature are the two most important variables in determining freeze up risks because | |
both the amount of moisture in the air and the coolness of the weather are important. Now let's see if that's actually reflected in the weather statistics. | |
Okay, so now that we're equipped with more knowledge about weather science and what those indicators mean, | |
we can dive into statistical analysis and examine whether or not an increase in moisture levels or dewpoint | |
is largely responsible for the 2021 weather conditions. So to collect our data, we used cluster sampling, which is a type of sampling method in which we divide the population into clusters and then select four clusters ???. So in our case, we chose the month of February | |
from the past 10 years, because the outage occurred in February this year, and we didn't want to consider the month-to-month and season-to-season variations in our sample. | |
And we want to collect data from all past 10 years, and not just the previous year, because the weather may be getting worse and worse due to climate change. So this year has the worst season, so far, and the contrast will be more obvious if we compare it to more years. | |
We collected the daily weather statistics in February 2012 to 2021, which means a total sample size of 283 days, because there's a minimum of 28 days in February and an additional three days for the leap years. | |
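The sample size of 283 days can be checked with a short calculation. This is only a sketch of the arithmetic, using Python's standard calendar module; the function name is made up for illustration.

```python
import calendar


def february_days(first_year: int, last_year: int) -> int:
    """Total number of February days from first_year through last_year, inclusive."""
    return sum(
        calendar.monthrange(year, 2)[1]  # number of days in February of that year
        for year in range(first_year, last_year + 1)
    )


if __name__ == "__main__":
    # 10 Februaries of 28 days plus one extra day each for 2012, 2016, and 2020.
    print(february_days(2012, 2021))  # 283
```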
And for each day we collected the maximum, average, and minimum temperature, dewpoint, humidity, wind speed, and pressure, and we also collected precipitation. | |
We collected the maximum, average, and minimum because one might be a more important or accurate measure than the others, which we don't know yet. | |
A sample size of 283 days also passed the power test, power being the probability of correctly rejecting the null hypothesis. So we don't really need to collect even more data. | |
So we wanted to see which of the weather indicators | |
have outliers. If a parameter has many outliers, that indicator is more likely to be easily swayed by extreme weather conditions, which would make it a more important reference for future situations. | |
So here we ran the quantile range outliers for all 10 years. The outlier criterion is based on the interquartile range, | |
IQR, which is the difference between Quantile 3, the 75th percentile of all values for that variable, and Quantile 1, the 25th percentile. So the lower outlier threshold was determined as Quantile 1 minus 1.5 times the IQR, while the upper outlier threshold was Quantile 3 plus 1.5 times the IQR. | |
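The thresholds described here can be sketched in a few lines. This is a minimal illustration of the 1.5 times IQR rule as stated in the talk, not JMP's implementation; the quantile interpolation method and the sample values are assumptions chosen for the example.

```python
import statistics


def quantile_range_outliers(values, k=1.5):
    """Return (lower threshold, upper threshold, outliers) using the k * IQR rule.

    The "inclusive" interpolation method is an assumption; other software
    may compute quartiles slightly differently.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


if __name__ == "__main__":
    # Hypothetical readings, not values from the Houston data set.
    readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    lower, upper, outliers = quantile_range_outliers(readings)
    print(lower, upper, outliers)
```

Because each variable is screened on its own, this rule cannot account for the dependence of humidity on temperature and dewpoint.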
We have quite a few outliers for maximum humidity percentage and precipitation. | |
But these outliers are from all 10 years and we're mostly interested in the days when the Texas power outage ??? | |
so only a few weeks in February 2021. | |
So now we will instead focus on the period from February 2 to February 25, 2021, which is roughly the time period of the outage crisis. | |
So, once again, this only considers the data from Houston. So from just these three weeks, we can see that the temperature has the most outliers of all the weather parameters | |
and humidity has one outlier. Nevertheless, the main limitation of the quantile range outlier tool is that it fails to consider multicollinearity. | |
And from our scientific research, we know that humidity depends on both dewpoint and temperature. So outlier analysis may not be the most accurate tool, and dewpoint may still be one of the more important factors. | |
For an alternative method, we tried using cluster variables, which groups the different weather parameters into different clusters. | |
So we wanted to see if the cluster variables tool could help us identify a single cluster that consisted of the most important weather indicators for the 2021 power outage. | |
So the color map on the top left graphs each of the weather parameters on the y axis and then each of the parameters again on the x axis and plots their correlation. | |
So the bluer the cell, the more strongly negatively correlated the two parameters are with each other, and the redder the cell, the more positively correlated the parameters are with each other. | |
On the color map, we can visualize the four different clusters, which we can also identify in the cluster members table. | |
Cluster 1 consists of all of the temperature and dewpoint parameters, suggesting that this cluster may consist of the most important weather factors of the 2021 power crisis. | |
Cluster 2 consists of wind speed and precipitation, Cluster 3 all humidity and Cluster 4 all pressure. So the last few clusters | |
don't really give us much insight, as we're not surprised that all the humidity and pressure measurements belong in their own clusters and that wind speed and precipitation are correlated. So the key takeaway from this analysis is that dewpoint and temperature may be the most correlated. | |
So far, we know that dewpoint and temperature might be the most important weather indicators, but how does the 2021 weather pattern compare with the previous climates more generally? And can our current weather parameters detect ??? weather in Houston | |
??? major outlier? | |
So the top chart is the cluster summary from our previous cluster reports analysis and the bottom two charts are the multivariate statistical process control charts. | |
So we first ran a principal component analysis to reduce our dimensions from the 16 variables. If we include all our weather parameters in our principal component analysis, we need five principal components to account for more than 80% of the total variation. But we can't really distinguish | |
the 2021 situation from past weather situations if we use five principal components. You can see that the red dots, which are when the outage crisis occurred, are barely above the red line, which is the T squared upper control limit, and not at all ...??? | |
So this graph shows that we are unable to detect the true pattern and instead rely too much on eigenvalues in explaining the weather situation. So we then tried removing the second ??? cluster, | |
because out of all the clusters, it explains the least variation. | |
And we redid the principal component analysis. So this time, only four principal components are needed to account for at least 80% of the total variation, but we still cannot clearly distinguish | |
the weather situation during the 2021 power outage crisis from the other time frames. So how else can we try to increase the signal and reduce the noise to better distinguish the 2021 weather situation from the others? | |
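For reference, the T squared value plotted for each day in a principal-component control chart can be sketched as follows. This uses the standard textbook definition (squared component scores scaled by their eigenvalues); the talk does not state how JMP computes the statistic or its control limit, so treat this as an assumption. The scores and eigenvalues are hypothetical.

```python
def t_squared(scores, eigenvalues):
    """Hotelling-style T squared for one observation from its retained PC scores.

    scores: the observation's score on each retained principal component.
    eigenvalues: the variance (eigenvalue) of each retained component.
    """
    if len(scores) != len(eigenvalues):
        raise ValueError("scores and eigenvalues must have the same length")
    return sum(score * score / ev for score, ev in zip(scores, eigenvalues))


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    typical_day = t_squared([0.5, -0.3], [4.0, 1.0])
    extreme_day = t_squared([6.0, 2.5], [4.0, 1.0])
    print(typical_day, extreme_day)
```

Each retained component adds a term to the sum, so keeping components that mostly carry ordinary year-to-year variation can dilute the contrast between an extreme day and a typical one.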
Well let's go back to our cluster variables result and see if we can screen for the most representative variables. So let's try | |
choosing the most representative variables of each cluster based on the cluster summary, | |
which is determined from the R squared with own cluster and the one minus R squared ratio. | |
R squared with own cluster represents how similar each of the variables are with the center of the cluster. | |
So this value cannot tell you the direction of the variable as compared to the center, but when the distance is close enough | |
between the variable and the center, the direction isn't as important. Interestingly, the most representative variables are the averages, not the minimum parameters, as including them would be kind of redundant. | |
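The one minus R squared ratio mentioned above is commonly defined as (1 - R squared with own cluster) divided by (1 - R squared with the next closest cluster), with smaller values indicating a variable that fits its own cluster well and other clusters poorly. The sketch below assumes that definition and assumes the representative variable is the one with the smallest ratio; the talk does not spell out the exact selection rule, and the R squared values are hypothetical.

```python
def one_minus_r2_ratio(r2_own: float, r2_next: float) -> float:
    """(1 - R^2 with own cluster) / (1 - R^2 with next closest cluster)."""
    return (1.0 - r2_own) / (1.0 - r2_next)


def most_representative(cluster: dict) -> str:
    """Pick the variable with the smallest 1 - R^2 ratio.

    cluster maps a variable name to (r2_own, r2_next).
    """
    return min(cluster, key=lambda name: one_minus_r2_ratio(*cluster[name]))


if __name__ == "__main__":
    # Hypothetical values, not taken from the cluster summary in the talk.
    temperature_cluster = {
        "avg_temp": (0.95, 0.30),
        "max_temp": (0.88, 0.28),
        "min_temp": (0.85, 0.35),
    }
    print(most_representative(temperature_cluster))
```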
After screening based on the cluster variables to just average temperature, average humidity, average dewpoint temperature, and average pressure, we only need two principal components to account for 80% of the variation as seen in the T squared chart on the right. | |
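The rule of keeping enough principal components to reach 80% of the total variation can be sketched from the eigenvalues alone. The eigenvalues below are hypothetical and the function name is made up for illustration; the actual eigenvalues from the analysis are not given in the talk.

```python
def components_needed(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share of
    total variance reaches the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for count, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= threshold:
            return count
    return len(eigenvalues)


if __name__ == "__main__":
    # Hypothetical eigenvalues for four standardized variables.
    print(components_needed([2.4, 1.0, 0.4, 0.2]))
```

With fewer, less redundant input variables, the leading components capture a larger share of the variation, so fewer of them are needed to pass the threshold.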
And here, you can see just one major spike, | |
which consists of the days during the outage crisis. So by using both cluster variables and principal components, we can successfully distinguish the weather situation | |
during the 2021 power outage. So can the revised T squared chart tell us anything about which weather parameters are the most important indicators of the weather situation of the 2021 outage | |
crisis? Do the results match those from the cluster variable analysis and the prior scientific research? | |
We wanted to study the contribution proportion of each of these four variables, but to do this, we can only pick just one data point. | |
So we chose sample number 270, which has the worst weather situation in 2021, because we care most about which parameters are the most important in the worst case scenario. | |
You can see from the contribution proportion plot on the right that average temperature and dewpoint are the two largest contributors, and there's little contribution from pressure and no contribution from humidity. | |
So this result may indicate that temperature and dewpoint are the main contributors to the weather situation, | |
which not only backs up our cluster variable analysis, but also can be explained by scientific research. So air temperature still provides the greatest contribution, but dewpoint, which measures moisture, is also a huge factor in the freeze-up. | |
Previously we needed four principal components to account for 90% of the variation in principal component analysis, but we now need just two after choosing the most representative variable from each cluster, for a total of four parameters, | |
which confirms that screening is beneficial to enhancing the signal to noise ratio. | |
We chose how many are enough based on the Pareto principle, which states that the vital few, 20% of the causes, account for 80% of the results in a given situation. | |
If you examine the eigenvectors more closely, you can see that the first eigenvector consists of all four parameters quite evenly, as their magnitudes are all around 0.5, while the second principal component mainly consists of temperature and humidity, | |
not dewpoint. This might indicate that dewpoint has not been an important factor in all the past 10 years, so the logical next step is to consider the principal component analysis on just 2021. | |
However, the sample size would then be too small, as we only have 20 data points in February 2021. | |
Although we don't have a large enough sample size to run principal component analysis on just 2021, let's see if we can visualize any difference between the most important parameters for the last decade, | |
as compared to just one year. So on the left side, we have the heat map for the contribution proportions across the last decade, and the one on the right is just this year. | |
Darker cells indicate a greater deviation and a more significant contrast from the normal situation. So if you look at the graph for February 2012 to 2021, we see that temperature and humidity percentage are the most important factors, based on the darker cells. | |
Humidity may be important because of temperature, as humidity depends on both temperature and dewpoint. So this again has the multicollinearity issue, so if we look ???. | |
On the other hand, if we look at just the weeks around the power outage crisis, which is shaded green | |
on the right hand side, you can see that temperature and dewpoint contain the darkest cells. So once again this shows that dewpoint may be a more important indicator in the 2021 outage crisis but may not have been historically for the past weather situations. | |
So from this initial process control chart, we can tell that the weather situation for the power outage crisis is an outlier, but how rare and unique was it? So the score plot on the bottom of this graph | |
graphs the two principal components. It includes different confidence ellipses. The point at the top left is February 15, 2021, | |
when one of the most disastrous storms hit Texas. You can see that the point lies outside of the 99% confidence ellipse, which is the orange line, | |
but inside of the 99.5% confidence ellipse, which is the blue line. So that means the weather situation on the 15th occurs about once every 200 days, and since there are 28 days in February, the expected frequency of it occurring based on past data is equal to 28 days divided by 200, or about | |
once in seven years. | |
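The arithmetic behind the seven-year figure can be written out as a short sketch. It assumes, as the talk does, that each February day has the same 1-in-200 chance of such weather and that only February days count; the function name is made up for illustration.

```python
def return_period_years(daily_probability: float, days_per_season: int) -> float:
    """Expected number of years between events, given a per-day probability
    and the number of days per year in which the event can occur."""
    expected_events_per_year = daily_probability * days_per_season
    return 1.0 / expected_events_per_year


if __name__ == "__main__":
    # 1 in 200 days, 28 February days per year: 28 / 200 = 0.14 events per year.
    print(return_period_years(1 / 200, 28))  # about 7.14 years
```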
??? the probability of future occurrences of a similar weather situation may be higher, and the situation may be even worse with global warming. | |
Also the four points lie in the top left quadrant, which is opposite to the dewpoint and temperature vectors, showing once again that dewpoint and temperature are main contributors to the extreme weather conditions of the 2021 power outage crisis. | |
Another way we can compare the 2021 | |
situation with the normal weather situation across the past 10 years is to compare the February 15 contribution plot to the center point of the score plot, | |
which kind of serves as a median of the 10 years. So the top right graph is sample number 270, which is the day with the worst weather situation during the power outage crisis. | |
And the bottom graph is sample number 131, which is the center point of the score plot. So interestingly, sample number 270 has main contributions from temperature and dewpoint, while sample number 131 just has pressure. | |
So this graph shows us that it's not just dewpoint that is becoming a more important factor in unstable weather conditions but temperature has also become more important, possibly due to global warming. | |
The temperature and dewpoint are now more important indicators of the weather situation than pressure and humidity. | |
So what is the Texas government doing to prevent future power outage crises? | |
A few days after the outage ended, Governor Abbott addressed the public, calling for winterization of power equipment. | |
And the House proceeded with some new legislation for reforming emergency procedures. However many believe that the new legislation is not adequate to prevent future crises as the temperature | |
threshold for extreme weather conditions is actually lower than that of the freezing conditions during the 2021 crisis. As of March 2021, the State House has not passed any laws | |
that require power plants to be winterized. One former strategic advisor for ERCOT believes that the key to preventing outages is to plan for the future and really consider climate change as a huge factor in ??? preventative measures. | |
So in conclusion, we studied the science behind dewpoint temperature, air temperature, humidity, and pressure and how they are related and connected to each other. | |
The quantile range outliers tool fails to consider multicollinearity issues, and we know that humidity depends on dewpoint temperature and | |
air temperature. So we used cluster variables to see if we could identify the most important weather factors, based on the clusters. | |
We optimized the statistical process control chart by using principal component analysis. We chose the four most representative variables, which were humidity, dewpoint, temperature, and pressure. | |
We also studied the contributions of these variables, more generally using the heat map and more specifically using the contribution proportion plot for sample number 270, which had the worst weather situation, and comparing these results with the center point on the score plot. | |
We found that | |
temperature and dewpoint were the important weather markers for future instances of similar disasters, especially considering the increasing unpredictability of the weather that accompanies climate change and global warming. That's all we have for today. Thanks for listening. |