Hello, my name is Don Lifke, and I'm with Sandia National Labs. I'll be presenting on the Reliability platform in JMP. The example I'll be using has to do with jumping into retirement. Sit back and enjoy, and let's see where this leads us.
A little bit about Sandia. We are a federally funded, government-owned, contractor-operated research facility. I've been at Sandia since about 2005. I'll tell you a little bit about what we do at Sandia. We work primarily on nuclear deterrence. There are six major programs that we're working on right now. I won't go into the details of these, but all of the programs that you see on my screen are programs that I have worked on, and I have actually applied JMP to all of them as well. The reason I'm presenting data on retirement is that a lot of the stuff I do is just too sensitive, and I can't present it in this particular environment. So, we're going to use some fun data that everybody can relate to. Naturally, this is a recording, so you'll have to hold your questions to the end.
But this is the same presentation I'll be giving live at the conference. Let's talk about retirement. Why is it that something that is so certain has so much uncertainty in it, which makes it a little bit hard to plan? I'm not sure I would want to be without that uncertainty. I prefer a little uncertainty in that. It does make planning for retirement a little bit difficult. But I guess on the bright side, if I were a cat, it would be even more complicated.
By the way, the gato here is a repairable system, and that's a whole other topic. We're just going to be looking at systems that are not repairable the way cats are. What we can do is use the Reliability and Survival tools in JMP. Some of the screenshots that you see may be from an older version of JMP. This was created pre-COVID. Now that we are on newer versions of JMP, there might be some slight differences between what you see and what's in the screenshots, but they should be fairly similar anyway.
We're going to be using Life Distribution and Fit Life by X, under Reliability and Survival in JMP, to do some analysis of some data. A little bit about where the data is coming from. At Sandia, we have a Lab News that comes out every couple of weeks, and in that Lab News they like to post retiree deaths of the people that we worked with for a long time, just to let us know that they've moved on and gone to a big R&D facility in the sky.
So, I thought I could use that data to help me plan for my retirement. I grabbed data from four different periods of the Lab News archives: 2001, 2007, 2013 and 2018. The number of data points that I grabbed from each of those is here. We'll look at some of these data and see how things are changing through time as well. But anyway, the number of data points is pretty significant.
This little clip, this little picture on the right, is a little trivia. If you look real closely at the violinist on the left, some of you might recognize him; I'll leave that up to you to figure out. Maybe I'll sing it. It's actually my partner, Claire. She's a pharmacist, but also a beautiful opera singer.
The nice thing about using the retiree death data for me is that that population is a better representation of my lifestyle. They tend to take fewer risks and lead a slightly more conservative lifestyle, with a similar income and education level. Of course, we're the best and the brightest; I'm probably at the bottom of that list, but at least I'm in that group. It's an honor to be working with the folks at Sandia. Another nice thing about using the retiree death data is that it only includes those who actually made it to retirement. I don't really care about the data for not making it to retirement. And I apologize to my kids. I'm in Albuquerque, New Mexico right now.
Those of you who love Breaking Bad know that it was primarily filmed here in Albuquerque. I'm a big fan of the show, so I've got to put a little bit of fun stuff in the presentation. That's why I threw a little bit of Albuquerque reference in there for those of you who are Breaking Bad fans as well. And some Better Call Saul stuff too.
Let's go right into JMP and start doing some of the analysis. I'm going to bail out of that and open up my data file. Let me move some screens around here.
It'll take a second to load on your screen. You should be seeing my data file. What you see are the columns for age and for the Lab News, the newspaper that I got the data from. There's the biweekly date, and I broke that down to year. The other columns I'll talk about in a little bit. Right now, let's just look at the age and the year. Let's look at the distribution of the data and see what it says.
I'm just going to analyze the distribution of age in general. If we look at that, this is what the distribution looks like. You can see a sort of skewed-left distribution, and we can see a little more detail on this if we go into some of the display options in JMP. This data actually best fits a Weibull distribution, a two-parameter Weibull distribution, which I'll show on some of the future screens. But what we want to do is take this data and use Life Distribution to look at it.
I'm going to drag over my PowerPoint here and show you that I fit these distributions ahead of time, rather than actually doing it live, in the interest of time. Looking at the distributions of the four years, the four categories of years, which are 2001, 2007, 2013 and 2018, all of these fit a Weibull distribution fairly well. I wanted to check my assumption before I start analyzing the data in Life Distribution.
Then I fit them to a Weibull distribution. Let me give you a little bit of background on the Weibull distribution that will help you understand some of the data that I'm going to show. These are Weibull distributions generated using JMP's formulas. I've taken distributions with different alphas and different betas just to show you what the alphas and betas do with the Weibull distribution. Let me run my script on this and show you what these look like.
Basically, I'm just doing a Fit Life by X in JMP, and I saved the script to the data table. This is primarily to show you what happens with Weibull distributions when you turn on a local data filter, and to show you what the parameters do with different alphas and different betas. I'm going to choose the three that have a beta of one. The beta of the Weibull distribution essentially determines the spread of the distribution; you can think of it in terms of standard deviation.
When you plot the data on a Weibull plot, you will see that they are all parallel lines, and they're basically just scooting across the X axis here. The beta is the slope of this line, which is the spread of the data. If I look at three curves that have the same alpha but different betas, you'll notice that they're all centered at about the same point. What the beta is doing is changing the spread. Alpha is changing the location.
They all cross at this point, which is actually at 0.632. The characteristic lifetime is the alpha, and that's the point where your line crosses 63.2%, which comes out to be one minus one over e, if you want to get into the proof of it. The beta is the spread of the data, and the alpha is the location of the data. Scale and location is sometimes what they call it. Just a little background on that. I'll swing this PowerPoint back over here and show you on a summary slide what I mean.
These plots have the same alpha. They're all centered at the same place. The three that have the smaller rectangle around them have the same beta. So, they have the same spread, but they're located differently because they have different alphas. That's just a brief tutorial on what the Weibull distribution parameters do to the curve.
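To make the alpha and beta tutorial concrete, here is a minimal Python sketch (not JMP, and not part of the talk). It assumes the standard two-parameter Weibull form, and the alpha and beta values are hypothetical, chosen only for illustration. It checks two things described above: every curve crosses 1 - 1/e, about 63.2%, at the age equal to alpha, and on a Weibull plot the slope of the line is beta.

```python
import math

def weibull_cdf(t, alpha, beta):
    """Probability of failure by age t for a two-parameter Weibull."""
    return 1.0 - math.exp(-((t / alpha) ** beta))

def weibull_plot_y(p):
    """Vertical axis of a Weibull plot: ln(-ln(1 - p))."""
    return math.log(-math.log(1.0 - p))

def weibull_plot_slope(alpha, beta, t1, t2):
    """Slope of the transformed CDF against ln(t) between ages t1 and t2."""
    rise = weibull_plot_y(weibull_cdf(t2, alpha, beta)) - weibull_plot_y(weibull_cdf(t1, alpha, beta))
    run = math.log(t2) - math.log(t1)
    return rise / run

ALPHA = 85.0  # hypothetical characteristic life, not a fitted value
for beta in (1.0, 5.0, 10.0):  # hypothetical shape values
    crossing = weibull_cdf(ALPHA, ALPHA, beta)
    slope = weibull_plot_slope(ALPHA, beta, 70.0, 90.0)
    print(f"beta={beta}: F(alpha)={crossing:.3f}, plot slope={slope:.3f}")
```

Changing ALPHA shifts where the line sits along the age axis without changing its slope, which is the "scooting across the X axis" behavior for curves that share a beta.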
Let me close up the example Weibull curve data, and let's just look at the age parameter. If we go to Analyze, Reliability and Survival, and just look at the Life Distribution of age... What we see is linear scales on the X and Y, but we can actually determine which distribution fits best here. And what we find, of course, is that the Weibull fits best. This is what the Weibull distribution looks like on all of that data crammed together. Using the distribution profiler, I can manually scoot this over and say, I want to be 90% sure I don't run out of money.
I'd better plan on living to an age of roughly 92 or so. About 92, to be 90% sure I don't run out of money. That was really the focus of this data, trying to help me plan my retirement. You know, they always ask you how long you want to plan your retirement for. Of course, we don't know. Usually we just take a wild guess, but this is a little bit of data-driven decision making going on here.
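Reading an age off the profiler at a chosen probability amounts to evaluating the Weibull quantile function. The sketch below is plain Python rather than JMP, and the ALPHA and BETA values are hypothetical placeholders, not the fitted values from the talk, so the printed age will not match the roughly 92 quoted above.

```python
import math

def weibull_cdf(t, alpha, beta):
    """Probability of having died by age t."""
    return 1.0 - math.exp(-((t / alpha) ** beta))

def weibull_quantile(p, alpha, beta):
    """Age by which a fraction p of the population has died (inverse CDF)."""
    return alpha * (-math.log(1.0 - p)) ** (1.0 / beta)

# Hypothetical parameters for illustration only.
ALPHA, BETA = 87.0, 9.0

plan_to_age = weibull_quantile(0.90, ALPHA, BETA)
print(f"Plan to fund retirement through age {plan_to_age:.1f} "
      f"to be 90% sure of not outliving the money")
```

With real fitted parameters, the same call at p = 0.05 and p = 0.95 gives the two ends of a 90% interval on age at death.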
What if we look at these through the four different years? We can use the same Reliability and Survival menu, this time with Fit Life by X. We can look at the age versus the actual year here. Let's see how the years look different. Let me turn on the density curves here so you can see where those fall. I want this to be a Weibull distribution. We can also turn on quantile lines to see how things change through time. These are the 10%, 50% and 90% quantile lines.
It looks like maybe from 2001 to 2007 things haven't changed a lot, but about 2013, maybe the people are living a little bit longer, based on the 2013 and 2018 data. We go down here and look at the details of this Fit Life by X. You can see the plots all separately here. These little profilers are fun to mess with. You can see the age and the probability of failure. Looks like I've got the long run and the probability of failure is... I'll just put a 90 here.
We can see that it looks like we are getting a little bit healthier, because this probability is going down through time, at least for 90 years old. There's a lot of stuff in here that you can tinker around with. I don't have the time to show you all of that, but what I want to get down to here is this location and scale section. This little test here is telling me whether my locations are different, assuming we don't have the same data but do have the same failure mechanism, which in theory we do, right?
We all have the same failure mechanism; the physics of human life failures should be constant. And so our data shouldn't change much. But assuming we have the same betas, are the alphas changing? In other words, is the location of these changing? And the data says, yeah, we're rejecting the null hypothesis that the data are not scooting across this X axis through the years. There's a change. Now looking at the location and scale, it looks like that's marginal.
I'm going to talk more about that when we get down to the Weibull here, the actual Weibull data. That test is looking at whether there is a difference in the beta, in the slopes or the spread of the data, year to year. It's right on the edge of rejecting the null hypothesis that the slopes are actually the same. If you want to look at different distributions through the years, you can actually do a statistical test to see if the distributions of your reliability data are changing through time.
Right. So, what am I forgetting? Let me slide back over here to the PowerPoint. What I'm not doing is considering censored data. I don't have the data for everybody who's still alive. Those retirees are still alive, but I just don't have access to that data. So, I thought, well, what's that going to do to my analysis? Well, I can play around with the data I have and go back in time. For example, I can go back to 2007 and treat the 2013 and 2018 data as censored data, because I know that those retirees were still alive then. That lets me see how the fact that I'm not including censored data affects my analysis. I did do that. I went back and messed around with that. I'll show you a little bit of it, but I didn't really find out much. As our great friend George Box said, all models are wrong; the practical question is, how wrong do they have to be to not be useful?
And I really didn't find anything useful in doing that. But I will show you at least what I did to convince myself that not having the data for the retirees still alive is not a problem. Let me tabulate my data real quick. If I look at the year of the data versus whether or not I suspended the data, what I did is I took the 2018 data, suspended it, and worked my way back to 2013.
I did an analysis as of 2013 using the three earlier years of data and then using the 2018 data as suspended data. Of course, I had to take the ages from 2018 and subtract five from them, because that would be the age in 2013. In order to include that suspended data, I had to adjust their ages accordingly as well. I also did that for 2007. Let me show you in tabular format. I created a column called Suspend 2007, where I took the data from 2018 and 2013 and also suspended it.
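The suspension step can be sketched outside JMP as well. The Python below is a minimal illustration, not the method JMP uses internally: it builds a sample in which later deaths become suspended (right-censored) observations at an earlier age, then fits a two-parameter Weibull by maximum likelihood, solving the standard profile-likelihood equation for beta by bisection. The function names and all ages in the example are made up.

```python
import math

def make_censored_sample(deaths_at_cutoff, deaths_later, years_back):
    """Pair each age with a flag: True = died, False = suspended (still alive).

    People whose deaths were reported years_back years after the cutoff
    were alive at the cutoff, at an age years_back younger.
    """
    sample = [(age, True) for age in deaths_at_cutoff]
    sample += [(age - years_back, False) for age in deaths_later]
    return sample

def fit_weibull_censored(sample, lo=0.05, hi=100.0, iterations=200):
    """Maximum-likelihood (alpha, beta) for right-censored Weibull data."""
    n_deaths = sum(1 for _, died in sample if died)
    mean_log_death_age = sum(math.log(t) for t, died in sample if died) / n_deaths
    t_max = max(t for t, _ in sample)  # rescale to keep powers well behaved

    def score(beta):
        # Profile-likelihood equation for beta; it increases with beta.
        weights = [(t / t_max) ** beta for t, _ in sample]
        weighted_log = sum(w * math.log(t) for w, (t, _) in zip(weights, sample))
        return weighted_log / sum(weights) - 1.0 / beta - mean_log_death_age

    for _ in range(iterations):  # bisection on the increasing score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    weights = [(t / t_max) ** beta for t, _ in sample]
    alpha = t_max * (sum(weights) / n_deaths) ** (1.0 / beta)
    return alpha, beta

# Made-up ages for illustration; these are not the Lab News data.
deaths_2013 = [72, 78, 81, 84, 86, 88, 90, 93]
deaths_2018 = [75, 83, 87, 91, 95]
sample = make_censored_sample(deaths_2013, deaths_2018, years_back=5)
alpha, beta = fit_weibull_censored(sample)
print(f"alpha={alpha:.1f}, beta={beta:.2f}")
```

Fitting the same deaths with and without the suspended rows, and comparing the upper quantiles, mirrors the comparison described in the talk.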
I have part of my data as known deaths and about half of my data as suspended data, people who are still alive. Let me show you what that did to the data. For the age in 2013 of those still alive, you can see that it's basically the 2018 data minus the five years. If we look at the mean age, this is the mean age for the 2018 data. By suspending, I basically just subtracted off the five years here and treated it as suspended data.
And the same goes for the 2007 data. There instead, I believe, I simply subtracted off nine years, and five years for the 2018 to 2013 data. When I ran the analysis of those... I'm just going to show you on the PowerPoint slide what it looks like, because it's convenient for me to jump back and forth between these two to show you the difference. This was my original data, and then this is the data treating the 2018 data as suspended and doing the analysis as of 2013.
You can see the area of interest, the part I'm really concerned about, is where I'm crossing that 90% probability. It didn't change much at all. And then this is throwing in the 2018 and 2013 data as suspended and doing the analysis based on the 2007 data and the 2001 data. You can see that it really doesn't change the curve much in the area that I care about. That gave me the comfort that I really am not missing much by not censoring out the data.
I mean, by not having this censored data. How did I calculate my 90% confidence intervals? I'm going to just show you in PowerPoint how I did this, in the interest of time. I can use the quantile profiler, look at the 5% and 95% probabilities, and calculate my 90% confidence interval. In this case, it's 66 years old to 96 years old. But really what I care about is the upper limit. I only care about my 90% probability. And in this case it was age 94.
Based on this analysis, I now know that if I want to be 90% sure I don't run out of money, I should plan to live to be 94. This is based on my historical data. Quantifying that uncertainty, this is basically... a little bit of humor thrown in.
The next phase in this was to try and figure out... I'm going to throw in a little bit of bonus material here beyond the Life Fit situation. Here we have this tool called the Pension Tool. We can put in the year we're going to retire, and we can also assume a salary increase and a non-base award, which is essentially what you would call a bonus in private industry.
We can put those into this tool and it will spit out our estimated monthly pension. I thought, well, that doesn't help me much. What I want to do is reverse-engineer that tool so that I have the profiler to tinker with. I went into JMP and created a Response Surface model, and I basically input three different ages at retirement: 62, 65 and 68.
I did three different salary increases and I tried three different non-base awards as well. Let me pull the data over here. This is what the data looked like when I ran the experiment. I'm just going to show you the screenshots from JMP rather than do an actual JMP analysis, because I'm at about 20 minutes already and I want to make sure I have enough time to cover all of this.
When I created this experiment, you'll notice that the runs are sorted. That's okay because I'm doing a computer simulation, so I don't have to worry about lurking variables like the temperature in the room, the humidity and the operators. I'm going to get the same answer regardless of the order in which I run this. So, it's easier for me with the tool to run them in order. But when you do a real design of experiments, you want to randomize your runs. Do not run them in order out of convenience. If you do have constraints, you can use the easy, hard, or very hard to change feature in JMP to make sure you do that properly by using some blocking.
My inputs were all orthogonal. I covered all combinations of my age at retirement, the three different ages, the three different non-base awards and the three different salary increase percentages. This is a look at the analysis you can do when you're setting up your experiment in JMP. You can make sure that your main effects are only correlated with themselves, and that there's minimal correlation with the other terms, the two-factor interactions and also the squared terms, since I did a Response Surface model.
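A three-level full factorial like the one described can be sketched in a few lines of Python to show what "all combinations" and "orthogonal" mean in practice. This is an illustration, not the JMP design itself: the three retirement ages come from the talk, while the salary increase and non-base award levels are hypothetical because the talk does not state them.

```python
from itertools import product

ages_at_retirement = [62, 65, 68]      # from the talk
salary_increase_pct = [1.0, 2.0, 3.0]  # hypothetical levels
non_base_award_pct = [0.0, 2.0, 4.0]   # hypothetical levels

# All 3 x 3 x 3 = 27 combinations, generated in sorted order.
runs = list(product(ages_at_retirement, salary_increase_pct, non_base_award_pct))

def correlation(x, y):
    """Pearson correlation between two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

columns = list(zip(*runs))  # one tuple per factor
names = ["age", "salary increase", "non-base award"]
for i in range(3):
    for j in range(i + 1, 3):
        r = correlation(columns[i], columns[j])
        print(f"corr({names[i]}, {names[j]}) = {r:.3f}")
```

The sorted run order is acceptable here only because the response comes from a deterministic calculator; for a physical experiment the list would be shuffled before running.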
But the net result was a nice prediction profiler, and I'm going to go ahead and just run this and show you. When you set up an experiment in JMP, it's nice because it gives you these scripts already in your data table. I'm just running the Model script in the data table. And JMP says, oh, I know what you want to model, because I was here when you set up the experiment. And it gives me my Y and it gives me my Xs.
It runs my analysis for me. I can now use the profiler in JMP, rather than having to put an age, a salary increase and a non-base award into that tool and getting one number out. Now I have the profiler; I reverse-engineered their tool, essentially. I can see what my monthly benefit is going to be versus salary increase and non-base award. You see that the non-base award doesn't really have much effect. Salary increase, not much effect. And of course age, as we would expect, has the most effect.
What do I do with this information? Well, what I really care about is my lifetime benefit. And I'm also concerned about inflation. So, I added some formulas to this. I added a lifetime benefit. If you want to look at the formula, you can see it's basically my monthly benefit times how long I'm going to live. And that's my lifetime benefit.
That's what I really care about: how much money am I going to get in my lifetime? I know if I retire earlier, I'm going to get less money. That's a no-brainer. But I want to know, is there a point of diminishing returns? When I looked at age at retirement, I calculated my lifetime benefit for living to 80, 84 and 98, and I created three separate columns for those. All these data will be provided to you as well, if you want to tinker with them.
But what you can see in the lifetime benefit for age 80 is that there is actually a point of diminishing returns, where I might as well just retire at 65, because I'm going to get less money, but I'm going to get it for a longer period of time. Now, as that starts to increase, if I live to age 84, it turns out my lifetime benefit would have been a little bit better had I hung on there. And as I get older, of course, I'm better off waiting as long as I can to retire.
But I really don't know. I don't know what that number is. My best guess is 84. That was the fiftieth percentile on my fit. If I want to be really conservative, I'm looking at 90. But I also wanted to look at how inflation matters. I looked at the present value; I put a formula in JMP. There's actually a formula for present value. Let me explain [inaudible 00:26:05] I'm going to show it to you real quick.
In JMP you can use a formula similar to what Excel has to calculate this present value, and that adjusts for inflation. This is what your money is worth now, relative to what you're predicting the future annual inflation rate is going to be. At 0%, this present value is just your lifetime benefit: the number of payments you're going to have through your lifetime times the payment amount, essentially. But you get penalized as inflation goes up; it becomes less and less. If I look just at the inflation present value data... If I just model those, what I noticed was... I'll make these a little bigger so you can see them.
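As a rough illustration of the present value calculation, here is a Python sketch of the standard annuity formula, with payments assumed to arrive at the end of each month. It is not the JMP formula itself, and Excel's PV function uses a sign convention in which this quantity comes out negative. The $4,000 monthly figure is the one mentioned later in the talk; the retirement age, age at death and 3% inflation rate are hypothetical.

```python
def present_value(annual_rate, n_payments, payment, payments_per_year=12):
    """Present value of a level stream of end-of-period payments.

    annual_rate is the assumed annual inflation (discount) rate as a
    fraction, for example 0.03 for 3%.
    """
    rate = annual_rate / payments_per_year
    if rate == 0:
        # No inflation: present value equals the plain lifetime benefit.
        return payment * n_payments
    return payment * (1.0 - (1.0 + rate) ** -n_payments) / rate

# Hypothetical scenario: retire at 65, live to 84, $4,000 a month.
months = (84 - 65) * 12
lifetime_benefit = present_value(0.0, months, 4000.0)
inflation_adjusted = present_value(0.03, months, 4000.0)
print(f"lifetime benefit at 0% inflation: ${lifetime_benefit:,.0f}")
print(f"present value at 3% inflation:    ${inflation_adjusted:,.0f}")
```

Evaluating the function over a grid of retirement ages, each with its own monthly benefit, gives the kind of flattening curve the profiler shows.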
I noticed that when my salary increase is smaller... This is a two-factor interaction. You can see the slopes changing. With small salary increases, which are probably not unrealistic in the coming future, there is a point of diminishing returns when we penalize for inflation. At lower inflation, there is not that big of a benefit to waiting to retire. Once I hit about 65 or so, it starts to flatten out. Really, the difference isn't that big.
If we look at the predicted value, these are not my actual numbers. These were based on adjusting the data using a factor of $4,000 a month, which is the typical retiree income. I noticed that as I wait, it flattens out, so it doesn't really pay to wait to retire. I'm using this information to help me decide when I really want to retire. Right now, I'll be 62 in May, so I'm starting to approach the ability to do this.
I'm probably going to wait till 64 or 65-ish, to where the difference in my present value is not that big. That's a quick bonus on how I used the design of experiments feature in JMP to reverse-engineer this. In summary, I looked at the retiree death data and fit it with Life Distribution and Fit Life by X. I looked at four different time periods and noticed that maybe we did get a little bit healthier in 2013, but by 2018 that had flattened out. I do have some 2021 data that I don't have in this presentation, but I will next year at the conference in Spain in March; it's actually pretty flat from 2013 to 2018 to 2021. We're really not getting healthier or living longer. That's going to help me with my decision. I took a custom designed experiment as bonus material here, and I reverse-engineered this web-based applet that we have and used the profiler to replace entering one data point at a time. A really cool thing you can do in JMP. I would take questions here if we were live.
And with that, I want to say one last thing. I really want to dedicate this presentation to my brother, who I lost a couple of years ago to brain cancer, and who was only 15 months younger than me. He was a fellow Sandian and also a fellow JMP user. Some of you may have met him at the JMP conference, and so I'm dedicating this to him. Thank you very much.