
But Now What Do I Do? Using a Reliability Model to Make Event Predictions in JMP (2021-US-30MP-823)

Level: Intermediate

 

Bradford Foulkes, Director of Engineering, Optimal Analytics

 

After spending weeks or months pulling together data and building reliability models, often the feeling is "Now what?" Or maybe the question is, "How many will fail?" In JMP Pro, there is a platform that can answer these questions, with some tweaks. The Repairable Systems Simulation (RSS) platform allows you to enter a reliability model, or a system of models, to see how frequently the event will occur and what the impact could be. In this presentation, I explain how to go from reliability model to event prediction, via a method to automate the generation of the RSS platform. Once you have the output, I show how to build reports that answer questions about event prediction, annual downtime, and individual models in a system. This presentation covers hands-on examples, as well as how to use JSL to make the platform and report generation easier.

 

 

Auto-generated transcript...

 



Brad Foulkes Thank you for attending my talk on But Now What Do I Do? Using a Reliability Model to Make Event Predictions in JMP.
  Very frequently I'm asked, once I build a reliability model, what do I actually do with it? And oftentimes people forget that you can use a reliability model to make predictions about when events will actually occur.
  And not just the first event even, maybe the second, third or fourth event that that product might see over time.
  So a little bit about myself. My name is Brad Foulkes. I'm the director of engineering at Optimal Analytics, where we work with businesses to try to help them understand their data, identify roadblocks, and get them moving more efficiently.
  Prior to that role, I worked in reliability engineering for about nine years, working on how we can understand part reliability and figure out when parts will actually break over time.
  I've been using JMP for about eight years, I think since around JMP 10. I'm not sure; it was a while ago, so I forget what version I was on then.
  Among my favorite JMP tools, I like to use Life Distribution a lot, because that's where the reliability modeling sits. Fit Model I use very frequently, and JSL.
  Scripting can open up a whole set of new tools and new toys for you to play with. So why do I want to present this today?
  Because, more often than not, this question comes up at the end of building a model and isn't really thought about up until that point.
  You might finish a model and have a whole bunch of probabilities and then not really know what to do next. So here's one option for what you can do once you have built your model and want to continue forward.
  So, for what we're talking about today around event prediction: when you build your model, you're used to seeing this kind of plot here. You'll have your individual times,
  and maybe these are hardware or product life cycles. Let's say it's a blender, or any kind of small kitchen appliance that maybe doesn't last
  a terribly long time. So some of these are going to fail very early; some will fail very late.
  But at the end of the day, you end up with a probability of failure, of when these things are going to occur. So here you might come over and say, okay, well, about 90% of the failures are going to occur by four years.
  Well, that's great. That helps you to kind of understand a first failure and maybe put a probability of parts failing in the field, but for a person who actually has to buy these things or use them, this doesn't help them a whole lot.
  For a customer that's actually using these, they don't care, you know, how long it's going to last. They might not want to know how many they'll need to buy over
  the overall life cycle; they may only want to look at 10 years: how many of these do I need to buy in 10 years for it to survive?
  So that's where event prediction comes in. And what event prediction really is, is taking your distribution and kind of flipping it around, so you still have your distribution of time,
  the times to events that you have here, and then the probability of an event. And so now
  you can look at how frequently some of these events occur. So maybe the first event occurs here around eight and a half years, the second event occurs at nine and a half,
  the third event after that...but this is all one customer. This is all one customer, one use case, and so, for them, their mean time to failure
  looks at all of these parts here, all of these intervals and these replacements. So this here
  is 20-plus years of time that a customer is actually looking at. They're going to want to know: to survive that 20 years, I'm going to need at least three parts.
  And so, using some random selection and simulation, you can put together an understanding of how frequently these parts need to be replaced and how frequently a customer might need to purchase your product.
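  As a quick illustration of that idea, here is a minimal JSL sketch I've put together (not from the presentation; the shape and scale values are made up, and Random Weibull is assumed to take shape then scale, so check your documentation) that stacks successive Weibull lifetimes for one customer and counts the replacement events within a 20-year horizon:

```jsl
// Sketch: simulate one customer by stacking successive Weibull lifetimes
// and counting replacement events within the horizon.
// Shape 1.2 and scale 8 are illustrative values, not the presenter's model.
horizon = 20;   // years the customer cares about
clock   = 0;    // running system time
nEvents = 0;    // failures, i.e., replacements needed
While( 1,
	clock += Random Weibull( 1.2, 8 );   // lifetime of the current part
	If( clock >= horizon, Break() );
	nEvents++;
);
Show( nEvents );   // parts purchased over the horizon = the original + nEvents
```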
  So how is this practical? Alright, it really comes down to replacements.
  And replacements being, just how frequently are you putting a new part, or an as-good-as-new part, in there? So
  there's a difference between a repairable system and a non-repairable system. A repairable system is one where you can get things just up to
  the point that it can run again, and maybe it's not as good as new. So let's say you've got a hole in your tire. You plug the leak, and you keep going.
  That's repaired. But what if you could repair the tire to the point that it was as good as new and it's never going to fail again?
  Now with the tire, maybe that's not all that practical, but with other things, you might be replacing a part, or maybe you need to cut off a portion of a part and replace just that one or that part that failed.
  Getting a part to as good as new can make this a non-repairable system. So even though you performed a repair, to make it as good as new,
  it changes things a little bit. Now that's a giant caveat, you know. You do have to give a lot of consideration to what as good as new means, but very frequently, I think people deal with parts that
  they've repaired, and they repair them as good as new, and they think, well, this is a repairable system, but for the reliability world, it's not. Alright, so I'm off that soapbox for a bit.
  Back to replacements. When we're looking at a single system,
  you might be looking at something here and say, okay,
  for one single system,
  how frequently are parts failing? And if we're looking at just one customer, how frequently are they buying them? So the first part lasted four years here.
  Then they had to buy a second one. That one only lasted 1.3 years, then they had to buy a third one, and this one went gangbusters and lasted eight years.
  So, at the end of the day, they've had to buy three parts in 12 years, but there's a wide variation. You see this with all sorts of products, with some cell phones, with
  other home products. Some things will fail early; some things will fail late.
  You might just be talking with your friends and say, hey, I have this horrible cell phone, I can't believe it failed this early. And they'll be talking to you about the same cell phone that they've had for 10 years. Granted, probably not 10 years, but
  it could be a while. So at the end of the day, for this one system, this one customer, they've had to replace a part three times.
  Now, if we look at multiple systems, we can build an idea over time of how frequently these things occur. So
  one customer might have that first system; then the second customer, maybe they had five or six replacements over roughly that same period of time.
  So if we were to look at the first system, we might say, okay, we wouldn't expect any failures until at least year four here.
  But if we look at the second system, they had a failure in year one. So if we're trying to figure out the average failures, or the average number of events that we might expect in year one or year two,
  that first system doesn't tell us much. So this is where, if you don't have the data but you do have your distribution in your model, the beta and eta for your Weibull or something like that,
  you can use simulation to lay out how frequently customers should expect events in years 1, 2, 3, and going out much further than the data that you actually have.
  By doing that, by looking at those multiple systems and looking at that simulation you can kind of understand and start to see a trend of how frequently
  these parts are going to be replaced year over year. So while this data here may have only been built off of event times that were seven years long,
  for calendar time, or for a customer's time, there's going to be kind of a spike and then it will level out to a flat level of replacements. Now
  with a simulation, you're going to see some bouncing, and that's what you see here. If we had a closed-form solution to this, we would have a nice straight line, but
  with the simulation, we do get some variation. So you can draw kind of a line and see, you know, on average,
  a person might expect to replace, you know, one-third of these a year. So after three years, you could expect to have a replacement, once it gets going. So a few in the beginning and then it kind of levels out.
  So let's look at an example of what I mean by this.
  We have a table of data here, and I've got a model. This model is just a random Weibull. And if you're not familiar with the different functions that you can use in JMP,
  you can use a random Weibull, a Weibull quantile, Weibull distribution, Weibull density. And these are all functions that you can apply to different distributions.
  In this case, we're choosing a random set of points off of a particular distribution. My distribution has a beta of 1.2 and an eta of 3.
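  For reference, here is a small sketch of those functions for a distribution like this one. The argument order (shape first, then scale) is my assumption, not stated in the talk; check the Scripting Index in your JMP version if the numbers look off.

```jsl
// The Weibull helper functions mentioned above, for shape (beta) 1.2 and
// scale (eta) 3. Argument order is assumed to be shape, then scale.
t = 2;   // an example time, in years
Show(
	Random Weibull( 1.2, 3 ),            // one random lifetime from the model
	Weibull Distribution( t, 1.2, 3 ),   // probability of failure by time t
	Weibull Quantile( 0.90, 1.2, 3 ),    // time by which 90% have failed
	Weibull Density( t, 1.2, 3 )         // density at time t
);
```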
  So, by choosing these random events here,
  these are all individual events, and this is what would end up going into your model. If you were to build a model on part failures, these are the dates that would usually end up going in there. What gets lost, though, is how this actually affects the customer. So over time
  the customer maybe wants to know, how many of these might I replace in 20, 30, 40 years.
  And that's where you need to look at the total system event times. And this is just the cumulative sum of the individual times, so 8.3 is the first, 1.2 is the second, and so on.
  And so, when you go to the second system, we've got 1.8, 3.9, 5.9, so it's just a cumulative sum of the individual times.
  The thing to keep in mind here is that when you're trying to kind of figure out an average of all this,
  you would be looking at the system number, but then at the year that this actually occurred, so 1.8 years occurred in the second year of operation.
  And that's an important distinction to bear in mind here, because you want to make sure that you're identifying the failures in the appropriate year.
  So your 0.6, while if that was the first one it would be in year one, here it's really only incrementing the system time, the system years, by one.
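  As a rough recreation of that kind of table, here is a JSL sketch (my own, not the table from the journal; it assumes Random Weibull takes shape then scale, so verify the argument order in your documentation):

```jsl
// Simulated-events table: individual times drawn from a Weibull,
// cumulative system time per system, and the rounded-up year of each event.
dt = New Table( "Simulated Events",
	Add Rows( 50 ),
	New Column( "System", Numeric, "Ordinal",
		Formula( Ceiling( Row() / 5 ) )   // 10 systems, 5 events each
	),
	New Column( "Individual Time", Numeric, "Continuous",
		Formula( Random Weibull( 1.2, 3 ) )   // beta (shape) 1.2, eta (scale) 3
	),
	New Column( "System Event Time", Numeric, "Continuous",
		// running total of individual times, restarting for each system
		Formula( Col Cumulative Sum( :Name( "Individual Time" ), :System ) )
	),
	New Column( "Year", Numeric, "Ordinal",
		Formula( Ceiling( :Name( "System Event Time" ) ) )   // 3.8 years -> year 4
	)
);
```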
  So, having this, you might think, okay, well, now that we've got all this, we can count up our number of events, look at this by system and year, and perform a nice little analysis. So JMP actually has a nice tool that will do all of that for you.
  So
  in JMP there is a tool called the Repairable Systems Simulation.
  And this tool,
  I believe, came out in JMP 13, maybe JMP 14. It's been around for a few years.
  I don't know that it gets nearly as much publicity as it should, because it is an incredibly powerful tool that I think a lot of folks don't appreciate.
  So what it is, is you're looking at a system here. Right now I've got two parts in here, but maybe my system is actually
  seven or eight parts, or it's an entire gearbox, or a pump, or something like that. And you have all these parts that work together, such that if any one of them fails,
  the entire system fails, and so you want to understand what the events are and what the event times are. So in each one of these, you'd lay out what your Weibull is, or what your model is, and here I've got a beta of 1.2 and an alpha of 15.
  The time unit is years, and I can certainly choose different distributions if I would like.
  And then I can say what happens if an event occurs. So here I've got a block failure: what happens if a failure actually occurs?
  And the outcome is to replace with new, so I just want to replace the part. Now there are many different options here; maybe I could do a minimal repair, but instead I've said I want to replace this with a brand new part, so this is going to be a like-new model.
  And then you can include an amount of downtime. So how long to repair this? What was the mean time to repair? It can be a constant value
  or you can say immediately, if you don't even care about the amount of time it takes, or you can give it a list to choose from. You can give it a couple of distributions here as well.
  So with these tools, I've now built a very simple system here: two parts with known distributions.
  When either one of these fails, I have a replacement time, and the second one has the Choices option.
  And you can see, you put it in here. Now, the choices need to be in the same time unit as your simulation.
  So I've chosen years for my simulation. This is in years, and the time to repair something might only be an hour, two hours, maybe a day. When you put that in terms of years, the numbers get really small, though, and that's what you see here.
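  For example, a small JSL sketch of converting those hour- or day-scale repair times into years before entering them in the choices list (the variable names are my own, not from the journal):

```jsl
// Convert repair durations into years so they match a simulation
// whose time unit is years.
hoursPerYear = 24 * 365.25;
repairChoicesYears = Eval List( {
	1 / hoursPerYear,    // 1 hour
	2 / hoursPerYear,    // 2 hours
	24 / hoursPerYear    // 1 day
} );
Show( repairChoicesYears );
```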
  I'm not going to go into all the options here. As you can see, there are many other things that you can do, with a standby, a K out of N; a lot of those are for more complex analyses.
  Alright, so I run my analysis and I'm going to run this for 20 years, which is listed here, with the number of simulations as 100. You will often want more than that, but for demonstration purposes, we're going to set it at 100 and have a seed of 1234.
  So then I get this nice big output here, and the initial output is always called number, which is very helpful.
  But you might look at this and go, okay, well, what do I do with any of this stuff? And you can come over here and launch this analysis, and
  maybe you want to look at the total downtime by component. So this says, what is the distribution of downtimes that a component might see?
  But it never actually answers the question of over 20 years, how many times am I going to need to replace this part?
  So I wrote a script, and when you download this journal, it will have all of these scripts in there.
  I set it up as a button, but the actual script is right underneath here for your reference, if you would like to use it.
  So I'm just going to click on the button, and a whole bunch of things are going to happen. I've now gone in and calculated how frequently I expect these events to occur, year over year. So from the system perspective,
  I've subsetted my failure events and set my year value. You might recall that the year value rounds up, so 3.8 years is year 4.
  Now, this looks at calendar time, the system time that we might be using. It doesn't look at the individual time, the time between each one of these, which from row one to row two is 1.8 years or so.
  But because we're looking at the system, we're actually counting the 5.69.
  Now I want to turn all these values into something useful.
  The important thing to remember, though, is that when you're looking for an average number of events every year, you need to account for the years that didn't have events occur. So in this first simulation, the first event occurs in year four,
  but I need zeros for years one, two, and three, and that's what my script does. My script goes in and sets up a
  pattern of what it is that I want to look at. So I want to look at part one, the year value, and the simulation ID, and then look at when the events actually occurred. So here you can see the first part one event occurs in year ten.
  And so when that gets added here, that's what gets updated in the analysis. If I scroll down to part two
  for simulation one,
  there are no events.
  The first event for simulation one isn't until
  year four. So if I scroll all the way down to year four
  for part two, it will show that I have one event in simulation one. So it's really just lining up these events for every possible combination of years in which they could occur.
  Now I have that level of information across all of the simulations; you saw I had 4,000 records, which is just all the possible combinations of year, simulation, and part.
  But really I want to average those down. So I average it down by the year value and the part, over the number of simulations, which were the number of rows. I get down to 40 rows, far more manageable and far easier to understand.
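  A rough sketch of that zero-filling and averaging step is below. It is not the script from the journal; the table and column names (Failure Events, Part, Simulation, Year) are placeholders for whatever your subsetted RSS results contain.

```jsl
// Count events for every Part x Simulation x Year cell (including the
// zero-event cells), then average across simulations for each part and year.
dtEvents = Data Table( "Failure Events" );   // one row per failure event

nSim   = 100;   // number of simulations run in the RSS platform
nYears = 20;    // simulation horizon, in years
parts  = {"Part 1", "Part 2"};

// Tally events per cell with an associative array keyed by part|sim|year.
counts = Associative Array();
For( i = 1, i <= N Rows( dtEvents ), i++,
	key = dtEvents:Part[i] || "|" || Char( dtEvents:Simulation[i] ) || "|" || Char( dtEvents:Year[i] );
	If( counts << Contains( key ),
		counts[key] = counts[key] + 1,
		counts[key] = 1
	)
);

// Build the averaged table: one row per part and year, zeros included.
dtAvg = New Table( "Average Events per Year",
	New Column( "Part", Character ),
	New Column( "Year", Numeric ),
	New Column( "MeanEvents", Numeric )
);
For( p = 1, p <= N Items( parts ), p++,
	For( y = 1, y <= nYears, y++,
		total = 0;
		For( s = 1, s <= nSim, s++,
			key = parts[p] || "|" || Char( s ) || "|" || Char( y );
			If( counts << Contains( key ), total += counts[key] )
		);
		dtAvg << Add Rows( 1 );
		r = N Rows( dtAvg );
		dtAvg:Part[r] = parts[p];
		dtAvg:Year[r] = y;
		dtAvg:MeanEvents[r] = total / nSim;
	)
);
```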
  What you see here for part two is that we have that little spike up in the beginning, and then it kind of levels off. And
  it bounces around because it's a simulation and we only ran 100 simulations, but we have kind of a steady state of replacements that we might see.
  If we were to try to look at this from a cumulative perspective, though, this is what can tell a customer how many they might expect to replace in 10 years. So for part two,
  let's say the customer wants to know, when will I need to replace the first one? What's the very first one? The very first one
  is going to happen between years four and five, just after year four. And then the second one happens at about year seven.
  The third one happens at about year 10. So you can see, you kind of have this bit of a climb where there are not a lot of failures up front, but then there's this steady state, and every three years or so you end up needing to replace the parts.
  This can be very helpful for a customer, for planning for your business, or even for supply chain efforts, if you wanted to try to do something like this there.
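  As a small sketch of how you might get that cumulative view (again, the table and column names follow my placeholder example above, not the journal's), the cumulative curve is just a running total of the per-year averages within each part:

```jsl
// Running total of the average events per year, restarting for each part.
// Assumes the rows are already sorted by Year within each Part.
dtAvg = Data Table( "Average Events per Year" );
dtAvg << New Column( "CumulativeEvents", Numeric, "Continuous",
	Formula( Col Cumulative Sum( :MeanEvents, :Part ) )
);
```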
  So using this event prediction from the Repairable Systems Simulation is a handy and quick way to give that understanding of how many parts you will need, not just the probability of failure and when that might occur. So
  one other thing that I want to talk about today is a way to make the Repairable Systems Simulation a little easier.
  It may be daunting. It's a new tool, and you may not know everything that you're working with, but
  if you start with a template, start with a table, building your Repairable Systems Simulation can be a lot easier.
  So in the end, to build your model, you need roughly 13 pieces of data. Most of this goes on in the background, where you wouldn't even think of it, but you can lay these out into a table, a template,
  with these various columns here. And I say 13, but you actually need less; I've made some modifications here to handle some neat little things that you could do.
  But really you have your distribution, your model type, your parameters, and your time unit. These are all things that are either defaulted in JMP, which you can change, or
  things that you would add in. So each one of these has, for example, if the block fails, a replace-with-new outcome. And these you can adjust; you can connect these event names and these processes differently.
  In my world, a lot of times we'll see a change of behavior, and maybe we want to understand that in the first few years this is the replacement rate, and then in the next few years here is the replacement rate.
  Or you might have competing failure modes that you want to account for. So what I'm going to show you here can handle a lot of that.
  Excuse me, this is by no means perfect. This
  should definitely be adapted for your own use case, but, at the end of the day, starting with a template is an incredible way to speed up your development and
  to allow you to make small adjustments on the fly. So I've got this, and I've got a nice little script here where, if I click on the script, it produces my Repairable Systems Simulation for me.
  And if I look over here, I've got my first part, with a beta of 1.86 and an alpha of 153.
  And it's really just taking this table and turning it into the repairable systems simulation. You can do
  something even more complex. So let me close that one down.
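  The general pattern is easy to sketch in JSL, although the sketch below is not the presenter's script: the template column names are placeholders, and the Repairable Systems Simulation call itself is deliberately left as a stub, because the cleanest way to get that syntax is to build one RSS interactively, save its script from the red triangle, and then splice in the pieces generated from your template.

```jsl
// Read a template table and assemble text for each block; splice the result
// into a saved RSS platform script. PartName, Distribution, Shape, and Scale
// are placeholder column names, not the journal's.
dtTemplate = Data Table( "RSS Template" );

blockText = "";
For( i = 1, i <= N Rows( dtTemplate ), i++,
	// Eval Insert() substitutes the expressions between ^carets^ into the string.
	blockText = blockText || Eval Insert(
		"\!N// ^dtTemplate:PartName[i]^ : ^dtTemplate:Distribution[i]^( shape ^dtTemplate:Shape[i]^, scale ^dtTemplate:Scale[i]^ )"
	)
);

// rssScriptText would hold the platform script you saved interactively,
// with a /*BLOCKS*/ marker where the per-part definitions belong.
// Eval( Parse( Substitute( rssScriptText, "/*BLOCKS*/", blockText ) ) );
Show( blockText );
```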
  And so here I've got 14 parts. I could change any of these distributions if I want,
  but I've got other types of events that I want to look at here. I want to split my distribution where
  it changes, or changes shape, after a certain amount of time. Maybe I'm trying to model a product introduction, you know, a product upgrade that is going across the fleet. Or maybe I've got a competing failure mode in here and I want to take that into account.
  So, using this template and the script allows me to go and handle all of these in a very simple fashion. Now you saw how quick that was. What is that, a second or so
  to generate this model? Now, if I want to create another one of these, maybe I want this to be a
  log normal.
  I can do that right here, and then
  come over to my script
  and just run that again. And I get another model where (let's see, what was it? Part five)
  part five is now a log normal. It's really easy to try different iterations, try different models, and it generates a new one every single time,
  labeling the diagram from your table here. So it's a handy way to perform an analysis using this template and the script that's embedded in here, and to understand your event predictions over time.