Analytically Speaking: Q&A with reliability expert Bill Meeker
Nov 18, 2014 11:29 AM
Earlier this month, we had the pleasure of hosting Bill Meeker, Distinguished Professor of Liberal Arts and Sciences at Iowa State University, on our Analytically Speaking webcast series. Bill is a noted author and expert in the areas of reliability data analysis, reliability test planning, accelerated testing, nondestructive evaluation and statistical computing. Our viewers had so many good questions for Bill during the live webcast, we didn’t have time to include all of them. For those that were in the queue, Bill has kindly provided answers.
Question: Is there a link or relationship between cohort analysis and survival analysis? Can they be used together? And if so, how would they complement each other?
Answer: Yes, cohort analysis and survival analysis methods can be used together to get more insight into population behavior. In cohort analysis, we stratify our population into different groups of units that share a similar characteristic or characteristics. For example, we might stratify a population of potential customers on the basis of geographic location and/or past experience with the potential customers. Then we could, for example, do separate analyses of time to respond to an offer for each subgroup. An appropriate model to fit would be the “defective subpopulation model” (or DS model) in JMP, in the Life Distribution platform. Some proportion of the population will never respond. This model is also known as the “limited failure population model” and the “cure model,” allowing estimation of the proportion of the population that will respond and the distribution of time to respond for each subgroup. This model is described in some detail in Chapter 11 of Meeker and Escobar (1998). In the Life Distribution and many other platforms, there is an analyze “By” option that will do separate analyses for each cohort. (In JMP 12, there will be a “Compare Groups” option in the Life Distribution platform to make such comparisons even easier to perform.)
Question: You mentioned recidivism as an application of reliability. What are some other areas of application of reliability analysis that you didn’t get to mention during the webcast?
Answer: Yes, Anne mentioned recidivism, and that certainly has been an application of life data analysis (a.k.a. survival analysis) methods. Indeed, it was one of the early applications of the “cure model” mentioned above. There was interest in how long a person would stay out of jail after release. But, of course, some individuals are “cured” and will never return. There are innumerable applications of these methods, which generically might be called “time to event” applications. In engineering reliability, we are often concerned with time to failure or time to return (of a product for warranty repair). In food science and in the development of many other products, one is interested in the shelf life. In the banking industry, there would be interest in “time to payment” for a defaulted loan. In medical applications, there is interest in time to recovery after a treatment. In sociology, there might be interest in time to divorce after marriage. Again, the “cure” model might be appropriate here because a sizable proportion of couples will never have a divorce. In many applications, we are not just interested in the first event, but in the recurrence of certain events over time. Examples include the recurrence of a disease over time (e.g., common colds), repairs of a machine, customers returning for more purchases, etc. Special models and methods are available for such data. Again, I recommend Wayne Nelson’s 2003 book on the subject as a good place to start learning about the analysis of recurrence data. JMP also has powerful analysis methods for recurrence data.
Question: Do you often need to convince end users or customers that 25 or 30 trials are necessary, when these trials are expensive and therefore resisted? If so, what approach would you use?
Answer: Yes, the most common question asked of any statistical consultant is “how many units do I need to test?” And in reliability applications we hear the related question “How long do I need to test?” JMP has extensive tools for planning experiments of different kinds, including reliability experiments such as demonstration tests and accelerated life tests. The theory behind the methods is impeccable. When the software says you need 30 units to achieve the desired precision, it is correct. But that might not help to convince end users with limited resources. I have found it useful to supplement the “black box” answers with repeated simulation of the proposed experiment. I typically run through the analysis five or six complete simulated data sets and then graphically summarize the results of 50 such simulated experiments. The graphically presented simulation-summary results allow visualization of the variability that one could expect to see in repeated experiments and how far away from the truth any given result might be. Such simulations can be used to compare different candidate test plans (e.g., different sample sizes). You do not need to know any theory of experimental design to appreciate the implications coming from the visualization of the simulation results. Such simulations could be programmed in the JMP Scripting Language, JSL.
Question: Can you talk about reliability studies as one – or the main – consideration in a designed experiment, particularly with regard to the approach taken with Taguchi methods?
Answer: Taguchi methods (a.k.a. robust design methods) provide a collection of design-of-experiment tools that can be used to make products and processes more robust to externals noises, such as variability in raw materials or variability in the manner in which a product is used. The use of these methods has been shown to have high potential for improving the quality of a product or the output of a process. Because quality is a prerequisite for high reliability (recall that reliability is “quality over time”), the skillful use of robust design methods will also improve reliability. In some applications, robust design methods can be used to focus directly on reliability. Two books that I highly recommend for this area are:
Question: When you exclude a failure mode, does it treat those as censored data?
Answer: In the multiple failure mode analysis, the first step is to estimate the “marginal” distributions for each failure mode. Under an assumption of independence of the different failure modes, this is done, literally, by making separate data sets for each of the different failure modes and then doing separate analyses for each. In the construction of these data sets, with focus on one failure mode, failure for all other failure modes are treated as right censored (because all we know is that the failure mode getting focus has not occurred yet). This is done for each failure mode. Then the so-called “series system model” can be used to combine the estimates of the marginal distributions to obtain an estimate of the failure time distribution with all of the failure modes active. A simple extension of this approach is to provide an estimate of the failure time distribution with just some of the failure modes active (so you can see the effect of eliminating one or more of the other failure modes). Modern software with capabilities for the analysis of reliability data, like JMP, will do all of this automatically. Technical details for this topic can be found in Chapter 15 of Meeker and Escobar (1998).
If you’d like to see the archived webcast, you can learn more from Bill’s extensive expertise on statistics, reliability and more.