Discussions

AbbieGraham333 · Jun 23, 2025 09:41 AM

Hi,

I have a data set that records the frequency of spillages from lorry loads. Over a two-year period I recorded the load, date, time, number of bottles on the load, driver, weather, dusk/dawn, etc. I want to identify risk factors for when spillages are most likely to occur. However, spillages happen so infrequently that most of my data is 0. For example, I have 29,000 rows of data, and approx. 28,000 of these loads have no associated spillages. I have calculated mean spillages per load and mean spillages per million bottles, but with so many zero values it is hard to know whether an ANOVA is appropriate. Could someone advise on the best type of model or test to use? I just want to determine whether any specific risk factor is associated with the number of spillages if and when they occur.
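A minimal sketch of the two summary rates described above, assuming a hypothetical record layout that keeps only the spill count and the bottle count per load (the real data set also has date, time, driver, weather, etc.). The per-million-bottles figure is computed as total spills over total bottles, which weights each load by its size; averaging per-load ratios instead would give a different number.

```python
from dataclasses import dataclass


@dataclass
class Load:
    # Hypothetical record layout: only the two fields needed for the rates.
    spills: int   # number of spillages recorded for this load
    bottles: int  # number of bottles on the load


def summarize(loads):
    """Mean spills per load, spills per million bottles, and share of zero-spill loads."""
    n = len(loads)
    total_spills = sum(ld.spills for ld in loads)
    total_bottles = sum(ld.bottles for ld in loads)
    zero_loads = sum(1 for ld in loads if ld.spills == 0)
    return {
        "spills_per_load": total_spills / n,
        # Ratio of totals, so large loads carry more weight than small ones.
        "spills_per_million_bottles": 1e6 * total_spills / total_bottles,
        "zero_share": zero_loads / n,
    }
```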

Thanks,

Abbie

statman · Jun 23, 2025 8:19 AM

First, welcome to the community. Developing causal relationships for rare events is very challenging. Frequency of spills is not a very efficient response variable, and it is not very discriminating in terms of understanding causation: it is an aggregate of many possible failure modes/mechanisms.

These are challenging to investigate with experimental design (while you can likely make the lorry spill, that may not be why it is currently spilling). My bias would be to use directed sampling (components of variation and stability studies to study the process as is). There is also the question of whether these are actually special cause events as defined by Deming. If so, his advice is to react specifically and locally to the events rather than spend time and effort trying to predict them (common cause action). Or are they actually higher-order effects (e.g., >4th-order interaction effects), where it takes a combination of factors to create the event?
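The special cause vs. common cause question is usually examined with a control chart. A minimal sketch of a u-chart (the Shewhart chart for counts with varying exposure), assuming spills are totalled per period (e.g., per month) and exposure is measured in millions of bottles shipped in that period; both the period and the unit are assumptions, not part of the original data description. A period outside the limits would suggest a special cause worth reacting to locally; periods within them are consistent with a stable (common cause) spill rate.

```python
import math


def u_chart(spills, exposure):
    """u-chart for count data with varying exposure.

    spills[i]   : spillages counted in period i (assumed: one month)
    exposure[i] : inspection units in period i (assumed: millions of bottles)
    Returns the centre line and a list of (u_i, lcl_i, ucl_i, signal_i).
    """
    u_bar = sum(spills) / sum(exposure)  # pooled rate = centre line
    rows = []
    for c, n in zip(spills, exposure):
        u = c / n
        half_width = 3 * math.sqrt(u_bar / n)  # 3-sigma limits under a Poisson assumption
        lcl = max(0.0, u_bar - half_width)     # a rate cannot be negative
        ucl = u_bar + half_width
        rows.append((u, lcl, ucl, u > ucl or u < lcl))
    return u_bar, rows
```

Note that the pooled centre line here includes any out-of-control periods; in practice one might recompute the limits after investigating and excluding confirmed special causes.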

How confident are you in the existing data set? Are there spillages that are unrecorded (perhaps they were small or corrected)? I don't know what the load is, but your description suggests some sort of bottle. If the bottles fall out and do not break, is it a spill? Does the size of the spill matter?

Here are things you can do:

1. First, start with hypotheses as to what may cause these events and why. For example: spills occur because the lorry becomes unstable due to uneven loading. You may be able to do some scatter plots/correlation from the existing data set, but that would just be to stimulate your mind to develop hypotheses that would need to be investigated with future data.

2. Develop an exhaustive list of factors. Process mapping and FMEA often help with this. Make sure you actually watch the process of loading, moving and unloading the lorry. It is likely your current data set does not contain information about all of those x's.

3. Are there other response variables that could be measured, that might correlate with the phenomenon and provide better insight into failures (e.g., number of situations that might increase the chance of a spill, location of spill, direction of spill from the lorry)? For example, if you hypothesize an effect of the weight balance of the load, perhaps measure the weight distribution within and between lorry loads. If you suspect road conditions, perhaps add accelerometers to the lorry; if speed, speedometers, etc.

4. Make the lorry robust to conditions you hypothesize increase the likelihood of spillage (e.g., suspension that absorbs changing road conditions). This can be done with experimentation as long as the noise in the process is included in the study. Again, you must weigh the resources necessary and determine whether common cause action is indeed cost effective. Do you need to be robust to a rare event?
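For the exploratory step in item 1 (plots and correlations used only to stimulate hypotheses), a minimal sketch that tabulates spill counts and rates by the levels of one recorded factor. The record keys ("weather", "spills", "bottles") are hypothetical. The raw spill count is reported next to each rate because a rate built on only two or three spills is very noisy, and any apparent difference is a hypothesis to test with future data, not a conclusion.

```python
from collections import defaultdict


def rate_by_level(records, factor):
    """Spill count and spills per million bottles for each level of one factor.

    Exploratory only: meant to suggest hypotheses, not to confirm them.
    """
    spills = defaultdict(int)
    bottles = defaultdict(int)
    for rec in records:
        level = rec[factor]
        spills[level] += rec["spills"]
        bottles[level] += rec["bottles"]
    # (count, rate) per level, so sparse levels are easy to spot.
    return {lvl: (spills[lvl], 1e6 * spills[lvl] / bottles[lvl]) for lvl in bottles}


# Hypothetical example records (not real data).
records = [
    {"weather": "dry", "spills": 0, "bottles": 50000},
    {"weather": "dry", "spills": 1, "bottles": 50000},
    {"weather": "wet", "spills": 2, "bottles": 40000},
    {"weather": "wet", "spills": 0, "bottles": 60000},
]
table = rate_by_level(records, "weather")
```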

"All models are wrong, but some are useful" G.E.P. Box

P_Bartell · Jun 23, 2025 04:18 PM

Just to add a somewhat different perspective to what @statman recommends (which I also endorse): you might want to try zero-inflated Poisson regression as a fitting personality in the JMP Fit Model platform.
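To illustrate the idea outside JMP, a minimal sketch (not JMP's implementation) that fits an intercept-only zero-inflated Poisson (ZIP) model by EM: a load is a "structural zero" with probability pi, otherwise its spill count is Poisson with mean lam. It also includes a quick check of whether the zeros exceed what a plain Poisson predicts, because many zeros alone do not imply zero inflation. As a rough illustration, assuming (hypothetically) that most of the ~1,000 non-zero loads had a single spill, the mean is about 0.034 spills per load, and a plain Poisson already predicts about 96.6% zeros, close to the observed 28,000/29,000.

```python
import math


def poisson_zero_check(counts):
    """Observed share of zeros vs. the share a plain Poisson with the same mean predicts."""
    mean = sum(counts) / len(counts)
    observed = sum(1 for y in counts if y == 0) / len(counts)
    expected = math.exp(-mean)  # P(Y = 0) under Poisson(mean)
    return observed, expected


def fit_zip(counts, iters=500, tol=1e-10):
    """Intercept-only zero-inflated Poisson fit by EM.

    Model: Y = 0 with probability pi (structural zero), otherwise Y ~ Poisson(lam).
    Returns (pi, lam). No covariates: a sketch of the model, not a regression.
    """
    n = len(counts)
    zeros = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5, max(total / n, 1e-6)  # starting values
    for _ in range(iters):
        # E-step: posterior probability that an observed zero is structural.
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        new_pi = zeros * w / n
        new_lam = total / (n - zeros * w)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam
```

A risk-factor analysis would extend this to a regression, with covariates (weather, driver, time of day, etc.) in the Poisson mean and possibly in the zero probability, and typically log(bottles on the load) as an offset so that loads of different size are compared per bottle.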


What type of analysis for an infrequently occurring event