Discussions

JMewborn · Jun 11, 2023 4:26 AM

Is there a maximum number of causes or data points that a Pareto plot can handle? I have a Pareto plot with a little over 5000 causes. When I try to combine the last ~5000 causes, the plot is no longer interactive. I expected it to lag a good amount, but I am not getting any interaction at all.

Josh

SDF1 · Jul 21, 2022 11:27 AM

Hi @JMewborn ,

Are all 5000+ factors really important enough to include in a Pareto plot? Usually you'd show only the causes that account for the top 80% or so. It seems to me that you could use some of the other platforms within JMP (do you have JMP Pro?) to assess variable importance and whether or not they are truly contributing factors. Also, have you compared your factors with a Null Factor to determine whether they contribute more or less than a truly random variable? A Null Factor has more properties than just a random number: it is truly orthogonal to your data set, meaning that it has zero correlation with your data. Hence, if a variable shows up lower on the list than the Null Factor as contributing to the response, there is a high likelihood that the factor doesn't contribute in an important way to the response, but rather covaries with the noise in the response, and it shouldn't be included as a contributing factor.

I'm sure that the lack of interactivity is due to the thousands of factors you're including, and that if you could narrow it down to just the most important ones, you'd get the interactivity back.
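To make the "top 80% or so" idea concrete, here is a minimal sketch in Python (not JSL, and not how JMP's Pareto platform works internally). It ranks causes by count, keeps them until the cumulative share reaches a cutoff, and lumps the rest into a single "Other" bar. The function name `pareto_table`, the `cutoff` parameter, and the example counts are all made up for illustration.

```python
def pareto_table(counts, cutoff=0.80, other_label="Other"):
    """Rank causes by count and collapse the tail into one bar.

    counts: dict mapping cause name -> count (hypothetical input format).
    Returns a list of (cause, count, cumulative_share) rows.
    """
    total = sum(counts.values())
    if total == 0:
        return []

    # Largest causes first, as in a Pareto chart.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

    rows = []
    running = 0  # cumulative count of the causes kept so far
    tail = 0     # combined count of everything past the cutoff
    for cause, n in ranked:
        if running / total < cutoff:
            running += n
            rows.append((cause, n, running / total))
        else:
            tail += n

    if tail:
        rows.append((other_label, tail, 1.0))
    return rows


# Made-up example: five causes, 100 events in total.
example = {"A": 50, "B": 30, "C": 10, "D": 5, "E": 5}
for cause, n, share in pareto_table(example):
    print(f"{cause:>5}  {n:>4}  {share:.0%}")
```

With 5000+ causes, a table like this would usually have far fewer rows than the original, which is the kind of narrowing down I mean.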

Hope this helps.

Good luck!

DS

statman · Jul 21, 2022 9:41 AM

I don't know the answer to your question, but I agree with @SDF1 that you have too many (potential) causes. My guess is that you are mixing levels of hierarchy in the causal structure. My advice is to stay within one level of the cause-and-effect continuum and not mix levels in one Pareto. There is nothing wrong with multiple Pareto charts, each for a different level of the hierarchy. I will try to illustrate with the following continuum:

If you are looking for causes of the loss of the battle, horse, nail and shoe would not be on that Pareto. Rider and others at that hierarchy would be on that Pareto.

Also remember, you are looking for big jumps in the Pareto plot to be able to claim assignability to the cause (vs. simply random variation).
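As a rough illustration of looking for a big jump, the Python sketch below finds the largest drop between consecutive bars once the counts are sorted from largest to smallest. The function name `largest_drop` and the counts are hypothetical, and this is only a screening aid: it does not test whether the drop exceeds what random variation would produce.

```python
def largest_drop(sorted_counts):
    """Return (index, drop) for the biggest fall between consecutive bars.

    sorted_counts: counts already sorted in descending order.
    index i means the drop occurs between bar i and bar i + 1.
    Returns None when there are fewer than two bars.
    """
    drops = [a - b for a, b in zip(sorted_counts, sorted_counts[1:])]
    if not drops:
        return None
    i = max(range(len(drops)), key=drops.__getitem__)
    return i, drops[i]


# Made-up counts: the first three bars stand apart from the rest.
counts = [120, 95, 90, 20, 18, 15]
print(largest_drop(counts))
```

Here the biggest drop falls after the third bar, so the first three causes would be the candidates worth a closer look.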

"All models are wrong, some are useful" G.E.P. Box

P_Bartell · Jul 21, 2022 12:49 PM

In addition to what @SDF1 and @statman have suggested, here's another wild thought for you: are you familiar with Latent Class Analysis? Maybe there is a not-so-visible multivariate correlation among your 5000+ 'causes' that you could summarize, collapsing the dimensionality of the pool of causes by using Latent Class Analysis. Just a thought, and maybe worth a look? I'm not sure of the response speed in LCA with that many 'causes' (it might be lengthy), but it's at least worth a try. Here's a link to the JMP documentation on LCA: Latent Class Analysis in JMP

Pareto Plot Functionality
