petertrickery
Level II

How to figure out the most significant factor in a formula calculation?

Hey everyone,

 

For example, I have a series of calculated numbers F, each determined by 4 variables (A, B, C, D) via the formula F=A/B*C/D. Is there a helpful toolkit in JMP that I can use to figure out which variable among A, B, C, D has the most significant impact on the final result F?

 

1 ACCEPTED SOLUTION

Re: How to figure out the most significant factor in a formula calculation?

Statistics are about understanding true relationships from data that has variability. In this case, your functional form is known, so you are really asking a mathematical question that will depend greatly on the ranges of your factors. 

 

Before I give you some approaches to understanding the "important" factors, I will say that I have some difficulty in understanding your formula. You changed it in the last post, but more importantly, none of the results in the F column seem to match the formula. So I am not really sure what your formula is, but I will go with the original post's A/B*C/D (which with parentheses is really this: ((A/B)*C)/D).

 

Regardless of the formula, once you have it entered into a JMP table, then you can use Graph > Profiler. Put the formula column in the Prediction Formula spot.  Here is an example based on the original equation.

Dan_Obermiller_0-1733880385983.png

This shows that A and B are the most important. But what is the actual ordering? We can get that by using a simulation. Click on the Prediction Profiler red triangle and choose Assess Variable Importance > Independent Uniform Inputs (or whichever of the choices best fits your situation; see the JMP manual for descriptions of all of them).

Dan_Obermiller_1-1733880582182.png

This report shows the rank ordering of the factors. I think this may be a good way for you to assess the factor importance. However, please keep in mind that there are limitations to this approach and several assumptions that are being made. Be sure to read all of the information in the JMP manuals to understand the approach.
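For readers who want to see the idea behind this kind of simulation outside JMP, here is a minimal Python sketch. It is not JMP's implementation: it uses a simple "pick-freeze" Monte Carlo estimate of first-order (main effect) variance shares, with independent uniform inputs. The ranges are an assumption, taken as the minimum and maximum of each column in the four rows posted in this thread, and the sample size and seed are arbitrary.

```python
import random


def f(a, b, c, d):
    # Formula from the original post, with explicit parentheses.
    return ((a / b) * c) / d


# ASSUMPTION: each range is the min and max of that column in the
# four posted rows. Substitute ranges that are realistic for your process.
RANGES = {
    "A": (0.131, 0.240),
    "B": (0.117, 0.177),
    "C": (25.012, 25.923),
    "D": (12.033, 12.764),
}
NAMES = list(RANGES)


def draw(rng):
    # One point with every factor drawn independently and uniformly.
    return [rng.uniform(*RANGES[name]) for name in NAMES]


def main_effect_indices(n=20000, seed=1):
    rng = random.Random(seed)
    base = [draw(rng) for _ in range(n)]
    other = [draw(rng) for _ in range(n)]

    y = [f(*x) for x in base]
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n

    indices = {}
    for i, name in enumerate(NAMES):
        # Keep factor i from the base sample and redraw all the others.
        y_i = []
        for xb, xo in zip(base, other):
            mixed = [xb[j] if j == i else xo[j] for j in range(len(NAMES))]
            y_i.append(f(*mixed))
        mean_i = sum(y_i) / n
        # The covariance of the two outputs estimates the variance
        # explained by factor i alone.
        cov = sum((u - mean_y) * (v - mean_i) for u, v in zip(y, y_i)) / n
        indices[name] = cov / var_y
    return indices


indices = main_effect_indices()
ranking = sorted(indices, key=indices.get, reverse=True)
for name in ranking:
    print(f"{name}: {indices[name]:.3f}")
```

The estimates for factors with very small shares are noisy at this sample size, so only the ordering of the dominant factors should be trusted. Changing the assumed ranges can change the ordering, which is the point made above about the result depending on the ranges of the factors.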

Dan Obermiller


9 REPLIES 9
statman
Super User

Re: How to figure out the most significant factor in a formula calculation?

First welcome to the community.  To be honest, I'm not really sure what you are asking.  Where did you get the formula?  Is it theoretical or empirically derived?  If you want to understand the contributions of each factor and perhaps interactions between them, you should design an experiment.

"All models are wrong, some are useful" G.E.P. Box
petertrickery
Level II

Re: How to figure out the most significant factor in a formula calculation?

Hey Statman,

 

Thanks for your reply. Let me specify it a little bit. The Factors A-D represents 4 categories of data sets we generated in lab, and then we applied a theoretical formula A/B*C/D to generate our target value F.  

dlehman1
Level V

Re: How to figure out the most significant factor in a formula calculation?

This seems backwards to me.  If A-D is data generated in a lab, then you can fit a variety of models designed to predict F.  Whether or not they come close to the A/B*C/D formula remains to be seen.  Most of those predictive models will also provide information about the relative importance of those 4 factors.  If you insist on that particular function, then I'm not sure what the meaning of relative importance is - the function depends on all 4 factors in exactly the way the function is specified.  I think the relative importance is an empirical matter, so it should not be derived from a preset formula, but instead from an analysis of the data.

statman
Super User

Re: How to figure out the most significant factor in a formula calculation?

Sorry, I'm a little slow.  What do you mean by "categories of data"?  

 

IMHO, theoretical models typically are built from multiple iterations of the inductive/deductive cycle (aka, scientific method).  The models are, of course, built from the inference space they were "introduced to" over the course of those iterations.  As new techniques, materials, energy sources, etc. evolve, those theoretical models (you can tell they are theoretical as they include no error term or, said another way, they can't account for variability not encountered when they were developed) may not be as useful, but are useful for hypothesis generation.  My philosophy is to never stop iterating, constantly expanding the consistency and predictability of the model over an extremely large inference space (what I call robust design).

petertrickery
Level II

Re: How to figure out the most significant factor in a formula calculation?

 Thanks for your reply! I attached a set of data examples below, and hope it helps illustrate my question more clearly.

Replicate  A      B      C       D       F
1          0.24   0.122  25.791  12.664  0.014377
2          0.237  0.117  25.923  12.764  0.013653
3          0.131  0.163  25.012  12.15   0.010373
4          0.142  0.177  25.157  12.033  0.012022

There are two similar treatments, each consisting of two replicates:

  • Treatment I: Replicates 1 and 2
  • Treatment II: Replicates 3 and 4

For each replicate, there are four individual results corresponding to different categories (A, B, C, and D). The index F is calculated using a specific formula I previously mentioned (F=A*C*D/B). My question is: Is there a statistical analysis that can help identify which category (A, B, C or D), when changed, has the most significant impact on the final index F?

 

I truly appreciate your time and effort. As a beginner in statistics, I find myself with many questions and uncertainties while processing my data. Your guidance means a lot to me!

dlehman1
Level V

Re: How to figure out the most significant factor in a formula calculation?

I remain confused.  From the small example you posted, I was able to figure out that the actual formula for F is (A*B*D)/C, which is different from both of your statements.  Given that this formula exactly matches the data, there is no variability, so there is no need to model anything.  Dan_Obermiller has provided a method to use in this case - which I don't really understand.  I'm not sure what it means to assess the importance of the variables in that formula.  It is true that you can look at how the results of the formula vary over the range of values you provide for the 4 variables, and I guess you can call that assessing their importance.  That appears to be what Dan's method will tell you.  But it still leaves me a bit confused about the purpose of this exercise.  I suppose that if your experiments embody the realistic ranges for the 4 variables, then knowing how each one affects the computation of F sounds like what you are asking for.  But I think you need to be sure that the range of values in the experiments covers the relevant range for your purposes.
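The formula check described above can be reproduced with a short Python sketch. It evaluates the three formulas mentioned in this thread on the four posted rows and reports which one reproduces the posted F values. The tolerance is an assumption, chosen because the posted F values are rounded to six decimals.

```python
# The four posted rows: (A, B, C, D, F).
rows = [
    (0.240, 0.122, 25.791, 12.664, 0.014377),
    (0.237, 0.117, 25.923, 12.764, 0.013653),
    (0.131, 0.163, 25.012, 12.150, 0.010373),
    (0.142, 0.177, 25.157, 12.033, 0.012022),
]

# The three formulas that appear in the thread.
candidates = {
    "A/B*C/D": lambda a, b, c, d: a / b * c / d,
    "A*C*D/B": lambda a, b, c, d: a * c * d / b,
    "(A*B*D)/C": lambda a, b, c, d: (a * b * d) / c,
}

# ASSUMPTION: F is rounded to six decimals, so allow a difference of 1e-6.
TOL = 1e-6


def matches(formula):
    # True only if the formula reproduces F in every row.
    return all(abs(formula(a, b, c, d) - f) < TOL for a, b, c, d, f in rows)


results = {name: matches(fn) for name, fn in candidates.items()}
for name, ok in results.items():
    print(f"{name}: {'matches' if ok else 'does not match'} the posted F")
```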

dlehman1
Level V

Re: How to figure out the most significant factor in a formula calculation?

I almost think it might be better to approach this analytically.  If you take the derivative of F = (A*B*D)/C with respect to each of your 4 factors A, B, C, and D, you get (BD)/C, (AD)/C, -(ABD)/(C*C), and (AB)/C, respectively.  The relative importance of the factors then depends on the ranges of values for the 4 factors.  If you want to visualize these, you can easily create a data table with whatever ranges for those factors you want and graph how F varies over these ranges.  I don't see anything statistical about this exercise, since it is just examining the properties of the formula for F.
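As a rough illustration of the analytic approach, the Python sketch below evaluates the partial derivatives of F = (A*B*D)/C at the mean of the four posted rows and multiplies each by the observed range of its factor. Using the column means as the evaluation point and the min-to-max span as the range are both assumptions made for illustration; with only four rows they may not reflect the ranges that matter in practice.

```python
# The four posted rows: (A, B, C, D).
rows = [
    (0.240, 0.122, 25.791, 12.664),
    (0.237, 0.117, 25.923, 12.764),
    (0.131, 0.163, 25.012, 12.150),
    (0.142, 0.177, 25.157, 12.033),
]
NAMES = ["A", "B", "C", "D"]


def f(a, b, c, d):
    # The formula that matches the posted F column.
    return (a * b * d) / c


def gradient(a, b, c, d):
    # Partial derivatives of F. Note the negative sign and squared C for dF/dC.
    return {
        "A": b * d / c,
        "B": a * d / c,
        "C": -a * b * d / (c * c),
        "D": a * b / c,
    }


columns = dict(zip(NAMES, zip(*rows)))

# ASSUMPTION: evaluate at the column means, and use max - min as the range.
center = {name: sum(vals) / len(vals) for name, vals in columns.items()}
spans = {name: max(vals) - min(vals) for name, vals in columns.items()}

point = [center[name] for name in NAMES]
grad = gradient(*point)

# Approximate change in F when one factor moves across its whole range.
effect = {name: abs(grad[name]) * spans[name] for name in NAMES}
ranking = sorted(effect, key=effect.get, reverse=True)

for name in ranking:
    print(f"{name}: dF/d{name} = {grad[name]:.3e}, range effect = {effect[name]:.3e}")
```

Because F is a pure product and quotient of the factors, a given percentage change in any one factor changes F by about the same percentage (in the opposite direction for C). The ranking therefore comes down to which factor varies most relative to its own size.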

petertrickery
Level II

Re: How to figure out the most significant factor in a formula calculation?

Thank you for the detailed explanation! As Dan mentioned, my question leans more towards a mathematical approach, focusing on the properties of the formula rather than a statistical analysis. My primary goal is to evaluate the individual contribution of each variable in the formula and understand how changes in each one influence the volatility of Index F.

I believe Dan's suggestion regarding effectiveness assessment will be an excellent tool for addressing this objective, and I look forward to using it in my work.