bns
Level II

New to Stats: What tests and how to in JMP?

Hi JMP Community,

 

I am relatively new to statistics and am trying to analyze a dataset in JMP. I have a few questions.

The data: 

  • One measured patient value
  • ~6 equations aimed at estimating the measured patient value
  • Demographics of the patients

The aim:

  • Determine correlations/performance between the estimated and measured values
  • Determine how much variation exists from one estimation to the next
  • Determine which estimating equation performs best for certain patient groups/demographics (i.e. older patients vs younger, different comorbidities, different BMI)

My questions: 

  • What tests would you recommend for each of these aims?
    • I was thinking of just doing Y by X to determine correlations. Would I just do Y by X and then split by demographics to determine performance for certain characteristics?
  • What is the best way to determine how much variation exists from one estimation to the next for each person?

Thanks, and let me know if you have any other ideas for this and/or questions.

 

Ben

1 ACCEPTED SOLUTION

peng_liu
Staff

Re: New to Stats: What tests and how to in JMP?

There are quite a few statistical concepts and techniques to grasp for this problem, from what I can tell. There are also some things that are not clear to me. Here are some thoughts.

  1. I recognize that you have a prediction problem on your hands, i.e. you want to predict the measured patient value (Y).
  2. You mentioned 6 equations, but you did not mention "predictors". "Predictor" is jargon for an X that is used to predict Y. You may have more than one X, so I guess each equation in your case is a formula of one or more X's.
  3. The type of your Y determines which direction to take. If your Y is continuous, e.g. a measure like weight, look for regression tools like least squares. If your Y is binary, e.g. good/bad, look for tools like logistic regression. In JMP, some platforms can tell automatically which direction to go, e.g. "Fit Y by X" or "Generalized Regression", so you don't have to worry too much about Y's type. Elsewhere, you need to figure out which platforms support which type of Y.
  4. Your equations also determine the direction. It depends on whether each equation is linear in its parameters or not; your path then forks into either linear models or nonlinear models.
  5. After you have determined the type of model that you can apply to your data and have fit the model, you need to assess whether the model is adequate and whether any assumptions are violated. Look for topics related to model diagnostics.
  6. Assuming the model behaves well and you decide to compare different models (equations), there is something called a "model selection criterion", which you can use to compare performance among models. There are quite a few different criteria: R-square for regression, and information criteria for many other situations. Or, as you mentioned, use the correlation between the measured and the estimated values. I believe that is a very close proxy for the formal methodologies that I mentioned.
  7. I don't understand your second question. In particular, what is your definition of "variation from one estimation to the next"?
  8. For your last question, I guess that you need to fit the model by group and then compare performance group by group. But if your 6 equations have demographics as predictors, then you probably have a variable selection problem or a significance testing problem. I cannot tell for sure without knowing more about your equations and data.
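
To make points 6 and 8 a bit more concrete, here is a minimal sketch in Python (not JSL) of comparing estimating equations against the measured value, overall or group by group. Everything in it is an assumption for illustration: the column names `measured`, `eq_a`, `eq_b` and `age_group`, the helper `summarize`, and the made-up numbers are hypothetical and do not come from your data. Besides the correlation, it reports RMSE and mean bias, because an equation that is always off by a constant still has a perfect correlation with the measured value.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rmse(measured, estimated):
    """Root mean squared error of the estimates."""
    return sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured))


def mean_bias(measured, estimated):
    """Average of (estimated - measured); positive means overestimation."""
    return sum(e - m for m, e in zip(measured, estimated)) / len(measured)


def summarize(rows, equations, group_key=None):
    """Return {group: {equation: {"r", "rmse", "bias"}}}.

    With group_key=None, all rows go into a single group named "all".
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key] if group_key else "all"].append(row)

    result = {}
    for group, members in groups.items():
        measured = [r["measured"] for r in members]
        result[group] = {}
        for eq in equations:
            est = [r[eq] for r in members]
            result[group][eq] = {
                "r": pearson_r(measured, est),
                "rmse": rmse(measured, est),
                "bias": mean_bias(measured, est),
            }
    return result


# Hypothetical, made-up data: one measured value and two estimating equations.
rows = [
    {"age_group": "younger", "measured": 10.0, "eq_a": 11.0, "eq_b": 10.0},
    {"age_group": "younger", "measured": 12.0, "eq_a": 13.0, "eq_b": 13.0},
    {"age_group": "younger", "measured": 14.0, "eq_a": 15.0, "eq_b": 13.0},
    {"age_group": "older", "measured": 20.0, "eq_a": 18.0, "eq_b": 20.0},
    {"age_group": "older", "measured": 22.0, "eq_a": 23.0, "eq_b": 22.5},
    {"age_group": "older", "measured": 24.0, "eq_a": 22.0, "eq_b": 24.5},
]

by_group = summarize(rows, ["eq_a", "eq_b"], group_key="age_group")
for group, stats in by_group.items():
    for eq, s in stats.items():
        print(f"{group:8s} {eq}: r={s['r']:.3f} rmse={s['rmse']:.3f} bias={s['bias']:+.3f}")
```

In the made-up younger group, `eq_a` is always exactly 1 unit too high, so its correlation with the measured value is perfect even though its RMSE is larger than that of `eq_b`. That is why I would look at an error measure alongside the correlation when ranking the equations within each group.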
