bns · Oct 7, 2022 12:38 PM

Hi JMP Community,

I am relatively new to statistics and am trying to analyze a dataset in JMP. I have a few questions.

The data:

One measured patient value
~6 equations aimed at estimating the measured patient value
Demographics of the patients

The aim:

Determine correlations/performance between estimations and measured
Determine how much variation exists from one estimation to the next
Determine which estimating equation performs best for certain patient groups/demographics (i.e. older patients vs younger, different comorbidities, different BMI)

My questions:

What tests would you recommend for each of these aims?
- I was thinking of just doing y by x to determine correlations. Would I then do y by x again, split by demographics, to determine performance for certain characteristics?
What is the best way to determine how much variation exists from one estimation to the next for each person?

Thanks, and let me know if you have any other ideas for this and/or questions.

Ben

peng_liu · Oct 7, 2022 04:35 PM

There are quite a few statistical concepts and techniques to grasp for this problem, from what I can tell, and some things are not clear to me. Here are some thoughts.

I recognize that you have a prediction problem at hand, i.e. you want to predict the measured patient value (Y).
You mentioned 6 equations, but you did not mention "predictors". "Predictor" is jargon for an X variable used to predict Y. You may have more than one X, so I guess each equation in your case is a formula of one or more X's.
The type of your Y points you in one direction rather than another. If your Y is continuous, e.g. a measure like weight, look for regression tools like least squares. If your Y is binary, e.g. good/bad, look for tools like logistic regression. In JMP, some platforms, e.g. "Fit Y by X" or "Generalized Regression", can tell which direction to go automatically, so you don't have to worry too much about Y's type. For other platforms, you need to figure out which types of Y they support.
Your equations also point you in one direction rather than another, depending on whether each equation is linear in its parameters. That forks your path into either linear models or nonlinear models.
After you have determined which type of model you can apply to your data and have fit the model, you need to assess whether the model is adequate and whether any assumptions are violated. Look for topics related to model diagnostics.
Assuming the models behave well and you decide to compare different models (equations), you can assess their relative performance with what is called a "model selection criterion". There are quite a few criteria: R-square for regression, and information criteria for many other situations. Or, as you mentioned, you can use the correlation between the measured and the estimated values. I believe that is a very close proxy to the formal methodologies that I mentioned.
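To make the comparison idea concrete, here is a minimal sketch in Python rather than JMP. The data and the equation names `eq_A` and `eq_B` are made up for illustration, and the helper functions are hypothetical, not part of any JMP platform. It reports the correlation alongside the mean bias and RMSE of each equation, since a correlation by itself does not change when an estimate is shifted by a constant.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def agreement(measured, estimated):
    """Correlation, mean bias (estimated - measured), and RMSE."""
    errors = [e - m for m, e in zip(measured, estimated)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = sqrt(sum(d * d for d in errors) / n)
    return {"r": pearson_r(measured, estimated), "bias": bias, "rmse": rmse}

# Made-up example data: one measured value and two estimating equations.
measured = [50.0, 60.0, 70.0, 80.0, 90.0]
estimates = {
    "eq_A": [52.0, 59.0, 71.0, 79.0, 92.0],    # small, unsystematic errors
    "eq_B": [60.0, 70.0, 80.0, 90.0, 100.0],   # always 10 units too high
}

results = {name: agreement(measured, est) for name, est in estimates.items()}
for name, stats in results.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In this toy data `eq_B` is perfectly correlated with the measured value but is off by 10 units for every patient, so it may be worth looking at bias and RMSE together with the correlation.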
I don't understand your second question. In particular, what is your definition of "variation from one estimation to the next"?
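One possible reading, which is only an assumption on my part, is the spread among the different equations' estimates for the same patient. Under that reading, a minimal Python sketch (made-up data, hypothetical patient IDs, three equations instead of six for brevity) could summarize each patient with the standard deviation and range of their estimates:

```python
from statistics import mean, stdev

# Made-up data: each patient's estimates from several equations.
estimates_by_patient = {
    "patient_1": [58.0, 60.0, 62.0],
    "patient_2": [40.0, 55.0, 70.0],
}

spread = {}
for patient_id, ests in estimates_by_patient.items():
    spread[patient_id] = {
        "mean": mean(ests),
        "sd": stdev(ests),               # sample standard deviation across equations
        "range": max(ests) - min(ests),  # largest minus smallest estimate
    }

for patient_id, stats in spread.items():
    print(patient_id, stats)
```

If you mean something else, e.g. the difference between specific pairs of equations, the summary would need to change accordingly.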
For your last question, I guess that you need to fit the model by group and then compare performance group by group. But if your 6 equations have demographics as predictors, then you probably have a variable selection problem or a significance testing problem. I cannot tell for sure without knowing more about your equations and data.
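The group-by-group comparison can be sketched as follows, again in Python with made-up data. The column names (`group`, `measured`, `eq_A`, `eq_B`) and the choice of RMSE as the performance measure are assumptions for illustration only; any other criterion could be substituted.

```python
from collections import defaultdict
from math import sqrt

# Made-up rows: one per patient, with a demographic group, the measured
# value, and the estimate from each equation.
rows = [
    {"group": "younger", "measured": 100.0, "eq_A": 98.0,  "eq_B": 90.0},
    {"group": "younger", "measured": 110.0, "eq_A": 112.0, "eq_B": 101.0},
    {"group": "older",   "measured": 60.0,  "eq_A": 70.0,  "eq_B": 61.0},
    {"group": "older",   "measured": 55.0,  "eq_A": 66.0,  "eq_B": 53.0},
]
equations = ["eq_A", "eq_B"]

# Collect squared errors for every (group, equation) pair.
squared_errors = defaultdict(list)
for row in rows:
    for eq in equations:
        squared_errors[(row["group"], eq)].append((row[eq] - row["measured"]) ** 2)

# RMSE per (group, equation).
rmse = {key: sqrt(sum(vals) / len(vals)) for key, vals in squared_errors.items()}

# For each group, keep the equation with the smallest RMSE.
best = {}
for (group, eq), value in rmse.items():
    if group not in best or value < rmse[(group, best[group])]:
        best[group] = eq

print(best)
```

With real data the groups would be small in some cells, so differences in RMSE between equations should be read with that in mind.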

New to Stats: What tests and how to in JMP?