
Missing data w/regression testing & calculating significance

Thanks for reading-

I am looking to run multiple regression on my data and to calculate significance of predictability; I have my data imported and I've figured out how to run multiple regression, but my questions are:

* How do I set it up to handle missing data and 'don't know' responses?
The data are 5-point numeric rating scales for x and y.

* How do I calculate whether a variable is a statistically significant predictor? (I am a stats rookie.)

Thank you in advance for any help.


Community Trekker


Jun 23, 2011

You can leave the cell in the data table blank (rows with a blank value are then left out of the fit), or fill it in using one of these approaches:

1. Use the grand average of the existing data
2. Use your predicted value for the missing data (this of course assumes you have predicted the results prior to capturing the actual data)
3. Do a least squares regression on the complete rows and calculate the missing data point from it (run the analysis and save the prediction formula). Note: This method can give misleading results, especially with small data sets.
4. Rerun the missing treatment combination (DOE), or re-collect the missing sampling points. Be aware of potential blocking effects and changes in unit structure.
5. Rerun the entire sampling plan.
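
If it helps to see the first two ideas outside of JMP, here is a minimal Python sketch of leaving cells blank (dropping incomplete rows) versus filling them with an average. The data, the code 9 for 'don't know', and the function names are all made up for illustration, and the column mean is only one reading of "grand average".

```python
from statistics import mean

# Hypothetical survey rows: (x1, x2, y), each rated 1 to 5.
# None marks a blank cell; 9 is an ASSUMED code for "don't know".
DONT_KNOW = 9

rows = [
    (4, 5, 4),
    (2, None, 3),
    (5, 4, 9),
    (3, 3, 3),
    (1, 2, 2),
]


def to_missing(value):
    """Treat blanks and "don't know" codes as missing (None)."""
    return None if value is None or value == DONT_KNOW else value


cleaned = [tuple(to_missing(v) for v in row) for row in rows]

# Option A: leave the cells blank and drop any row that has one.
complete_rows = [row for row in cleaned if None not in row]


# Option B: fill each missing cell with the mean of the observed
# values in the same column.
def impute_column_means(data):
    columns = list(zip(*data))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(m if v is None else v for v, m in zip(row, means))
        for row in data
    ]


imputed_rows = impute_column_means(cleaned)

print("complete rows:", complete_rows)
print("imputed rows: ", imputed_rows)
```

The important step is recoding 'don't know' as missing before doing anything else; otherwise the code would be treated as a real rating and distort the fit.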

What is the response variable? It sounds like a categorical (Likert-type) response.

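Use Fit Model and run Standard Least Squares. Enter your model (the Y response and the X effects). Evaluate the output: the ANOVA table tests the model as a whole, and the p-value reported for each term in the Parameter Estimates table tells you whether that variable is a statistically significant predictor.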
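
To show what sits behind a per-term significance test, here is a rough Python sketch that fits a two-predictor least squares model and computes the t ratio (estimate divided by standard error) for each term. This is not JMP's code; the data and the helper names `solve` and `ols_t_ratios` are invented for illustration, and the sketch stops at the t ratio rather than computing the p-value.

```python
from math import sqrt


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_t_ratios(predictors, response):
    """Return (estimates, standard errors, t ratios), intercept first."""
    design = [[1.0] + list(row) for row in predictors]
    n, p = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, response))
           for i in range(p)]
    beta = solve(xtx, xty)
    residuals = [yi - sum(b * v for b, v in zip(beta, r))
                 for r, yi in zip(design, response)]
    mse = sum(e * e for e in residuals) / (n - p)  # residual mean square
    # Diagonal of inverse(X'X), found by solving against unit vectors.
    inv_diag = [
        solve(xtx, [1.0 if k == i else 0.0 for k in range(p)])[i]
        for i in range(p)
    ]
    se = [sqrt(mse * d) for d in inv_diag]
    return beta, se, [b / s for b, s in zip(beta, se)]


# Made-up ratings with no missing values (8 complete rows).
x1 = [1, 1, 5, 5, 1, 1, 5, 5]
x2 = [1, 5, 1, 5, 1, 5, 1, 5]
y = [1, 2, 4, 5, 2, 1, 5, 4]

beta, se, t_ratios = ols_t_ratios(list(zip(x1, x2)), y)
for name, b, s, t in zip(["Intercept", "x1", "x2"], beta, se, t_ratios):
    print(f"{name:>9}: estimate={b:.3f}  std err={s:.3f}  t ratio={t:.2f}")
```

With these invented numbers there are 8 rows and 3 estimated terms, so 5 residual degrees of freedom, and the two-sided 5% critical value of t is about 2.571. The t ratio for x1 comes out near 6.7, well past that cutoff, while the t ratio for x2 is essentially zero, so only x1 would be called a significant predictor here.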

Remember, the prediction formula is only as good as your knowledge of the population from which the sample was obtained, regardless of statistical significance.