Re: dimensionality reduction question

kjwx109 · Mar 21, 2024 08:40 AM

A colleague is wrestling with the following problem and I thought I would share it here. He has some multivariate data with multiple factors and multiple measured responses for each experimental scenario. The responses differ in their significance to the project: some are more important than others. He would like a way of visualising the data using a smaller number of responses, whilst taking into account some sort of weighting, preferably adjustable, that he would assign to the original responses. What sort of approach is required here, and can JMP help?

dlehman1 · Mar 21, 2024 09:14 AM

I find this an interesting question. The usual interest in dimensionality reduction is in the predictors, and PCA, predictor screening, or clustering (all easily done in JMP) can be used there. But in those cases, the response variable is either given or irrelevant. You seem to be asking about reducing the number of potential response variables. I don't think this can be answered directly from the data. If some response variables are more important than others, that implies there is some overarching response variable (perhaps unmeasured) to which the multiple response variables you have are related. I would think you need to specify how these variables contribute to the overall response.

As an example, I have done a number of analyses of the College Scorecard data. Among the many potential response variables are: earnings X years after graduation, total debt upon graduation, total repayment X years after graduation, total defaults X years after graduation, etc. (all can be broken down by various graduate demographics). The most common item of interest is how college education affects lifetime earnings or financial well-being. That ultimate response variable is not easily measured, while each of the above response variables is measured. What is needed is something to link the measured response variables to the ultimate unmeasured thing of interest. A way to proceed might be to build a theory about how earnings and debt interact over time to produce financial well-being. That theory would provide weightings of the response variables and possibly the form of a model to predict financial well-being.
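
To make the idea of theory-supplied weightings concrete, here is a minimal sketch in Python (not JMP) of a weighted composite response. Everything in it is an assumption for illustration: the function name composite_score, the made-up earnings and debt figures, the 0.7/0.3 weights, and the choice to standardize each response and combine them linearly. A real theory of financial well-being might call for a quite different functional form.

```python
import numpy as np

def composite_score(responses, weights, directions):
    """Combine several measured responses into one weighted index.

    responses  : (n_rows, n_responses) array of measured values
    weights    : non-negative importance of each response (rescaled to sum to 1)
    directions : +1 if larger is better, -1 if smaller is better
    """
    r = np.asarray(responses, dtype=float)
    w = np.asarray(weights, dtype=float)
    d = np.asarray(directions, dtype=float)
    w = w / w.sum()
    # Standardize each column so the weights are not swamped by units.
    z = (r - r.mean(axis=0)) / r.std(axis=0, ddof=1)
    return (z * d) @ w

# Hypothetical data: columns are (earnings, debt), one row per institution.
data = np.array([[52000.0, 21000.0],
                 [61000.0, 30000.0],
                 [45000.0, 15000.0],
                 [70000.0, 42000.0]])

# Assumed weights: earnings matter more than debt; debt counts against.
scores = composite_score(data, weights=[0.7, 0.3], directions=[1, -1])
print(scores)
```

Changing the weights and re-running gives the "adjustable" behaviour asked about, but the weights themselves still have to come from the analyst or the theory, not from the data.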

I don't think this is a task that can be answered from the data without a theory to provide structure to the multiple response variables, unlike the task of choosing/weighting predictor variables (which can be "answered" by analyzing the data). If this is a physical process (where the response variables are various measurements of strength, resilience, reliability, etc.), specifying the theory linking the response variables should be an easier task.
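
If the weights are supplied by the analyst rather than estimated, one possible way to get a low-dimensional picture of the responses is a weighted PCA: standardize each response, scale it by the square root of its weight, and take the leading components. The sketch below is in Python with simulated data, and the function name weighted_pca_scores and the example weights are hypothetical. It is an illustration of the general idea, not a description of what any JMP platform does.

```python
import numpy as np

def weighted_pca_scores(responses, weights, n_components=2):
    """Principal component scores of responses after importance weighting.

    Each standardized response is multiplied by sqrt(weight), so a response
    with a larger weight contributes more variance and pulls the leading
    components toward itself.
    """
    r = np.asarray(responses, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = (r - r.mean(axis=0)) / r.std(axis=0, ddof=1)
    zw = z * np.sqrt(w / w.sum())
    # SVD of the weighted, centered matrix gives the principal components.
    u, s, vt = np.linalg.svd(zw, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Simulated stand-in for 30 experimental runs with 5 measured responses.
rng = np.random.default_rng(0)
responses = rng.normal(size=(30, 5))

# Assumed importance weights, to be adjusted by the analyst.
pc_scores, pc_explained = weighted_pca_scores(responses, [5, 3, 1, 1, 1])
print(pc_scores[:3])
print(pc_explained)
```

The two score columns can then be plotted against each other or against the factors. How to interpret them still depends on whether the chosen weights reflect a defensible view of how the responses relate to the overall goal.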