@FR60 wrote:
Ciao Dan,
thank you very much for your message.
I have a few comments....
You could just use stepwise regression (a classic).
Can it manipulate more than 1K predictors?
Yes. If you have the memory on your machine to handle large problems. I just ran a simple example with 10,000 observations and 2,000 predictors. Stepwise worked fine.
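For readers outside JMP, the forward-stepwise idea is easy to sketch in plain NumPy: at each step, add the predictor that most reduces the residual sum of squares. This is a toy illustration (the data, sizes, and `forward_stepwise` helper are all made up here), not JMP's actual stepwise implementation:

```python
# Minimal forward-stepwise sketch (illustrative, not JMP's algorithm):
# at each step, add the column that most reduces the residual sum of squares.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50                      # toy size; the same loop scales to more columns
X = rng.standard_normal((n, p))
y = 3 * X[:, 4] - 2 * X[:, 17] + rng.standard_normal(n)  # only cols 4 and 17 matter

def forward_stepwise(X, y, n_select):
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        def rss(cols):
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            return np.sum((y - X[:, cols] @ beta) ** 2)
        best = min(remaining, key=lambda j: rss(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

print(forward_stepwise(X, y, 2))    # should recover columns 4 and 17
```

In practice you would also need a stopping rule (p-value, AIC/BIC, or cross-validation) instead of a fixed `n_select`.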
You could use variable clustering.
Can you give me more details on this technique, and on how to choose important predictors through clustering?
This would be an approach that is similar to principal components analysis, but instead of you looking at loading plots to see similar variables, JMP will cluster them for you automatically. You can then choose the variable that is most representative of the cluster or even create the "typical" variable for the cluster. This will help you avoid the "redundant information" you often see with many variables.
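One way to sketch that idea outside JMP (this is an assumed approach, not JMP's exact variable-clustering algorithm): cluster the columns by correlation, then keep the one variable per cluster that is most correlated with the rest of its cluster:

```python
# Rough variable-clustering sketch: group redundant columns by correlation,
# then keep one representative per cluster (not JMP's exact algorithm).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
n = 300
base = rng.standard_normal((n, 3))              # three underlying signals
# nine predictors: three noisy copies of each signal (redundant groups)
X = np.hstack([base[:, [k]] + 0.1 * rng.standard_normal((n, 3)) for k in range(3)])

corr = np.corrcoef(X, rowvar=False)
dist = 1 - np.abs(corr)                         # similar variables -> small distance
np.fill_diagonal(dist, 0.0)
labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                  t=3, criterion="maxclust")

# representative = the member most correlated with the rest of its cluster
reps = []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    scores = np.abs(corr[np.ix_(members, members)]).mean(axis=1)
    reps.append(int(members[np.argmax(scores)]))
print(labels, reps)
```

The three noisy copies of each signal land in the same cluster, and you carry one column per cluster forward instead of all nine, which is exactly the "remove redundant information" effect described above.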
You could use Partition.
Typically in our fab we use it after removing unimportant predictors (consider that we generally have more than a thousand predictors and a lot of noise in our data ....)
There are many ways to use Partition, but think of that very first split. The approach needs to determine which split contains the most information. That is variable selection. You could also use a trick that Dick DeVeaux calls "shaking the tree": split many, many times, then look at the column contributions of the variables to identify the most important ones. So many ways to use this flexible platform!
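The "split many times, then rank the columns" idea can be imitated with a random forest, whose impurity-based importances play the role of JMP's column contributions (this substitution and the toy data are my own; the principle of counting each column's contribution across many splits is the same):

```python
# Sketch of "shaking the tree": grow many splits, then rank columns by how
# much they contributed, using a random forest in place of JMP's Partition.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n, p = 500, 40
X = rng.standard_normal((n, p))
y = 4 * X[:, 7] + 2 * X[:, 21] + rng.standard_normal(n)  # two informative columns

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = np.argsort(forest.feature_importances_)[::-1]
print(ranked[:5])                 # informative columns rise to the top
```

The ranked list is then a shortlist of candidate predictors to carry into a smaller, interpretable model.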
You could use PCR (principal component regression).
We know this technique, but we lose the meaning of the original predictors, so we prefer not to use it.
Understood. I am not a big fan of PCR for this reason. However, by looking at the loadings of the variables for only the significant principal components, you could possibly identify the original variables that are important. Use those important original variables to start building your model. There is nothing that says you must stick with the principal components.
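A small sketch of that loadings trick (toy data and thresholds are my own, and this is one illustrative recipe rather than a definitive one): fit a PCA, then flag the original columns with the largest absolute loadings on the dominant component:

```python
# Hedged sketch: read PCA loadings back to the original variables, then
# model with those variables instead of the components themselves.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n = 400
signal = rng.standard_normal((n, 1))
X = np.hstack([signal + 0.05 * rng.standard_normal((n, 4)),  # cols 0-3 share a factor
               rng.standard_normal((n, 6))])                 # cols 4-9 pure noise

pca = PCA().fit(X)
loadings = pca.components_[0]             # loadings on the leading component
important = np.argsort(np.abs(loadings))[::-1][:4]
print(sorted(important.tolist()))         # the four correlated columns
```

Having identified those original columns, you would build the regression on them directly, keeping the interpretability that plain PCR gives up.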
There are graph-based methods: normal plots, Pareto plots, Bayes plots.
I don't know how to do feature selection with graphs. Sorry.
Nothing to be sorry about. That is why we are creating the class. People are often unaware of these tools.
There is a new course being created right now called "JMP Software: Finding Important Predictors" that addresses this issue and covers these techniques, using both JMP and JMP Pro. This two day class will likely be available for delivery at customer locations starting in April.
This is great news. I hope it will be available in Italy too. If so, I will certainly attend.
All of the classes that we create at SAS in the U.S. are available to the international SAS offices, too. If they do not have an instructor that knows the topic, they can request one from another region that does have the skill set. Just ask your local SAS office for the training!
Rgds. Felice