I've evaluated the performance of several classification models (e.g., BayesNet, Random Forest) on several datasets by measuring the F1 score achieved in ten-fold cross-validation. My data therefore has the following columns: classification model, dataset, F1 score.
Now I want to test if:
1) There is a statistically significant difference among the predictors.
2) There is a statistically significant difference between the best predictor and each of the others.
My approach would be to do:
- Fit Y by X, with Y = F1 score and X = classification model
- Nonparametric Wilcoxon test: this will answer point 1.
- Nonparametric multiple comparisons using the Wilcoxon test: this will answer point 2.
However, I see tests other than Wilcoxon, and I wonder whether what I am doing is correct.
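For anyone who wants to check point 1 outside JMP, here is a minimal Python sketch. With more than two groups, the Wilcoxon rank-sum test generalizes to the Kruskal-Wallis test, so the sketch computes the Kruskal-Wallis H statistic. Several things here are assumptions and not part of the original setup: the F1 values are made up, the function names are hypothetical, the p-value comes from a permutation test instead of the usual chi-square approximation, and the scores are treated as independent groups per model (the dataset column is ignored, as in a plain Fit Y by X with model as X).

```python
import random


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a dict {model: [scores]}, without tie correction.

    The tie correction is a constant factor for a fixed pooled sample,
    so it does not change the permutation p-value computed below.
    """
    labels = list(groups)
    pooled = [v for g in labels for v in groups[g]]
    ranks = midranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in labels:
        size = len(groups[g])
        rank_sum = sum(ranks[start:start + size])
        total += rank_sum ** 2 / size
        start += size
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of random relabelings whose H is at least the observed H."""
    rng = random.Random(seed)
    labels = list(groups)
    sizes = [len(groups[g]) for g in labels]
    pooled = [v for g in labels for v in groups[g]]
    observed = kruskal_wallis_h(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = {}, 0
        for g, size in zip(labels, sizes):
            shuffled[g] = pooled[start:start + size]
            start += size
        if kruskal_wallis_h(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical F1 scores, one value per dataset for each model.
    f1 = {
        "BayesNet": [0.71, 0.68, 0.74, 0.70, 0.66],
        "RandomForest": [0.82, 0.79, 0.85, 0.80, 0.78],
        "OtherModel": [0.73, 0.75, 0.69, 0.72, 0.77],
    }
    print("H =", kruskal_wallis_h(f1))
    print("permutation p-value =", permutation_p_value(f1))
```

A small p-value suggests that at least one model's F1 distribution differs from the others; it does not say which one, which is what point 2 is about.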
Another method you might look at would be Oneway > Compare Means > With Best, Hsu MCB. This is a multiple comparison procedure that tests whether each level of the X variable is significantly different from the "best" level. The output will show p-values for comparing all levels with the max as well as with the min.
You can find some information on the option in the Fit Y by X platform in JMP here: Compare Means
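To make the "compare each level with the best" idea concrete, here is a rough Python sketch. Note that this is not Hsu's MCB: it swaps in pairwise Wilcoxon rank-sum (Mann-Whitney) comparisons between the best model and each other model, with permutation p-values and a Bonferroni adjustment. The function names are hypothetical, "best" is assumed to mean the highest mean F1, and the scores are treated as independent groups. Because the best model is picked from the same data that is then tested, the resulting p-values should be read with some caution.

```python
import random
from statistics import mean


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a beats b; ties count one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def rank_sum_permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the rank-sum (Mann-Whitney) statistic."""
    rng = random.Random(seed)
    center = len(a) * len(b) / 2  # expected U when both groups are alike
    observed = abs(mann_whitney_u(a, b) - center)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        u = mann_whitney_u(pooled[:len(a)], pooled[len(a):])
        if abs(u - center) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def compare_with_best(groups, alpha=0.05):
    """Compare the model with the highest mean F1 against every other model.

    Returns (best_label, {other_label: (adjusted_p, is_significant)}).
    """
    best = max(groups, key=lambda g: mean(groups[g]))
    others = [g for g in groups if g != best]
    results = {}
    for g in others:
        p = rank_sum_permutation_p(groups[best], groups[g])
        p_adj = min(1.0, p * len(others))  # Bonferroni adjustment
        results[g] = (p_adj, p_adj < alpha)
    return best, results


if __name__ == "__main__":
    # Hypothetical F1 scores, one value per dataset for each model.
    f1 = {
        "BayesNet": [0.71, 0.68, 0.74, 0.70, 0.66],
        "RandomForest": [0.82, 0.79, 0.85, 0.80, 0.78],
        "OtherModel": [0.73, 0.75, 0.69, 0.72, 0.77],
    }
    best, results = compare_with_best(f1)
    print("best model:", best)
    for model, (p_adj, significant) in results.items():
        print(model, "adjusted p =", p_adj, "significant:", significant)
```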
Michael Crotty Sr Statistical Writer JMP Development