
How to test differences in F1 scores?

dfalessi_calpol

Community Trekker

Joined: Mar 15, 2016

Hi all,

I've evaluated the performance of several classification models (e.g., BayesNet, Random Forest) on several datasets by measuring the F1 score (F1 score - Wikipedia, the free encyclopedia) achieved with a ten-fold cross-validation. My data hence has the following columns: classification model, dataset, F1 score.
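For concreteness, here is a minimal sketch (in Python/scikit-learn rather than JMP) of how such a table can be assembled; the datasets and classifiers below are placeholders, not my actual ones, and GaussianNB merely stands in for BayesNet.

```python
# Build a table with columns: model, dataset, F1 score (mean over 10 folds).
import pandas as pd
from sklearn.datasets import load_breast_cancer, make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Placeholder datasets; in practice these would be the real project datasets.
datasets = {
    "breast_cancer": load_breast_cancer(return_X_y=True),
    "synthetic": make_classification(n_samples=500, n_features=20, random_state=0),
}

# Placeholder classifiers; GaussianNB is only a stand-in for BayesNet.
models = {
    "NaiveBayes": GaussianNB(),
    "RandomForest": RandomForestClassifier(random_state=0),
}

rows = []
for dataset_name, (X, y) in datasets.items():
    for model_name, model in models.items():
        # Mean F1 score over a ten-fold cross-validation
        f1 = cross_val_score(model, X, y, cv=10, scoring="f1").mean()
        rows.append({"model": model_name, "dataset": dataset_name, "f1": f1})

results = pd.DataFrame(rows)  # columns: model, dataset, f1
print(results)
```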

Now I want to test if:

1) There is a statistically significant difference among predictors.

2) There is a statistically significant difference between the best predictor and all the others.

My approach would be to do:

-Fit Y by X, with Y = F1 score and X = classification model

-Nonparametric Wilcoxon test: this will answer point 1.

-Nonparametric multiple comparisons, Wilcoxon method: this will answer point 2.

However, I see tests other than Wilcoxon, and I wonder whether what I am doing is correct.
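For reference, this is roughly what I have in mind, sketched outside JMP with scipy's Kruskal-Wallis and pairwise Mann-Whitney U tests as stand-ins for the Wilcoxon options in Fit Y by X (assuming the `results` table from the sketch above):

```python
# Point 1: overall difference among models; Point 2: pairwise comparisons.
from itertools import combinations

from scipy.stats import kruskal, mannwhitneyu

# F1 scores grouped by classification model
groups = {name: grp["f1"].values for name, grp in results.groupby("model")}

# Overall test for a difference among models (Kruskal-Wallis rank-sum test)
stat, p_overall = kruskal(*groups.values())
print(f"Kruskal-Wallis: H={stat:.3f}, p={p_overall:.4f}")

# Pairwise comparisons, here with a simple Bonferroni correction
pairs = list(combinations(groups, 2))
for a, b in pairs:
    _, p = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
    print(f"{a} vs {b}: raw p={p:.4f}, Bonferroni p={min(1.0, p * len(pairs)):.4f}")
```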

Thanks for your help,

Davide

1 REPLY
michael_jmp

Staff

Joined: Jun 23, 2011

Hello Davide,

I think your approach is reasonable.

Another method you might look at would be Oneway > Compare Means > With Best, Hsu MCB. This is a multiple comparison procedure that tests whether each level of the X variable is significantly different from the "best" level. The output will show p-values for comparing all levels with the max as well as with the min.

You can find some information on the option in the Fit Y by X platform in JMP here: Compare Means
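If you ever want a rough analogue outside JMP, something in the spirit of "compare with the best" (though not Hsu's MCB procedure itself) could be sketched like this, assuming the `results` table from your first sketch: compare each model against the best-performing one with Mann-Whitney U tests and a Holm correction.

```python
# Not Hsu's MCB; a rough "compare each model with the best" check instead.
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

groups = {name: grp["f1"].values for name, grp in results.groupby("model")}
best = max(groups, key=lambda name: groups[name].mean())  # "best" = highest mean F1

others = [name for name in groups if name != best]
raw_p = [
    mannwhitneyu(groups[best], groups[name], alternative="two-sided").pvalue
    for name in others
]
_, adj_p, _, _ = multipletests(raw_p, method="holm")  # Holm-adjusted p-values

for name, p in zip(others, adj_p):
    print(f"{best} vs {name}: Holm-adjusted p={p:.4f}")
```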

Best,
Michael

Michael Crotty
Sr Statistical Writer
JMP Development