I have a distribution of an output that contains both positive and negative values. If a value is negative, it belongs to one group; if it is positive, it belongs to another group. How does one find a confidence interval for assigning values to the correct group? For instance, if the negative group's minimum is -0.05, should I consider that value part of the negative group? How do you find a confidence interval so that you can set a criterion for assigning each value to the correct group? Thanks.
I am confused by your question.
I am not sure whether you are dealing with measurement errors, outliers, or truly a mix of two distributions. For multivariate data, cluster distances, Mahalanobis distances, and similar measures are used. For a single variable, I'd look at mixture models.
If you are treating the negative values as outliers, you can compute a tolerance interval, for example one with 95% confidence of covering 95% of the population, or something to that effect.
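As a rough illustration of what "95% confidence of 95% coverage" means numerically, here is a minimal sketch of a two-sided normal tolerance interval. It assumes the reference group is approximately normal and uses Howe's approximation for the tolerance factor k, with the Wilson-Hilferty approximation standing in for the chi-square quantile (so no SciPy is needed). The function names are hypothetical, and the approximation gets rough for very small samples.

```python
import math
from statistics import NormalDist, mean, stdev


def chi2_lower_quantile(prob, df):
    # Wilson-Hilferty approximation to the lower `prob` quantile of chi-square(df)
    z = NormalDist().inv_cdf(prob)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3


def normal_tolerance_interval(data, coverage=0.95, confidence=0.95):
    """Approximate two-sided normal tolerance interval (Howe's method).

    Returns (lower, upper, k), where the interval is mean +/- k * sd.
    Assumes the data are roughly normal; values outside the interval
    would be flagged as unusual relative to this group.
    """
    n = len(data)
    m, s = mean(data), stdev(data)
    # Normal quantile that leaves (1 - coverage) / 2 in each tail
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    # Chi-square value exceeded with probability `confidence`
    chi2 = chi2_lower_quantile(1 - confidence, n - 1)
    k = z * math.sqrt((n - 1) * (1 + 1 / n) / chi2)
    return m - k * s, m + k * s, k
```

For n = 30 this gives k of about 2.55, which agrees with published tables of two-sided tolerance factors for 95% confidence and 95% coverage.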
For a single variable that is a mix of two distributions, and if the mixed data are reasonably smooth, you might try fitting a Normal Mixture with 2 clusters.
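To show roughly what a two-component normal mixture fit does under the hood, here is a minimal expectation-maximization (EM) sketch in plain Python. It is an illustration only and is not claimed to be the algorithm JMP's Normal Mixtures option uses; the function name, the median-split initialization, and the fixed iteration count are all assumptions made for brevity.

```python
from statistics import NormalDist, mean, pstdev


def fit_two_normal_mixture(x, iters=200):
    """Fit a 2-component 1-D normal mixture by EM.

    Returns (weights, means, sds). Component 0 is initialized on the
    lower half of the sorted data, component 1 on the upper half.
    """
    xs = sorted(x)
    half = len(xs) // 2
    lo, hi = xs[:half], xs[half:]
    w = [0.5, 0.5]
    mu = [mean(lo), mean(hi)]
    # Floor the starting sds so NormalDist never gets sigma = 0
    sd = [pstdev(lo) or 1e-3, pstdev(hi) or 1e-3]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        r = []
        for xi in x:
            p = [w[k] * NormalDist(mu[k], sd[k]).pdf(xi) for k in range(2)]
            total = sum(p) or 1e-300
            r.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and sd of each component
        for k in range(2):
            nk = max(sum(ri[k] for ri in r), 1e-12)
            w[k] = nk / len(x)
            mu[k] = sum(ri[k] * xi for ri, xi in zip(r, x)) / nk
            var = sum(ri[k] * (xi - mu[k]) ** 2 for ri, xi in zip(r, x)) / nk
            sd[k] = max(var ** 0.5, 1e-6)
    return w, mu, sd
```

The returned weights play the role of the "portion of the data" in each group, and the means and sds are the centers and dispersions of the two groups.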
Below is an example that uses one variable, Petal length, from the Iris sample data table. I chose to fit a Normal Mixture with 3 clusters since I know there are 3 species. I chose this platform because it displays the centers (means), confidence intervals for the means, the dispersion, and the proportion of the data estimated to belong to each group. With the mean and dispersion, you could compute a z-score or a probability of belonging to each group.
```jsl
Names Default to Here( 1 );
dt = Open( "$Sample_data/Iris.jmp" );
dist = dt << Distribution(
	Continuous Distribution(
		Column( :Petal length ),
		Horizontal Layout( 1 ),
		Vertical( 0 ),
		Fit Distribution(
			Normal Mixtures(
				Diagnostic Plot( Median Reference Line( 0 ) ),
				Clusters( 3 )
			)
		),
		Customize Summary Statistics(
			Robust Mean( 1 ),
			Robust Std Dev( 1 ),
			Set Alpha Level( 0.05 )
		)
	)
);
```
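Once the mixture report gives a mean, a dispersion, and a proportion for each group, the z-score and membership probability for a single value can be computed by hand. This is a minimal sketch in Python; the function names are hypothetical, and the parameter values you pass in would be copied from the fitted report (the ones in the check below are made up, not Iris results). The membership probability is the mixture weight times that group's normal density, divided by the sum over all groups.

```python
from statistics import NormalDist


def z_scores(x, means, sds):
    # Distance of x from each group center, in units of that group's sd
    return [(x - m) / s for m, s in zip(means, sds)]


def membership_probabilities(x, weights, means, sds):
    # Posterior probability that x came from each mixture component
    dens = [w * NormalDist(m, s).pdf(x) for w, m, s in zip(weights, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]
```

For two groups with equal weights and equal sds, a value exactly halfway between the means gets probability 0.5 for each group; that midpoint is where assignment is least certain.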
Otherwise, you need to clarify your request.
Thank you so much for the help and recommendations. I used a two-component normal mixture fit and have attached the result. I would like to know the confidence interval at exactly zero (where the green and red colors are separated in the diagnostic plot). Thanks again.
Looking at your data, you have some extreme values that are affecting the fit and the dispersion estimates. Where do the red and green come from? Do you know there are two mixtures in your data? And if so, are you looking at misclassification at the cutpoint?
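If two components really are present, misclassification at a cutpoint can be estimated from the fitted components. This is a minimal sketch, assuming each group is normal with a weight, mean, and sd taken from the mixture fit; the function name and the tuple layout are hypothetical. It returns the chance that a "Remove" value lands above the cut, the chance that a "Keep" value lands below it, and the weighted overall error rate.

```python
from statistics import NormalDist


def misclassification_at_cut(cut, remove, keep):
    """remove, keep: (weight, mean, sd) tuples for the two fitted components."""
    w_r, m_r, s_r = remove
    w_k, m_k, s_k = keep
    # A "Remove" value is misclassified if it falls above the cut
    p_remove_called_keep = 1 - NormalDist(m_r, s_r).cdf(cut)
    # A "Keep" value is misclassified if it falls below the cut
    p_keep_called_remove = NormalDist(m_k, s_k).cdf(cut)
    overall = w_r * p_remove_called_keep + w_k * p_keep_called_remove
    return p_remove_called_keep, p_keep_called_remove, overall
```

Extreme values inflate the fitted sds, which in turn inflates these error rates, so the estimates are only as good as the fit.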
From experience, it is dangerous to interpret data without knowing the data context and the real question.
In other words, I cannot advise further due to lack of information. Sorry.
Thanks again for your help. I have attached the table of Delta SNR values and their categories. The Delta SNR was calculated from a designed experiment, and the criterion was: if the SNR is positive, we categorized it as "Keep", and if negative, as "Remove". SNR = 0 is the decision boundary, and my original question was whether a margin of error can be obtained. For instance, if SNR = -0.005, should I classify it as "Remove" or not? I hope this is clearer now. Thanks.
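One way to attach a margin of error to a single Delta SNR near the boundary is to put a confidence interval around it and only assign a label when the interval excludes zero. This is a minimal sketch, assuming a standard error for each Delta SNR is available (for example from replicate runs or the DOE model's error estimate); that standard error, the function name, and the "Undetermined" label are all hypothetical. It uses a normal (z) interval, so with only a few replicates a t quantile would be more appropriate.

```python
from statistics import NormalDist


def classify_with_margin(delta_snr, std_error, confidence=0.95):
    """Label a Delta SNR as Keep/Remove only if its CI excludes zero."""
    # Two-sided normal critical value, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    lower, upper = delta_snr - margin, delta_snr + margin
    if lower > 0:
        label = "Keep"
    elif upper < 0:
        label = "Remove"
    else:
        label = "Undetermined"  # interval straddles the decision boundary
    return label, (lower, upper)
```

With a hypothetical standard error of 0.01, `classify_with_margin(-0.005, 0.01)` gives an interval of roughly (-0.025, 0.015), which straddles zero, so -0.005 would not be confidently classified as "Remove".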