AT
Level V

Confidence interval for a data value

Hi,

I have an output whose distribution contains both positive and negative values. Negative values form one group and positive values form another. How does one find a confidence interval for assigning a value to the correct group? For instance, the negative group's minimum is -0.05; should I consider that value part of the negative group? How do you find a confidence interval so that you can set a criterion for selecting each group correctly? Thanks.

4 REPLIES
gzmorgan0
Super User (Alumni)

Re: Confidence interval for a data value

I am confused by your question.

I am not sure whether you are dealing with measurement errors, outliers, or truly a mix of two distributions. For multivariate data, cluster distances, Mahalanobis distances, etc. are used. For a single variable, I'd look at mixtures.

If you are treating the negative values as outliers, you can compute a tolerance interval, for example 95% confidence of 95% coverage.

Or, for a single variable that is a mix of 2 distributions, if the mix of data is somewhat smooth, you might try fitting a Normal Mixture of 2.

Below is an example using one variable, Petal length, from the Iris sample data table. I chose to fit a Normal Mixture with 3 groups since I know there are 3 species. I chose this because it displays the centers (means), confidence intervals for the means, the dispersion, and what portion of the data is estimated to belong to each group. Having the mean and dispersion, you could compute a z-score or probability of belonging to each group.

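Names Default to Here(1);

// Open the Iris sample data table that ships with JMP
dt = Open("$SAMPLE_DATA/Iris.jmp");

// Distribution of Petal length with a 3-component Normal Mixture fit
dist = dt << Distribution(
	Continuous Distribution(
		Column( :Petal length ),
		Horizontal Layout( 1 ),
		Vertical( 0 ),
		Fit Distribution(
			Normal Mixtures(
				Diagnostic Plot( Median Reference Line( 0 ) ),
				Clusters( 3 ) // one cluster per species
			)
		),
		// Add robust location/scale estimates; alpha = 0.05 gives 95% intervals
		Customize Summary Statistics(
			Robust Mean( 1 ),
			Robust Std Dev( 1 ),
			Set Alpha Level( 0.05 )
		)
	)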
);
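
As a rough sketch of the probability calculation mentioned above: given the means, standard deviations, and proportions reported by a Normal Mixtures fit, the probability that a value x belongs to group i is proportional to the group's proportion times the normal density of x under that group. The numbers below are placeholders I made up for illustration, not output from the Iris fit, and the variable names are arbitrary; substitute the estimates from your own report.

```jsl
Names Default to Here( 1 );

// Placeholder estimates (assumed for illustration only); copy the real
// values from the Normal Mixtures report.
mu    = {1.5, 4.3, 5.6};    // component means
sigma = {0.2, 0.5, 0.5};    // component standard deviations
prop  = {0.33, 0.34, 0.33}; // mixing proportions, should sum to 1

x = 4.8; // value to classify

// Weighted normal density of x under each component
w = {};
total = 0;
For( i = 1, i <= N Items( mu ), i++,
	wi = prop[i] * Normal Density( x, mu[i], sigma[i] );
	Insert Into( w, wi );
	total += wi
);

// Z-score of x within each group and its estimated membership probability
For( i = 1, i <= N Items( mu ), i++,
	z = (x - mu[i]) / sigma[i];
	p = w[i] / total;
	Show( i, z, p )
);
```

The membership probabilities sum to 1 across groups, so a value near the crossover between two groups shows up as roughly equal probabilities rather than a hard assignment.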

Otherwise, you need to clarify your request.

AT
Level V

Re: Confidence interval for a data value

Thank you so much for the help and recommendations. I fit a two-component Normal Mixture and have attached the result. I would like to know the confidence interval at exactly zero (where the green and red colors are separated in the diagnostic plot). Thanks again.

2mixture.png

gzmorgan0
Super User (Alumni)

Re: Confidence interval for a data value

Looking at your data, you have some extremes that are affecting the fit and the dispersion estimates. Where do the red and green come from? Do you know that your data are a mixture of two distributions? And if so, are you looking at misclassification at the cutpoint?

From experience, it is dangerous to interpret data without knowing the data context and the real question.

  • I still do not understand your request "confidence interval for exactly value of zero."
  • The fitted normal does not fit. If you have a column representing Red and Green, say it is called GRP, use Fit Y by X with Y (:Delta SNR) and X (:GRP) and create a normal quantile plot.
  • Delta SNR suggests a difference. If two variables were used to compute this delta, consider multivariate methods such as k-means clustering.
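
The Fit Y by X suggestion above might be scripted roughly as follows. GRP is the hypothetical column name used above, and the script assumes the table holding :Delta SNR and :GRP is the current data table; adjust both to match your table.

```jsl
Names Default to Here( 1 );

// Assumes the table with :Delta SNR and the grouping column is active
dt = Current Data Table();

// Oneway (Fit Y by X) of Delta SNR by group, with a normal quantile plot
ow = dt << Oneway(
	Y( :Delta SNR ),
	X( :GRP ), // hypothetical grouping column (Red/Green)
	Plot Actual by Quantile( 1 ),
	Line of Fit( 1 )
);
```

If each group is roughly normal, its points should fall near its fitted line; curvature or stray points at the ends point to the extremes that distort the mixture fit.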

In other words, I cannot advise further due to lack of information. Sorry.

AT
Level V

Re: Confidence interval for a data value

Thanks again for your help. I have attached the table with the Delta SNR and its category. The Delta SNR was calculated from a designed experiment, and the criterion was that if the SNR is positive we categorize it as "Keep", and if negative we categorize it as "Remove". SNR = 0 is the decision boundary, and my original question was whether a margin of error can be obtained for it. For instance, if SNR = -0.005, should I classify it as "Remove" or not? I hope I am clear now. Thanks.