Confidence interval for a data value

Aug 21, 2018 1:57 PM
Hi,

I have a distribution of an output that has both positive and negative values. Negative values form one group and positive values form another. How does one find a confidence interval for assigning a value to the correct group? For instance, the negative group minimum is -0.05; should I consider that value part of the negative group? How do you find a confidence interval so that you can set a criterion for selecting each group correctly? Thanks.

Re: Confidence interval for a data value

I am confused by your question.

I am not sure if you are dealing with measurement errors, outliers, or truly a mix of two distributions. For multivariate data, cluster distance, Mahalanobis distances, etc. are used. For a single variable, I'd look at mixtures.

If you are treating the negative values as outliers, you can compute a tolerance interval, for example one with 95% confidence of 95% coverage.

Alternatively, for a single variable that is a mix of 2 distributions, if the mix of data is somewhat smooth, you might try fitting a Normal Mixture with 2 groups.

Below is an example using one variable, Petal length, from the Iris sample data table. I chose to fit a Normal Mixture with 3 groups since I know there are 3 species. I chose this because it displays the centers (means), confidence intervals for the means, the dispersion, and the portion of the data estimated to belong to each group. With the mean and dispersion, you could compute a z-score or probability of belonging to each group.

```
Names Default to Here(1);
// Open the Iris sample data table
dt = Open("$Sample_data/Iris.jmp");
// Fit a 3-group Normal Mixture to Petal length in the Distribution platform
dist = dt << Distribution(
	Continuous Distribution(
		Column( :Petal length ),
		Horizontal Layout( 1 ),
		Vertical( 0 ),
		Fit Distribution(
			Normal Mixtures(
				Diagnostic Plot( Median Reference Line( 0 ) ),
				Clusters( 3 )
			)
		),
		// Add robust estimates to the summary statistics
		Customize Summary Statistics(
			Robust Mean( 1 ),
			Robust Std Dev( 1 ),
			Set Alpha Level( 0.05 )
		)
	)
);
```
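To illustrate the z-score/probability idea outside JMP, here is a minimal Python sketch. It assumes you have already read the proportion, mean, and standard deviation of each group from the Normal Mixture report. The numbers and the names `groups`, `membership_probabilities`, and `z_scores` are made up for illustration and are not from your data or from JMP.

```python
from statistics import NormalDist

# Hypothetical mixture estimates: (proportion, mean, std dev) per group.
# These values are invented for illustration; substitute the fitted ones.
groups = {
    "negative": (0.4, -0.50, 0.30),
    "positive": (0.6, 0.80, 0.40),
}

def membership_probabilities(x, groups):
    """Probability that x belongs to each group, given the mixture estimates."""
    # Weight each group's normal density at x by its mixing proportion.
    weighted = {
        name: p * NormalDist(mu, sigma).pdf(x)
        for name, (p, mu, sigma) in groups.items()
    }
    total = sum(weighted.values())
    # Normalize so the probabilities sum to 1.
    return {name: w / total for name, w in weighted.items()}

def z_scores(x, groups):
    """Distance of x from each group mean, in units of that group's std dev."""
    return {name: (x - mu) / sigma for name, (_, mu, sigma) in groups.items()}

x = -0.05
probs = membership_probabilities(x, groups)
z = z_scores(x, groups)
print(probs)
print(z)
```

A value near the boundary, such as -0.05, can then be assigned to a group only when its probability for that group exceeds a threshold you choose (0.95, for example); values below the threshold would be flagged as ambiguous.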

Otherwise, you need to clarify your request.

Re: Confidence interval for a data value

Looking at your data, you have some extremes that are affecting the fit and the dispersion estimates. Where do the red and green come from? Do you know that there are two mixtures in your data? And if yes, are you looking at misclassification at the cutpoint?

From experience, it is dangerous to interpret data without knowing the data context and what the real question is.

- I still do not understand your request "confidence interval for exactly value of zero."
- The fitted normal distribution does not fit the data. If you have a column representing Red and Green, say one called GRP, use Fit Y by X with Y(:Delta SNR) and X(:GRP) and create a normal quantile plot.
- Delta SNR suggests a difference. If there are two variables used in the computation of this Delta, consider multivariate methods such as k-means clustering.

In other words, I cannot advise further due to the lack of information. Sorry.


