Statexplorer
Level III

Outliers Elimination

Hi JMP team,

 

I have a production data set in which I need to analyse 5 columns, each containing almost 5000 rows.

Almost 80% of the values appear to be outliers, even though they are inside the specification limits.

 

Now I want to compare box plots of these 5 columns to see the trend over the days.

 

In this case, do I need to eliminate all the outliers, or should I keep them for the comparison?

4 REPLIES
jthi
Super User

Re: Outliers Elimination

Why not do both and test out your options? You can create a new column in which you mark rows as outliers and use that column in the visualization. In some cases you might want to stack your data, so that you can exclude an outlier from a single measurement instead of from all of them (wide vs. tall data).
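
As a rough illustration of the mark-and-stack idea outside JMP (inside JMP the same thing would typically be done with a formula column and Tables > Stack), here is a minimal Python sketch. The column names `M1` and `M2`, the sample values, and the choice of Tukey's 1.5×IQR fences as the flagging rule are all assumptions made for the example, not details from the original data.

```python
from statistics import quantiles

def iqr_flags(values, k=1.5):
    """Return True for each value outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]

# Hypothetical wide table: one row per unit, one column per measurement.
wide = {
    "M1": [10.1, 10.3, 9.9, 10.0, 10.2, 14.8],
    "M2": [5.0, 5.2, 4.9, 5.1, 5.0, 5.1],
}

# Stack to tall format: one record per (row, measurement), each with its own flag.
tall = []
for column, values in wide.items():
    for row, (value, flag) in enumerate(zip(values, iqr_flags(values))):
        tall.append({"row": row, "measurement": column, "value": value, "outlier": flag})

# Dropping flagged records removes only the suspect value, not the whole row.
kept = [record for record in tall if not record["outlier"]]
```

In the tall layout, the last row loses only its `M1` value; its `M2` value stays available for the comparison, and the `outlier` field can also be used to colour or filter points instead of deleting them.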

-Jarmo
MRB3855
Super User

Re: Outliers Elimination

Hi @Statexplorer: Gotta admit, my first thought when reading your question was "How can 80% of your data be outliers?"

Can you shed some light on this? If you are using some sort of "outlier" test based on normality, and 80% are in the tails, then your data aren't normally distributed and your outlier test is not appropriate.  

 

Sorry, I just can't get my head around the concept of "80% of my data are outliers". Outliers are "unusual" observations; how can 80% of the observations be "unusual"?

Statexplorer
Level III

Re: Outliers Elimination

Thanks for the suggestions. I used the IQR method for the outlier test. I said 80% of the data are outliers because, when I looked at the box plot, most of the points appeared to be outliers.
MRB3855
Super User

Re: Outliers Elimination

Hi @Statexplorer: The IQR method is based on normality of the data. If 80% of the data are at the extremes, then the data aren't normally distributed and those 80% are not "outliers". Conceptually, the majority of your data cannot be "unusual" (by the definition of unusual).
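
To put a number on this, here is a small Python sketch of how much of a perfectly normal distribution falls outside the box-plot fences. It assumes the common convention of fences at Q1 - 1.5×IQR and Q3 + 1.5×IQR; the function name is made up for the illustration.

```python
from statistics import NormalDist

def normal_fraction_outside_fences(k=1.5):
    """Fraction of a normal distribution lying outside Q1 - k*IQR and Q3 + k*IQR."""
    z = NormalDist()  # standard normal; the fraction is the same for any mean and sd
    q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
    upper_fence = q3 + k * (q3 - q1)
    # The distribution is symmetric, so both tails carry the same probability.
    return 2 * (1 - z.cdf(upper_fence))

fraction = normal_fraction_outside_fences()
print(f"{fraction:.2%}")  # roughly 0.7% for k = 1.5
```

So for normal data the rule should flag well under 1% of points. Note also that the fences always enclose the interval from Q1 to Q3, which holds about half of the data, so this rule can never flag more than about half of the points in a column, whatever the distribution.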

 

How, then, do you identify outliers in non-normal, heavy-tailed, or skewed distributions? This is a very complex question with no easy answer.

 

If you share some details about your data etc. perhaps a fit-for-purpose solution can be found.