frankderuyck
Level VI

Detect outliers in a dataset with categorical variables

I have a large data set of categorical survey data from more than 800 participants. There are 10 categorical variables with 4 nominal levels each. Is there a JMP method for flagging or detecting outlier participants?

3 REPLIES

Re: Detect outliers in a dataset with categorical variables

I am not aware of any such procedures. I will leave it to other users with more experience and knowledge to identify any.

 

Ignorance won't stop me from brainstorming some approaches. An outlier is an unusual or unexpected result. (Note that it is not necessarily a wrong or contaminated result.) So you could use a variety of unsupervised and supervised learning methods that might isolate such cases.

 

  • Multiple correspondence analysis is designed to handle many categorical variables with many levels. The optimal scaling is based on the chi-square distance from the centroid, so outliers might separate from the rest in the plot.
  • Residual analysis with a multinomial generalized linear model might identify outliers.
  • Recursive partitioning might isolate such cases in a single node. You might have to adjust the minimum node size.
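
To make the first idea concrete outside of JMP, here is a minimal Python sketch (not a JMP feature) of the chi-square distance of each respondent from the centroid of the indicator matrix, which is the distance that MCA works with. It assumes the answers are already coded as integers 0 to 3; the function name `chi2_distance_from_centroid` and the simulated data are made up for illustration.

```python
import numpy as np

def chi2_distance_from_centroid(codes, n_levels):
    """Chi-square distance of each row profile from the average profile.

    codes: (n, p) integer array, each entry in 0..n_levels-1 (assumed coding).
    """
    codes = np.asarray(codes)
    n, p = codes.shape
    # Build the one-hot indicator matrix: one column per variable level.
    Z = np.zeros((n, p * n_levels))
    for j in range(p):
        Z[np.arange(n), j * n_levels + codes[:, j]] = 1.0
    col_mass = Z.sum(axis=0) / Z.sum()               # centroid (average row profile)
    keep = col_mass > 0                              # ignore levels nobody chose
    row_profiles = Z / Z.sum(axis=1, keepdims=True)  # each row sums to 1
    diff = row_profiles[:, keep] - col_mass[keep]
    return np.sqrt((diff ** 2 / col_mass[keep]).sum(axis=1))

# Simulated survey: 200 respondents, 10 variables, 4 levels each.
rng = np.random.default_rng(0)
survey = rng.choice(4, size=(200, 10), p=[0.55, 0.25, 0.15, 0.05])
distances = chi2_distance_from_centroid(survey, n_levels=4)
most_unusual = np.argsort(distances)[-5:]  # candidates to inspect, not to delete blindly
```

Respondents who pick many rare levels get large distances. Where to draw the line is a judgment call, so the largest few are best treated as cases to look at rather than as confirmed outliers.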

 

What kind of analysis or modeling were you planning? The method might include the identification of outliers.

frankderuyck
Level VI

Re: Detect outliers in a dataset with categorical variables

I will perform MCA on the data set. I noticed that a 2-dimensional MCA plot is quite sensitive to removing or including participants, so outlier detection and removal should be done carefully. After that, clusters will be identified with K-means, which is also quite sensitive to outliers. So before starting the analysis, outliers must be correctly detected and removed using the right statistical method and JMP tool.
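
For anyone who wants to check this sensitivity outside of JMP, here is a rough Python sketch of MCA on the indicator matrix using a plain SVD. It is only an illustration under my own assumptions, not how JMP computes its MCA report; the function name `mca_row_coordinates` and the simulated data are hypothetical.

```python
import numpy as np

def mca_row_coordinates(codes, n_components=2):
    """Row principal coordinates from correspondence analysis of the indicator matrix."""
    codes = np.asarray(codes)
    n, p = codes.shape
    # One-hot encode each variable using only the levels that actually occur.
    blocks = []
    for j in range(p):
        levels = np.unique(codes[:, j])
        blocks.append((codes[:, [j]] == levels).astype(float))
    Z = np.hstack(blocks)
    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses
    # Standardized residuals from the independence model r * c'.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    coords = (U * sv) / np.sqrt(r)[:, None]
    return coords[:, :n_components], sv ** 2  # coordinates, principal inertias

# Simulated survey: 300 respondents, 10 variables, 4 levels each.
rng = np.random.default_rng(0)
survey = rng.choice(4, size=(300, 10), p=[0.4, 0.3, 0.2, 0.1])
coords, inertias = mca_row_coordinates(survey)
# Respondents far from the origin of the 2-D plot are candidates to inspect.
far_out = np.argsort(np.linalg.norm(coords, axis=1))[-5:]
```

Rerunning the function with the `far_out` rows dropped and comparing the two sets of coordinates gives a direct view of how much those participants move the 2-D plot before K-means is applied.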

frankderuyck
Level VI

Re: Detect outliers in a dataset with categorical variables

I assume that there is no specific JMP method for categorical outlier detection?