
Detect outliers in a dataset with categorical variables

Sep 1, 2020 4:51 AM
(159 views)

I have a large data set with categorical survey data from more than 800 participants. There are 10 categorical variables with 4 nominal levels each. Is there a JMP method for detecting outlier participants?

3 REPLIES


Re: Detect outliers in a dataset with categorical variables

I am not aware of any such procedures. I will leave it to other users with more experience and knowledge to identify any.

Ignorance won't stop me from brainstorming some approaches. An outlier is an unusual or unexpected result. (Note that it is not necessarily a wrong or contaminated result.) So you could use a variety of unsupervised and supervised learning methods that might isolate such cases.

- Multiple correspondence analysis (MCA) is designed to handle many categorical variables with many levels. The optimal scaling is based on the chi-square distance from the centroid, so outliers might separate from the rest in the plot.
- Residual analysis with a multinomial generalized linear model might identify outliers.
- Recursive partitioning will isolate such cases to a single node. You might have to adjust the minimum node size.
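
As a rough illustration of the first idea in the list above, the chi-square distance of each participant from the centroid can be computed directly from the indicator (one-hot) matrix of the responses. This is a minimal sketch in Python with NumPy, not JMP's implementation: the function names `indicator_matrix` and `chi_square_distances` are hypothetical, and the data are synthetic, with one participant deliberately given a rare answer pattern.

```python
import numpy as np

def indicator_matrix(data):
    """One-hot encode an (n, p) array of category labels into an (n, J) 0/1 matrix."""
    data = np.asarray(data)
    blocks = []
    for j in range(data.shape[1]):
        levels = np.unique(data[:, j])  # observed levels of variable j
        blocks.append((data[:, [j]] == levels[None, :]).astype(float))
    return np.hstack(blocks)

def chi_square_distances(data):
    """Squared chi-square distance of each row profile from the centroid."""
    data = np.asarray(data)
    n, p = data.shape
    Z = indicator_matrix(data)
    col_mass = Z.sum(axis=0) / (n * p)  # column masses; also the centroid of the row profiles
    row_profiles = Z / p                # each row profile sums to 1
    diff = row_profiles - col_mass
    return (diff ** 2 / col_mass).sum(axis=1)

# Synthetic example (made-up data): 800 participants, 10 variables.
# Everyone answers with levels 0-2, except participant 17, who answers 3 throughout.
rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(800, 10))
data[17, :] = 3

d = chi_square_distances(data)
most_unusual = int(np.argmax(d))        # index of the participant farthest from the centroid
ranked = np.argsort(d)[::-1]            # participants ordered from most to least unusual
```

The distance here is taken over all MCA dimensions, so it can flag a participant who looks unremarkable in a two-dimensional plot. A large distance only marks a response pattern as unusual; whether that participant should be removed is still a judgment call.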

What kind of analysis or modeling were you planning? The method might include the identification of outliers.


Re: Detect outliers in a dataset with categorical variables

I will perform MCA on the dataset. I noticed that a two-dimensional MCA plot is quite sensitive to removing or including participants, so outlier detection and removal should be done carefully. After that, clusters will be identified with K-means, which is also quite sensitive to outliers. So before starting the analysis, outliers must be correctly detected and removed using the right statistical method and JMP tool.


Re: Detect outliers in a dataset with categorical variables

So I assume that there is no specific JMP method for categorical outlier detection?