Discussions

frankderuyck · Sep 1, 2020 07:51 AM

I have a large data set with categorical survey data from more than 800 participants. There are 10 categorical variables with 4 nominal levels each. Is there a JMP method for detecting outlier participants?

Mark_Bailey · Sep 1, 2020 08:09 AM

I am not aware of any such procedures. I will leave it to other users with more experience and knowledge to identify any.

Ignorance won't stop me from brainstorming some approaches. An outlier is an unusual or unexpected result. (Note that it is not necessarily a wrong or contaminated result.) So you could use a variety of unsupervised and supervised learning methods that might isolate such cases.

Multiple correspondence analysis is designed to handle many categorical variables with many levels. The optimal scaling is based on the chi-square distance from the centroid. Outliers would perhaps separate in the plot.
Residual analysis with a multinomial generalized linear model might identify outliers.
Recursive partitioning can isolate such cases in a single node. You might have to adjust the minimum node size.
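
To make the first idea concrete, here is a minimal Python sketch (not JMP, and not a built-in JMP feature) of the chi-square distance of each respondent from the centroid, computed on the indicator coding that MCA uses. The function name `chi_square_distances` and the toy data are hypothetical, chosen only for illustration. Respondents who pick rare levels on many variables get large distances.

```python
from collections import Counter

def chi_square_distances(rows):
    """Squared chi-square distance of each respondent from the centroid.

    rows: list of equal-length lists of category labels, one per respondent.
    Each variable is treated as indicator (0/1) coded, as in MCA.
    """
    n = len(rows)
    n_vars = len(rows[0])
    # Relative frequency of every observed level, counted per variable.
    freqs = []
    for q in range(n_vars):
        counts = Counter(row[q] for row in rows)
        freqs.append({level: c / n for level, c in counts.items()})
    distances = []
    for row in rows:
        total = 0.0
        for q in range(n_vars):
            for level, p in freqs[q].items():
                z = 1.0 if row[q] == level else 0.0  # indicator value
                total += (z - p) ** 2 / p
        # Dividing by the number of variables matches the row-profile scaling.
        distances.append(total / n_vars)
    return distances

# Toy data (assumed): nine identical respondents and one unusual respondent.
toy = [["a", "x"]] * 9 + [["b", "y"]]
d2 = chi_square_distances(toy)
most_unusual = max(range(len(toy)), key=lambda i: d2[i])
```

The distances only give a ranking. Where to draw the line between "unusual" and "outlier" remains a judgment call, and an unusual respondent is not necessarily a wrong one.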

What kind of analysis or modeling were you planning? The method might include the identification of outliers.

frankderuyck · Sep 1, 2020 08:29 AM

I will perform MCA on the dataset. I noticed that a 2-dimensional MCA plot is quite sensitive to removing or including participants, so outlier detection and removal should be done carefully. After that, clusters will be identified with K-means, which is also quite sensitive to outliers. So before starting the analysis, outliers must be correctly detected and removed using the right statistical method and JMP tool.
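
The K-means sensitivity can be illustrated with a small Python sketch (not JMP). A K-means cluster center is the arithmetic mean of its members, so a single distant point shifts it. The coordinates below are made-up values standing in for scores on two MCA dimensions, and the helper `centroid` is hypothetical.

```python
from statistics import mean

def centroid(points):
    """Arithmetic mean of a list of points, coordinate by coordinate."""
    return tuple(mean(coord) for coord in zip(*points))

# Assumed toy scores on two dimensions: a tight cluster around the origin.
cluster = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1)]
outlier = (4.0, 4.0)

clean = centroid(cluster)                # center of the tight cluster alone
shifted = centroid(cluster + [outlier])  # center after adding one far point
```

With these toy values, one outlier among five points moves the center from the origin to about (0.8, 0.8), well outside the original cluster.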

frankderuyck · Sep 7, 2020 03:20 AM

I assume that there is no specific JMP method for categorical outlier detection?

Detect outliers in a dataset with categorical variables
