Jun 23, 2011

What’s new in JMP training? Categorical data analysis!

The JMP training curriculum is diverse and complex. We offer a range of training for beginners and for advanced users. In addition to our ongoing effort to keep all of the training up to date with new editions, we also create new courses as our users' needs evolve. I want to tell you about one of them, a brand new course about categorical data analysis. We hope that this new course will be as helpful and as popular as our course about ANOVA and regression.

Why is categorical data analysis so important?

It is important because categorical data is abundant across many fields of work. To be clear, categorical data may be numeric or character, and represents a finite set of discrete levels. Examples include gender, political party affiliation, religious affiliation, mortality and morbidity outcomes, product testing outcomes such as pass, fail, or retest, survey responses, market segments, academic test grades, and so on. Whenever you record qualitative attributes, observe membership in a group, or note assignment to a particular category, then you have categorical data.

Why is categorical data analysis so different?

There are many methods for continuous or measurement data, such as descriptive statistics (e.g., mean, standard deviation), analysis of variance (ANOVA), and linear regression, that are familiar to most analysts. Analysts sometimes mistakenly assign numeric codes to the categorical levels so that they can use these methods, which are intended for continuous responses. That is a shame because there are always analogous methods for categorical data. Instead of the descriptive statistics for continuous data, we are interested in proportions or probabilities for each categorical level. Instead of correlation between two continuous variables, we want to know about associations between the categorical response and other variables. Logistic regression relates predictor variables to a categorical response in a natural way that respects the discreteness of the levels. Importantly, JMP unifies our tool set so that our treatment of categorical data is very similar to that of continuous data, only adapted to the unique character of categorical data. That is, we can expect the same state-of-the-art methods of estimation, statistics for inference and for model selection, and model exploitation and visualization for categorical responses that we enjoy for continuous responses. We emphasize these similarities in the training.
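To make the contrast with continuous methods concrete, here is a small sketch written in plain Python rather than in JMP. It is only an illustration of the arithmetic, not a description of how JMP computes its reports. The data, the function names (level_proportions, pearson_chi_square), and the supplier example are all hypothetical. The first function summarizes a categorical column by the proportion at each level, and the second computes the Pearson chi-square statistic that measures association in a two-way table of counts.

```python
from collections import Counter


def level_proportions(values):
    """Return the share of observations at each categorical level."""
    counts = Counter(values)
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical product testing outcomes (made-up data)
outcomes = ["pass"] * 6 + ["fail"] * 3 + ["retest"] * 1
proportions = level_proportions(outcomes)

# Hypothetical 2x2 table: rows are suppliers A and B, columns are pass and fail
table = [[30, 10], [20, 20]]
chi2, dof = pearson_chi_square(table)

print(proportions)
print(round(chi2, 4), dof)
```

Notice that nothing here averages numeric codes. Coding pass, fail, and retest as 1, 2, and 3 and reporting a mean of 1.5 would describe no actual outcome, whereas the proportions describe every level directly.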

What kinds of categorical data analysis are covered in this course?

The course starts with a quick survey of categorical data and reviews the characteristics that distinguish it from continuous data. Descriptive statistics using the Distribution platform and associations using the Contingency platform begin the lessons about analysis. Stratified associations and exact tests of association extend this analysis. Modeling and model selection, introduced using the Partition platform, complete the first chapter. The second chapter is about logistic regression. It begins with an introduction to likelihood methods for fitting model parameters, inference, and model selection. The rest of the second chapter covers the application of binary, nominal, and ordinal logistic regression models. The third chapter introduces generalized linear models (GLM), which include categorical responses. The binary logistic regression case is reviewed and solved using a GLM instead. The last example shows a common categorical response, counts, that is handled using Poisson log-linear regression, another GLM.
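For readers who want a feel for the likelihood methods behind logistic regression, the following is a minimal sketch in plain Python, assuming a binary response and a single continuous predictor. It maximizes the log-likelihood with Newton-Raphson steps. The function names (fit_logistic, predict_probability) and the data are hypothetical, and this is not presented as the algorithm that JMP uses.

```python
import math


def predict_probability(b0, b1, x):
    """Probability of the event level at predictor value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(x, y, max_iter=25, tol=1e-10):
    """Fit intercept b0 and slope b1 by maximum likelihood (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [predict_probability(b0, b1, xi) for xi in x]
        # Score: gradient of the log-likelihood with respect to b0 and b1
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: the response is 1 for the event and 0 otherwise
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)  # multiplicative change in the odds per unit of x
print(b0, b1, odds_ratio)
```

The model works on the log-odds scale, so the fitted probabilities always stay between 0 and 1. That is one way logistic regression respects the discreteness of the response, where a straight line fitted to 0/1 codes would not.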

Who is this course made for?

Any JMP analyst who encounters categorical data is a good candidate for this training. This course is advanced and builds upon the lessons of prerequisite courses. A student in this course should be comfortable using JMP (e.g., have taken the Data Exploration course) and statistical methods of inference and modeling with continuous responses (as covered in the ANOVA and Regression course).

When can I take the course?

We are offering a spring public class from April 30 through May 3 and a fall public class from November 13 through 16. Both of these classes will be conducted over the Internet (Live Web format).

Dick Schaertl wrote:

This sounds like an interesting class. Will it delve into applying polynomial contrasts to Least Squares and GLM factor analyses to test model fit? Before reading this web page, I emailed Laura Archer, JMP Tech Support, with a similar request. The email subject is pasted here:

RE: [SAS 7610749983] GLM orthogonal polynomial contrasts

Mark Bailey wrote:

Thank you for reading my post and for these great questions. I will answer both of them in turn.

We do not cover contrasts in the new course about categorical responses. We used to cover contrasts in one of our courses about continuous responses. JMP makes it easy to test one or more contrasts with the Fit Least Squares platform (Analyze > Fit Model). This important topic was eliminated, though, in a recent edition of that course simply due to limited time and lack of interest. Our courses are typically two days in length, so we must "draw a line" in terms of the scope of topics and the depth of coverage in every course. Our customer feedback told us that the users who attend our training classes would rather spend the limited time on other topics. Of course, if students in a particular class want to know about a topic that is not included, such as contrasts, we would address it by answering their questions.

Contrasts are also available in JMP with the Generalized Linear Models platform, accessed through Analyze > Fit Model. So, while we do not cover contrasts in the new course, we would explain them and show how to get them in JMP, if students asked about them.