
Best approach to fit large datasets

danly

Community Member

Joined: Sep 1, 2014

Hi,

In many cases, I'm trying to fit datasets where the data are segmented into different groups. In the simplest case, I just want a linear fit for each group, where each group has a few dozen data points but the dataset contains a large number of such groups.

In the past I have used the Bivariate platform via Fit Y by X, with the "Group by..." option to get a separate fit for each group. However, for large datasets the fitting is very slow and seems like the wrong approach. Another option is the Fit Model platform with the grouping column as the "By" variable, but that fitting also seems fairly slow.

What is the best strategy for fitting large datasets like this efficiently, especially when I don't need the raw-data plot for each fit?
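For concreteness, here is what "a linear fit for each group" computes, written outside JMP as a minimal Python/NumPy sketch. It assumes the data are three parallel arrays (x, y and a group label); the function name `per_group_linear_fits` is hypothetical and this is not a JMP feature. Because simple linear regression has a closed form, every group's intercept and slope can be obtained from group-wise sums in a few vectorized passes, with no loop over groups and no plots.

```python
import numpy as np

def per_group_linear_fits(x, y, group):
    """Ordinary least-squares intercept and slope of y ~ x within each group.

    Hypothetical helper: group-wise sums via np.bincount replace an explicit
    loop over groups, and nothing is plotted.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Map arbitrary group labels to integer codes 0..k-1.
    labels, codes = np.unique(group, return_inverse=True)
    n = np.bincount(codes).astype(float)
    x_mean = np.bincount(codes, weights=x) / n
    y_mean = np.bincount(codes, weights=y) / n
    # Center within each group before forming sums of squares (numerically safer).
    dx = x - x_mean[codes]
    dy = y - y_mean[codes]
    sxx = np.bincount(codes, weights=dx * dx)
    sxy = np.bincount(codes, weights=dx * dy)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return labels, intercept, slope

# Usage: 3 groups of 30 points with known lines y = 3 + b_j * x (no noise).
rng = np.random.default_rng(0)
g = np.repeat(np.arange(3), 30)
x = rng.uniform(0, 10, g.size)
y = 3.0 + np.array([1.0, -2.0, 0.5])[g] * x
labels, intercepts, slopes = per_group_linear_fits(x, y, g)
print(labels, intercepts, slopes)
```

The cost grows with the number of rows, not with the number of separate fits, which is the property the question is after.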

alexw

Community Trekker

Joined: Apr 25, 2014

Hi danly,

Try the Fit Model approach again, but with your grouping variable as a factor in the model rather than as a 'By' variable. Also include a term for the interaction between the grouping variable and your 'X' variable, i.e. 'X * Group'. Your model effects would then be 'X', 'Group', and 'X * Group'.
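To see why one model can replace the 'By' fits, here is a minimal NumPy sketch (outside JMP) of the X + Group + X*Group model solved as a single least-squares problem. It assumes reference (dummy) coding with the first group as the baseline. JMP parameterizes nominal effects differently by default, so the raw coefficients may not match JMP's Parameter Estimates table, but the per-group intercepts and slopes they imply are identical to fitting each group separately. `pooled_interaction_fit` is a hypothetical name.

```python
import numpy as np

def pooled_interaction_fit(x, y, group):
    """Fit y ~ X + Group + X*Group as ONE least-squares problem.

    Hypothetical helper. Group is dummy coded with the first level as the
    reference; returns the per-group intercepts and slopes the single fit implies.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labels, codes = np.unique(group, return_inverse=True)
    k = labels.size
    # Indicator columns for groups 1..k-1 (group 0 is the reference level).
    D = (codes[:, None] == np.arange(1, k)[None, :]).astype(float)
    # Columns: intercept, X, Group dummies, X*Group interaction columns.
    design = np.column_stack([np.ones_like(x), x, D, D * x[:, None]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Per-group intercept = baseline intercept + group shift; same for slopes.
    intercept_shift = np.concatenate([[0.0], beta[2:1 + k]])
    slope_shift = np.concatenate([[0.0], beta[1 + k:]])
    return labels, beta[0] + intercept_shift, beta[1] + slope_shift

# Usage: compare the single fit with separate per-group np.polyfit fits.
rng = np.random.default_rng(1)
g = np.repeat(["a", "b", "c", "d"], 25)
x = rng.normal(size=g.size)
y = 2.0 * x + rng.normal(size=g.size)
labels, a, b = pooled_interaction_fit(x, y, g)
for lab, ai, bi in zip(labels, a, b):
    m = g == lab
    slope, intercept = np.polyfit(x[m], y[m], 1)
    print(lab, np.isclose(intercept, ai), np.isclose(slope, bi))
```

This dense sketch builds an n-by-2k design matrix for k groups, so with very many groups a sparse design matrix would be needed to keep memory reasonable; how JMP handles this internally is not shown here.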

When you use the 'By' option, JMP effectively fits a separate model for each group, which is slow. If you include the group as a factor in the model instead, JMP fits one model to the whole dataset, and the output includes a table of the relevant linear fit coefficients by group. You can also easily test whether the fit for one group is significantly different from the fit for another group, via Estimates > Multiple Comparisons.
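One reason a single model helps with between-group comparisons is that all groups share one pooled estimate of the residual variance. As an illustration for slopes only, the sketch below is a generic pairwise t test of slope(g1) - slope(g2) using the pooled residual variance of the X + Group + X*Group model. It is not the computation behind JMP's Multiple Comparisons report and applies no multiplicity adjustment; `compare_slopes` is a hypothetical name.

```python
import numpy as np

def compare_slopes(x, y, group, g1, g2):
    """t statistic for slope(g1) - slope(g2), using the pooled residual
    variance of the full X + Group + X*Group model (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labels, codes = np.unique(group, return_inverse=True)
    k = labels.size
    rss = 0.0
    slope, sxx = {}, {}
    # Looping over groups is only for readability: the full interaction model's
    # residuals equal the per-group OLS residuals, so RSS is their sum.
    for j, lab in enumerate(labels):
        m = codes == j
        dx = x[m] - x[m].mean()
        dy = y[m] - y[m].mean()
        sxx[lab] = dx @ dx
        slope[lab] = (dx @ dy) / sxx[lab]
        resid = dy - slope[lab] * dx
        rss += resid @ resid
    df = x.size - 2 * k  # n minus one intercept and one slope per group
    s2 = rss / df        # pooled residual variance
    se = np.sqrt(s2 * (1.0 / sxx[g1] + 1.0 / sxx[g2]))
    return (slope[g1] - slope[g2]) / se, df

# Usage: two groups sharing x values but with slopes 1.5 and -0.5.
rng = np.random.default_rng(2)
xa = rng.uniform(0, 10, 30)
x = np.concatenate([xa, xa])
y = np.concatenate([1.5 * xa, -0.5 * xa]) + rng.normal(size=60)
g = np.repeat(["a", "b"], 30)
t, df = compare_slopes(x, y, g, "a", "b")
print(f"t = {t:.2f} on {df} df")
```

A two-sided p-value can then be taken from a t distribution with `df` degrees of freedom, for example with `scipy.stats.t.sf`.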