
Developer Tutorial: Selecting the Appropriate JMP Pro Generalized Regression Distribution for Your Response

Published on 11-07-2024 03:30 PM by Staff | Updated on 11-07-2024 05:40 PM

Background:

  • Simple linear regression assumes the errors (and therefore the response, given the predictors) are normal. When normality doesn't hold, predictions may fall outside the meaningful range (maybe not a big deal) and inference is not reliable (probably a bigger deal)
  • Generalized Linear Models (GLM) assume a response distribution other than the normal, for example for:
    • Count data (e.g., number of defects on a product)

    • Skewed data (e.g., salaries)

    • Proportions

    • Labels (e.g., good/neutral/bad or yellow/blue/green)

  • Generalized Linear Models (GLM) have three ingredients
    • A distribution for the response given the predictors (the random piece)
    • A linear predictor (the systematic piece)
    • A link function (the piece that connects the random and systematic pieces)
  • Use the R-square to compare models within a distribution. 

  • Use AICc and BIC (information criteria) to compare between distributions

    • AICc and BIC estimate the Kullback-Leibler divergence, which is the distance from the fitted model to the truth

    • Use them to compare models within the same distribution and across different distributions

    • Rule of thumb: AIC tends to overfit and BIC tends to underfit

  • General guidelines for choosing Distributions for Continuous Response
    • Do we have negative values? Use normal

    • Is it bound to (0,1)?  Use beta

    • Does variance increase with the mean?  Use gamma, Weibull, lognormal

    • Is it time-to-event/censored? Probably use Weibull or lognormal

    • A pretty good catch-all? Use normal

    • Do we suspect that we have outliers? Use Cauchy or t(5)

  • Choosing Distribution when response isn’t numeric
    • Is it two-level? Use the binomial (e.g., Yes/No or A/B)
    • Is it 3+ levels and order matters? Use ordinal logistic (e.g., Low/Medium/High or Small/Medium/Large)
    • Is it 3+ levels and order doesn’t matter? Use the multinomial (e.g., Pizza/Hamburger/Burrito or Red/Blue/Green/Orange)
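
To make the three GLM ingredients listed above concrete, here is a minimal Python sketch for a count response (such as number of defects). It is an illustration only, not how JMP Pro implements Generalized Regression. It assumes a Poisson distribution with a log link, and the coefficients `b0` and `b1` are hypothetical values chosen for the example.

```python
import math

def linear_predictor(x, intercept, slope):
    """Systematic piece: eta = intercept + slope * x."""
    return intercept + slope * x

def inverse_log_link(eta):
    """Link piece: log(mu) = eta, so the mean is mu = exp(eta)."""
    return math.exp(eta)

def poisson_pmf(k, mu):
    """Random piece: P(Y = k) for a Poisson response with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical coefficients, used only for illustration.
b0, b1 = 0.5, 0.3

eta = linear_predictor(2.0, b0, b1)      # can be any real number
mu = inverse_log_link(eta)               # always positive, as a count mean must be
prob_zero_defects = poisson_pmf(0, mu)   # probability of observing zero defects
```

The log link is what keeps the predicted mean inside the meaningful range: the linear predictor can take any real value, but `exp(eta)` is always positive.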

See how to choose, specify and build, compare, and evaluate models using Generalized Regression in JMP Pro. Q&A is included throughout the presentation.
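
For the comparison step, the information criteria mentioned in the background notes can be computed by hand from a fitted model's log-likelihood. The sketch below uses the standard textbook definitions of AICc and BIC. The log-likelihoods and parameter counts in `fits` are hypothetical numbers, not results from the presentation, and both fits are assumed to use the same rows of the same response.

```python
import math

def aicc(log_lik, k, n):
    """Corrected AIC: -2*logL + 2k + 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def bic(log_lik, k, n):
    """BIC: -2*logL + k*ln(n)."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical fits of two candidate distributions to the same n = 50 rows.
n = 50
fits = {
    "normal": {"log_lik": -120.4, "k": 3},
    "gamma": {"log_lik": -112.9, "k": 3},
}

scores = {
    name: {"AICc": aicc(f["log_lik"], f["k"], n), "BIC": bic(f["log_lik"], f["k"], n)}
    for name, f in fits.items()
}

# Smaller is better for both criteria.
best_by_aicc = min(scores, key=lambda name: scores[name]["AICc"])
best_by_bic = min(scores, key=lambda name: scores[name]["BIC"])
```

Because `ln(n)` exceeds 2 once n is larger than about 7, BIC charges more per parameter than AIC, which matches the rule of thumb that BIC leans toward smaller models.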

 

 

Start:
Fri, Nov 19, 2021 02:00 PM EST
End:
Fri, Nov 19, 2021 03:00 PM EST