xmq
New Member

Why does SLS (REML) give a different intercept than Python statsmodels for the same mixed model?

I fitted:

Y ~ A(Sum) + B(Sum) + C(Sum) + D(Sum) + Subject(&Random)
 
in both JMP SLS (REML) and Python statsmodels MixedLM (REML) on the same data (~14k obs, ~200 levels of A, sparse A×C cross-tabulation, ~120 random subjects).

What I observe:

  • All ~200 LS Means for factor A are shifted by a constant ~0.22 (JMP higher)
  • Relative effects and rankings are identical (SD of the pairwise differences < 0.001)
  • LS Means for different factors in JMP are centered on different grand means (A → 78.5, B/D → 73.0), which shouldn't happen with Sum coding (see the toy sketch after this list)
  • A different Y on the same X gives essentially no offset (~0.006)
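
To illustrate why I expected the grand means to agree, here's a toy sketch (made-up data, not my real dataset) of the sum-coding property: with sum-to-zero contrasts, the OLS intercept is the unweighted mean of the level means, so every factor's LS Means should be centered on the same grand mean.

import numpy as np
import pandas as pd
from patsy import dmatrix

# Balanced toy factor with level means 1, 2, and 6
toy = pd.DataFrame({'A': ['a1', 'a2', 'a3'] * 4,
                    'y': [1.0, 2.0, 6.0] * 4})
X = np.asarray(dmatrix('C(A, Sum)', toy))  # columns: Intercept, A[a1], A[a2]
beta, *_ = np.linalg.lstsq(X, toy['y'].values, rcond=None)
print(beta[0])  # ≈ 3.0, the unweighted mean (1 + 2 + 6) / 3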

What I need help with:

Is there a known reason why JMP's SLS personality (REML) would produce a different intercept than standard GLS β̂ = (X'V⁻¹X)⁻¹X'V⁻¹y? Is SLS using a different algorithm than the Mixed Model personality for computing fixed effects or LS Means?
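
For reference, this is the GLS computation I'm comparing against (a minimal numpy sketch; V is the marginal covariance implied by the REML variance components, block-diagonal by Subject):

import numpy as np

def gls_beta(X, y, V):
    """beta_hat = (X' V^-1 X)^-1 X' V^-1 y."""
    XtVinv = X.T @ np.linalg.inv(V)
    return np.linalg.solve(XtVinv @ X, XtVinv @ y)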

JMP Pro 17, Windows.

2 REPLIES
MRB3855
Super User

Re: Why does SLS (REML) give a different intercept than Python statsmodels for the same mixed model?

Hi @xmq, and welcome! Can you show the pertinent output from each please? It will help folks here diagnose what might be going on; I have a few thoughts but I don't want us to get ahead of ourselves.

xmq
New Member

Re: Why does SLS (REML) give a different intercept than Python statsmodels for the same mixed model?

Hi @MRB3855, thanks for your reply!

Here are my dataset and the JMP/Python scripts and outputs, for checking.

For JMP

Fit Model(
	Y( :Y1 ),
	Effects( :A, :B, :C, :D ),
	Random Effects( :Subject ),
	NoBounds( 1 ),
	Personality( "Standard Least Squares" ),
	Method( "REML" ),
	Emphasis( "Effect Leverage" ),
	Run(
		:Y1 << {Summary of Fit( 1 ), Analysis of Variance( 0 ),
		Parameter Estimates( 1 ), Scaled Estimates( 0 ),
		Plot Actual by Predicted( 0 ), Plot Regression( 0 ),
		Plot Residual by Predicted( 0 ), Plot Studentized Residuals( 0 ),
		Plot Effect Leverage( 0 ), Plot Residual by Normal Quantiles( 0 ),
		{:A << {LSMeans Student's t( 0.05 )}}}
	)
);

For Python

import numpy as np
import pandas as pd
from patsy import dmatrix, build_design_matrices
from statsmodels.regression.mixed_linear_model import MixedLM


def fit_model(df, y_col):
    """Fit Y ~ A(Sum) + B(Sum) + C(Sum) + D(Sum) + (1|Subject) via REML."""
    # Note: column "C" conflicts with patsy's C() function,
    # so we temporarily rename it for formula construction.
    df_tmp = df.rename(columns={'C': '_C_'})
    formula = '1 + C(A, Sum) + C(B, Sum) + C(_C_, Sum) + C(D, Sum)'
    X_mat = dmatrix(formula, df_tmp)
    design_info = X_mat.design_info
    X = pd.DataFrame(np.asarray(X_mat), columns=design_info.column_names)
    y = df[y_col].values
    groups = df['Subject']

    # Random intercept per Subject; REML to match JMP's Method("REML")
    model = MixedLM(y, X, groups=groups)
    result = model.fit(reml=True, method='powell')
    return result, X, design_info, df_tmp


def compute_lsmeans_A(result, X, df_tmp, design_info):
    """Compute LS Means for factor A (Sum coding → intercept + effect)."""
    fe = result.fe_params.values
    n_fe = len(fe)
    col_names = list(X.columns)

    target_prefix = 'C(A, Sum)'
    target_col_idx = [i for i, c in enumerate(col_names) if target_prefix in c]

    # Leaving every non-A column at 0 averages over the other factors'
    # levels, which is what an LS Mean does under Sum coding.
    base_contrast = np.zeros(n_fe)
    base_contrast[0] = 1.0  # intercept

    levels = sorted(df_tmp['A'].unique())
    lsmeans = {}

    for level in levels:
        # Build a one-row design matrix at this level of A so patsy
        # produces the correct sum-coded row (including the -1s for
        # the dropped reference level).
        sr = df_tmp.iloc[0:1].copy()
        sr['A'] = level
        X_single = np.asarray(build_design_matrices([design_info], sr)[0])
        c = base_contrast.copy()
        for idx in target_col_idx:
            c[idx] = X_single[0, idx]
        lsmeans[level] = float(c @ fe)

    return lsmeans
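
And roughly how I run it (the file name and response column here are placeholders):

df = pd.read_csv('mydata.csv')  # placeholder file name
result, X, design_info, df_tmp = fit_model(df, 'Y1')
lsm = compute_lsmeans_A(result, X, df_tmp, design_info)
print(result.fe_params.iloc[0])     # intercept, to compare with JMP
print(np.mean(list(lsm.values())))  # grand mean of the A LS Means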


I'd appreciate any comments/suggestions you might have.
