Discussions

ConfidenceOwl94 · Mar 26, 2026 10:13 PM

I am using the script below to get the robust mean and robust standard deviation. It works fine up to 1M rows, but on a table with ~5M rows it took 45 seconds. Is there a more efficient way to get these?

Names Default To Here( 1 );

// 0. Guard: ensure at least one data table is open
If( N Table() == 0,
    Throw( "No data table is open. Please open a data table first." )
);

dt = Current Data Table();

// 1. Ask user to select a numeric column
dlg = Column Dialog(
    yCol = Col List( "Select a Numeric Column", Min Col( 1 ), Max Col( 1 ), Data Type( "Numeric" ) )
);
If( dlg["Button"] == -1, Throw( "User cancelled" ) );
selectedCol = dlg["yCol"][1];
colName = selectedCol << Get Name;

t1 = Tick Seconds();

// 2. Run Distribution with Robust Mean and Robust Std Dev enabled
dist = dt << Distribution(
    Continuous Distribution(
        Column( selectedCol ),
        Outlier Box Plot Row Cutoff( 100000000 ),
        Customize Summary Statistics(
            Robust Mean( 1 ),
            Robust Std Dev( 1 )
        )
    ),
    Invisible
);

Wait(0);

// 3. Extract Robust Mean and Robust Std Dev from the Summary Statistics table
distRep   = dist << Report;
robustMean  = .;
robustSigma = .;

Try(
    // Navigate to the Summary Statistics outline box for the selected column
    summBox = distRep[Outline Box( colName )][Outline Box( "Summary Statistics" )];

    // Get the Name column (col 1) and Value column (col 2) from the table
    nameCol  = summBox[String Col Box( 1 )] << Get;
    valueCol = summBox[Number Col Box( 1 )] << Get;

    // Search for Robust Mean and Robust Std Dev rows by name
    For( i = 1, i <= N Items( nameCol ), i++,
        If( Contains( nameCol[i], "Robust Mean" ),
            robustMean = valueCol[i]
        );
        If( Contains( nameCol[i], "Robust Standard Deviation" ),
            robustSigma = valueCol[i]
        );
    );
    ,
    Print( "Warning: Could not navigate report structure. Check outline box names." );
);

// 4. Display results
New Window( "Robust Statistics Results",
    V List Box(
        Text Box( "Column:           " || colName ),
        Text Box( "Robust Mean:      " || (If( Is Missing( robustMean ),
            "Not available", Char( robustMean, 10, 4 ) )) ),
        Text Box( "Robust Std Dev:   " || (If( Is Missing( robustSigma ),
            "Not available", Char( robustSigma, 10, 4 ) )) )
    )
);

// ------------------------

t2 = Tick Seconds();
Print( "Elapsed time: " || Char( Round( t2 - t1, 3 ) ) || " seconds" );

txnelson · Mar 27, 2026 09:39 AM

I don't see any other way, within JMP alone, to calculate the two robust statistics you need. However, you might consider using the Python interface to see whether it is faster on large tables. Here is an example:

Names Default To Here(1);

// --- PARAMETERS ---
dt = Open("$SAMPLE_DATA/Semiconductor Capability.jmp" );
colName = "npn1";  // <-- change to your column name

col = Column(dt, colName) << Get Values;
// Send column data to Python

Python Send( col );

c = 1.345;               // Huber tuning constant
tol = 1e-6;
maxIter = 50;

// Run Python code
Python Submit(
"
import numpy as np

x = np.array(col, dtype=float)
x = x[~np.isnan(x)]  # remove missing values

# --- Initialization ---
mu = np.median(x)
mad = np.median(np.abs(x - mu))
sigma = 1.4826 * mad if mad > 0 else np.std(x)

c = " || Char(c) || "
tol = " || Char(tol) || "
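maxIter = " || Char(maxIter) || "

# Normal-consistency constant beta = E[min(Z^2, c^2)] for Z ~ N(0,1);
# dividing the scale update by it keeps sigma comparable to an ordinary std dev
from math import erf, exp, pi, sqrt
Phi = 0.5 * (1 + erf(c / sqrt(2)))
phi = exp(-c * c / 2) / sqrt(2 * pi)
beta = 2 * Phi - 1 - 2 * c * phi + 2 * c * c * (1 - Phi)

for _ in range(maxIter):
    u = (x - mu) / sigma
    
    # Huber weights
    w = np.ones_like(u)
    mask = np.abs(u) > c
    w[mask] = c / np.abs(u[mask])
    
    # Update mean
    mu_new = np.sum(w * x) / np.sum(w)
    
    # Update sigma (Huber Proposal 2 style, with consistency correction)
    u = (x - mu_new) / sigma
    sigma_new = sigma * np.sqrt(np.mean(np.minimum(u**2, c**2)) / beta)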
    
    # Check convergence
    if abs(mu_new - mu) < tol and abs(sigma_new - sigma) < tol:
        break
    
    mu, sigma = mu_new, sigma_new

huber_mean = mu
huber_std = sigma
print( huber_mean, huber_std )
"
);

// Retrieve results from Python
huber_mean = Python Get(huber_mean);
huber_std = Python Get( huber_std);
show( huber_mean, huber_std);

Jim
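
If you want to check the Huber iteration outside JMP (for example, on a sample of your column in a plain Python session), the following is a minimal standalone sketch in the same spirit as the script above. The function names huber_beta and huber_location_scale are hypothetical, and the defaults (c = 1.345, tol = 1e-6, 50 iterations) are carried over from that script. The scale update divides by beta = E[min(Z^2, c^2)] for standard normal Z (about 0.710 when c = 1.345), so that on clean normal data the scale estimate lands near the ordinary standard deviation. This is Huber-style M-estimation, not a confirmed copy of what the Distribution platform computes, so results may differ from JMP's Robust Mean and Robust Std Dev.

```python
import math

import numpy as np


def huber_beta(c: float) -> float:
    """Consistency constant E[min(Z^2, c^2)] for Z ~ N(0, 1).

    Dividing the scale update by this makes the estimate agree with the
    ordinary standard deviation when the data really are normal.
    """
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))        # Phi(c)
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)  # phi(c)
    return 2.0 * cdf - 1.0 - 2.0 * c * pdf + 2.0 * c * c * (1.0 - cdf)


def huber_location_scale(values, c=1.345, tol=1e-6, max_iter=50):
    """Hypothetical helper: Huber M-estimates of location and scale.

    Missing values (NaN) are dropped first. Returns (mu, sigma).
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    if x.size == 0:
        raise ValueError("no non-missing values")

    # Robust starting values: median and normal-consistent MAD
    mu = float(np.median(x))
    mad = float(np.median(np.abs(x - mu)))
    sigma = 1.4826 * mad if mad > 0 else float(np.std(x))
    if sigma == 0:
        return mu, 0.0  # all values identical

    beta = huber_beta(c)
    for _ in range(max_iter):
        u = (x - mu) / sigma
        # Huber weights: 1 inside [-c, c], c/|u| outside (guarding u == 0)
        w = np.minimum(1.0, c / np.maximum(np.abs(u), 1e-300))
        mu_new = float(np.sum(w * x) / np.sum(w))

        # Scale update: winsorized second moment divided by beta
        u = (x - mu_new) / sigma
        m2 = float(np.mean(np.minimum(u * u, c * c)))
        sigma_new = sigma * math.sqrt(m2 / beta)

        done = abs(mu_new - mu) < tol and abs(sigma_new - sigma) < tol
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma
```

For example, huber_location_scale(np.random.default_rng(0).normal(10, 2, 100_000)) should return values close to 10 and 2. Each iteration is a few vectorized passes over the data, so runtime grows roughly with rows times iterations.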

jthi · Mar 27, 2026 10:35 AM

What is the purpose of calculating these statistics? Is the method JMP uses in the Distribution platform (an M-estimator, which is computationally heavy) the one you want, or would a different, faster method be fine (median and IQR, for example)? Or could you sample your data first?

-Jarmo
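
The faster alternatives suggested above can be sketched in a few lines of NumPy, assuming the column values are already available as an array (for instance, sent to Python as in the earlier example). Median, IQR and MAD need no iteration, but they are different estimators and will not match JMP's Robust Mean and Robust Std Dev. The divisor 1.349 and factor 1.4826 rescale the IQR and MAD so that they estimate the standard deviation for normal data. quick_robust_stats and subsample are hypothetical helper names, and the default sample size of 100,000 is an arbitrary assumption.

```python
import numpy as np


def quick_robust_stats(values):
    """Hypothetical helper: non-iterative robust location/scale estimates.

    Returns the median plus two normal-consistent scale estimates:
    IQR / 1.349 and 1.4826 * MAD. NaN values are dropped first.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    if x.size == 0:
        raise ValueError("no non-missing values")
    med = float(np.median(x))
    q1, q3 = np.percentile(x, [25, 75])
    mad = float(np.median(np.abs(x - med)))
    return {
        "median": med,
        "iqr_sigma": float(q3 - q1) / 1.349,
        "mad_sigma": 1.4826 * mad,
    }


def subsample(values, n=100_000, seed=0):
    """Hypothetical helper: simple random sample of at most n values."""
    x = np.asarray(values, dtype=float)
    if x.size <= n:
        return x
    rng = np.random.default_rng(seed)
    return rng.choice(x, size=n, replace=False)
```

Sampling trades a little precision for speed; a subsample of this size can also be passed to an M-estimator if you still want Huber-type statistics.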

how to get robust Mean and Robust standard deviation faster?
