Hi, I am trying to fit a regression equation on a random 80% of my data and then run the remaining 20% through the fitted equation, saving the observed versus modeled results for that 20%. In other words, the held-out 20% of rows are used for validation.
My dataset has 227 rows. I want to train (i.e., develop a regression equation) on 80% of them, then apply that equation to the remaining 20% to compute observed vs. modeled values for those rows.
My script is not working properly: it excludes 20% of the rows correctly, but then it runs all 100% of the rows and saves those. The code is below; I have removed the portion that computes various metrics about the observed vs. modeled results.
Any help would be most appreciated! (I'm using JMP Pro 17.0.)
dt = Open("C:\Users\trcampbell\Desktop\MASTER ECOLI\2025\2024.xlsx");
Random_rows = dt << Select Randomly(0.2) << Hide and Exclude();
New Column("Excluded", Numeric, ordinal);
:Excluded << Set Formula(If(Excluded(), 1, 0));
dt << Sort(By(:Excluded), Replace Table, Order(Descending), Copy formula(0));
/////////////......LOG ECOLI VS LOG TURB....LINEAR FIT....../////////////////////
obj = Fit Model(
	Y(:LOGECOLI),
	Effects(:LOGTURB),
	Personality("Standard Least Squares"),
	Emphasis("Effect Leverage"),
	Run(:LOGECOLI)
);
obj << Prediction Formula;
// (portion that computes observed vs modeled metrics removed here)
dt << save(
"C:\Users\trcampbell\Desktop\MASTER ECOLI\2025\validation results\test xxxxx1 with 126 and 886.xls"
);
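
For reference, here is a minimal sketch of the workflow I am after, written under a few assumptions rather than as a tested solution. It assumes the table is already open and is the current data table, that no rows are excluded beforehand (the plain Exclude message toggles the state of the selected rows), and that a saved prediction formula column is evaluated for every row, excluded or not. If that last assumption holds, the fit itself may already be using only the 80%, and what is missing is a step that keeps only the excluded rows before saving. The name dtVal and the output file name are hypothetical.

```jsl
// Assumes the source table is open and has columns LOGECOLI and LOGTURB.
dt = Current Data Table();

// Hold out a random 20% of rows: select them, exclude them, clear the selection.
// Assumes no rows were excluded beforehand, since Exclude toggles the state.
dt << Select Randomly( 0.2 );
dt << Exclude;
dt << Clear Select;

// Fit on the remaining 80%; excluded rows are left out of the estimation.
obj = dt << Fit Model(
	Y( :LOGECOLI ),
	Effects( :LOGTURB ),
	Personality( "Standard Least Squares" ),
	Emphasis( "Effect Leverage" ),
	Run
);

// Save the fitted equation as a formula column in dt.
obj << Prediction Formula;

// Copy only the held-out (excluded) rows, with all columns, into a new table.
dt << Select Excluded;
dtVal = dt << Subset( Selected Rows( 1 ), Selected Columns Only( 0 ) );
dt << Clear Select;

// Hypothetical output name; replace with the real validation path.
// dtVal << Save( "validation_rows.jmp" );
```

With this layout, dtVal would hold just the validation rows, with the observed LOGECOLI values next to the saved prediction column, so any observed vs modeled metrics could be computed on that table alone.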