TDF
Level II

How can I automate and summarize many repeat validations into one output table?

I am running discriminant analyses and using a stratified validation column to check the quality of my model.

Given the nature of the work (classifying the origins of unknown samples), I want to make sure the model is robust. Ideally, I want to create 100 validation columns, run the analysis once on each, and then create a single summary classification table; I don't want to manually open and copy 100 tables.

Here I'm looking at Classification Counts, but the solution could be for combining any group of identical output tables.

I'm running Pro v16 (I can update to v17 if that would make the task easier).

Many thanks in advance, Trevor

Discriminant(
	X( :Class for LDA ),
	Validation( :Validation 0·7 0·2 0·1 ),
	Y(
		:"7.2"n, :"3.14"n, :"1.92"n, :"1.9"n, :"1.7"n, :"1.72"n, :"1.88"n, :"1.74"n,
		:"1.76"n, :"1.78"n, :"0.98"n, :"0.96"n, :"1.82"n, :"2.4"n, :"2.12"n,
		:"1.68"n, :"1.84"n, :"7.92"n, :"2.08"n, :"1.8"n, :"1.86"n, :"1.04"n,
		:"3.12"n, :"1.06"n, :"0.94"n, :"2.42"n, :"8.4"n, :"2.46"n, :"7.24"n,
		:"2.36"n, :"1.08"n, :"2.38"n, :"2.34"n, :"8.46"n, :"2.44"n, :"8.44"n, :"1"n,
		:"2.48"n, :"1.48"n, :"1.66"n, :"1.94"n, :"2.14"n, :"8.42"n, :"3.16"n,
		:"8.52"n, :"7.46"n, :"2.54"n, :"3.02"n, :"8.38"n, :"3.18"n
	),
	Discriminant Method( "Wide Linear" ),
	Show Biplot Rays( 0 ),
	Canonical 3D Plot( 1, Frame3D( Set Rotation( -54, 0, 38 ) ) ),
	Show Classification Counts( 1 ),
	Use Matrix Columns( 1 ),
	Cross Validate by Excluded Rows( 1 )
)
1 ACCEPTED SOLUTION

Victor_G
Super User

Re: How can I automate and summarize many repeat validations into one output table?

Hi @TDF,

Perhaps an easier process would help:

1. Create a validation column formula (here on the Big Class dataset):

Victor_G_0-1684236282750.png

2. Use this formula in the model:

Victor_G_1-1684236450519.png

3. Depending on which results you want to repeat across model fits, right-click the metrics or estimates of interest and select "Simulate" (in my example, I do this on the values in the "Parameter Estimates" panel):

Victor_G_2-1684236571966.png

You can specify the number of samples you want and fix a random seed (if you want reproducible results):

Victor_G_3-1684236637896.png

4. After you click "OK", the model is fitted 100 times with varying training and validation sets (according to the stratified sampling formula). You directly get a summary table (see the attached example) with the results of the 100 models, which you can then display graphically with the Distribution platform (a script is generated automatically in this table) or with Graph Builder. The example here uses the Distribution platform on the parameter estimates for the intercept and weight:

Victor_G_4-1684236876989.png

These steps are easy to carry out interactively; you can then copy the generated scripts from the log to automate the analysis.

I hope this answer helps,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics


5 REPLIES
Byron_JMP
Staff

Re: How can I automate and summarize many repeat validations into one output table?

I suppose you could add 100 validation columns by iterating this:

 

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// Add a random validation column: 60% training, 20% validation, 20% test
dt << New Column( "Validation",
	Numeric,
	"Nominal",
	Formula( Make Validation Formula( [.6, .2, .2] ) ),
	Set Property( "Value Labels", {0 = "Training", 1 = "Validation", 2 = "Test"} )
);
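
Iterating the snippet above could look roughly like the following. This is only a sketch: the column names ("Validation 1", "Validation 2", ...) and the nReps variable are assumptions made for illustration, and removing the formula after evaluation is an optional choice to keep each random split fixed.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
nReps = 100; // number of validation columns (taken from the question)

For( i = 1, i <= nReps, i++,
	// Hypothetical naming scheme: "Validation 1", "Validation 2", ...
	col = dt << New Column( "Validation " || Char( i ),
		Numeric,
		"Nominal",
		Formula( Make Validation Formula( [.6, .2, .2] ) ),
		Set Property( "Value Labels", {0 = "Training", 1 = "Validation", 2 = "Test"} )
	);
	// Evaluate the formula, then drop it so the random split is not redrawn later
	dt << Run Formulas;
	col << Delete Formula;
);
```

Each model fit would then use one of these columns as its validation column.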

Then fit your model once for each validation column and save each summary table to a folder.

Then open and concatenate the files using multiple file open.

...or maybe...

Open your table, add the validation column, run the model, save the summary, close the table without saving, and iterate through that 100 times.

Then use multiple file open and concatenate the tables to get your overall summary.

JMP Systems Engineer, Health and Life Sciences (Pharma)
TDF
Level II

Re: How can I automate and summarize many repeat validations into one output table?

Thanks for the reply. As I hadn't heard anything, I have been working on my own solution, which I completed last Friday.

I start as you suggest by creating 100 validation columns.

I step through the analysis for each one, extracting the table I require from each report as a data table.

I join each table into my master file, then close the report and data table, so although I generate 200 files I only have 3 open at once. I can then use the Stack feature to extract the data I need from all the required new columns by searching on a keyword in the generated column names, and analyze the resulting data. The only part I haven't managed, which would save the stacking, is exporting just the column I need from the report table. I've tried many times but failed.

Report( discriminant[1] )[Outline Box( "Probabilities to Each Group" )][Table Box( 1 )] << Make Data Table( "discrim_table " || Char( i ) );

I identified that the table in the report is called "Probabilities to Each Group". There are 4 columns in this table (for my data), and I'm interested in extracting the third column, labelled "AAAAAA" for blinding purposes. The properties show that this is the Data Table Col Box( :AAAAA), and that this col box is nested under "Probabilities to Each Group". I know I could script deleting the extra columns from the master file, but that seems inefficient. I've tried NumberColBox("AAAAA") etc., but no joy. Does anyone have any ideas?

TDF
Level II

Re: How can I automate and summarize many repeat validations into one output table?

Thanks Victor.

I think I should have been clearer in my reply. I have solved my issue and the coding is complete.

It would be better to only add the 100 columns (extracted from the reports) that I need, rather than extracting full tables (400 columns) and having to select every fourth column starting from the third.

For me it's now a matter of intellectual interest: could I extract specific columns from a table within a report? It might not be possible, but if it is, it might be useful to other community members.

Thanks to both Byron and yourself for taking the time to post helpful options.

Victor_G
Super User

Re: How can I automate and summarize many repeat validations into one output table?

Hi @TDF,

Sorry for the misunderstanding; I thought the case was still open.

As the discussion centered on creating 100 individual validation columns, I thought an easier (and probably more elegant) solution might be to use the "Simulate" feature, available in any platform, together with a validation column formula. It automatically repeats the model fitting with different training and validation sets and gives you a summary table of the results, without expanding the data table with many validation columns or concatenating the individual results manually.

Glad you have found a workaround with scripting.

Concerning your question, it is possible to extract parts of the report as a data table: right-click in the panel or section you are interested in, then select "Make Into Data Table" or "Make Combined Data Table" (if you have several responses, for example). The script looks like this on the Big Class dataset:

// Make TableBox into a Data Table

Local( {obj},
	obj = Data Table( "Big Class" ) << Fit Model(
		Y( :height ),
		Effects( :age, :sex, :weight ),
		Personality( "Standard Least Squares" ),
		Emphasis( "Effect Leverage" ),
		Run(
			:height << {Summary of Fit( 1 ), Analysis of Variance( 1 ),
			Parameter Estimates( 1 ), Scaled Estimates( 0 ),
			Plot Actual by Predicted( 1 ), Plot Regression( 0 ),
			Plot Residual by Predicted( 1 ), Plot Studentized Residuals( 0 ),
			Plot Effect Leverage( 1 ), Plot Residual by Normal Quantiles( 0 ),
			Box Cox Y Transformation( 0 )}
		),
		SendToReport(
			Dispatch( {"Response height"}, "age", OutlineBox, {Close( 1 )} ),
			Dispatch( {"Response height"}, "sex", OutlineBox, {Close( 1 )} ),
			Dispatch( {"Response height"}, "weight", OutlineBox, {Close( 1 )} )
		)
	);
	Report( obj )["Response height", "Whole Model", "Parameter Estimates",
	Table Box( 1 )] << Make Into Data Table;
	obj << Close Window;
)

However, I don't see an option to select only specific columns from this report panel. Judging by the properties, I would have to explicitly request that a certain NumberColBox not be displayed or exported to the new data table, for example:

 

Victor_G_0-1684307772451.png

Perhaps a more advanced scripter can help with this question (if I understood it correctly). Alternatively, it may be simpler to add code to the script that removes the unneeded columns based on a specific property (name, index, ...). There are also interesting answers on this topic here: Solved: Script to export AIC into new data table - JMP User Community
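
One way to keep only the column of interest, sketched here on the Big Class example rather than on the Discriminant report from the question, is to make the temporary data table as before, read the values of a single column by name, close the temporary table, and append just those values to a master table. The master table name, the loop count of 3, and the choice of the "Estimate" column are assumptions for illustration; in the real workflow each iteration would fit the model with a different validation column, and the column name would be the one shown in the report table.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dtMaster = New Table( "Master summary" ); // hypothetical collection table

For( i = 1, i <= 3, i++, // 3 repeats for illustration only
	// In practice, each iteration would use a different validation column
	obj = dt << Fit Model(
		Y( :height ),
		Effects( :age, :sex, :weight ),
		Personality( "Standard Least Squares" ),
		Emphasis( "Effect Leverage" ),
		Run( :height << {Parameter Estimates( 1 )} )
	);
	// Export the whole report table to a temporary data table
	tb = Report( obj )[Outline Box( "Parameter Estimates" )][Table Box( 1 )];
	dtTmp = tb << Make Into Data Table;
	// Read only the column of interest, then discard the temporary table
	vals = Column( dtTmp, "Estimate" ) << Get Values;
	Close( dtTmp, NoSave );
	obj << Close Window;
	// Append just that column to the master table
	dtMaster << New Column( "Estimate " || Char( i ),
		Numeric,
		"Continuous",
		Set Values( vals )
	);
);
```

This still creates the full temporary table on each pass, but the master table only ever receives one column per run, so no stacking or selection of every fourth column is needed afterwards.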

 

I hope this helps,

All the best,

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics