Summary Statistics of all numeric columns by grouping variable

vikramavtar · Oct 22, 2024 04:01 PM

I borrowed a section of another JSL that someone wrote a while back, but unfortunately has left the company now.

Original JSL creates box plots along with a table with summary statistics of all numeric columns by grouping variable ("split").

My goal is to get only the summary table with Summary Statistics (Mean, Std-dev) of all numeric columns by grouping variable.

I am using JMP16.

Sample input data table has 200 numeric columns, thousands of rows with 5 unique value of grouping character variable ("split").

output summary table should have 200 rows with 5 columns for Mean & 5 columns for std-dev, where each of the new columns (10) is Mean/Std-dev of each numeric column grouped by each unique value of grouping character variable ("split").

Thank you

vikramavtar · Oct 22, 2024 04:06 PM

Forgot to post the error I get with the JSL

Thanks

txnelson · Oct 22, 2024 1:50 PM

The JSL for a JMP Platform that generates an Analysis of Variance is missing from the code. The way the variable "vv" is being used in your supplied JSL seems to be pointing to the Output Display from a Oneway analysis or a Fit Model. I find no such reference in the code you supplied.

I have made a guess and added in a Oneway to the code. It generates output, but you need to verify that it is what you want.

Input = Dialog( "Please Set Pvalue Limit", Pvalue_Threshold = EditNumber( 0.15 ), Button( "OK" ), Button( "Cancel" ) );

Pvaluelimit = Input[1];

vl = V List Box( Text Box( "Vmap Comparison" ) );
vl1 = V List Box();
vl2 = V List Box();
counter = 0;

i = 0;
k = 0;
j = 0;
w = 0;

WorkTable= current data table();
GetColName = WorkTable << get column names( Numeric );

Show( GetColName );
SumDT = New Table( "SummaryData" );
SumDT << New Column( "MDUT_Data", Character, Nominal, width( 30 ), values( GetColName ) );
SumDT << New Column( "P-Value", Numeric, Continuous );
nc1 = 1;
Temp = Data Table( WorkTable ) << Summary( Group( SPLIT ) );
nsplits = N Row( Temp );
SplitDef = Column( 1 ) << Get As Matrix;
Show( SplitDef, nsplits );
Close( Temp, no save );

For( i = 1, i <= nsplits, i++,
    MeanCol = "Mean of " || Splitdef[i];
    SumDT << New Column( MeanCol );
);

For( i = 1, i <= nsplits, i++,
    SigmaCol = "Sigma of " || Splitdef[i];
    SumDT << New Column( SigmaCol );
);

Current Data Table( SumDT );
For Each Row(

    CurrentMDUT = Column( SumDT, "MDUT_Data" )[];
    Current Data Table( WorkTable );
    MeanCal = Summarize(
        ExpCol = by( SPLIT ),
        c = count,
        meanD = Mean( Eval( CurrentMDUT ) ),
        Sig = Std Dev( Eval( CurrentMDUT ) )
    );

    Current Data Table( SumDT );
    For( w = nc1 + 2, w <= nsplits + nc1 + 1, w++,
        BB = meanD[w - nc1 - 1];
        Column( w )[] = BB;
        CC = Sig[w - nc1 - 1];
        Column( w + nsplits )[] = CC;
    );

    Current Data Table( WorkTable );
    vx=worktable << Oneway( Y( column(SumDT:MDUT_Data[row()]) ), X( :Split ), Means( 1 ), Mean Diamonds( 1 ) );
    vv=report(vx);
    Pvalue = vv["Analysis of Variance"][Number Col Box( 5 )] << get( 1 );
    Current Data Table( SumDT );
    Column( nc1 + 1 )[] = Pvalue;
    Current Data Table( WorkTable );
    vv << close window;

    If( Pvalue < Pvaluelimit,
        counter = counter + 1;
        layoutflag = Mod( counter, 2 );
        If( layoutflag > 0,
            vl1 << append( vv ),
            vl2 << append( vv )
        );
    );
);

Jim

vikramavtar · Oct 22, 2024 11:11 PM

Thank you for quick response.

New JSL works and provides the desired results, but I think it is slow.
I am looking to use this script for a very large dataset.

Are there any options to improve performance ? Thank you.

txnelson · Oct 23, 2024 01:03 AM

The script would run much faster by using the Response Screening platform to get the p-Values. It would also be more efficient to use the Tabulate platform to generate the means and standard deviations

Here is an example of how to create the Summary table using the above suggestions. I have excluded the PValue limiting, since it did nothing in the original script.

Names Default To Here( 1 );
WorkTable = Current Data Table();
GetColName = WorkTable << get column names( Numeric );

// Generate the mean/std table
tab = WorkTable << (Tabulate(
	Add Table(
		Column Table( Grouping Columns( :split ), Statistics( Mean ) ),
		Column Table( Grouping Columns( :split ), Statistics( Std Dev ) ),
		Row Table( Analysis Columns( Eval( GetColName ) ) )
	)
));
tab << Make Into Data Table;
tab << close window;
SumDT = Current Data Table();
SumDT << set name( "SummaryData" );

// Generate the p-values
rs = WorkTable << Response Screening(
	Y( Eval( GetColName ) ),
	X( :split ),
	PValues Table on Launch( 1 )
);

ResponseDT = rs << get PValues;
Report( rs ) << close window;

ResponseDT << select columns( {"Y", "PValue"} );
ResponseDT << Invert Column Selection;
ResponseDT << delete columns;
ResponseDT:Y << set name( "Analysis Columns" );

SumDT << Update( With( ResponseDT ), Match Columns( :Analysis Columns = :Analysis Columns ) );

SumDT << Select columns( :PValue );
SumDT << Move Selected Columns( After( :Analysis Columns ) );
Close( ResponseDT, nosave );

It should give you a leg up on completing the script to meet your exact specifications.

I suggest you take the time to read through the scripting guide, found in the JMP Documentation Library under the Help Pull Down Menu.

Jim

hogi · Oct 23, 2024 7:11 AM

As an alternative suggestion, you can generate a temporary stacked table - and use Tables/Summary in combination with the options Group (original column) and Subgroup (split):

dt = Open( "$SAMPLE_DATA/Semiconductor Capability.jmp" );

tmp = dt <<
Stack(
	columns( Column Group( "Processes" ) )
);

tmp << Summary(
	Group( :Label ),
	Mean( :Processes ),
	Std Dev( :Processes ),
	Subgroup( :SITE ),
	statistics column name format( "stat" )
);

Summary Statistics of all numeric columns by grouping variable

Re: Summary Statistics of all numeric columns by grouping variable

Re: Summary Statistics of all numeric columns by grouping variable

Re: Summary Statistics of all numeric columns by grouping variable

Re: Summary Statistics of all numeric columns by grouping variable

Re: Summary Statistics of all numeric columns by grouping variable