
Quick normality test on many parameters

SophieCuvillier
Level IV

Hello,

 

I have lots of parameters (100 ~ 2000) each with 200 to 2500 values in the sample. I want to do for each one a statistical test to check if the distribution is normal or not.

 

My problem is that it takes a long time (about 6 minutes): for each parameter I have to run Distribution > Fit Normal > Goodness of Fit and retrieve a p-value (from the Anderson-Darling test, for example) to draw my conclusion.

 

Isn't there a faster way? I'd like to avoid using Python as much as possible: we have JMP 17, and we're afraid it will be costly to migrate the code when we switch to JMP 19 in 6 months' time, because the Python integration changes a lot.

 

Best regards,

 


9 REPLIES
jthi
Super User


Re: Quick normality test on many parameters

Have you used JSL to do this? Can you provide example of the script you have used? There can potentially be some optimizations to make it faster which might already be enough.

-Jarmo


Re: Quick normality test on many parameters

Hello,

 

In the end, I used the scripts provided by @matth1 and @Ben_BarrIngh (thank you very much!). To improve speed, we thought of testing a random sub-sample of each parameter instead of the full sample.
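The sub-sampling idea could be sketched in JSL roughly as follows. This is a minimal sketch, not the code actually used in this thread: `nSample` and the output table name are hypothetical, and it draws one random set of rows shared by all parameters rather than a separate draw per parameter. Keep in mind that a smaller sample lowers the power of the test, so mild departures from normality are more likely to go undetected.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Hypothetical cap on the number of rows to test; tune it for your data
nSample = 500;

// Never ask for more rows than the table holds
nDraw = Min( nSample, N Rows( dt ) );

// Random subset of rows, all columns kept, returned as a hidden table
dtSub = dt << Subset(
	Sample Size( nDraw ),
	Selected columns only( 0 ),
	Output Table( "Normality subsample" ),
	Invisible
);

// ... run the Distribution / goodness-of-fit script against dtSub here ...

// Close the hidden table once the results have been collected
Close( dtSub, No Save );
```

Because the rows are shared, a column with many missing values may end up with fewer than `nDraw` usable values.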


Re: Quick normality test on many parameters

Hi @SophieCuvillier,

 

Have you tried holding Ctrl while choosing a command from the red triangle menu, to broadcast it across all of the individual distributions? You could then right-click the Anderson-Darling table and choose 'Make Combined Data Table'.

 

Thanks,

Ben

“All models are wrong, but some are useful”


Re: Quick normality test on many parameters

Hello Ben

 

Thank you for your answer. I would like to do this by scripting, as it is an analysis we want to incorporate into an add-in.

 

Best regards


Re: Quick normality test on many parameters

Hi @SophieCuvillier ,

 

Got you, I've had a try at putting something together for you that will run the make combined data table in a script - this was a fun challenge to get the platform to run all the columns into one window instead of individual windows per column.

 

Let me know how that works for you.

 

//Random Column Generator v1 - Inspired by https://www.linkedin.com/feed/update/urn:li:activity:7241062805292929025/ 

//Scope variables to this script first, then get the current data table and clear previous selections
Names Default to Here(1);
dt = CurrentDataTable();
dt << Clear Column Selection();

//Launch dialog for columns - adapted from https://community.jmp.com/t5/JMPer-Cable/Prompting-for-columns-totally-modally/ba-p/189647
nw = New Window( "Launch Dialog",
    <<Modal,
    V List Box( Align( "right" ),
        H List Box(
            Panel Box( "Select Columns",
            //Filter for continuous data only
                clb = Filter Col Selector( dt, All,<<continuous(1), <<ordinal(0), <<nominal(0))
            ),
            Panel Box( "Cast Selected Columns for Distribution",
                Lineup Box( N Col( 2 ), Spacing( 5 ),
                    Button Box( "Columns",
                        clbY << Append( clb << Get Selected )
                    ),
                    clbY = Col List Box(
                        "Numeric",
                        MinItems( 1 ),
                        //Caps the selection at 20 columns; raise this limit to test more parameters in one run
                        MaxItems( 20 ),
                        nlines( 5 )
                    ),
                    Button Box( "Remove",
                        clbY << Remove Selected
                    )
                )
            )
        ),
        H List Box(
            Button Box( "OK",
                cols = clby << Get Items( "Column Reference" );
            ),
            Button Box( "Cancel" )
        )
    )
);

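//Stop here if the dialog was cancelled or closed, because cols is not set in that case
If( nw["Button"] == -1, Throw( "Cancelled" ) );

//Test for cols names
Print(cols);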


N = N Items(cols); //count the number of cols

distList = (""); //create a storage location to compile the script commands for each distribution


for(i = 1, i <= N, i++,
    Insert Into(distList,
            (
                "Continuous Distribution(column(" || char(cols[i]) || "), Fit Normal(Goodness of Fit(1))),"
            )
        )
    );

finalScript = "Distribution(" || distList || ");"; //combine the distlist with distribution cols with the distribution command
distribrep=eval(parse(finalscript));

//Target the goodness-of-fit (Anderson-Darling) table of the first column
Firsttable=Report(distribrep)[char(cols[1]),"Fitted Normal Distribution","Goodness-of-Fit Test",TableBox(2)];
//Make a combined data table with the results
firsttable<<make combined data table;

“All models are wrong, but some are useful”
matth1
Level IV


Re: Quick normality test on many parameters

Unfortunately it's no help while you're using JMP 17, but I posted a JSL script which uses Python to do what you're looking for (I think). Hopefully it will be helpful when you move to JMP 18/19.

 

https://community.jmp.com/t5/JMP-Scripts/Comparative-normality-testing-on-many-parameters-or-groups-...

 

matth1
Level IV


Re: Quick normality test on many parameters


@matth1 wrote:

Unfortunately it's no help while you're using JMP 17, but I posted a JSL script which uses Python to do what you're looking for (I think). Hopefully it will be helpful when you move to JMP 18/19.


@SophieCuvillier actually, I forgot that I had created a non-Python version of this script. I only switched because the Python version was much faster for large data volumes.

Please see attached, I hope it helps. Note that everything below line 107 is just for creating the results display window - if all you want is the table of comparative p-values, feel free to delete that code.

I only have JMP 18 to try it with, but it should be OK with v17.

Xinghua
Level III


Re: Quick normality test on many parameters

I don't know why, but unfortunately I don't see the results of the normality test; there is no output. Maybe my JMP version is too old (V14.3). I will try a newer version.

[Attached images: 01.jpg, 02.jpg]
Can you make it into a .jmpaddin file, so that it is easier to use?


I would like to see only the results of the normality test, without the capability analysis.
The picture below is the result from Minitab. Can you make it look like this?

[Attached image: 03.jpg]

matth1
Level IV


Re: Quick normality test on many parameters

You can create the normal quantile plot comparing multiple parameters by stacking the data and using the Fit X by Y platform. For example, see the simple code below:

dt = open( "$SAMPLE_DATA/Semiconductor Capability.jmp" );
dt_stack = dt << Stack(
	columns( :PNP1, :PNP2, :NPN2, :PNP3 ),
	"Non-stacked columns"n( Keep( :lot_id, :wafer, :Wafer ID in lot ID, :SITE ) ),
	Output Table( "Stack" )
);
dt_stack << Color or Mark by Column(
	:Label,
	Color Theme( "JMP Default" ),
	Marker( 0 )
);
ow = dt_stack << Oneway(
	Y( :Data ),
	X( :Label ),
	All Graphs( 0 ),
	Line of Fit( 1 ),
	Plot Quantile by Actual( 1 )
);

You could then build a custom window and include the A-D results table from the previous script as a report next to the normal quantile plot. If you're happy with scripting then this might help:

https://www.jmp.com/support/help/en/18.1/#page/jmp/construct-display-boxes-for-new-windows.shtml
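One way such a custom window might look is sketched below. It is only an illustration under stated assumptions: `dt_stack` is the stacked table from the script above, and `dt_results` is a hypothetical reference to a table that already holds the combined A-D results (for example, the table produced by Make Combined Data Table). The window title is also made up.

```jsl
// Assumes dt_stack (stacked data) and dt_results (hypothetical results table)
// are already defined earlier in the same script.
New Window( "Normality overview",
	H List Box(
		// Normal quantile plot per parameter, embedded in the window
		dt_stack << Oneway(
			Y( :Data ),
			X( :Label ),
			All Graphs( 0 ),
			Line of Fit( 1 ),
			Plot Quantile by Actual( 1 )
		),
		// Results table rendered as a report next to the plot
		dt_results << Get As Report
	)
);
```

Launching the platform inside the `H List Box` embeds its report in the new window instead of opening a separate one, which keeps the plot and the table side by side.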