shampton82
Level VII

This might be a big ask, but can someone help with a script to try and select a normal distribution when it is pretty close to the best fitted distribution?

So here's what I'm hoping for:

When you click Fit All in the Distribution platform, you get a lot of fits:

[Image: shampton82_0-1724034774388.png]

However, if the AICc of the best fit is within 5 of the Normal distribution's AICc, you might as well use the Normal fit. So, is there a way to script something that goes through a bunch of columns that have already had the best fit run and then changes the selected fit (assuming it is non-normal) to Normal if Normal's AICc is within 5 of the best fit's? Bonus points for an input box to enter the AICc delta you are willing to live with. Double bonus for removing Student's t, Cauchy, and ExGaussian from the selection options, since you can't calculate process capability for these distributions (and that will be the next step after this cleanup script has run).

 

I've tried and can't get it to work, any help would be greatly appreciated!!

 

Steve

10 REPLIES
shampton82
Level VII

Re: This might be a big ask, but can someone help with a script to try and select a normal distribution when it is pretty close to the best fitted distribution?

Okay, I got there! I'm sure it's not very elegant, but I wanted to throw it out there in case it helps anyone else.

The first step is to select the columns in your data table that you want to fit distributions for; then run the script.

 

//rev 8-24-24

names default to here(1);
dt=current data table();

colnames=dt<<Get selected Columns(continuous, "string");

//ask for the AICc difference below which Normal is treated as equivalent to the best fit
nw=new window("What AICc is comparable?", Show Menu( 0 ), Show Toolbars( 0 ),<<modal,<<size(300,100),<<return results,
	vlistbox(
		hlistbox(
			Text Box("Put in what difference between Normal and the best fit you consider the same"),
			neb1=Number Edit Box(5)
		),
		Button Box( "OK",var1 = neb1 << Get)
	)
);




rpt=new window("test",<<WindowView( "Invisible" ),
		obj=dt<<distribution(column(eval(colnames)),Fit All
			
		);
	
	
	
);

Wait( 0 );
dt1=rpt["Distributions", "Compare Distributions", Table Box( 1 )] <<
Make Combined Data Table(invisible);
rpt << Close Window;

//get rid of distribution types that can't have a Process Capability Analysis
// Delete selected rows
dt1 << Select Where(
	:Distribution == "Cauchy" | :Distribution == "ExGaussian" | :Distribution ==
	"Student's t"
) << Delete Rows;


//create a column that identifies the rank order of the fitted distributions within each Y (1 = best fit)
// New column: Column 10
dt1 << New Column( "Column 10",
	Numeric,
	"Continuous",
	Format( "Best", 12 )
);

// Change column formula: Column 10
dt1:Column 10 << Set Formula( Col Cumulative Sum( 1, :Y ) );

//Select the best fit as well as the Normal fit for every Y, then delete all other rows
// Delete unselected rows
dt1 << Select where(
	:Distribution == "Normal" | :Column 10 == 1
) << Invert Row Selection << Delete Rows;

//Create columns to determine and select the Normal fit (if it is the best fit or within the delta criterion we input at the start)
// Column 11: on the Normal row, Normal AICc minus the best-fit AICc in the row directly above (same Y)
dt1 << New Column( "Column 11",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( If( :Column 10 == 1, Empty(), :AICc - Lag( :AICc, 1 ) ) )
	);
dt1 << New Column( "Column 12",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( If( :Distribution == "Normal" & :Column 10 == 1, 1 ) )
	);
eval(eval expr(dt1 << New Column( "Column 13",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula(
			If( :Column 10 == 1 & :Distribution != "Normal",
				If( Abs( Lag( :Column 11, -1 ) ) > expr(var1),
					1
				)
			)
		)
	)));
dt1<<	New Column( "Column 14 2",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula(
			If( Is Missing( Col Maximum( :Column 13, :Y ) ) & :Column 10 == 2,
				1
			)
		),
		Set Selected
	);
dt1 << New Column( "Column 14",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Formula( Sum( :Column 12, :Column 13,:Column 14 2 ) ),
		Set Selected
	);


wait(0);

// Delete the column formulas so the computed values stay fixed when rows are deleted below
// Delete column formula: Column 10
dt1:Column 10 << Delete Formula;


// Delete column formula: Column 11
dt1:Column 11 << Delete Formula;

// Delete column formula: Column 12
dt1:Column 12 << Delete Formula;

// Delete column formula: Column 13
dt1:Column 13 << Delete Formula;

// Delete column formula: Column 14
dt1:Column 14 << Delete Formula;

//Keep only the chosen fit for each Y (Column 14 == 1); this deletes non-Normal fits that are within our criterion
// Delete unselected rows
dt1 << Select Where( :Column 14 == 1 ) <<
Invert Row Selection << Delete Rows;

//Put all the Y's and distributions into lists
col={};
dist={};
for each row(dt1,
		insertinto(col,:Y);
		insertinto(dist,:Distribution);
);

close(dt1, nosave);


//bring back up the distribution platform but only with the Y's that we could fit a distribution to
rpt=new window("Best Distribution",
		obj=dt<<distribution(column(eval(col)),Process Capability( 0 ),
			
		);
	
	
	
);

//Apply the distributions

for(i=1, i<=n items(col), i++,

	//whatbox = column(colnames[i])<<get name;
	//test=(Report(obj) << XPath( "//OutlineBox[text() = '"||col[i]||"']"))<< get title();
	//if(eval(test[1])==eval(col[i]),
		if(dist[i]=="Normal",obj[i]<< Fit Normal);
		if(dist[i]=="Exponential",obj[i]<< Fit Exponential);
		if(dist[i]=="Gamma",obj[i]<< Fit Gamma);
		if(dist[i]=="Johnson Su",obj[i]<< Fit Johnson);
		if(dist[i]=="Lognormal",obj[i]<< Fit Lognormal);
		if(dist[i]=="Normal 2 Mixture",obj[i]<<Fit Normal 2 Mixture);
		if(dist[i]=="Normal 3 Mixture",obj[i]<<Fit Normal 3 Mixture);
		if(dist[i]=="SHASH",obj[i]<< Fit Shash);
		if(dist[i]=="Weibull",obj[i]<< Fit Weibull);
		if(dist[i]=="ZI SHASH",obj[i]<< Fit ZI SHASH);
		if(dist[i]=="Beta",obj[i]<< Fit Beta);
		//);
	);	

Thanks for the inspiration @jthi and @txnelson (I used a bunch of your other posts to help me get here)
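
For anyone adapting this, the selection rule itself (drop the fits that can't be used for process capability, then prefer Normal when its AICc is within the chosen delta of the best remaining fit) can also be written as a small standalone function. This is only a minimal sketch and not part of the script above: the name pickFit and its arguments are hypothetical, and it assumes you have already pulled the distribution names and AICc values for one column into two parallel lists.

```jsl
Names Default To Here( 1 );

// Hypothetical helper: names and aiccs are parallel lists for ONE column,
// delta is the AICc difference you are willing to live with.
pickFit = Function( {names, aiccs, delta},
	{Default Local},
	// Fits that cannot be used for process capability
	excluded = {"Cauchy", "ExGaussian", "Student's t"};
	best = "";
	bestAICc = .;
	normalAICc = .;
	For( i = 1, i <= N Items( names ), i++,
		If( !Contains( excluded, names[i] ),
			// Remember the Normal AICc when we see it
			If( names[i] == "Normal", normalAICc = aiccs[i] );
			// Track the lowest AICc among the remaining fits
			If( Is Missing( bestAICc ) | aiccs[i] < bestAICc,
				bestAICc = aiccs[i];
				best = names[i];
			);
		)
	);
	// Prefer Normal when it is within delta of the best remaining fit
	If( !Is Missing( normalAICc ) & normalAICc - bestAICc <= delta,
		"Normal",
		best
	);
);

// Made-up AICc values: Cauchy is excluded, Gamma is the best remaining fit,
// and Normal is within 5 of it, so this should return "Normal"
Show( pickFit( {"Gamma", "Normal", "Cauchy"}, {100, 103, 90}, 5 ) );
```

Because it takes plain lists, a function like this is easy to check with made-up numbers before wiring it to the Compare Distributions table.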