cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
  • JMP 19 is here! See the new features at jmp.com/new.
  • Register to attend Discovery Summit 2025 Online: Early Users Edition, Sept. 24-25.
Choose Language Hide Translation Bar
hogi
Level XII

Fast Way to calculate "nested" Col ... aggregations - like MAD

In 2017 @Kevin_Anderson submitted a wish to add a functionality to calculate Robust Means and Standard Deviation functions in JSL and Formula Editor . 

 

In 2022 an extended version of the wish was submitted by @jthi :
Add normalization and robust statistical functions (and matrix functions) 
with the comment:


@jthi wrote:

Most of these could already be implemented fairly easily by using existing statistical functions, but I would much rather have them as normal functionality in JMP (native implementation could also be faster?). In my opinion, these are very useful and powerful functions (like are all other Statistical Col functions).

 

 

Another function to determine a robust estimator for the variation of values:
MAD: Median Absolute Deviation.
https://en.wikipedia.org/wiki/Median_absolute_deviation 

hogi_0-1751225253705.png

which can be determined via (*):
- calculate the Median() of the values

- calculate the difference between the values and the median

- remove the signs

- calculate the median

 

If you want to calculate MAD for a column, you can use the Tables/summary platform:

hogi_1-1751225391676.png

 

Often, users want to calculate MAD for individual groups.
But unfortunately, there is no Col MAD (values, grouping) function in JSL - not yet.


On the other hand, it's very easy to follow the steps in (*) and calculate MAD "manually" via JSL:

Names Default to Here(1);
dt = Open( "$SAMPLE_DATA/Big Class Families.jmp" );

t0=hptime();

New Column( "MAD",
	Formula( Col Median( Abs( :height - Col Median( :height, :sex ) ), :sex ) )
);

dt << run formulas();

show((hptime()-t0)/1000000)

But wait - the Col Median inside the Col Median looks frightening - what if JMP calculates the values again and again with every row()?


Let's check if this is the case - by making the data table larger (6.4Mio rows):

Names Default to Here(1);
dt = Open( "$SAMPLE_DATA/Big Class Families.jmp" );

For Each( {i}, {1, 2},
	dts = Repeat( {dt}, 400 );
	dt = dt << Concatenate( dts, );
);

With the larger data table, it takes several minutes to do the calculation (JMP18.2.1).

An easy trick:
Split the calculation into 2 calculations: 
- first calculate a column with the grouped median values

- then calculate MAD

another trick:
- wait for the next version of JMP ?

 

are there more tricks?

1 ACCEPTED SOLUTION

Accepted Solutions
hogi
Level XII

Re: Fast Way to calculate "nested" Col ... aggregations - like MAD

Thanks @Jasean for providing the solution:

add Expr to protect the arguments of the function from being evaluated:

hogi_0-1752872492819.png

 

 

Add Custom Functions(
	New Custom Function(
		"COL",
		"MAD",
		Function( {value, byGroup},
			Col Median( Abs( value - Col Median( value, byGroup ) ), byGroup ) 
		),
		<<Formula Category( "Statistical" )
	)
);

New Column( "Col MAD",
	Formula( COL:MAD( Expr( :height ), Expr( :sex ) ) )
)


cool!

View solution in original post

3 REPLIES 3
hogi
Level XII

Re: Fast Way to calculate "nested" Col ... aggregations - like MAD


@hogi wrote:


another trick:
- wait for the next version of JMP ?


I just tried it with the EA version of JMP19 - A significant improvement !
-> just a few seconds instead of several minutes
wonderful : )

hogi
Level XII

Re: Fast Way to calculate "nested" Col ... aggregations - like MAD

Next step:
How to share the knowledge with colleagues - or via the Marketplace?

My idea:
use JSL to create a Custom Function

Add Custom Functions(
	New Custom Function(
		"COL",
		"MAD",
		Function( {value, byGroup},
			Col Median( Abs( value - Col Median( value, byGroup ) ), byGroup ) 
		),
		<<Formula Category( "Statistical" )
	)
);

and share the script as an AddIn ...

 

Then every user can use the new col aggregation via:

hogi_0-1752812552776.png

 

 


Unfortunately: This approach doesn't work : (
Does anybody find the error?

For debugging, you can use:

Names Default To Here( 1 );

Add Custom Functions(
	New Custom Function(
		"COL",
		"Median",
		Function( {value, byGroup},
			Col Median( value, byGroup )
		),
		<<Formula Category( "Statistical" )
	)
);

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

New Column( "my COL MEDIAN", Formula( COL:Median( :height, :sex ) ) );


on my system (JMP 18.2.0 & JMP 19 EA 9) , the result looks like this:

hogi_1-1752673972482.png

does it take the value of the first row?
- and use it for "F" , not having any value for "M"?

hogi
Level XII

Re: Fast Way to calculate "nested" Col ... aggregations - like MAD

Thanks @Jasean for providing the solution:

add Expr to protect the arguments of the function from being evaluated:

hogi_0-1752872492819.png

 

 

Add Custom Functions(
	New Custom Function(
		"COL",
		"MAD",
		Function( {value, byGroup},
			Col Median( Abs( value - Col Median( value, byGroup ) ), byGroup ) 
		),
		<<Formula Category( "Statistical" )
	)
);

New Column( "Col MAD",
	Formula( COL:MAD( Expr( :height ), Expr( :sex ) ) )
)


cool!

Recommended Articles