cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
BHarris
Level VI

How to identify mins of maxes across data groups

Suppose I took all the students in Big Class and put them in groups by gender and age, e.g. 12yo boys in one group, 12yo girls in another, etc.

 

With that setup, I'd like to create a formula column that has a 1 if that group has the shortest maximum height for their age across both genders, and a 0 otherwise.

 

For example, in this data:

 

GroupNameHeightNEW COLUMN
12yo GirlsKATIE590
 JANE

55

0

 JACLYN

66

0

 LILLIE

52

0

12yo BoysTIM

60

1

 JAMES

61

1

 ROBERT

51

1

 

The maximum height for girls is 66 and for boys is 61, so the boys group has the shortest maximum height for their age, so the new formula column would have a 1 for Tim, James, and Robert, and a zero for Katie, Louise, Jane, etc.

 

Is this possible in a formula column? 

 

(I can share what I've tried, but I'm trying to avoid the xyproblem, http://xyproblem.info here...)

16 REPLIES 16
hogi
Level XII

Re: How to identify mins of maxes across data groups

No, the idea is NOT that JMP goes ONCE row by row through the table and calculates the whole expression.

 

hogi
Level XII

Re: How to identify mins of maxes across data groups

To investigate what JMP is doing, let's first create a very small data table ...

dt0 = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt = dt0 << Clear Select << Select Rows( [2, 10, 11, 25] ) << Subset( Selected Rows( 1 ), Selected columns only( 0 ) );
nr = n rows(current data table());

... and a formula column with some log functionality:

New Column( "test", Formula(Col Min(Write( "\!nmin:" || (Char( Row() )) );if(row()==nr, write("\!n") );:height, :age)));

the output:

hogi_0-1734378824971.png


So, for this case, JMP runs the code several times.
Let's wonder later ...

 

hogi
Level XII

Re: How to identify mins of maxes across data groups

Let's now check with

New Column( "test",
	Formula(
		Col Min(
			Write( "\!nmin:" || (Char( Row() ) || "\!n\!n") );
			Col Max(
				Write( "max:" || (Char( Row() ) || "\!n") );
				:height;,
				:age,
				:sex
			);,
			:age
		)
	)
)

... and we get *)

hogi_1-1734378993152.png

which shows:

 

  • for the simple example JMP evaluates the first argument of Col Min for every row
  • for the second version, JMP evaluates the first argument of Col Min for every row - and as there is another Col... aggregation inside this code, this is followed by the ColMax loop: first argument of Col Max for every row
    At the end, JMP knows the value of  ColMax(...) for every single row, and can then evaluate Col min with these values.

This is more or less what you say - with the list context. But I think JMP stores the results in data table context. Such that it can combine the VALUES with additional information like row(), row states() ..

 

but  somebody from JMP - or @jthi  - might explain better ...

 

*) again: multiple times:
further details can be found here:
https://community.jmp.com/t5/Discussions/Statistical-functions-Col-Mean-for-example-possibly-bugged-... 

hogi
Level XII

Re: How to identify mins of maxes across data groups

another way to test the execution of the code:

i = 0;
j = 0;
New Column( "test",
	Formula(
		Col Min(
			i++;
			Col Max(
				j++;
				:height;,
				:age,
				:sex
			);,
			:age
		)
	)
);
// too early
Show( i );
Show( j );
Current Data Table() << run formulas();
// after calculating the values
Show( i );
Show( j );

hogi_1-1734381004188.png

i = 5 x 4                [(nr+1) * nr]

j = 5 x 5 x 4          [(nr+1) * (nr+1) * nr]

the access to the global variables i & j seems to be as bad as
https://community.jmp.com/t5/Discussions/How-do-I-use-the-Col-Maximum-Formula-with-a-quot-where-quot...

https://community.jmp.com/t5/Discussions/Statistical-functions-Col-Mean-for-example-possibly-bugged-...

for the performance of Col ...

hogi
Level XII

Re: How to identify mins of maxes across data groups

-

jthi
Super User

Re: How to identify mins of maxes across data groups

Col statistical functions do have their own magic which some other functions do not, for example they have internal caching and support byvars. https://www.jmp.com/support/help/en/18.0/#page/jmp/statistical-functions.shtml# 

-Jarmo
hogi
Level XII

Re: How to identify mins of maxes across data groups

Yes, this clever way of preparing the data i extremely helpful:
examples of how the magic works:

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt << Clear Select << Select Rows( Index( 5, 40 ) ) <<Delete Rows;
newCol=New Column("new");
newCol << Set each value(Col Min(Print(row());:height,:age)); // 4 rows
newCol << Set Formula(Col Min(Print(row());:height,:age)); // 2 x 4

newCol << Set each value(Col Min(Col Max(Print(row());:height,:sex),:age)); // 4 + 4 newCol << Set Formula(Col Min(Col Max(Print(row());:height,:sex),:age)); // 2 x 4 + 4

 

Please make sure that you don't run into one of these taps:

i=1;
newCol << set each value (Col Min(Print("X");:height,:age)); // 16 = 4 x 4 newCol << set Formula (Col Min(Print("X");:height,:age)); // 20 = 5 x 4
newCol << set Formula (Col Min("x";Print(3);:height,:age)); // 20 = 5 x 4 newCol << Set Formula(Col Min(i;Print(row());:height,:age)); // 20 = 5 x 4
newCol << set each value (Col Min(Col Max(Print("X");:height,:sex),:age)); // 80 = 5 x 4 x 4 newCol << set Formula (Col Min(Col Max(Print("X");:height,:sex),:age)); // 100 = 5 x 5 x 4
newCol << set Formula (Col Min(Col Max("X";Print(3);:height,:sex),:age)); // 100 = 5 x 5 x 4 newCol << Set Formula(Col Min(Col Max(i;Print(row());:height,:sex),:age)); // 100 = 5 x 5 x 4
  

 

Col aggregations get cautious when they see 

  • strings
  • variables
  • ?

then they switch to a reduced speed and calculate every value for every row (+1) ... in a hierarchical tree.