How to identify mins of maxes across data groups

BHarris

Suppose I took all the students in Big Class and put them in groups by gender and age, e.g. 12yo boys in one group, 12yo girls in another, etc.

With that setup, I'd like to create a formula column that has a 1 if that group has the shortest maximum height for their age across both genders, and a 0 otherwise.

For example, in this data:

Group	Name	Height	NEW COLUMN
12yo Girls	KATIE	59	0
	JANE	55	0
	JACLYN	66	0
	LILLIE	52	0
12yo Boys	TIM	60	1
	JAMES	61	1
	ROBERT	51	1

The maximum height for girls is 66 and for boys is 61, so the boys group has the shortest maximum height for their age, so the new formula column would have a 1 for Tim, James, and Robert, and a zero for Katie, Louise, Jane, etc.

Is this possible in a formula column?

(I can share what I've tried, but I'm trying to avoid the xyproblem, http://xyproblem.info here...)

hogi · | Posted in reply to message from BHarris 12-16-2024

No, the idea is NOT that JMP goes ONCE row by row through the table and calculates the whole expression.

hogi · Dec 16, 2024 02:54 PM

To investigate what JMP is doing, let's first create a very small data table ...

dt0 = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt = dt0 << Clear Select << Select Rows( [2, 10, 11, 25] ) << Subset( Selected Rows( 1 ), Selected columns only( 0 ) );
nr = n rows(current data table());

... and a formula column with some log functionality:

New Column( "test", Formula(Col Min(Write( "\!nmin:" || (Char( Row() )) );if(row()==nr, write("\!n") );:height, :age)));

the output:

So, for this case, JMP runs the code several times.
Let's wonder later ...

hogi · | Posted in reply to message from hogi 12-16-2024

Let's now check with

New Column( "test",
	Formula(
		Col Min(
			Write( "\!nmin:" || (Char( Row() ) || "\!n\!n") );
			Col Max(
				Write( "max:" || (Char( Row() ) || "\!n") );
				:height;,
				:age,
				:sex
			);,
			:age
		)
	)
)

... and we get *)

which shows:

for the simple example JMP evaluates the first argument of Col Min for every row
for the second version, JMP evaluates the first argument of Col Min for every row - and as there is another Col... aggregation inside this code, this is followed by the ColMax loop: first argument of Col Max for every row
At the end, JMP knows the value of ColMax(...) for every single row, and can then evaluate Col min with these values.

This is more or less what you say - with the list context. But I think JMP stores the results in data table context. Such that it can combine the VALUES with additional information like row(), row states() ..

but somebody from JMP - or @jthi - might explain better ...

*) again: multiple times:
further details can be found here:
https://community.jmp.com/t5/Discussions/Statistical-functions-Col-Mean-for-example-possibly-bugged-...

hogi · Dec 16, 2024 03:28 PM

another way to test the execution of the code:

i = 0;
j = 0;
New Column( "test",
	Formula(
		Col Min(
			i++;
			Col Max(
				j++;
				:height;,
				:age,
				:sex
			);,
			:age
		)
	)
);
// too early
Show( i );
Show( j );
Current Data Table() << run formulas();
// after calculating the values
Show( i );
Show( j );

i = 5 x 4 [(nr+1) * nr]

j = 5 x 5 x 4 [(nr+1) * (nr+1) * nr]

the access to the global variables i & j seems to be as bad as
https://community.jmp.com/t5/Discussions/How-do-I-use-the-Col-Maximum-Formula-with-a-quot-where-quot...

https://community.jmp.com/t5/Discussions/Statistical-functions-Col-Mean-for-example-possibly-bugged-...

for the performance of Col ...

hogi · Dec 16, 2024 1:04 PM

-

jthi · | Posted in reply to message from BHarris 12-16-2024

Col statistical functions do have their own magic which some other functions do not, for example they have internal caching and support byvars. https://www.jmp.com/support/help/en/18.0/#page/jmp/statistical-functions.shtml#

-Jarmo

hogi · Dec 16, 2024 04:49 PM

Yes, this clever way of preparing the data i extremely helpful:
examples of how the magic works:

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt << Clear Select << Select Rows( Index( 5, 40 ) ) <<Delete Rows;
newCol=New Column("new");
newCol << Set each value(Col Min(Print(row());:height,:age)); // 4 rows
newCol << Set Formula(Col Min(Print(row());:height,:age)); // 2 x 4

newCol << Set each value(Col Min(Col Max(Print(row());:height,:sex),:age)); // 4  + 4 
newCol << Set Formula(Col Min(Col Max(Print(row());:height,:sex),:age)); // 2 x 4  + 4

Please make sure that you don't run into one of these taps:

i=1;
newCol << set each value (Col Min(Print("X");:height,:age)); // 16 = 4 x 4 
newCol << set Formula (Col Min(Print("X");:height,:age)); // 20 = 5 x 4
newCol << set Formula (Col Min("x";Print(3);:height,:age)); // 20 = 5 x 4
newCol << Set Formula(Col Min(i;Print(row());:height,:age)); // 20 = 5 x 4

newCol << set each value (Col Min(Col Max(Print("X");:height,:sex),:age)); // 80 = 5 x 4 x 4 
newCol << set Formula (Col Min(Col Max(Print("X");:height,:sex),:age)); // 100 = 5 x 5 x 4
newCol << set Formula (Col Min(Col Max("X";Print(3);:height,:sex),:age)); // 100 = 5 x 5 x 4
newCol << Set Formula(Col Min(Col Max(i;Print(row());:height,:sex),:age)); // 100 = 5 x 5 x 4

Col aggregations get cautious when they see

strings
variables
?

then they switch to a reduced speed and calculate every value for every row (+1) ... in a hierarchical tree.

Re: How to identify mins of maxes across data groups

Re: How to identify mins of maxes across data groups

Re: How to identify mins of maxes across data groups

Re: How to identify mins of maxes across data groups

Re: How to identify mins of maxes across data groups

Re: How to identify mins of maxes across data groups

Re: How to identify mins of maxes across data groups