cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
BHarris
Level VI

How to identify mins of maxes across data groups

Suppose I took all the students in Big Class and put them in groups by gender and age, e.g. 12yo boys in one group, 12yo girls in another, etc.

 

With that setup, I'd like to create a formula column that has a 1 if that group has the shortest maximum height for their age across both genders, and a 0 otherwise.

 

For example, in this data:

 

GroupNameHeightNEW COLUMN
12yo GirlsKATIE590
 JANE

55

0

 JACLYN

66

0

 LILLIE

52

0

12yo BoysTIM

60

1

 JAMES

61

1

 ROBERT

51

1

 

The maximum height for girls is 66 and for boys is 61, so the boys group has the shortest maximum height for their age, so the new formula column would have a 1 for Tim, James, and Robert, and a zero for Katie, Louise, Jane, etc.

 

Is this possible in a formula column? 

 

(I can share what I've tried, but I'm trying to avoid the xyproblem, http://xyproblem.info here...)

16 REPLIES 16
hogi
Level XII

Re: How to identify mins of maxes across data groups

You can use the Col ... aggregations.

edit: if somebody asked for the gender with the student with smallest height per age group:
First use Col Min to find the student with the minimum height per age.
Then use Col Max to spread this knowledge among all students with the same sex.

so:

New Column( "myCol",
	Formula( Col Maximum( :height == Col Minimum( :height, :age ), :sex, :age ) ),
)

 

 As you asked for
 a formula column that has a 1 if that group has the shortest maximum height for their age across both genders, and a 0 otherwise
the approach has to be corrected to 

New Column( "myCol",
	Formula(
		Col Maximum(
			:height == Col Min( Col Maximum( :height, :age, :sex ), :age ),
			:sex,
			:age
		)
	)
)

along the idea of:

  • what is the maximum height per age and gender
  • take the values and find the minimum (max) height per age
  • which student?
    [ah, no, we don't want a random student, we just want a max height student!]
    -> replace :height with Col Maximum( :height, :age, :sex ) *)
  • use col max to spread the 1 to all other students with the same sex and age
    set the other gender to 0 - or 1, if there is a boy and a girl with the same height.
    this part we don't need anymore
    ... and we ended up exactly with Jarmo's solution (see below)

hogi
Level XII

Re: How to identify mins of maxes across data groups

X-Y problem :
thanks : )

BHarris
Level VI

Re: How to identify mins of maxes across data groups

Wow, that seems to work, but my brain cannot even fathom how this works...

 

Let me try to unpack it here and maybe you can help me understand --

 

Starting with:  Col Maximum( :height == Col Minimum( :height, :age ), :sex, :age )

 

Col Minimum( :height, :age ) --> returns the minimum height for all students of this age

 

:height == Col Minimum( :height, :age ) --> returns true/false (1/0) if this row's height column matches that minimum height

 

Col Maximum(  <1 or 0 for this row> , :sex, :age ) --> uh...  this makes no sense, I thought column maximum works by grouping the data by the group-by columns (:sex, :age), and then finds the column maximum of the values or column in the first argument -- which now is just a single true/false or 1/0 -- and then returns the corresponding max value for this row's group, i.e. this row's values for :sex and :age. 

 

Obviously that's wrong, what am I missing here?

 

 

 

jthi
Super User

Re: How to identify mins of maxes across data groups

Many ways of doing this using for example different combinations of col statistical functions (like already shown). Here other option (I think it should do what you are looking for)

 

Col Min(Col Max(:height, :age, :sex), :age) == Col Max(:height, :age, :sex)

First Col Min() looks for the Col Max() across the age, sex groups OVER age and that is then compared to age,sex group maximums and if it is same as the age groups minimum -> value is 1.

 

To understand these easier, I think it is beneficial to just write them all out as separate ones and to make them easier to understand, in this case:

  1. First column with Col Max(:height, :age, :sex) (max_height_for_agesex)
  2. Next column then utilizes that Col Min(:max_height_for_agesex, :age) age_minmax_age
  3. And the final one is the comparison of these two :max_height_for_agesex == :age_minmax_age

jthi_0-1733849869132.png

 

 

-Jarmo
BHarris
Level VI

Re: How to identify mins of maxes across data groups

Thanks @jthi  -- I'm still confused though.

 

Starting with this:

Col Min(Col Max(:height, :age, :sex), :age) == Col Max(:height, :age, :sex)

 

When JMP evaluates the "Col Max(:height, :age, :sex)", what does it return?  A single value?  or a list?

 

I can create a new column with that as the formula and it produces a list of values that end up in the rows of the column.

 

But within a formula, it seems to be returning a single value, e.g.:

 

a = col max(:height, :name);
type(a);

... returns "Number".

 

And if I create a new script and enter:

 

a = col max(:height, :name); 
show(a);

 it says "invalid row number", but without the ", :name" it works fine returning 70.  It seems like JMP is doing some kind of context-specific meaning with these functions.  How does this work?  Is it explained anywhere? 

 

hogi
Level XII

Re: How to identify mins of maxes across data groups

when you run the formula of a formula column in a cell, it calculates the value for the current row.
When you run it in JSL, it calculates it for the current row.

Sounds the same. But the default value for "current row" is "row 0" - i.e. no valid numbers.

You can also check the calculation for any row  - different from the default "0".
Just put a row()=xy in front of your calculation and the calculation will be done for row xy.

so, e.g.

row()=5;	
col max(:height, :age)

will return 66.
:age in row=5 is 12 and  the maximum height for age =12 is 66.

hogi
Level XII

Re: How to identify mins of maxes across data groups

I guess it works like this:

Col ... first runs through all the rows and stores the results of the first argument.
It calculates it row by row - e.g. if the first argument is random uniform().

In addition, it stores a lookup table for the additional group by arguments.
Then it splits the data into subsets, calculates the aggregated values - and then it distributes the aggregated values back into the respective rows.

jthi
Super User

Re: How to identify mins of maxes across data groups


When JMP evaluates the "Col Max(:height, :age, :sex)", what does it return?  A single value?  or a list?

Single value.

 

If you create a script and wish to test what is going on, you have to utilize Row() IF you are using by columns (without by columns it will work as JMP knows to use all rows). Row() will be 0 by default in JMP, so you have to change it for example to Row() = 1. I think I have seen some documentation about this in JMP Help or scripting index, but no idea where (could be related to Row(), For Each Row or Col statistical functions).

Names Default To Here(1); 

dt = open("$SAMPLE_DATA/Big Class.jmp");

max_height = Col Max(:height); // works fine as no by var used and uses all rows

Show(Row());
Col Max(:height, :age); // won't work as JMP doesn't know which age to use -> it needs row information

Row() = 1;
cur_age = :age;
cur_maxheight = Col Max(:height, :age);
Write("\!Max height for age ", cur_age, " is ", cur_maxheight);
-Jarmo
BHarris
Level VI

Re: How to identify mins of maxes across data groups

It sounds like you're saying that JMP is wrapping the entire column-formula in a loop that iterates "Row()" over the rows of the table, calculating a value for each row.

 

That *would* make sense, except for this expression:

 

Col Min(Col Max(:height, :age, :sex), :age)

 

... which by that logic, when e.g. Row() is set to i=1, then it would calculate Col Max() as the largest value in the :height column of the whole table where :age[i] == :age[1] and :sex[i] == :sex[1], which returns a single number.  Then it tries to calculate Col Min() of that single number, which by definition can only ever be that number.  I'm really struggling to make that make sense.

 

I keep thinking that Col Max() must be returning a list, then the Col Min() could be looking at all the rows of that list where :age[i] == :age[1].

 

But my idea here requires that JMP do something similar to what perl does, basically keeping track of whether something is being called in scalar or list context, i.e. if a function is used in a way that expects a list back, it returns a list, otherwise it returns a scalar.  That way Col Max() here could return a list for Col Min(), which in turn would return a scalar value.

 

I'm guessing I'm missing something here and can't put my finger on it.