cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
markschwab
Level IV

Create formula: mean of Col A for rows where Col B is minimum in group Col C?

Say I were using the Big Class dataset, and wanted to create a new column for the average weight of the shortest student of the same age - how do I set that up?

 

I know how to do the first bit (find the height of the shortest student the same age,) that is just a Col Mean with a byVariable:

 

Col Minimum( :height, :age )

 

But what I don't know how to do is to then take the average weight of those subset of students where (A) the age of that student matches the age of the student in this row, and (B) the height of that student matches the shortest height for students of their age.

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Create formula: mean of Col A for rows where Col B is minimum in group Col C?

You can do this with a formula (getting mean of minimum weights in age group):

Col Mean(If(:height == Col Min(:height, :age), :weight, .), :age)

Split the formula into two different formulas to make it a bit easier to understand:

 

Min weight values:

If(:height == Col Min(:height, :age),
	:weight,
	.
)

and then mean of that value:

Col Mean(:MinWeightVal, :age)

Col Mean will ignore missing values so this should work

 

 

With cars (not exactly sure how you want the data to look:

Max wt for make:

Col Max(:Wt, :Make)

Size for weight if you want it to just one row:

If(:Wt == :Max Wt for Make,:Size,"")

Same value for multiple rows, you can most likely drop some Col functions from there, but this was the easiest I came up quickly with:

:Size[Col Max(Col Max(Row(), If(:Wt == Col Max(:Wt, :Make), 1, 0), :Make), :Make)]

spit into multiple different formulas to see how it work

 

 

-Jarmo

View solution in original post

7 REPLIES 7
ron_horne
Super User (Alumni)

Re: Create formula: mean of Col A for rows where Col B is minimum in group Col C?

Hi @markschwab ,

until someone shows us the sleek way of using a column formula you can try the following:

names default to here (1);

// Open Data Table: Big Class.jmp
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// New column: min height 0/1
dt<< New Column( "min height",
	Numeric,
	"Continuous",
	Format( "Best", 12 ),
	Formula( If( :height == Col Min( :height, :age ), 1 ) )
);

// Data table summary
// → Data Table( "Big Class By (age, Min height)" )
dt2 = dt << Summary(
	Group( :age, :Min height ),
	Mean( :weight ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 )
);

// Delete selected rows
dt2 <<
Select Where( Is Missing( :Min height ) ) << Delete Rows;

// Update data tables
dt << Update(
	With( dt2 ),
	Match Columns( :age = :age ),
	Replace Columns in Main Table(:"Mean(weight)"n )
);

a formula would be nicer since it will also update automatically.

let us know if it helps.

ron

 

 

markschwab
Level IV

Re: Create formula: mean of Col A for rows where Col B is minimum in group Col C?

Thanks Ron. I think that could work for me (although it would be nice to be able to do it within the same data sheet.)

 

To hopefully make it easier, it would be fine for my actual application to use a lookup instead of a mean - the column I will be taking the maximum of is a timestamp, and we can assume there is one unique row for each group with the most recent timestamp.

 

So for example, using the Cars dataset, my problem would be:

 

Create a column "Size of Max Wt for Make" which finds the car with the highest Wt among all cars with the same Make, and then looks up the Size of that max Wt car.

ron_horne
Super User (Alumni)

Re: Create formula: mean of Col A for rows where Col B is minimum in group Col C?

Hi @markschwab ,

if you can afford to sort the table try this:

Names default to here (1);

// Open Data Table: Cars.jmp
dt = Open( "$SAMPLE_DATA/Cars.jmp" );

// Sort data table
dt << Sort( By( :Make, :Wt ), Replace Table, Order( Ascending, Descending ) );

// New column: Size of Max Wt for Make
dt << New Column( "Size of Max Wt for Make",
Character,
"Nominal",
Formula( If( Row() == 1, :Size, :Make == Lag( :Make ), Lag( :Size of Max Wt for Make ), :Make != Lag( :Make ), :Size ) )
);

by now i would also like to know the direct formula solution without sorting.

if you need to resort the data later just remove the formula or suppress evaluation.

 

 

 

jthi
Super User

Re: Create formula: mean of Col A for rows where Col B is minimum in group Col C?

You can do this with a formula (getting mean of minimum weights in age group):

Col Mean(If(:height == Col Min(:height, :age), :weight, .), :age)

Split the formula into two different formulas to make it a bit easier to understand:

 

Min weight values:

If(:height == Col Min(:height, :age),
	:weight,
	.
)

and then mean of that value:

Col Mean(:MinWeightVal, :age)

Col Mean will ignore missing values so this should work

 

 

With cars (not exactly sure how you want the data to look:

Max wt for make:

Col Max(:Wt, :Make)

Size for weight if you want it to just one row:

If(:Wt == :Max Wt for Make,:Size,"")

Same value for multiple rows, you can most likely drop some Col functions from there, but this was the easiest I came up quickly with:

:Size[Col Max(Col Max(Row(), If(:Wt == Col Max(:Wt, :Make), 1, 0), :Make), :Make)]

spit into multiple different formulas to see how it work

 

 

-Jarmo
markschwab
Level IV

Re: Create formula: mean of Col A for rows where Col B is minimum in group Col C?

Thanks @jthi , that works beautifully!

 

To make sure I understand, it looks like the If statement is creating a "virtual" column that nullifies any rows where the height is not the min height for that age, and then it's using Col Mean to operate on that "virtual" column. Nice trick!

 

It doesn't appear to me the formula for the Cars example is working - for example, the formula returns "pu" for Chevrolet even though the heaviest Chevy (the Suburban) is "mpv", and returns "med" for Chrysler even though the heaviest Chrysler (the Fifth Ave) is "hev". Anyway, for my application you've already solved the issue with your formula from Big Class - but, just out of curiosity and to further my JMP-fu, I am curious to learn more about what's going on in the final formula. What does the bracket operator do, e.g. ":Size[...]"?

jthi
Super User

Re: Create formula: mean of Col A for rows where Col B is minimum in group Col C?

I have attached data table with a bit different formulas and split into more than one (my earlier ones for Cars might have had some issues due to grouping for non-max weights). This will also let you see what happens if there are ties.

 

:Size[idx] lets you access value in :Size column with specified idx, you can also access multiple with :Size[[1,2,3]] for example (this might help a little Data table subscripting)

 

Run this in script window, take a look at the log and compare it to datatable:

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Big Class.jmp");

Show(dt:name[1]);
Show(dt:name[5]);
Show(dt:name[10]);
Show(dt:name[[10, 11]]);
Show(dt:name[10::15]);
-Jarmo
markschwab
Level IV

Re: Create formula: mean of Col A for rows where Col B is minimum in group Col C?

Fantastic, thanks! You learn two new things every day

 

To combine your Cars formulae into one column, it looks like this:

 

:Size[Col Max( If( :Wt == Col Max( :Wt, :Make ), Row(), . ), :Make )]