cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
MarkJSchwab
Level II

Formula to subtract the average of a subset from a data column?

I am looking for a way to take a subset of a data table that uses only rows matching a specified condition, and subtract the mean of those rows from a column.

 

For example, using the Big Class data set, say I want to create a new column showing "Difference in height between this student and the average 14-year-old of the same gender." 

 

The way I know how to do this is by adding a second table, using Tables -> Summary -> Statistics: Mean(height), Group: sex, Subgroup: age. Then I can Table -> Update: Big Class By (sex), Match columns: sex = sex, and that will add several columns for Mean(height, [age]). Then I can create a new column "Delta height vs 14yo of same sex" that is :height - :Mean(height, 14).

 

My question: is there any way to do with without creating an extra table, and the extra column (e.g. "Mean(height, 14)")? It seems like Col Mean(x, byVar) is close to providing this functionality, but lacking the functionality to select a different group; for example I could use :height - Col Mean(:height, :age, :sex) to find the delta between student X and the average student of their own sex and height, but I don't know if there's a way to specify in Col Mean that I want to only consider rows with age=14, rather than using the age value associated with that row.

 

I do this type of normalization very frequently, so anything to make it faster and easier would be enormously appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
julian
Community Manager Community Manager

Re: Formula to subtract the average of a subset from a data column?

Hi @MarkJSchwab,

Yes, I can see how wrapping this in conditionals might not be ideal for many cases. Here's a modification to @brady_brady's solution that should work (table attached):

 

julian_0-1615932139772.png

testVal = :sex;
:height - Mean( :height[(:height << get datatable) << get rows where( :age == 14 & :sex == testVal )] );

In this case, we grab the value of :sex for the current row, save it to a variable, testVal. Then, when we evaluate the centering function, we can test use that value in the get rows where() function. 

 

Here is the table with a column of just the rows selected by this Get Rows Where() function:

 

julian_2-1615932380900.png

 

 

As you can see, all rows use age 14 as the subset, but which sex subset is determined by the current row values of sex. Should be easily adapted to different levels of complexity. 

 

@julian 

 

 

 

View solution in original post

8 REPLIES 8
Kevin_Anderson
Level VI

Re: Formula to subtract the average of a subset from a data column?

Hi, MarkJSchwab!

 

Of course, there's a way to do what you wish.  JSL will set you free!

 

I wrote a simple script that uses mostly JSL's capability to write a script for you.  Try it:

 

 

// Title: Bigclass Normalization Script for JMP User Community discussion
// Purpose: Demonstrate how to write a very basic script to do "normalizations"
Names Default To Here( 1 );
Bigclass_dt = Open( "$SAMPLE_DATA\Big Class.jmp" );
BigclassSummary_dt = bigclass_dt << Summary( Group( :age, :sex ), Mean( :height ) );
BigclassSummary_dt << Select Where( :Age != 14 ) << Delete Rows;
Bigclass_dt << Join(
With( BigclassSummary_dt ),
Select( :name, :age, :sex, :height ),
SelectWith( :Name( "Mean(height)" ) ),
By Matching Columns( :sex = :sex ),
Drop multiples( 0, 0 ),
Include Nonmatches( 0, 0 ),
Preserve main table order( 1 ),
Output Table( "NormedBigclass by AgeGender" )
);
BigclassJoin_dt = Data Table( "NormedBigclass by AgeGender" ) << New Column( "Normalized",
numeric,
continuous,
Formula( :Name( "height" ) - :Name( "Mean(height)" ) ),
evalFormula
);
Close( BigclassSummary_dt, NoSave );

 

 

I made a separate summary table and then closed it, but you don't have to do it that way.

 

There are, of course, more sophisticated ways to write a script like this, especially since all the "where" clauses are hard-coded and you may run into issues with this script if your data table is large, but ,even still, JSL will be your route to productivity.  There are several great resources for learning JSL!

 

Good luck!

julian
Community Manager Community Manager

Re: Formula to subtract the average of a subset from a data column?

Hi @MarkJSchwab,

There's a straightforward way to do this using Col Mean() that's not obvious until you see it. Col Mean() take a Frequency column argument:

julian_4-1615557944353.png

 

Usually, this is for when you need to weight using another column. But, in your example, what if we could weight the values of Age==14 as 1, and all other values as 0? That would give us a mean of just those individuals where Age==14. We can do that directly in the formula:

 

Col Mean( :height, <<Freq( :age == 14 ) )

julian_0-1615557755939.png

 

This will return:

 

julian_1-1615557782349.png

Now, for your specific question, centering with respect to this specific mean from height, we can wrap that into the formula in the usual way: 

:height - (Col Mean( :height, <<Freq( :age == 14 ) ))

julian_2-1615557892001.png

 

julian_3-1615557919453.png

 

I hope this helps!

@julian 

MarkJSchwab
Level II

Re: Formula to subtract the average of a subset from a data column?

Hi @julian, thanks for the suggestion. Unfortunately when I paste that exact formula 

Col Mean( :height, <<Freq( :age == 14 ) )

into my Big Class, it returns 62.55 (i.e. the overall Col Mean w/o selecting age=14) instead of the correct value you are seeing of 64.166.

 

I can even download your attachment, and the proper values show up in column "Centered with respect to Mean at Age==14". But if I make a new column in that table with the exact same formula, it returns different values (showing the [height] - [overall avg height], instead of properly showing [height] - [avg height of 14yos]).

 

I am using JMP Pro 14.3.0. Any ideas on how I can get my output to match yours?

 

And one more question, on your sheet if you add an additional byVar of :sex (for total formula ":height - (Col Mean( :height, :sex, <<Freq( :age == 14 ) ))" does that subtract the average 14yo boy height from the boys, and the average 14yo girl height from the girls?

julian
Community Manager Community Manager

Re: Formula to subtract the average of a subset from a data column?

Hi @MarkJSchwab,

 

It does appear <<Freq was added as an optional argument in JMP 15, so for earlier versions of JMP you will need to do something else. 

 

Here's a solution from jmp wizard @brady_brady:

 

:height - Mean( :height[(:height << get datatable) << get rows where( :age == 14 )] )

 

julian_0-1615665456532.png

 

It's worth taking a moment to unpack this formula because there are some methods I bet you will find valuable to use later. 

 

(:height << get datatable) << get rows where( :age == 14 )

will return the matrix of row entries in which :age==14. Importantly, you can modify the get rows where() function to match whatever lookup you wish, including the second ask I believe you have of including another variable. For example:

 

 

(:height << get datatable) << get rows where( :age == 14 & :sex == "F" 

julian_1-1615665647763.png

 

 

Now that we have a matrix of rows that match your condition, Brady uses this matrix to subscript :height.

height[(:height << get datatable) << get rows where( :age == 14 & :sex == "F" )]

julian_2-1615665753678.png

 

This will return the matrix of the values of :height for each of those rows, in this case, the heights of the 14yo female students: [61, 62, 65, 63, 62]

 

Finally, the Mean() function is wrapped around this matrix, which returns the mean you need to calculate your centered variable:

mean(:height[(:height << get datatable) << get rows where( :age == 14 & :sex == "F" )])

julian_3-1615665898211.png

 

This solution more flexible, tolerant of missing data, and works for earlier versions of jmp, so all together a better solution I think. Thanks, @brady_brady!

 

@julian 

 

 

 

 

MarkJSchwab
Level II

Re: Formula to subtract the average of a subset from a data column?

Hi @julian thanks again for the help!

 

One last wrinkle: ideally I would like the mean 14-yo-girl age subtracted from the girls, and mean 14-yo-boy age subtracted from the boys. (In your example, you're subtracting 14-yo-girl age from both boys and girls.)

 

(I realize in this specific case with just M/F it would be easy to use a conditional, and say "If :sex == 'F' then use this formula, if :sex == 'M' then modify the formula to say {mean(:height[(:height << get datatable) << get rows where( :age == 14 & :sex == "M" )])}" but in my more general use cases I may have a large number of groups and nested conditionals would become unwieldy.)

 

Any straightforward way to achieve this?

txnelson
Super User

Re: Formula to subtract the average of a subset from a data column?

You just need to expand the formula to handle both male and female 

If( :sex == "F",
	:height - Mean( :height[(:height << get datatable) << get rows where( :age == 14 & :sex == "F" )] ),
	:height - Mean( :height[(:height << get datatable) << get rows where( :age == 14 & :sex == "M" )] )
)
Jim
julian
Community Manager Community Manager

Re: Formula to subtract the average of a subset from a data column?

Hi @MarkJSchwab,

Yes, I can see how wrapping this in conditionals might not be ideal for many cases. Here's a modification to @brady_brady's solution that should work (table attached):

 

julian_0-1615932139772.png

testVal = :sex;
:height - Mean( :height[(:height << get datatable) << get rows where( :age == 14 & :sex == testVal )] );

In this case, we grab the value of :sex for the current row, save it to a variable, testVal. Then, when we evaluate the centering function, we can test use that value in the get rows where() function. 

 

Here is the table with a column of just the rows selected by this Get Rows Where() function:

 

julian_2-1615932380900.png

 

 

As you can see, all rows use age 14 as the subset, but which sex subset is determined by the current row values of sex. Should be easily adapted to different levels of complexity. 

 

@julian 

 

 

 

MarkJSchwab
Level II

Re: Formula to subtract the average of a subset from a data column?

Beautiful! Thanks!!!