cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
eberndl
Level I

Getting an average of a subset of data

Hi, 

 

I have intensities of 4 treated and 5 untreated samples taken between at a range of wavelengths (400 - 600 nm) at three different time points (0, 1 , 3 hours).

 

In the long run, I'm trying to normalize my data so the average intestity at 400nm of the untreated samples is 1 at each time point.  In order to do that, I need to figure out how to get the average untreated, 400nm intensity at 0, 1, 3 hours. 

 

 

This is my current code:

 

Col Mean(
	:Intensity,
	:Time == 1 & :Wavelength( nm ) == 400 & :Treat == "No"
)

 From graphs I've made (and an Excel pivot sheet), I know the value should be about 0.72 for 1 hour, but I'm getting 1.3 for most rows. I'm also getting 0.68 for the rows that satisfy my 3 "and" statements, but I know that Excel pivot sheets sometimes do funny normalizing things, so I'm not sure if I'm concerned about the differences there.

  

Any suggestions for making my code consistant would be greatly appreciated.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
stan_koprowski
Community Manager Community Manager

Re: Getting an average of a subset of data

Hi,

If I understand your question correctly, you are asking to compute mean statistic for a particular column by one or more grouping columns.

In the attached example data table, I saved two scripts to illustrate how you can do this using two different approaches.  The one entitled, "Mean by Trt and Time Period and Wavelength" creates a pivot table with a local data filter so you can look at the mean Intensities for various time periods, treatments and wavelengths.  The local data filter can be conditionally applied (by using "Conditional" under the red triangle option) so that when selecting a category it will limit the categories of the next column filter only to those that are in the selected category.

 

Tabulate (pivot table) with local data filterTabulate (pivot table) with local data filter

 The second script entitled, "Tables Summary" is an example of using the Tables-->Summary with the treatment, time period and wavelength as grouping columns.

Tables Summary Dialog with Group ColumnsTables Summary Dialog with Group Columns

 And for the others that want to see how to perform the pivot table with a script.

 

Tabulate(
	Show Control Panel( 0 ),
	Set Format( Mean( :Intensity( 12, 1 ) ) ),
	Add Table(
		Row Table(
			Grouping Columns( :Time Period, :Treatment, :Wavelength ),
			Analysis Columns( :Intensity ),
			Statistics( Mean )
		)
	),
	Local Data Filter(
		Add Filter(
			columns( :Wavelength, :Time Period, :Treatment ),
			Where( :Time Period == "0" ),
			Display( :Treatment, Size( 309, 232 ), List Display )
		)
	)
)

And finally, the Tables-->Summary with the corresponding JSL:

 

 

Data Table( "Getting an average of a subset of data.jmp" ) <<
Summary(
	Group( :Time Period, :Treatment, :Wavelength ),
	Mean( :Intensity ),
	Freq( "None" ),
	Weight( "None" )
)

 

 Is this more toward what you are expecting? 

best,

Stan

View solution in original post

9 REPLIES 9
txnelson
Super User

Re: Getting an average of a subset of data

I am not sure you are interpreting what your code is actually doing.  The second element within the Col Mean() function is a "By" statement.  That is, the way you have it coded, it will calculate one mean for Intensity for all of the rews that meet the following 

:Time == 1 & :Wavelength( nm ) == 400 & :Treat == "No"

 and for all rows that do not meet that criteria, it will calculate the mean of Intensity for all of those rows.  So when the formula is applied, it can only generate one of two different values.  Is this what you are expecting?

Jim
eberndl
Level I

Re: Getting an average of a subset of data

No, this is not at ALL what I was expecting.

I'm just trying to get the average value of the intensity using all the
values where Time is 1 h, Wavelength is 400, and there has been no
treatment( I@1,400,N). I'll do the same for 0 and 3 hours (I@ 0, 400,N and
I@3, 400,N)

I'm eventually going to make a formula uses if statements to check the time
for the row, then divides the Intensity for that row by the time
appropriate "I@" value.

Roughly:
If t = 0, I/I@0,400,n
Else if t= 1, I/I@1,400,n
Else if t = 3, I/I@3,400,n
Else Error

I hope that explains better what I'm trying to do.
txnelson
Super User

Re: Getting an average of a subset of data

I am sure that I do not follow in total detail, the different conditions that you need to put in place, to make this work, however, I have put together an example that I believe will illustrate the method that will work to generate the answer you need.

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA\Big Class.jmp" );

dt << New Column( "Example",
	formula(
		If( :sex == "F" & :age == 12 | :sex == "M" & :age == 13,
			Col Mean( If( :sex == "F" & :age == 12 | :sex == "M" & :age == 13, :height, . ) ),
			.
		)
	)
);

Take a look at the formula.  It used 2 different decisions to get it's response.  First, only those rows where the specific combinations of sex and age are met, should the means be calculated.  And if those specifics are not met, the value should be a missing value.  The second decision is performed when the first decision is correct, and then, we want to calculate the column mean, but only include the values where the specific values of sex and age combinations are true.

Jim
Mark_W
Level II

Re: Getting an average of a subset of data

Jim,

How would I get the Example column to hold the column mean of heights for just the age of the person in the row?

For example in the sample data, Katie is 12, I would like the her Example column to contain the column mean height for just the 12 year olds, and for Alice, who is 13, I would like her Example column to contain the column mean height for just the 13 year olds.

I am hoping there is a easier way than using nested ifs for each age.

 

Thank you,

Mark

Mark_W
Level II

Re: Getting an average of a subset of data

Never mind

Col Mean ( :height, :age)

 

Thank you

stan_koprowski
Community Manager Community Manager

Re: Getting an average of a subset of data

Hi,

If I understand your question correctly, you are asking to compute mean statistic for a particular column by one or more grouping columns.

In the attached example data table, I saved two scripts to illustrate how you can do this using two different approaches.  The one entitled, "Mean by Trt and Time Period and Wavelength" creates a pivot table with a local data filter so you can look at the mean Intensities for various time periods, treatments and wavelengths.  The local data filter can be conditionally applied (by using "Conditional" under the red triangle option) so that when selecting a category it will limit the categories of the next column filter only to those that are in the selected category.

 

Tabulate (pivot table) with local data filterTabulate (pivot table) with local data filter

 The second script entitled, "Tables Summary" is an example of using the Tables-->Summary with the treatment, time period and wavelength as grouping columns.

Tables Summary Dialog with Group ColumnsTables Summary Dialog with Group Columns

 And for the others that want to see how to perform the pivot table with a script.

 

Tabulate(
	Show Control Panel( 0 ),
	Set Format( Mean( :Intensity( 12, 1 ) ) ),
	Add Table(
		Row Table(
			Grouping Columns( :Time Period, :Treatment, :Wavelength ),
			Analysis Columns( :Intensity ),
			Statistics( Mean )
		)
	),
	Local Data Filter(
		Add Filter(
			columns( :Wavelength, :Time Period, :Treatment ),
			Where( :Time Period == "0" ),
			Display( :Treatment, Size( 309, 232 ), List Display )
		)
	)
)

And finally, the Tables-->Summary with the corresponding JSL:

 

 

Data Table( "Getting an average of a subset of data.jmp" ) <<
Summary(
	Group( :Time Period, :Treatment, :Wavelength ),
	Mean( :Intensity ),
	Freq( "None" ),
	Weight( "None" )
)

 

 Is this more toward what you are expecting? 

best,

Stan

dale_lehman
Level VII

Re: Getting an average of a subset of data

Perhaps this is obvious, but it seems more straightforward than what you have just outlined.  Using your example, I created a new column with the formula Col Mean( :Intensity, :Time Period, :Treatment, :Wavelength ).  I don't use scripts if I can avoid them and they can be avoided here.  The Col Mean function (in the Statistical group) permits the mean to be calculated as a function of a number of grouping variables and this produces exactly the table you show.

eberndl
Level I

Re: Getting an average of a subset of data

That sort of worked, but it's not easy to apply on particular value (sort of like using the $$ in Excel) to all rows which I deem to be of the same type.

 

I think it's just a difference in the software, and I will have to get used to the differences.

eberndl
Level I

Re: Getting an average of a subset of data

I ended up using your second option, but I still have to manually enter that normalization factor for my calculation... My final code looks like this: 

If( :Time == 0,
	:Intensity / 1.11175075,
	If( :Time  == 1,
		:Intensity / 0.772546829268293,
		If( :Time == 3,
			:Intensity / 1.2335052,
			"Error"
		)
	)