cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
JMP is taking Discovery online, April 16 and 18. Register today and join us for interactive sessions featuring popular presentation topics, networking, and discussions with the experts.
Choose Language Hide Translation Bar
ih
Super User (Alumni) ih
Super User (Alumni)

Summary statistics with multiple binning columns.

I often summarize a column by some combination of other categorical columns using a formula column.  For example in the Detergent sample data I might want to know the average count for each combination of softness and temperature. Below are three methods to do this, which one I use depends on the size of the data table and who my audience is.  Is there an easier way?

 

I shy away from an add-in containing a custom 'Col Mean by Multiple Columns()' function due to portability issues.

 

Tabluate

Create summary statistics in a new table using tabulate and then join the results to the original data table.

Pro: Easy to make very complex summary statistics.

Con: Not live, need to refresh the data if a referenced column changes during an analysis, no way for users to follow the source of the data.

 

Binning Formula

Create a binning formula and then use the second argument of the Col Mean() function to refrence that bin.

Pro: Easy to set up and relatively easy for users to follow.

Con: Creates extra columns that clutter the data table and have no use except for this formula, especially when creating many summary columns.

dt = Open( "$sample_data/Detergent.jmp" ); 

//binning formula
dt << New Column(
	"Bin by Temp and Softness",
	Character,
	"Nominal",
	Formula( :softness || "-" || :temperature )
);

//summary statistic
dt << New Column(
	"Mean Count by Temp and Softness Binned",
	Numeric,
	"Continuous",
	Format( "Best", 12 ),
	Formula( Col Mean( :count, :Bin by Temp and Softness) )
);

 

Identify bins and calculate summary statistic using JSL inside a formula

Identify similar rows in the data table then calculate the summary statistic on that data.  I suspect JMP does not cache results using this method so the whole table is re-evaluted for every row. In any case it is slow for large data tables.

Pro: Tons of flexibility

Cons: Slow calculations, very hard for users to follow.

//summary statistic by bin
dt << New Column(
	"Mean Count by Temp and Softness Direct",
	Numeric,
	"Continuous",
	Format( "Best", 12 ),
	Formula(
		matches = Function( {a, b},
			out = {};
			For( i = 1, i <= N Items( a ), i++,
				found = "no";
				For( j = 1, j <= N Items( b ), j++,
					If( a[i] == b[j],
						found = "yes"
					)
				);
				If( found == "yes",
					out = Insert( out, a[i] )
				);
			);
			out;
		);
		soft = Loc( :softness << get values(), :softness );
		temp = Loc( :temperature << get values(), :temperature );
		Mean( (:count << get values())[matches( soft, temp )] );
	)
);
1 ACCEPTED SOLUTION

Accepted Solutions
julian
Community Manager Community Manager

Re: Column formula to summarize column by multiple columns

Hi @ih,

 

I find myself doing just this thing pretty often. One way that works for me is to use a column formula like Col Mean(:column, byvariable), and for the second parameter I use a concatenation of the two categorical columns directly (rather than creating any additional columns in the table). For the detergent example you mentioned this would be:

 

 

Col Mean( :count, :softness || :temperature )

 

 

Does this do what you need or is there something else you're looking for?

 

@julian

 

 

View solution in original post

3 REPLIES 3
julian
Community Manager Community Manager

Re: Column formula to summarize column by multiple columns

Hi @ih,

 

I find myself doing just this thing pretty often. One way that works for me is to use a column formula like Col Mean(:column, byvariable), and for the second parameter I use a concatenation of the two categorical columns directly (rather than creating any additional columns in the table). For the detergent example you mentioned this would be:

 

 

Col Mean( :count, :softness || :temperature )

 

 

Does this do what you need or is there something else you're looking for?

 

@julian

 

 

ih
Super User (Alumni) ih
Super User (Alumni)

Re: Column formula to summarize column by multiple columns

This is perfect, and so obvious once you point it out.

Thanks!

julian
Community Manager Community Manager

Re: Column formula to summarize column by multiple columns

Great, I'm glad that helps! I should have mentioned that in recent versions of JMP (I think since JMP 11) you can have multiple column parameters directly in ColMean(). So, rather than concatenating you could have:

 

Col Mean( :count, :softness, :temperature )

 

I'm so used to concatenating that I still do that. Old habits die hard :)

 

Additionally, in JMP 12 and on, with Instant Formulas in the data table you can select those columns in the table, right-click one of their headers, select New Formula Column > Group By. 

groupby.png

 

 

Then, select the count column, right-click the header, select New Formula Column > Aggregate > Mean

 

aggregate.png

 

 

 

This will design the formula for you automatically; very friendly for your new users.