Showing results for

- JMP User Community
- :
- Discussions
- :
- Summary statistics with multiple binning columns.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Oct 13, 2017 9:34 AM
(910 views)

I often summarize a column by some combination of other categorical columns using a formula column. For example in the Detergent sample data I might want to know the average count for each combination of softness and temperature. Below are three methods to do this, which one I use depends on the size of the data table and who my audience is. Is there an easier way?

I shy away from an add-in containing a custom 'Col Mean by Multiple Columns()' function due to portability issues.

**Tabluate**

Create summary statistics in a new table using tabulate and then join the results to the original data table.

Pro: Easy to make very complex summary statistics.

Con: Not live, need to refresh the data if a referenced column changes during an analysis, no way for users to follow the source of the data.

**Binning Formula**

Create a binning formula and then use the second argument of the Col Mean() function to refrence that bin.

Pro: Easy to set up and relatively easy for users to follow.

Con: Creates extra columns that clutter the data table and have no use except for this formula, especially when creating many summary columns.

```
dt = Open( "$sample_data/Detergent.jmp" );
//binning formula
dt << New Column(
"Bin by Temp and Softness",
Character,
"Nominal",
Formula( :softness || "-" || :temperature )
);
//summary statistic
dt << New Column(
"Mean Count by Temp and Softness Binned",
Numeric,
"Continuous",
Format( "Best", 12 ),
Formula( Col Mean( :count, :Bin by Temp and Softness) )
);
```

**Identify bins and calculate summary statistic using JSL inside a formula**

Identify similar rows in the data table then calculate the summary statistic on that data. I suspect JMP does not cache results using this method so the whole table is re-evaluted for every row. In any case it is slow for large data tables.

Pro: Tons of flexibility

Cons: Slow calculations, very hard for users to follow.

```
//summary statistic by bin
dt << New Column(
"Mean Count by Temp and Softness Direct",
Numeric,
"Continuous",
Format( "Best", 12 ),
Formula(
matches = Function( {a, b},
out = {};
For( i = 1, i <= N Items( a ), i++,
found = "no";
For( j = 1, j <= N Items( b ), j++,
If( a[i] == b[j],
found = "yes"
)
);
If( found == "yes",
out = Insert( out, a[i] )
);
);
out;
);
soft = Loc( :softness << get values(), :softness );
temp = Loc( :temperature << get values(), :temperature );
Mean( (:count << get values())[matches( soft, temp )] );
)
);
```

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Oct 13, 2017 9:47 AM
(1798 views)

Solution

Hi @ih,

I find myself doing just this thing pretty often. One way that works for me is to use a column formula like Col Mean(:column, byvariable), and for the second parameter I use a concatenation of the two categorical columns directly (rather than creating any additional columns in the table). For the detergent example you mentioned this would be:

`Col Mean( :count, :softness || :temperature )`

Does this do what you need or is there something else you're looking for?

3 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Oct 13, 2017 9:47 AM
(1799 views)

Hi @ih,

I find myself doing just this thing pretty often. One way that works for me is to use a column formula like Col Mean(:column, byvariable), and for the second parameter I use a concatenation of the two categorical columns directly (rather than creating any additional columns in the table). For the detergent example you mentioned this would be:

`Col Mean( :count, :softness || :temperature )`

Does this do what you need or is there something else you're looking for?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Oct 13, 2017 9:55 AM
(897 views)

This is perfect, and so obvious once you point it out.

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Oct 13, 2017 10:01 AM
(891 views)

Great, I'm glad that helps! I should have mentioned that in recent versions of JMP (I think since JMP 11) you can have multiple column parameters directly in ColMean(). So, rather than concatenating you could have:

`Col Mean( :count, :softness, :temperature )`

I'm so used to concatenating that I still do that. Old habits die hard :)

Additionally, in JMP 12 and on, with Instant Formulas in the data table you can select those columns in the table, right-click one of their headers, select New Formula Column > Group By.

Then, select the count column, right-click the header, select New Formula Column > Aggregate > Mean

This will design the formula for you automatically; very friendly for your new users.