BookmarkSubscribe
Choose Language Hide Translation Bar
Community Trekker

## Stats of two columns combined, grouped by third

I have been trying to run basic stats (Mean, StDev, Range) on a 2million record set of data.  The catch is that I need to average two columns, not just one.  Is this possible?  The number of records for each group will vary. Tabulate will average Val1 and Val2 separately, I need them averaged together.  The data table attached should average Val1 and Val2 based on each ID.

Results should be:

 IDA Avg2.000 StDev0.8944 Range2 B 3.500 1.0488 3 C 2.625 1.4079 4 D 2.700 1.9465 5

Is there JSL code that I can use to get the results I'm looking for?

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Stats of two columns combined, grouped by third

You will need to stack the columns.  Here is a simple script for illustration

``````Names default to here(1);
dt=New Table( "JMP QUESTION",
New Column( "ID",
Character( 1 ),
"Nominal",
Set Values(
{"A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "D",
"D"}
),
Set Display Width( 57 )
),
New Column( "Val1",
Numeric,
"Continuous",
Format( "Best", 12 ),
Set Values( [1, 2, 3, 4, 5, 3, 2, 1, 3, 4, 5, 3, 6, 2, 1] ),
Set Display Width( 57 )
),
New Column( "Val2",
Numeric,
"Continuous",
Format( "Best", 12 ),
Set Values( [1, 3, 2, 4, 3, 2, 1, 2, 5, 3, 1, 2, 5, 1, 1] ),
Set Display Width( 57 )
)
);

// Stack the data
dtStack = dt << Stack(invisible,
columns( :Val1, :Val2 ),
Source Label Column( "Label" ),
Stacked Data Column( "Val 1,2" ),
Name( "Non-stacked columns" )(Keep( :ID ))
);

// Generate the Statistics
dtStack << Tabulate(
Show Control Panel( 0 ),
Column Table( Statistics( Mean ), Analysis Columns( :Name( "Val 1,2" ) ) ),
Column Table(
Statistics( Std Dev ),
Analysis Columns( :Name( "Val 1,2" ) )
),
Column Table( Statistics( Range ), Analysis Columns( :Name( "Val 1,2" ) ) ),
Row Table( Grouping Columns( :ID ) )
)
);``````
Jim
3 REPLIES 3
Super User

## Re: Stats of two columns combined, grouped by third

You will need to stack the columns.  Here is a simple script for illustration

``````Names default to here(1);
dt=New Table( "JMP QUESTION",
New Column( "ID",
Character( 1 ),
"Nominal",
Set Values(
{"A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "D",
"D"}
),
Set Display Width( 57 )
),
New Column( "Val1",
Numeric,
"Continuous",
Format( "Best", 12 ),
Set Values( [1, 2, 3, 4, 5, 3, 2, 1, 3, 4, 5, 3, 6, 2, 1] ),
Set Display Width( 57 )
),
New Column( "Val2",
Numeric,
"Continuous",
Format( "Best", 12 ),
Set Values( [1, 3, 2, 4, 3, 2, 1, 2, 5, 3, 1, 2, 5, 1, 1] ),
Set Display Width( 57 )
)
);

// Stack the data
dtStack = dt << Stack(invisible,
columns( :Val1, :Val2 ),
Source Label Column( "Label" ),
Stacked Data Column( "Val 1,2" ),
Name( "Non-stacked columns" )(Keep( :ID ))
);

// Generate the Statistics
dtStack << Tabulate(
Show Control Panel( 0 ),
Column Table( Statistics( Mean ), Analysis Columns( :Name( "Val 1,2" ) ) ),
Column Table(
Statistics( Std Dev ),
Analysis Columns( :Name( "Val 1,2" ) )
),
Column Table( Statistics( Range ), Analysis Columns( :Name( "Val 1,2" ) ) ),
Row Table( Grouping Columns( :ID ) )
)
);``````
Jim
Highlighted
Community Trekker

## Re: Stats of two columns combined, grouped by third

@txnelson That gets me what I need.  Sadly, my laptop doesn't have the resources to process the full 2 million rows, but I can cut that down and get where I need to go.

Thanks a million!

Super User

## Re: Stats of two columns combined, grouped by third

There are a couple of things that might allow you to run against all 2,000,000 rows.
1. Make sure when you are stacking the data, you only stack the necessary columns. In the illustration I provided, only the ID column is carried along with the stacking columns.
2. You can change the "Invisible" to "Private" in the stack code. This will make a data table, but it will not create any of the Windowing code, so you will not be able to see it.
3. You can Compress the column, (Column Property List Check)  which will save the data in less space

Jim