BookmarkSubscribe
Choose Language Hide Translation Bar
WendyLou315
Community Trekker

Stats of two columns combined, grouped by third

I have been trying to run basic stats (Mean, StDev, Range) on a 2million record set of data.  The catch is that I need to average two columns, not just one.  Is this possible?  The number of records for each group will vary. Tabulate will average Val1 and Val2 separately, I need them averaged together.  The data table attached should average Val1 and Val2 based on each ID. 

 

Results should be:

ID

A

Avg

2.000

StDev

0.8944

Range

2

B3.5001.0488
3
C2.6251.4079
4
D2.7001.9465
5

 

Is there JSL code that I can use to get the results I'm looking for?

0 Kudos
1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Stats of two columns combined, grouped by third

You will need to stack the columns.  Here is a simple script for illustration

Names default to here(1);
dt=New Table( "JMP QUESTION",
	Add Rows( 15 ),
	New Column( "ID",
		Character( 1 ),
		"Nominal",
		Set Values(
			{"A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "D",
			"D"}
		),
		Set Display Width( 57 )
	),
	New Column( "Val1",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 2, 3, 4, 5, 3, 2, 1, 3, 4, 5, 3, 6, 2, 1] ),
		Set Display Width( 57 )
	),
	New Column( "Val2",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 3, 2, 4, 3, 2, 1, 2, 5, 3, 1, 2, 5, 1, 1] ),
		Set Display Width( 57 )
	)
);

// Stack the data
dtStack = dt << Stack(invisible,
	columns( :Val1, :Val2 ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Val 1,2" ),
	Name( "Non-stacked columns" )(Keep( :ID ))
);

// Generate the Statistics 
dtStack << Tabulate(
	Show Control Panel( 0 ),
	Add Table(
		Column Table( Statistics( Mean ), Analysis Columns( :Name( "Val 1,2" ) ) ),
		Column Table(
			Statistics( Std Dev ),
			Analysis Columns( :Name( "Val 1,2" ) )
		),
		Column Table( Statistics( Range ), Analysis Columns( :Name( "Val 1,2" ) ) ),
		Row Table( Grouping Columns( :ID ) )
	)
);
Jim
0 Kudos
3 REPLIES 3
txnelson
Super User

Re: Stats of two columns combined, grouped by third

You will need to stack the columns.  Here is a simple script for illustration

Names default to here(1);
dt=New Table( "JMP QUESTION",
	Add Rows( 15 ),
	New Column( "ID",
		Character( 1 ),
		"Nominal",
		Set Values(
			{"A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "D",
			"D"}
		),
		Set Display Width( 57 )
	),
	New Column( "Val1",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 2, 3, 4, 5, 3, 2, 1, 3, 4, 5, 3, 6, 2, 1] ),
		Set Display Width( 57 )
	),
	New Column( "Val2",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 3, 2, 4, 3, 2, 1, 2, 5, 3, 1, 2, 5, 1, 1] ),
		Set Display Width( 57 )
	)
);

// Stack the data
dtStack = dt << Stack(invisible,
	columns( :Val1, :Val2 ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Val 1,2" ),
	Name( "Non-stacked columns" )(Keep( :ID ))
);

// Generate the Statistics 
dtStack << Tabulate(
	Show Control Panel( 0 ),
	Add Table(
		Column Table( Statistics( Mean ), Analysis Columns( :Name( "Val 1,2" ) ) ),
		Column Table(
			Statistics( Std Dev ),
			Analysis Columns( :Name( "Val 1,2" ) )
		),
		Column Table( Statistics( Range ), Analysis Columns( :Name( "Val 1,2" ) ) ),
		Row Table( Grouping Columns( :ID ) )
	)
);
Jim
0 Kudos
Highlighted
WendyLou315
Community Trekker

Re: Stats of two columns combined, grouped by third

@txnelson That gets me what I need.  Sadly, my laptop doesn't have the resources to process the full 2 million rows, but I can cut that down and get where I need to go.

Thanks a million!

0 Kudos
txnelson
Super User

Re: Stats of two columns combined, grouped by third

There are a couple of things that might allow you to run against all 2,000,000 rows.
1. Make sure when you are stacking the data, you only stack the necessary columns. In the illustration I provided, only the ID column is carried along with the stacking columns.
2. You can change the "Invisible" to "Private" in the stack code. This will make a data table, but it will not create any of the Windowing code, so you will not be able to see it.
3. You can Compress the column, (Column Property List Check)  which will save the data in less space

Jim