BookmarkSubscribe
Choose Language Hide Translation Bar
Highlighted
D_T_M
Community Member

Find column in which one row is greatest or lowest compared to other rows.

This seems like an easy question to solve but I'm kinda stuck.

 

I have a dataset with 250 rows and roughly 20K columns. The rows are split into two categories : Good ( 249 rows)  and Bad (1 row). I need to find all the columns in which the 'BAD' row has the higher/lower value than compared to rest of the 'GOOD' rows.

Can you suggest any efficient way of doing this?

 

Thank you! 

0 Kudos
1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Find column in which one row is greatest or lowest compared to other rows.

Here is how I would do it, assuming you have a column that identifies the 1 Bad row, and all of the 249 Good rows.

1. Stack all of your 20k rows, keeping the column that identifies the Bad row and all of the Good rows

     Tables==>Stack

2. Summarize the resulting Stacked table

     a. Group by the new column "Label"

     b. Subgroup by the column that identifies the Bad and Good data

     c. Specifiy to calculate the Min and Max for the column "Data"

3. In the resulting Summary table

     a. Add a new column called "Min Max" using the formula

If(
			:Name( "Max(Data,Bad)" ) > :Name( "Max(Data,Good)" ), "Max",
			:Name( "Min(Data,Bad)" ) < :Name( "Min(Data,Good)" ), "Min",
			"Neither"
		)

Below is an example script that will create the above results.  It uses an example data table with only 4 columns, however, it will work for your 20k table by changing the name in the script that points to the column with the Bad and Good rows(Called "GoodBad" in the Example data table), to the name of the column in your data table that identifies the Bad and Good data.

The script also assumes that all numeric columns are to be compared.

Names Default To Here( 1 );

// Create a simple data table for illustration
dt = New Table( "simple",
	Add Rows( 10 ),
	New Column( "A", Set Values( [59, 61, 55, 66, 52, 60, 61, 51, 60, 61] ) ),
	New Column( "B",
		Format( "Fixed Dec", 5, 0 ),
		Set Values( [22, 123, 74, 145, 64, 84, 128, 79, 112, 107] )
	),
	New Column( "C",
		Set Values( [110, 96, 100, 92, 92, 97, 98, 94, 91, 94] ),
	),
	New Column( "GoodBad",
		Character,
		Set Values(
			{"Bad", "Good", "Good", "Good", "Good", "Good", "Good", "Good", "Good", "Good"}
		)
	)
);

// Get All Numeric Columns
numericCols = dt << get column names(numeric);

// Stack all of the numeric
dtStack = dt << Stack(invisible,
	columns( numericCols ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" )
);

// Summarize the data
dtSum = dtStack << Summary(
	Group( :Label ),
	Min( :Data ),
	Max( :Data ),
	Subgroup( :GoodBad ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 )
);

// Delete the no longer needed Stacked data table
close( dtStack, nosave );
// Add the column that determines if Bad column is < or > Good dtSum << New Column( "Max or Min", formula( If( :Name( "Max(Data,Bad)" ) > :Name( "Max(Data,Good)" ), "Max", :Name( "Min(Data,Bad)" ) < :Name( "Min(Data,Good)" ), "Min", "Neither" ) ) );

 

Jim
2 REPLIES 2
txnelson
Super User

Re: Find column in which one row is greatest or lowest compared to other rows.

Here is how I would do it, assuming you have a column that identifies the 1 Bad row, and all of the 249 Good rows.

1. Stack all of your 20k rows, keeping the column that identifies the Bad row and all of the Good rows

     Tables==>Stack

2. Summarize the resulting Stacked table

     a. Group by the new column "Label"

     b. Subgroup by the column that identifies the Bad and Good data

     c. Specifiy to calculate the Min and Max for the column "Data"

3. In the resulting Summary table

     a. Add a new column called "Min Max" using the formula

If(
			:Name( "Max(Data,Bad)" ) > :Name( "Max(Data,Good)" ), "Max",
			:Name( "Min(Data,Bad)" ) < :Name( "Min(Data,Good)" ), "Min",
			"Neither"
		)

Below is an example script that will create the above results.  It uses an example data table with only 4 columns, however, it will work for your 20k table by changing the name in the script that points to the column with the Bad and Good rows(Called "GoodBad" in the Example data table), to the name of the column in your data table that identifies the Bad and Good data.

The script also assumes that all numeric columns are to be compared.

Names Default To Here( 1 );

// Create a simple data table for illustration
dt = New Table( "simple",
	Add Rows( 10 ),
	New Column( "A", Set Values( [59, 61, 55, 66, 52, 60, 61, 51, 60, 61] ) ),
	New Column( "B",
		Format( "Fixed Dec", 5, 0 ),
		Set Values( [22, 123, 74, 145, 64, 84, 128, 79, 112, 107] )
	),
	New Column( "C",
		Set Values( [110, 96, 100, 92, 92, 97, 98, 94, 91, 94] ),
	),
	New Column( "GoodBad",
		Character,
		Set Values(
			{"Bad", "Good", "Good", "Good", "Good", "Good", "Good", "Good", "Good", "Good"}
		)
	)
);

// Get All Numeric Columns
numericCols = dt << get column names(numeric);

// Stack all of the numeric
dtStack = dt << Stack(invisible,
	columns( numericCols ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" )
);

// Summarize the data
dtSum = dtStack << Summary(
	Group( :Label ),
	Min( :Data ),
	Max( :Data ),
	Subgroup( :GoodBad ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 )
);

// Delete the no longer needed Stacked data table
close( dtStack, nosave );
// Add the column that determines if Bad column is < or > Good dtSum << New Column( "Max or Min", formula( If( :Name( "Max(Data,Bad)" ) > :Name( "Max(Data,Good)" ), "Max", :Name( "Min(Data,Bad)" ) < :Name( "Min(Data,Good)" ), "Min", "Neither" ) ) );

 

Jim
gzmorgan0
Super User

Re: Find column in which one row is greatest or lowest compared to other rows.

@D_T_M ,

 

@txnelson , Jim, provided you with a good, easy to follow solution that can be done with the JMP interface and column functions.  Below is an alternate solution using JSL and matrices.

 

I am not sure which approach would have the best performance. If you are not familair with JMP matrix notation, nor JSL, consider this response just an FYI for others on the JMP Community.

 

Names Default To Here( 1 );

// Create a simple data table for illustration
dt = New Table( "simple",
	Add Rows( 10 ),
	New Column( "A", Set Values( [59, 66, 55, 66, 52, 60, 61, 51, 60, 61] ) ),
	New Column( "B",
		Format( "Fixed Dec", 5, 0 ),
		Set Values( [123, 22, 74, 145, 64, 84, 128, 79, 112, 107] )
	),
	New Column( "C",
		Set Values( [96, 110, 100, 92, 92, 97, 98, 94, 91, 94] ),
	)
);

cnames = dt << get column names("string"); //need to remove column names that are not to be tested
//typically cnames = cnames[n::n+249999] //Matrix of bad row - assuming you know which row is bad, row 2. bmat = dt[2,1::3]; glist = Remove( AsList(1::nrow(dt))[1], 2); gmax = V Max(dt[ glist, 1::3 ]); //returns a row vector of each column max gmin = V Min(dt[ glist, 1::3 ]); //returns a row vector of each column min mxcol = loc(bmat - gmax > 0); //returns a row vector of column # where bmat > gmax mncol = Loc(bmat - gmin < 0); //returns a row vector of column # where bmat < gmin HiCol = cnames[transpose(mxcol)]; LoCol = cnames[transpose(mncol)]; show(HiCol, LoCol); /*: HiCol = {"C"}; LoCol = {"B"};
0 Kudos