Choose Language Hide Translation Bar
Highlighted

## Find column in which one row is greatest or lowest compared to other rows.

This seems like an easy question to solve but I'm kinda stuck.

I have a dataset with 250 rows and roughly 20K columns. The rows are split into two categories : Good ( 249 rows)  and Bad (1 row). I need to find all the columns in which the 'BAD' row has the higher/lower value than compared to rest of the 'GOOD' rows.

Can you suggest any efficient way of doing this?

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

## Re: Find column in which one row is greatest or lowest compared to other rows.

Here is how I would do it, assuming you have a column that identifies the 1 Bad row, and all of the 249 Good rows.

1. Stack all of your 20k rows, keeping the column that identifies the Bad row and all of the Good rows

Tables==>Stack

2. Summarize the resulting Stacked table

a. Group by the new column "Label"

b. Subgroup by the column that identifies the Bad and Good data

c. Specifiy to calculate the Min and Max for the column "Data"

3. In the resulting Summary table

a. Add a new column called "Min Max" using the formula

``````If(
:Name( "Max(Data,Bad)" ) > :Name( "Max(Data,Good)" ), "Max",
:Name( "Min(Data,Bad)" ) < :Name( "Min(Data,Good)" ), "Min",
"Neither"
)``````

Below is an example script that will create the above results.  It uses an example data table with only 4 columns, however, it will work for your 20k table by changing the name in the script that points to the column with the Bad and Good rows(Called "GoodBad" in the Example data table), to the name of the column in your data table that identifies the Bad and Good data.

The script also assumes that all numeric columns are to be compared.

``````Names Default To Here( 1 );

// Create a simple data table for illustration
dt = New Table( "simple",
New Column( "A", Set Values( [59, 61, 55, 66, 52, 60, 61, 51, 60, 61] ) ),
New Column( "B",
Format( "Fixed Dec", 5, 0 ),
Set Values( [22, 123, 74, 145, 64, 84, 128, 79, 112, 107] )
),
New Column( "C",
Set Values( [110, 96, 100, 92, 92, 97, 98, 94, 91, 94] ),
),
Character,
Set Values(
{"Bad", "Good", "Good", "Good", "Good", "Good", "Good", "Good", "Good", "Good"}
)
)
);

// Get All Numeric Columns
numericCols = dt << get column names(numeric);

// Stack all of the numeric
dtStack = dt << Stack(invisible,
columns( numericCols ),
Source Label Column( "Label" ),
Stacked Data Column( "Data" )
);

// Summarize the data
dtSum = dtStack << Summary(
Group( :Label ),
Min( :Data ),
Max( :Data ),
Freq( "None" ),
Weight( "None" ),
Link to original data table( 0 )
);// Delete the no longer needed Stacked data table

close( dtStack, nosave );// Add the column that determines if Bad column is < or > Good
dtSum << New Column(
"Max or Min",
formula(
If(
:Name( "Max(Data,Bad)" ) > :Name( "Max(Data,Good)" ), "Max",
:Name( "Min(Data,Bad)" ) < :Name( "Min(Data,Good)" ), "Min",
"Neither"
)
)
);``````

Jim
2 REPLIES 2
Highlighted

## Re: Find column in which one row is greatest or lowest compared to other rows.

Here is how I would do it, assuming you have a column that identifies the 1 Bad row, and all of the 249 Good rows.

1. Stack all of your 20k rows, keeping the column that identifies the Bad row and all of the Good rows

Tables==>Stack

2. Summarize the resulting Stacked table

a. Group by the new column "Label"

b. Subgroup by the column that identifies the Bad and Good data

c. Specifiy to calculate the Min and Max for the column "Data"

3. In the resulting Summary table

a. Add a new column called "Min Max" using the formula

``````If(
:Name( "Max(Data,Bad)" ) > :Name( "Max(Data,Good)" ), "Max",
:Name( "Min(Data,Bad)" ) < :Name( "Min(Data,Good)" ), "Min",
"Neither"
)``````

Below is an example script that will create the above results.  It uses an example data table with only 4 columns, however, it will work for your 20k table by changing the name in the script that points to the column with the Bad and Good rows(Called "GoodBad" in the Example data table), to the name of the column in your data table that identifies the Bad and Good data.

The script also assumes that all numeric columns are to be compared.

``````Names Default To Here( 1 );

// Create a simple data table for illustration
dt = New Table( "simple",
New Column( "A", Set Values( [59, 61, 55, 66, 52, 60, 61, 51, 60, 61] ) ),
New Column( "B",
Format( "Fixed Dec", 5, 0 ),
Set Values( [22, 123, 74, 145, 64, 84, 128, 79, 112, 107] )
),
New Column( "C",
Set Values( [110, 96, 100, 92, 92, 97, 98, 94, 91, 94] ),
),
Character,
Set Values(
{"Bad", "Good", "Good", "Good", "Good", "Good", "Good", "Good", "Good", "Good"}
)
)
);

// Get All Numeric Columns
numericCols = dt << get column names(numeric);

// Stack all of the numeric
dtStack = dt << Stack(invisible,
columns( numericCols ),
Source Label Column( "Label" ),
Stacked Data Column( "Data" )
);

// Summarize the data
dtSum = dtStack << Summary(
Group( :Label ),
Min( :Data ),
Max( :Data ),
Freq( "None" ),
Weight( "None" ),
Link to original data table( 0 )
);// Delete the no longer needed Stacked data table

close( dtStack, nosave );// Add the column that determines if Bad column is < or > Good
dtSum << New Column(
"Max or Min",
formula(
If(
:Name( "Max(Data,Bad)" ) > :Name( "Max(Data,Good)" ), "Max",
:Name( "Min(Data,Bad)" ) < :Name( "Min(Data,Good)" ), "Min",
"Neither"
)
)
);``````

Jim
Highlighted

## Re: Find column in which one row is greatest or lowest compared to other rows.

@txnelson , Jim, provided you with a good, easy to follow solution that can be done with the JMP interface and column functions.  Below is an alternate solution using JSL and matrices.

I am not sure which approach would have the best performance. If you are not familair with JMP matrix notation, nor JSL, consider this response just an FYI for others on the JMP Community.

``````Names Default To Here( 1 );

// Create a simple data table for illustration
dt = New Table( "simple",
New Column( "A", Set Values( [59, 66, 55, 66, 52, 60, 61, 51, 60, 61] ) ),
New Column( "B",
Format( "Fixed Dec", 5, 0 ),
Set Values( [123, 22, 74, 145, 64, 84, 128, 79, 112, 107] )
),
New Column( "C",
Set Values( [96, 110, 100, 92, 92, 97, 98, 94, 91, 94] ),
)
);

cnames = dt << get column names("string"); //need to remove column names that are not to be tested//typically cnames = cnames[n::n+249999]

//Matrix of bad row - assuming you know which row is bad, row 2.
bmat = dt[2,1::3];
glist = Remove( AsList(1::nrow(dt)), 2);
gmax = V Max(dt[ glist, 1::3 ]);  //returns a row vector of each column max
gmin = V Min(dt[ glist, 1::3 ]);  //returns a row vector of each column min
mxcol = loc(bmat - gmax > 0);     //returns a row vector of column # where bmat > gmax
mncol = Loc(bmat - gmin < 0);     //returns a row vector of column # where bmat < gmin
HiCol = cnames[transpose(mxcol)];
LoCol = cnames[transpose(mncol)];
show(HiCol, LoCol);
/*:

HiCol = {"C"};
LoCol = {"B"};``````
Article Labels

There are no labels assigned to this post.