Solved: Struggling with extracting the second largest values for each column

amy · Jun 20, 2018 03:21 PM

I have been trying to extract second largest numbers from JMP tables. So far, I have only been manually doing it using

Sort descending( Eval List( {:X1, :X2, :X3}));

How can I just pass the entire table rather than individual column names?

txnelson · Jun 25, 2018 1:12 PM

Your comparison component in your IF() function has a logic issue.

If( (Abs( colMax )) > 300 | (Abs( colMax )) > 250

any value that is > 250 will make the comparison true, therefore, the >300 is not needed.

The comparison only needs to be:

If( ( Abs( colMax ) > 250,

You ask why you are getting values of 193? That is because the values being printed out are the second highest value for the column, and your filtering is based upon the maximum value in the column. The second highest value can be of any value. If you had the following values in a column

82

15

600

44

193

The maximum column value is 600, so it would pass your IF() clause of Max Col() > 250, and the second highest value in the colum is 193

Concerning your question about "how can I find the row number of the second highest value i.e colSecondMaxValues?", the way to do this is already in the script. The statement

maxRows = dt << get rows where( As Column( dt, colNamesList[i] ) == colMax );

returns a matrix of the row numbers for the rows that have the Maximum values in the column. If you want the row numbers for the second highest value, you just need to get the rows where they equal the second max value. So if you add this statement after the Insert Into() function, you will get the row(s) that have the second highest value

Show(
			dt << get rows where(
				As Column( dt, colNamesList[i] ) == Col Max( Column( dt, colNamesList[i] ) )
			)
		);

Here is the complete code

dt = Open( "$SAMPLE_DATA/semiconductor capability.jmp" );

colNamesList = dt << get column names( string, continuous, numeric );
colSecondMaxValues = {};

// Loop across all continuous columns and find the Max value in
// a column, delete that value, and then find the second max value
// finally, replace the old values, and save the results
For( i = 1, i <= N Items( colNamesList ), i++,
	colMax = Col Max( Column( dt, colNamesList[i] ) );
	If( Abs( colMax ) > 250,
		maxRows = dt << get rows where( As Column( dt, colNamesList[i] ) == colMax );
		Show( maxRows );
		Column( dt, colNamesList[i] )[maxRows] = .;
		Insert Into( colSecondMaxValues, Col Max( Column( dt, colNamesList[i] ) ) );
		Show(
			dt << get rows where(
				As Column( dt, colNamesList[i] ) == Col Max( Column( dt, colNamesList[i] ) )
			)
		);
		Show( colSecondMaxValues );
		Column( dt, colNamesList[i] )[maxRows] = colMax;
	);
);

Jim

View solution in original post

txnelson · Jun 20, 2018 04:32 PM

Here is a little script that I wrote that determines the second highest value across all continuous columns in a data table

names default to here(1);
dt=open("$SAMPLE_DATA/semiconductor capability.jmp");

colNamesList= dt<<get column names(string,continuous,numeric);
colSecondMaxValues = {};

// Loop across all continuous columns and find the Max value in
// a column, delete that value, and then find the second max value
// finally, replace the old values, and save the results
For(i=1,i<=nitems(colNamesList),i++,
	colMax=col max(column(dt,colNamesList[i]));
	maxRows = dt<<get rows where(as column(dt,colNamesList[i])==colMax);
	column(dt,colNamesList[i])[maxRows]=.;
	insert into(colSecondMaxValues, col max(column(dt,colNamesList[i])));
	column(dt,colNamesList[i])[maxRows]=colMax;
	);
	
nw = New Window( "Second Max Values",
	Outline Box( "Table of Second Max Values",
		Table Box(
			String Col Box( "Column", colNamesList ),
			Number Col Box( "Second Max Value", colSecondMaxValues )
		)
	)
);

Jim

amy · Jun 25, 2018 03:18 PM

@txnelson Thank you for the script. I am trying to make a greater than 300 or 250 condition work for this script and I am unable to understand why it still takes values like 193. Also, how can I find the row number of the second highest value i.e colSecondMaxValues? Thank you very much for your help.

dt=open("$SAMPLE_DATA/semiconductor capability.jmp");

colNamesList= dt<<get column names(string,continuous,numeric);
colSecondMaxValues = {};

// Loop across all continuous columns and find the Max value in
// a column, delete that value, and then find the second max value
// finally, replace the old values, and save the results
For(i=1,i<=nitems(colNamesList),i++,
	colMax=col max(column(dt,colNamesList[i]));
	If((abs(colMax)) > 300 | (abs(colMax)) > 250,
	maxRows = dt<<get rows where(as column(dt,colNamesList[i])==colMax);
	show(maxRows);
	column(dt,colNamesList[i])[maxRows]=.;
	insert into(colSecondMaxValues, col max(column(dt,colNamesList[i])));
	show(colSecondMaxValues);
	column(dt,colNamesList[i])[maxRows]=colMax;
	)
	);

txnelson · Jun 25, 2018 1:12 PM

Your comparison component in your IF() function has a logic issue.

If( (Abs( colMax )) > 300 | (Abs( colMax )) > 250

any value that is > 250 will make the comparison true, therefore, the >300 is not needed.

The comparison only needs to be:

If( ( Abs( colMax ) > 250,

You ask why you are getting values of 193? That is because the values being printed out are the second highest value for the column, and your filtering is based upon the maximum value in the column. The second highest value can be of any value. If you had the following values in a column

82

15

600

44

193

The maximum column value is 600, so it would pass your IF() clause of Max Col() > 250, and the second highest value in the colum is 193

Concerning your question about "how can I find the row number of the second highest value i.e colSecondMaxValues?", the way to do this is already in the script. The statement

maxRows = dt << get rows where( As Column( dt, colNamesList[i] ) == colMax );

returns a matrix of the row numbers for the rows that have the Maximum values in the column. If you want the row numbers for the second highest value, you just need to get the rows where they equal the second max value. So if you add this statement after the Insert Into() function, you will get the row(s) that have the second highest value

Show(
			dt << get rows where(
				As Column( dt, colNamesList[i] ) == Col Max( Column( dt, colNamesList[i] ) )
			)
		);

Here is the complete code

dt = Open( "$SAMPLE_DATA/semiconductor capability.jmp" );

colNamesList = dt << get column names( string, continuous, numeric );
colSecondMaxValues = {};

// Loop across all continuous columns and find the Max value in
// a column, delete that value, and then find the second max value
// finally, replace the old values, and save the results
For( i = 1, i <= N Items( colNamesList ), i++,
	colMax = Col Max( Column( dt, colNamesList[i] ) );
	If( Abs( colMax ) > 250,
		maxRows = dt << get rows where( As Column( dt, colNamesList[i] ) == colMax );
		Show( maxRows );
		Column( dt, colNamesList[i] )[maxRows] = .;
		Insert Into( colSecondMaxValues, Col Max( Column( dt, colNamesList[i] ) ) );
		Show(
			dt << get rows where(
				As Column( dt, colNamesList[i] ) == Col Max( Column( dt, colNamesList[i] ) )
			)
		);
		Show( colSecondMaxValues );
		Column( dt, colNamesList[i] )[maxRows] = colMax;
	);
);

Jim

ih · Jun 20, 2018 11:19 PM

If you want the second largest value in the whole table, reguardless of which column it is in:

Names default to here(1);
dt = open( "$SAMPLE_DATA/semiconductor capability.jmp" );

//Find the two largest values in the entire table
Sort Descending(Shape(dt << Get all columns as matrix(),1))[{1, 2}];

//Or, to exclude character columns
Sort Descending(Shape(dt << Get as matrix(dt << get column names(numeric)),1))[{1, 2}];

amy · Jun 25, 2018 03:20 PM

@ih I am sorry, but I need the values from every column and not an entire table.

Struggling with extracting the second largest values for each column

Re: Struggling with extracting the second largest values for each column

Re: Struggling with extracting the second largest values for each column

Re: Struggling with extracting the second largest values for each column

Re: Struggling with extracting the second largest values for each column

Re: Struggling with extracting the second largest values for each column

Re: Struggling with extracting the second largest values for each column