cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
Dzungtran
Level I

Filter the most frequent value in multiple columns

Hi everyone,

 

My goal is to select only the rows that have the most common values (character type) in each column.

 

For instance, my table has four columns: A, B, C, and D. The values that appear the most in columns A, B, C, and D are A1, B2, C3, and D4, respectively. How can I write a JSL code that filters the rows by these values and creates a new table with the filtered data? Thank you.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Filter the most frequent value in multiple columns

Here is a script that should give you a leg up in solving your issue

Names Default To Here( 1 );

// build an example data table
dt = New Table( "Example",
	add rows( 20 ),
	New Column( "A",
		character,
		formula(
			rand = Random Uniform( 0, 100 );
			If(
				rand >= 0 & rand <= 40, x = 1,
				rand >= 40 & rand <= 60, x = 2,
				rand >= 60 & rand <= 80, x = 3,
				x = 4
			);
			"A" || Char( x );
		)
	),
	New Column( "B",
		character,
		formula(
			rand = Random Uniform( 0, 100 );
			If(
				rand >= 0 & rand <= 20, x = 1,
				rand >= 20 & rand <= 60, x = 2,
				rand >= 60 & rand <= 80, x = 3,
				x = 4
			);
			"B" || Char( x );
		)
	),
	New Column( "C",
		character,
		formula(
			rand = Random Uniform( 0, 100 );
			If(
				rand >= 0 & rand <= 20, x = 1,
				rand >= 20 & rand <= 40, x = 2,
				rand >= 40 & rand <= 80, x = 3,
				x = 4
			);
			"C" || Char( x );
		)
	),
	New Column( "D",
		character,
		formula(
			rand = Random Uniform( 0, 100 );
			If(
				rand >= 0 & rand <= 20, x = 1,
				rand >= 20 & rand <= 40, x = 2,
				rand >= 40 & rand <= 60, x = 3,
				x = 4
			);
			"D" || Char( x );
		)
	)
);


// The JSL below is the code that does the actual work
Data Table( "Example" ) << Stack(
	columns( :A, :B, :C,  ),
	Source Label Column( "Column" ),
	Stacked Data Column( "Data" ),
	Output Table( "Stacked" )
);

dtS = Data Table( "Stacked" ) << Summary(
	Group( :Column, :Data ),
	Freq( "None" ),
	Weight( "None" ),
	output table name( "Summary" )
);

dtS << select where( :N Rows[Row()] == col max(:N Rows, :Data));
dtS << new column("Mode", formula(
	Col Max(:N Rows, :Column)
));
dtS << select where( :N Rows == :Mode);
dtS << invert row selection << delete rows;

eval(parse(
"dt << Select Where (dt:\!"" || dts:column[1]  || "\!"n == dtS:Data[1] );"));

For( i=2,i<=NRows(dtS), i++,
	eval(parse(
	"dt << Select Where (dt:\!"" || dts:column[i]  || "\!"n == dtS:Data[i] , current selection(\!"extend\!"));"));
);

dtFinal = dt << Subset( selected columns(0), selected rows(1), output table("Final"));
	


Jim

View solution in original post

2 REPLIES 2
jthi
Super User

Re: Filter the most frequent value in multiple columns

If you had data like this where (COL -> Most appearances) is A -> A1, B -> B2, C -> C1 (in reality C has a tie, how should these be handled?)

jthi_0-1696053459763.png

Would you wish to only keep row1 as it has A1, B2 and C1? And if there were more rows with the same values, would you want to keep them all? Or are you looking for something totally different, for example do you want to have a table which just has the values which appear the most?

-Jarmo
txnelson
Super User

Re: Filter the most frequent value in multiple columns

Here is a script that should give you a leg up in solving your issue

Names Default To Here( 1 );

// build an example data table
dt = New Table( "Example",
	add rows( 20 ),
	New Column( "A",
		character,
		formula(
			rand = Random Uniform( 0, 100 );
			If(
				rand >= 0 & rand <= 40, x = 1,
				rand >= 40 & rand <= 60, x = 2,
				rand >= 60 & rand <= 80, x = 3,
				x = 4
			);
			"A" || Char( x );
		)
	),
	New Column( "B",
		character,
		formula(
			rand = Random Uniform( 0, 100 );
			If(
				rand >= 0 & rand <= 20, x = 1,
				rand >= 20 & rand <= 60, x = 2,
				rand >= 60 & rand <= 80, x = 3,
				x = 4
			);
			"B" || Char( x );
		)
	),
	New Column( "C",
		character,
		formula(
			rand = Random Uniform( 0, 100 );
			If(
				rand >= 0 & rand <= 20, x = 1,
				rand >= 20 & rand <= 40, x = 2,
				rand >= 40 & rand <= 80, x = 3,
				x = 4
			);
			"C" || Char( x );
		)
	),
	New Column( "D",
		character,
		formula(
			rand = Random Uniform( 0, 100 );
			If(
				rand >= 0 & rand <= 20, x = 1,
				rand >= 20 & rand <= 40, x = 2,
				rand >= 40 & rand <= 60, x = 3,
				x = 4
			);
			"D" || Char( x );
		)
	)
);


// The JSL below is the code that does the actual work
Data Table( "Example" ) << Stack(
	columns( :A, :B, :C,  ),
	Source Label Column( "Column" ),
	Stacked Data Column( "Data" ),
	Output Table( "Stacked" )
);

dtS = Data Table( "Stacked" ) << Summary(
	Group( :Column, :Data ),
	Freq( "None" ),
	Weight( "None" ),
	output table name( "Summary" )
);

dtS << select where( :N Rows[Row()] == col max(:N Rows, :Data));
dtS << new column("Mode", formula(
	Col Max(:N Rows, :Column)
));
dtS << select where( :N Rows == :Mode);
dtS << invert row selection << delete rows;

eval(parse(
"dt << Select Where (dt:\!"" || dts:column[1]  || "\!"n == dtS:Data[1] );"));

For( i=2,i<=NRows(dtS), i++,
	eval(parse(
	"dt << Select Where (dt:\!"" || dts:column[i]  || "\!"n == dtS:Data[i] , current selection(\!"extend\!"));"));
);

dtFinal = dt << Subset( selected columns(0), selected rows(1), output table("Final"));
	


Jim