Solved: Re: jsl to create table subset of most frequent values in a column

rrtx_mike · Mar 18, 2016 11:18 AM

Hello All,

I'm a new user who is still learning the basic fundamentals and usually I can find the answers I need by searching but this one is eluding me.

I'm looking for a script to create a table subset of the top 10 most frequently occurring values of a selected column. My current work around is to create a summary table, sort the summary table by descending value, select the top ten rows, and then go back to the first table and create a subset of the selected rows. I'm certain there is an elegant solution but I just can't find it. Thanks!

txnelson · Mar 18, 2016 03:50 PM

Here is a sample script that works using only the top 5 selections, since the sample data table is so small. But I think you can see how it works.

Names default to here(1);

dt=open("$SAMPLE_DATA/big class.jmp");

dtsumm = dt << Summary(

Group( :height ),

Freq( "None" ),

Weight( "None" )

);

dtsumm << Sort( By( :N Rows ), Order( Descending ), Replace Table(1) );

dtsumm << select where(row()>5)

dtsumm << delete rows;

dtfinal = dt << Join(with(dtsumm) ,

By Matching Columns( :height = :height ),

Drop multiples( 0, 0 ),

Include Nonmatches( 0, 0 ),

Preserve main table order( 1 )

);

Jim

View solution in original post

txnelson · Mar 18, 2016 03:50 PM

Here is a sample script that works using only the top 5 selections, since the sample data table is so small. But I think you can see how it works.

Names default to here(1);

dt=open("$SAMPLE_DATA/big class.jmp");

dtsumm = dt << Summary(

Group( :height ),

Freq( "None" ),

Weight( "None" )

);

dtsumm << Sort( By( :N Rows ), Order( Descending ), Replace Table(1) );

dtsumm << select where(row()>5)

dtsumm << delete rows;

dtfinal = dt << Join(with(dtsumm) ,

By Matching Columns( :height = :height ),

Drop multiples( 0, 0 ),

Include Nonmatches( 0, 0 ),

Preserve main table order( 1 )

);

Jim

rrtx_mike · Mar 20, 2016 09:36 AM

Thanks Jim, this worked perfectly.

ms · Mar 18, 2016 08:11 PM

One alternative would would be use Summarize() instead of a summary table, and then also skip the sorting step by utilizing the Rank() function. Unfortunately, the stat options in Summarize is limited to numeric columns; the below sample script will thus not work for character columns.

dt = Open("$SAMPLE_DATA/big class.jmp");

n = 3; // Nr of most frequent values

Summarize(g = by(:age), c = Count(:age));

dt << subset(rows(dt << select where(Contains(Reverse(g[Rank(c)])[1 :: n], Char(:age)))));

rrtx_mike · Mar 20, 2016 09:40 AM

Hi MS,

Thanks for your response, I should have mentioned in my original post that the data I was interested in was character and not numeric. But I am bookmarking your reply as I will need to perform the same summary with numeric data in the future.