Subscribe Bookmark RSS Feed

jsl to create table subset of most frequent values in a column

rrtx_mike

Community Trekker

Joined:

Mar 18, 2016

Hello All,

I'm a new user who is still learning the basic fundamentals and usually I can find the answers I need by searching but this one is eluding me.

I'm looking for a script to create a table subset of the top 10 most frequently occurring values of a selected column. My current work around is to create a summary table, sort the summary table by descending value, select the top ten rows, and then go back to the first table and create a subset of the selected rows. I'm certain there is an elegant solution but I just can't find it. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

Here is a sample script that works using only the top 5 selections, since the sample data table is so small.  But I think you can see how it works.

 

Names default to here(1);

dt=open("$SAMPLE_DATA/big class.jmp");

 

dtsumm = dt << Summary( 

       Group( :height ), 

       Freq( "None" ), 

       Weight( "None" ) 

);

 

dtsumm  << Sort( By( :N Rows ), Order( Descending ), Replace Table(1) ); 

dtsumm << select where(row()>5) 

dtsumm << delete rows;

 

dtfinal = dt << Join(with(dtsumm) ,

       By Matching Columns( :height = :height ),

       Drop multiples( 0, 0 ), 

       Include Nonmatches( 0, 0 ), 

       Preserve main table order( 1 ) 

);

 

Jim
4 REPLIES
Solution

Here is a sample script that works using only the top 5 selections, since the sample data table is so small.  But I think you can see how it works.

 

Names default to here(1);

dt=open("$SAMPLE_DATA/big class.jmp");

 

dtsumm = dt << Summary( 

       Group( :height ), 

       Freq( "None" ), 

       Weight( "None" ) 

);

 

dtsumm  << Sort( By( :N Rows ), Order( Descending ), Replace Table(1) ); 

dtsumm << select where(row()>5) 

dtsumm << delete rows;

 

dtfinal = dt << Join(with(dtsumm) ,

       By Matching Columns( :height = :height ),

       Drop multiples( 0, 0 ), 

       Include Nonmatches( 0, 0 ), 

       Preserve main table order( 1 ) 

);

 

Jim
rrtx_mike

Community Trekker

Joined:

Mar 18, 2016

Thanks Jim, this worked perfectly.

ms

Super User

Joined:

Jun 23, 2011

One alternative would would be use Summarize() instead of a summary table, and then also skip the sorting step by utilizing the Rank() function. Unfortunately, the stat options in Summarize is limited to numeric columns; the below sample script will thus not work for character columns.

dt = Open("$SAMPLE_DATA/big class.jmp");

n = 3; // Nr of most frequent values

Summarize(g = by(:age), c = Count(:age));

dt << subset(rows(dt << select where(Contains(Reverse(g[Rank(c)])[1 :: n], Char(:age)))));


rrtx_mike

Community Trekker

Joined:

Mar 18, 2016

Hi MS,

Thanks for your response, I should have mentioned in my original post that the data I was interested in was character and not numeric. But I am bookmarking your reply as I will need to perform the same summary with numeric data in the future.