Subscribe Bookmark RSS Feed

finding the mode

ron_horne

Super User

Joined:

Jun 23, 2011

hi everyone,

i would like to produce a table summery that gives the mode of a variable for each category of another variable.

ideally, this would work:

dt = open ("$SAMPLE_DATA/Mail Messages.jmp");
dt << summary ( Group ( :From ), Mode (:To) );

alternatively, i would stack the table by "From" and calculate the mode at each row. yet, "Mode" is not an option in Table summary or column formula.

any ideas calculating the mode are welcome including in sending to SAS.

i am using JMP 10.

thank you,

Ron

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

Thank you Brady,

This is by far the most parsimonious and robust script.

I would like to add one optional line of code to disable the possibility of missing values as the mode:

dt = open ("$SAMPLE_DATA\Cars.jmp");

dt2 = dt << summary(group(:Make, :R Leg));

// option to disable the possibility of missing values as modes

dt2 << Select Where ( IsMissing (:R Leg)) << Delete Rows;

dt2 << selectwhere(colmaximum(:n rows, :make) != :n rows) << Delete Rows;

11 REPLIES

Hi - the mode is the most common item, so you're after the person (or presumably persons if there's a tie) who received the most number of letters from each sender, is that right?  If so, I think the following ought to do it, although I've no doubt there's a much more elegant and/or shorter way.  I've deliberately left all the intermediate calculations, and named all the tables to leave an audit trail:

dt1 = open ("$SAMPLE_DATA/Mail Messages.jmp");

dt1 << set name("dt1");

dt2 = dt1 << Summary( Group( :From ), N, Subgroup( :To ) );

dt2 << set name("dt2");

Sent_From = (dt2 << get column names);

show(Sent_From);

// Throw away the first two elements of this list;

remove from(Sent_From, {1, 2});

show(Sent_From);

// Stack them up;

dt3 = dt2 << Stack(columns(eval(Sent_From)), Source Label Column( "Label" ), Stacked Data Column( "Data" ));

dt3 << set name("dt3");

// dt3 now contains the number of letters sent from each person to each recipient;

dt4 = dt3 << Summary( Group( :From ), Max( :Data ) );

dt4 << set name("dt4");

// dt4 now contains the maximum number of letters sent from each person to any one recipient;

dt5 = dt4 << Join( With( dt3 ),

       SelectWith(:From, :Label, :Data ),

       By Matching Columns( :From = :From, :Name( "Max(Data)" ) = :Data ),

       Drop multiples( 0, 0 ),

       Name("Include non-matches" )(0, 0),

       Preserve main table order( 1 )

       );

dt5 << set name("dt5");

column(dt5, "Label") << set name("Recipient");

/*

     dt5 should now contain the maximum number of letters sent from each person to anyone recipient

     and the name of that recipient (or recipients in the event of a tie)

*/

That should give you that Ann sent 4 letters to Michael, Jeff sent 7 letters to Michael, John sent 12 letters to Michael, Katherine sent 12 letters to John and so on.  Would you like to try that out and see if it does what you need?  To get the maximum number of letters received by any recipient from any one sender, just swap over the :To and :From columns.

ron_horne

Super User

Joined:

Jun 23, 2011

Thank you Dodo,

it works fine.

ron_horne

Super User

Joined:

Jun 23, 2011

hi Dodo and all,

Dodo's solution is technically correct but is not useful when dealing with big data.

thinking it over i have an alternative solution which is scalable.

dt1 = Open( "$SAMPLE_DATA/Mail Messages.jmp" );

dt1 << set name( "dt1" );

// produce a distribution of revcivers by senders

dt1 << New Script( "Distribution", Distribution( Nominal Distribution( Column( :To ) ), By( :From ) ) );

Dist = Dt1 << Run Script( "Distribution" );

// extract the frequency tables as one data table.

dt2 = Report( dist[1] )["Frequencies"][Table Box( 1 )] << Make Combined Data Table;

dt2 << set name( "dt2" );

dist << close window;

// remove the "Total" rows from the data table - they do not represent relevant Frequencies

selection = dt2 << Select Where( :Level == "Total" );

selection << Delete Rows;

// summarize to get the mode "To" for each "From"

dt3 = Data Table( "dt2" ) << Summary( Group( :From ), Max( :Count ), statistics column name format( "column" ), Link to original data table( 0 ) );

dt3 << Set Name( "Mode" );

// update the summary table to get the values of "To"

dt3 << Update( With( Data Table( "dt2" ) ), Match Columns( :From = :From, :Count = :Count ), Add Columns from Update table( :Level ) );

dt2 << close window;

Column( dt3, "Level" ) << Set Name( "Mode" );

ms

Super User

Joined:

Jun 23, 2011

There are often several ways to perform a task in JMP. I just recalled a function I wrote some time ago that's quite efficient for finding the most common item grouped by one a or more columns. I share it here as an alternative to the above approaches. The result is similar to Dodo's, i.e. all "tied" modes are returned, not only the last as in Ron's approach using update().

mode_by_groups = Function( {collist}, // list of column names

  // Find mode of last column in collist

  Summarize( g = by( Eval( collist ) ), n = Count() );

  X = Design( g[1] ) :* n;

  i = Loc( V Max( Transpose( X == Repeat( V Max( X ), N Row( n ) ) ) ) );

  //Add grouping columns to new table

  dt1 = New Table( "modes" );

  For( k = 1, k <= N Items( collist ) - 1, k++,

  dt1 << (New Column( char(collist[k]), Character, set values( g[k][i] ) ));

  );

  //Add Mode and N

  dt1 << New Column( "Mode of " || char(collist[k]), Character, set values( g[k][i] ) );

  dt1 << New Column( "N", set values( n[i] ) );

);

//Examples

Open( "$SAMPLE_DATA/Mail Messages.jmp" );

mode_by_groups( {From, To} );

Open( "$SAMPLE_DATA/Consumer Preferences.jmp" );

mode_by_groups( {Gender, Single Status, Birth Year} );

ron_horne

Super User

Joined:

Jun 23, 2011

Thank you very much.

dealing with the mode i came across two issues that look like faults of the JMP program.

1. if there is only one observation the distribution platform does not provide a mode at all. Furthermore, in case of tied frequency of 1 nothing is returned either. this means that the definition in the help file is incorrect when it clams to bring the lower value among the most frequent.

5682_Untitled2.png

2. when producing a Frequencies table for the column width is long (i.e. 12) making a data table does not allow for recovering the whole number. for some reason it produces a text string of the abbreviated number.

5680_Untitled.png

louv

Staff

Joined:

Jun 23, 2011

Not sure if this is helpful but on the JMP File Exchange Brady Brady has posted an add-in that gives extended summary statistics capability including the Mode.

http://support.sas.com/demosdownloads/downarea_t4.jsp?productID=111731&jmpflag=Y

5688_Screen Shot 2014-03-12 at 3.30.34 PM.png

ron_horne

Super User

Joined:

Jun 23, 2011

Thank you LouV

this is defiantly useful yet not perfect.

Brady's Add-in is only accessible by the GUI and not by script - if it is please let me know.

the N Rows column that shows up is the number of occurrences in each Group and not the frequency of the mode itself. further more i didn't see built in option to get that frequency - if there is please let me know.

otherwise it is powerful and scalable. Brady's add-in aught to be part of the program and not a patch.

brady_brady

Staff

Joined:

Jun 9, 2012

Hi Ron,

I believe that the reason JMP reports no mode when a) only one observation exists and 2) when all observations occur only once is that each of these is consistent with the way the mode is defined in many textbooks. i.e., if no element appears more than once, no mode exists.

Cheers,

Brady

ron_horne

Super User

Joined:

Jun 23, 2011

Thank you very much MS. it is very useful.

This way is actually more "honest" than other alternatives mentioned in this discussion since it also assesses the frequency of missing data. for example, running the following script returns some missing values as the mode.

dt = open ("$SAMPLE_DATA\Cars.jmp");

mode_by_groups = Function( {collist}, // list of column names

  // Find mode of last column in collist

  Summarize( g = by( Eval( collist ) ), n = Count() );

  X = Design( g[1] ) :* n;

  i = Loc( V Max( Transpose( X == Repeat( V Max( X ), N Row( n ) ) ) ) );

  //Add grouping columns to new table

  dt11 = New Table( "modes" );

  For( k = 1, k <= N Items( collist ) - 1, k++,

  dt11 << (New Column( char(collist), Character, set values( g ) ));

  );

  //Add Mode and N

  dt11 << New Column( "Mode of " || char(collist), Character, set values( g ) );

  dt11 << New Column( "N", set values( n ) );

);

mode_by_groups( {Make , R Leg } );

5711_Untitled.png