Solved: finding the mode

ron_horne · Mar 9, 2014 12:39 PM

hi everyone,

i would like to produce a table summery that gives the mode of a variable for each category of another variable.

ideally, this would work:

dt = open ("$SAMPLE_DATA/Mail Messages.jmp");
dt << summary ( Group ( :From ), Mode (:To) );

alternatively, i would stack the table by "From" and calculate the mode at each row. yet, "Mode" is not an option in Table summary or column formula.

any ideas calculating the mode are welcome including in sending to SAS.

i am using JMP 10.

thank you,

Ron

ron_horne · Mar 13, 2014 09:48 PM

Thank you Brady,

This is by far the most parsimonious and robust script.

I would like to add one optional line of code to disable the possibility of missing values as the mode:

dt = open ("$SAMPLE_DATA\Cars.jmp");

dt2 = dt << summary(group(:Make, :R Leg));

// option to disable the possibility of missing values as modes

dt2 << Select Where ( IsMissing (:R Leg)) << Delete Rows;

dt2 << selectwhere(colmaximum(:n rows, :make) != :n rows) << Delete Rows;

View solution in original post

Report Inappropriate Content · Mar 9, 2014 01:54 PM

Hi - the mode is the most common item, so you're after the person (or presumably persons if there's a tie) who received the most number of letters from each sender, is that right? If so, I think the following ought to do it, although I've no doubt there's a much more elegant and/or shorter way. I've deliberately left all the intermediate calculations, and named all the tables to leave an audit trail:

dt1 = open ("$SAMPLE_DATA/Mail Messages.jmp");

dt1 << set name("dt1");

dt2 = dt1 << Summary( Group( :From ), N, Subgroup( :To ) );

dt2 << set name("dt2");

Sent_From = (dt2 << get column names);

show(Sent_From);

// Throw away the first two elements of this list;

remove from(Sent_From, {1, 2});

show(Sent_From);

// Stack them up;

dt3 = dt2 << Stack(columns(eval(Sent_From)), Source Label Column( "Label" ), Stacked Data Column( "Data" ));

dt3 << set name("dt3");

// dt3 now contains the number of letters sent from each person to each recipient;

dt4 = dt3 << Summary( Group( :From ), Max( :Data ) );

dt4 << set name("dt4");

// dt4 now contains the maximum number of letters sent from each person to any one recipient;

dt5 = dt4 << Join( With( dt3 ),

SelectWith(:From, :Label, :Data ),

By Matching Columns( :From = :From, :Name( "Max(Data)" ) = :Data ),

Drop multiples( 0, 0 ),

Name("Include non-matches" )(0, 0),

Preserve main table order( 1 )

);

dt5 << set name("dt5");

column(dt5, "Label") << set name("Recipient");

/*

dt5 should now contain the maximum number of letters sent from each person to anyone recipient

and the name of that recipient (or recipients in the event of a tie)

*/

That should give you that Ann sent 4 letters to Michael, Jeff sent 7 letters to Michael, John sent 12 letters to Michael, Katherine sent 12 letters to John and so on. Would you like to try that out and see if it does what you need? To get the maximum number of letters received by any recipient from any one sender, just swap over the :To and :From columns.

ron_horne · Mar 9, 2014 03:31 PM

Thank you Dodo,

it works fine.

ron_horne · Mar 12, 2014 07:40 AM

hi Dodo and all,

Dodo's solution is technically correct but is not useful when dealing with big data.

thinking it over i have an alternative solution which is scalable.

dt1 = Open( "$SAMPLE_DATA/Mail Messages.jmp" );

dt1 << set name( "dt1" );

// produce a distribution of revcivers by senders

dt1 << New Script( "Distribution", Distribution( Nominal Distribution( Column( :To ) ), By( :From ) ) );

Dist = Dt1 << Run Script( "Distribution" );

// extract the frequency tables as one data table.

dt2 = Report( dist[1] )["Frequencies"][Table Box( 1 )] << Make Combined Data Table;

dt2 << set name( "dt2" );

dist << close window;

// remove the "Total" rows from the data table - they do not represent relevant Frequencies

selection = dt2 << Select Where( :Level == "Total" );

selection << Delete Rows;

// summarize to get the mode "To" for each "From"

dt3 = Data Table( "dt2" ) << Summary( Group( :From ), Max( :Count ), statistics column name format( "column" ), Link to original data table( 0 ) );

dt3 << Set Name( "Mode" );

// update the summary table to get the values of "To"

dt3 << Update( With( Data Table( "dt2" ) ), Match Columns( :From = :From, :Count = :Count ), Add Columns from Update table( :Level ) );

dt2 << close window;

Column( dt3, "Level" ) << Set Name( "Mode" );

ms · Mar 12, 2014 11:34 AM

There are often several ways to perform a task in JMP. I just recalled a function I wrote some time ago that's quite efficient for finding the most common item grouped by one a or more columns. I share it here as an alternative to the above approaches. The result is similar to Dodo's, i.e. all "tied" modes are returned, not only the last as in Ron's approach using update().

mode_by_groups = Function( {collist}, // list of column names

// Find mode of last column in collist

Summarize( g = by( Eval( collist ) ), n = Count() );

X = Design( g[1] ) :* n;

i = Loc( V Max( Transpose( X == Repeat( V Max( X ), N Row( n ) ) ) ) );

//Add grouping columns to new table

dt1 = New Table( "modes" );

For( k = 1, k <= N Items( collist ) - 1, k++,

dt1 << (New Column( char(collist[k]), Character, set values( g[k][i] ) ));

);

//Add Mode and N

dt1 << New Column( "Mode of " || char(collist[k]), Character, set values( g[k][i] ) );

dt1 << New Column( "N", set values( n[i] ) );

);

//Examples

Open( "$SAMPLE_DATA/Mail Messages.jmp" );

mode_by_groups( {From, To} );

Open( "$SAMPLE_DATA/Consumer Preferences.jmp" );

mode_by_groups( {Gender, Single Status, Birth Year} );

ron_horne · Oct 18, 2016 1:35 PM

Thank you very much.

dealing with the mode i came across two issues that look like faults of the JMP program.

1. if there is only one observation the distribution platform does not provide a mode at all. Furthermore, in case of tied frequency of 1 nothing is returned either. this means that the definition in the help file is incorrect when it clams to bring the lower value among the most frequent.

2. when producing a Frequencies table for the column width is long (i.e. 12) making a data table does not allow for recovering the whole number. for some reason it produces a text string of the abbreviated number.

louv · Oct 18, 2016 1:35 PM

Not sure if this is helpful but on the JMP File Exchange Brady Brady has posted an add-in that gives extended summary statistics capability including the Mode.

http://support.sas.com/demosdownloads/downarea_t4.jsp?productID=111731&jmpflag=Y

ron_horne · Mar 13, 2014 03:16 PM

Thank you LouV

this is defiantly useful yet not perfect.

Brady's Add-in is only accessible by the GUI and not by script - if it is please let me know.

the N Rows column that shows up is the number of occurrences in each Group and not the frequency of the mode itself. further more i didn't see built in option to get that frequency - if there is please let me know.

otherwise it is powerful and scalable. Brady's add-in aught to be part of the program and not a patch.

brady_brady · Mar 13, 2014 07:20 PM

Hi Ron,

I believe that the reason JMP reports no mode when a) only one observation exists and 2) when all observations occur only once is that each of these is consistent with the way the mode is defined in many textbooks. i.e., if no element appears more than once, no mode exists.

Cheers,

Brady

ron_horne · Oct 18, 2016 1:35 PM

Thank you very much MS. it is very useful.

This way is actually more "honest" than other alternatives mentioned in this discussion since it also assesses the frequency of missing data. for example, running the following script returns some missing values as the mode.

dt = open ("$SAMPLE_DATA\Cars.jmp");

mode_by_groups = Function( {collist}, // list of column names

// Find mode of last column in collist

Summarize( g = by( Eval( collist ) ), n = Count() );

X = Design( g[1] ) :* n;

i = Loc( V Max( Transpose( X == Repeat( V Max( X ), N Row( n ) ) ) ) );

//Add grouping columns to new table

dt11 = New Table( "modes" );

For( k = 1, k <= N Items( collist ) - 1, k++,

dt11 << (New Column( char(collist), Character, set values( g ) ));

);

//Add Mode and N

dt11 << New Column( "Mode of " || char(collist), Character, set values( g ) );

dt11 << New Column( "N", set values( n ) );

);

mode_by_groups( {Make , R Leg } );

finding the mode

Re: finding the mode

Re: finding the mode

Re: finding the mode

Re: finding the mode

Re: finding the mode

Re: finding the mode

Re: finding the mode

Re: finding the mode

Re: finding the mode

Re: finding the mode