Re: Help with determining most frequently chosen combination of answers?

pdawgy13 · Jan 4, 2016 03:19 PM

Hello all. I am new to JMP and this user community. I am currently conducting an analysis on data collected across my organization, and am "stuck" on how to obtain the information I need from the data.

86 respondents (managers of conservation programs) were asked which of a list of 24 conservation strategies (labeled a through x) they were using in their programs. They were not given a limit on how many they could choose. We are looking to determine the most frequently chosen combinations of strategies, for use in determining whether there are certain "models" or "typologies" under which these projects can be categorized.

The average number of strategies chosen by each respondent is about 10. We are looking to tease out 5-10 "models" or "typologies" based on the most frequent combinations of strategies. I have the data set up in a matrix with each strategy given its own column (a, b, c, ...) and each respondent given its own row. If a strategy was chosen, its letter is in the cell; if it was not chosen, the cell is left blank. I also have another version where choice is denoted by a 1 and non-choice by a 0.

Any ideas on how we might crack this nut would be very welcome!!

Warm Regards,

Nikki

msharp · Jan 4, 2016 05:16 PM

Use the version where choice is denoted by a 1 or 0. Create a summary table, select all your strategy columns, and choose the statistic "Sum".

Then from the summary table go to Graph>Graph Builder. Drag and drop all the Sum columns into the Y axis and possibly change the graph to a bar chart for better visibility.

You can play around some more.
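For readers who want to see what the "Sum" statistic amounts to outside JMP, here is a minimal Python sketch. It is not JMP's implementation; the function name `strategy_totals` and the three toy rows are made up for illustration and are not the survey data.

```python
# Toy 0/1 indicator rows (hypothetical, not the survey data).
rows = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
]

def strategy_totals(rows):
    """Sum each indicator column: how many respondents chose each strategy."""
    totals = {}
    for row in rows:
        for strategy, chosen in row.items():
            totals[strategy] = totals.get(strategy, 0) + chosen
    return totals

totals = strategy_totals(rows)
# List strategies from most to least often chosen.
for strategy, count in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(strategy, count)
```

Each total counts one strategy at a time, so it says nothing about which strategies were chosen together.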

pdawgy13 · Jan 4, 2016 05:35 PM

Thanks msharp. Unfortunately, this only gives me a frequency chart of the most often picked single strategies. What I need to do is determine the most frequently picked combination of strategies... Any thoughts? Warm Regards, Nikki

jerry_cooper · Jan 4, 2016 06:26 PM

Hi Nikki,

Try this and see if it gets close to what you’re after:

1) Select all of the columns with 1 and 0’s in your data table (the indicator columns)

2) Choose the Combine Columns option from the Cols->Utilities menu (assuming you have version 12)

a. Enter a new column name in the dialog

b. Choose a delimiter (default is a comma)

c. Select the “Columns are indicator columns” check box

3) Choose Tabulate from the Analyze menu

a. Drop the new column into the drop zone for rows

b. Check the “Order by count of grouping columns” box in the lower left

This will give you a table ordered by the combinations with the highest frequency. You can then select the chart option from the red triangle menu, or make into a data table and use Graph Builder.

Note: If you don’t have version 12, you’ll need to create a formula column to concatenate the response letters into a string. However, you might want to add some logic to leave out the blank entries.
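The combine-then-count idea in the steps above can be sketched in Python as follows. This is only an illustration of the logic (join the names of the columns set to 1, skip the blanks, then count identical strings), not what Combine Columns or Tabulate do internally. The helper name `combine_indicators` and the toy rows are assumptions.

```python
from collections import Counter

def combine_indicators(row, delimiter=","):
    """Join the names of the columns set to 1, leaving out the 0 entries."""
    return delimiter.join(
        name for name, chosen in sorted(row.items()) if chosen == 1
    )

# Toy indicator rows (hypothetical, not the survey data).
rows = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 0},
]

# Count how often each full combination occurs, most frequent first.
combo_counts = Counter(combine_indicators(r) for r in rows)
for combo, n in combo_counts.most_common():
    print(repr(combo), n)
```

Sorting the column names before joining means the same set of choices always yields the same string, whatever order the columns are stored in.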

Hope this helps.

-Jerry

pdawgy13 · Jan 4, 2016 10:04 PM

Hi Jerry! Thank you - indeed this is a start, although unfortunately it does not get me very far. I have JMP 11, so I followed your alternate advice and created a separate CONCAT column in Excel (luckily, it just excludes the null values, so I did not have to get fancy with the logic). While I am able to tabulate the frequency of recurring combinations, there are so many combinations of varying lengths that most of the "n" values are 1. What I really need to do is limit the number of selections to 5-6 and decipher the top combinations. In case it is helpful to see the data in order to better visualize my problem, I have provided a link to a Box folder with the spreadsheet. Again, any further advice is much appreciated!

JMP question - Box
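One way to read "limit the number of selections to 5-6" is to count every fixed-size subset of each respondent's choices rather than the full combination. The Python sketch below illustrates that reading. It is an assumption about what is wanted, not a JMP feature, and `top_subsets` and the toy selections are hypothetical.

```python
from collections import Counter
from itertools import combinations

def top_subsets(selections, size, top_n=10):
    """Count every `size`-element subset of each respondent's choices
    and return the `top_n` most frequent subsets with their counts."""
    counts = Counter()
    for chosen in selections:
        # Sorting makes each subset a canonical tuple, e.g. ("a", "b").
        for subset in combinations(sorted(chosen), size):
            counts[subset] += 1
    return counts.most_common(top_n)

# Toy data (hypothetical): one set of chosen strategies per respondent.
selections = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "e"},
    {"b", "c", "d"},
]

print(top_subsets(selections, size=2, top_n=3))
```

A respondent who chose 10 strategies contributes 252 subsets of size 5, so 86 respondents stay well within reach of brute-force counting. Respondents with fewer than `size` choices contribute nothing.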

Warm Regards,

Nikki

robot · Jan 5, 2016 01:14 AM

You might try the Hierarchical Clustering platform. Go to Analyze -> Multivariate Methods -> Cluster. This can be used to group responses by similarity. You can then set the number of clusters by adjusting the slider. To save the clusters as a new category to your data table, from the red triangle, select "Save Clusters".
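To show what grouping responses by similarity means, here is a small pure-Python sketch of agglomerative clustering with average linkage. It uses Jaccard distance between sets of chosen strategies, which is a choice made here for illustration; the JMP platform has its own distance and standardization options, and this is not its implementation. All names and the toy respondents are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage(c1, c2, items):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def cluster(items, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.
    Returns a list of clusters, each a list of indices into `items`."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], items)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Toy respondents (hypothetical): two obvious "typologies".
respondents = [
    {"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b"},
    {"m", "n", "o"}, {"m", "n", "o", "p"}, {"n", "o"},
]
groups = cluster(respondents, n_clusters=2)
print(groups)
```

The `n_clusters` argument plays the role of the slider: a lower value forces more merging and gives broader groups.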

Craige_Hales · Jan 5, 2016 11:25 AM

Here's some JSL to generate a table with a script to try out the clustering idea. The prime number stuff is just a way to make the different types of people have different sets of answers.

// synthesize some data with 86 rows of 24 questions where there are kinds=7
// types of people answering the questions.
// the RunMe script in the table will re-randomize and add colors by Kind and
// run the Cluster platform. The random "noise" means the clusters are not
// perfect; the colors in a cluster are from "kind" but the noise might
// make kind be too far from ideal.
// notice: the cluster platform does not use "kind" but still groups them pretty well.
noise=.15; // adding 0 noise will get perfect answers. noise>.2 is not going to find much
people = 86; // number of people. try 500
kinds=7; // number of types of people
New Table( "cluster",
    Add Rows( people ),
    New Script(
        "RunMe",
        f = Function( {k, qn}, // f(kind, n) answers the Nth question for a "kind" person
            // using mod(prime*n,prime) to scramble answers
            {kp = [127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199
            211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311
            313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421
            431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
            547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647
            653 659 661 673 677 683 691 701 709 719 727 733]}, // need "kinds" of these
            // kp[k] picks the prime for this kind of person
            v = Mod( kp[k] * qn, 113 ); // 113 is also prime and less than first prime above
            // 55 is about half of 113; about half the questions will be answered with 1
            // *except* as the noise gets bigger the answer that would have been 0 or 1
            // is more likely to flip
            If( v < 55,Random Uniform() < (1-noise),Random Uniform() < (noise));
        );
        For Each Row(
            For( icol = 1, icol <= 24, icol++,
                c = Eval( Eval Expr( Column( Expr( 1 + icol ) ) ) );
                c[] = f( kind, icol );
            )
        );
        currentdatatable()<<colorOrMarkByColumn(kind,colortheme("spectral"),continuousScale(0));
        Hierarchical Cluster(
            Y(:Q 01,:Q 02,:Q 03,:Q 04,:Q 05,:Q 06,:Q 07,:Q 08,:Q 09,:Q 10,:Q 11,:Q 12,
            :Q 13,:Q 14,:Q 15,:Q 16,:Q 17,:Q 18,:Q 19,:Q 20,:Q 21,:Q 22,:Q 23,:Q 24),
            Method( "Average" ), // Ward, Average, Centroid, Single, Complete
            Standardize Data( 1 ), Dendrogram Scale( "Distance Scale" ), Number of Clusters( 7 ),
            SendToReport( Dispatch( {}, "Dendrogram", OutlineBox, {SetHorizontal( 1 )} ) )
        );
    ),
    New Column( "kind",Formula( Random Integer( 1, kinds ) ) ), // number of different types of people
    New Column( "Q 01"),New Column( "Q 02"),New Column( "Q 03"),New Column( "Q 04"),
    New Column( "Q 05"),New Column( "Q 06"),New Column( "Q 07"),New Column( "Q 08"),
    New Column( "Q 09"),New Column( "Q 10"),New Column( "Q 11"),New Column( "Q 12"),
    New Column( "Q 13"),New Column( "Q 14"),New Column( "Q 15"),New Column( "Q 16"),
    New Column( "Q 17"),New Column( "Q 18"),New Column( "Q 19"),New Column( "Q 20"),
    New Column( "Q 21"),New Column( "Q 22"),New Column( "Q 23"),New Column( "Q 24")
);

Craige

pdawgy13 · Jan 5, 2016 02:02 PM

Dear Robot and Craige,

I like this clustering idea; it seems to be getting at what we are looking to do, which is group "like" projects by strategy. Craige, I have never used JSL script before - is there somewhere I can input it in order to generate the table? Forgive my ignorance!

The other option I have seen in Excel is the Shopping Basket add-on. It looks to find these commonalities quite quickly and easily. Unfortunately, it requires an SQL server, which I don't think I have access to. Does JMP have a "Shopping Basket" function, or is clustering the closest method?

Warmest Regards,

Nikki

Craige_Hales · Jan 5, 2016 02:28 PM

The script is mostly just making a table of data similar to what I think you described. As Robot suggested, you can run the platform against your data without a script.

To run the script:

View->Log (just in case there are messages)

File->New->Script (ctrl-t)

copy and paste from above

Edit->Run Script (ctrl-r)

You can read through the script and find some variables to tweak (noise, people, kinds) and re-run it. Probably best if you close the data table and cluster from the previous run before running it again, but not required.

There is a scripting guide (PDF) under Help->Books->Scripting Guide that will help too.

or Introduction

Craige

stan_koprowski · Jan 5, 2016 02:42 PM

Hello,

Another option (thanks to melinda.thielbar for the tip) would be to use the Categorical platform with the Conditional Associations option. This will compute the conditional probability of one response given a different response. Here is another conditional association post in the community with the details of how to access it from within the Categorical platform. Additional details can also be found on page 48 of the Consumer Research book (Help-->Books-->Consumer Research) or in the Consumer Research online documentation.
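As a rough illustration of "the conditional probability of one response given a different response", the Python sketch below computes the share of respondents who chose one strategy among those who chose another. It follows the textbook definition and is not confirmed to match how the Categorical platform computes its report. The function name `conditional_share` and the toy selections are hypothetical.

```python
def conditional_share(selections, given, target):
    """Share of respondents who chose `target` among those who chose `given`.
    Returns None when nobody chose `given` (the share is undefined)."""
    with_given = [s for s in selections if given in s]
    if not with_given:
        return None
    return sum(1 for s in with_given if target in s) / len(with_given)

# Toy data (hypothetical): one set of chosen strategies per respondent.
selections = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"a", "c"}]

print(conditional_share(selections, given="a", target="b"))
print(conditional_share(selections, given="b", target="a"))
```

The two directions generally differ: in the toy data everyone who chose b also chose a, but only half of those who chose a also chose b.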


I have attached two data tables with the analyses.

The first table is just your Excel file imported as a JMP data table with the categorical analysis using the conditional association option.

The second table is the conditional association output. I generated this table by right-clicking on the table produced from the conditional association report and selecting "Make Into Data Table."

I then recreated the cell plot using the Cell Plot graphing platform. You can relaunch the analysis by right-clicking on the table script and choosing Run Script.

Thanks for using the JMP User Community.

Let us know if this works for you.

Best,

Stan