PedeGe
Level II

Count number instead of dots in scatterplot matrix of binary data

Hi all,

 

I have a dataset with 40 columns and 300+ rows.

For every column (= an event name) the values are binary, meaning yes/no attendance of the event for each row.

I am interested to see if there is a correlation in attendance between the different events/columns.

I managed to visualize this with a scatterplot matrix.

I adjusted the axes for the matrix to only show values that have value 1 for both columns/events, as I am only interested in cases where both events of the scatterplot matrix were attended by a row.

My scatterplot matrix looks like below image, but a lot bigger.

Every dot is a row for which both events/columns of the scatterplot matrix have a value of 1.

But it is very difficult to count the dots in the many possible combinations of events/columns (= the individual boxes/subplots of the scatterplot matrix).

 

My question is: is it possible to change the dots in the individual boxes into the actual count number for each individual box?

 

PedeGe_0-1710450601084.png

 

 

The syntax for the scatterplot matrix is shown below.

Note that for readability I don't list all the event names, and I show the ScaleBox syntax for only 2 of the columns/events.

This is highlighted in red text.

 

Scatterplot Matrix(
Y( /* a long list with the ~40 column/event names goes here */ ),
Matrix Format( "Lower Triangular" ),
SendToReport(
Dispatch(
{},
"140",
ScaleBox,
{Min( 0.5 ), Max( 1.5 ), Inc( 1 ), Minor Ticks( 0 )}
),
Dispatch(
{},
"139",
ScaleBox,
{Format( "Best", 15 ), Min( 0.5 ), Max( 1.5 ), Inc( 1 ),
Minor Ticks( 0 )}
),

Followed by 38 more similar code segments for the other 38 columns/events.

txnelson
Super User

Re: Count number instead of dots in scatterplot matrix of binary data

Here is a script that creates a new data table with the counts

txnelson_0-1710481398081.png

Which gives you your scatter plot counts, and it then creates a cell plot which gives you a color map of the values.

txnelson_1-1710481603797.png

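Names Default to Here( 1 );
// Use whichever data table is active when the script runs
dt = Current Data Table();

// Create a 40 x 40 table to hold the pairwise counts
dtCounts = New Table( "Counts", add rows( 40 ) );
dtCounts << delete columns( :column 1 );
For( i = 1, i <= 40, i++,
	dtCounts << New Column( "c" || Char( i ), set each value( . ) )
);
// For each pair of columns, count the rows where both values are 1;
// the result is symmetric, so fill both [i, k] and [k, i]
For( i = 1, i <= 40, i++,
	For( k = i, k <= 40, k++,
		theCount = Length( dt << get rows where( As Column( dt, i ) == 1 & As Column( dt, k ) == 1 ) );
		dtCounts[i, k] = theCount;
		dtCounts[k, i] = theCount;
	)
);

// Add a label column (c1, c2, ...) and move it to the front
dtCounts << New Column( "The Columns", character, set each value( "c" || Char( Row() ) ) );

dtCounts << move selected columns( :The Columns, to first );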


dtCounts << Cell Plot(
	Scale Uniformly( 1 ),
	Center at zero( 0 ),
	Y(
		:c1, :c2, :c3, :c4, :c5, :c6, :c7, :c8, :c9, :c10, :c11, :c12, :c13, :c14, :c15, :c16, :c17, :c18,
		:c19, :c20, :c21, :c22, :c23, :c24, :c25, :c26, :c27, :c28, :c29, :c30, :c31, :c32, :c33, :c34, :c35,
		:c36, :c37, :c38, :c39, :c40
	),
	Label( :The Columns ),
	Legend( 1 ),
	SendToReport(
		Dispatch( {}, "Cell Plot Report", FrameBox, {Frame Size( 79, 23 )} ),
		Dispatch( {}, "", Lineup Box( 2 ), {Spacing( 1 )} )
	)
);

 

 

 

Jim
PedeGe
Level II

Re: Count number instead of dots in scatterplot matrix of binary data

Many thanks for this!

The new data table with counts makes sense when I randomly check several values.

But somehow the table and the corresponding figure stop after column 16, see below.

I see nothing out of the ordinary in the script, which I simply copied and pasted (i.e. no extra commas or anything else that I accidentally put in there).

Any suggestions why this could be?

 

PedeGe_0-1710882274910.png

 

txnelson
Super User

Re: Count number instead of dots in scatterplot matrix of binary data

Would it be possible to attach a copy of your data? If necessary, you can anonymize the data before you attach it.

 

Also, your display has 41 columns, and the code assumes 40 columns.  

Jim
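
The mismatch between 40 and 41 columns can be avoided by deriving the size from the table instead of hard-coding it. The following is a minimal sketch of that idea, not the script from the thread: it assumes that every column of the active table is a 0/1 event column, and the names `nEvents` and `colNames` are introduced here only for illustration.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Assumption: every column in dt is a binary (0/1) event column
nEvents = N Cols( dt );
colNames = dt << Get Column Names( "String" );

// Build an nEvents x nEvents table to hold the pairwise counts
dtCounts = New Table( "Counts", Add Rows( nEvents ) );
dtCounts << Delete Columns( :Column 1 );
For( i = 1, i <= nEvents, i++,
	dtCounts << New Column( "c" || Char( i ), Set Each Value( . ) )
);

// Count the rows where both event i and event k equal 1
For( i = 1, i <= nEvents, i++,
	For( k = i, k <= nEvents, k++,
		theCount = Length(
			dt << Get Rows Where( As Column( dt, i ) == 1 & As Column( dt, k ) == 1 )
		);
		dtCounts[i, k] = theCount;
		dtCounts[k, i] = theCount;
	)
);

// Label each row with the original event name rather than c1, c2, ...
dtCounts << New Column( "Event", Character, Set Values( colNames ) );
```

If the table also contains non-event columns (for example an ID column), the column list would need to be filtered first.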
PedeGe
Level II

Re: Count number instead of dots in scatterplot matrix of binary data

Very sharp. I had indeed noticed that I have 41 columns: before posting my previous message, I manually changed all "40" to "41" and added ", c41" to the Cell Plot part of your syntax.

However, I get the same result with both the syntax for 40 columns (direct copy and paste) and the syntax for 41 columns.

 

Attached is the dataset that I am running the syntax on.

Thanks for looking so much into this

 

 

txnelson
Super User

Re: Count number instead of dots in scatterplot matrix of binary data

I suspect that you are running the script against the wrong data table.

I opened the Excel table you attached, and then clicked on the "Enable Editing" button to permit changing of the table.

I then clicked on the JMP Addin button in Excel and then clicked on the Data Table icon to have Excel convert the spreadsheet to a JMP data table.

I then copied the script I wrote from the Community Discussion page and changed all of the references to "40" to "41".

Next I added ":c41" to the Cell Plot.

Finally, I ran the script and got the count data table

txnelson_0-1710934168924.png

and the Cell Plot output

txnelson_1-1710934224517.png

The only thing I can think of that could cause the issue you are seeing is that the data table with all 41 columns is not the current "Active Data Table" when you run the script.

Here is the JSL, with the references pointing to 41 columns.

Names Default to Here( 1 );
dt = Current Data Table();

dtCounts = New Table( "Counts", add rows( 41 ) );
dtCounts << delete columns( :column 1 );
For( i = 1, i <= 41, i++,
	dtCounts << New Column( "c" || Char( i ), set each value( . ) )
);
For( i = 1, i <= 41, i++,
	For( k = i, k <= 41, k++,
		theCount = Length( dt << get rows where( As Column( dt, i ) == 1 & As Column( dt, k ) == 1 ) );
		dtCounts[i, k] = theCount;
		dtCounts[k, i] = theCount;
	)
);

dtCounts << New Column( "The Columns", character, set each value( "c" || Char( Row() ) ) );

dtCounts << move selected columns( :The Columns, to first );


dtCounts << Cell Plot(
	Scale Uniformly( 1 ),
	Center at zero( 0 ),
	Y(
		:c1, :c2, :c3, :c4, :c5, :c6, :c7, :c8, :c9, :c10, :c11, :c12, :c13, :c14, :c15, :c16, :c17, :c18,
		:c19, :c20, :c21, :c22, :c23, :c24, :c25, :c26, :c27, :c28, :c29, :c30, :c31, :c32, :c33, :c34, :c35,
		:c36, :c37, :c38, :c39, :c40, :c41
	),
	Label( :The Columns ),
	Legend( 1 ),
	SendToReport(
		Dispatch( {}, "Cell Plot Report", FrameBox, {Frame Size( 79, 23 )} ),
		Dispatch( {}, "", Lineup Box( 2 ), {Spacing( 1 )} )
	)
);

 

 

 

Jim
PedeGe
Level II

Re: Count number instead of dots in scatterplot matrix of binary data

Closing all JMP files and then importing the file I uploaded here did indeed fix the issue.

Not sure why I had the issue in the first place, as the values for the first several columns did seem spot on...

 

Thank you so much. Although the figure does not contain the actual numbers, the "intermediate" data table that is generated in the process does.

So now I have both a nice visualization and the actual numbers for reference.

Thanks again!

txnelson
Super User

Re: Count number instead of dots in scatterplot matrix of binary data

You do not have to close all of the files to make sure you are pointing to the correct data table.  All you need to do is to click on the data table in question and then run the script.  Clicking on the data table makes it the Current Active Data table.
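
Another way to avoid depending on which table happens to be active is to reference the table by name. This is a minimal sketch under the assumption that the table is already open; "Event Attendance" is a hypothetical name and must be replaced with the actual name of the table that holds the event columns.

```jsl
Names Default To Here( 1 );

// "Event Attendance" is a hypothetical table name used for illustration;
// substitute the name shown in the title bar of your own data table
dt = Data Table( "Event Attendance" );

// Optional sanity check before running the counting loops:
// confirm that the script points at the table with all 41 columns
tableName = dt << Get Name;
Show( tableName, N Cols( dt ), N Rows( dt ) );
```

With `dt` assigned this way, the rest of the counting script can be used unchanged, since it only refers to `dt`.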

 

Also, you can color the cells in the data table so that it displays both the colors and the actual data values. Here is your data in such a display:

txnelson_0-1710945362754.png
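
One possible way to script this kind of cell coloring is sketched below. It is an assumption about how such a display could be produced, not the steps actually used for the image above: it assumes the "Counts" table from the earlier script is open and that the column message `Color Cell By Value` is available in your JMP version. The gradient and its range are left at the JMP defaults.

```jsl
Names Default To Here( 1 );

// Assumption: the "Counts" table created by the counting script is open
dtCounts = Data Table( "Counts" );

// Turn on value-based cell coloring for every numeric column,
// skipping the character label column
For( i = 1, i <= N Cols( dtCounts ), i++,
	col = Column( dtCounts, i );
	If( (col << Get Data Type) == "Numeric",
		col << Color Cell By Value( 1 )
	);
);
```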

 

Jim
PedeGe
Level II

Re: Count number instead of dots in scatterplot matrix of binary data

I made a new script within the data table I wanted to analyze, pasted the syntax, and then ran it.

That should have made it the active data table, right?

Not sure what I did wrong then, but closing all the other, totally unrelated, open data tables seemed to work somehow.

 

The color coded datatable looks pretty cool

dlehman1
Level V

Re: Count number instead of dots in scatterplot matrix of binary data

An alternative display to consider is a treemap. Instead of relying on color to denote the levels of correlation, the sizes of the tiles convey relative importance more clearly. Using txnelson's script up to the point of creating the Cell Plot, I then stacked the columns to get the attached table, where I created the treemap (embedded script). The treemap lets you visualize the most important/frequent correlations for each event, but because the tiles are ordered by size, you lose the natural ordering 1-41 of the columns/events.
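
The stacking step and a treemap could be scripted roughly as follows. This is a hedged sketch rather than the embedded script from the attached table: it assumes the "Counts" table from txnelson's script is open, the names "Label", "Data", "Pair", and "Stacked Counts" are chosen here for illustration, and the exact Graph Builder arguments may need adjusting for your JMP version.

```jsl
Names Default To Here( 1 );

// Assumption: the "Counts" table created by the counting script is open
dtCounts = Data Table( "Counts" );

// Collect the numeric count columns (c1, c2, ...)
countCols = dtCounts << Get Column Names( Numeric, "String" );

// Stack them so that each row holds one (event, event, count) combination
dtStacked = dtCounts << Stack(
	Columns( countCols ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Output Table( "Stacked Counts" )
);

// Combine the two event labels into a single category for the treemap
dtStacked << New Column( "Pair", Character,
	Set Each Value( :The Columns || " & " || :Label )
);

// Tile size reflects the number of rows that attended both events
dtStacked << Graph Builder(
	Variables( X( :Pair ), Size( :Data ) ),
	Elements( Treemap( X ) )
);
```

Because the counts table is symmetric, each pair appears twice in the stacked table (and each event is also paired with itself), so you may want to exclude those rows before plotting.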