How to plot Count of values of Nominal data vs. Continuous data
Created: Jan 9, 2023 01:15 PM | Last Modified: Jun 11, 2023 4:30 AM
Hi there, I have a question on how to model some data I am looking at (JMP 11).
End goal: Display a chart that shows the count of values of a Nominal data column (X) vs. a Continuous data column (Y).
Further in the weeds:
I have a Nominal Data Column: DieNumber.
And a Continuous Data Column: ResDelta.
I want the X axis to show the count of each DieNumber (e.g., 1, 2, 3) instead of the DieNumber values themselves. In Graph Builder, however, switching the summary statistic to N with ResDelta (Continuous) on the Y axis summarizes Y rather than counting the occurrences of each X value.
The only workaround I've found is the Overlay Plot platform with ResDelta as [Y] and DieNumber as [By], but this splits the output into a separate chart per DieNumber when I want all the data in one chart.
Really struggling with this so any help would be appreciated, thanks!
Aah, thank you, this example clarifies your use case! If I understand correctly, you want all of the individual track times plotted as separate points, not just a single point representing both of Tim's runs. I would approach this by making a column that counts the number of times each person has run around the track so far, and then plotting that against Calories Burned.
The formula for the times around track column could look like this (note how it just sums the number instead of having a column reference in the first argument):
First, welcome to the community. I'm not sure I understand the issue, but I have attached a fake data set with fake numbers for two variables Die and ResDelta. I have saved a script for Graph Builder (green arrow). Just click on it. Is this what you are looking for? BTW, I'm on V17 so not sure it will work on V11... so here is the script:
Thank you for the warm welcome, and I appreciate your time looking into this! I plotted the variables on X and Y how I think it should be, and here is what I see:
This is almost correct, except for each X (ChipDieNum) there can be multiple Y values (ResDelta). Instead of plotting the X values, I want to plot the counts of each X (ChipDieNum) value.
As mentioned previously, X is set to Nominal and Y is Continuous.
I've tried doing that, but for some reason JMP is summarizing Y (I guess since it is continuous data) instead of summarizing N for X, so I end up getting a different graph:
While this is showing the counts of the ChipDieNumber, it's now plotting on the Y in place of ResDelta.
Aah, now I understand. You need to change your y axis to be ordinal or nominal data, and then choose the response axis to summarize under the red triangle for Points. In some cases you might need to bin the data first, for example by rounding to the nearest whole number, but in your case it looks like Res Delta is already a whole number.
Now, for some reason the x-axis labels got messed up when I ran this, so I had to go into the x-axis settings and fix them. So, I thought this script should work:
Here is the method that I used in JMP 11. Newer versions would make it easier.
First, I create a new column called "count" using the formula
If(Lag(:Die) != :Die | Row() == 1, x = 0);
x++;
x
The Label state is then applied to the Count column.
Each row has the Label State set for it. This will allow for the displaying of the value as a label.
ResDelta column is dragged to the Y drop area
Die is dragged to the X drop area
Count is dragged to the Color drop area
This last step is done so the color of the points can be changed to white, so they will not show on the graph. To change the color, just go to each of the colored points in the legend, right click on them one at a time, and change the color to white.
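For anyone who wants to check what the "count" formula above computes without opening JMP, here is a minimal Python sketch of the same logic. It is only an illustration, not JMP code: the function name `running_count` and the sample Die values are made up for this example. Like the Lag() comparison in the formula, it only compares each row with the one directly above it, so the rows need to be sorted (or at least grouped) by Die for the count to be meaningful.

```python
def running_count(die_values):
    """Mimic the formula: reset x on row 1 or when Die changes, then increment."""
    counts = []
    x = 0
    previous = None
    for row, die in enumerate(die_values, start=1):
        # Same test as: If(Lag(:Die) != :Die | Row() == 1, x = 0)
        if row == 1 or die != previous:
            x = 0
        x += 1
        counts.append(x)
        previous = die
    return counts


# Hypothetical Die column, already sorted by Die
die = [401, 401, 401, 510, 510]
print(running_count(die))  # [1, 2, 3, 1, 2]
```

If the rows are not grouped by Die, every change of value restarts the count, so `[401, 510, 401]` would give `[1, 1, 1]`.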
Here is an alternate method that I think will work in JMP 11 and will plot continuous values, and below that is perhaps an easier way to display this data using a histogram:
Method 1
Calculate the points to plot using tabulate with ordinal data types and a local data filter excluding missing values on what would be the x axis. Then in the new data table change data types back to continuous and plot.
Add a new column with a formula like this (in newer versions of JMP you could do this right in the platform launch dialog without making a column):
Open Tabulate and temporarily change the y variable to nominal; you can change it back afterwards.
Add a local data filter to exclude rows with missing ages, set to nominal, and select '0'
Make into a data table
Change data modeling types back to continuous
Graph
Method 2
In graph builder make a histogram using only the y axis, and add a local data filter excluding missing values on what would be the x axis.
Make the same 'is missing' column for the x axis variable
Add that column as a local data filter in graph builder.
Add the y variable to the y axis, it should be continuous
Change the graph to a histogram
Change the axis settings on the y axis to something reasonable. Note that this will still do some binning, so depending on your setting here it might combine heights 51.1 and 51.2 into the same bar with a count 2.
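To make the binning caveat concrete, here is a rough Python sketch of fixed-width binning. This is an assumption about how a histogram groups values in general, not a description of how Graph Builder picks its bin edges; the function name `histogram_counts` and the sample heights are hypothetical.

```python
import math
from collections import Counter


def histogram_counts(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's left edge."""
    bins = Counter()
    for v in values:
        left_edge = math.floor(v / bin_width) * bin_width
        bins[left_edge] += 1
    return dict(sorted(bins.items()))


# Hypothetical y values: 51.1 and 51.2 land in the same bin of width 1
heights = [51.1, 51.2, 53.0]
print(histogram_counts(heights, 1.0))  # {51.0: 2, 53.0: 1}
```

With a wider bin more values merge into one bar, so the axis increment you choose directly affects the counts shown.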
This is the closest so far out of all the solutions, but the problem I have is that my end graph needs to be (Count of ChipDieNum) vs. (Res Delta).
For example:
ChipDieNum: 401 ran through calibration 3 times with Resistance Deltas: -10200; -2000; -100 Ohms. In the data, ChipDieNum = (401), Count of ChipDieNum = (3), & Res Delta = (-10200,-2000,-100)
ChipDieNum: 510 ran through calibration 2 times with Resistance Deltas: -13000, -2600; -200 Ohms. Which would show as: ChipDieNum = (510), Count of ChipDieNum = (2), & Res Delta = (-13000,-2600,-200)
I am attempting to correlate that Res Delta for a chip increases according to the number of times it ran through calibration by plotting Count of ChipDieNum (X) against Res Delta (Y). In another sense, I am trying to associate multiple Res Delta (Y) values with values of Count (X) of each ChipDieNum value where ChipDieNum is Nominal, and Res Delta is Continuous.
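As a sanity check on the table this plot needs, the Python sketch below pairs each Res Delta with the number of times its chip has been seen so far. It is an illustration only, under the assumption that rows appear in calibration order. The function name `count_vs_delta` and chip 999 are hypothetical; the three values for chip 401 are the ones from the example above.

```python
from collections import defaultdict


def count_vs_delta(rows):
    """Return (run number for that chip, res_delta) for every row, in order."""
    seen = defaultdict(int)
    points = []
    for chip, res_delta in rows:
        seen[chip] += 1  # how many times this chip has been calibrated so far
        points.append((seen[chip], res_delta))
    return points


# Chip 401 values come from the example above; chip 999 is made up
rows = [(401, -10200), (999, -500), (401, -2000), (401, -100)]
print(count_vs_delta(rows))
# [(1, -10200), (1, -500), (2, -2000), (3, -100)]
```

Each pair is one point on the desired chart, with the count on X and Res Delta on Y. Because the counter is kept per chip, the rows do not need to be sorted by ChipDieNum.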