How to plot Count of values of Nominal data vs. Continuous data
Created: Jan 9, 2023 01:15 PM | Last Modified: Jun 11, 2023 4:30 AM (5220 views)
Hi there, I have a question on how to model some data I am looking at (JMP 11).
End goal: Display a Chart that shows Count of values of a Nominal Data Column (X) Vs. Continuous Data Column (Y).
Further in the weeds:
I have a Nominal Data Column: DieNumber.
And a Continuous Data Column: ResDelta.
I want the X axis to show the count of rows for each DieNumber (e.g., 1, 2, 3) instead of the DieNumber value itself. In Graph Builder, however, switching the summary statistic to N while ResDelta (Continuous) is on the Y axis summarizes Y rather than counting the rows for each value of X.
The only workaround I've found is the Overlay Plot function, with ResDelta assigned to [Y] and DieNumber assigned to [By], but this splits the data into a separate chart for each die when I want it all in one chart.
Really struggling with this so any help would be appreciated, thanks!
Other members have already suggested solutions to your problem; here is another one, based on the discussion so far.
To create the attached graph, here are the steps:
Create a new column "Count" grouped by Die ID:
// New formula column: Count (number of ResDelta rows per Die)
Data Table( "dieexample" ) << New Formula Column(
	Operation( Category( "Aggregate" ), "Count" ),
	Columns( :ResDelta ),
	Group By( :Die )
);
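If New Formula Column is not available in your JMP version (the question mentions JMP 11), the same per-die count can probably be built with an ordinary column formula. This is a minimal sketch, assuming the table is named "dieexample" and has the columns :ResDelta and :Die as in the script above; the column name "Count" is chosen here to match the step description.

```jsl
// Sketch: per-die row count as a formula column.
// Assumes a table "dieexample" with columns :ResDelta and :Die.
dt = Data Table( "dieexample" );
dt << New Column( "Count",
	Numeric,
	Continuous,
	// Col Number counts the nonmissing ResDelta values within each Die group
	Formula( Col Number( :ResDelta, :Die ) )
);
```

Every row of a given die then carries the same Count value, which is what lets Count be used as an axis variable.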
Create a graph showing the distribution of ResDelta against the number of points per die, with the Die ID in "Overlay" and box plots to ease the comparison (optional, and possibly messy with a large dataset):
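A minimal sketch of a Graph Builder script for this step is shown here; the attached script may differ. It assumes the grouped count column is named Count and that the table and other column names match the script above.

```jsl
// Sketch: ResDelta vs. per-die count, colored by die.
// Assumes "dieexample" contains :Count, :ResDelta and :Die.
dt = Data Table( "dieexample" );
dt << Graph Builder(
	Variables( X( :Count ), Y( :ResDelta ), Overlay( :Die ) ),
	Elements(
		Points( X, Y, Legend( 1 ) ),
		Box Plot( X, Y, Legend( 2 ) ) // optional; remove if the plot gets too busy
	)
);
```

Depending on the modeling type of Count, the box plots may be drawn per count value or as a single box, so it may be worth trying Count as ordinal.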
Once this is done, you should get the attached graph (I added box plots to the graph and script, but you can remove them if they are not useful for your objective):
I have also attached the dataset I used for the demonstration, along with the graph script.
Hope this will help you,
Victor GUILLER L'Oréal Data & Analytics
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Aah, thank you, this example clarifies your use case! If I understand correctly, you want all of the individual track times plotted as separate points, not just a single point representing both of Tim's runs. I would approach this by making a column that counts the number of times each person has run around the track so far, and then plotting that against Calories Burned.
The formula for the times around track column could look like this (note how it just sums the number instead of having a column reference in the first argument):
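For illustration, a running count of this kind might be written as follows. This is a sketch under assumptions: the column names :Name and "Times Around Track" are hypothetical, and Col Cumulative Sum is only available in more recent JMP versions, so it may not exist in JMP 11.

```jsl
// Sketch: running count of laps per person.
// :Name and "Times Around Track" are hypothetical column names.
dt = Current Data Table();
dt << New Column( "Times Around Track",
	Numeric,
	Continuous,
	// Summing the constant 1 within each :Name group gives 1, 2, 3, ... per person
	Formula( Col Cumulative Sum( 1, :Name ) )
);
```

Because the first argument is the constant 1 rather than a column, each row adds one to its person's running total, so a second run by the same person gets the value 2.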