
JMP Wish List


Add CDFs to Graph Builder

I recommend adding the ability to create CDF (cumulative distribution function) plots within Graph Builder.

 

Add a new "Connection" type, perhaps "Extended Step", that would draw the first step from the left edge starting at 0, and a final segment from the last point to the right edge, so that the line represents a true CDF. Then, document the approach, and consider adding these settings as some kind of preset.
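To make the request concrete, here is a minimal Python sketch (an illustration outside JMP, not JSL and not how Graph Builder is implemented) of the vertices such an "Extended Step" line would need. It computes the empirical CDF steps and pads them with a segment from the left axis edge at 0 and a segment to the right axis edge at 1.0. The function name `extended_step_vertices` and the `x_min`/`x_max` axis-edge parameters are hypothetical.

```python
from collections import Counter

def extended_step_vertices(values, x_min, x_max):
    """Hypothetical helper: (x, y) vertices of an empirical CDF drawn as a
    step line, extended to the axis edges x_min and x_max."""
    if not values:
        raise ValueError("need at least one value")
    n = len(values)
    counts = Counter(values)          # number of rows at each distinct x
    vertices = [(x_min, 0.0)]         # start at the left edge at height 0
    cum = 0
    level = 0.0
    for x in sorted(counts):
        vertices.append((x, level))   # horizontal run over to this x
        cum += counts[x]
        level = cum / n
        vertices.append((x, level))   # vertical step up at this x
    vertices.append((x_max, level))   # final run to the right edge at 1.0
    return vertices
```

For example, `extended_step_vertices([1, 2, 2, 3], 0, 4)` starts at `(0, 0.0)` and ends at `(4, 1.0)`. Without the first and last padding vertices, the line would begin at `(1, 0.25)` and stop at `(3, 1.0)`, which is roughly what the existing "Step" connection with Cumulative Percent draws.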

35 Comments
hogi
Level XIII

@XanGregg, wonderful! I love it :)
THANKS 

hogi
Level XIII

So, concerning CDFs, the status shouldn't be "we love it (in the queue)"
but
"already available for years" :)

wow!

hogi
Level XIII

Nice, it even works for points if you add a column of 1s and use it for the Y axis.

[Screenshot]

What a wonderful day!

XanGregg
Staff

Great to hear, @hogi! Sorry for the confusion that has kept this open. It looks like some of the discussion concerns what the Distribution platform calls a Normal Quantile plot, which does need the Cumulative Probability transform in Graph Builder, so that may be one source of the confusion.

hogi
Level XIII

Cumulative Probability:

[Screenshot]

Does Cumulative Percent have Col Number (without the +1) in the denominator?
It seems to run up to 100%.

XanGregg
Staff

Yes (without the +1), though the numerator uses a tie-breaking rule that Col Rank does not offer yet: it is the maximum rank among the tied values (that is, the counts are accumulated).
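A small Python sketch of that arithmetic, assuming each row's value is the maximum rank among the rows tied with it (which equals the number of values less than or equal to it), divided by the row count N. The `plus_one` flag only shows the N + 1 denominator hogi asked about, for comparison. Neither `cumulative_percent` nor its flag is a JMP function.

```python
def cumulative_percent(values, plus_one=False):
    """Hypothetical helper: per-row cumulative percent using the
    max-rank tie rule described above (an assumption, not JMP code)."""
    n = len(values)
    denom = n + 1 if plus_one else n
    # The max rank of a tied group equals the count of values <= x.
    return [100.0 * sum(v <= x for v in values) / denom for x in values]
```

With the hypothetical ages `[12, 13, 13, 14]`, both 13s get the maximum rank 3, so they share the value 75.0, and the largest value reaches 100.0. That matches the observation that the transform runs up to 100%. With `plus_one=True` the same data tops out at 80.0.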

BHarris
Level VII

Since I submitted the idea, let me try to sum up what I think we've learned here:

 

  1. CDF line plots can* be created in Graph Builder by
    1. Setting the plot type to a line plot
    2. Setting "Connection" to "Step", and "Summary Statistic" for the line to "Cumulative Percent" (which normally tracks the cumulative sum of the y-variable up to the current x value divided by the total sum of all y-values)
    3. Dragging the desired variable into X and an optional grouping parameter in the Overlay box
    4. Leaving the Y variable empty, which makes GB use counts on the Y axis. This is critical for mimicking a CDF, which plots the cumulative count of items <= the current x value divided by the total count of items
  2. Likewise, CDF point plots can be built by creating a column of all ones, dragging it to the Y axis, and switching to a Points plot
    1. The column of ones can be created either in the data table or directly in GB by right-clicking one of the variables, selecting "Formula+", and replacing the formula with "1"
    2. Line and point plots can coexist in the same graph.

* Note that these are not perfect CDFs: they don't draw the initial horizontal segment from the left edge at 0 to the first point's x value (and the first step up), nor the final segment from the last point at 1.0 to the right edge.
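The recipe above amounts to a weighted cumulative sum. For each x, add up the Y weights of the rows with values <= x and divide by the total weight. With every weight equal to 1 (an empty Y, or a column of ones), this is exactly the empirical CDF. The Python sketch below assumes each Overlay group is normalized by its own total. That is an assumption about Graph Builder, not documented behavior, and `cdf_by_group` is a hypothetical name.

```python
from collections import defaultdict

def cdf_by_group(xs, groups, weights=None):
    """Hypothetical helper: map each group to sorted (x, cumulative fraction)
    pairs. weights=None behaves like an empty Y: every row counts once.
    Assumes each group is normalized by its own total weight."""
    if weights is None:
        weights = [1.0] * len(xs)
    per_group = defaultdict(lambda: defaultdict(float))
    for x, g, w in zip(xs, groups, weights):
        per_group[g][x] += w                 # total weight at each distinct x
    result = {}
    for g, totals in per_group.items():
        grand = sum(totals.values())         # denominator: the group's total
        running = 0.0
        pairs = []
        for x in sorted(totals):
            running += totals[x]             # cumulative weight up to x
            pairs.append((x, running / grand))
        result[g] = pairs
    return result
```

Passing a list of ones as `weights` gives the same result as passing no weights, which mirrors the column-of-ones trick in item 2.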

 

I believe @hogi's issues with Cumulative Percent were caused by him putting the same variable on both the X and Y axes instead of leaving the y-axis empty.  Special thanks to him for bird-dogging this one, and to Xan for clarifying how the internals work!

BHarris
Level VII

@Sarah-Sylvestre: My 2 cents: we should leave this request open as a request for a new "Connection" type, perhaps "Extended Step", that would draw the first step from the left edge starting at 0, and a final segment from the last point to the right edge, so that the line represents a true CDF. Then, document the approach, and consider adding these settings as some kind of preset.

Thanks!

hogi
Level XIII

The approach even works for Smoother plots, as long as no two rows share the same value.
As @XanGregg mentioned, Cumulative Percent places the single point for each x value at the maximum.
Therefore, when there are ties, the smoother curve runs with a slight offset from the actual curve.

Maybe add a setting that lets the user choose among:
- max (as now)
- midpoint (like Prob Scores in Distribution)
- individual points (like standard Col Rank)
- min (if somebody needs it as well)
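To make the proposal concrete, here is a Python sketch of the four options applied to a small hypothetical sample, all divided by N. The function `tied_positions` and its rule names are hypothetical. Here `mid` is defined as the average of the minimum and maximum rank of the tied group, which is an assumption and is not claimed to reproduce Distribution's Prob Scores formula.

```python
def tied_positions(values, rule="max"):
    """Hypothetical helper: per-row cumulative fraction under a tie rule:
    'max', 'mid', 'individual' (ties broken by row order), or 'min'."""
    n = len(values)
    # sorted() is stable, so tied rows keep their original row order
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    for rank, i in enumerate(order, start=1):
        x = values[i]
        lo = sum(v < x for v in values) + 1   # min rank of x's tie group
        hi = sum(v <= x for v in values)      # max rank of x's tie group
        if rule == "max":
            r = hi
        elif rule == "min":
            r = lo
        elif rule == "mid":
            r = (lo + hi) / 2
        elif rule == "individual":
            r = rank
        else:
            raise ValueError(f"unknown rule: {rule}")
        result[i] = r / n
    return result
```

For `[12, 13, 13, 14]`, the two 13s sit at 0.75 under `max`, 0.5 under `min`, and 0.625 under `mid`, while `individual` gives them 0.5 and 0.75. Without ties, all four rules agree, which is consistent with the smoother only drifting when rows share a value.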

 

Besides that, it is currently not possible to label individual rows:

[Screenshot]

Names Default to Here(1);
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// Select row 9 and label it
dt << Select Rows( [9] ) << Label;

// A transform column of ones on Y makes every row count once,
// so Cumulative Percent gives the cumulative share of rows up to each age
Graph Builder(
	Transform Column( "one", Formula( 1 ) ),
	Variables( X( :age ), Y( :one ) ),
	Elements(
		Points( X, Y, Summary Statistic( "Cumulative Percent" ) ),
		Line( X, Y, Connection( "Step" ), Summary Statistic( "Cumulative Percent" ) ),
		Smoother( X, Y, Summary Statistic( "Cumulative Percent" ), Lambda( 0.12 ) )
	)
);

 

Status changed to: Acknowledged

Hey everyone! I will take @BHarris's suggestion and leave this request open. I edited the description of the request to reflect that it is now more specific: it asks for a new "Extended Step" connection type.