
Add CDFs to Graph Builder

Recommend adding the ability to create CDF plots within Graph Builder.  

 

Add a new "Connection" type, perhaps "Extended Step", that draws the first step from the left edge of the graph starting at 0, and extends the last point to the right edge, so the plot represents a true CDF. Then document the approach, and consider offering these settings as some type of preset.
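To make the requested geometry concrete, here is a minimal Python sketch (not JSL, and not how JMP draws anything) of the vertex list an "Extended Step" line could follow. It assumes the y value is the fraction of values <= x, and `x_min`/`x_max` are hypothetical stand-ins for the axis limits.

```python
from bisect import bisect_right


def extended_step_vertices(xs, x_min, x_max):
    """Polyline vertices for an empirical CDF drawn as an "extended step".

    Starts at (x_min, 0), steps up at each distinct data value, and ends
    with a horizontal run at 1.0 out to x_max. Assumes x_min <= min(xs)
    and x_max >= max(xs); both are hypothetical axis-limit parameters.
    """
    ordered = sorted(xs)
    n = len(ordered)
    vertices = [(x_min, 0.0)]  # extra start point on the left edge at 0
    height = 0.0
    for level in sorted(set(ordered)):
        vertices.append((level, height))  # horizontal run at the previous height
        height = bisect_right(ordered, level) / n  # fraction of values <= level
        vertices.append((level, height))  # vertical jump at this value
    vertices.append((x_max, height))  # extra end point on the right edge at 1.0
    return vertices


print(extended_step_vertices([1, 2, 2], x_min=0, x_max=3))
```

The first segment (from `x_min` at 0) and the last segment (out to `x_max` at 1.0) are the two pieces a plain Step connection does not draw.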

32 Comments
hogi
Level XII

@XanGregg, wonderful! I love it :)
THANKS

hogi
Level XII

So, concerning CDFs: instead of "we love it (in the queue)",
it should say
"already available, for years" :)

wow!

hogi
Level XII

Nice, it even works for points if you add a column of 1s and use it for the Y axis.

[Screenshot]

 

What a wonderful day!

XanGregg
Staff

Great to hear, @hogi! Sorry for the confusion that has kept this open. It looks like some of the discussion concerns what Distribution calls a Normal Quantile plot, which does need the Cumulative Probability transform in Graph Builder, so that may be one source of confusion.

hogi
Level XII

Cumulative Probability:

[Screenshot: Cumulative Probability]

 

Does Cumulative Percent have Col Number (without the +1) in the denominator?
It seems to run up to 100%.

XanGregg
Staff

Yes (without the +1), though the numerator uses a tie-breaking rule that is not yet available in Col Rank: it is the maximum rank among the tied values (the counts accumulate through the whole tie).
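A minimal Python sketch of the rule Xan describes, assuming Cumulative Percent for a row is 100 × (number of values <= that row's value) / n, with n the row count and no +1. The function name is hypothetical and this only illustrates the arithmetic; it is not JMP's implementation.

```python
from bisect import bisect_right


def cumulative_percent(values):
    """Cumulative Percent per row, using the maximum rank among tied values.

    Assumption: result = 100 * (number of values <= v) / n, where n is the
    row count (no +1 in the denominator).
    """
    ordered = sorted(values)
    n = len(ordered)
    # bisect_right returns how many sorted values are <= v, which equals
    # the maximum rank any of the tied values would receive.
    return [100.0 * bisect_right(ordered, v) / n for v in values]


ages = [12, 12, 13, 14, 14, 14, 15]
print(cumulative_percent(ages))
```

Here the three 14s (ranks 4, 5 and 6) all get 6/7, about 85.7%, and the largest value reaches exactly 100% because the denominator is n rather than n + 1.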

BHarris
Level VI

Since I submitted the idea, let me try to sum up what I think we've learned here:

 

  1. CDF line plots can* be created in Graph Builder by
    1. Setting the plot type to a line plot
    2. Setting "Connection" to "Step", and "Summary Statistic" for the line to "Cumulative Percent" (which normally tracks the cumulative sum of the y-variable up to the current x value divided by the total sum of all y-values)
    3. Dragging the desired variable into X and an optional grouping parameter in the Overlay box
    4. Leaving the Y variable empty, which makes GB use counts for the Y axis. This is critical for mimicking a CDF, which plots the cumulative count of items <= the current x value divided by the total count of items
  2. Likewise, CDF point plots can be built by creating a column of all ones, dragging it to the Y axis, and switching to a Points plot
    1. The column of ones can be created either in the data table or directly in GB by right-clicking one of the variables, selecting "Formula+", and replacing the formula with "1"
    2. Line and point plots can coexist in the same graph.

* Note that these are not perfect CDFs: they don't draw the horizontal segment at 0 from the left edge to the first point's x value (followed by the first step up), or the horizontal segment at 1.0 from the last point to the right edge.
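Items 2 and 4 hinge on the difference between summing Y and counting rows. The Python sketch below mimics only the description of "Cumulative Percent" given in item 2; it is not JMP's code, and the helper name is hypothetical. With a Y column of 1s the sum becomes a count, giving the empirical CDF; with the same variable on X and Y, each row is weighted by its own value, so the curve is no longer a CDF.

```python
def cumulative_percent_of_y(xs, ys):
    """Cumulative sum of y over rows with x <= each distinct x level,
    divided by the total sum of y, as a percentage.

    Hypothetical helper mirroring the description in item 2.
    """
    total = sum(ys)
    result = {}
    for level in sorted(set(xs)):
        running = sum(y for x, y in zip(xs, ys) if x <= level)
        result[level] = 100.0 * running / total
    return result


ages = [12, 12, 13, 14]
ones = [1] * len(ages)

# Y = column of 1s: the sum of y is a row count, so this is the empirical CDF.
print(cumulative_percent_of_y(ages, ones))
# Y = same variable as X: rows are weighted by their value, so this is not a CDF.
print(cumulative_percent_of_y(ages, ages))
```

The first call gives 50%, 75% and 100% for ages 12, 13 and 14; the second starts at about 47% for age 12 because the two 12s are weighted by 24 out of a total of 51.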

 

I believe @hogi's issues with Cumulative Percent were caused by him putting the same variable on both the X and Y axes instead of leaving the y-axis empty.  Special thanks to him for bird-dogging this one, and to Xan for clarifying how the internals work!

BHarris
Level VI

@Sarah-Sylvestre: My 2 cents: we should leave this request open as a request to add a new "Connection" type, perhaps "Extended Step", that would draw the first step from the left edge starting at 0 and extend the last point to the right edge, to represent a true CDF. Then document the approach, and consider adding these settings as some type of preset.

 

Thanks!

hogi
Level XII

The approach even works for Smoother plots, as long as no two rows share the same value.
As @XanGregg mentioned, Cumulative Percent assigns each x value a single point, set to the maximum.
Therefore, the smoother curve runs with a slight offset from the actual curve.

Maybe add a setting that lets the user choose between:
- max (like now)

- midpoint (like Prob Scores in Distribution)

- individual points (like standard Col Rank)

- min (if somebody needs this as well)
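The four tie rules proposed above can be sketched in Python as rank functions; dividing a rank by the row count gives the cumulative fraction. This is a hedged illustration only: the rule names are hypothetical, and "midpoint" here is simply the average of the minimum and maximum ranks, not a claim about the exact formula behind Prob Scores in Distribution.

```python
from bisect import bisect_left, bisect_right


def tie_ranks(values, rule="max"):
    """Rank each row, resolving ties with one of the proposed rules.

    Divide a rank by len(values) to get that row's cumulative fraction.
    "midpoint" is the mean of the min and max ranks (an assumption).
    """
    if rule == "individual":
        # sorted() is stable, so tied rows keep their original row order.
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0] * len(values)
        for position, row in enumerate(order, start=1):
            ranks[row] = position
        return ranks

    ordered = sorted(values)
    ranks = []
    for v in values:
        low = bisect_left(ordered, v) + 1  # minimum rank among ties
        high = bisect_right(ordered, v)    # maximum rank among ties
        if rule == "min":
            ranks.append(low)
        elif rule == "max":
            ranks.append(high)
        elif rule == "midpoint":
            ranks.append((low + high) / 2)
        else:
            raise ValueError(f"unknown rule: {rule!r}")
    return ranks


data = [14, 12, 14, 14]
for rule in ("max", "midpoint", "individual", "min"):
    print(rule, tie_ranks(data, rule))
```

Only "individual" gives tied rows distinct ranks, which mimics the no-duplicates case in which the smoother reportedly tracks the curve without the offset.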

 

Besides that, it's currently not possible to label individual rows:

[Screenshot]

 

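Names Default to Here(1);
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// Select row 9 and give it the Label row state
dt << Select Rows( [9] ) << Label;

Graph Builder(
	// A transform column of 1s serves as Y, so Cumulative Percent accumulates counts
	Transform Column( "one", Formula( 1 ) ),
	Variables( X( :age ), Y( :one ) ),
	Elements(
		// The same Cumulative Percent summary drawn as points, a step line, and a smoother
		Points( X, Y, Summary Statistic( "Cumulative Percent" ) ),
		Line( X, Y, Connection( "Step" ), Summary Statistic( "Cumulative Percent" ) ),
		Smoother( X, Y, Summary Statistic( "Cumulative Percent" ), Lambda( 0.12 ) )
	)
);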

 

Status changed to: Acknowledged

Hey everyone! I will take @BHarris's suggestion and leave this request open. I edited the description of the request to reflect that it is now more specific, requesting a new "Extended Step" connection type.