theseventhhill
Community Trekker

Plot conventional and reverse CDF on same plot by grouping variable

Hello,

I have a question about CDF plots in the Distribution platform.

I would like to plot the CDFs of two variables, say Y1 and Y2, which are on the same numerical scale but differ in range. For example, Y1 runs from 0 to 100 and Y2 from 0 to 1000. I want to plot Y1 like a regular CDF plot, with the CumProb going from 0 to 1 and the X1 values increasing from 0 to 100.

For Y2, I want X2 in decreasing order, from 1000 to 0, with the CumProb of Y2 still running from 0 to 1 and increasing along the Y axis.

Essentially, I want to reverse cdf(Y2) and overplot it with cdf(Y1) in the same CDF plot window. Eventually I want to shade specific areas under the curves based on X1 and X2 values, but first I would like to know how to accomplish this.

I started by saving the probability scores and adding a column for 1 - P(X2 <= x), that is P(X2 > x), for Y2, but I got stuck on how to put P(X2 > x) and P(X1 <= x) on the same X axis, and I have been going around in circles on the general approach. Any thoughts or pointers would be really helpful. I have 5 categories of a grouping variable and do not mind having 5 separate CDF plots.

 

2 REPLIES 2
gzmorgan0
Super User

Re: Plot conventional and reverse CDF on same plot by grouping variable

Attached is a script that I believe simulates your data setup, along with two methods to create the graph you described. One uses unstacked data (two columns) and the other uses stacked data. If you are looking for rows where the Inv Prob of Y2 > Prob Y1, the unstacked (raw) data would be easier to use.

 

Note, Y1 and Y2 were simulated as uniform distributions.

Names Default to Here(1);

dt = New Table( "Demo - Raw", Add Rows( 1000 ),
	New Column( "Y1", Numeric, Continuous, <<Set Each Value( Random Integer( 0, 100 ) ) ),
	New Column( "Y2", Numeric, Continuous, <<Set Each Value( Random Integer( 0, 1000 ) ) )
);

dist = dt << Distribution(
	Continuous Distribution( Column( :Y1 ) ),
	Continuous Distribution( Column( :Y2 ) )
);

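//Save the cumulative probabilities as new columns (Prob Y1, Prob Y2)
dist << Save(Prob Scores);

dist << close window();

//Reverse CDF for Y2: 1 - P(Y2 <= y)
dt << New Column("Inv Prob Y2", numeric, continuous, <<Set Each Value(1-:Prob Y2));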


//Using Unstacked data and GraphBuilder
gb = dt << Graph Builder(
	Size( 534, 454 ),
	Show Control Panel( 0 ),
	Variables(
		X( :Y1 ),
		X( :Y2, Position( 1 ) ),
		Y( :Prob Y1 ),
		Y( :Inv Prob Y2, Position( 1 ) )
	),
	Elements(
		Smoother( X( 1 ), Y( 1 ), Legend( 25 ) ),
		Smoother( X( 2 ), Y( 2 ), Legend( 27 ) )
	),
	SendToReport(
		Dispatch(
			{},
			"Y1",
			ScaleBox,
			{Min( -100 ), Max( 1100 ), Inc( 50 ), Minor Ticks( 1 ),
			Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
		),
		Dispatch(
			{},
			"Prob Y1",
			ScaleBox,
			{Min( -0.1 ), Max( 1.1 ), Inc( 0.1 ), Minor Ticks( 1 ),
			Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
		)
	)
);


//Sometimes it is easier to stack the data and use Bivariate instead of GraphBuilder

dtstck = dt << Stack(
	columns( :Y1, :Y2, :Prob Y1, :Inv Prob Y2 ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Stack By Row( 0 ),
	Number of Series( 2 ),
	Contiguous,
	Output Table Name("Demo - Stacked")
);

dtstck:Data2 << set name("Prob");

//this creates one graph with 2 curves
biv = dtstck << Bivariate(
	Y( :Prob ),
	X( :Data ),
	Group By(:Label),
	Fit Each Value( {Report(0)}),
	SendToReport(
		Dispatch(
			{},
			"1",
			ScaleBox,
			{Min( -50 ), Max( 1050 ), Inc( 50 ), Minor Ticks( 1 ),
			Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
		),
		Dispatch(
			{},
			"2",
			ScaleBox,
			{Label Row( {Show Major Grid( 1 ), Show Minor Grid( 1 )} )}
		),

		Dispatch(
			{},
			"Bivar Plot",
			FrameBox,
			{Row Legend(
				Label,
				Color( 1 ),
				Color Theme( "JMP Default" ),
				Marker( 0 ),
				Marker Theme( "" ),
				Continuous Scale( 0 ),
				Reverse Scale( 0 ),
				Excluded Rows( 0 )
			)}
		)
	)
);

Here are the two graphs:

image.png
image.png
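
The original question also mentions a grouping variable with 5 categories. Here is a minimal sketch of one way to extend the unstacked approach to that case. It computes the probabilities with column formulas instead of Save(Prob Scores), so it is not the method used in the script above. The column name "Group", the table name, and the simulated categories are assumptions for illustration, and the sketch assumes a JMP version in which Col Rank and Col Number accept By arguments.

```jsl
Names Default To Here( 1 );

// Hypothetical demo table; "Group" is an assumed name for the grouping column
dt = New Table( "Demo - Grouped", Add Rows( 1000 ),
	New Column( "Group", Character, Nominal,
		<<Set Each Value( Char( Random Integer( 1, 5 ) ) ) ),
	New Column( "Y1", Numeric, Continuous, <<Set Each Value( Random Integer( 0, 100 ) ) ),
	New Column( "Y2", Numeric, Continuous, <<Set Each Value( Random Integer( 0, 1000 ) ) )
);

// Empirical cumulative probability within each group: rank / count.
// Tie handling follows Col Rank's defaults, so the values may differ
// slightly from the columns created by Save(Prob Scores).
dt << New Column( "Prob Y1", Numeric, Continuous,
	Formula( Col Rank( :Y1, :Group ) / Col Number( :Y1, :Group ) ) );

// Reverse CDF for Y2 within each group: 1 - P(Y2 <= y)
dt << New Column( "Inv Prob Y2", Numeric, Continuous,
	Formula( 1 - Col Rank( :Y2, :Group ) / Col Number( :Y2, :Group ) ) );

// One panel per group, each with both curves on shared axes
gb = dt << Graph Builder(
	Show Control Panel( 0 ),
	Variables(
		X( :Y1 ),
		X( :Y2, Position( 1 ) ),
		Y( :Prob Y1 ),
		Y( :Inv Prob Y2, Position( 1 ) ),
		Wrap( :Group )
	),
	Elements(
		Smoother( X( 1 ), Y( 1 ), Legend( 1 ) ),
		Smoother( X( 2 ), Y( 2 ), Legend( 2 ) )
	)
);
```

Using Wrap gives the five plots in one window. If separate windows are preferred, a By( :Group ) argument on the platform launch is the usual alternative.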

Re: Plot conventional and reverse CDF on same plot by grouping variable

I like Georgia's solution a lot. Here is a simpler, different approach: normalize the two data sets before combining them in one plot. It might not be as satisfying.

 

I used two normal distributions with different parameters to illustrate this approach. You could save the fitted model for a distribution of sample data as a column formula instead of making up the data as I did. Here is the resulting plot:

 

Screen Shot 2019-04-11 at 6.08.21 AM.png

 

I attached the data table that I made to produce this plot.
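
The reply does not spell out how the data were normalized, so the following is only a sketch under one possible reading: rescale each variable to the 0 to 1 range, then plot the cumulative probability of Y1 and the reverse cumulative probability of Y2 against the rescaled values. The table name, the column names "Y1 Scaled" and "Y2 Scaled", and the normal parameters are assumptions, and the attached table may well have been built differently.

```jsl
Names Default To Here( 1 );

// Hypothetical demo data: two normal distributions with different parameters
dt = New Table( "Demo - Normalized", Add Rows( 1000 ),
	New Column( "Y1", Numeric, Continuous, <<Set Each Value( Random Normal( 50, 10 ) ) ),
	New Column( "Y2", Numeric, Continuous, <<Set Each Value( Random Normal( 500, 150 ) ) )
);

// Assumed normalization: rescale each variable to the range 0 to 1
dt << New Column( "Y1 Scaled", Numeric, Continuous,
	Formula( (:Y1 - Col Minimum( :Y1 )) / (Col Maximum( :Y1 ) - Col Minimum( :Y1 )) ) );
dt << New Column( "Y2 Scaled", Numeric, Continuous,
	Formula( (:Y2 - Col Minimum( :Y2 )) / (Col Maximum( :Y2 ) - Col Minimum( :Y2 )) ) );

// Empirical CDF of Y1 and reverse CDF of Y2 (rank / count)
dt << New Column( "Prob Y1", Numeric, Continuous,
	Formula( Col Rank( :Y1 ) / Col Number( :Y1 ) ) );
dt << New Column( "Inv Prob Y2", Numeric, Continuous,
	Formula( 1 - Col Rank( :Y2 ) / Col Number( :Y2 ) ) );

// Both curves now share a common 0 to 1 X axis
gb = dt << Graph Builder(
	Show Control Panel( 0 ),
	Variables(
		X( :Y1 Scaled ),
		X( :Y2 Scaled, Position( 1 ) ),
		Y( :Prob Y1 ),
		Y( :Inv Prob Y2, Position( 1 ) )
	),
	Elements(
		Smoother( X( 1 ), Y( 1 ), Legend( 1 ) ),
		Smoother( X( 2 ), Y( 2 ), Legend( 2 ) )
	)
);
```

Rescaling removes the original units from the X axis, which may be what "not as satisfying" refers to. The trade-off is that the two curves can be compared without one being squeezed into a small part of the axis.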

Learn it once, use it forever!