Y vs X, or Parallel Plot, or Something Else?

mjoner

Community Trekker

Joined:

Jun 23, 2011

I have a set of data that I am having trouble visualizing the way I want to.

In the simplest rendition of this data, I have four columns of data:

  • "Sample". I have about 1800 samples in the present data.
  • "Subsample". These are named as "2", "3", and "4". Each of my 1800 samples has each of these three subsamples.
  • "Position". These are named "A", "B", and "C", but the names are really labels for specific numeric locations along the subsample. Think of the subsample as a stick that is an inch long. Location "A" might be 1/4-inch into the subsample, "B" might be 1/2-inch into the subsample, and "C" might be 3/4-inch into the subsample. I have some interest in interpolating between "A", "B", and "C", assuming for now a quadratic relationship and knowing this is at best an approximation.
  • "Data". The actual measurement taken at the position within the subsample within the sample. 
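
The quadratic interpolation mentioned under "Position" can be written out explicitly. As a sketch only: the symbols x_A, x_B, x_C (numeric positions) and y_A, y_B, y_C (the three measurements on one subsample) are illustrative names, not columns in the data table. The unique quadratic through three points, in Lagrange form, is:

```latex
\hat{y}(x) =
    y_A \,\frac{(x - x_B)(x - x_C)}{(x_A - x_B)(x_A - x_C)}
  + y_B \,\frac{(x - x_A)(x - x_C)}{(x_B - x_A)(x_B - x_C)}
  + y_C \,\frac{(x - x_A)(x - x_B)}{(x_C - x_A)(x_C - x_B)}

% Worked example, assuming the hypothetical positions
% x_A = 1/4, x_B = 1/2, x_C = 3/4 and interpolating at x = 3/8:
\hat{y}\!\left(\tfrac{3}{8}\right)
  = \tfrac{3}{8}\, y_A + \tfrac{3}{4}\, y_B - \tfrac{1}{8}\, y_C
```

The three weights sum to 1, as they must. With only three positions per subsample, this curve passes exactly through all three measurements, so a per-subsample quadratic leaves no residual to judge the fit.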

I made a simple Graph Builder by plotting Data vs. Position, with Group Y against the Subsample and Overlaying the Sample. The plot (taken on a MUCH smaller subset of the Sample variable) looks something like this:

initial plot.PNG

 

Parallel plots also come to mind, and with a simple call to the Split platform and a little bit of effort in the Graph Builder, I can come up with this:

parallel.png

For the curious, the JSL to get this plot looks something like this:

Graph Builder(
	Variables(
		X( :C 2 ), X( :B 2, Position( 1 ) ), X( :A 2, Position( 1 ) ),
		X( :C 3 ), X( :B 3, Position( 2 ) ), X( :A 3, Position( 2 ) ),
		X( :C 4 ), X( :B 4, Position( 3 ) ), X( :A 4, Position( 3 ) )
	),
	Elements( Position( 1, 1 ), Parallel( X( 1 ), X( 2 ), X( 3 ), Legend( 8 ), Curve Lines( 1 ) ) ),
	Elements( Position( 2, 1 ), Parallel( X( 1 ), X( 2 ), X( 3 ), Legend( 7 ), Curve Lines( 1 ) ) ),
	Elements( Position( 3, 1 ), Parallel( X( 1 ), X( 2 ), X( 3 ), Legend( 6 ), Curve Lines( 1 ) ) )
);

But really neither plot has all the features I want.

  • The parallel plot lays the graphs out side-by-side (as though I were using Group X). As shown earlier, I really want the subsamples laid out vertically, with Group Y-like behavior.
  • Since my Position (A, B, C) represents an underlying numeric dimension, I'd like to assume a quadratic model (for now) and interpolate at various positions between A and C. The Y vs. X approach, which uses a Smoother element in Graph Builder, does this well. The Parallel Plot insists on rendering the curve more like a logistic function between adjacent parallel axes.
  • I want a constant color for all of the samples in the graph. While it's relatively easy to double-click the legend and set the color in the Y vs. X approach, it is a bit tedious and doesn't generalize well if I want to do it in JSL. (Essentially I have to run a loop and insert the color property for each sample into the Graph Builder expression.)
  • I would like to add a mean prediction line (solid, thicker black line). Neither method seems to have this ability. I don't even know where I'd start with Parallel. For Y vs. X I thought about simply dragging the Data column into the Y axis again, but the trouble here is that both versions of Data want to respect the Overlay property. I can't figure out how to have the Overlay work on one of the views and not on the other.
  • Ideally I would like to be able to select a sample and have all three subsamples highlight in the plot, something like this:
    parallel with highlight.png
    This works in the parallel view, but in the Y vs. X view each sample is represented with 9 rows of data (3 subsamples * 3 positions). Clicking one of the lines in that plot will cause JMP to highlight across the 3 positions, but JMP has no idea that I want it to highlight the other two subsamples because Group Y is seen as my Sample identifier. This is probably the hardest criterion for me to put into writing.
  • Frankly I think there may be additional questions around the variability of the measurements and identifying outlying samples, or even outlying subsamples within a sample. I'm not sure either of these approaches gets me there.

I know JMP 14 will have a Functional Data Explorer but I'm not sure it will work with the nested structure I have here, and also the fact that there are only three measurements per subsample...

What other options should I be looking at?

 

2 REPLIES
Byron_JMP

Staff

Joined:

Apr 26, 2012

So this is a pretty different approach.

The first thing is to change your location from a letter to the actual value it represents.

Now try using Fit Model. Data is your Y and your X's are these effects.

Before you hit Run, if the B position isn't midway between A and C, then turn off "Center Polynomials" from the red triangle menu.

Effects:

  • location*subsample
  • location*location
  • location
  • subsample

This set of terms will give you the quadratic for location (make sure location is numeric continuous, not a letter) then the main effects for location and subsample plus the interaction will let you have an independent slope and intercept for each level of subsample. 
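
Written as an equation, that effect list corresponds to the model below. This is a sketch in illustrative notation, not JMP's own parameterization: x is the numeric location, s indexes the subsample level, and the Greek symbols are assumed names for the coefficients. It is written without polynomial centering.

```latex
\text{Data}_{s}(x) =
    \underbrace{\beta_0 + \alpha_s}_{\text{intercept for subsample } s}
  + \underbrace{(\beta_1 + \gamma_s)}_{\text{slope for subsample } s}\, x
  + \underbrace{\beta_2}_{\text{shared curvature}}\, x^{2}
  + \varepsilon
```

Here \alpha_s comes from the subsample main effect, \gamma_s from location*subsample, and \beta_2 from location*location. Because there is no subsample*location*location term, every subsample shares the same curvature.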

 

 

This is the Profiler figure. (I mocked up 1800 samples from three locations; each set of three is a subsample.)

Prediction Profiler

Screen Shot 2017-11-02 at 2.59.45 PM.png

So, you can interpolate from the graph, but if you save the prediction formula to the data table, then you can interpolate at other positions. Check out the formula. It's kind of monstrous because it contains coefficients for each of the subsamples.

 

//Mock Data Table
New Table( "mock data",
	Add Rows( 1800 ),
	New Column( "sample", Numeric, "Nominal", Format( "Best", 12 ), Formula( Sequence( 1, 1801 ) ) ),
	New Column( "subsample", Numeric, "Nominal", Format( "Best", 12 ), Formula( Sequence( 1, 601, 1, 3 ) ) ),
	New Column( "location", Numeric, "Continuous", Format( "Best", 12 ), Formula( Sequence( 1, 3 ) ) ),
	New Column( "Data", Numeric, "Continuous", Format( "Best", 12 ), Formula( Abs( Random Normal() * :location ^ 2 ) ) )
);

mjoner

We are looking at going this direction. It is still somewhat challenging to really understand what is going on from sample-to-sample this way (i.e., where does sample 1 really fit in relative to sample 2 relative to sample 3?).

There is some misunderstanding about how my data are organized, so I have also modified the mock data table.

See mock data table and my current Profiler analysis here. It's a start, anyway.

New Table( "mock data",
	Add Rows( 1800 * 9 ),
	New Column( "sample", Numeric, "Nominal", Format( "Best", 12 ), Formula( Sequence( 1, 1801, 1, 9 ) ) ),
	New Column( "subsample", Numeric, "Nominal", Format( "Best", 12 ), Formula( Sequence( 1, 3, 1, 3 ) ) ),
	New Column( "location", Numeric, "Continuous", Format( "Best", 12 ), Formula( Sequence( 1, 3 ) ) ),
	New Column( "Data", Numeric, "Continuous", Format( "Best", 12 ), Formula( Abs( Random Normal() * :location ^ 2 ) ) )
);

Fit Model(
	Y( :Data ),
	Effects( :subsample, :location, :subsample * :location, :location * :location, :subsample * :location * :location ),
	Personality( "Standard Least Squares" ),
	Emphasis( "Minimal Report" ),
	Run(
		Profiler( 1 ),
		:Data << {Effect Summary( 0 ), Summary of Fit( 0 ), Analysis of Variance( 0 ), Parameter Estimates( 0 ),
		Effect Tests( 0 ), Effect Details( 0 ), Lack of Fit( 0 ), Scaled Estimates( 0 ), Plot Actual by Predicted( 0 ),
		Plot Residual by Predicted( 0 ), Plot Studentized Residuals( 0 ), Plot Effect Leverage( 0 ),
		Plot Residual by Normal Quantiles( 0 ), Box Cox Y Transformation( 0 )},
		Automatic Recalc( 1 )
	),
	Local Data Filter(
		Add Filter( columns( :sample ), Where( :sample == 1 ), Display( :sample, Size( 224, 315 ), List Display ) )
	)
);
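
For reference, the effect list in this Fit Model call can be written as an equation. The notation is illustrative only (x is the numeric location, s indexes the subsample, and the coefficient names are assumptions), and it ignores any polynomial centering:

```latex
\text{Data}_{s}(x) =
    (\beta_0 + \alpha_s)
  + (\beta_1 + \gamma_s)\, x
  + (\beta_2 + \delta_s)\, x^{2}
  + \varepsilon
```

The extra subsample*location*location effect supplies \delta_s, so each subsample gets its own intercept, slope, and curvature, which amounts to a separate quadratic per subsample with a pooled error term. Sample itself is not an effect in the model; it enters only through the Local Data Filter. If the filter restricts the fit to a single sample, as the Where( :sample == 1 ) clause suggests, that leaves 9 rows for 9 coefficients, so the fitted curves would pass exactly through the data with no residual degrees of freedom.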