Lukas_Häfner
Level II

Exclude data points from a regression line, but make them still visible in the diagram

Hey everybody!

 

I used a training set of data points to fit a linear regression (signal intensity as a function of substance concentration in the samples) and applied the resulting formula to a test set of data points to back-predict the substance content of the associated samples. Now I would like to show the training data points, the regression line, and the test data points in a single diagram. The purpose is to show how all the data points scatter around the ideal function of signal intensity versus substance concentration. I know how to plot the training data points and how to create the regression line within the same plot. But how do I add the test data points without them influencing the regression line? The test data points should be visible in the diagram, but the regression line should be based solely on the training data points. I would also like to highlight the test data points in a different color from the other data points.

 

Thanks in advance!

6 REPLIES 6
statman
Super User

Re: Exclude data points from a regression line, but make them still visible in the diagram

There may be multiple ways to accomplish this. One is to select the rows of the test set, right-click, and choose Exclude. This excludes the data from the analysis but still shows the points. The same right-click menu also offers multiple options for colors and markers.

 

You can also create a column "Test" that identifies the test rows versus the other rows (say, 1 and 0), then use Rows > Color or Mark by Column and choose Test.
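
For anyone who prefers scripting, here is a rough JSL sketch of the two steps above. It is only an illustration under stated assumptions: the Big Class sample table stands in for the real data, and the male rows play the role of the test set. Neither comes from the original question, so substitute your own table and your own Test column.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Assumption for illustration: the "M" rows act as the test set.
dt << Select Where( :sex == "M" );
dt << Exclude( 1 ); // excluded from the fit, but not hidden, so still plotted
dt << Clear Select;

// Color the rows by the grouping column so the test rows stand out.
dt << Color or Mark by Column( :sex );

// The fitted line is computed from the non-excluded (training) rows only.
dt << Bivariate( Y( :weight ), X( :height ), Fit Line );
```

Because the rows are excluded rather than hidden, they should still appear in the Fit Y by X scatterplot while being left out of the fit.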

"All models are wrong, some are useful" G.E.P. Box
Lukas_Häfner
Level II

Re: Exclude data points from a regression line, but make them still visible in the diagram

Thank you very much! Your advice helped me a lot!

Byron_JMP
Staff

Re: Exclude data points from a regression line, but make them still visible in the diagram

It's entirely possible that I'm not understanding your question, but here are a couple of things to try.

 

Byron_JMP_0-1710796956311.png

In the Fit Y by X platform, you can group the data by the column that indicates Train and Test (the validation column).

Then fit a line, which gives separate lines for Test and Train.

To highlight the Train group, try turning on the shaded and dashed lines for the confidence interval of the fit.

 

I made an example figure with the Big Class.jmp data set.

Byron_JMP_1-1710797190720.png
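
A scripted version of the grouped fit might look roughly like the sketch below. It assumes Big Class as the example table and uses :sex as a stand-in for the Train/Test indicator column; both are placeholders for illustration, not something from the original question.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Launch Fit Y by X (Bivariate) with all rows plotted.
biv = dt << Bivariate( Y( :weight ), X( :height ) );

// Group by the indicator column (here :sex stands in for Train/Test),
// so that subsequent fits are done separately for each level.
biv << Group By( :sex );
biv << Fit Line;
```

With the grouping in place, each level gets its own line, and the line for the group you do not want to emphasize can then be restyled or removed from the report.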

 

Graph Builder can do something similar.

Double-click the legend to get these menus:

Byron_JMP_2-1710797426737.png

 

I made the "M" line completely transparent (transparency set to 0).

 

Byron_JMP_3-1710797496456.png

 

 

JMP Systems Engineer, Health and Life Sciences (Pharma)
Lukas_Häfner
Level II

Re: Exclude data points from a regression line, but make them still visible in the diagram

Thank you very much for your detailed description! It greatly improved my diagram!

matth1
Level III

Re: Exclude data points from a regression line, but make them still visible in the diagram

Using the Big Class data set as an example: plot height vs. weight and show all points, but fit the line for females only.

Interactively:

- Select the Male rows in the data table and exclude them (but do NOT hide them).

- In Graph Builder: plot height against weight, set sex as the Overlay variable, change the smoother to a line of fit, then hold the SHIFT key and select "Show Excluded Rows" from the red triangle menu. The SHIFT key is important because it reveals this normally hidden option.

matth1_0-1710846715975.png

 

Via scripting, this time using a local data filter rather than excluding rows in the data table:

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt << Graph Builder(
	Size( 300, 200 ),
	Show Control Panel( 0 ),
	// Keep rows that are left out of the fit visible as points
	Show Excluded Rows( 1 ),
	Variables( X( :height ), Y( :weight ), Overlay( :sex ) ),
	Elements(
		Points( X, Y, Legend( 3 ) ),
		Line Of Fit( X, Y, Legend( 5 ), R²( 1 ) )
	),
	// Restrict the fit to females; Show( 0 ) keeps the filter from hiding the other rows
	Local Data Filter(
		Show Controls( 0 ),
		Mode( Show( 0 ) ),
		Add Filter( columns( :sex ), Where( :sex == "F" ) )
	)
);

matth1_1-1710846918004.png

 

Hope this helps.
