Cklud
Level I

Parameter Identification Technique Bi-variate Signatures

Hi,

 

It would be great to get inputs on an approach to automate identification of parameters that can distinguish a bivariate response as shown in the example plot below. The data set has ~20k numerical continuous columns and one character response column. The response column has entries of the letter A or B.

 

The objective is to identify parameters that capture as many category B rows as possible while capturing none (or very few) of category A.

 

Visually, I identified two specific parameters that together isolate the majority of category B without capturing any of category A; a straight line in the plot of one against the other separates the two groups. Partition doesn't work well because, for each of the two parameters individually, category B lies within the distribution of category A.

 

What would be the best approach in JMP/JSL to automate this identification?

 

Thank you for your time and help. - C

 

Cklud_1-1613782823149.png

 

 

7 REPLIES
Georg
Level VII

Re: Parameter Identification Technique Bi-variate Signatures

Please see the enclosed script. It generates a table with continuous x columns and a bivariate response, along with some saved scripts for visualization and modeling.

You can try Response Screening and Fit Model to find the relevant variables.

With as many as 20k columns, PCA on the continuous x columns would also be an option (as a variable reduction technique). In this example it yields nothing useful, because all x columns are independent (random normal).

 

 

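Names Default To Here( 1 );

// Example table: a character response plus 10 independent normal predictors
cdt = New Table( "data", New Column( "response", Character ) );
cdt << add rows( 100 );
For( i = 1, i <= 10, i++,
	// col<i> is drawn from Normal( mean = i, sd = 1 )
	cdt << New Column( "col" || Char( i ), Continuous, set each value( Random Normal( i, 1 ) ) )
);

// "B" only where both col5 and col6 are low; every other row is "A"
cdt:response << set formula( If( :col5 < 4 & :col6 < 5, "B", "A" ) );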

cdt << 
Add Properties to Table(
	{New Script(
		"col6 vs. col5",
		Graph Builder(
			Variables( X( :col5 ), Y( :col6 ), Overlay( :response ) ),
			Elements( Points( X, Y, Legend( 13 ) ), Smoother( X, Y, Legend( 14 ) ) )
		)
	), New Script(
		"Scatterplot Matrix",
		Scatterplot Matrix(
			Y(
				:response,
				:col1,
				:col2,
				:col3,
				:col4,
				:col5,
				:col6,
				:col7,
				:col8,
				:col9,
				:col10
			),
			Matrix Format( "Lower Triangular" )
		)
	), New Script(
		"Response Screening of response",
		Response Screening(
			Y( :response ),
			X(
				:col1,
				:col2,
				:col3,
				:col4,
				:col5,
				:col6,
				:col7,
				:col8,
				:col9,
				:col10
			)
		)
	), New Script(
		"Fit Nominal Logistic",
		Fit Model(
			Y( :response ),
			Effects(
				:col1,
				:col2,
				:col3,
				:col4,
				:col5,
				:col6,
				:col7,
				:col8,
				:col9,
				:col10
			),
			Personality( "Nominal Logistic" ),
			Run( Likelihood Ratio Tests( 1 ), Wald Tests( 0 ) )
		)
	), New Script(
		"Principal Components",
		Principal Components(
			Y(
				:col1,
				:col2,
				:col3,
				:col4,
				:col5,
				:col6,
				:col7,
				:col8,
				:col9,
				:col10
			),
			Estimation Method( "Default" ),
			"on Correlations"
		)
	)}
);
Georg
Georg
Level VII

Re: Parameter Identification Technique Bi-variate Signatures

Also predictor screening and discriminant analysis may give exactly what you want.

With this much data, it could be a good idea to start with a small subset to try out the different methods.
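
As a rough sketch of what this could look like on the example table from my earlier script (not a tested recipe for the 20k-column case): the snippet assumes the example table is the current data table, the choice of the first 10 columns as a test subset is arbitrary, and `testCols`, `ps`, and `da` are just illustrative names. Option names and defaults may differ between JMP versions.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// All continuous columns; the character response column is not included
cols = dt << Get Column Names( Continuous );

// Start small: keep only the first 10 predictors while testing (arbitrary choice)
testCols = cols[1 :: Min( 10, N Items( cols ) )];

// Rank the predictors by their contribution to predicting the response
ps = dt << Predictor Screening( Y( :response ), X( Eval( testCols ) ) );

// Discriminant analysis on two candidate columns (col5 and col6 in the example table)
da = dt << Discriminant( X( :col5, :col6 ), Categories( :response ) );
```

Once a shortlist of columns comes out of the screening step, the discriminant step can be rerun on those columns instead of `:col5` and `:col6`.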

Georg
Jeff_Perkinson
Community Manager

Re: Parameter Identification Technique Bi-variate Signatures

Do you know the categories ahead of time? Or have you identified them based on the scatterplot?

 

Are you looking for a way to discriminate the two clusters in that two dimensional space?

-Jeff

Re: Parameter Identification Technique Bi-variate Signatures

Minor correction: you used the term "bi-variate" incorrectly. That term refers to two variables. It does not refer to one variable with two levels. You might confuse other readers.

gzmorgan0
Super User (Alumni)

Re: Parameter Identification Technique Bi-variate Signatures

@Cklud ,

You should include which version of JMP you are using.

 

In line with @ron_horne's links, I suggest using JMP Partition. For visualization, I like parallel plots, though not everyone is a fan of them.

 

Using @Georg's data table, the JMP Partition result is shown below. Note that since one group has only 2 rows, I set the minimum split size to 2.

Partition makes each split on a single input variable.

image.png 

 

Here is the script for Partition; the same result can be achieved through the user interface.

 

Partition(
	Y( :response ),
	X( :col1, :col2, :col3, :col4, :col5, :col6, :col7, :col8, :col9, :col10 ),
	Minimum Size Split( 2 ),
	Informative Missing( 1 ),
	SendToReport(
		Dispatch( {}, "Partition Report", FrameBox, {Frame Size( 480, 56 )} )
	)
)

Here is the script for the standardized distributions using Graph Builder:

Graph Builder(
	Size( 843, 703 ),
	Variables(
		X( :col1, Combine( "Parallel Independent" ) ),
		X( :col2, Position( 1 ), Combine( "Parallel Independent" ) ),
		X( :col3, Position( 1 ), Combine( "Parallel Independent" ) ),
		X( :col4, Position( 1 ), Combine( "Parallel Independent" ) ),
		X( :col5, Position( 1 ), Combine( "Parallel Independent" ) ),
		X( :col6, Position( 1 ), Combine( "Parallel Independent" ) ),
		X( :col7, Position( 1 ), Combine( "Parallel Independent" ) ),
		X( :col8, Position( 1 ), Combine( "Parallel Independent" ) ),
		X( :col9, Position( 1 ), Combine( "Parallel Independent" ) ),
		X( :col10, Position( 1 ), Combine( "Parallel Independent" ) ),
		Color( :response ),
		Size( :response )
	),
	Elements(
		Points(
			X( 1 ),
			X( 2 ),
			X( 3 ),
			X( 4 ),
			X( 5 ),
			X( 6 ),
			X( 7 ),
			X( 8 ),
			X( 9 ),
			X( 10 ),
			Legend( 17 )
		)
	)
)

image.png

 

Or make it a parallel plot.

image.png

 

ih
Super User (Alumni)

Re: Parameter Identification Technique Bi-variate Signatures

@Mark_Bailey, in fairness, the Fit Y by X platform calls it Bivariate when you plot one continuous variable against another. I had the same initial confusion, though.