Solved: Highlight bins in a distribution based on a column value to highlight bin migration

bb101 · Oct 9, 2024 06:13 PM

I have a set of distributions, each showing a population of units. I use a select where with a ContainsItem call to select the rows in my dataset that match specific DUT reference ID numbers, which shows which bins contain the measurements for the units of interest. I would like to use the date of test for each selected row to highlight the bin containing that measurement. For example, suppose the same unit was tested on 6/20/2024, again on 6/25/2024, and once more on 6/28/2024. The bin holding the 6/20 result would be highlighted in one color, and the bins holding the 6/25 and 6/28 results in two other colors. The colors should be consistent across all distributions in the report (one color per date of test), with all unselected units keeping the default color. I would also like the highlight to span the respective bin from the top to the bottom of the plot.

Your support is greatly appreciated.

//dtmain << select where(:SN=="12346777" | :SN=="12346982");
dtmain << select where(ContainsItem(:SN, {"12346777", "12346982"}));

Distribution(
	Stack( 1 ),
	Continuous Distribution(
		Column( :Average ),
		Horizontal Layout( 1 ),
		Vertical( 0 ),
		Set Bin Width( 2 ),
		Process Capability( 0 )
	),
	Continuous Distribution(
		Column( :Min ),
		Horizontal Layout( 1 ),
		Vertical( 0 ),
		Set Bin Width( 2 ),
		Process Capability( 0 )
	),
	Continuous Distribution(
		Column( :Max ),
		Horizontal Layout( 1 ),
		Vertical( 0 ),
		Set Bin Width( 2 ),
		Process Capability( 0 )
	),
	By(
		:TID, :BLOCKNAME, :MEAS_TYPE
	)
);

Mock-up of what I would like to achieve:

hogi · Oct 11, 2024 2:40 AM

If you need the BY groups, it gets a bit more complicated.
Every subplot can have different bin widths. This is why a for loop is necessary:

- to get the individual bin widths.
- to get the selected rows for the specific subplot
- to add the ref lines to the specific subplot.

Hm, maybe rethinking the idea of the shaded regions - we also have the issue with the colors.
At the moment, we just apply a different color to every selected bin. The shaded regions look nice - but the width is quite arbitrary. Maybe just add a line per selected ID? How many "IDs" do you want to select?

When it smells like tape, it's time to ask: are we reinventing the wheel?
In general Graph Builder can do such things automatically without the need to talk to the report layer ...

Names Default To Here( 1 ); 

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );


col_name_list = dt << get column names( string );
newColName = "State";

// New column: State
If( !Contains( col_name_list, newColName ),
	Local( {dt, tempStrings},
		dt = Data Table( "Big Class" );
		dt << New Column( "State", Character, "Nominal" ) << Begin Data Update;
		tempStrings = {"CA", "WA", "OR"};
		For Each Row( dt, :State = tempStrings[Sequence( 1, 3, 1, 1 )] );
		dt << End Data Update;
	);
	//Make it a double Patty :), another person with the same name.
	dt << Add Rows( 1, At End );
	//Cheat and use a constant of 41 for the last row index
	dt[[41], {name, age, sex, height, weight, State}] = {{"PATTY", 16, "F", 57, 134, "WA"}};
);


// For illustration, select some names
dt << select where( Contains Item( :name, {"LESLIE", "PATTY"} ) );

dist = dt << Distribution(
	Stack( 1 ),
	Continuous Distribution(
		Column( :height ),
		Quantiles( 0 ),
		Summary Statistics( 0 ),
		Horizontal Layout( 1 ),
		Normal Quantile Plot( 0 ), // disable, if the user enabled it in the preferences :)
		Vertical( 0 ),
		Set Bin Width( 2 ),
		Process Capability( 0 )
	),
	By( :State )
);


// get all reports
// if there is a BY group, get the info - if there is none, put it into a list
 If( Is List( dist ),
	dists=dist;
	myBys = Transform each ({myexpr} , dist << Get Where Expr, Arg(myExpr , 2 ))
	
	, //no By
	
	dists = Eval List( {dist} );
	myBys= {};
	
);

//get the reports
distRPTs = Transform Each( {dist}, dists, Report( dist ) );


// Find the Midpoints of the bins the selected rows are in
dist << Save( "Level Midpoints" );

//loop through the By groups and add the reflines
For Each( {report, idx}, distRPTs,
//report = distRPTs[2]; idx=2;
	selected rows = dt << get selected rows;

	// there is a BY group -> restrict the selection
	if(N Items(myBys),
	Eval( Eval Expr( selected rows = dt << get rows where( Contains( Expr( selected rows ), Row() ) & :State == Expr( myBys[idx] ) ) ) );
	);


	theBins = (Associative Array( Column( dt, N Cols( dt ) )[selected rows] )) << get keys;
	
	// Get the Increment of the X axis
	theIncr = report[AxisBox( 1 )] << get inc;

// Add the reference lines for each selected bin

	For Each( {bin, index}, theBins,
	//bin = 57; index=1
		binList = {};
		Insert Into( binList, bin - theIncr / 2 );
		Insert Into( binList, bin + theIncr / 2 );
		Eval( Eval Expr(report[AxisBox( 1 )] << Add Ref Line( Expr( binList ), "Solid", Expr( Index + 2 ), "", 1, 0.25 ) ) );
	);

);
dt << delete columns( N Cols( dt ) );

hogi · Oct 10, 2024 02:46 AM

Could you pick a suitable file from the samples folder and create a mockup which illustrates the issue?

The benefits:

- saves time (#1): no user needs to create a mockup dataset
- saves time (#2): helps to understand the issue
- all replies will be based on a common mockup -> easy to compare

e.g. with temperature data, state selection and colors by quarter -- instead of the original data, DUT selection and colors by measurement dates -- is this what you are interested in?

dt = Open( "$SAMPLE_DATA/Functional Data/Weekly Weather Data.jmp" );

dt << New Column( "selected Quarter",
	"Ordinal",
	Formula( If( Selected(), Quarter( :DATE ) ) ),
	Set Property(
		"Value Colors",
		{1 = -13912408, 2 = -4042310, 3 = -4354269, 4 = -13400361}
	)
);

dt  << Graph Builder(
	Transform Column( "temperature", Formula( Round( :TAVG / 2 ) * 2 ) ),
	Size( 568, 328 ),
	Show Control Panel( 0 ),
	Variables( X( :STATION ), X( :temperature ), Overlay( :selected Quarter ) ),
	Elements( Position( 1, 1 ), Bar( X, Overlay( 0 ), Legend( 9 ), Bar Style( "Stacked" ) ) ),
	Elements( Position( 2, 1 ), Bar( X, Legend( 1 ), Bar Style( "Stacked" ) ) ),
	SendToReport(
			Dispatch( {}, "400", ScaleBox,
			{Legend Model( 1, Properties( 0, {Fill Color( 32 )}, Item ID( "Missing", 1 ) ) )}
		),
		Dispatch( {}, "Graph Builder", FrameBox( 2 ), {Fill Selection Mode( "Unselected Faded" )} )
	)
);

bb101 · Oct 10, 2024 04:31 PM

Noted. Thank you. I will adapt my question to use available data as you suggested.

jthi · Oct 10, 2024 03:34 AM

You might be able to do this with reference lines:

Names Default To Here(1); 

dt = Open("$SAMPLE_DATA/Big Class.jmp");

dist = dt << Distribution(
	Stack(1),
	Continuous Distribution(
		Column(:height),
		Horizontal Layout(1),
		Vertical(0),
		Set Bin Width(2),
		Process Capability(0)
	)
);
ab = Report(dist)[AxisBox(1)];

ab << Add Ref Line({60, 62}, "Solid", "Red", "", 1, 0.25);
ab << Add Ref Line({62, 64}, "Solid", "Green", "", 1, 0.25);
ab << Add Ref Line({64, 66}, "Solid", "Blue", "", 1, 0.25);

Or using a graphics script:

Names Default To Here(1); 

dt = Open("$SAMPLE_DATA/Big Class.jmp");

dist = dt << Distribution(
	Stack(1),
	Continuous Distribution(
		Column(:height),
		Horizontal Layout(1),
		Vertical(0),
		Set Bin Width(2),
		Process Capability(0)
	)
);

fb = Report(dist)[FrameBox(2)];
fb << Add Graphics Script(
	1,
	Transparency(0.2);
	Fill Color("Red");
	Rect(68, Y Range() + 10, 70, Y Origin(), 1);

	Transparency(0.2);
	Fill Color("Light Green");
	Rect(70, Y Range() + 10, 72, Y Origin(), 1);
);

To get same coloring across all distributions, you could set value colors property for your dates and utilize that when setting the colors, or create a list of colors and use those in order.
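
The "list of colors used in order" idea could be sketched as below. This is only an illustration: the date strings, the palette, and the bin range {60, 62} are made-up placeholders, and the Eval( Eval Expr() ) wrapper simply follows the pattern used elsewhere in this thread for passing computed values to Add Ref Line.

```jsl
Names Default To Here( 1 );

// Hypothetical test dates; in a real table these would come from the date-of-test column
testDates = {"06/20/2024", "06/25/2024", "06/28/2024"};

// Fixed palette; the order decides which date gets which color
palette = {"Red", "Green", "Blue"};

// Build the date -> color lookup once and reuse it for every distribution
dateColor = Associative Array();
For( i = 1, i <= N Items( testDates ), i++,
	// wrap around if there are more dates than colors
	dateColor[testDates[i]] = palette[Mod( i - 1, N Items( palette ) ) + 1]
);

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dist = dt << Distribution(
	Stack( 1 ),
	Continuous Distribution( Column( :height ), Horizontal Layout( 1 ), Vertical( 0 ), Set Bin Width( 2 ) )
);
ab = Report( dist )[AxisBox( 1 )];

// Placeholder bin range, shaded with the color assigned to one date
Eval(
	Eval Expr(
		ab << Add Ref Line( {60, 62}, "Solid", Expr( dateColor["06/20/2024"] ), "", 1, 0.25 )
	)
);
```

Because the lookup is built once, the same date maps to the same color no matter which distribution the reference line is added to.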

-Jarmo

txnelson · Oct 10, 2024 06:56 AM

Here is my take on how to do this. It finds the selected rows and the bins associated with them, and then adds the reference lines as required.

Names Default To Here( 1 ); 

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// For illustration, randomly select some rows
For( i = 1, i <= 3, i++,
	dt << select rows( Random Integer( 1, N Rows( dt ) ) )
);

dist = dt << Distribution(
	Stack( 1 ),
	Continuous Distribution(
		Column( :height ),
		Quantiles( 0 ),
		Summary Statistics( 0 ),
		Horizontal Layout( 1 ),
		Vertical( 0 ),
		Set Bin Width( 2 ),
		Process Capability( 0 )
	)
);
distRPT = Report( dist )[AxisBox( 1 )];
// Find the Midpoints of the bins the selected rows are in
dist << Save( "Level Midpoints" );
theBins = (Associative Array( Column( dt, N Cols( dt ) )[dt << get selected rows] )) << get keys;
dt << delete columns( N Cols( dt ) );

// Get the Increment of the X axis
theIncr = distRPT[AxisBox( 1 )] << get inc;

// Add the reference lines for each selected bin

For Each( {bin, index}, theBins,
	binList = {};
	Insert Into( binList, bin - theIncr / 2 );
	Insert Into( binList, bin + theIncr / 2 );
	Eval( Eval Expr( distRPT << Add Ref Line( Expr( binList ), "Solid", Expr( Index + 2 ), "", 1, 0.25 ) ) );
);

Extending this so that the graph updates interactively as the user selects and deselects rows would require adding a Row State Handler, plus some logic that first removes any existing reference lines and then adds the new ones.
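
A bare skeleton of such a handler might look like the following. It only reports how many rows are currently selected; the removal and re-creation of the reference lines is left as a placeholder comment, and the function name onRowStateChange is just an illustrative choice.

```jsl
Names Default To Here( 1 );

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Called by JMP whenever row states change;
// 'changed' receives the row numbers whose state changed
onRowStateChange = Function( {changed},
	sel = dt << Get Selected Rows;
	Show( N Rows( sel ) );
	// Placeholder: remove the existing reference lines here,
	// then add new ones for the rows in 'sel'
);

// Keep the returned handler object in a variable
rsHandler = dt << Make Row State Handler( onRowStateChange );
```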

Jim

hogi · Oct 11, 2024 2:59 AM

A question to be answered:
How should the colors be assigned, and how should the range be marked,
e.g. if the 6/20 data for the selected IDs is 48, 90 (77, 56, ...)?

Or the reverse: can we be sure that the data for all the selected Device IDs at one date are in the same bin?

bb101 · Oct 10, 2024 05:00 PM

Thank you for the support.

I was able to use the add Ref as you and @jthi suggested. I used the approach for coloration and confirmed it will work for a single distribution. When I add By conditions, I lose the plot addressing capability resulting in errors.

As suggested by @hogi I switched to available data for this question. To help facilitate the question I augmented Big Class in a modified version of Jim's script. Here are the steps I followed and the outcomes

1. Import Big Class

2. Conditionally add some data if the column "State" is not in the table

a. Add a column named State with sequenced data to place people in either "CA", "WA" or "OR".

b. Add a second "PATTY" who is 16, "F", 57, 134 and is in "WA".

3. Select some names. I chose LESLIE and PATTY

4. Run script to find and highlight bins of interest.

OK Case - remove By (:State) from the script below and change the preceding comma to a semicolon to run successfully. The code finds both instances of Patty and uses different colors for each of them. If they had the same height, for this example the reference lines would overlap only showing one of the two colors.

@hogi made a good point about the arbitration of which color is shown. Once I understand the element addressing, I plan to add a conditional for the case where more than one interesting data point occurs in the same bin.

NG Case - run distribution with a By condition based on :State and see an error in addressing the plot object. I tried a few addressing permutations without success. I am missing key information to access the AxisBox.

Did not work either:

distRPT = Report( dist )["height",AxisBox( 1 )];

distRPT = Report( dist )["State=WA","height",AxisBox( 1 )];

distRPT = Report( dist["State=WA"] )["height",AxisBox( 1 )];

Sample error code: object not subscriptable at row 41 in access or evaluation of 'Report(dist)[ /*###*/AxisBox(1)]' , Report( dist )[/*###*/AxisBox( 1 )]
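
With a By() clause, the Distribution launch appears to return a list of platform objects (one per group) rather than a single object, which would explain the "not subscriptable" error. A minimal sketch of indexing into that list, assuming Big Class and using :sex as a stand-in By column:

```jsl
Names Default To Here( 1 );

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

dist = dt << Distribution(
	Stack( 1 ),
	Continuous Distribution( Column( :height ), Horizontal Layout( 1 ), Vertical( 0 ), Set Bin Width( 2 ) ),
	By( :sex )
);

// With By(), dist is expected to be a list with one entry per group
Show( Is List( dist ), N Items( dist ) );

// Address each group's report separately
For( i = 1, i <= N Items( dist ), i++,
	ab = Report( dist[i] )[AxisBox( 1 )];
	Show( ab << Get Inc ); // axis increment of this group's plot
);
```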

Names Default To Here( 1 ); 

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );


col_name_list = dt << get column names(string);
newColName = "State";

// New column: State
if (!contains(col_name_list, newColName),
	Local( {dt, tempStrings},
		dt = Data Table( "Big Class" );
		dt << New Column( "State", Character, "Nominal" ) << Begin Data Update;
		tempStrings = {"CA", "WA", "OR"};
		For Each Row( dt, :State = tempStrings[Sequence( 1, 3, 1, 1 )] );
		dt << End Data Update;
	);
	//Make it a double Patty :), another person with the same name.
	dt << Add Rows( 1, At End );
	//Cheat and use a constant of 41 for the last row index
	dt[[41],{name,age,sex,height,weight,State}] = {{"PATTY", 16, "F", 57, 134, "WA"}};
);


// For illustration, select some names
dt << select where( ContainsItem(:name, {"LESLIE","PATTY"}));

dist = dt << Distribution(
	Stack( 1 ),
	Continuous Distribution(
		Column( :height ),
		Quantiles( 0 ),
		Summary Statistics( 0 ),
		Horizontal Layout( 1 ),
		Vertical( 0 ),
		Set Bin Width( 2 ),
		Process Capability( 0 )
	),
	By (:State)
);
distRPT = Report( dist )[AxisBox( 1 )];
// Find the Midpoints of the bins the selected rows are in
dist << Save( "Level Midpoints" );
theBins = (Associative Array( Column( dt, N Cols( dt ) )[dt << get selected rows] )) << get keys;
dt << delete columns( N Cols( dt ) );

// Get the Increment of the X axis
theIncr = distRPT[AxisBox( 1 )] << get inc;

// Add the reference lines for each selected bin

For Each( {bin, index}, theBins,
	binList = {};
	Insert Into( binList, bin - theIncr / 2 );
	Insert Into( binList, bin + theIncr / 2 );
	Eval( Eval Expr( distRPT << Add Ref Line( Expr( binList ), "Solid", Expr( Index + 2 ), "", 1, 0.25 ) ) );
);

Your comments and guidance are appreciated.

Thanks!

Bryan

hogi · Oct 11, 2024 01:18 AM

nice : )

concerning "find the right axis box", this wish in the Wish List:
Make it easier to get results from reports via JSL
has the status "in the queue" - I hope, besides "easier to get results from reports", it will also facilitate the way to find the right axis box ...

hogi · Oct 11, 2024 06:13 AM

via Graph Builder:

New Column( "selected",
	Character,
	Formula( If( Col Maximum( Selected( Row State() ), :name ), :name, "-" ) )
);

New Window( "try",
	H List Box(
		Graph Builder( Size( 519, 452 ), Show Control Panel( 0 ), Show Legend( 0 ), Variables( X( :name ) ), Elements( Bar( X ) ) );


		gb = Graph Builder(
			Size( 487, 500 ),
			Show Control Panel( 0 ),
			Variables( X( :height ), Wrap( :State ), Color( :selected ) ),
			Elements(
				Heatmap( X, Legend( 8 ) ),
				Histogram( X, Legend( 6 ), Response Scale( "Fill" ) ),
				Points( X, Legend( 9 ), Jitter( "None" ) )
			),
			SendToReport(
				Dispatch( {}, "400", ScaleBox,
					{Legend Model( 8, Properties( 0, {Transparency( 0 )}, Item ID( "-", 1 ) ) ), Legend Model(
						9,
						Properties( 0, {Transparency( 0 )}, Item ID( "-", 1 ) )
					)}
				)
			)
		);
	)
);

(gb << xpath( "//FrameBox" )) << {Marker Drawing Mode( "Outlined" ), Marker Size( 10 ), Transparency( 0.5 ),
Fill Selection Mode( "Unselected Faded" )};
