Byron Wingerd's Blog

Byron_JMP · Feb 23, 2018 05:15 PM

[Figure: example multiple y-axis scaled response overlay graph]

Multiple Y-Axis with a Scaled Response Overlay Graph

STOP! If you just want the script, it is attached at the bottom of the post.

EDIT: Stop! This is now implemented in Graph Builder for v19. :)

This is a specialized graph that scales all of the y-axis variables, overlays them, and then provides a y-axis scale for each one in the units and range of that variable. This is useful when you want to overlay 3 or 4 variables that are all in different ranges. In the figure above, the flow rates are in the 0-40 range, while power is in the ranges 200-1000 and 100-500. This type of graph is often used in scientific and engineering applications where multiple sensors record data at the same time, but the range of each sensor is quite different. For example, in a fermentation process, glucose is 1-5 g/dL, pH is 6.8-7.8, lactic acid is 10-50 mmol, and cell count or turbidity is an order of magnitude greater. A simple way to graph the relationship between all the variables over time is very helpful for understanding the kinetics of the process (or detecting unusual patterns).

This specific graph has been my nemesis for quite a while. It usually takes some creative copying and pasting, either within JMP or in PowerPoint, to get anything that looks reasonably good. So why is this graph so hard? Somehow you have to get the data scaled and combined into one graph, and then you have to combine all the unscaled y-axis components with the scaled data graph. All of the methods I've seen and tried involve far, far too many copy-and-paste steps, and the methods that involve ungrouping and reformatting graph elements in PowerPoint can give poor results because of the tedium of getting all the elements aligned on my slides. Messy, manual, tedious, and repetitive processes scream for JSL automation.

In thinking about a strategy to script a solution, there are a few steps to consider:

1. Make a y-axis box for each variable/column.

2. Scale the values of each column to the same range.

3. Reshape the data so that it will work in an overlay plot.

4. It would also be nice to have a launch dialog for selecting columns for the Y roles and the X role (or just using row numbers for X).

5. And last but not least, the graph needs to include the axis box with each variable in the original scale.

I sharpened my JSL pencil and took a whack at automating this figure. I'll try to explain what I did here, and I would appreciate further discussion on how to make it more functional. The current state is a script that generates one type of fixed figure and doesn't bother cleaning up the wake of intermediate tables it leaves behind. (I'll attach the complete script to the post.)

First Step: The launch dialog

cd = Column Dialog(
	ylist = ColList( "Y", Min Col( 2 ), Max Col( 8 ), Data Type( "Numeric" ), Modeling Type( {"Continuous"} ) ),
	xlist = ColList( "X", Max Col( 1 ), Modeling Type( {"Continuous", "Multiple Response"} ) )
);
Eval( Eval Expr( cd[1] ) ); //Eval Expr returns what is in cd[1] (ylist = {...}), and Eval makes it execute
Eval( Eval Expr( cd[2] ) ); //to see how this works, separately run just cd[1], then Eval Expr( cd[1] ); this is how I get the ylist and xlist variables

Anytime I start to write a launch dialog, I shamelessly copy the one from the Scripting Index, deleting what I don't need or adding more as necessary. The goal is to capture the Y's and the X. There are some rules here: the Y columns must be numeric and continuous, and there must be at least 2 Y's (the reason for this becomes clearer later, but also, if you only have one Y, why are you using this?). There is also a limit of 8 Y responses; feel free to edit this, but more than 8 seemed too messy.

The last two lines, with Eval( Eval Expr() ), are how I generate the ylist and xlist variables from what is returned in "cd" by the column dialog.
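The dialog's rules can also be written out as a plain check. This is a Python sketch of the same constraints, not JSL and not what Column Dialog does internally; validate_roles and the (data type, modeling type) pairs are hypothetical names chosen for illustration.

```python
# Python sketch of the launch-dialog rules (not JSL; names are hypothetical).
def validate_roles(y_cols, x_cols, column_types, min_y=2, max_y=8):
    """Check the Y/X selections and return (y_cols, x_col_or_None).

    column_types maps a column name to a (data type, modeling type) pair,
    e.g. {"Flow": ("Numeric", "Continuous")}.
    """
    # Min Col( 2 ), Max Col( 8 ) on the Y list
    if not min_y <= len(y_cols) <= max_y:
        raise ValueError(f"need {min_y} to {max_y} Y columns, got {len(y_cols)}")
    # Data Type( "Numeric" ), Modeling Type( {"Continuous"} ) on every Y
    for name in y_cols:
        if column_types[name] != ("Numeric", "Continuous"):
            raise ValueError(f"Y column {name!r} must be numeric and continuous")
    # Max Col( 1 ) on the X list; zero X columns means "use row numbers"
    if len(x_cols) > 1:
        raise ValueError("at most one X column is allowed")
    return list(y_cols), (x_cols[0] if x_cols else None)


types = {
    "Flow": ("Numeric", "Continuous"),
    "Power": ("Numeric", "Continuous"),
    "Batch": ("Character", "Nominal"),
}
print(validate_roles(["Flow", "Power"], [], types))  # (['Flow', 'Power'], None)
```

The X role is not type-checked here, matching the dialog, which only restricts X by modeling type.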

Data table 1: The standardized data table

dt1 = New Table( "multiple y-axis table" );  	//this is the table where the standardized values go
If( N Items( xlist ) == 1,			//note that if there is a column selected for xlist, then it doesn't get scaled
	Insert Into( ylist, xlist, 1 ),	        //call me crazy, but the first value in ylist is the "X", if it exists
	dt1 << New Column( "Row", formula( Row() ) )
);

DT1 is the table that will contain the data after it's standardized. If no X is selected in the dialog, the first column is "Row", which holds the row numbers used as the x-axis in the graph. If there is an X, it gets added into the first position of ylist. Yes, that's right: the variable ylist holds both the X and the Y's.
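To make the "X rides along in ylist" trick concrete, here is a small Python sketch of how the combined list, the loop's starting index, and dt1's columns line up in each case. The function name dt1_layout is hypothetical, and it uses 0-based indices where the JSL loop uses 1-based ones.

```python
# Python sketch (hypothetical name) of how ylist and dt1 are laid out.
def dt1_layout(y_cols, x_col=None):
    """Return (combined list, first index to scale, dt1 column names).

    Indices are 0-based here; the JSL loop starts at i = 2 or i = 1.
    """
    if x_col is not None:
        combined = [x_col] + list(y_cols)    # X goes to the front of ylist
        return combined, 1, combined         # X is copied unscaled; Y's start at index 1
    combined = list(y_cols)
    return combined, 0, ["Row"] + combined   # dt1 gets a Row column instead


print(dt1_layout(["Flow", "Power"], "Time"))
# (['Time', 'Flow', 'Power'], 1, ['Time', 'Flow', 'Power'])
print(dt1_layout(["Flow", "Power"]))
# (['Flow', 'Power'], 0, ['Row', 'Flow', 'Power'])
```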

Filling dt1 and making the "axis boxes"

For(
	If( N Items( xlist ) == 1,	 //yep, that's an "if" argument inside a "for" loop
		i = 2;			 //if an "x" is specified, then start with the second thing in the list
		ycol = As Column( dt, ylist[1] ) << get values;	//dt is the original data table
		dt1 << New Column( Munger( ylist[1], 1, ":", "" ), set values( ycol ) );//this is soooo ugly
		,i = 1), i <= N Items( ylist ), i++,    //end of setup for the For loop
	ycol = As Column( dt, ylist[i] ) << get values;
	yycol = ((ycol - Min( ycol )) / (Max( ycol ) - Min( ycol )));	//scale to 0-1; a constant column (max == min) divides by zero here
	dt1 << New Column( Munger( ylist[i], 1, ":", "" ), set values( yycol ) );	//this is soooo ugly, very sorry
	amin = Min( ycol );		//this second part of the loop gets the values to make the y-axis boxes
	amax = Max( ycol );		//I need the min and max values from each of the unscaled columns, along with their names
	ymin = amin - ((amax - amin) * .1);	//the y-axis box range extends 10% below the min and 10% above the max
	ymax = amax + ((amax - amin) * .1);	//it looks better and lines up more closely with the scaled data in gb
	obj1 = Graph Box( Framesize( 0, 230 ), yName( Char( ylist[i] ) ), Y Scale( ymin, ymax ) );
	//it's not a true axis box, just a squished graph box; the axis box comes free with it
	Insert Into( objhlb1, Name Expr( obj1 ) );
	//there is a lot of inserting of Name Expr() here; it's part of building expressions from the inside out
);

This next part loops through the list of columns selected for the Y's. If an X was specified, it starts with the second item in the list.

The first part uses some logic to set up the loop and, when an X was specified, copies the unscaled X column into dt1. What goes into the table depends on whether or not an x-axis variable was specified.

The middle part of the script is really important, even though it's short. It gets the values from each column in the original table (dt) and puts the scaled values into dt1. I'm really not proud of this line: dt1 << New Column( Munger( ylist[i], 1, ":", "" ), set values( yycol ) ); But here's the thing: New Column wants "colname" or colname, and :colname breaks this step, so I just Munger-ed out the colon. (Send a better way to do this in the comments.)

The last part looks at each set of column values and uses them to scale a graph box. A graph box makes a nice axis box, and if the width of the graph box is set to 0, only the y-axis box shows up. This step is important for formatting the graph. The last line inserts the expression for the graph box into a display box that gets evaluated at the end. By using Name Expr to refer to obj1 (the graph box), I get the script that makes it, with all of its variables evaluated, but the script itself doesn't get evaluated. Each graph box is inserted into a horizontal list box (objhlb1). Later this box is inserted into another box.
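The arithmetic in that loop is min-max scaling plus a 10% pad on each axis range. Here is a minimal Python sketch of the same calculation for one column; scale_and_axis is a hypothetical name, and this is an illustration of the math, not the JSL script itself.

```python
# Python sketch of the per-column arithmetic (hypothetical name, not the JSL script).
def scale_and_axis(values, pad=0.1):
    """Min-max scale values to 0..1 and return a padded axis range in original units."""
    lo, hi = min(values), max(values)
    span = hi - lo                             # assumes the column is not constant (span > 0)
    scaled = [(v - lo) / span for v in values]
    axis = (lo - pad * span, hi + pad * span)  # 10% below the min, 10% above the max
    return scaled, axis


flow = [0, 10, 20, 40]
scaled, axis = scale_and_axis(flow)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
print(axis)    # (-4.0, 44.0)
```

So a flow column spanning 0 to 40 is plotted from 0 to 1, while its axis box is labeled from -4 to 44.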

Building a table to make a graph from:

For(
	If( N Items( xlist ) == 1,
		i = 2,
		i = 1
	), i <= N Items( ylist ), i++,
	Insert Into( dt2stackcols, ylist[i] )
);
Insert Into( dt2stack, Name Expr( dt2stackcols ), 1 );
Eval(
	Substitute( Expr( dt2 = dt1 << ping ),
		Expr( ping ), Eval Expr( dt2stack )
	)
);	//This is tricky!
dt2 << set name( "multiple y-axis stack" );

Above I mentioned a constraint: at least two Y variables have to be selected. It turns out that you can't stack only one variable (because it's already stacked), and the script crashes if there is only one. The interesting thing here is the logic for handling a specified X or not. If there is an X, I already didn't want to scale it, and now I don't want to include it in the stack, because I want it as the X. If no X is specified, then I want to use all the variables in ylist.

The last part, the Eval( Substitute( ... ) ) statement marked "This is tricky!", is hard to explain. In one statement, the script to do the stack is built and then run. "ping" is the placeholder that gets substituted out; it was the first word that came to mind when I was thinking about bouncing these expressions around.
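JSL's Expr / Substitute / Eval pattern has no direct Python equivalent, but the idea can be shown with a toy: keep the statement as unevaluated data, swap the placeholder symbol "ping" for the real message, and only then evaluate it. Everything here (the nested-tuple format, substitute, evaluate, and the "send" function) is hypothetical and only mimics the pattern.

```python
# Toy analog of JSL's Expr / Substitute / Eval pattern (everything here is hypothetical).
PLACEHOLDER = "ping"

def substitute(expr, placeholder, replacement):
    """Return a copy of a nested-tuple expression with the placeholder swapped out."""
    if expr == placeholder:
        return replacement
    if isinstance(expr, tuple):
        return tuple(substitute(part, placeholder, replacement) for part in expr)
    return expr

def evaluate(expr, functions):
    """Evaluate ("call", name, *args) nodes; anything else is treated as a literal."""
    if isinstance(expr, tuple) and expr and expr[0] == "call":
        func = functions[expr[1]]
        return func(*(evaluate(arg, functions) for arg in expr[2:]))
    return expr

# Unevaluated template, loosely like Expr( dt2 = dt1 << ping )
template = ("call", "send", "dt1", PLACEHOLDER)
# The message built up earlier, loosely like dt2stack
stack_message = ("Stack", ("Flow", "Power"))
functions = {"send": lambda table, msg: f"{table} << {msg[0]}( {', '.join(msg[1])} )"}

statement = substitute(template, PLACEHOLDER, stack_message)  # nothing runs yet
print(evaluate(statement, functions))  # dt1 << Stack( Flow, Power )
```

The template itself is never modified; substitution produces a new expression, and evaluation is a separate, final step.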

Where is the expression for dt2stackcols? For the sake of clarity, and my sanity while I was working on this, I moved all the expression definitions to the beginning of the script. This was helpful, because as I changed the order of the script around, the expressions all existed before they were needed. If I had left them scattered through the script, every so often I would try to insert into an expression that was defined later in the script. Take a look at the complete attached script to see how this is laid out. It might not be a best practice, but I'm pulling out all the stops for my nemesis graph.
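For readers who haven't used Stack, this Python sketch shows the wide-to-long reshape that turns dt1 into dt2: one row per time point becomes one row per (time point, variable). The helper is hypothetical, not JMP's Stack platform; the default names "Label" and "Data" are chosen to resemble JMP's stacked output, and the guard for fewer than two columns mirrors the Min Col( 2 ) rule.

```python
# Python sketch of a wide-to-long stack (hypothetical helper, not JMP's Stack platform).
def stack(rows, id_col, value_cols, label="Label", data="Data"):
    """Turn one row per time point into one row per (time point, variable)."""
    if len(value_cols) < 2:
        # mirrors the Min Col( 2 ) rule: a single column has nothing to stack
        raise ValueError("stacking needs at least two columns")
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append({id_col: row[id_col], label: col, data: row[col]})
    return long_rows


dt1_rows = [
    {"Row": 1, "Flow": 0.0, "Power": 1.0},
    {"Row": 2, "Flow": 1.0, "Power": 0.0},
]
for r in stack(dt1_rows, "Row", ["Flow", "Power"]):
    print(r)
```

The id column here plays the part of the X (or Row): it is carried along unstacked, which is why the X is left out of the stack list.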

Assembling all the parts

Insert Into( objout, Name Expr( objhlb1 ) );
Insert Into( objvlb, Name Expr( objout ) );
Insert Into( objhlb2, Name Expr( objvlb ) );
Insert Into( objhlb2, Name Expr( obj ) );
Insert Into( objwin, Name Expr( objhlb2 ) );
Eval( Eval Expr( objwin ) );		//Evaluating all the assembled parts after all the expressions are evaluated
gb << size( 600, 285 );			//if you want to format the graph, send messages to gb

One of the best things about JSL is that it is a function-based language. This accomplishes two important things: first, it makes it easy to do specific things without having to reinvent the wheel, and second, it makes it easy to think about how to build complex structures (as long as you think about them as an onion that you're building from the inside out). For example, a typical way of thinking would be something like this: first I buy a lot of land, then I build a building on the land, then I put rooms in the building, then I put stuff like my office chair in the room, then I sit in the chair, and then, finally, I start working. Functional thinking works like this: first there was a chair, and I put Byron into it. Then I put the chair (which still has a Byron in it) into the office, which I then put into the building, which I put onto the lot, which I then buy, and then, finally, Byron works.

The first step was to put the graph boxes into objhlb1. Then objhlb1 is inserted into an outline box (objout), which is inserted into a vertical list box (objvlb). This is the left side of the graph. Since it's the left side, objvlb is inserted into a second horizontal list box (objhlb2) first, followed by the Graph Builder chart (obj). Finally, objhlb2 (which now contains everything) is inserted into a New Window so that it can be displayed. And then, to make it all work, it gets evaluated. If I used just Eval... I'm not sure what happens, but it doesn't work. So I used Eval Expr first, which fills in the whole script of everything (try running just Eval Expr( objwin ) in the script to see what I mean), and then I used Eval to run it.
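The inside-out assembly can be pictured as nested containers. In this Python sketch, plain dicts stand in for display boxes; box, describe, and assemble are hypothetical helpers, and only the variable names follow the JSL script.

```python
# Python sketch of inside-out assembly; dicts stand in for JSL display boxes.
# box(), describe(), and assemble() are hypothetical; variable names follow the script.
def box(kind, *children):
    return {"kind": kind, "children": list(children)}

def describe(node, depth=0):
    """Return an indented outline of the nested boxes."""
    if isinstance(node, str):
        return ["  " * depth + node]
    lines = ["  " * depth + node["kind"]]
    for child in node["children"]:
        lines.extend(describe(child, depth + 1))
    return lines

def assemble(y_names):
    # innermost first: one squished graph box (axis box) per Y
    objhlb1 = box("H List Box", *(box("Graph Box", name) for name in y_names))
    objout = box("Outline Box", objhlb1)          # axis boxes go into the outline box
    objvlb = box("V List Box", objout)            # which goes into the left-side V list box
    obj = box("Graph Builder", "scaled overlay")  # the scaled data plot
    objhlb2 = box("H List Box", objvlb, obj)      # left side first, then the plot
    return box("New Window", objhlb2)             # outermost: the window


print("\n".join(describe(assemble(["Flow", "Power"]))))
```

Reading the printed outline from the bottom up retraces the order in which the pieces were built.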

There is a little formatting of obj, the Graph Builder plot: the title is turned off, the frame size is readjusted, and the y-axis is turned off. This is all to make it match the scale and shape of the y-axis boxes. It's convenient to use a Graph Builder plot because the x-axis can still be edited, the legend can be moved around, and all the Graph Builder options are available for further formatting.
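Why the 10% pad helps with matching: each axis box spans min - 0.1r to max + 0.1r, with r = max - min, and in scaled units that is always -0.1 to 1.1, whatever the variable. So, assuming the hidden Graph Builder y-axis also runs from -0.1 to 1.1 and the frames have the same height (an assumption for this sketch; the script gets there by adjusting sizes), a point in the overlay sits at its true value on every axis box at once. The Python helpers below are hypothetical.

```python
# Python sketch: converting between scaled units and one variable's original units.
def to_scaled(v, lo, hi):
    return (v - lo) / (hi - lo)

def to_original(s, lo, hi):
    return lo + s * (hi - lo)


lo, hi = 200, 1000                 # e.g. a power column spanning 200 to 1000
axis_lo = lo - 0.1 * (hi - lo)     # padded axis-box limits, as in the script
axis_hi = hi + 0.1 * (hi - lo)
print(to_scaled(axis_lo, lo, hi), to_scaled(axis_hi, lo, hi))  # about -0.1 and 1.1
print(to_original(0.5, lo, hi))    # 600.0
```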

Current End Point

As the script stands, on its 7th major revision, it's not very tidy, and it leaves a bunch of tables hanging around. That and several other things should be fixed, so if you have ideas for additions, please send them along. Some things just won't happen; for example, the data in the graph won't be dynamically linked to the original data table, and the y-axis boxes won't be linked to the Scaled Response Graph outline box, so you'll have to size each one independently. It would be great if that worked. Finally, I am nearly 100% sure that there are ways to make the script break or return unfortunate results, so look out for special column names, extreme outliers, and lots of missing data.
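Two of those failure modes are easy to guard against in the scaling step: missing values and constant columns (where max equals min and the scaling divides by zero). Here is a Python sketch of a more defensive version; robust_scale is hypothetical, and centering a constant column at 0.5 is just one design choice. Outliers are not handled: min-max scaling lets one extreme value squeeze everything else into a narrow band.

```python
import math

# Python sketch of a more defensive scaling step (hypothetical helper, not the JSL script).
def robust_scale(values, pad=0.1):
    """Min-max scale, skipping missing values (None or NaN) and handling constant columns."""
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = [v for v in values if not missing(v)]
    if not present:
        raise ValueError("column has no non-missing values")
    lo, hi = min(present), max(present)
    span = hi - lo
    if span == 0:
        # Design choice: park a constant column mid-plot with a unit-wide axis around it.
        return [None if missing(v) else 0.5 for v in values], (lo - 0.5, hi + 0.5)
    scaled = [None if missing(v) else (v - lo) / span for v in values]
    return scaled, (lo - pad * span, hi + pad * span)


print(robust_scale([1.0, None, 3.0, float("nan")]))
print(robust_scale([5, 5, None]))  # ([0.5, 0.5, None], (4.5, 5.5))
```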

I would love comments on better ways to make this script work. (Speaking of working, I tested this in v13 and v14 on PC and Mac, and it ran well at the time of posting.)

An end note from the author:

This is written in the most informal tone, just as if we were talking about this.

This is intentional, because formal jargon in coding, IMHO, impedes communication. Instead, I eschew obfuscation.