
Transform Columns - as comfortable as Summary Statistics?

☑ cool new feature
☑ could help many users!

☑ removes (what feels like) a "bug"

☐ nice to have

☐ nobody needs it

 

#myTop10_2023a

 

What inspired this wish list request? 

Summary Statistics in Graph Builder are very comfortable to use: they automatically react to changes of the Data Filter, and the calculated values respect the grouping of data points on the X axis, Group X/Y, Wrap, Overlay and Page.

 

Unfortunately, Summary Statistics are only available for Point, Line, Bar and Heatmap plots, not for e.g. Parallel or Smoother plots.
This is why it is sometimes NOT possible to align a smoother with a point/line plot by the automatic means of Graph Builder, as in this plot from Graph Builder: Summary Statistics for Smoother:

[Figure: example plot from "Graph Builder: Summary Statistics for Smoother"]

 

To fix the issue, one has to do the aggregation "manually" via a Transform Column.

There are more reasons why users add Transform Columns instead of using the Summary Statistics of Graph Builder.

- the limited number of aggregation options available via Summary Statistics (see Add CDFs to Graph Builder; Rank is also missing)

- the user has to adjust the statistics (this will happen more frequently once the new JSL function Col N Categories is available).
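As an illustration of the missing Rank option, here is a minimal sketch with the Big Class sample table. The column name "Rank[weight]" and the choice of :age and :sex as BY arguments are assumptions made for this example; Col Rank() ranks :weight separately within each age/sex combination, something Summary Statistics cannot offer. Like every Transform Column, it still ignores a Local Data Filter.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
gb = dt << Graph Builder(
	// Rank is not offered as a Summary Statistic, so it is computed in a
	// Transform Column; :age and :sex are BY arguments, so the ranking
	// restarts in every age/sex subset
	Transform Column( "Rank[weight]", Formula( Col Rank( :weight, :age, :sex ) ) ),
	Show Control Panel( 0 ),
	Variables( X( :age ), Y( :"Rank[weight]"n ), Overlay( :sex ) ),
	Elements( Points( X, Y, Legend( 1 ), Jitter( "None" ) ) )
);
```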

 

The issue with Transform Columns: they are much less flexible than Summary Statistics:

- They ignore the distribution of data points on the X/Y axes, X/Y groups, Wrap, Overlay and Page.

- They ignore the Local Data Filter.

 

Names Default to Here(1);
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
gb = dt << Graph Builder(
	// Transform Column without BY arguments: one overall mean for all rows
	Transform Column( "Mean[weight]", Formula( Col Mean( :weight ) ) ),
	Size( 480, 463 ),
	Show Control Panel( 0 ),
	Set α Level( 0.01 ),
	Summary Statistic( "Median" ),
	Graph Spacing( 4 ),
	Variables(
		X( :age ),
		Y( :"Mean[weight]"n ),
		Y( :weight, Position( 1 ) ),
		Group Y( :sex ),
		Overlay( :sex )
	),
	Elements(
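		// Y(2) = :weight, aggregated by Summary Statistics (solid points)
		Points( X, Y( 2 ), Legend( 1 ), Summary Statistic( "Mean" ) ),
		// Y(1) = Transform Column "Mean[weight]" (triangles, see Legend Model 3 below)
		Points( X, Y( 1 ), Legend( 3 ), Jitter( "None" ) )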
	),
	Local Data Filter(
		Add Filter(
			columns( :name ),
			Where(
				:name == {"ALFRED", "ALICE", "AMY", "BARBARA", "CAROL",
				"CHRIS", "CLAY", "DANNY", "DAVID", "EDWARD", "ELIZABETH",
				"FREDERICK", "JACLYN", "JAMES", "JANE", "JEFFREY", "JOE",
				"JOHN", "JUDY", "KIRK", "LAWRENCE", "LESLIE", "LILLIE",
				"LINDA", "MARION", "MARK", "MARY", "MICHAEL", "PHILLIP",
				"ROBERT", "SUSAN", "WILLIAM"}
			),
			Display( :name, N Items( 15 ), Find( Set Text( "" ) ) )
		)
	),
	SendToReport(
		Dispatch(
			{},
			"400",
			ScaleBox,
			{Legend Model(
				3,
				Base( 0, 0, 0, Item ID( "F", 1 ) ),
				Base( 1, 0, 0, Item ID( "M", 1 ) ),
				Properties(
					0,
					{Marker( "Triangle" ), Marker Size( 3 )},
					Item ID( "F", 1 )
				),
				Properties(
					1,
					{Marker( "Triangle" ), Marker Size( 3 )},
					Item ID( "M", 1 )
				)
			)}
		)
	)
)

solid points: mean via Summary Statistics -> different values for different ages and sexes, and the Local Data Filter is respected

open triangles: mean via Transform Column -> all values are the same, and the LDF is ignored

[Figure: Graph Builder output of the script above]

One could argue: the triangles display exactly what the Transform Column is told to calculate. No need to worry.

Yes, sure ... if there weren't Summary Statistics!

Summary Statistics got the same task, "mean(weight)", and display completely different values.

 

What is the improvement you would like to see? 

For Transform Columns in Graph Builder (and other platforms, like Summary!!!), please add an option to respect the Local Data Filter (option: LDF subset).

 

The issue with respecting the distribution of data points on the X/Y axes, X/Y groups, Pages etc. can be fixed very easily: one just has to add the respective columns as (BY) arguments of the column aggregations.
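Applied to the Big Class example, a minimal sketch of this workaround passes the X column (:age) and the Group Y/Overlay column (:sex) to Col Mean() as BY arguments. The column name is made up, and the Local Data Filter and legend settings of the original script are left out; as long as no rows are filtered or excluded, the Transform Column should then show one mean per age/sex subset, the same grouping Summary Statistics uses.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
gb = dt << Graph Builder(
	// BY arguments mirror the plot roles by hand:
	// :age is on X, :sex is used as Group Y and Overlay
	Transform Column( "Mean[weight] by age & sex",
		Formula( Col Mean( :weight, :age, :sex ) )
	),
	Show Control Panel( 0 ),
	Variables(
		X( :age ),
		Y( :"Mean[weight] by age & sex"n ),
		Y( :weight, Position( 1 ) ),
		Group Y( :sex ),
		Overlay( :sex )
	),
	Elements(
		// Summary Statistics mean of :weight
		Points( X, Y( 2 ), Legend( 1 ), Summary Statistic( "Mean" ) ),
		// manually grouped Transform Column mean
		Points( X, Y( 1 ), Legend( 3 ), Jitter( "None" ) )
	)
);
```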

But with each change of the graph, the user has to check very carefully whether all the necessary columns are in the JSL code.

Much easier: an option which tells the Transform Column to split the data the way Summary Statistics does (individual subsets for different values on the X axis, X/Y groups, Pages ...). I called it "graph items" (comfort mode; I am sure there is a word which describes it more precisely).

Alternative option:
If a column aggregation is used in a Transform Column in Graph Builder, it should accept arguments like allPlotGroups, X, X(1), X(5), Group X, GroupX(1), Overlay, Page.

 

Col Number(:height, :sex, "Page") // if a column is used as page, please calculate individual values for the data points on different pages
Col Number(:height, :sex, "GroupY") // if a column is used as Group Y, please calculate individual values for the data points in the different Group Y levels
Col Number(:height, :sex, "X(1)") // please calculate individual values for different values of the column which is used as X(1)
Col Number(:height, "allPlotGroups") // please calculate individual values for different values of the column which is used as X, GroupX, GroupY, Wrap, Page
Col Number(:height, "GroupX", "GroupY", "Wrap", "Page") // similar to the previous one, but no subgroups for "same X value"

 

"comfort mode" from the context menu and "allPlotGroups" as argument of the column aggregation provide the same functionality as Summary Statistics.

The last example is something I wished for a long time when working with Summary Statistics:

When plotting data vs. time, I can use the lot number as Overlay, but if a lot is split over several days, it will show up as two lots.
So here the new feature would provide even more flexibility than Summary Statistics (!): a flexible way to specify how data is aggregated:

- just group by Overlay (lot)

- don't group by X (time); instead, average over time (via another Transform Column)
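Today this can only be approximated by hand. A minimal sketch with Big Class as a stand-in for lot data (a hypothetical mapping: :age plays the role of time on X, :sex the role of the lot in Overlay): only the Overlay column is passed to Col Mean() as a BY argument, so each sex gets a single mean over all ages instead of one value per age.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
gb = dt << Graph Builder(
	// BY argument: only the Overlay column (:sex); :age (X) is left out,
	// so the mean is taken over all ages within each sex
	Transform Column( "Mean[weight] by sex", Formula( Col Mean( :weight, :sex ) ) ),
	Show Control Panel( 0 ),
	Variables(
		X( :age ),
		Y( :weight ),
		Y( :"Mean[weight] by sex"n, Position( 1 ) ),
		Overlay( :sex )
	),
	Elements(
		// raw weights
		Points( X, Y( 1 ), Legend( 1 ) ),
		// one flat line per sex at the Transform Column value
		Line( X, Y( 2 ), Legend( 2 ) )
	)
);
```

The drawback stays the same as with any hand-written BY list: if the Overlay column of the graph changes, the formula has to be edited as well.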

 

 


 

 

Why is this idea important? 

Graph Builder is cool.

Summary Statistics are very useful ... but the functionality of Summary Statistics is limited.

And when falling back to Formula Columns, issues pop up which show how comfortable Summary Statistics can be.

 

For the issue with the Data Filter, there is a workaround (see Transform Column: Bug with excluded rows).
But it needs some effort and gets much more complicated for dashboards with more than a single Data Filter.

 

With the new grouping options for column aggregations, Transform Columns would become much more powerful than Summary Statistics.

 

--> With the new feature, users could use Transform Columns with the comfort and flexibility they are used to from Summary Statistics, and without the limitations.

 

 

 

 

more wishes submitted by hogi

3 Comments
Status changed to: Acknowledged
 
hogi
Level XI

Concerning: LDF subset


 

I asked JMP support whether there is a JSL way to implement this (TS-00056243):

 

The short answer is there is not a way to have the transform column change with a change to the Local Data Filter [... one of us]  has been working with others in tech support and development on the issue, and no solution is available.


hogi
Level XI

As a motivation:

the discussion Re-evaluate a Formula for a report based on the subset on the data selected in Local Data Filter  by @MrSmith has > 5000 views at the moment:

[Screenshot: view count of the discussion]

So, kind of a hot topic.