Transform Columns - as comfortable as Summary Statistics?

hogi · ‎09-12-2023

☑ cool new feature
☑ could help many users!

☑ removes (what feels like) a "bug"

☐ nice to have

☐ nobody needs it

#myTop10_2023

What inspired this wish list request?

The Summary Statistics in Graph Builder are very comfortable: they automatically react to changes of the Data Filter, and the calculated values respect the grouping of data points on the x axis, Group X/Y, Wrap, Overlay and Page.

Old part about the Smoother (Summary Statistics are now available for it):

Unfortunately, Summary Statistics are only available for Point, Line, Bar and Heatmap plots, but not, e.g., for Parallel or Smoother plots.
This is why it is sometimes NOT possible to align a smoother with a point/line plot by automatic means of Graph Builder, like in this plot from "Graph Builder: Summary Statistics for Smoother".

To fix the issue, one has to do the aggregation "manually" with a Transform Column.
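A minimal sketch of such a manual aggregation, assuming the Big Class sample table instead of the data behind the linked plot: the Transform Column computes one mean height per age and sex with `Col Mean`, and both the Points and the Smoother element use that same column, so they stay aligned. The column and element choices are illustrative only.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
gb = dt << Graph Builder(
	// manual aggregation: one mean height per age (X) and sex (Overlay)
	Transform Column( "Mean[height]", Formula( Col Mean( :height, :age, :sex ) ) ),
	Variables( X( :age ), Y( :"Mean[height]"n ), Overlay( :sex ) ),
	Elements(
		Points( X, Y, Legend( 1 ) ),  // the aggregated values
		Smoother( X, Y, Legend( 2 ) ) // smoother fitted to the same aggregated values
	)
);
```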

There are many reasons why users add Transform Columns instead of using the Summary Statistics of Graph Builder.

- limited number of aggregation options available via Summary Statistics (see "Add CDFs to Graph Builder"; Rank is also missing)

- the user wants to customize the statistic (this will become more common once the new JSL function Col N Categories is available).
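As an illustration of a statistic that Summary Statistics does not offer, here is a sketch (assuming the Big Class sample table) of an empirical CDF of weight per age, built from `Col Rank` and `Col Number` with `:age` as by-variable. Ties are handled by `Col Rank`'s default tie rule, which may differ from a textbook ECDF. Like any Transform Column, it ignores the Local Data Filter.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
gb = dt << Graph Builder(
	// rank of weight within each age group, divided by the group size
	Transform Column(
		"ECDF[weight]",
		Formula( Col Rank( :weight, :age ) / Col Number( :weight, :age ) )
	),
	Variables( X( :weight ), Y( :"ECDF[weight]"n ), Overlay( :age ) ),
	Elements( Points( X, Y, Legend( 1 ) ), Line( X, Y, Legend( 2 ) ) )
);
```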

The issue with Transform Columns: they are much less flexible than Summary Statistics:

They ignore the Local Data Filter
```
Names Default to Here(1);
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
gb = dt << Graph Builder(
	Transform Column( "Mean[weight]", Formula( Col Mean( :weight ) ) ),
	Size( 480, 463 ),
	Show Control Panel( 0 ),
	Set α Level( 0.01 ),
	Summary Statistic( "Median" ),
	Graph Spacing( 4 ),
	Variables(
		X( :age ),
		Y( :"Mean[weight]"n ),
		Y( :weight, Position( 1 ) ),
		Group Y( :sex ),
		Overlay( :sex )
	),
	Elements(
		Points( X, Y( 2 ), Legend( 1 ), Summary Statistic( "Mean" ) ),
		Points( X, Y( 1 ), Legend( 3 ), Jitter( "None" ) )
	),
	Local Data Filter(
		Add Filter(
			columns( :name ),
			Where(
				:name == {"ALFRED", "ALICE", "AMY", "BARBARA", "CAROL",
				"CHRIS", "CLAY", "DANNY", "DAVID", "EDWARD", "ELIZABETH",
				"FREDERICK", "JACLYN", "JAMES", "JANE", "JEFFREY", "JOE",
				"JOHN", "JUDY", "KIRK", "LAWRENCE", "LESLIE", "LILLIE",
				"LINDA", "MARION", "MARK", "MARY", "MICHAEL", "PHILLIP",
				"ROBERT", "SUSAN", "WILLIAM"}
			),
			Display( :name, N Items( 15 ), Find( Set Text( "" ) ) )
		)
	),
	SendToReport(
		Dispatch(
			{},
			"400",
			ScaleBox,
			{Legend Model(
				3,
				Base( 0, 0, 0, Item ID( "F", 1 ) ),
				Base( 1, 0, 0, Item ID( "M", 1 ) ),
				Properties(
					0,
					{Marker( "Triangle" ), Marker Size( 3 )},
					Item ID( "F", 1 )
				),
				Properties(
					1,
					{Marker( "Triangle" ), Marker Size( 3 )},
					Item ID( "M", 1 )
				)
			)}
		)
	)
)
```
solid points: mean via Summary Statistics -> different values for different ages and sexes, and the Local Data Filter is respected
open triangles: mean via Transform Column -> all values are the same, the LDF is ignored
One could argue: the triangles display exactly what the Transform Column was told to calculate. No need to worry.
Yes, sure ... if it weren't for the Summary Statistics!
The Summary Statistics got the same task, "mean(weight)", and display completely different values.
They ignore the distribution of data points on the x/y axes, X/Y groups, Wrap, Overlay and Page.

In principle, this issue can be fixed very easily: one just has to add the respective columns as by-variables (extra arguments) of the column aggregation.
But with each change of the graph, the user has to check very carefully whether all the necessary columns are in the JSL code.
-> argh!
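A sketch of this fix for the example above: `:age` (X) and `:sex` (Group Y and Overlay) are passed to `Col Mean` as by-variables, so the triangles should land on the solid points as long as the Local Data Filter is not used. The filter, axis settings and legend customizations of the original script are dropped here for brevity.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
gb = dt << Graph Builder(
	// by-variables must mirror the graph roles: X = :age, Group Y / Overlay = :sex
	Transform Column( "Mean[weight]", Formula( Col Mean( :weight, :age, :sex ) ) ),
	Variables(
		X( :age ),
		Y( :"Mean[weight]"n ),
		Y( :weight, Position( 1 ) ),
		Group Y( :sex ),
		Overlay( :sex )
	),
	Elements(
		Points( X, Y( 2 ), Legend( 1 ), Summary Statistic( "Mean" ) ),
		Points( X, Y( 1 ), Legend( 3 ), Jitter( "None" ) )
	)
);
// If a Wrap or Page variable is added to the graph later,
// it has to be added to Col Mean by hand, otherwise the values diverge again.
```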

What is the improvement you would like to see?

For Transform Columns in Graph Builder (and other platforms - like Summary!!!), please add:
- an option to respect the Local Data Filter (option: "LDF subset")
- an option which tells the Transform Column to split the data the same way Summary Statistics does (individual subsets for different values on the X axis, X/Y groups, pages ...). I called it "graph items" (or "comfort mode"; I am sure there is a word that describes it more precisely).

Alternative option:
If a column aggregation is used in a Transform Column in Graph Builder, it should accept arguments like allPlotGroups, X, X(1), X(5), Group X, GroupX(1), Overlay, Page.
```
// proposed syntax - not available in JMP
Col Number(:height, :sex, "Page") // if a column is used as Page, please calculate individual values for the data points on different pages
Col Number(:height, :sex, "GroupY") // if a column is used as Group Y, please calculate individual values for the data points in different Group Y levels
Col Number(:height, :sex, "X(1)") // please calculate individual values for different values of the column which is used as X(1)
Col Number(:height, "allPlotGroups") // please calculate individual values for different values of the columns which are used as X, Group X, Group Y, Wrap, Page
Col Number(:height, "GroupX", "GroupY", "Wrap", "Page") // similar to the previous one, but no subgroups for "same X value"
```
"comfort mode" from the context menu and "allPlotGroups" as an argument of the column aggregation would provide the same functionality as Summary Statistics.
The last example is something I have wished for for a long time when working with Summary Statistics:
When plotting data vs. time, I can use the lot number as Overlay, but if a lot is split over several days, it will show up as two lots.
So, here the new feature would provide even greater flexibility than Summary Statistics (!): a flexible way to specify how data is aggregated:
- just group by Overlay (lot)
- don't group by X (time) - instead: average over time (via another Transform Column)
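The manual workaround available today can be sketched as follows. The table, its column names (`lot`, `day`, `value`) and its values are hypothetical: lot "B" is measured on two days. One Transform Column averages the time per lot, another averages the value per lot, so each lot ends up at a single X position. The by-variable `:lot` has to be maintained by hand, which is exactly what the wish would automate.

```jsl
Names Default To Here( 1 );
// hypothetical data: lot "B" is measured on day 2 and day 3
dt = New Table( "lots (hypothetical)",
	New Column( "lot", Character, "Nominal", Set Values( {"A", "A", "B", "B", "B", "C"} ) ),
	New Column( "day", Numeric, "Continuous", Set Values( [1, 1, 2, 3, 3, 4] ) ),
	New Column( "value", Numeric, "Continuous", Set Values( [10, 12, 11, 13, 15, 9] ) )
);
gb = dt << Graph Builder(
	// X: average over time per lot (no split by day)
	Transform Column( "Mean[day] by lot", Formula( Col Mean( :day, :lot ) ) ),
	// Y: mean value per lot
	Transform Column( "Mean[value] by lot", Formula( Col Mean( :value, :lot ) ) ),
	Variables(
		X( :"Mean[day] by lot"n ),
		Y( :"Mean[value] by lot"n ),
		Overlay( :lot )
	),
	Elements( Points( X, Y, Legend( 1 ) ) )
);
```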

Why is this idea important?

Graph Builder is cool.

Summary Statistics are very useful - but their functionality is limited. (I don't know why the developers restricted the Summary Statistics to such a tiny subset of what is available, e.g., via Header Statistics?!)

And when falling back to Formula Columns, issues pop up which show how comfortable Summary Statistics can be.

With the new feature, users can use Transform Columns with the comfort and flexibility which they got used to from Summary Statistics - and without the limitations.


Sarah-Sylvestre · ‎09-18-2023

hogi · ‎10-28-2023

Concerning: LDF subset

I asked JMP support if there is a JSL way to implement this (TS-00056243):

The short answer is there is not a way to have the transform column change with a change to the Local Data Filter [... one of us] has been working with others in tech support and development on the issue, and no solution is available.

hogi · ‎10-29-2023

As a motivation:

the discussion "Re-evaluate a Formula for a report based on the subset on the data selected in Local Data Filter" by @MrSmith has > 5000 views at the moment.

So, kind of a hot topic.

hogi · ‎09-10-2024

-1 argument:
the Smoother got Summary Statistics.