cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
jthi
Super User

Statistical functions (Col Mean for example) possibly bugged on first row of data table when using << Set Each Value to replace values in column

While trying to get my solution working for Is there a faster way to loop? I came across an issue where first row of my data wasn't looking correct (in fact the issue was for whole first group).

Names Default To Here(1);

dt = New Table("Untitled 5",
	Add Rows(8),
	Compress File When Saved(1),
	New Column("Vals", Numeric, "Continuous", Format("Best", 12), Set Values([1, 2, 3, 4, 1, 2, 3, 4])),
	New Column("Column 1", Numeric, "Continuous", Format("Best", 12), Set Values([1, 2, 3, 4, 1, 2, 3, 4])),
	New Column("Column 1 1", Numeric, "Continuous", Format("Best", 12), Set Values([1, 2, 3, 4, 1, 2, 3, 4])),
	New Column("Column 2", Numeric, "Continuous", Format("Best", 12), Set Values([1, 2, 3, 4, 1, 2, 3, 4])),
	New Column("A", Character, "Nominal", Set Values({"A", "A", "A", "A", "B", "B", "B", "B"})),
	New Column("B", Character, "Nominal", Formula(Col Mean(:Vals, :A)))
	
);

dt << New Column("Column 3", Numeric, Continuous, <<Set Each Value(Col Mean(:Column 1, :A)));
Column(dt, "Column 1") << Set Each Value(Col Mean(:Column 1, :A));

// workaround
dt << Add Row(1, At Start);
Column(dt, "Column 1 1") << Set Each Value(Col Mean(:Column 1 1, :A));
dt << Delete Rows(1);

Show(Col Mean(:Column 2));
Show(Mean(dt[0, "Column 2"]), Mean(dt[[2 3 4], "Column 2"]));
Show(:Column 1 << get values);
Show(:Column 3 << get values);
Show(Mean([1 2 3 4]));
Show(Mean([2.5 2 3 4]));

Show(Mean([2.5 2.875 2.875 2.875]));

jthi_1-1678901605470.png

To me it looks like Col Mean is being calculated to the first row and then mean for rest of the group one is calculated using this already replaced value. There is a workaround where you can add empty row as first in data table -> use << set each value -> remove that first row. 

 

I have already opened a ticket regarding this issue (TS-00036082) and JMP support has submitted bug report to development team, but I wanted to make it more visible. I might have to check some of my older script that there aren't any places where I use << Set Each Value() to replace existing values (usually I use it during column creation) when using statistical col functions.

-Jarmo
1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Statistical functions (Col Mean for example) possibly bugged on first row of data table when using << Set Each Value to replace values in column

It has (TS-00036082) and support forwarded it to JMP development. Purpose of this post was just to make this a bit more visible.

 

Edit: As I accepted this as a solution, I will add one possible workaround which seemed to work: add extra row to your datatable as first row,  use << set each value to update values and then remove that extra empty row

// workaround
dt << Add Row(1, At Start);
Column(dt, "Column 1 1") << Set Each Value(Col Mean(:Column 1 1, :A));
dt << Delete Rows(1);

The bug was present for me for JMP16.2 and JMP17.0

-Jarmo

View solution in original post

5 REPLIES 5
txnelson
Super User

Re: Statistical functions (Col Mean for example) possibly bugged on first row of data table when using << Set Each Value to replace values in column

This needs to go to JMP Tech Support

Jim
hogi
Level XII

Re: Statistical functions (Col Mean for example) possibly bugged on first row of data table when using << Set Each Value to replace values in column

I am a bit anxious when using the same column at the same time as an input and output of a calculation.
Extreme case: If Col ... Aggregation just acted row by row like Lag(), the output would be like a caterpillar moving along the rows, row by row changing the aggregation output.

 

Concerning speed, it's cool that Col ... aggregation does it much better.

 

Concerning prevention of the issue: clever idea to add a spare  temporary row. Alternatively, one could use a temporary column to store the values, right?

 

Concerning: Why?
[edited after the input of the slow version]
hm ...

Adding a print function to the GroupBy argument helps to see what's going on ...


Indeed, the Value and GroupBy functions are evaluated twice.
First to get values to compare with the GroupBy argument of row 1, then again after writing the result to row 1.

 

newCol=New Column("new");
newCol << Set Each Value(Col Mean(Print(row());:Vals,
write("\!n groupby row # ",row()," ",newCol[row()],"\!n");:A));


a line with just a number: which value function was evaluated ?
a line with groupby row # ... : which GroupBy function was evaluated?

 

1
groupby row # 1 .
2
groupby row # 2 .
3
groupby row # 3 .
4
groupby row # 4 .
5
groupby row # 5 .
6
groupby row # 6 .
7
groupby row # 7 .
8
groupby row # 8 .

groupby row # 1 .
1
groupby row # 1 2.5
2
groupby row # 2 .
3
groupby row # 3 .
4
groupby row # 4 .
5
groupby row # 5 .
6
groupby row # 6 .
7
groupby row # 7 .
8
groupby row # 8 .

groupby row # 2 .

groupby row # 3 .

groupby row # 4 .

groupby row # 5 .

groupby row # 6 .

groupby row # 7 .

groupby row # 8 .

 

hogi
Level XII

Re: Statistical functions (Col Mean for example) possibly bugged on first row of data table when using << Set Each Value to replace values in column

After adding some additional formatting to the value printout, Col ... aggregation switches to the very slow mode which is already known from here: 
How-do-I-use-the-Col-Maximum-Formula-with-a-quot-where-quot/m-p/578576/highlight/true#M78599 

 

new column("Vals", set each value(random uniform()));
new column("groupBy", set each value(random integer(4)));
newCol=New Column("new");
newCol << Set Each Value(Col Mean(Print(row());:Vals, write("\!n groupby row # ",row()," ",newCol[row()],"\!n");:groupBy));

 

Seems that for each single row

- all rows get evaluated : first the value and then the GroupBy argument

- then the GroupBy argument of the specific row is evaluated (again)

and then the matching values are collected and aggregated.

Then it starts with the next row:

- all rows get evaluated (again): first the value and then the GroupBy argument

- then the GroupBy argument of the next row is evaluated (again)

and so forth ....

This explains why the execution scales quadratically with the length of the rows, [compare How-do-I-use-the-Col-Maximum-Formula .... ]

 

Funny: Additional evaluations are different from the moon - not there if you don't watch
Perhaps somebody from Jmp could give a more detailed view ...


full printout of the slow version:

View more...
-value, row # 1
groupby row # 1 .
-value, row # 2
groupby row # 2 .
-value, row # 3
groupby row # 3 .
-value, row # 4
groupby row # 4 .
-value, row # 5
groupby row # 5 .
-value, row # 6
groupby row # 6 .
-value, row # 7
groupby row # 7 .
-value, row # 8
groupby row # 8 .
groupby row # 1 .
-value, row # 1
groupby row # 1 2.5
-value, row # 2
groupby row # 2 .
-value, row # 3
groupby row # 3 .
-value, row # 4
groupby row # 4 .
-value, row # 5
groupby row # 5 .
-value, row # 6
groupby row # 6 .
-value, row # 7
groupby row # 7 .
-value, row # 8
groupby row # 8 .
groupby row # 2 .
-value, row # 1
groupby row # 1 2.5
-value, row # 2
groupby row # 2 2.5
-value, row # 3
groupby row # 3 .
-value, row # 4
groupby row # 4 .
-value, row # 5
groupby row # 5 .
-value, row # 6
groupby row # 6 .
-value, row # 7
groupby row # 7 .
-value, row # 8
groupby row # 8 .
groupby row # 3 .
-value, row # 1
groupby row # 1 2.5
-value, row # 2
groupby row # 2 2.5
-value, row # 3
groupby row # 3 2.5
-value, row # 4
groupby row # 4 .
-value, row # 5
groupby row # 5 .
-value, row # 6
groupby row # 6 .
-value, row # 7
groupby row # 7 .
-value, row # 8
groupby row # 8 .
groupby row # 4 .
-value, row # 1
groupby row # 1 2.5
-value, row # 2
groupby row # 2 2.5
-value, row # 3
groupby row # 3 2.5
-value, row # 4
groupby row # 4 2.5
-value, row # 5
groupby row # 5 .
-value, row # 6
groupby row # 6 .
-value, row # 7
groupby row # 7 .
-value, row # 8
groupby row # 8 .
groupby row # 5 .
-value, row # 1
groupby row # 1 2.5
-value, row # 2
groupby row # 2 2.5
-value, row # 3
groupby row # 3 2.5
-value, row # 4
groupby row # 4 2.5
-value, row # 5
groupby row # 5 2.5
-value, row # 6
groupby row # 6 .
-value, row # 7
groupby row # 7 .
-value, row # 8
groupby row # 8 .
groupby row # 6 .
-value, row # 1
groupby row # 1 2.5
-value, row # 2
groupby row # 2 2.5
-value, row # 3
groupby row # 3 2.5
-value, row # 4
groupby row # 4 2.5
-value, row # 5
groupby row # 5 2.5
-value, row # 6
groupby row # 6 2.5
-value, row # 7
groupby row # 7 .
-value, row # 8
groupby row # 8 .
groupby row # 7 .
-value, row # 1
groupby row # 1 2.5
-value, row # 2
groupby row # 2 2.5
-value, row # 3
groupby row # 3 2.5
-value, row # 4
groupby row # 4 2.5
-value, row # 5
groupby row # 5 2.5
-value, row # 6
groupby row # 6 2.5
-value, row # 7
groupby row # 7 2.5
-value, row # 8
groupby row # 8 .
groupby row # 8 .

 

hogi
Level XII

Re: Statistical functions (Col Mean for example) possibly bugged on first row of data table when using << Set Each Value to replace values in column


@hogi wrote:

After adding some additional formatting to the value printout, Col ... aggregation switches to the very slow mode which is already known from here: 
How-do-I-use-the-Col-Maximum-Formula-with-a-quot-where-quot/m-p/578576/highlight/true#M78599 


 

I just checked the behavior with Jmp 17.2 and later versions (<=18.1.1). The issue is gone:

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
dt << Clear Select << Select Rows( Index( 5, 40 ) ) <<Delete Rows;
newCol=New Column("new");
newCol << Set Each Value(Col Mean(Print(row());:height, write("\!n groupby row # ",row()," ",newCol[row()],"\!n");:age));

gives:

hogi_0-1734384257364.png

 

so, the value and GroupBy values are calculated for every row - then the GroupBy values are calculated again for every row.
That's it.

 

jthi
Super User

Re: Statistical functions (Col Mean for example) possibly bugged on first row of data table when using << Set Each Value to replace values in column

It has (TS-00036082) and support forwarded it to JMP development. Purpose of this post was just to make this a bit more visible.

 

Edit: As I accepted this as a solution, I will add one possible workaround which seemed to work: add extra row to your datatable as first row,  use << set each value to update values and then remove that extra empty row

// workaround
dt << Add Row(1, At Start);
Column(dt, "Column 1 1") << Set Each Value(Col Mean(:Column 1 1, :A));
dt << Delete Rows(1);

The bug was present for me for JMP16.2 and JMP17.0

-Jarmo