Discussions

JesperJohansen · May 23, 2013 02:34 AM

I am doing a bunch of calculations using the "Col" type statistical functions (e.g. Col Mean). My problem is that my data set contains some bad data, that I want to exclude from the calculations. By simply selecting the relevant rows and choosing exclude, I apparently do not affect the formula results. So my question is this:

How do I exclude these bad data from the calculations without removing them from the data set entirely?

BR
Jesper

ms · May 23, 2013 03:01 AM

Try this construction in a column formula. If included, return mean by excluded state (0/1), else do nothing.

If( !Excluded(), Col Mean( :Column 1, Excluded() ))

View solution in original post

ms · May 23, 2013 03:01 AM

Try this construction in a column formula. If included, return mean by excluded state (0/1), else do nothing.

If( !Excluded(), Col Mean( :Column 1, Excluded() ))

JesperJohansen · May 23, 2013 04:34 AM

Thank you. That in fact solved my problem. But is there no way to do it without incorporating the row-state into the formula? It seems strange to me that I need to tell JMP to exclude already excluded data.

BR
Jesper

ms · May 23, 2013 04:52 AM

No, not with the pre-evaluated statistics functions (see note on p. 304 in JMP 10 scripting guide).

Summarize() accounts for excluded rows but is less efficient than e.g. col mean, and in my experience when Summarize() is used in column formulas the result is not automatically uppdated if rowstates are changed.

But in principle this formula should give the same result as the trick above (except that that the result will be also in excluded rows), but formulas may have to be forced to reevaluate upon changes in the datatable.

summarize(m=mean(:Column 1)); m

JesperJohansen · May 23, 2013 05:06 AM

Just so I understand completely what I am doing:

The first "Excluded()" entry ensures that results are only displayed in non-excluded rows, while the second ensures that the mean (in the above case) is calculated for non-excluded and excluded rows separately. If I remove the "If" function but leave in the "Excluded()" clause in the "Col Mean()" function, I will get the same result in the non-excluded rows, but I will get values in the excluded rows as well (here only the mean for excluded data).

Is that correct?

BR
Jesper

ms · May 23, 2013 05:14 AM

That's correct. The optional second (and third etc..) argument in the col stat functions is reserved for a by-column, i.e. calculates the statistics for each group based on the values in a separate column. But it apparently also works with a binary argument like Excluded(), and not only with table columns.

Thomas1 · Jan 2, 2019 03:44 PM

Thanks ms. That's the best formula which I found concerning excluded().

Discussions

Excluding data from formula calculations

Re: Excluding data from formula calculations

Re: Excluding data from formula calculations

Re: Excluding data from formula calculations

Re: Excluding data from formula calculations

Re: Excluding data from formula calculations

Re: Excluding data from formula calculations

Re: Excluding data from formula calculations

Recommended Articles