Subscribe Bookmark RSS Feed

Excluding data from formula calculations

JesperJohansen

Community Trekker

Joined:

Apr 15, 2013

I am doing a bunch of calculations using the "Col" type statistical functions (e.g. Col Mean). My problem is that my data set contains some bad data, that I want to exclude from the calculations. By simply selecting the relevant rows and choosing exclude, I apparently do not affect the formula results. So my question is this:

How do I exclude these bad data from the calculations without removing them from the data set entirely?

BR
Jesper
1 ACCEPTED SOLUTION

Accepted Solutions
Solution

Try this construction in a column formula. If included, return mean by excluded state (0/1), else do nothing.

If( !Excluded(), Col Mean( :Column 1, Excluded() ))

5 REPLIES
Solution

Try this construction in a column formula. If included, return mean by excluded state (0/1), else do nothing.

If( !Excluded(), Col Mean( :Column 1, Excluded() ))

JesperJohansen

Community Trekker

Joined:

Apr 15, 2013

Thank you. That in fact solved my problem. But is there no way to do it without incorporating the row-state into the formula? It seems strange to me that I need to tell JMP to exclude already excluded data.

BR
Jesper
ms

Super User

Joined:

Jun 23, 2011

No, not with the pre-evaluated statistics functions (see note on p. 304 in JMP 10 scripting guide).

Summarize() accounts for excluded rows but is less efficient than e.g. col mean, and in my experience when Summarize() is used in column formulas the result is not automatically uppdated if rowstates are changed.

But in principle this formula should give the same result as the trick above (except that that the result will be also in excluded rows), but formulas may have to be forced to reevaluate upon changes in the datatable.

summarize(m=mean(:Column 1)); m


JesperJohansen

Community Trekker

Joined:

Apr 15, 2013

Just so I understand completely what I am doing:

The first "Excluded()" entry ensures that results are only displayed in non-excluded rows, while the second ensures that the mean (in the above case) is calculated for non-excluded and excluded rows separately. If I remove the "If" function but leave in the "Excluded()" clause in the "Col Mean()" function, I will get the same result in the non-excluded rows, but I will get values in the excluded rows as well (here only the mean for excluded data).

Is that correct?

BR
Jesper
ms

Super User

Joined:

Jun 23, 2011

That's correct. The optional second (and third etc..) argument in the col stat functions is reserved for a by-column, i.e. calculates the statistics for each group based on the values in a separate column. But it apparently also works with a binary argument like Excluded(), and not only with table columns.