I use the following "for" loop to detect and remove outliers from a data set. I import the column as a matrix (mBase), because this solved other problems, and then run it through the loop. Does anyone have any recommendations to speed this up? Currently, 2,000 records take 3-4 seconds, and I'm dealing with 60,000 records.
For( i = 1, i < 2000 + 1, i++,
	// Set a value to missing if it lies more than 3.95 IQRs beyond the quartiles.
	If(
		mBase[i] > Quantile( 0.75, mBase ) + 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )) |
		mBase[i] < Quantile( 0.25, mBase ) - 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )),
		mBase[i] = .
	)
);
Thanks in advance for any replies.
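Even faster: operate on the entire matrix at once and avoid the loop altogether. Here q25 and q75 are the 0.25 and 0.75 quantiles of mBase, computed once up front rather than on every iteration.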
mBase = mBase :/ ((q25 - 3.95 * (q75 - q25)) <= mBase <= (q75 + 3.95 * (q75 - q25)));
:/ is element-wise matrix division. For example, [10 21] :/ [5 3] == [2 7].
The range check returns 1 for values inside the limits and 0 for values outside them, so the code above takes advantage of the fact that dividing by 0 produces a missing value. A more general, slightly slower approach is to use the Loc() function.
locs = Loc(!((q25 - 3.95 * (q75 - q25)) <= mBase <= (q75 + 3.95 * (q75 - q25))));
mBase[locs] = .;
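Putting the pieces together, here is a minimal sketch of the whole thing, assuming q25 and q75 are computed with the same Quantile() calls used in the question. The sample values and the names iqr, lower, upper, and inRange are made up for illustration; in practice mBase would come from your data table column.

```jsl
// Hypothetical sample data (column vector); replace with your own matrix.
mBase = [12, 15, 14, 13, 500, 16, 11, 14, 15];

// Compute each quartile once instead of on every loop iteration.
q25 = Quantile( 0.25, mBase );
q75 = Quantile( 0.75, mBase );
iqr = q75 - q25;
lower = q25 - 3.95 * iqr;
upper = q75 + 3.95 * iqr;

// 1 where the value lies inside the limits, 0 where it is an outlier.
inRange = lower <= mBase <= upper;

// Dividing by 0 yields a missing value, so outliers become missing.
mBase = mBase :/ inRange;

Show( mBase );
```

Note that this computes the quartiles once from the original data, whereas the loop in the question recomputes them after each value is set to missing, so the two may not flag exactly the same points.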