Level II

## How to speed up a "for" loop used in outlier detection?

I use the following "for" loop to detect and remove outliers from a data set. I import the column as a matrix (mBase), because that solved other problems, and then run it through the loop. Does anyone have recommendations for speeding this up? Currently 2,000 records take 3-4 seconds, and I need to process 60,000 records.

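```
// Flag values outside the quartile fences (multiplier 3.95) as missing.
// Note: every Quantile() call below is re-evaluated on every iteration.
For( i = 1, i < 2000 + 1, i++,
	If(
		mBase[i] > Quantile( 0.75, mBase ) + 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase ))
		| mBase[i] < Quantile( 0.25, mBase ) - 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )),
		mBase[i] = .
	)
);
```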

Matt
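For reference, the rule in this loop is an interquartile-range fence with multiplier $k = 3.95$, and the slowdown can be estimated from how often the quartiles are recomputed. The estimate below assumes that each Quantile() call has to process all $n$ values (so at least $O(n)$ work per call, more if it sorts) and that run time scales with that work; real timings depend on JMP's implementation.

$$
\begin{aligned}
\text{lower} &= Q_{0.25} - k\,(Q_{0.75} - Q_{0.25}), \qquad \text{upper} = Q_{0.75} + k\,(Q_{0.75} - Q_{0.25}), \qquad k = 3.95 \\[4pt]
\text{work} &\gtrsim n \text{ iterations} \times \text{up to } 6 \text{ Quantile() calls} \times O(n) = O(n^2) \\[4pt]
\frac{t(60\,000)}{t(2\,000)} &\approx \left(\frac{60\,000}{2\,000}\right)^2 = 30^2 = 900
\;\Rightarrow\; (3 \text{ to } 4\ \text{s}) \times 900 \approx 45 \text{ to } 60\ \text{min}
\end{aligned}
$$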

Accepted Solution
Super User

## Re: How to speed up a "for" loop used in outlier detection?

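This should be a lot faster. Compute the two quartiles once, before the loop, instead of calling Quantile() up to six times on every iteration:

```
// Compute the quartiles and fences once, outside the loop
q75 = Quantile( 0.75, mBase );
q25 = Quantile( 0.25, mBase );
loFence = q25 - 3.95 * (q75 - q25);
hiFence = q75 + 3.95 * (q75 - q25);

// mBase is assumed to be a column vector (one value per row)
For( i = 1, i <= N Rows( mBase ), i++,
	If( mBase[i] > hiFence | mBase[i] < loFence,
		mBase[i] = .
	)
);
```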
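A minimal Python sketch of the same hoisting idea, for readers who want to experiment outside JMP. It is a translation, not JSL: `math.nan` stands in for JMP's missing value, and `statistics.quantiles(..., method="inclusive")` is an assumed stand-in for JSL's `Quantile()`; the two may use different interpolation rules, so fence values can differ slightly on small samples. The function name `remove_outliers_loop` is hypothetical.

```python
import math
import statistics


def remove_outliers_loop(values, k=3.95):
    """Return a copy of `values` with points outside the IQR fences replaced by NaN.

    The quartiles are computed once, before the loop (the hoisting speed-up).
    Assumes the input contains no NaN values.
    """
    cleaned = list(values)  # work on a copy of the data
    q25, _, q75 = statistics.quantiles(cleaned, n=4, method="inclusive")
    iqr = q75 - q25
    lower, upper = q25 - k * iqr, q75 + k * iqr

    for i, v in enumerate(cleaned):
        # Each iteration is now two cheap comparisons, not several quantile calls.
        if v > upper or v < lower:
            cleaned[i] = math.nan  # NaN stands in for JMP's missing value "."
    return cleaned


if __name__ == "__main__":
    print(remove_outliers_loop([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
    # [1, 2, 3, 4, 5, 6, 7, 8, 9, nan]
```

Because the quartiles are computed once, the fences come from the original data. In the question's loop, Quantile() is re-evaluated after earlier outliers have already been set to missing, so (assuming Quantile() skips missing values) the fences there can drift slightly as the loop runs.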

Level II

## Re: How to speed up a "for" loop used in outlier detection?

This is much improved.  Thanks for the help.

Staff

## Re: How to speed up a "for" loop used in outlier detection?

Even faster: operate on the entire matrix at once and avoid the loop altogether.

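```
// q25 and q75 are the quartiles computed in the accepted solution
mBase = mBase :/ ((q25 - 3.95 * (q75 - q25)) <= mBase <= (q75 + 3.95 * (q75 - q25)));
```

`:/` is element-wise matrix division. For example, `[10 21] :/ [5 3]` gives `[2 7]`.

The range comparison returns a matrix of 1s (value inside the fences) and 0s (value outside). Dividing a value by 1 leaves it unchanged, and dividing by 0 produces a missing value, so only the outliers become missing. A more general, though slightly slower, approach is to use the Loc() function:

```
// Loc() returns the indices of the nonzero (here: out-of-range) elements
locs = Loc( !((q25 - 3.95 * (q75 - q25)) <= mBase <= (q75 + 3.95 * (q75 - q25))) );
mBase[locs] = .;
```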
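The whole-matrix idea maps onto NumPy as a boolean mask. This is a sketch under the same assumptions as any translation from JSL: NaN stands in for missing, and NumPy's default linear interpolation in `np.nanquantile` may not match JMP's quantile definition. The names `iqr_fences`, `remove_outliers_vectorized`, and `remove_outliers_indexed` are hypothetical; the last one mirrors the Loc() variant by collecting outlier indices with `np.flatnonzero`.

```python
import numpy as np


def iqr_fences(x, k=3.95):
    """Return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR), ignoring NaNs."""
    q25, q75 = np.nanquantile(x, [0.25, 0.75])
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr


def remove_outliers_vectorized(values, k=3.95):
    """Whole-array version: no Python-level loop over elements."""
    x = np.asarray(values, dtype=float)
    lower, upper = iqr_fences(x, k)
    # Boolean mask for every element at once (comparisons with NaN are False).
    inside = (x >= lower) & (x <= upper)
    # Keep in-range values; outliers (and existing NaNs) become NaN.
    return np.where(inside, x, np.nan)


def remove_outliers_indexed(values, k=3.95):
    """Index-based version, analogous to JSL's Loc() followed by assignment."""
    x = np.array(values, dtype=float)  # copy, so the caller's data is untouched
    lower, upper = iqr_fences(x, k)
    outside = ~((x >= lower) & (x <= upper))
    locs = np.flatnonzero(outside)  # indices of the outliers
    x[locs] = np.nan
    return x


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(remove_outliers_vectorized(data))
    print(remove_outliers_indexed(data))
```

The division trick does not carry over directly: in NumPy, dividing a nonzero float by 0 gives `inf` (with a RuntimeWarning) rather than NaN, which is why this sketch uses `np.where` instead.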
