
How to speed up a "for" loop used in outlier detection?

myoungers

Community Trekker

Joined:

Mar 20, 2012

I use the following "for" loop to detect and remove outliers from a data set.  I import the column as a matrix (mBase) because this solved other problems and then run it through the loop.  Does anyone have any recommendations to speed this up?  Currently 2000 records take 3-4 seconds and I'm dealing with 60,000 records.

For( i = 1, i < 2000 + 1, i++,
    If(
        mBase[i] > Quantile( 0.75, mBase ) + 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )) |
        mBase[i] < Quantile( 0.25, mBase ) - 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )),
        mBase[i] = .
    )
);

Thanks in advance for the reply(s).

Matt

Accepted Solution

This should be a lot faster:

// Compute the quartiles once, outside the loop, instead of on every iteration
q75 = Quantile( 0.75, mBase );
q25 = Quantile( 0.25, mBase );
For( i = 1, i < 2000 + 1, i++,
    If( mBase[i] > q75 + 3.95 * (q75 - q25) |
        mBase[i] < q25 - 3.95 * (q75 - q25),
        mBase[i] = .
    )
);
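
The loop above stops at a hard-coded 2000 rows, while the full data set in the question has 60,000. A minimal sketch of the same loop that reads the length from the matrix instead, assuming mBase is a column vector; the names n, lower, and upper are illustrative and do not appear in the original post:

```jsl
// Sketch only: assumes mBase is an N x 1 column vector.
q75 = Quantile( 0.75, mBase );
q25 = Quantile( 0.25, mBase );

// Compute both fences once so the loop body is just two comparisons.
lower = q25 - 3.95 * (q75 - q25);
upper = q75 + 3.95 * (q75 - q25);

n = N Rows( mBase );  // number of records, instead of a hard-coded 2000
For( i = 1, i <= n, i++,
    If( mBase[i] > upper | mBase[i] < lower,
        mBase[i] = .  // replace the outlier with a missing value
    )
);
```

Because the fences are computed before the loop, values set to missing along the way do not shift the thresholds used for later rows.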

myoungers

Community Trekker

Joined:

Mar 20, 2012

This is much improved.  Thanks for the help.

XanGregg

Staff

Joined:

Jun 23, 2011

Even faster: operate on the entire matrix at once and avoid the loop altogether.

mBase = mBase :/ ((q25 - 3.95 * (q75 - q25)) <= mBase <= (q75 + 3.95 * (q75 - q25)));

:/ is element-wise matrix division. For example, [10 21] :/ [5 3]  == [2 7]

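The above code takes advantage of the fact that dividing by 0 produces a missing value. The range comparison evaluates to 1 for each element inside the limits and 0 for each element outside them, so in-range values are divided by 1 and left unchanged, while out-of-range values are divided by 0 and become missing. A more general, and slightly slower, approach is to use the Loc() function.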

locs = Loc(!((q25 - 3.95 * (q75 - q25)) <= mBase <= (q75 + 3.95 * (q75 - q25))));

mBase[locs] = .;
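
To make the Loc() approach concrete, here is a small self-contained sketch. The sample values and the names lower, upper, and nOut are made up for illustration; only the 3.95 multiplier and the Loc() pattern come from the posts above.

```jsl
// Made-up sample: a column vector with one obviously extreme value (500).
mBase = [10, 12, 11, 13, 12, 500, 11, 10];

q25 = Quantile( 0.25, mBase );
q75 = Quantile( 0.75, mBase );
lower = q25 - 3.95 * (q75 - q25);
upper = q75 + 3.95 * (q75 - q25);

// Loc() returns the positions of the nonzero elements, here the
// positions of the values that fall outside [lower, upper].
locs = Loc( !(lower <= mBase <= upper) );
nOut = N Rows( locs );  // how many values were flagged

mBase[locs] = .;  // set every flagged position to missing in one step
Show( lower, upper, nOut, mBase );
```

With this sample, only the 500 entry falls outside the limits. Keeping locs around is what makes this version more general than the division trick: the same index list can be used to count, report, or separately store the outliers before they are overwritten.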