JMP User Community : Discussions : How to speed up a "for" loop used in outlier detec...

Jan 17, 2013 2:07 PM
(773 views)

I use the following "for" loop to detect and remove outliers from a data set. I import the column as a matrix (mBase), because this solved other problems, and then run it through the loop. Does anyone have recommendations for speeding this up? Currently, 2,000 records take 3-4 seconds, and I'm dealing with 60,000 records.

For( i = 1, i < 2000 + 1, i++,
	If(
		mBase[i] > Quantile( 0.75, mBase ) + 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )) |
		mBase[i] < Quantile( 0.25, mBase ) - 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )),
		mBase[i] = .
	)
);

Thanks in advance for any replies.

Matt

1 ACCEPTED SOLUTION

Solution

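Compute the two quartiles once, before the loop, instead of calling Quantile() four times on every iteration. This should be a lot faster:

q75 = Quantile( 0.75, mBase );
q25 = Quantile( 0.25, mBase );
// Loop over every row of mBase rather than a hard-coded 2000
For( i = 1, i <= N Rows( mBase ), i++,
	If( mBase[i] > q75 + 3.95 * (q75 - q25) | mBase[i] < q25 - 3.95 * (q75 - q25),
		mBase[i] = .
	)
);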
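To check the speed-up on your own machine, a rough timing sketch along these lines could be used. It assumes hypothetical test data (2,000 standard normal values generated with J() and Random Normal(), mirroring the question's size) rather than the original column, and uses Tick Seconds() to measure elapsed time. The names testData, n, t0, slowTime and fastTime are illustrative only; actual timings will depend on the JMP version and hardware.

```jsl
// Hypothetical test data: 2,000 standard normal values in a column vector
n = 2000;
testData = J( n, 1, Random Normal() );

// Original approach: Quantile() is called four times on every iteration
mBase = testData;
t0 = Tick Seconds();
For( i = 1, i <= n, i++,
	If(
		mBase[i] > Quantile( 0.75, mBase ) + 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )) |
		mBase[i] < Quantile( 0.25, mBase ) - 3.95 * (Quantile( 0.75, mBase ) - Quantile( 0.25, mBase )),
		mBase[i] = .
	)
);
slowTime = Tick Seconds() - t0;

// Revised approach: quartiles computed once, before the loop
mBase = testData;
t0 = Tick Seconds();
q75 = Quantile( 0.75, mBase );
q25 = Quantile( 0.25, mBase );
For( i = 1, i <= n, i++,
	If( mBase[i] > q75 + 3.95 * (q75 - q25) | mBase[i] < q25 - 3.95 * (q75 - q25),
		mBase[i] = .
	)
);
fastTime = Tick Seconds() - t0;

// Elapsed seconds for each version
Show( slowTime, fastTime );
```

Because normal data rarely falls that far outside the quartiles, this sketch measures the cost of the checks themselves rather than the effect of removing outliers.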

3 REPLIES

Jan 18, 2013 4:51 AM
(542 views)

Jan 18, 2013 10:00 AM
(542 views)

Even faster: reuse q25 and q75 from above, but operate on the entire matrix at once and avoid the loop altogether.

mBase = mBase :/ ((q25 - 3.95 * (q75 - q25)) <= mBase <= (q75 + 3.95 * (q75 - q25)));

:/ is element-wise matrix division. For example, [10 21] :/ [5 3] returns [2 7].

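The range comparison returns 1 where a value lies between the two fences and 0 where it does not. The code above takes advantage of the fact that dividing by 0 produces a missing value, so each outlier becomes missing while every other value is divided by 1 and left unchanged. A more general, though slightly slower, approach is to use the Loc() function, which returns the positions of the nonzero elements of a matrix: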

locs = Loc(!((q25 - 3.95 * (q75 - q25)) <= mBase <= (q75 + 3.95 * (q75 - q25))));

mBase[locs] = .;
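As a small, hedged illustration, the script below runs both whole-matrix approaches on a made-up nine-value column vector in which 100 is an obvious outlier, and shows the intermediate 0/1 mask. The sample values and the names lower, upper, inRange, cleaned1 and cleaned2 are hypothetical and not from the thread; like the code above, it assumes the range comparison returns a 0/1 matrix.

```jsl
// Hypothetical sample data: a 9 x 1 column vector with one obvious outlier (100)
mBase = [1, 2, 3, 4, 5, 6, 7, 8, 100];

q75 = Quantile( 0.75, mBase );
q25 = Quantile( 0.25, mBase );
lower = q25 - 3.95 * (q75 - q25);
upper = q75 + 3.95 * (q75 - q25);

// Range check: 1 where lower <= value <= upper, 0 otherwise
inRange = lower <= mBase <= upper;
Show( inRange );

// Approach 1: divide by the 0/1 mask; dividing by 0 gives a missing value
cleaned1 = mBase :/ inRange;

// Approach 2: find the positions of the 0s and set them to missing
cleaned2 = mBase;
locs = Loc( !inRange );
cleaned2[locs] = .;

Show( cleaned1, cleaned2 );
```

Both versions should leave the values 1 through 8 unchanged and replace 100 with a missing value.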