
Isolation forest and isolation trees

Isolation Forest has become one of the most popular anomaly detectors in recent years, thanks to its general effectiveness across different benchmarks and its strong scalability. It is also computationally efficient.

 

[Figure: Isolation forest compared to others (src)]


The algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between the minimum and maximum values of that feature. The implementation is therefore essentially the same as for partition trees and bootstrap trees, with the difference that the target should be uniform (that is, the tree splits on the variables while Y holds a constant value).
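To make the description above concrete, here is a minimal pure-Python sketch of an isolation forest. It is not JMP's or scikit-learn's implementation. The function names (`build_tree`, `path_length`, `anomaly_score`), the toy data, the subsample size of 64, and the 100 trees are all assumptions chosen for illustration. The score uses the usual normalization 2^(-E[h(x)] / c(n)), where c(n) is the average path length of an unsuccessful binary search tree lookup.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, rng, depth, max_depth):
    """Grow one isolation tree: random feature, random split between min and max."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        # Simplification: stop if the chosen feature has no spread in this node.
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, rng, depth + 1, max_depth),
        "right": build_tree(right, rng, depth + 1, max_depth),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands, plus c(size) for the points left unsplit in the leaf."""
    if "feature" not in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def anomaly_score(x, trees, sample_size):
    """Scores close to 1 suggest an anomaly; scores well below 0.5 suggest a normal point."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


rng = random.Random(0)
# Toy data (assumed): 200 Gaussian points plus one obvious outlier.
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]

sample_size = 64
max_depth = math.ceil(math.log2(sample_size))
trees = [
    build_tree(rng.sample(data, sample_size), rng, 0, max_depth)
    for _ in range(100)
]

outlier_score = anomaly_score((8.0, 8.0), trees, sample_size)
inlier_score = anomaly_score((0.0, 0.0), trees, sample_size)
print("outlier:", outlier_score, "inlier:", inlier_score)
```

Note that no Y column appears anywhere: the trees are grown from the X variables alone, which is why a constant target is the natural way to express this in a partition-tree framework.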

Yet, if you try this in JMP, it doesn’t work.

If JMP included Isolation Forest, in a form similar to Predictor Screening, users would have access to a powerful tool for detecting anomalies in their data. This could help them identify unusual patterns or behaviors that may warrant further investigation.

 

This paper studies how IForest works and improves upon its few limitations (i.e., extended isolation forest):

https://hal.science/hal-03537102/document

 

Scikit-learn documentation

https://scikit-learn.org/stable/modules/outlier_detection.html

 

 

4 Comments
Victor_G
Super User

Great suggestion @FN.

 

Beyond the "regular" Isolation Forest algorithm, it might also be interesting to consider the Extended Isolation Forest, as it provides more flexibility in the directions of decision boundaries (not only horizontal or vertical).

Here is an article explaining how Extended Isolation Forest works and how to use it: https://link.medium.com/TL1Lxirf9pb
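As a rough illustration of the "not only horizontal or vertical" point, the sketch below shows the kind of split an Extended Isolation Forest uses: a random hyperplane defined by a normal vector and an intercept point, with the branching test (x - p) · n <= 0. The helper names (`random_hyperplane`, `goes_left`, `extended_split`) and the toy data are hypothetical, and this is only one node's split, not a full forest.

```python
import random


def random_hyperplane(rows, rng):
    """Draw a random normal vector and an intercept inside the node's bounding box."""
    dim = len(rows[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    intercept = [
        rng.uniform(min(r[j] for r in rows), max(r[j] for r in rows))
        for j in range(dim)
    ]
    return normal, intercept


def goes_left(x, normal, intercept):
    """Branching test: (x - p) . n <= 0 sends the point to the left child."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, intercept, normal)) <= 0.0


def extended_split(rows, rng):
    """Partition the rows of one node with a single random (possibly sloped) hyperplane."""
    normal, intercept = random_hyperplane(rows, rng)
    left = [r for r in rows if goes_left(r, normal, intercept)]
    right = [r for r in rows if not goes_left(r, normal, intercept)]
    return normal, intercept, left, right


rng = random.Random(1)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
normal, intercept, left, right = extended_split(rows, rng)
print("normal:", normal, "left:", len(left), "right:", len(right))

# The regular isolation tree is the special case where the normal vector
# is aligned with one axis, e.g. normal = [1, 0] compares x[0] with p[0] only.
axis_left = goes_left((1.0, 5.0), [1.0, 0.0], [2.0, 0.0])
axis_right = goes_left((3.0, -5.0), [1.0, 0.0], [2.0, 0.0])
```

Because the normal vector has a nonzero component in every dimension, the cut is sloped rather than parallel to an axis.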

FN
Level VI

Extended trees or forests will be even better, indeed. 
Following your link, the original paper on the extended version is here: https://arxiv.org/pdf/1811.02141.pdf




mia_stephens
Staff
Status changed to: Investigating

Thank you for this request, @FN. We are currently investigating.

FN
Level VI

Thanks, Mia. Note that the implementation of an isolation tree is the same as that of a partition tree.


The "only" difference is that the partition happens randomly.

 

In JMP, if you introduce a Y without any variability (all 0s, for example), JMP doesn't perform any split (which is expected; otherwise the solutions would become stochastic at some point).

 

In the literature, trees built with random splits are known as ExtraTrees (extremely randomized trees).

https://en.wikipedia.org/wiki/Random_forest#ExtraTrees
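The point about a constant Y can be illustrated with a small sketch. This is a generic greedy variance-reduction splitter written for illustration, not JMP's actual partition code, and the function names and toy values are assumptions. With a constant Y every candidate split has zero gain, so a greedy partitioner has nothing to choose, whereas a random split never looks at Y at all.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_greedy_split(xs, ys):
    """Return (gain, threshold) for the best variance-reducing split, or None."""
    parent = sse(ys)
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        gain = parent - sse(left) - sse(right)
        if gain > 1e-12 and (best is None or gain > best[0]):
            best = (gain, t)
    return best


def random_split(xs, rng):
    """Isolation-style split: uniform between min and max of X, ignoring Y."""
    return rng.uniform(min(xs), max(xs))


xs = [0.1, 0.4, 0.5, 0.9, 5.0]
constant_y = [0, 0, 0, 0, 0]
varying_y = [1, 1, 1, 1, 10]

no_split = best_greedy_split(xs, constant_y)   # nothing to gain, so no split
some_split = best_greedy_split(xs, varying_y)  # a split exists once Y varies
threshold = random_split(xs, random.Random(0))  # always available

print(no_split, some_split, threshold)
```

An isolation-tree option would therefore only need the splitter to pick the variable and threshold at random when no Y-based gain is available.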