Patient Rule Induction Method (PRIM)

paulp · ‎03-27-2023

What inspired this wish list request?

PRIM is a data mining "bump hunting" algorithm that does not require an empirical model. Instead, it 'patiently' (incrementally) searches for hyper-boxes in the input factors that maximize the response.

What is the improvement you would like to see?

From R:

library(prim)
library(MASS)
data(Boston)
x <- Boston[,5:6]
y <- Boston[,1]
boston.prim <- prim.box(x=x, y=y, threshold.type=1)

Why is this idea important?

This is a 'bump hunting' algorithm. The idea is important because the method does not assume an empirical model relating the responses to the input variables. It searches what actually happened and finds the smallest combined range of input values that optimized the response. Such answers can be an easier sell to the intended audience because the optimal conditions are not projected to occur; they actually did occur. In this way, it is similar to a decision tree. It differs from a decision tree in that it is less greedy, hence the 'patient' in its name. It also differs because it doesn't produce a model or rules. The output is only the best box/range of inputs, the next best box, and so on.
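
To make the 'patient' search concrete, here is a minimal base R sketch of the peeling idea only. It is not the prim package's implementation: the function name peel_box and its arguments alpha and min_support are hypothetical, the data are synthetic, and the sketch simply stops when no peel raises the mean response, leaving out any refinements a full implementation may add.

```r
# Illustrative PRIM-style peeling in base R (a sketch, not the prim package's code).
# peel_box, alpha and min_support are hypothetical names chosen for this example.
peel_box <- function(x, y, alpha = 0.05, min_support = 0.1) {
  x <- as.matrix(x)
  n <- nrow(x)
  inside <- rep(TRUE, n)          # observations currently inside the box
  limits <- apply(x, 2, range)    # row 1 = lower bounds, row 2 = upper bounds
  repeat {
    best <- NULL
    for (j in seq_len(ncol(x))) {
      xj <- x[inside, j]
      # Candidate peels: trim roughly the lowest or highest alpha fraction along variable j
      lo_cut <- quantile(xj, alpha, names = FALSE)
      hi_cut <- quantile(xj, 1 - alpha, names = FALSE)
      candidates <- list(
        list(keep = inside & x[, j] >= lo_cut, var = j, row = 1, value = lo_cut),
        list(keep = inside & x[, j] <= hi_cut, var = j, row = 2, value = hi_cut)
      )
      for (cand in candidates) {
        n_keep <- sum(cand$keep)
        # Skip peels that remove nothing or shrink the box below the minimum support
        if (n_keep == sum(inside) || n_keep < min_support * n) next
        cand$mean <- mean(y[cand$keep])
        if (is.null(best) || cand$mean > best$mean) best <- cand
      }
    }
    # Simplified stopping rule: quit when no admissible peel raises the box mean
    if (is.null(best) || best$mean <= mean(y[inside])) break
    inside <- best$keep
    limits[best$row, best$var] <- best$value
  }
  list(limits = limits, inside = inside,
       mean = mean(y[inside]), support = mean(inside))
}

# Synthetic data with a 'bump' where a > 0.6 and b < 0.4
set.seed(1)
x <- data.frame(a = runif(500), b = runif(500))
y <- ifelse(x$a > 0.6 & x$b < 0.4, 1, 0) + rnorm(500, sd = 0.1)

box <- peel_box(x, y)
box$limits    # box edges found by peeling
box$mean      # mean response inside the box
box$support   # fraction of observations inside the box
```

Each step removes only a small slice (alpha) of the data from one side of one variable, which is what makes the search less greedy than a tree split. Finding "the next best box" would amount to rerunning the same search on the observations outside the first box.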

The method is being written up in industrial statistics journals as a more robust optimization method for industrial settings that have many sources of variation. It is hard to model all of these sources accurately. The PRIM algorithm can be more robust because it does not model the conditions; it merely searches them.