Wendy,
I am using your example and am able to bring up the Huber distance window.
I am looking for outliers in the column num_clk:
dt << Distribution( Continuous Distribution( Column( :num_clk ) ) );
my_original_median = Col Quantile( :num_clk, 0.5 );
my_original_mean = Col Mean( :num_clk );
:num_clk << Set Selected( 1 );
obj = dt << Explore Outliers( Robust Fit Outliers( ) ) << Huber K( 1 );
I have two questions after this:
- How can I modify the K value here? It defaults to K = 4 in my example, and I want to explore this with multiple values of K.
- How do I recalculate the mean and median of the num_clk column after excluding the data points marked as outliers (by Huber above)? I have tried the following, but I do not know how many rows I will be excluding each time:
obj << Automatic Recalc( 1 );
dt << Select Rows( 5 ) << Exclude( 1 );
Any help would be highly appreciated.
Thanks,
Reshmi