Hi
I'm trying to find a relationship between a set of predictors X and a continuous response variable that is strongly right-skewed.
The dataset contains only 140 instances.
First of all, I transformed the response variable using the function ln(y + 1), where y is the response.
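As a rough illustration of this transform, here is a minimal Python sketch. The response values are made up for the example and are not taken from my data. `math.log1p` computes ln(1 + value), and `math.expm1` is its inverse, which is needed to bring predictions back to the original scale.

```python
import math

# Hypothetical right-skewed response values (illustrative only, not real data)
y = [0.0, 3.0, 12.0, 250.0]

# ln(y + 1): compresses the long right tail and keeps 0 mapped to 0
y_log = [math.log1p(v) for v in y]

# Inverse transform, exp(value) - 1, to return to the original scale
y_back = [math.expm1(v) for v in y_log]
```

Note that any R² computed on the transformed scale is not directly comparable with an R² computed on the original scale.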
Then I normalized all the predictors to the range [0, 1], reduced the dimensionality with the screening option, and removed the correlated predictors. At the end of this process only 4 predictors remained. I ran a random forest without any weights, and the R² on the training data was very low (see the first plot below). Then I used a parabola-like weight function (see the second plot) and re-ran the random forest. The result was better, as shown in the last plot. It looks as if the data were rotated around the center of the plot. Can someone explain why?
How should the weight function be chosen?
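To make the two steps concrete, here is a minimal Python sketch of min-max scaling and of a parabola-like weight function. It is only an assumption about the general shape: the function name `parabolic_weights`, the constants `base` and `gain`, the choice of putting the minimum at the centre of the response range, and the sample values are all hypothetical, not the exact function I used.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def parabolic_weights(y, base=1.0, gain=4.0):
    """Hypothetical parabola-like weights: smallest at the centre of the
    range of y and largest at both ends (assumed shape)."""
    lo, hi = min(y), max(y)
    centre = (lo + hi) / 2
    half_range = (hi - lo) / 2 or 1.0  # avoid division by zero
    return [base + gain * ((v - centre) / half_range) ** 2 for v in y]


# Illustrative values only
x_scaled = min_max_scale([10.0, 20.0, 40.0])
y_log = [0.0, 0.5, 1.0, 1.5, 2.0]
weights = parabolic_weights(y_log)
```

With weights of this shape, observations at both ends of the response range count more in the fit than those near the centre. In scikit-learn, for example, such a list could be passed as `sample_weight` to `RandomForestRegressor.fit`; other tools have their own weight option.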