Essentially, what we are doing is accounting for over-estimation by the median/MAD. In our example data set, using the median/MAD in the winsorization process pulls the outlier point in too far.
x_orig = [4.5, 4.9, 5.6, 4.2, 6.2, 5.2, 9.9];
median and MAD estimates:
median = 5.2; sigma = 1.05 (here 1.5 × MAD, with MAD = 0.7);
winsorizing with these estimates (clipping anything beyond median + 1.5·sigma = 6.775) changes the data to:
x_firstpass = [4.5, 4.9, 5.6, 4.2, 6.2, 5.2, 6.775];
This data gives us a better approximation of the mean and sigma:
adjusted mean = 5.34; sigma = 1.04;
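A minimal sketch of this first pass (numpy assumed; the sigma = 1.5 × MAD scale and the 1.5·sigma cutoff are inferred from the numbers above, since 1.5 × 0.7 = 1.05 and 5.2 + 1.5 × 1.05 = 6.775):

```python
import numpy as np

x_orig = np.array([4.5, 4.9, 5.6, 4.2, 6.2, 5.2, 9.9])

# Robust first-pass estimates: median for location, MAD for scale.
med = np.median(x_orig)                 # 5.2
mad = np.median(np.abs(x_orig - med))   # 0.7
sigma = 1.5 * mad                       # 1.05 (1.5 * MAD is an assumption)

# Winsorize: pull anything beyond med +/- 1.5*sigma back to the cutoff.
cut_hi = med + 1.5 * sigma              # 6.775
cut_lo = med - 1.5 * sigma
x_firstpass = np.clip(x_orig, cut_lo, cut_hi)   # 9.9 -> 6.775
```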
If we use these better estimates to winsorize the original data set, 9.9 is re-clipped at the new, slightly looser cutoff:
x_secondpass = [4.5, 4.9, 5.6, 4.2, 6.2, 5.2, 6.905];
However, if we winsorize x_firstpass instead of x_orig, the code never sees the outlier point 9.9 and has nothing to adjust, because 6.775 isn't an outlier when mean = 5.34 and sigma = 1.04.
thus:
x_secondpass = [4.5, 4.9, 5.6, 4.2, 6.2, 5.2, 6.775];
The mean and sigma therefore come out the same:
adjusted mean = 5.34; sigma = 1.04;
sigma - sigma_old = 0
and the convergence test wrongly reports that the iteration has finished.
tl;dr - If you are trying to find the optimum zero point, your code needs to be robust enough to adjust in both directions: always re-winsorize from the original data so a point clipped too hard in an earlier pass can move back out as the estimates improve.
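Here is a sketch of the whole loop, contrasting the two update rules. All names are mine; the k = 1.5 cutoff and the 1.5·MAD seed are inferred from the numbers above, and the per-pass sigma is taken as the sample standard deviation (the walkthrough's 1.04 suggests a different estimator), so the exact values will drift slightly from the example, but the failure mode is the same:

```python
import numpy as np

def winsorize(data, loc, sigma, k=1.5):
    """Pull every point beyond loc +/- k*sigma back to the cutoff."""
    return np.clip(data, loc - k * sigma, loc + k * sigma)

def robust_mean(x_orig, tol=1e-6, max_iter=50, from_original=True):
    # Seed with the robust median / 1.5*MAD first pass from above.
    loc = np.median(x_orig)
    sigma = 1.5 * np.median(np.abs(x_orig - loc))
    x = winsorize(x_orig, loc, sigma)
    for _ in range(max_iter):
        sigma_old = sigma
        loc = np.mean(x)
        sigma = np.std(x, ddof=1)   # per-pass scale estimate (assumption)
        # The crucial choice: re-clip the ORIGINAL data, not the already
        # clipped copy. From x_orig, 9.9 is re-winsorized at every new
        # cutoff; from x, once no point crosses the current cutoff the
        # array freezes, sigma stops moving, and the test below fires
        # even though the outlier was never properly handled.
        x = winsorize(x_orig if from_original else x, loc, sigma)
        if abs(sigma - sigma_old) < tol:
            break
    return x, loc, sigma

x_orig = np.array([4.5, 4.9, 5.6, 4.2, 6.2, 5.2, 9.9])
print(robust_mean(x_orig, from_original=True))   # keeps re-clipping 9.9
print(robust_mean(x_orig, from_original=False))  # can stall prematurely
```

Re-clipping from x_orig also gives convergence a clean meaning: the fixed point is a cutoff that reproduces itself on the original data, rather than an array that merely stopped moving.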