Hi Hadley,
Thanks for your help. Unfortunately, your suggestion doesn't quite address my specific need.
My Goal: I need to predict several outputs that are constrained to sum to 1. This is common when modeling population proportions or mixture formulations based on their properties.
My Solution: I've implemented a centered log-ratio (CLR) transformation manually. The workflow is:
- Transform outputs to centered log space
- Fit the model on transformed variables
- Apply the inverse transformation to ensure the constraint is respected
Here's the code for the transformation:
// Adjust zeros
epsilon = 0.0000001;
p1_adj = If(:p1 == 0, epsilon, :p1);
p2_adj = If(:p2 == 0, epsilon, :p2);
pk_adj = If(:pk == 0, epsilon, :pk);
// Scaling: renormalize so the adjusted proportions sum to 1 again
sum_adj = p1_adj + p2_adj + ... + pk_adj;
p1_norm = p1_adj / sum_adj;
p2_norm = p2_adj / sum_adj;
pk_norm = pk_adj / sum_adj;
// Geometric mean
geom_mean = (p1_norm * p2_norm * ... * pk_norm)^(1/k);
// CLR transformation
y1_clr = Log(p1_norm / geom_mean);
y2_clr = Log(p2_norm / geom_mean);
yk_clr = Log(pk_norm / geom_mean);
// Inverse transformation (after prediction)
sum_exp = Exp(:y1_clr_pred) + Exp(:y2_clr_pred) + ... + Exp(:yk_clr_pred);
p1_pred = Exp(:y1_clr_pred) / sum_exp;
p2_pred = Exp(:y2_clr_pred) / sum_exp;
pk_pred = Exp(:yk_clr_pred) / sum_exp;
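For anyone who wants to sanity-check the round trip outside JMP, here is a minimal Python sketch of the same steps. The function names (clr, clr_inverse) and the example proportions are made up for illustration and are not part of my JMP table. It also computes the geometric mean as the mean of the logs rather than the explicit product, which is mathematically equivalent but less prone to underflow when epsilon is very small.

```python
import math

def clr(proportions, epsilon=1e-7):
    """Centered log-ratio transform with the same zero adjustment as the formulas above."""
    # Replace exact zeros with a small epsilon so the logarithm is defined
    adjusted = [epsilon if p == 0 else p for p in proportions]
    # Renormalize so the adjusted values sum to 1 again
    total = sum(adjusted)
    normalized = [p / total for p in adjusted]
    # Log of the geometric mean equals the mean of the logs
    log_geom_mean = sum(math.log(p) for p in normalized) / len(normalized)
    return [math.log(p) - log_geom_mean for p in normalized]

def clr_inverse(clr_values):
    """Inverse CLR: exponentiate, then rescale so the results sum to 1."""
    exps = [math.exp(v) for v in clr_values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical proportions with one zero component
example = [0.5, 0.3, 0.2, 0.0]
transformed = clr(example)
recovered = clr_inverse(transformed)

print(transformed)
print(recovered)
```

The CLR values sum to zero by construction, and the inverse returns values that sum to 1 for any inputs, so predictions from separately fitted models still respect the constraint after back-transformation.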
Additional Request: I'd also like to use the Profiler to display stacked proportions for each output, similar to how Nominal Logistic displays stacked probabilities for each label. Your solution addresses the stacking visualization, but it only predicts the probability of the highest category rather than the exact proportion of each output.
I've attached an example of the manual transformation for reference.
Best regards,