I have a discriminant analysis that I'm using to classify some materials. It works pretty well and I'm going to take it to production shortly. Some items, of course, end up being "edge cases" where slight variations can move them from one classification to the next. For instance, Sample A-1 (first sampling of stream A) ends up classed as a Gamma object, whereas Sample A-2 (second sampling of stream A) is classified as a Theta object due to small variations in their nature.
Alternatively, I have cases where the material just doesn't fit well in any of the classes but gets assigned to one anyway.
What I'm looking for is the best way to normalize a scoring metric for how well something fits its assigned class. As far as I can tell, the probability output runs through a logit function, so it saturates too quickly to be of much value here (the A-1/A-2 flip above will show ~99% for either category based on minute perturbations).
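To show what I mean by the saturation, here's a toy two-class, one-dimensional LDA (all numbers made up for illustration; with a shared variance and equal priors the posterior reduces to a sigmoid of a linear score):

```python
import math

def posterior_a(x, mu_a, mu_b, sigma2):
    # Two-class LDA, shared variance sigma2, equal priors:
    # log P(A|x)/P(B|x) = (mu_a - mu_b) * (x - midpoint) / sigma2,
    # so the posterior is a sigmoid of a score linear in x.
    logit = (mu_a - mu_b) * (x - (mu_a + mu_b) / 2.0) / sigma2
    return 1.0 / (1.0 + math.exp(-logit))

# Two samples one unit apart straddle the decision boundary
# (midpoint = 5) and get near-certain, *opposite* calls:
p1 = posterior_a(4.5, 0.0, 10.0, 1.0)  # ~0.993 -> "class A"
p2 = posterior_a(5.5, 0.0, 10.0, 1.0)  # ~0.007 -> "class B"
```

So a tiny perturbation flips the posterior from ~99% one way to ~99% the other, which is why I don't think the probability itself is a useful fit score.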
Alternatively, I've tried looking at the SQDIST parameter. The problem with this one is that it's not normalized: you can see very small SQDISTs for one tight cluster and very large SQDISTs for another, looser cluster. I tried normalizing each SQDIST to the median of its cluster, which helped but still didn't give the result I wanted. Maybe z-score them?
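Here's a 1-D sketch of the problem and of the median trick, using made-up data and a hand-rolled 1-D analogue of SQDIST (squared Mahalanobis distance with a pooled variance):

```python
# 1-D toy data: a tight cluster and a loose one
tight = [9.6, 9.8, 10.0, 10.2, 10.4]    # mean 10, small spread
loose = [26.0, 28.0, 30.0, 32.0, 34.0]  # mean 30, large spread

def mean(xs):
    return sum(xs) / len(xs)

def pooled_var(a, b):
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    return ss / (len(a) + len(b) - 2)

def sqdist(x, mu, var):
    # 1-D analogue of SQDIST: squared Mahalanobis distance
    return (x - mu) ** 2 / var

var = pooled_var(tight, loose)
d_tight = sorted(sqdist(x, mean(tight), var) for x in tight)
d_loose = sorted(sqdist(x, mean(loose), var) for x in loose)

med_tight, med_loose = d_tight[2], d_loose[2]  # medians (n = 5)
# Raw medians differ by orders of magnitude across the two clusters,
# but dividing each sample's SQDIST by its own cluster's median
# puts both clusters on a common scale.
```

With the pooled metric the loose cluster's typical distance is ~100x the tight cluster's, so raw SQDISTs aren't comparable across classes; dividing by each cluster's own median equalizes them, but as I said it still isn't quite what I want.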
Anyway, I'm just wondering whether there's already a good off-the-shelf parameter for this. Here's what I would want:
1) A normalized metric for how well the data fits its best bucket
2) A normalized metric for how well the data fits its second-best bucket
3) Enough linearity in both to be easily interpretable (no exponential decay/growth)
The normalization is really the key. My ultimate goal is to develop a control chart of new data as it is processed by the LDA, to catch outliers as they come in and to gauge the overall health of the LDA classes.
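The kind of control chart I have in mind might look like this sketch. It assumes multivariate-normal classes, in which case the squared Mahalanobis distance to the true class follows a chi-square distribution with dof = number of predictors, and its CDF gives a normalized [0, 1] score; I've hard-coded two predictors here because that case has the closed form 1 - exp(-d/2). The distances and control limit are made up:

```python
import math

def typicality(sq, dof=2):
    # Chi-square CDF of the squared Mahalanobis distance: a [0, 1]
    # "how far into the class" percentile under normality.
    # Closed form shown for dof = 2 (two predictors) only.
    assert dof == 2, "closed form implemented for 2 predictors only"
    return 1.0 - math.exp(-sq / 2.0)

CONTROL_LIMIT = 0.99  # flag points beyond the 99th percentile of their class

def flag_outliers(sqdists):
    return [d for d in sqdists if typicality(d) > CONTROL_LIMIT]

incoming = [1.2, 0.4, 11.8, 2.9]  # hypothetical SQDISTs of new samples
# typicality(11.8) = 1 - exp(-5.9) ~ 0.997, so only that sample is flagged
```

Something like this percentile is the flavor of normalized score I'm after, but I don't know if it (or something better) already exists as a standard output.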