I had conversations with Peter Bartell and he suggested I start a discussion about this concept.

I'm hoping some of the very smart people out there will have seen this result if it has been published. I am preparing a short course on logistic regression. As a mathematician I can easily visualize the OLS concept of minimizing the distance between the span of a design matrix and the response vector (the vector space approach). In logistic regression I was looking for an analogous notion of distance between a vector of probabilities and a response consisting of, say, True and False...without coding the responses as 0's and 1's. I had a great aha moment and came up with a way to restate the problem: instead of maximizing a likelihood, I could minimize a distance. The "span" of the design matrix is now a space of probability mass functions rather than a vector space, the response is yet another pmf, and this space has a natural metric on it. Interestingly, the pmf in the span that minimizes the distance to the response pmf turns out to be the same as the MLE pmf. I've run the proposition by some excellent mathematicians and the math seems sound and quite generalizable. But they were not statisticians and could not say whether the result is novel. For my part, I've searched and searched the literature and haven't seen GLM/MLE unified in this way.
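One way to see the flavor of the equivalence numerically: if the discrepancy between the response pmf and a fitted Bernoulli pmf is measured by Kullback-Leibler divergence (my own choice for illustration; the natural metric in the paper may well be a different one, e.g. Hellinger), then summed KL from the observed responses to the fitted probabilities is term-by-term the negative log-likelihood, so the two optimizations land on the same coefficients. A minimal sketch:

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical toy data (intercept column plus one predictor); the
# responses overlap, so the MLE exists and is finite
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, -0.2], [1.0, 2.0], [1.0, -1.0]])
y = np.array([0, 1, 1, 0, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_lik(beta):
    # standard logistic regression negative log-likelihood
    p = sigmoid(X @ beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def sum_kl_to_fit(beta):
    # KL(response pmf_i || fitted pmf_i) summed over observations;
    # each response pmf is degenerate, so the divergence reduces to
    # -log of the probability the model assigns to the observed outcome
    p = sigmoid(X @ beta)
    p_obs = np.where(y == 1, p, 1 - p)
    return -np.sum(np.log(p_obs))

b_mle = minimize(neg_log_lik, np.zeros(2)).x   # maximize likelihood
b_kl = minimize(sum_kl_to_fit, np.zeros(2)).x  # minimize divergence
```

Here `b_mle` and `b_kl` agree because the two objectives are algebraically identical; this only illustrates the KL case, not the general metric space argument in the paper.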

When we pose the GLM/MLE problem in a metric space setting, additional structure comes along for the ride: open and closed sets, and the notion of convergence. In particular, we can now see exactly why, in logistic regression, certain responses yield a solution with unstable coefficients. The problem is that the span of the design matrix is an open set in this metric space and the response is a point on its boundary. So although there is a sequence of pmf's in the span that converges to the response pmf, no element of the span attains it.
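This boundary behavior is easy to reproduce numerically (my own illustration, not taken from the attached paper). With perfectly separated data, gradient ascent pushes the log-likelihood toward its supremum of zero, i.e. the fitted pmf marches toward the response pmf on the boundary, while the coefficient itself diverges rather than converging:

```python
import numpy as np

# perfectly separated toy data: every x < 0 has y = 0, every x > 0 has y = 1,
# so the response pmf sits on the boundary of the model's "span"
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# gradient ascent on the log-likelihood for a single slope coefficient b
b = 0.0
for _ in range(2000):
    p = sigmoid(b * x)
    b += 0.5 * np.sum((y - p) * x)  # gradient of the log-likelihood

# the fit gets arbitrarily close to the response, but never attains it:
# the negative log-likelihood approaches 0 while b keeps growing
p = sigmoid(b * x)
nll = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```

After 2000 steps the coefficient is already large and still increasing, while `nll` is nearly zero; no finite `b` ever reaches the boundary point, which is exactly the sequence-converges-but-cannot-attain picture above.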

I've attached a short expository paper and I'd welcome any feedback.