profjmb
Level II

Changing Standard Deviation with Weights in "Distribution" Platform

This general topic has been discussed here before, but I am unsatisfied. The JMP expert posted a formula that perhaps explains what's going on, but I have a conceptual question. I would think that weighting is useful for adjusting estimates when one wants to weight some subsamples more than others. For example, if one had 100 men and 1,000 women and wanted to estimate the mean and SD of a sample with equal numbers of men and women, one could weight observations by 1100/100 and 1100/1000, respectively.
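A minimal sketch of the subsample-balancing idea above, using only the Python standard library. The data values and the helper `weighted_mean` are hypothetical and chosen so the group means are easy to check by hand; nothing here is taken from JMP.

```python
from statistics import fmean


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical data: 100 men (group mean 72) and 1,000 women (group mean 62).
men = [70.0 + (i % 5) for i in range(100)]
women = [60.0 + (i % 5) for i in range(1000)]

values = men + women
n_total = len(values)  # 1100

# Weights from the post: 1100/100 for each man, 1100/1000 for each woman,
# so each group carries the same total weight.
weights = [n_total / len(men)] * len(men) + [n_total / len(women)] * len(women)

unweighted = fmean(values)                 # dominated by the larger group
balanced = weighted_mean(values, weights)  # midpoint of the two group means

print(unweighted, balanced)
```

With these weights the balanced mean sits halfway between the two group means, while the unweighted mean is pulled toward the larger group.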

 

Here's the thing: if I start with any distribution of observations in JMP and change the weight from 1, the mean stays the same, as it should. It shouldn't matter whether I weight all observations by 1, 2, 3, or 6,000; that shouldn't change the distribution mean. But the standard deviation DOES change. Why does this make any sense? JMP experts, please don't just provide the formula showing that this happens. Why would it make sense conceptually? It seems to me that neither the mean nor the SD should change if you weight by 2 (which effectively just doubles your sample size) or 3 (which triples it). Sure, it increases the sums of squares, but SD is a function of the AVERAGE sum of squares.
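The "weight of 2 just doubles the sample" intuition can be checked directly by duplicating every observation. This is a sketch of that frequency-style reading only, with hypothetical data; it makes no claim about how JMP treats its weight column.

```python
from statistics import fmean, pstdev, stdev

# Hypothetical sample; any numbers would do.
y = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]

# "Weight 2" read as a frequency: every observation appears twice.
doubled = y * 2

print(fmean(y), fmean(doubled))    # means agree
print(pstdev(y), pstdev(doubled))  # population SD (divide by n) agrees
print(stdev(y), stdev(doubled))    # sample SD (divide by n - 1) shifts slightly
```

Under this reading the mean and the divide-by-n SD are unchanged, and the divide-by-(n - 1) SD moves only a little because the divisor becomes 2n - 1 rather than n - 1.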

 

Thank you.

Mike Bailey

p.s. I attach a file "Weightsdata" you can use to verify if you want. The variable "Weight" has a value of 1, so weighting with it shouldn't change things; "Weight 2" has a value of 2, and "Weight 3" has a value of 3.

 

3 REPLIES
MRB3855
Super User

Re: Changing Standard Deviation with Weights in "Distribution" Platform

At the risk of being accused of "just providing the formula", I'll refer you to two references for your consideration. In the end, it depends on what is meant by "weight". And any conceptual understanding can't, for me anyway, be assessed in isolation from the definition (and consequently, the formula). The first link is the formula used by JMP (guilty as charged). The second link is a more nuanced discussion that I hope illuminates. In the former, it is easy to see why the SD increases (by sqrt(weight)) as the weight increases (as you describe).

Statistical Details for Summary Statistics (jmp.com).

https://stats.stackexchange.com/questions/6534/how-do-i-calculate-a-weighted-standard-deviation-in-e...
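The sqrt(weight) scaling mentioned above can be reproduced with a short sketch. It assumes a weighted SD whose numerator is the weighted sum of squares and whose divisor is the number of rows minus one, so the divisor does not grow with the weights. That form is an assumption consistent with the behaviour described in this thread, not a copy of the JMP documentation; the function names and data are hypothetical.

```python
import math


def weighted_mean(x, w):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def sd_row_divisor(x, w):
    """Weighted SD with divisor (number of rows - 1); an assumed form."""
    m = weighted_mean(x, w)
    ss = sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w))
    return math.sqrt(ss / (len(x) - 1))


# Hypothetical sample.
y = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
base = sd_row_divisor(y, [1] * len(y))

for c in (1, 2, 3):
    w = [c] * len(y)
    ratio = sd_row_divisor(y, w) / base
    # The mean is unchanged; the SD ratio tracks sqrt(c).
    print(c, weighted_mean(y, w), ratio, math.sqrt(c))
```

A constant weight cancels in the weighted mean but multiplies the sum of squares, which is why only the SD moves under this definition.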

 


PMort3
Level I

Re: Changing Standard Deviation with Weights in "Distribution" Platform

I'd like to follow up on Mike's question.  Mike, I share your frustration with the JMP calculation.  

The formula described in the JMP documentation is inconsistent with the NIST standard that is sub-referenced in the stackexchange link; see itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf. The NIST definition normalizes the weight values, giving a consistent stdev.

A work-around to make JMP comply with the NIST standard is to create and use a column of weight values that sum to:

  • (n-1), where n is the number in the sample; or
  • n, where n is the number in a population.
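A sketch comparing a normalized weighted SD with a row-count-divisor version. The normalized form divides the weighted sum of squares by ((N' - 1) / N') * sum(w), where N' is the number of nonzero weights, which is how I understand the NIST Dataplot definition referenced above. The row-count form divides by (rows - 1) and is an assumption about the other calculation. Function names and data are hypothetical.

```python
import math
from statistics import stdev


def weighted_ss(x, w):
    """Weighted sum of squared deviations about the weighted mean."""
    m = sum(wi * xi for xi, wi in zip(x, w)) / sum(w)
    return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w))


def sd_row_divisor(x, w):
    """Divisor is (number of rows - 1); an assumed form."""
    return math.sqrt(weighted_ss(x, w) / (len(x) - 1))


def sd_normalized(x, w):
    """Divisor is ((N' - 1) / N') * sum(w), with N' the count of nonzero weights."""
    n_nonzero = sum(1 for wi in w if wi != 0)
    return math.sqrt(weighted_ss(x, w) / ((n_nonzero - 1) / n_nonzero * sum(w)))


# Hypothetical sample with a constant weight of 2.
y = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
n = len(y)
raw = [2.0] * n
rescaled = [wi * n / sum(raw) for wi in raw]  # same proportions, sums to n

print(stdev(y))                     # ordinary sample SD
print(sd_normalized(y, raw))        # unchanged by the constant weight
print(sd_row_divisor(y, raw))       # inflated by sqrt(2)
print(sd_row_divisor(y, rescaled))  # back in line after rescaling
```

In this sketch, rescaling the weights so they sum to the number of rows makes the row-count-divisor version agree with the normalized one. Whether that matches JMP's exact output is not verified here.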

As an example of the problem, run a weighted distribution using Mark's "Y" data, which is ~ normally distributed.

  • Using a weight of 1 for each datum (i.e., the JMP work-around), we get a stdev of about 0.98, which makes sense -- i.e., adding 0.98 to the mean of ~0.02 gives a value of 1.0, which fits in at about the 84th percentile of the quantile distribution (i.e., about + 1 sigma). 
  • If you use any other weight value, JMP will give a misleading value for weighted stdev.   For example, JMP gives a weighted stdev of 1.38 for Mark's data with a weight of 2 for each datum, well above the 90th percentile.   It does not make sense.  To convince yourself, just run the "test std dev" option in the distribution menu -- it will tell you that 1.38 is statistically unlikely (p value ~0.95).
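As a rough arithmetic check on the numbers in the bullets above, assuming the sqrt(weight) scaling noted earlier in the thread applies here (the symbols are introduced only for this illustration):

```latex
s_{w=2} \approx \sqrt{2}\, s_{w=1} \approx 1.414 \times 0.98 \approx 1.39
```

That is close to the reported 1.38; the 0.98 is itself a rounded value.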

-Paul

MRB3855
Super User

Re: Changing Standard Deviation with Weights in "Distribution" Platform

Hi @PMort3 : I'm sorry, but I don't see how your last bullet is relevant; what does "statistically unlikely" have to do with the definition of weighted SD?  i.e., you say the p-value is about 0.95.  And just what statistical hypothesis is that p-value for?  i.e., what is your null and alternative hypothesis?

 

And...I'm not sure what the "correct" definition is of weighted SD (seems to me that it depends on what you're trying to estimate). And, I'm reminded of a quote from the stackexchange link above that may bear repeating for our consideration: "In light of the added reference (which is not authoritative, but it is a reference) I am removing the downvote. I am not upvoting this answer, though, because calculations show the proposed weighting does not produce an unbiased estimate of anything at all (except when all weights equal 1). The real difficulty here--which is the fault of the question, not the answer--is that it's not clear what this 'weighted standard deviation' is attempting to estimate (italics mine). Without a definite estimand, there is no justification to introduce an (M−1)/M factor to 'reduce bias' (or for any other reason)."