
"Weight" in "Fit Y by X" (and perhaps generally in JMP platforms)

profjmb
Level II

I have data from several (22?) samples, which vary considerably in sample size. I understand that parameters are estimated better with larger samples, but that is not the issue I want to address here. It is that I would like to both graph distributions, and estimate parameters, as if the sample sizes didn't differ across the groups/samples. This means that large samples will need to be de-weighted and small samples up-weighted.

 

I tried to do this in "Fit Y by X," as follows: I computed a new variable, WEIGHT, as the sample size of the Group divided by the total sample size. So let's say that I have four groups:

 

Group 1: N1=100

Group 2: N2=500

Group 3: N3=200

Group 4: N4=400

Also Groups 1 and 2 are Type 1 and Groups 3 and 4 are Type 2

 

WEIGHT will be:

 

Group 1: 100/1200

Group 2: 500/1200

Group 3: 200/1200

Group 4: 400/1200

 

If I do "Fit Y by X" where Y is the DV and X is "Type" and weight by WEIGHT, then use "Compare Densities" to get a plot of overlapping densities, it looks right. However, if I continue and get the standard deviations for the DV, these are much larger than the standard deviations for any of the groups. This makes me think that I am not understanding what I'm doing. 

 

Help?

profjmb
Level II


Re: "Weight" in "Fit Y by X" (and perhaps generally in JMP platforms)

I made a mistake in my original post, which I would delete if I knew how. The correct version is below:

 

I have data from several (22?) samples, which vary considerably in sample size. I understand that parameters are estimated better with larger samples, but that is not the issue I want to address here. It is that I would like to both graph distributions, and estimate parameters, as if the sample sizes didn't differ across the groups/samples. This means that large samples will need to be de-weighted and small samples up-weighted.

 

I tried to do this in "Fit Y by X," as follows: I computed a new variable, WEIGHT, as the total sample size divided by the sample size of the Group. So let's say that I have four groups:

 

Group 1: N1=100

Group 2: N2=500

Group 3: N3=200

Group 4: N4=400

Also Groups 1 and 2 are Type 1 and Groups 3 and 4 are Type 2

 

WEIGHT will be:

 

Group 1: 1200/100

Group 2: 1200/500

Group 3: 1200/200

Group 4: 1200/400
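
To make the effect of these weights concrete, here is a small Python sketch (Python rather than JSL, only to show the arithmetic). It assumes the weight is applied per row, so each group's total weight is its row count times its row weight. The variable names are made up for illustration.

```python
# Group sizes from the example above.
sizes = {"Group 1": 100, "Group 2": 500, "Group 3": 200, "Group 4": 400}
total = sum(sizes.values())  # 1200 rows in all

# Per-row weight: total sample size divided by the group's sample size.
weights = {g: total / n for g, n in sizes.items()}

# Total weight carried by each group: n rows, each with weight total / n.
group_weight_sums = {g: sizes[g] * weights[g] for g in sizes}

for g in sizes:
    print(g, "row weight =", weights[g], "summed weight =", group_weight_sums[g])
```

Every group ends up carrying the same total weight, which is the intended equalization. Note, though, that the row weights are all greater than 1 and sum to 4 * 1200 over the whole table.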

 

If I do "Fit Y by X" where Y is the DV and X is "Type" and weight by WEIGHT, then use "Compare Densities" to get a plot of overlapping densities, it looks right. However, if I continue and get the standard deviations for the DV, these are much larger than the standard deviations for any of the groups. This makes me think that I am not understanding what I'm doing. 

 

Help?

Georg
Level VII


Re: "Weight" in "Fit Y by X" (and perhaps generally in JMP platforms)

I think this post may help you:

Solved: Weighted Standard Deviation - JMP User Community

The following script may also help to show what can happen.

 

With your approach, the calculation of the mean works whichever role you use for the column (Weight or Freq).

I defined the weight differently from you: I wanted the weights to sum to 1200 over all rows (yours sum to 4*1200).

This should not matter, and with the column in the Freq role it does not, but with the column in the Weight role it does. See the script.

Unfortunately, I cannot explain exactly why the small dataset gets a very large stddev in comparison to the total average. It's perhaps due to the square and the root ...
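
One possible explanation, offered as an assumption rather than something confirmed in this thread: suppose the Weight role computes the variance as the weighted sum of squares divided by (number of rows - 1), while the Freq role divides by (sum of frequencies - 1). Then a constant weight w within a group multiplies the standard deviation by sqrt(w) in the Weight role and leaves it nearly unchanged in the Freq role. The Python sketch below (Python rather than JSL, only to show the arithmetic) uses hypothetical helper names `sd_weight_role` and `sd_freq_role`; neither is a JMP function, and the two denominators should be checked against the JMP summary statistics help page.

```python
import math
import random


def weighted_mean(y, w):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def sd_weight_role(y, w):
    """ASSUMED Weight-role convention: divide by (number of rows - 1)."""
    m = weighted_mean(y, w)
    ss = sum(wi * (yi - m) ** 2 for wi, yi in zip(w, y))
    return math.sqrt(ss / (len(y) - 1))


def sd_freq_role(y, w):
    """ASSUMED Freq-role convention: divide by (sum of frequencies - 1)."""
    m = weighted_mean(y, w)
    ss = sum(wi * (yi - m) ** 2 for wi, yi in zip(w, y))
    return math.sqrt(ss / (sum(w) - 1))


random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(100)]  # one group of 100 rows
plain_sd = sd_weight_role(y, [1.0] * len(y))       # ordinary sample SD

# 3 = 1200/4/100 (weight used in the script), 12 = 1200/100 (weight in the question)
for w in (3.0, 12.0):
    row_weights = [w] * len(y)
    print(
        "w =", w,
        "| Weight-role SD / plain SD =", sd_weight_role(y, row_weights) / plain_sd,
        "| Freq-role SD / plain SD =", sd_freq_role(y, row_weights) / plain_sd,
    )
```

Under these assumptions, a row weight of 12 (1200/100) would inflate a group's standard deviation by sqrt(12), roughly 3.5 times, which would be consistent with the symptom described in the question. It would also explain why rescaling all weights by a constant matters for the Weight role but hardly at all for the Freq role.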

 

I personally would not use your approach, because it is not clear what is happening. I would do the group summary first, then average over the groups with each group weighted 1, and then combine that result into your graph.
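
One reading of that suggestion, sketched in Python (again only to show the arithmetic, not a JMP workflow): summarize each group separately, then average the group means within each Type so that every group counts once regardless of its size. The data here are randomly generated for illustration, and the reply does not say how the group standard deviations should be combined, so the sketch stops at the means.

```python
import random
import statistics

random.seed(2)
sizes = {"Group 1": 100, "Group 2": 500, "Group 3": 200, "Group 4": 400}
types = {"Group 1": "Type 1", "Group 2": "Type 1",
         "Group 3": "Type 2", "Group 4": "Type 2"}

# Made-up DV values, purely for illustration.
data = {g: [random.gauss(0.0, 1.0) for _ in range(n)] for g, n in sizes.items()}

# Step 1: summarize each group on its own (unweighted).
summary = {
    g: {"n": len(v), "mean": statistics.mean(v), "sd": statistics.stdev(v)}
    for g, v in data.items()
}

# Step 2: average the group means within each Type, each group weighted 1.
type_means = {}
for t in sorted(set(types.values())):
    group_means = [summary[g]["mean"] for g in sizes if types[g] == t]
    type_means[t] = statistics.mean(group_means)

for g, s in summary.items():
    print(g, s)
print("Equal-weight Type means:", type_means)
```

The equal-weight Type mean differs from the pooled mean of all rows in a Type whenever the groups differ in both size and mean, which is the situation the original question is about.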

 

Names Default To Here( 1 );
// about the role of weight and frequency for calculation of mean and stddev
//
// web("https://www.jmp.com/support/help/en/16.1/?os=win&source=application&utm_source=helpmenu&utm_medium=application#page/jmp/summary-statistics.shtml");
//
nelem_lst = {100, 500, 200, 400};
table_lst = {};
For Each( {value, index}, nelem_lst,
	Eval(
		Eval Expr(
			table_lst[index] = New Table( "Table " || Char( index ),
				add rows( nelem_lst[index] ),
				New Column( "Group", "Character", set each value( "Group " || Char( index ) ) ),
				New Column( "Type", "Character", set each value( If( index <= 2, "Type 1", "Type 2" ) ) ),
				New Column( "DV", "Continuous", formula( Random Normal( Expr( Mod( index, 2 ) ), Expr( Mod( index, 2 ) + 1 ) ) ) )
			)
		)
	);
	Wait( 0.1 );
	table_lst[index]:DV << delete formula;
);
Wait( 0 );
dt = table_lst[1] << concatenate( Table Name( "All" ), table_lst[2 :: 4] );
For Each( {value}, table_lst, Close( value, "NoSave" ) );

Summarize( dt, group_lst = by( :group ) );
ngroups = N Items( group_lst );

dt << New Column( "ColMean[Group]", formula( Col Mean( :DV, :group ) ) );
dt << New Column( "ColStd[Group]", formula( Col Std Dev( :DV, :group ) ) );
Eval( Eval Expr( dt << New Column( "weight[Group]", formula( Col Number( :DV ) / Expr( ngroups ) / Col Number( :DV, :group ) ) ) ) );

nw = New Window( "oneway comparison",
	H List Box(
		Panel Box( "w/o weight",
			dt << Oneway( Y( :DV ), X( :Group ),  Means and Std Dev( 1 ), Mean Error Bars( 1 ), Std Dev Lines( 1 ) );
		),
		Panel Box( "weight in role frequency",
			dt << Oneway(
				Y( :DV ),
				X( :Group ),
				Freq( :"weight[Group]"n ),
				Means and Std Dev( 1 ),
				Mean Error Bars( 1 ),
				Std Dev Lines( 1 )
			);

		),
		Panel Box( "weight in role weight",
			dt << Oneway(
				Y( :DV ),
				X( :Group ),
				Weight( :"weight[Group]"n ),
				Means and Std Dev( 1 ),
				Mean Error Bars( 1 ),
				Std Dev Lines( 1 )
			);

		)
	)
);

nw = New Window( "Tabulate comparison",
	H List Box(
		Panel Box( "w/o weight",
			dt << Tabulate(
				Change Item Label( Grouping Columns( :Type( "All" ), "All" ) ),
				Show Control Panel( 0 ),
				Add Table(
					Column Table( Statistics( N ) ),
					Column Table( Analysis Columns( :"weight[Group]"n ), Statistics( Sum ) ),
					Column Table( Analysis Columns( :DV ), Statistics( Mean ) ),
					Column Table( Statistics( Std Dev ), Analysis Columns( :DV ) ),
					Row Table( Grouping Columns( :Type, :Group ), Add Aggregate Statistics( :Type, :Group ) )
				)
			)
		),
		Panel Box( "weight in role frequency",
			dt << Tabulate(
				Change Item Label( Grouping Columns( :Type( "All" ), "All" ) ),
				Freq( :"weight[Group]"n ),
				Show Control Panel( 0 ),
				Add Table(
					Column Table( Statistics( N ) ),
					Column Table( Analysis Columns( :"weight[Group]"n ), Statistics( Sum ) ),
					Column Table( Analysis Columns( :DV ), Statistics( Mean ) ),
					Column Table( Statistics( Std Dev ), Analysis Columns( :DV ) ),
					Row Table( Grouping Columns( :Type, :Group ), Add Aggregate Statistics( :Type, :Group ) )
				)
			)
		),
		Panel Box( "weight in role weight",
			dt << Tabulate(
				Change Item Label( Grouping Columns( :Type( "All" ), "All" ) ),
				weight( :"weight[Group]"n ),
				Show Control Panel( 0 ),
				Add Table(
					Column Table( Statistics( N ) ),
					Column Table( Analysis Columns( :"weight[Group]"n ), Statistics( Sum ) ),
					Column Table( Analysis Columns( :DV ), Statistics( Mean ) ),
					Column Table( Statistics( Std Dev ), Analysis Columns( :DV ) ),
					Row Table( Grouping Columns( :Type, :Group ), Add Aggregate Statistics( :Type, :Group ) )
				)
			)
		)
	)
);

dt << Summary( Group( :Group, :Type, :"ColMean[Group]"n, :"ColStd[Group]"n, :"weight[Group]"n ), Freq( "None" ), Weight( "None" ) );
Georg


Re: "Weight" in "Fit Y by X" (and perhaps generally in JMP platforms)

I don't think parameter estimation requires equal sample sizes or normalization.