Calculating a p-value in a script
Hi All,
I know I am doing something wrong here, but I can't see it at the moment. I want the two-tailed p-value for a difference in two means and am using the following script:
dt5 << New Column(" t-Value", Numeric, Continuous, Format( "Fixed Dec", 15, 3), Formula( t Quantile (0.95,:"N(Data) VPP"n + :"N(Data) Commercial"n -2)));
dt5 << New Column( "p-Value", Numeric, Continuous, Format( "Fixed Dec", 15, 3), Formula( t Distribution (:"T-Value"n,:"N(Data) VPP"n + :"N(Data) Commercial"n - 1 )));
Can anyone please offer some advice?
Thanks,
Mick.
Accepted Solutions
Re: Calculating a p-value in a script
There are a couple of things here:
- The script is calculating the p-value based on the critical t-value (calculated on the first line) instead of the observed t-value. The latter is the difference in means divided by the standard error of the difference.
- The t Distribution() call is calculating the proportion of the distribution less than the t-value, which isn't quite the p-value you're looking for. Calculating the two-tailed p-value requires some simple arithmetic from here (see example).
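For reference, the arithmetic behind both points can be written out as follows. This is only a sketch, assuming a pooled (equal-variance) two-sample t-test; the symbols $n_A$, $n_B$, $\bar{x}_A$, $\bar{x}_B$, $s_A$, $s_B$ are notation introduced here for the two groups' sample sizes, means, and standard deviations, not names from the original script.

```latex
\begin{align*}
s_p &= \sqrt{\frac{(n_A - 1)\,s_A^2 + (n_B - 1)\,s_B^2}{n_A + n_B - 2}} \\
t_{\text{obs}} &= \frac{\bar{x}_A - \bar{x}_B}{s_p \sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}} \\
p_{\text{two-tailed}} &= 2\,\bigl(1 - F_{\nu}(\lvert t_{\text{obs}} \rvert)\bigr),
\qquad \nu = n_A + n_B - 2
\end{align*}
```

Here $F_{\nu}$ is the cumulative distribution function of the t distribution with $\nu$ degrees of freedom, which is the quantity t Distribution() returns. Note that under this pooled setup the degrees of freedom are $n_A + n_B - 2$ in both the critical-value and the p-value calculation.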
If we can assume the data table has at minimum the by-group sample sizes, means, and standard deviations in their own columns, then here's some JSL that'll do what you want. Try it out on the attached example table. Note that I've included an intermediate step of calculating a pooled standard deviation column to improve readability of the formula for the t-value column.
Names Default to Here( 1 );
dt = Data Table( "Example.jmp" );

// Pooled standard deviation of the two groups (assumes equal variances)
dt << New Column( "Pooled stdev",
    Numeric,
    Continuous,
    Format( "Fixed Dec", 15, 3 ),
    Formula(
        Sqrt( ((:Group A n - 1) * :Group A stdev ^ 2 + (:Group B n - 1) * :Group B stdev ^ 2) / (:Group A n + :Group B n - 2) )
    )
);

// Observed t-value: difference in means divided by its standard error.
// The column name has no leading space so that :"t-Value"n below matches it.
dt << New Column( "t-Value",
    Numeric,
    Continuous,
    Format( "Fixed Dec", 15, 3 ),
    Formula(
        (:Group A mean - :Group B mean) / (:Pooled stdev * Sqrt( 1 / :Group A n + 1 / :Group B n ))
    )
);

// Two-tailed p-value: twice the upper-tail area beyond |t|
dt << New Column( "p-Value",
    Numeric,
    Continuous,
    Format( "Fixed Dec", 15, 3 ),
    Formula(
        (1 - t Distribution( Abs( :"t-Value"n ), :Group A n + :Group B n - 2 )) * 2
    )
);
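If you want to sanity-check the column formulas above without a data table, the same steps can be run on plain scalar values. The following is a minimal sketch; the variable names and the summary statistics are made up purely for illustration and do not come from the attached example table.

```jsl
Names Default To Here( 1 );

// Hypothetical summary statistics for two groups (illustrative values only)
nA = 12;      nB = 15;
meanA = 10.4; meanB = 9.1;
sdA = 2.0;    sdB = 2.5;

// Degrees of freedom for the pooled two-sample t-test
df = nA + nB - 2;

// Pooled standard deviation and standard error of the difference in means
pooledSD = Sqrt( ((nA - 1) * sdA ^ 2 + (nB - 1) * sdB ^ 2) / df );
se = pooledSD * Sqrt( 1 / nA + 1 / nB );

// Observed t-value (keeps its sign) and two-tailed p-value
tObs = (meanA - meanB) / se;
pTwo = 2 * (1 - t Distribution( Abs( tObs ), df ));

// Print the results to the log
Show( df, pooledSD, se, tObs, pTwo );
```

Running this in a script window writes the intermediate values to the log, so each one can be compared against a hand calculation before the formulas are trusted in a table.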
JMP Academic Ambassador
Re: Calculating a p-value in a script
Hi Ross,
Thanks for your reply. How does this look?
dt5 << New Column( "Difference in Mean", Numeric, Continuous, Format( "Fixed Dec", 15, 3 ), Formula( :"Mean(Data) VPP"n - :"Mean(Data) Commercial"n ) );
dt5 << New Column( "Pooled Std Dev", Numeric, Continuous, Format( "Fixed Dec", 15, 3 ), Formula( sqrt(((:"N(Data) VPP"n - 1)* :"Std Dev(Data) VPP"n ^ 2 + (:"N(Data) Commercial"n - 1) * :"Std Dev(Data) Commercial"n ^ 2 ) / (:"N(Data) VPP"n + :"N(Data) Commercial"n - 2)))) ;
dt5 << New Column( "Lower 90% Confidence Interval", Numeric, Continuous, Format( "Fixed Dec", 15, 3 ), Formula( (:"Mean(Data) VPP"n - :"Mean(Data) Commercial"n ) - t Quantile (0.95,:"N(Data) VPP"n + :"N(Data) Commercial"n -2) * (:Pooled Std Dev * sqrt(( 1/:"N(Data) VPP"n) + (1/:"N(Data) Commercial"n )) )));
dt5 << New Column( "Upper 90% Confidence Interval", Numeric, Continuous, Format( "Fixed Dec", 15, 3), Formula( (:"Mean(Data) VPP"n - :"Mean(Data) Commercial"n ) + t Quantile (0.95,:"N(Data) VPP"n + :"N(Data) Commercial"n -2) * (:Pooled Std Dev * sqrt(( 1/:"N(Data) VPP"n) + (1/:"N(Data) Commercial"n )) )));
dt5 << New Column( "Pooled Variance", Numeric, Continuous, Format( "Fixed Dec", 15, 3), Formula(:Pooled Std Dev ^ 2));
dt5 << New Column(" Std Error for t-Value", Numeric, Continuous, Format( "Fixed Dec", 15, 3), Formula(sqrt (:Pooled Variance * ( 1/:"N(Data) VPP"n + 1/ :"N(Data) Commercial"n ) )));
dt5 << New Column(" t-stat", Numeric, Continuous, Format( "Fixed Dec", 15, 3), Formula((Abs (:"Mean(Data) VPP"n - :"Mean(Data) Commercial"n - 0)) / :"Std Error for t-Value"n));
dt5 << New Column( "one tail p-Value", Numeric, Continuous, Format( "Fixed Dec", 15, 4), Formula(1 - t Distribution (:"t-stat"n,:"N(Data) VPP"n + :"N(Data) Commercial"n - 2 )));
dt5 << New Column( "two tail p-value", Numeric, Continuous, Format( "Fixed Dec", 12, 4 ),Formula( :"one tail p-Value"n * 2 ));
Re: Calculating a p-value in a script
I haven't gone through each equation in full detail so definitely double-check everything yourself too. But with that said, I only caught one thing you'll probably want to change: The formula for the t-statistic takes the absolute value of the difference in means, which means that your t-statistic will always come out positive when it should be negative any time the VPP mean is lower than the Commercial mean. I used the Abs() function in my example only to ensure that the p-value would be calculated correctly regardless of the sign of the t-statistic. You'll probably want to do the same.
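To illustrate why Abs() belongs in the p-value step rather than in the t-statistic, here is a small scalar sketch. The t-value and degrees of freedom are made-up numbers chosen only to show the effect of the sign.

```jsl
Names Default To Here( 1 );

// Hypothetical observed t-value (negative: first mean is below the second)
tObs = -2;
df = 20;

// Without Abs(): the upper-tail area of a negative t is above 0.5,
// so doubling it gives a "p-value" greater than 1
pWrong = 2 * (1 - t Distribution( tObs, df ));

// With Abs() applied only here: a valid two-tailed p-value,
// while tObs itself keeps its sign for reporting
pTwo = 2 * (1 - t Distribution( Abs( tObs ), df ));

Show( tObs, pWrong, pTwo );
```

The design choice is to keep the sign on the reported t-statistic, since it tells you the direction of the difference, and to take the absolute value only at the point where the tail area is computed.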
JMP Academic Ambassador
Re: Calculating a p-value in a script
Thanks again, Ross. Much appreciated.
Re: Calculating a p-value in a script
Thanks for the solution.