Solved: Re: Back transform of Student t distribution transformation

Mickyboy · Sep 26, 2024 12:18 AM

Hi All,

I had to transform a variable to a student t distribution, which is nice and easy in JMP using Analyze << Distribution << Selecting your variable then clicking the red arrow and using the continuous fit selection. Is the back transformation just as easy? Can someone explain the correct menu selections for this?

Thanks

MRB3855 · Sep 26, 2024 04:31 AM

Hi @Mickyboy : There is no transformation being carried out; the options you've identified ("Analyze << Distribution << Selecting your variable then clicking the red arrow and using the continuous fit selection") are not transforming your data in any way. Rather, they just check whether the distribution selected (t distribution in your case) is plausible as the parent population for your data. Or am I misunderstanding your question?

Mickyboy · Sep 26, 2024 08:14 PM

Hi MRB3855,

Thanks for the reply. You are misunderstanding slightly, but it could be the way I am explaining it. I am transforming, as described above. From the continuous fit selection I get compare distributions and the best fit distribution, and let's say in this case it's student t. I can tick normal, go to the fitted normal distributions, click on the red arrow, select save columns, and get an easy student t transformation of my variable, nice and easy. I am just wondering if there is an easy way of then back transforming from the student t transformation. Hope this makes more sense. If you are familiar with R, they have a procedure called emmeans that identifies the transformation and can back transform easily.

MRB3855 · Sep 27, 2024 06:17 AM

Hi @Mickyboy : I read up on emmeans and the documentation I found (https://aosmith.rbind.io/2019/03/25/getting-started-with-emmeans/#back-transforming-results) says

" Since I used a log transformation I can express the results as multiplicative differences in medians on the original (data) scale.

We can always back-transform estimates and CI limits by hand, but in emmeans() we can use the type argument for this. Using type = "response" will return results on the original scale. This works when the transformation is explicit in the model (e.g., log(resp)) and works similarly for link functions in generalized linear models."

More here: https://cran.r-project.org/web/packages/emmeans/emmeans.pdf

So...you are using the t distribution function (pdf or cdf ?) as a transformation of "Y" in some linear model? And then you want to back transform the confidence intervals to get back to the native scale? If that is correct, what led you to use that transformation? I ask because it's not clear to me that you are actually doing what you think you are doing. Transformations are usually used so that the normality assumption of traditional linear models is met. Sorry if I'm still misunderstanding...

Mickyboy · Sep 30, 2024 02:25 AM

Hi @MRB3855,

Again, thanks for your reply. Don't get fixated on what the transformation is or why; I want to know if there is an easy method to back transform, as easy as the transformation procedure described above. Please see below: I used student t because of the below, but it could be any of the transformations you can see below. Transforming this variable is nice and easy as per the steps above, so I can transform the variable and do what I need to do. Let's say I wanted the median, GMT, std error, confidence intervals around the mean, and prediction intervals: in theory I could add these to the bottom of the list of transformed values and back transform them all, if there is an easy way of doing it. Again, this is a general question: is there a procedure as simple and easy as transforming (again, not specific to any transformation) to back transform?

MRB3855 · Sep 30, 2024 2:42 AM

Hi @Mickyboy : OK. I still worry that you don't quite understand what you are doing. But, I'll leave that alone as you don't want to get "fixated on what the transformation is or why".

If you want to back-transform, in general, you will have to do the algebraic manipulations to solve for what you want. And in your case of the t-distribution, that will depend on which function you used: the CDF or the PDF (the PDF is the Density Formula, the CDF is the Distribution Formula).

If PDF then you will have to manually solve the density (link below) for x.

https://en.wikipedia.org/wiki/Student%27s_t-distribution#Location-scale_transformation

https://www.jmp.com/support/help/en/18.0/#page/jmp/statistical-details-for-continuous-fit-distributi...

The problem you will run into is the PDF is not a one-to-one function (https://en.wikipedia.org/wiki/Injective_function) as it is symmetric around the location parameter; that means for a given value of the PDF, there will be two solutions for x. For example, let's suppose location = 100. Then [110-location]^2 = [90-location]^2 = 100. So, the PDF will have the same value for 90 as it does for 110; so, how would you choose between the two when transforming from the PDF back to x?
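To see the symmetry numerically, here is a minimal JSL sketch. The values of location, scale, and df are made up for illustration (they are not from any fit in this thread), and the location-scale density is written as the standard t density of the standardized value divided by scale.

```jsl
Names Default To Here( 1 );

// Assumed (hypothetical) fitted parameters, for illustration only
location = 100;
scale = 5;
df = 4;

// Location-scale t density: standard t density of (x - location)/scale, divided by scale
dens90 = t Density( (90 - location) / scale, df ) / scale;
dens110 = t Density( (110 - location) / scale, df ) / scale;

// The two values agree, so the density alone cannot tell 90 from 110
Show( dens90, dens110 );
```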

If CDF you will have to solve the integral (from -infinity to t) of the PDF for t. If this is the case, you can use the "t quantile" function.

t Quantile( p, df ) * scale + location
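As a sanity check on the formula above, a short JSL round trip might look like the following. It assumes the saved Distribution Formula is the location-scale t CDF; the values of location, scale, df, and x are made up for illustration and should be replaced by the estimates from your own fit.

```jsl
Names Default To Here( 1 );

// Assumed (hypothetical) fitted parameters, for illustration only
location = 100;
scale = 5;
df = 4;

x = 110; // a value on the original scale

// Forward: CDF of the location-scale t distribution
p = t Distribution( (x - location) / scale, df );

// Back: quantile function undoes the CDF
xBack = t Quantile( p, df ) * scale + location;

// xBack should match x up to floating-point error
Show( p, xBack );
```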

txnelson · Sep 30, 2024 05:58 AM

I have calculated back transforms in the past using successive approximations with the original transform formula. Here is an example of back transforming the mean of the new distribution to the original data values:

Names Default To Here( 1 );
// Open the sample data table
dt = Open( "$SAMPLE_DATA/Semiconductor Capability.jmp" );

theColumn = "NPN1";

Distribution(
	Continuous Distribution(
		Column( as column(theColumn) ),
		Process Capability( Use Column Property Specs ),
		Fit Student's t(save distribution formula)
	),
	SendToReport(
		Dispatch( {"NPN1"}, "Process Capability", OutlineBox, {Close( 1 )} )
	)
);

// Find the back transform of the calculated mean of the transformed distribution
theTargetCol = Column( dt, N Cols( dt ) ) << get name;
theFormula = Char( As Column( dt, theTargetCol ) << get formula );
theTarget = Col Mean( Column( dt, theTargetCol ) );

// Change the reference to the original column used in the transformation to _X_
// so it can be used in the back transform
Substitute Into( theFormula, ":"|| theColumn, "_X_" );
// The formula needs to create a memory variable to allow for comparison.
// Add the resulting variable name to be the resulting variable when the
// formula is executed
theFormula = "theResult = " || theFormula || ";";

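// Successive Approximation (bisection) will be used to get the back transform
// Use the min and max of the original column to start the approximations
// Note: the comparisons below assume the saved formula increases with _X_,
// which holds for a distribution (CDF) formula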
theMax = Col Max(as column(theColumn) );
theMin = Col Min( as column(theColumn)  );

// Loop through the approximations for up to 100 loops to find the value
For( i = 1, i <= 100, i++,
	_X_ = Mean( theMax, theMin );
	Eval( Parse( theFormula ) );
	If(
		theResult > theTarget, theMax = _X_,
		theResult < theTarget, theMin = _X_,
		Break()
	);
);

// Display the findings in the log
Show( theTarget, theResult, _X_, i );

Jim
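For the t distribution specifically, the successive-approximation idea can be cross-checked against the closed-form quantile. The sketch below is not part of the script above: it bisects the location-scale t CDF directly, adds a stopping tolerance, and compares the answer with t Quantile. The parameter values, the target probability, the tolerance, and the search bounds are all assumptions chosen for illustration.

```jsl
Names Default To Here( 1 );

// Assumed (hypothetical) fitted parameters and target probability
location = 100;
scale = 5;
df = 4;
target = 0.75;

// Search bounds wide enough to bracket the answer, and a stopping tolerance
lo = location - 50 * scale;
hi = location + 50 * scale;
tol = 1e-10;

// Bisection: the CDF is increasing in x, so halve the bracket each pass
For( k = 1, k <= 200, k++,
	x = (lo + hi) / 2;
	p = t Distribution( (x - location) / scale, df );
	If( Abs( p - target ) <= tol, Break() );
	If( p > target, hi = x, lo = x );
);

// Closed-form back transform for comparison
closedForm = t Quantile( target, df ) * scale + location;

Show( x, closedForm, k );
```

The bisection route is the more general one, since it only needs the forward formula; the quantile route is shorter whenever an inverse function exists for the fitted distribution.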

Mickyboy · Sep 30, 2024 08:02 PM

Thank you both @txnelson and @MRB3855 ,

@txnelson thanks very much for that script, that was amazing, and I will use it very much in the future.

@MRB3855, the reason I asked that question was because of the output from JMP for compare distributions in my previous post. I haven't heard of a lot of these distributions (Johnson Su, SHASH, Cauchy, Gamma, Weibull), and I thought it's one thing to transform your variable to something that approaches a normal distribution, and it's very easy to do in JMP as highlighted above, but how do you back transform? I thought there might be a nice and easy way to back transform that I wasn't aware of, that's all.

Thank you both for taking the time to respond

MRB3855 · Oct 1, 2024 3:19 AM

Hi @Mickyboy : I understand...and apologies if I'm not making myself clear and/or coming across as somehow argumentative; my only intention here is to offer some guidance where it may prove helpful. That said, I'll respond to your above comment "...I thought it's one thing to transform your variable to something that approaches a normal distribution, and it's very easy to do in JMP as highlighted above, but how do you back transform."

Yes, there are good reasons to transform your variable to something that is approximately normal. Indeed. But, using any of these distribution functions will not, by definition, do that.

In your case, you applied either the PDF or CDF (see my post above) function to your variable (let's call it X).

1. If you used PDF: All this does is create the t distribution curve (the blue line in pic below). So, for each X (x-axis) you will get a point (y-axis) on the blue curve if you apply this function.

2. If you used CDF: All this does is calculate area under the curve (probability) of the PDF. In the example below, the area under the blue line to X=65.104 is 0.439265. So, Prob(X<65.104) = 0.439265 if X has a t-distribution (given your values of Location, Scale, and DF). So, for each X you will get a probability if you apply this function.

So, whichever function you choose, PDF or CDF, it can't result in a normal distribution (or t-distribution); all it does is assume your data is from a t-distribution and then generate a curve and/or calculate probabilities. In fact, the CDF applied to X (no matter what the continuous distribution of X is, provided it is X's own CDF) has a Uniform Distribution on (0, 1) (the special case a = b = 1 of the Beta Distribution), but that is another topic for perhaps another day.

https://en.wikipedia.org/wiki/Inverse_transform_sampling

https://math.stackexchange.com/questions/1564584/prove-uniform-distribution#:~:text=Proof%3A%20In%20....
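The uniformity of the CDF-transformed variable follows in one line. This sketch assumes X is continuous with a strictly increasing CDF F, so that the inverse F^{-1} exists; the symbols U and u are introduced here only for illustration.

```latex
U = F(X), \qquad
P(U \le u) = P\bigl(F(X) \le u\bigr)
           = P\bigl(X \le F^{-1}(u)\bigr)
           = F\bigl(F^{-1}(u)\bigr)
           = u, \qquad 0 < u < 1 .
```

Since P(U <= u) = u is the CDF of the Uniform(0, 1) distribution, applying the fitted CDF to the data pushes it toward uniform, not toward normal or t.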