
## rank transformation

Hello,
I want to compare the degree of defoliation at different stations. I sampled three trees per station, and for each tree I sampled at the bottom, the middle and the top of the crown. I wanted to use a two-way ANOVA to see whether there is a difference between stations and between levels in the tree, but my problem is that the variances are highly unequal.
So I was wondering how I could do a rank transformation on my data using JMP? Then I would do a regular two-way ANOVA.
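Before picking a remedy, it can help to put a number on "highly unequal". One common check is the Brown-Forsythe test: a one-way ANOVA F statistic computed on each observation's absolute deviation from its own group's median. The Python below is only a minimal sketch of that calculation, not JMP's implementation; the function name `brown_forsythe_f` is hypothetical, and it returns only the F statistic (the p-value comes from an F distribution with k - 1 and N - k degrees of freedom, which this sketch does not compute).

```python
from statistics import mean, median


def brown_forsythe_f(groups):
    """Brown-Forsythe statistic: one-way ANOVA F on |y - group median|.

    `groups` is a list of lists of numbers, one list per group
    (for example one per station, or one per crown level).
    Returns F with (k - 1, N - k) degrees of freedom.
    """
    # Absolute deviation of every value from its own group's median
    devs = [[abs(y - median(g)) for y in g] for g in groups]
    k = len(devs)
    n_total = sum(len(d) for d in devs)
    grand_mean = sum(sum(d) for d in devs) / n_total
    group_means = [mean(d) for d in devs]
    # Between-group and within-group sums of squares of those deviations
    ss_between = sum(len(d) * (m - grand_mean) ** 2
                     for d, m in zip(devs, group_means))
    ss_within = sum((z - m) ** 2
                    for d, m in zip(devs, group_means) for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

For example, `brown_forsythe_f([[1, 2, 3], [10, 20, 30]])` is about 3.21, while two groups with identical spread, such as `[[1, 2, 3], [11, 12, 13]]`, give 0.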


Community Trekker

Joined:

Jun 23, 2011

I'd sort your data by defoliation, then create a new column named "rank" and give it a formula using the row function "Sequence", which numbers the rows in this column from 1 to whatever you specify. Because the rows are sorted by defoliation, those numbers are the ranks.

This won't serve you if you have lots of tied observations, however.
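The tie problem can be seen in a small Python sketch (illustration only, not JMP). `sequential_ranks` mimics sorting and then numbering the rows 1, 2, 3, ..., so tied values receive different ranks depending on where the sort happens to place them. `average_ranks` uses the usual fix of giving every member of a tied group the mean of the positions it occupies. Both function names are hypothetical.

```python
def sequential_ranks(values):
    """Sort, then number rows 1..n (like a Sequence formula on sorted rows).
    Tied values get distinct ranks that depend on the sort order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def average_ranks(values):
    """Ranks in which each run of tied values shares the mean of its positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for j in range(start, end + 1):
            ranks[order[j]] = shared
        start = end + 1
    return ranks
```

With `[30, 10, 20, 10]`, the sequential version gives `[4, 1, 3, 2]` (the two 10s are ranked 1 and 2 arbitrarily), while the averaged version gives `[4.0, 1.5, 3.0, 1.5]`.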
My first thought would be that if you could find a transformation that normalises your data, you'd make life a lot easier for yourself, not least because you'd still be able to answer questions about the data (like how much more defoliated the trees are at station A than at station B) that are hard to answer once you've taken ranks. As a separate point, if you feed the ranks into a regular two-way ANOVA, you still aren't really meeting the assumption of independent, Normally distributed residuals with constant variance: you're just making it harder to demonstrate that they're not (and in fact you know they're not, because you're analysing ranks).

One transformation you might consider for your data is an arcsine square root transformation, which is the one usually recommended when your data are proportions (see for example http://udel.edu/~mcdonald/stattransform.html ). It goes some way towards stabilising the variance near the extreme ends of the scale (i.e. close to 0% and 100%), though even so there is not much it can do for values at the very extremes.
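As a minimal sketch of the arcsine square root transformation, assuming defoliation is recorded as a percentage from 0 to 100: divide by 100 to get a proportion p, then take arcsin(sqrt(p)). The Python below returns radians; the function name `arcsine_sqrt` is hypothetical, and in JMP the same thing would be a column formula.

```python
import math


def arcsine_sqrt(percent):
    """Arcsine square-root transform of a percentage (0-100), in radians."""
    p = percent / 100.0  # percentage -> proportion
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentage must be between 0 and 100")
    return math.asin(math.sqrt(p))
```

The transformed scale runs from 0 (0%) to pi/2 (100%), with 25% mapping to pi/6 and 50% to pi/4, so differences near the ends of the scale are stretched relative to differences in the middle.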

If your data includes large numbers of instances of almost complete defoliation or almost no defoliation at all, you might consider analysing a binary variable (i.e. "defoliated" / "not defoliated") or an ordinal variable (1="not defoliated", 2="partially defoliated", 3="wholly defoliated") using a logistic regression (see for example http://udel.edu/~mcdonald/statlogistic.html ), which would still enable you to fit explanatory variables to your data like station, type of tree, age of tree, height of tree etc. Logistic regression is provided within the "Fit Model" platform: just change the personality to whatever type of logistic regression you intend to perform, and make sure that the modelling type of your response variable is set to "Nominal" or "Ordinal" (otherwise the two logistic regression options will be greyed out).
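For the ordinal route, the continuous defoliation score first has to be turned into categories like the three listed above. The Python below is a minimal sketch of that recoding (illustration only, not JMP). The cut-offs `low` and `high` are hypothetical placeholders that would have to come from the study's own definitions of "not" and "wholly" defoliated. In JMP, the resulting column's modelling type would then be set to Ordinal before using Fit Model.

```python
def defoliation_category(percent, low=5.0, high=95.0):
    """Map a defoliation percentage to ordinal codes:
    1 = not defoliated, 2 = partially defoliated, 3 = wholly defoliated.

    `low` and `high` are hypothetical cut-offs, not values from the thread.
    """
    if percent <= low:
        return 1
    if percent >= high:
        return 3
    return 2
```

A binary "defoliated" / "not defoliated" variable is the same idea with a single cut-off.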
Hey,
thanks for your help. I tried the arcsine square root transformation, but the variances are still unequal.
And if I used a logistic regression, I wouldn't be able to compare the degree of defoliation between the different levels of the tree, right?

Thanks again

Community Trekker

Joined:

Jun 23, 2011

First question: is the measurement system adequate? How do you measure defoliation? Are the data categorized?

Second question (set): what do you want to see? Do you want to find out whether the variation differs within trees, from tree to tree, or between stations? Use control charts and variability charts to see the differences.
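The comparison a variability chart makes can be roughed out by grouping the measurements at each level of the design (station, tree within station, crown level) and comparing counts, means and standard deviations. The Python below is a minimal sketch, assuming each measurement is a record with hypothetical keys `station`, `tree`, `level` and `defoliation`; it is a textual stand-in, not a replacement for JMP's variability chart.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records, key):
    """Group records by `key` (a function of one record) and return
    {group: (n, mean, standard deviation)} of the defoliation values.
    Groups with a single value get a standard deviation of 0.0."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r["defoliation"])
    return {
        g: (len(v), mean(v), stdev(v) if len(v) > 1 else 0.0)
        for g, v in sorted(groups.items())
    }
```

For example, `key=lambda r: r["station"]` gives one row per station, `key=lambda r: (r["station"], r["tree"])` one row per tree, and `key=lambda r: r["level"]` one row per crown position.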
The defoliation was measured by sampling leaves and estimating the defoliation of each leaf; then I calculated the average defoliation per leaf.
I want to see where the defoliation is greatest (top, middle or bottom of the crown), and I also want to see which stations are significantly more defoliated than the others.