
repetitive analysis

paul_apicella

Community Trekker

Joined:

Feb 11, 2015

Hi everybody,

I want to process data contained in a table according to the attached example. The table consists of 100 columns plus one column serving as a control measure, and has 30 to 80 rows.

The aim is to apply a statistical test iteratively, starting with control vs column 51, then control vs column 52, control vs column 53, etc., in order to detect when a significant difference appears, how long it persists across successive columns, and when the significance is lost.

For example, a difference might be detected from column 63 to column 78, lost from c79 to c86, and then reappear from c87 to c98.

I don't know how to write a script that automates the analysis of this type of table and reports the points at which differences appear and disappear.
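
A minimal sketch of what such a loop could look like in R (one possible route, since the thread later mentions JMP's R integration), assuming the table has been brought into R as a data frame named dat whose control column is called Ctrl; these names are placeholders, not the actual column names:

# Compare the control column to every other column with a Wilcoxon test
# and collect the p-values. `dat` and `Ctrl` are assumed placeholder names.
test_cols <- setdiff(names(dat), "Ctrl")

p_values <- sapply(test_cols, function(col) {
  wilcox.test(dat$Ctrl, dat[[col]])$p.value
})

results <- data.frame(column      = test_cols,
                      p_value     = p_values,
                      significant = p_values < 0.05)   # 0.05 is an assumed threshold
results

The significant column would then be the raw material for reading off when differences appear and disappear.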

Any suggestions?

8 REPLIES
volker_kraft

Staff

Joined:

May 29, 2014

Hi Paul,


here is what I learned from our chat:


Definition of a difference: a Wilcoxon test comparing the values in the Ctrl column to the values in each of the 100 other columns (repeated comparisons).

The example table is only meant to give an idea of the data structure; the behavior described above does not apply to this particular table.

The aim would be to determine the onset and offset times of a difference for each analyzed table of this kind.


It would help a lot if you could provide another table together with some of the analysis results you would like to see.


BR, Volker

paul_apicella

Community Trekker

Joined:

Feb 11, 2015

Hi Volker and thank you for taking care of my problem,

Here is an already analyzed data table in which the Ctrl / test comparison revealed significant differences (assessed with a t test, although a Wilcoxon would be more appropriate).

Each row below gives the comparison between the Ctrl data (control column) and a given test column (numbered from 100 to 200).

Col nbr    p-value
...
160        0.01
161        0.0022
162        ns
163        ns
164        0.03
165        ns
166        0.008
167        0.04
168        0.04
169        ns
170        0.004
171        ns
172        ns
173        ns
174        0.04
175        ns
176        <0.0001
177        0.008
178        ns
179        0.004
180        0.02
181        0.003
182        0.004
183        0.009
184        0.04
185        0.005
186        ns
187        0.006
188        0.03
189        0.02
190        0.0002
191        ns
192        0.001
etc...
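
Once the p-values are in this form, the onset and offset columns can be read off mechanically. A minimal sketch in R, using made-up p-values and column numbers rather than the real values from the table above:

# Find stretches of consecutive significant columns and report their
# onset and offset. The p-values and column numbers below are invented
# for illustration only.
p    <- c(0.01, 0.0022, 0.2, 0.3, 0.03, 0.4, 0.008)   # hypothetical p-values
cols <- 160:166                                        # hypothetical column numbers

sig    <- p < 0.05                  # assumed significance threshold
runs   <- rle(sig)                  # lengths of consecutive TRUE/FALSE stretches
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# one row per stretch of significance: first and last significant column
data.frame(onset  = cols[starts[runs$values]],
           offset = cols[ends[runs$values]])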

volker_kraft

Staff

Joined:

May 29, 2014

Hi Paul,

thank you, that helped a lot.

It seems that scripting is not mandatory here. You could do the following:

1. Tables > Stack: Get a new data table with the data from your control column and all measurement columns stacked. This creates a column Label containing your column names, and a second column containing all your data.

The attached file shows my result.

2. Analyze > Fit Y by X, with X = Label and Y = Data > hotspot > Compare Means > With Control, Dunnett's: choose your control column and you will get all pairwise tests against the control. You can right-click the table under LSD Threshold Matrix, e.g. to sort it or to turn it into a data table (perhaps for graphing the p-values in Graph Builder).

For Dunnett's test see also Compare Means.
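
For reference, roughly the same with-control comparison can be sketched in R with the multcomp package (this is not the JMP route described above, just a rough analogue), assuming the stacked table is a data frame called stacked with a factor column Label whose control level is named Ctrl and a numeric column Data; all of these names are assumptions:

# Rough R analogue of the stacked Dunnett's comparison; object and
# column names here are placeholders.
library(multcomp)

stacked$Label <- relevel(factor(stacked$Label), ref = "Ctrl")   # control as reference level
fit     <- aov(Data ~ Label, data = stacked)
dunnett <- glht(fit, linfct = mcp(Label = "Dunnett"))           # every column vs the control
summary(dunnett)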

Hope that helps,

Volker

kevin_c_anderso

Community Trekker

Joined:

Jun 5, 2014

Someone should probably mention that the approach y'all are taking might cause a sizable number of dead preeminent statisticians to spin like lathes in their respective graves.

I cannot discern exactly what decision you're trying to make from your description, but it sounds more like changepoint detection to me.  If so, analyzing a number of sequentially-aggregated p-values from any two-sample test could be considered poor form at best.

paul_apicella

Community Trekker

Joined:

Feb 11, 2015

Thanks for your message Kevin,

I don’t want to disturb preeminent statisticians whether they are alive or dead.

I would be pleased to hear your suggestion.

volker_kraft

Staff

Joined:

May 29, 2014

Thanks Kevin, any suggestion is appreciated.

From my understanding, Paul has not chosen a test yet; he was mainly asking for a way to run a sequence of two-sample tests (one sample always the same) in JMP, given his original data set. My point was that scripting would not be necessary in this case.

Thanks again for your contributions.

kevin_c_anderso

Community Trekker

Joined:

Jun 5, 2014

Hi, Paul!

Without more information, I'm not doing much more than shooting in the dark.

But if, as I suspect, you are searching through sequentially-gathered data in an attempt to discern a change in the data's generative process, a changepoint approach might be more justified.

There are many ways to detect changepoints, and a rich trove of references going back many years.

CRAN has a changepoint package that implements several recently-researched methods in R.  JMP has some neat R integration, in which you can execute R code on JMP datasets and get the results back into JMP.  Try the Pruned Exact Linear Time (PELT) method referenced in R. Killick , P. Fearnhead & I. A. Eckley (2012) Optimal Detection of Changepoints With a Linear Computational Cost, Journal of the American Statistical Association, 107:500, 1590-1598.
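
To give an idea of what that looks like in practice, here is a minimal sketch with the changepoint package on a simulated series; with real data the input would be, for example, a per-column summary statistic rather than a string of p-values:

# Detect shifts in the mean of a series with the PELT method from the
# CRAN changepoint package. The series x is simulated for illustration.
library(changepoint)

set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))   # one simulated shift in the mean

fit <- cpt.mean(x, method = "PELT", penalty = "MBIC")
cpts(fit)    # estimated changepoint locations
plot(fit)    # series with fitted segment means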

paul_apicella

Community Trekker

Joined:

Feb 11, 2015

That's a constructive way to exchange advice! Thanks for that, Kevin. It is interesting to know that we can go back and forth between R and JMP. I will certainly take a closer look at changepoint procedures.