simple random samples of columns

abmayfield

Contributor

Joined:

Jun 18, 2018

Hello, 

    I know how to use JMP to create a simple random sample (SRS) of rows, but not of columns. I have 118,000 columns as part of a transcriptome (each column is a unique gene), and I want to calculate the standard deviation across all genes (i.e., across rows). Doing so causes JMP to crash on a MacBook Air with 8 GB of RAM. Therefore, I am wondering if I can just pick groups of 5,000 or 10,000 genes and calculate a standard deviation across them. Then I can take the mean standard deviation of several SRSs. If I transpose the table so that the genes are in the rows, the transposition step itself causes JMP to crash! Any ideas?

 

PS: I would attach the data file, but it's 80 MB!

Anderson B. Mayfield
1 ACCEPTED SOLUTION

Accepted Solutions
uday_guntupalli

Community Trekker

Joined:

Sep 15, 2014

Solution

@abmayfield
         If I understand what you are after, you want the ability to randomly select "n" columns in a data table? Take a look at this:

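dt = Open( "$SAMPLE_DATA/Airline Delays.jmp" );

// Total number of columns in the table
nColumns = N Cols( dt );

// Draw nSampleCols distinct column indices at random and select those columns
nSampleCols = 2;
SelCols = As List( Random Index( nColumns, nSampleCols ) );
dt << Select Columns( SelCols );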

I think this lets you randomly sample "n" columns from the data table. You can then use the random sample however you like. Please let me know if this is what you are after.
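For the original goal (a mean standard deviation over several SRSs of genes), the same `Random Index` idea can be wrapped in a loop. This is only a rough sketch under several assumptions: `firstGeneCol`, `nReps`, and `meanSD` are illustrative names, the gene columns are assumed to be numeric and contiguous, and data table subscripting (`dt[0, cols]`) is assumed to be available, which needs a recent JMP version.

```jsl
dt = Current Data Table();

firstGeneCol = 1;     // assumption: position of the first gene column
nSampleCols = 5000;   // genes per simple random sample
nReps = 5;            // hypothetical number of samples to average over

nGeneCols = N Cols( dt ) - firstGeneCol + 1;
sdSum = J( N Rows( dt ), 1, 0 );   // running sum of per-row SDs

For( i = 1, i <= nReps, i++,
	// nSampleCols distinct positions, shifted so they start at firstGeneCol
	sel = Random Index( nGeneCols, nSampleCols ) + (firstGeneCol - 1);
	// All rows of the sampled columns as a numeric matrix
	m = dt[0, sel];
	// V Std works column-wise, so transpose to get one SD per table row
	sdSum = sdSum + Transpose( V Std( Transpose( m ) ) );
);

// Mean per-row SD across the nReps samples
meanSD = sdSum / nReps;
Show( meanSD );
```

Because only `nSampleCols` columns are pulled into memory at a time, this may be gentler on RAM than working with all 118,000 columns at once, though that has not been tested on a table of this size.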

Best
Uday
6 REPLIES
abmayfield

Contributor

Joined:

Jun 18, 2018

Thanks so much. That's exactly what I needed. Now I can do an SRS of columns OR stack the data. Just out of curiosity, do you know if this can be done WITHOUT scripting in JMP 14? Thanks,

Anderson

Anderson B. Mayfield
uday_guntupalli

Community Trekker

Joined:

Sep 15, 2014

@abmayfield,
       I am not aware of a way to do it interactively without using a script. 

Best
Uday
abmayfield

Contributor

Joined:

Jun 18, 2018

To answer my own question, 118,000 columns CAN be stacked, which allows me to look at the standard deviation across all genes by my treatment factors. But I will try this random column selection script, too.
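A scripted version of the stacking approach might look roughly like the following. It is a sketch, not the steps actually used here: it assumes every numeric column is a gene and that the treatment factors are character columns, and `:Treatment`, "Gene", and "Expression" are hypothetical names.

```jsl
dt = Current Data Table();

// Assumption: every numeric column is a gene; treatment factors are character columns
geneCols = dt << Get Column Names( Numeric );

// One row per sample-gene pair; "Gene" and "Expression" are illustrative names
dtStack = dt << Stack(
	Columns( geneCols ),
	Source Label Column( "Gene" ),
	Stacked Data Column( "Expression" )
);

// Standard deviation of expression within each level of the hypothetical :Treatment column
dtSum = dtStack << Summary( Group( :Treatment ), Std Dev( :Expression ) );
```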

Anderson B. Mayfield
gzmorgan0

Community Trekker

Joined:

Jul 25, 2016

Congratulations on finding a solution!

 

If you are using JMP 14, you might want to try the following column formula in your table. Since you said you have limited memory and a crash is possible, save anything important first.

 

Std Dev( Current Data Table()[Row(), Index( c1, c2 )] ) 

where c1 is the position of the first column and c2 the position of the last, for example 6 and 118005.

 

If there is no crash, I would remove the formula.
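If you would rather do this from a script, something like the sketch below should create the column, evaluate it, and then drop the formula so the values become static. "Row SD" is just an illustrative column name, and 6 and 118005 are the example positions from above, so adjust them to your table.

```jsl
dt = Current Data Table();

// 6 and 118005 are the example column positions; "Row SD" is an illustrative name
col = dt << New Column( "Row SD", Numeric, Continuous,
	Formula( Std Dev( Current Data Table()[Row(), Index( 6, 118005 )] ) )
);

// Force evaluation, then remove the formula so the stored values no longer recalculate
dt << Run Formulas;
col << Delete Formula;
```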

 

BTW, the stacked table is likely more efficient for summarizing. Just posing a possible alternative.

abmayfield

Contributor

Joined:

Jun 18, 2018

Yes, I did try to calculate a standard deviation across 118,000 columns, but it freezes JMP, and I usually give up after a few hours. I may try to let it run overnight tonight, though. 

 

Also, I learned that when you stack 118,000 columns on top of each other for 12 samples, the resulting JMP table is over 2 GB and is unstable! This is weird because a 118,000-column x 12-row table is only 50 MB. I wonder why a 6-column x 1.5-million-row table is so much larger?

Anderson B. Mayfield