abmayfield
Level V

simple random samples of columns

Hello, 

I know how to use JMP to create a simple random sample (SRS) of rows, but not of columns. I have 118,000 columns as part of a transcriptome (each column is a unique gene), and I want to calculate the standard deviation across all genes (i.e., across the columns within each row). Doing so causes JMP to crash on a MacBook Air with 8 GB of RAM. Therefore, I am wondering whether I can instead pick random groups of 5,000 or 10,000 genes, calculate the standard deviation across each group, and then take the mean standard deviation over several such SRSs. Transposing the table so that the genes are in the rows is not an option either: the transposition step itself causes JMP to crash! Any ideas?
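A minimal sketch of the averaging idea. It is written in Python outside JMP purely to illustrate the arithmetic; it is not JSL and not how JMP computes anything. For one row, it draws several simple random samples of gene columns without replacement, computes the sample standard deviation within each, and averages the results. The data are simulated, and the names `srs_mean_row_sd`, `sample_size` and `n_samples` are hypothetical. In practice you would repeat this for each row (sample).

```python
import random
import statistics

def srs_mean_row_sd(row, sample_size, n_samples, seed=None):
    """Average of sample standard deviations computed on several simple
    random samples (without replacement) of one row's columns."""
    rng = random.Random(seed)
    sds = []
    for _ in range(n_samples):
        # One SRS of gene columns: distinct positions, no repeats.
        cols = rng.sample(range(len(row)), sample_size)
        sds.append(statistics.stdev([row[c] for c in cols]))
    return statistics.mean(sds)

# Hypothetical data: one sample (row) with 20,000 simulated gene values.
gen = random.Random(1)
row = [gen.gauss(10.0, 2.0) for _ in range(20_000)]

full_sd = statistics.stdev(row)
estimate = srs_mean_row_sd(row, sample_size=5_000, n_samples=5, seed=2)
print(f"full SD = {full_sd:.3f}, SRS estimate = {estimate:.3f}")
```

With thousands of genes per subsample, each subsample SD should land close to the full-row SD, and averaging mainly smooths out sampling noise. The mean of sample SDs is very slightly biased low, but at these sample sizes that bias is negligible.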

 

PS: I would attach the data file, but it's 80 MB!

Anderson B. Mayfield
Accepted Solutions
uday_guntupalli
Level VIII

Re: simple random samples of columns

@abmayfield
If I understand what you are after, you want the ability to randomly select "n" columns in a data table? Try this:

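// Open a sample table (replace with your own data table)
dt = Open( "$SAMPLE_DATA/Airline Delays.jmp" );

// Total number of columns in the table
nColumns = N Cols( dt );

// Draw nSampleCols random column indices (no repeats) and select those columns
nSampleCols = 2;
SelCols = As List( Random Index( nColumns, nSampleCols ) );
dt << Select Columns( SelCols );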

I think this lets you randomly sample "n" columns from the data table. You can then use the random sample to do what you would like. Please let me know if this is what you are after.
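For readers working outside JMP, here is a rough Python analogue of the `Random Index( nColumns, nSampleCols )` step. It draws k distinct 1-based column numbers uniformly from 1..n. This sketch assumes that Random Index samples without duplicates. The function name `random_column_indices` is hypothetical, and the result is sorted only for readability.

```python
import random

def random_column_indices(n_columns, n_sample, seed=None):
    """Return n_sample distinct 1-based column numbers drawn uniformly,
    without replacement, from 1..n_columns (sorted for readability)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_columns + 1), n_sample))

# One SRS of 5,000 gene columns out of 118,000.
picked = random_column_indices(118_000, 5_000, seed=42)
print(len(picked), picked[:5])
```

Calling it again with a different seed (or none) gives a new SRS, which is what the repeated-sampling plan in the question needs.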

Best
Uday

abmayfield
Level V

Re: simple random samples of columns

Thanks so much. That's exactly what I needed. Now I can either take an SRS of columns OR stack the data. Just out of curiosity, do you know if this can be done WITHOUT scripting in JMP 14? Thanks,

Anderson

Anderson B. Mayfield
uday_guntupalli
Level VIII

Re: simple random samples of columns

@abmayfield,
       I am not aware of a way to do it interactively without using a script. 

Best
Uday
abmayfield
Level V

Re: simple random samples of columns

To answer my own question, 118,000 columns CAN be stacked, which lets me look at the standard deviation across all genes by my treatment factors. But I will also try this random column selection script.
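This small Python sketch shows what stacking buys you. It performs the same wide-to-long reshaping and then a grouped summary. The toy table, its column names (`sample`, `treatment`, `gene`, `value`) and its values are all hypothetical. In JMP the rough equivalent would be Tables > Stack followed by a summary grouped by treatment, but the code below only mimics the logic.

```python
import statistics
from collections import defaultdict

# Hypothetical wide table: one row per sample, one value per gene
# (4 genes here; the real table has ~118,000).
gene_names = ["g1", "g2", "g3", "g4"]
wide = [
    {"sample": "S1", "treatment": "control", "genes": [5.0, 7.0, 6.0, 8.0]},
    {"sample": "S2", "treatment": "control", "genes": [4.0, 6.0, 5.0, 9.0]},
    {"sample": "S3", "treatment": "heat",    "genes": [9.0, 3.0, 7.0, 1.0]},
]

# Stack: one output row per (sample, gene) pair. Each row carries the
# sample's descriptor values plus the gene label and its value.
stacked = [
    {"sample": r["sample"], "treatment": r["treatment"], "gene": g, "value": v}
    for r in wide
    for g, v in zip(gene_names, r["genes"])
]

# Summarize: sample standard deviation of all gene values per treatment.
by_treatment = defaultdict(list)
for rec in stacked:
    by_treatment[rec["treatment"]].append(rec["value"])
sd_by_treatment = {t: statistics.stdev(vs) for t, vs in by_treatment.items()}
print(sd_by_treatment)
```

Note that every stacked row repeats the sample's descriptor values. That repetition is convenient for grouping, but it also makes the stacked table much taller than the wide one.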

Anderson B. Mayfield
gzmorgan0
Super User

Re: simple random samples of columns

Congratulations on finding a solution!

 

If you are using JMP 14, you might want to try the following column formula in your table. Since you said you have limited memory and a crash is possible, save anything important first.

 

Std Dev( Current Data Table()[Row(), Index( c1, c2 )] ) 

where c1 and c2 are the numbers of the first and last gene columns, for example 6 and 118005.
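The one subtle part of the formula is the column range. `Index( c1, c2 )` produces the column numbers c1 through c2 inclusive, counting from 1. This Python sketch shows that mapping for a single row by converting the 1-based inclusive range to a 0-based slice. The row contents and the helper name `row_sd` are hypothetical. The sketch uses the sample standard deviation (n - 1 denominator), which I believe is also what JMP's Std Dev() returns, but it is worth verifying on a small example.

```python
import statistics

def row_sd(row, c1, c2):
    """Sample standard deviation of columns c1..c2 of one row, where
    c1 and c2 are 1-based and inclusive (like JSL's Index( c1, c2 ))."""
    # 1-based inclusive [c1, c2] -> 0-based half-open slice [c1 - 1, c2)
    return statistics.stdev(row[c1 - 1:c2])

# Hypothetical row: columns 1-5 are sample descriptors, 6-13 are genes.
row = ["S1", "control", "rep1", "day0", "batchA",
       2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(row_sd(row, 6, 13))
```

Starting the range at the first gene column keeps the descriptor columns out of the calculation.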

 

If there is no crash, I would remove the formula.

 

BTW, the stacked table is likely more efficient for summarizing. Just posing a possible alternative.

abmayfield
Level V

Re: simple random samples of columns

Yes, I did try to calculate a standard deviation across 118,000 columns, but it freezes JMP, and I usually give up after a few hours. I may try to let it run overnight tonight, though. 

 

Also, I learned that when you stack 118,000 columns on top of each other for 12 samples, the resulting JMP table is over 2 GB and is unstable! This is weird, because a 118,000-column x 12-row table is only 50 MB. Why is a 6-column x 1.5-million-row table so much larger?
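A back-of-the-envelope count helps explain part of the difference. This Python sketch only counts cells and repeated gene names. It does not model how JMP actually stores tables in memory or on disk. The split of the stacked table's 6 columns into sample descriptors, a gene label and a value column is an assumption.

```python
n_genes, n_samples = 118_000, 12

# Wide layout: 12 rows x 118,000 gene columns.
wide_cells = n_samples * n_genes

# Stacked layout: one row per (sample, gene) pair, 6 columns
# (assumed: a few sample descriptors, a gene label, a value column).
stacked_rows = n_samples * n_genes
stacked_cols = 6
stacked_cells = stacked_rows * stacked_cols

# Each gene name: stored once as a column header (wide) vs. once per
# sample in the stacked label column.
name_copies_wide = n_genes
name_copies_stacked = stacked_rows

print(f"stacked rows:  {stacked_rows:,}")
print(f"wide cells:    {wide_cells:,}")
print(f"stacked cells: {stacked_cells:,} ({stacked_cells // wide_cells}x)")
print(f"gene-name copies: {name_copies_wide:,} -> {name_copies_stacked:,}")
```

118,000 x 12 = 1,416,000 stacked rows (the "1.5 million"), and 1,416,000 x 6 = 8,496,000 cells, six times as many as the wide table. In addition, every gene name that the wide table stores once as a column header is written out 12 times in the label column, and each sample descriptor is repeated 118,000 times. Character values repeated on every row are a plausible contributor to the jump in size, though the exact 2 GB figure depends on JMP's internal storage.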

Anderson B. Mayfield