abmayfield
Level VI

simple random samples of columns

Hello, 

    I know how to use JMP to create a simple random sample (SRS) of rows, but not of columns. I have 118,000 columns as part of a transcriptome (each column is a unique gene), and I want to calculate the standard deviation across all genes (i.e., along each row). Doing so causes JMP to crash on a MacBook Air with 8 GB of RAM. Therefore, I am wondering if I can just pick groups of 5,000 or 10,000 genes and calculate a standard deviation across them. Then I can take the mean standard deviation of several SRSs. If I transpose the table so that the genes are in the rows, the transposition step itself causes JMP to crash! Any ideas?

 

PS: I would attach the data file, but it's 80 MB!

Anderson B. Mayfield
Accepted Solution
uday_guntupalli
Level VIII

Re: simple random samples of columns

@abmayfield
         If I understand what you are after, you want to randomly select "n" columns in a data table? Take a look at this:

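dt = Open( "$SAMPLE_DATA/Airline Delays.jmp" );

// Total number of columns in the table
nColumns = N Cols( dt );

// Draw nSampleCols distinct column indices at random, then select those columns
nSampleCols = 2;
SelCols = As List( Random Index( nColumns, nSampleCols ) );
dt << Select Columns( SelCols );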

I think this lets you randomly sample "n" columns from the data table. You can then use the random sample however you like. Please let me know if this is what you are after.
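For the use case in the question (a mean standard deviation over several SRSs of gene columns), the same `Random Index` idea can be extended along these lines. This is only a rough sketch, not a tested solution: it assumes the gene columns are numeric and contiguous, and `firstGeneCol`, `nSample`, and `nRepeats` are illustrative values that do not come from the original table.

```jsl
// Sketch only: mean per-row SD over several random samples of gene columns.
// Assumptions: gene columns are numeric, contiguous, and start at firstGeneCol.
dt = Current Data Table();
firstGeneCol = 6;      // hypothetical position of the first gene column
nGeneCols = N Cols( dt ) - firstGeneCol + 1;
nSample = 5000;        // genes per random sample
nRepeats = 10;         // number of random samples to average over

sdSum = J( N Rows( dt ), 1, 0 );   // running sum of per-row SDs
For( i = 1, i <= nRepeats, i++,
	// Distinct random positions, shifted so they point at gene columns
	idx = Random Index( nGeneCols, nSample ) + (firstGeneCol - 1);
	// All rows (0) of the sampled columns, returned as a numeric matrix
	m = dt[0, Transpose( idx )];
	// V Std works down columns, so transpose to get one SD per table row
	sdSum = sdSum + Transpose( V Std( Transpose( m ) ) );
);
meanSD = sdSum / nRepeats;   // mean SD per row across the samples
Show( meanSD );
```

Each pass only pulls `nSample` columns into memory, so reducing `nSample` is the obvious thing to try first if memory is still tight.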

Best
Uday

abmayfield
Level VI

Re: simple random samples of columns

Thanks so much. That's exactly what I needed. Now I can do an SRS of columns OR stack the data. Just out of curiosity, do you know if this can be done WITHOUT scripting in JMP 14? Thanks,

Anderson

Anderson B. Mayfield
uday_guntupalli
Level VIII

Re: simple random samples of columns

@abmayfield,
       I am not aware of a way to do it interactively without using a script. 

Best
Uday
abmayfield
Level VI

Re: simple random samples of columns

To answer my own question, 118,000 columns CAN be stacked, which allows me to look at the standard deviation across all genes by my treatment factors. But I will try this random column selection script, too.

Anderson B. Mayfield
gzmorgan0
Super User (Alumni)

Re: simple random samples of columns

Congratulations on finding a solution!

 

If you are using JMP 14, you might want to try the following column formula in your table. Since you said you have limited memory and a crash is possible, save anything important first.

 

Std Dev( Current Data Table()[Row(), Index( c1, c2 )] ) 

where c1 is the first column and c2 the last, for example 6, 118005.

 

If there is no crash, I would remove the formula.
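As a scripted version of the steps above, something like the following could create the formula column and then drop the formula once the values are in place. Treat it as a sketch under assumptions: the column name "Row SD" is made up for illustration, and 6 and 118005 are just the example values for c1 and c2 (the range can be narrowed to test on fewer columns first).

```jsl
// Sketch only: add the row-wise SD as a formula column, then remove the formula.
dt = Current Data Table();
col = dt << New Column( "Row SD",   // hypothetical column name
	Numeric,
	Continuous,
	// c1 = 6 and c2 = 118005 are the example values; adjust to the real table
	Formula( Std Dev( Current Data Table()[Row(), Index( 6, 118005 )] ) )
);
dt << Run Formulas;      // make sure the formula has finished evaluating
col << Delete Formula;   // keep the computed values, drop the formula
```

Removing the formula keeps JMP from re-evaluating all the gene columns whenever the table changes.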

 

BTW, the stacked table is likely more efficient for summarizing. Just posing a possible alternative.

abmayfield
Level VI

Re: simple random samples of columns

Yes, I did try to calculate a standard deviation across 118,000 columns, but it freezes JMP, and I usually give up after a few hours. I may try to let it run overnight tonight, though. 

 

Also, I learned that when you stack 118,000 columns on top of each other for 12 samples, the resulting JMP table is over 2 GB and is unstable! This is weird because a 118,000-column x 12-row table is only 50 MB. I wonder why a 6-column x 1.5-million-row table is so much larger?

Anderson B. Mayfield