simple random samples of columns

Aug 9, 2018 1:42 PM
(1170 views)

Hello,

I know how to use JMP to create a simple random sample (SRS) of rows, but not of columns. I have 118,000 columns as part of a transcriptome, and I want to calculate the standard deviation across all genes (i.e., within each row). Doing so causes JMP to crash on a MacBook Air with 8 GB of RAM. Therefore, I am wondering if I can just pick groups of 5,000 or 10,000 genes and calculate a standard deviation across them. Then I can take the mean standard deviation of several SRSs (each column is a unique gene). If I transpose the table so that the genes are in the rows, the transposition step itself causes JMP to crash! Any ideas?

PS: I would attach the data file, but it's 80 MB!

Anderson B. Mayfield

1 ACCEPTED SOLUTION

@abmayfield,

If I understand correctly, you want to randomly select "n" columns from a data table? Try this:

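```
dt = Open( "$SAMPLE_DATA/Airline Delays.jmp" );
nColumns = N Cols( dt );

nSampleCols = 2; // number of columns to sample
// Random Index() returns nSampleCols distinct indices between 1 and nColumns
SelCols = As List( Random Index( nColumns, nSampleCols ) );
dt << Select Columns( SelCols );
```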

I think this lets you randomly sample "n" columns from the data table. You can then use the sampled columns however you like. Please let me know if this is what you are after.
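As a rough sketch of how the sampling could feed the original goal (a mean of per-row standard deviations over several SRSs), something like the following might work. The values of `firstGeneCol`, `nSampleCols`, and `nSamples` are hypothetical, and the script assumes the gene columns are numeric and run contiguously to the end of the table.

```
// Sketch only: firstGeneCol, nSampleCols, and nSamples are hypothetical values.
dt = Current Data Table();
firstGeneCol = 6;            // assumed index of the first gene column
lastGeneCol = N Cols( dt );  // assumes gene columns run to the end of the table
nGenes = lastGeneCol - firstGeneCol + 1;
nSampleCols = 5000;          // genes per simple random sample (must not exceed nGenes)
nSamples = 10;               // number of independent samples to average over

nRows = N Rows( dt );
meanSD = J( nRows, 1, 0 );   // accumulates per-row SDs; divided by nSamples below

For( s = 1, s <= nSamples, s++,
	// Draw distinct indices in 1..nGenes, then shift them into the gene-column range
	cols = Random Index( nGenes, nSampleCols ) + (firstGeneCol - 1);
	For( r = 1, r <= nRows, r++,
		// dt[r, cols] returns row r of the sampled columns as a matrix
		meanSD[r] = meanSD[r] + Std Dev( dt[r, cols] );
	);
);
meanSD = meanSD / nSamples;  // mean SD per row across the samples
Show( meanSD );
```

Because each pass reads only the sampled columns, this avoids both transposing the table and evaluating all 118,000 columns at once.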

Best

Uday


6 REPLIES

Aug 9, 2018 2:57 PM
(1155 views)
| Posted in reply to message from uday_guntupalli 08/09/2018 04:55 PM

Thanks so much. That's exactly what I needed. Now I can do an SRS of columns OR stack the data. Just out of curiosity, do you know if this can be done WITHOUT scripting in JMP 14? Thanks,

Anderson

Anderson B. Mayfield

To answer my own question, 118,000 columns CAN be stacked, which allows me to look at the standard deviation across all genes by my treatment factors. I will also try this random column selection script.

Anderson B. Mayfield

Congratulations on finding a solution!

If you are using JMP 14, you might want to try the following column formula in your table. Since you said you have limited memory and a crash is possible, save anything important first.

Std Dev( Current Data Table()[Row(), Index( c1, c2 )] )

where c1 is the index of the first gene column and c2 is the index of the last, for example 6 and 118005.

If there is no crash, I would remove the formula.
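For anyone who prefers to set this up by script, a minimal sketch might look like the following. The column name "Row SD" is made up for illustration, and the indices 6 and 118005 are simply the example values from above, so adjust them to your table.

```
// Sketch only: "Row SD" is a hypothetical column name; 6 and 118005 are example indices.
dt = Current Data Table();   // save the table first in case of a crash
col = dt << New Column( "Row SD", Numeric, "Continuous",
	// Per-row standard deviation across columns 6 through 118005
	Formula( Std Dev( Current Data Table()[Row(), Index( 6, 118005 )] ) )
);
dt << Run Formulas;          // make sure the formula has finished evaluating
col << Delete Formula;       // keep the computed values, drop the formula
```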

BTW, the stacked table is likely more efficient for summarizing. Just posing a possible alternative.

Yes, I did try to calculate a standard deviation across 118,000 columns, but it freezes JMP, and I usually give up after a few hours. I may try to let it run overnight tonight, though.

Also, I learned that when you stack 118,000 columns on top of each other for 12 samples, the resulting JMP table is over 2 GB and is unstable! This is weird because a 118,000-column x 12-row table is only 50 MB. I wonder why a 6-column x 1.5-million-row table is so much larger?

Anderson B. Mayfield