Random row selection below a certain threshold

Aug 22, 2013 3:29 PM
(3425 views)

Hello All,

I am attempting to randomly select 20% of the observations in my table whose values are below 200, and then exclude those observations from any analysis. In other words, I want a random row selection that applies both a given percentage and a threshold. Thanks for any pointers!

1 ACCEPTED SOLUTION

Two options I can think of.

1. Use Select Where to select the rows with values < 200. Make a subset from the selected rows and choose Link to Original Table. Then make a random-sample subset of that subset, again choosing Link to Original Table. Finally, select all the rows in the final subset; because of the linking, they will also be selected in the original table.

2. Make a new column with a formula such as :value < 200 & random uniform() < 0.2.
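
To make the logic of option 2 concrete, here is a minimal sketch in Python rather than JSL, purely as an illustration. The function name `flag_rows` and the sample values are hypothetical. Note that a formula of this shape flags each qualifying row independently with probability 0.2, so the flagged share is about 20% of the rows below the threshold, not exactly 20%.

```python
import random

def flag_rows(values, threshold=200, p=0.2, seed=None):
    """Return one 0/1 flag per row.

    A row is flagged (1) when its value is below `threshold` AND an
    independent uniform draw on [0, 1) falls below `p`. This mirrors the
    shape of the formula  :value < 200 & random uniform() < 0.2.
    """
    rng = random.Random(seed)  # seed is optional, only for repeatability
    return [int(v < threshold and rng.random() < p) for v in values]

# Hypothetical data for illustration only.
values = [50, 120, 199, 200, 250, 10, 180, 300, 75, 160]
flags = flag_rows(values, seed=1)

# Row indices that would be selected and then excluded from analysis.
excluded = [i for i, f in enumerate(flags) if f]
```

Rows at or above the threshold are never flagged. If an exact 20% count is required, a sort-by-random-number approach (or option 1) is the better fit.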

3 REPLIES

A simple approach would be to use a DATA step to get all the obs that satisfy your cutoff value (<200), apply one of the SAS random-number functions to generate a random number for each row, sort by the random number, and then keep only the top (or bottom) 20%. Then merge back with your original data to drop those obs.
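
The steps described above can be sketched as follows. This is Python, not SAS, and is only meant to show the logic; the function name `sample_exact_fraction` and the test data are hypothetical. Unlike a per-row probability test, sorting by a random key and taking the first 20% gives a fixed count (rounded) of the rows below the threshold.

```python
import random

def sample_exact_fraction(values, threshold=200, fraction=0.2, seed=None):
    """Pick a fixed share of the rows below `threshold` at random.

    Returns (dropped_indices, kept_values):
      dropped_indices - sorted row indices chosen for exclusion
      kept_values     - the original values with those rows removed
    """
    rng = random.Random(seed)  # seed is optional, only for repeatability

    # Step 1: rows that satisfy the cutoff.
    eligible = [i for i, v in enumerate(values) if v < threshold]

    # Step 2: attach a random number to each eligible row and sort by it.
    keyed = sorted((rng.random(), i) for i in eligible)

    # Step 3: keep the first `fraction` of the sorted rows (rounded count).
    n_drop = round(fraction * len(eligible))
    dropped = {i for _, i in keyed[:n_drop]}

    # Step 4: "merge back" by removing the chosen rows from the full data.
    kept_values = [v for i, v in enumerate(values) if i not in dropped]
    return sorted(dropped), kept_values

# Hypothetical data: 100 rows, 50 of them below 200.
data = list(range(150, 250))
dropped_idx, kept = sample_exact_fraction(data, seed=42)
```

With 50 eligible rows and a fraction of 0.2, exactly 10 rows are dropped and 90 remain.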

Thanks Xan,

that works great!