Random row selection below a certain threshold

dharding

Community Trekker

Joined:

Aug 14, 2013

Hello All,

I am attempting to randomly select 20% of the observations in my table that have values below 200, and then exclude those observations from any analysis. In other words, I want a random row selection at a given percentage, restricted to rows below a certain threshold. Thanks for any pointers!

Accepted Solution
XanGregg

Staff

Joined:

Jun 23, 2011

Solution

Two options I can think of.

1. Use Select Where to select rows with values < 200. Make a subset using the selected rows and choose Link to Original Table. Make a subset of that using a random sample, again choosing Link to Original Table. Now select all the rows in the final subset; because of the linking, they will also be selected in the original table, where they can then be excluded.

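2. Make a new column with a formula such as :value < 200 & Random Uniform() < 0.2. Rows where the formula evaluates to 1 are the ones to select and exclude. Note that this picks each qualifying row independently with probability 0.2, so the share selected is only approximately 20%.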
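
To make the logic of option 2 concrete outside JMP, here is a minimal Python sketch of the same idea. It is not JSL and not the JMP API; the function name flag_rows, its parameters, and the sample values are all made up for illustration. Each row gets a 0/1 flag that is 1 only when the value is below the threshold and an independent uniform draw falls below the fraction.

```python
import random

def flag_rows(values, threshold=200, fraction=0.2, rng=None):
    """Return one 0/1 flag per row.

    A flag is 1 when the value is below `threshold` AND an independent
    uniform draw in [0, 1) is below `fraction`. Rows at or above the
    threshold are never flagged.
    """
    rng = rng or random.Random()
    return [int(v < threshold and rng.random() < fraction) for v in values]

# Hypothetical data standing in for the table's column.
values = [50, 120, 199, 200, 250, 310, 75, 180]
flags = flag_rows(values, rng=random.Random(1))

# Rows with flag 1 are the ones to exclude from the analysis.
kept = [v for v, f in zip(values, flags) if not f]
```

Because every draw is independent, the number of flagged rows varies from run to run; on a small table it can be well off 20% of the qualifying rows.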

3 REPLIES
fugue

Community Member

Joined:

Jul 25, 2012

A simple approach would be to use a data step to get all the obs that satisfy your cutoff value (<200), apply one of the SAS random-number functions to generate a random number for each row, sort by the random number, and then keep only the top (or bottom) 20%. Then merge back with your original data to drop those obs.
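
The steps above (filter, attach a random key, sort, keep the top 20%, then drop those rows from the original) can be sketched as follows. This is an illustration in Python rather than a SAS data step; the function name pick_exact_fraction and the sample values are hypothetical. Unlike a per-row random test, this yields a fixed count: the rounded 20% of the qualifying rows.

```python
import random

def pick_exact_fraction(values, threshold=200, fraction=0.2, rng=None):
    """Return the set of row indices to exclude.

    Only rows with a value below `threshold` are eligible. Each eligible
    row gets a random sort key; the first round(fraction * n_eligible)
    rows in sorted order are chosen.
    """
    rng = rng or random.Random()
    eligible = [i for i, v in enumerate(values) if v < threshold]
    # Pair every eligible row index with its own random key, then sort by key.
    keyed = sorted((rng.random(), i) for i in eligible)
    k = round(fraction * len(eligible))
    return {i for _, i in keyed[:k]}

# Hypothetical data: 10 of the 13 values are below 200.
values = [50, 120, 199, 200, 250, 310, 75, 180, 10, 20, 30, 40, 150]
excluded = pick_exact_fraction(values, rng=random.Random(7))

# "Merge back": keep every original row whose index was not chosen.
remaining = [v for i, v in enumerate(values) if i not in excluded]
```

With 10 qualifying rows and a fraction of 0.2, exactly 2 rows are chosen on every run; only which 2 changes with the random seed.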

dharding

Community Trekker

Joined:

Aug 14, 2013

Thanks Xan, that works great!