dharding
Community Trekker

Random row selection below a certain threshold

Hello All,

I am attempting to randomly select 20% of the observations in my table with values below 200, and then exclude those observations from any analysis. In other words, I want a random row selection at a given percentage, restricted to rows below a certain threshold. Thanks for any pointers!

Accepted Solutions
XanGregg
Staff

Re: Random row selection below a certain threshold

Two options I can think of.

1. Use Select Where to select the rows with values < 200. Make a subset of the selected rows and choose Link to Original Table. Then make a subset of that subset using a random sample, again choosing Link to Original Table. Now select all the rows in the final subset; because of the linking, they will also be selected in the original table.

2. Make a new column with a formula such as :value < 200 & Random Uniform() < 0.2. Each row below 200 is then flagged with probability 0.2, so roughly (not exactly) 20% of those rows are marked.
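
For readers who want to see the difference between the two options outside JMP, here is a minimal Python sketch. It is not JSL and not the JMP implementation; the function names, the sample values, and the seed are all hypothetical, chosen only for illustration. The first function mirrors the formula column in option 2 (an independent coin flip per row), while the second mirrors option 1 (a fixed-size random sample of the rows below the threshold).

```python
import random

def flag_rows_bernoulli(values, threshold=200, fraction=0.2, seed=None):
    """Like option 2: flag each row below the threshold independently.

    The number of flagged rows varies from run to run.
    """
    rng = random.Random(seed)
    return [v < threshold and rng.random() < fraction for v in values]

def flag_rows_exact(values, threshold=200, fraction=0.2, seed=None):
    """Like option 1: draw a fixed-size sample from the rows below the threshold."""
    rng = random.Random(seed)
    eligible = [i for i, v in enumerate(values) if v < threshold]
    k = round(len(eligible) * fraction)   # sample size, rounded to a whole row count
    chosen = set(rng.sample(eligible, k))
    return [i in chosen for i in range(len(values))]

# Hypothetical data: 9 of these 12 values are below 200.
values = [50, 120, 199, 200, 350, 10, 75, 180, 90, 30, 500, 160]

excluded = flag_rows_exact(values, seed=1)
kept = [v for v, flag in zip(values, excluded) if not flag]
```

With the exact variant, the count of excluded rows is fixed by the data (here 2 of the 9 eligible rows); with the Bernoulli variant it only averages 20% over many runs. Which behaviour is preferable depends on whether the analysis needs an exact proportion.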

fugue
Community Member

Re: Random row selection below a certain threshold

A simple approach would be to use a data step to get all the obs that satisfy your cutoff value (<200), apply one of the SAS random-number functions to generate a random number for each row, sort by the random number, and then keep only the top (or bottom) 20%. Then merge back with your original data to drop those obs.
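
The steps described above (filter, attach a random key, sort, take the first 20%, then remove those rows from the original data) can be sketched as follows. This is Python rather than a SAS data step, and the function name, the "id" and "value" field names, and the sample rows are assumptions made for the example.

```python
import random

def drop_random_fraction(rows, key="value", threshold=200, fraction=0.2, seed=None):
    """Return rows with a random `fraction` of the below-threshold rows removed."""
    rng = random.Random(seed)
    # Step 1 and 2: keep only rows under the cutoff and pair each with a random key.
    keyed = [(rng.random(), i) for i, row in enumerate(rows) if row[key] < threshold]
    # Step 3: sort by the random key, which shuffles the eligible rows.
    keyed.sort()
    # Step 4: the first 20% of the shuffled rows are the ones to drop.
    n_drop = int(len(keyed) * fraction)
    drop_ids = {i for _, i in keyed[:n_drop]}
    # Step 5: "merge back" by filtering the original rows on position.
    return [row for i, row in enumerate(rows) if i not in drop_ids]

# Hypothetical table: 15 rows, 10 of which have value < 200.
rows = [{"id": n, "value": v} for n, v in enumerate(range(0, 300, 20))]
remaining = drop_random_fraction(rows, seed=42)
```

Sorting on a random key is equivalent to shuffling, so taking the top or the bottom 20% gives the same kind of sample.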

dharding
Community Trekker

Re: Random row selection below a certain threshold

Thanks Xan,

that works great!
