Discussions

ylee · Jul 17, 2023 07:30 PM

Hello there,

I have a table with 1 million rows (the sample size) and 3000+ columns (the parameters to study).

I tried Subset with Random - sampling rate = 0.01 to reduce the data to about 10k rows while still representing the original table well. Most of the aggregated statistics, such as CPK, still closely match the full table, but some tail observations may be excluded.

I couldn't find more details about this feature in the JMP help/manual. Could you share how the random sampling is done in the background? Is it a Random Uniform kind of selection, evenly distributed from row 1 to row N, or something else?

Thank you.

Jordan_Hiller · Jul 17, 2023 08:39 PM

JMP is giving you a Simple Random Sample: each observation in the original dataset is equally likely to be in the subset.
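
To make "equally likely" concrete, here is a minimal Python sketch of a simple random sample drawn without replacement. It only illustrates the concept and is not JMP's internal implementation; the function name `simple_random_sample` and its parameters are made up for this example.

```python
import random

def simple_random_sample(n_rows, rate, seed=None):
    """Return sorted row indices for a simple random sample.

    Hypothetical helper: every row has the same chance of being chosen,
    regardless of its position in the table or its values.
    """
    rng = random.Random(seed)
    k = round(n_rows * rate)            # target subset size
    # random.sample draws k distinct indices without replacement
    return sorted(rng.sample(range(n_rows), k))

# 1 million rows at a 0.01 sampling rate gives 10,000 row indices
rows = simple_random_sample(1_000_000, 0.01, seed=1)
print(len(rows))
```

Because selection ignores the data values, a handful of extreme rows can easily be left out of any one draw, which matches the missing tail observations described in the question.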

Sounds like you want a sample that is stratified by CPK. If you have JMP Pro you could do this with the “Make Validation Column” utility.
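
As a rough illustration of what stratifying buys you, the sketch below samples within each stratum separately so that small groups (for example, rows flagged as tail observations) are never dropped entirely. This is a generic Python sketch, not what "Make Validation Column" does internally; `stratified_sample`, the labels, and the "at least one row per stratum" rule are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(strata_labels, rate, seed=None):
    """Return sorted row indices sampled at `rate` within each stratum.

    Hypothetical helper: `strata_labels[i]` is the stratum of row i.
    Each stratum contributes at least one row (an assumption chosen
    here so that rare groups stay represented).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, label in enumerate(strata_labels):
        groups[label].append(idx)

    picked = []
    for members in groups.values():
        k = max(1, round(len(members) * rate))
        picked.extend(rng.sample(members, k))
    return sorted(picked)

# Toy data: 50 "tail" rows followed by 10,000 "bulk" rows
labels = ["tail"] * 50 + ["bulk"] * 10_000
subset = stratified_sample(labels, 0.01, seed=1)
print(len(subset))
```

With a plain simple random sample, the 50 tail rows could be missed altogether; here the subset always contains at least one of them.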
