tbob
Level II

Finding subset of sample that is representative of the whole

Hi All,

 

I have a set of 49 data points that have repeatable x,y coordinates.  I would like to know which subset of 9 sites best represents the 49 sites.  I tried hierarchical clustering, but either it didn't work or I am not using it correctly.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
SDF1
Super User

Re: Finding subset of sample that is representative of the whole

Hi @tbob ,

 

  You should be able to do this using the Subset platform, Tables > Subset. After you select the Random type you want, e.g. random fraction of the data table, or random selection of a fixed number of rows, you can select the Stratify box and then select the Columns of interest that you want to stratify on -- perhaps a response variable, or just your input variables. When you stratify, JMP tries to make a subset that has the same distribution, mean, standard deviation, etc., so that you're viewing a representative subset of all your data.

 


Hope this helps!

DS

 


5 REPLIES
tbob
Level II

Re: Finding subset of sample that is representative of the whole

I seem to be missing something.  I tried to do this but when I choose a sample size of 9, the output is the original table.

SDF1
Super User

Re: Finding subset of sample that is representative of the whole

Hi @tbob ,

 

  You're right, and I can confirm that this happens. My apologies: I thought stratifying in Subset would work like stratifying when creating a validation column. When creating a validation column, it works as I described before: JMP tries to match the different data sets so that they have the same distribution, mean, standard deviation, etc.

 

  Instead, the help page on stratifying with Subset explains that the sample size you enter is the number of rows sampled per stratum. So, if you stratify on a column that has 4 levels and choose a sample size of 4, you get 4 x 4 = 16 rows in the subset: 4 strata with 4 rows sampled from each.
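To make the per-stratum arithmetic concrete, here is a minimal Python sketch of sampling a fixed number of rows from each stratum. It is not JMP's implementation: the function name `sample_per_stratum`, the 40-row table, and the `level` column are all made up for illustration, and the handling of small strata is an assumption of this sketch.

```python
import random
from collections import defaultdict

def sample_per_stratum(rows, stratum_of, n_per_stratum, seed=0):
    """Draw n_per_stratum rows at random from each stratum (hypothetical helper)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[stratum_of(row)].append(row)
    subset = []
    for level in sorted(groups):
        members = groups[level]
        # Assumption of this sketch: a stratum with fewer rows than requested
        # contributes all of its rows.
        k = min(n_per_stratum, len(members))
        subset.extend(rng.sample(members, k))
    return subset

# Hypothetical table: 40 rows and a "level" column with 4 levels (10 rows each).
rows = [{"id": i, "level": "ABCD"[i % 4]} for i in range(40)]
subset = sample_per_stratum(rows, lambda r: r["level"], n_per_stratum=4)
print(len(subset))  # 4 levels x 4 rows per level = 16
```

If the stratify column has a different value on almost every row (as a continuous measurement might), every row becomes its own stratum and a sketch like this returns nearly the whole table. That may be what happened above, but it is a guess and not something confirmed in JMP.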

 

  You might try not using the stratify and then look at some statistics of your subset and main data table to make sure that the distributions are similar, or whatever you need to compare the subsets to make sure it represents the whole.
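As a rough illustration of that check, the Python sketch below draws 9 rows at random and prints the mean and standard deviation of the subset next to those of the full table. The 7 x 7 grid, the `value` column, and the simulated measurements are made up; with real data you would read your own table instead.

```python
import random
import statistics

def summary(values):
    """Return the mean and sample standard deviation of a list of numbers."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}

# Hypothetical data: 49 sites on a 7 x 7 grid with a made-up measurement per site.
rng = random.Random(1)
sites = [{"x": x, "y": y, "value": rng.gauss(100.0, 5.0)}
         for x in range(7) for y in range(7)]

# Simple random sample of 9 sites, with no stratification.
subset = rng.sample(sites, 9)

full_stats = summary([s["value"] for s in sites])
sub_stats = summary([s["value"] for s in subset])
for name in ("mean", "stdev"):
    print(name, "full:", round(full_stats[name], 2), "subset:", round(sub_stats[name], 2))
```

If the numbers are far apart, you can redraw the subset and compare again. What counts as "close enough" depends on how the 9 sites will be used.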

 

  Alternatively, you could create a dummy stratum column in which every entry is the same, say, all values set to 1. Then stratify on that column and select 9 as your sample size; since there is only one stratum, that should give you 9 rows.

 

  I get why it's done differently here, but I still find it a bit strange and non-intuitive, which is unusual for JMP platforms.

 

Hope this helps!

DS

Re: Finding subset of sample that is representative of the whole

How will you use the 9 selected sites? Why won't you use all 49 sites?

tbob
Level II

Re: Finding subset of sample that is representative of the whole

The 49-site measurement is a test that qualifies production, while the 9-site measurement is taken on actual product.  The product is limited to 9 sites.