
K-means Modelling by Consensus

Hello, 

I've got an extensive dataset and I'd like to try a consensus modelling approach to pick out the likely patterns within it. Since this is going to be a fairly involved procedure, I have a few logistical questions about the scripting. First, I'll break down the workflow the modelling is meant to follow.

Workflow

1. Subset the dataset by random sampling at a rate of 0.1.

2. Run a factor analysis on the "y" columns.

3. Using the eigenvalues from the factor analysis, run K-means clustering for each number of clusters in the following range: from 3 up to the number of factors with eigenvalues above 1.

4. Save all cluster means to a data table, excluding the standard deviations.

5. Concatenate the table from step 4 with the random subset sampled in step 1.

6. Create an indicator column that distinguishes the samples from the K-means cluster means.

7. Loop over steps 1 through 6 n times.

8. Create an indicator column that identifies each loop iteration in the final concatenated table.

All of this can be done one step at a time, but my goal is to automate the process over many iterations and build a dashboard that shows how the results change across the random samples.

Any input on how to chain these steps together in a script would be appreciated. The loop is what is giving me grief, since every other step (with the exception of step 3) can be saved to the data table and its script extracted.
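
For reference, the looping part (steps 1, 7 and 8) might be sketched roughly as below. This is only a minimal outline under assumptions: the source table is named "Data", the names nIter, sub and final and the column names "Source" and "Iteration" are hypothetical, and the factor analysis, K-means and cluster-means steps are left as a placeholder comment rather than implemented.

```jsl
dt = Data Table( "Data" );
nIter = 10; // hypothetical number of repetitions

For( i = 1, i <= nIter, i++,
	// Step 1: draw a fresh 10% random subset on every pass.
	sub = dt << Subset(
		Output Table( "Subset " || Char( i ) ),
		Sampling Rate( 0.1 ),
		Selected columns only( 0 )
	);

	// Steps 2 to 5 (factor analysis, K-means, cluster means) would go here.

	// Step 6: mark these rows as sampled data (cluster means would get another label).
	sub << New Column( "Source", Character, Set Each Value( "Sample" ) );

	// Step 8: record which iteration the rows came from.
	sub << New Column( "Iteration", Numeric, Set Each Value( i ) );

	// Keep the first subset as the running result and append the later ones to it.
	If( i == 1,
		final = sub,
		final << Concatenate( sub, Append to first table );
		Close( sub, NoSave );
	);
);

final << Set Name( "Consensus Results" );
```

Appending to one running table keeps the number of open tables small, and the "Iteration" column is what a dashboard could later use to compare passes.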

M. Dereviankin

Re: K-means Modelling by Consensus

Just an update on my steps and hiccups for additional information. 

After I subset the data table and run a factor analysis, how do I extract the eigenvalues as a string to use later in my coding? 

 

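// Draw a 10% random subset of the source table and keep a reference to it.
sub = Data Table( "Data" ) << Subset(
	Output Table( "Subset" ),
	Sampling Rate( 0.1 ),
	Selected columns only( 0 )
);

// Run the factor analysis on the subset and keep a reference to the platform.
fa = sub << Factor Analysis(
	Y( :Column1, :Column2 ),
	Variance Estimation( "Row-wise" ),
	Variance Scaling( "Correlations" )
);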

The next step will be to feed those values into my script so that I can run K-means over that range.
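
One possible way to get the eigenvalues as numbers, rather than reading them out of the report, is to compute them directly from the correlation matrix of the subset. The following is a minimal sketch under several assumptions: a table named "Subset" is open, the y columns have no missing values, and the eigenvalues shown by the factor analysis are those of the correlation matrix. The names x, r, evals, evecs, nFactors, evalString and k are illustrative only, and the K-means launch itself is left as a comment.

```jsl
sub = Data Table( "Subset" );

// Pull the y columns into a matrix (one row per observation).
x = sub << Get As Matrix( {"Column1", "Column2"} );

// Correlation matrix of the y columns and its eigen decomposition.
r = Correlation( x );
{evals, evecs} = Eigen( r );

// Count the eigenvalues above 1.
nFactors = Sum( evals > 1 );

// The eigenvalues as a string, if a string is really needed.
evalString = Char( evals );

// Loop over the range of cluster counts described in step 3.
For( k = 3, k <= nFactors, k++,
	// Launch K-means with k clusters here.
	Show( k );
);
```

With only the two columns shown, at most one eigenvalue can exceed 1, so the loop body would not run; the full list of y columns would go in place of Column1 and Column2.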

 

M. Dereviankin