Two questions about the use of JMP clustering
I am using JMP Pro to run an unsupervised machine learning cluster analysis on a sample of about 10,000 rows and 20 variables. Some questions came up along the way, so I would like to ask other users or engineers for help. Thank you for taking the time to read and answer my questions!

Question 1: I have carefully read the introduction to the clustering functions of the software. Hierarchical clustering in JMP is suitable for any data type with small samples. So, if the 20 variables are mixed data (continuous, discrete, ordinal, and nominal variables), do I need to standardize the continuous and discrete variables myself before running hierarchical clustering, and then select "Unstandardized" under [Standardize By]? Or will the software automatically identify the continuous and discrete variables and standardize them, while the ordinal and nominal variables keep their original values (that is, no manual standardization is needed before hierarchical clustering)?

Question 2: Cluster analysis belongs to unsupervised learning. If 19 variables are hierarchically clustered and one binary nominal variable is assigned to [By], is the analysis still unsupervised learning, or has it become semi-supervised learning? If it counts as semi-supervised learning, while clustering itself is unsupervised, how should the relationship between the two be described accurately?
This post was originally written in Chinese (Simplified) and has been translated.
Accepted Solutions
Re: Two questions about the use of JMP clustering
Hi @pnogau,
Welcome to the Community!
Regarding your questions:
- If your data mixes data types and modeling types (continuous numeric, ordinal, and nominal), then only the Hierarchical Cluster platform can handle such a variety of data types. You can find more information in Overview of Platforms for Clustering Observations, which includes a table comparing the platforms.
You don't need to process the continuous numeric variables beforehand. The platform offers several pre-processing options directly, where you specify the data format, the type of standardization, and missing data imputation: see Launch the Hierarchical Cluster Platform.
- I'm not sure I fully understand your second question.
Clustering is used when you don't know beforehand how many groups (clusters) your data contains or which group each observation belongs to, so it is an unsupervised learning technique. Hierarchical clustering is an interesting technique and platform in JMP because it supports Two-Way clustering, which groups not only the observations but also the variables, so you can see the similarities and correlations between the variables used. You can run this analysis alongside other multivariate platforms such as Correlations and Multivariate Techniques, or alongside visualizations built with Graph Builder, to better assess the correlations between your variables.
Also, if your binary variable is some kind of target, you have two options. You could perform the clustering "blindly", see how many groups are recommended, and then analyze the link between the groups and the binary variable. That would be a combination of unsupervised learning for the clustering, followed by supervised learning to analyze the link between the clusters and the binary target. Or you could directly specify that you want 2 clusters in the Hierarchical Cluster platform (which could then be considered semi-supervised learning, since you already know the number of clusters to find and specify it), and see if and how the clustering matches the binary target variable.
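To make the "cluster blindly, then compare with the binary variable" workflow concrete, here is a minimal Python sketch. It is not JMP's internal method: the toy rows, the column names, the simple mixed-type distance (z-scored continuous columns plus a 0/1 mismatch for the nominal column), and the choice of average linkage are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical toy rows: (age, income, region, target). Not data from the thread.
rows = [
    (25, 30_000, "north", 0),
    (27, 32_000, "north", 0),
    (26, 31_000, "south", 0),
    (52, 90_000, "south", 1),
    (55, 95_000, "east", 1),
    (53, 88_000, "east", 1),
]

def zscores(values):
    """Standardize a continuous column to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

age = zscores([r[0] for r in rows])
income = zscores([r[1] for r in rows])
region = [r[2] for r in rows]   # nominal column, left unstandardized
target = [r[3] for r in rows]   # binary variable, NOT used for clustering

def distance(i, j):
    """Assumed mixed-type distance: squared z-score gaps plus a 0/1 mismatch."""
    mismatch = 0.0 if region[i] == region[j] else 1.0
    return ((age[i] - age[j]) ** 2 + (income[i] - income[j]) ** 2 + mismatch) ** 0.5

def average_linkage(n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(distance(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Step 1 (unsupervised): cluster without looking at the target.
labels = [0] * len(rows)
for cluster_id, members in enumerate(average_linkage(2)):
    for i in members:
        labels[i] = cluster_id

# Step 2: compare cluster membership with the binary variable afterwards.
crosstab = Counter(zip(labels, target))
print(crosstab)
```

On this toy data the two clusters line up exactly with the binary variable, but the variable played no part in forming them; it is only used afterwards to interpret the result.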
Hope this answer will help you,
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Re: Two questions about the use of JMP clustering
Hi @Victor_G,
Thank you very much for your help and your patient response to my question. Your answers and suggestions gave me valuable insights, and I believe the issue I was facing is tentatively resolved.
Question 1: Since my data contains mixed variable types, hierarchical clustering is indeed the right choice. Following your advice, I carefully reread "Launch the Hierarchical Cluster Platform" and found the solution: to address the different measurement scales of the continuous and ordinal columns, it seems I should standardize the continuous and discrete variables first, and then select "Unstandardized" under "Standardize By".
Question 2: You understood my doubts very accurately, and your response gave me important inspiration. My aim is to use cluster analysis to discover different clusters within a large dataset (individuals) and to explore visually the potential relationships among more than twenty variables, which is an unsupervised machine learning task. However, assigning a certain binary variable to "By" seems to yield very satisfactory clustering results. I am wondering whether this has now become semi-supervised or supervised learning.
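As a rough illustration of what a "By" variable does, here is a small Python sketch. It assumes the usual behavior of the By role in JMP launch windows, where a separate analysis is produced for each level of the By variable. The toy rows and the helper `split_at_largest_gap` are hypothetical, and the one-dimensional split stands in for a full hierarchical clustering.

```python
from collections import defaultdict

# Hypothetical rows: (by_level, measurement). Not data from the thread.
rows = [
    ("A", 1.0), ("A", 1.2), ("A", 5.0), ("A", 5.3),
    ("B", 10.0), ("B", 10.4), ("B", 20.0),
]

def split_at_largest_gap(values):
    """Two-cluster cut of single-linkage clustering in one dimension:
    sort the values and split where adjacent values are farthest apart."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

# The By variable only partitions the rows into separate tables.
groups = defaultdict(list)
for level, value in rows:
    groups[level].append(value)

# Each level is then clustered on its own, with no reference to the other level.
per_level = {level: split_at_largest_gap(vals) for level, vals in groups.items()}
print(per_level)
```

Under that assumption, the binary variable never acts as a target the algorithm learns to predict. It only decides which rows are analyzed together, so each per-level analysis is still unsupervised, although the clusters found are conditional on that split.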
Actually, I was fortunate enough to get in touch with an engineer responsible for JMP's university business in China, and I am planning to further verify my conjecture with the engineer. If you are interested, I will share the answers I receive with you.
I was very excited to receive your reply! Wishing you a happy life and smooth work ~
Re: Two questions about the use of JMP clustering
Hi @Victor_G , I’ve made progress on my last issue and wanted to share this good news with you. Just as you said, when clustering mixed data, JMP automatically standardizes continuous and discrete data without the need for manual standardization beforehand. Although I got in touch with the JMP China regional business manager for universities some time ago, I still haven’t received a definitive answer. I am very grateful for your assistance at the time. Thank you again!