Practice JMP using these webinar videos and resources. We hold live Mastering JMP Zoom webinars with Q&A most Fridays at 2 pm US Eastern Time. See the list and register. Local-language live Zoom webinars occur in the UK, Western Europe and Asia. See the jmp.com/mastering site for your country.
Created: Jun 28, 2022 01:34 PM | Last Modified: Oct 24, 2023 12:42 PM
ImportantVariables_1.1.jrn
See how to use simple statistical techniques to help identify important variables and interactions in your data as a precursor to building high-performing prediction models and uncovering additional insights.
See how to:
Understand prerequisites to consider before starting to ID important variables
Clean-up and prepare data using Column Viewer
ID and decide how to handle outliers and missing data using Explore Missing Values, Explore Outliers and Explore Patterns
Ensure dataset is the best representation of the system you plan to study
Use Distributions in Column headers and Graph Builder to explore the data and visualize trends
Perform Multivariate Analysis to ID correlated variables
Use Response Screening to ID the most important variables that affect the response
Use Predictor Screening (decision tree methods) to test many factors in predicting a response
Use Multivariate Analysis to ID variables with high correlation
Cluster variables to group correlated variables together and identify the single variable that is most representative
Use Fit Model>Response Screening and Stepwise Regression to find interactions and higher-order terms based on your important and unique variables
Questions answered by Scott @scott_allen and @Byron_JMP during the June 2023 live webinar.
Q: How did you "un-select" while in the graph builder?
A: Click on the white space in the graph, not on a marker.
Q: Could you further elaborate on your explanation of the p-value and its rank? What does it mean when it is high vs. low, what should we expect when we see certain values or trends, etc.?
A: For p-value and most statistics, consider . Also, when you are in JMP and want help on a statistic (or anything else displayed), select Tools > ? and click the area (in this case, the p-value column header) that you want help with. Also, see this explanation.
Q: How is predictor screening platform different from random forest model?
A: In JMP we call Random Forest Bootstrap Forest (because Random Forest is trademarked). It’s the same thing. Predictor Screening does not include a model, only the ranking.
Q: Since Predictor Screening isn’t telling you the percent variation attributed to each individual predictor, is there any other platform that does just that?
A: The Predictor Screening Report shows the list of predictors with their respective contributions and rank. Predictors with the highest contributions are likely to be important in predicting Y. The Contribution column shows the contribution of each predictor to the Bootstrap Forest model. The Portion column in the report shows the percent contribution of each variable.
Q: Can multivariate analysis have absolute correlation numbers instead of -1 to 1?
A: Yes and no. The table is a matrix of correlation coefficients that summarizes the strength of the linear relationships between each pair of response (Y) variables. See multivariate methods.
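As a rough illustration of what "absolute correlation" means here, the sketch below computes Pearson correlations for a few made-up columns and takes their absolute values, so that strong negative and strong positive relationships rank the same. It runs outside JMP; the column names and data are hypothetical, and this is not a description of a JMP option.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def abs_correlation_matrix(columns):
    """Return a nested dict of |r| for every pair of columns."""
    names = list(columns)
    return {a: {b: abs(pearson(columns[a], columns[b])) for b in names}
            for a in names}

# Hypothetical data: "yield" falls as "temp" rises (r = -1), "noise" is unrelated.
data = {
    "temp": [1, 2, 3, 4, 5],
    "yield": [10, 8, 6, 4, 2],
    "noise": [3, 1, 4, 1, 5],
}
abs_corr = abs_correlation_matrix(data)
```

Sorting pairs by these absolute values puts the strongest linear relationships first regardless of sign, at the cost of hiding the direction of each relationship.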
Q: How would you select the most important variables if you had continuous, binomial (yes/no) and ordered (low, middle, high) independent variables?
A: Yes, you can use all these methods: Predictor Screening and Response Screening.
Q: Do you ever use stepwise regression? Does JMP offer all possible subsets regression?
Q: A lot of my data contains multiple responses, and I find it limits some of things I can do in JMP. Do you do any special cleaning of multiple response data, or do you work with it as-is? I am referring to a "Multiple Response" column property (a list of multiple data points). I'm assuming it is easier to just break these out into two different columns but wanted to see if there is a better way.
A: Yes, JMP lets you store multiple responses in one cell, but to analyze them, splitting them into different columns is correct.
Q: How do Tree-Based methods compare to PLS or PCA?
A: With PCA or PLS, you are trying to represent all the data with a smaller number of latent variables, that is, to find a reduced number of dimensions that can represent all the data. That is different from tree-based methods, which help you nail down what actually explains something. Tree methods are supplanting PCA and PLS for building better. Tree-based methods are fantastic for modeling data, but they are really sensitive to changes and can overfit a lot. Tree-based methods that take a random sample of the rows and a random sample of the columns each time a new tree is made are really robust to outliers and can model interactions and quadratic or cubic effects if they exist in the data. So, you don't have to make a bunch of assumptions.
Q: Can I use Predictor Screening and Response Screening when my dependent variables are numerical and my independent variables are continuous, binomial and ordered?
Q: Why wouldn't you get rid of the correlated variables first?
A: The presence of correlated variables often informs the modeling method. PLS for example, is more robust to correlated X’s. Also, the order may be a preference. You could do it both ways – predictor screening then clustering or vice versa. Be aware that clustering can be a bit inaccurate when you have lots of missing values or columns that don’t change (like an instrument setting). Sometimes additional data cleanup and preparation is needed if you want to start with clustering.
Q: How would you decide the prediction algorithm?
A: That’s a complicated question and the answer depends on your situation. Most of the time a linear model is going to be the simplest and most informative. Other methods work around specific problems with the data or address specific analysis objectives.
Q: Why are the linear terms so strongly recommended? If my correlation is Y=X^2, why does JMP keep trying to add the X term?
A: This relates to effect heredity. Sometimes your higher-order terms are significant but the main effect is not. Even so, you want to keep the main effect in the model. For example, if you removed the main effect, then in the Prediction Profiler you would not see the slope changing; you would just see the curvature, and you would get a worse prediction.
Q: For Predictor Screening, should the data for the factors and the responses be gathered in the same order (paired), or can randomly gathered (unpaired) data be used?
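A small numeric sketch of why the linear term matters: the made-up data below follow y = (x - 3)^2 = 9 - 6x + x^2, a pure curve whose minimum is not at x = 0. A model with only an intercept and an x^2 term cannot shift the curve sideways, so it fits poorly; the model that also keeps the x term reproduces the data. The data and the helper function are hypothetical, and the "full" model uses the known generating coefficients rather than a fitted result.

```python
def fit_intercept_and_slope(z, y):
    """Ordinary least squares for y = a + c*z with a single regressor z."""
    n = len(z)
    mean_z, mean_y = sum(z) / n, sum(y) / n
    c = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
         / sum((zi - mean_z) ** 2 for zi in z))
    a = mean_y - c * mean_z
    return a, c

xs = [0, 1, 2, 3, 4, 5, 6]
ys = [(x - 3) ** 2 for x in xs]  # true model: 9 - 6x + x^2

# Model WITHOUT the linear term: y = a + c * x^2
a, c = fit_intercept_and_slope([x * x for x in xs], ys)
sse_without_linear = sum((y - (a + c * x * x)) ** 2 for x, y in zip(xs, ys))

# Model WITH the linear term, using the known generating coefficients
sse_with_linear = sum((y - (9 - 6 * x + x * x)) ** 2 for x, y in zip(xs, ys))
```

Here `sse_with_linear` is zero while `sse_without_linear` is large, because without the x term the fitted parabola is forced to be symmetric about x = 0.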
A: Order doesn't really matter, but for Multivariate Analysis, we often like to throw in Ys in one group. The order that you add them to Y is the order it will be shown in the matrix.
Q: Would multi-linear regression analysis of x-factors (600 columns) help in finding which x columns are higher contributors to yield?
A: If you just do linear regression on 600 columns, you're really diluting what's explaining your Y. Tree methods like Partition work great for trying to nail down which of those 600 columns are getting you there. When Scott showed Predictor Screening, it took a random sample of the rows and columns, built a tree, did that 100 times, and averaged across those hundred trees to give you the result. With Partition, you can force things to happen: you can force decisions to happen in different places or follow something as it is getting split. You could then look at the Variance Importance Report, or the split history and R-Square changes. This example is too large to do that, but it will give you an idea of how to use Partition.
Q: If the 600 separate linear regressions dilute the explanatory power of your model, is this why Response Screening employs the FDR (False Discovery Rate) correction in those summary statistics (that Scott showed and sorted ascending before) to try and compensate for this?
A: By chance alone you're going to find things that look really good, and you want to filter those out. We don't want to find things by chance alone, so FDR helps correct for that. I emphasize HELPS, because no model is perfect. Likewise, all tree-based models, in situations where you have multicollinearity, are best when you first try to separate out variance and find the variables that are really the most important instead of throwing everything into the model. Sometimes you can build ensemble models using JMP Pro.
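To make the idea of an FDR correction concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment, a common way to compute FDR-adjusted p-values. It assumes this is the kind of adjustment being discussed; the function name and the raw p-values are made up, and it is not taken from JMP's implementation.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from five separate one-predictor tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
fdr = bh_adjust(raw)
```

With these inputs, the two raw p-values just under 0.05 (0.039 and 0.041) both adjust to about 0.051, which is the sense in which the correction filters out results that may have appeared by chance across many tests.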
Q: Are there other methods for doing this with JMP Pro?
Nick Shelton @nick_shelton , JMP SE manager, presented this topic in the past, and I retired the older videos. I borrowed a comment he made to viewer @HelenaG because it may be relevant to anyone viewing this video and Q&A by JMP SEs Scott Allen @scott_allen and Byron Wingerd @Byron_JMP :
Helena asked:
I would like to better understand the following: all the steps linked with interactions identification and assessment (with response screening and stepwise regression tools) are based on a linear assumption - is this correct? If yes, is there any suggestion to check for non-linear interactions?
Nick wrote:
The identification of interactions is best done through domain expertise and a known scientific/physical understanding of the variables under analysis.
When interactions between variables are not well understood the "Response Screening" model personality in Fit Model works well at quantifying the strength between individual interaction terms and the response variable.
The "Stepwise Regression" model personality in Fit Model works well at identifying the best combination of terms (Ex: Main effects, interactions..) to use for a potential model.
The term "linear" model refers to how the formula of the model is structured but both the "Response Screening" & "Stepwise Regression" model personalities in Fit Model can also be used to explore non-linear relationships.
We can include non-linear effects and interactions into an analysis by adding polynomial terms.
In Fit Model you can explore any polynomial relationships (non-linear) by adding the corresponding term in the Model effects box (Ex: Quadratic Relationship (x1*x1), Cubic Relationship (x1*x1*x1), Quadratic Interaction (x1*x1*x2)...)
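The point that a "linear" model can still capture curvature can be sketched outside JMP: the polynomial terms are just extra columns computed from the original variables, and the model stays linear in its coefficients. The function and column names below are illustrative only and do not reflect how Fit Model constructs or centers these terms internally.

```python
def add_polynomial_terms(rows):
    """Return copies of the rows with quadratic, cubic, and quadratic-interaction columns."""
    expanded = []
    for row in rows:
        x1, x2 = row["x1"], row["x2"]
        expanded.append({
            **row,
            "x1*x1": x1 * x1,          # quadratic relationship
            "x1*x1*x1": x1 ** 3,       # cubic relationship
            "x1*x1*x2": x1 * x1 * x2,  # quadratic interaction
        })
    return expanded

# Hypothetical two-row data table.
rows = [{"x1": 2.0, "x2": 3.0}, {"x1": -1.0, "x2": 0.5}]
expanded = add_polynomial_terms(rows)
```

A regression on the expanded columns estimates one coefficient per column, so the response can curve with x1 even though the fitting problem itself is still linear.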
For more information on polynomials and how they can assist with understanding relationships among variables please watch the quick video below:
Due to this presentation of yours, I have been using Predictor Screening and I am really satisfied with the results. Concerning this platform, I would like to ask for your expertise on the following:
1. How is the portion presented in the JMP results for Predictor Screening calculated? I mean, in terms of Bootstrap Forest (BF), what does this portion mean and how is the specific value obtained? (I am writing a scientific paper using this and I would need this detail.)
2. In terms of interpretation, I understood that portion is the percentage of variability of a response (yield) explained by a predictor. Is this correct?
3. Still in the Predictor Screening platform, what does the "Contribution" column mean? How is it calculated in terms of the BF algorithm?
In relation to my previous post, I noticed that Scott Allen @scott_allen and Byron Wingerd @Byron_JMP are now presenting this theme. Therefore, I would like your help with the above questions!
Thanks for the questions. I think Predictor Screening is a great tool and I'm glad you are finding it helpful. Instead of answering your individual question, let me explain in a little more detail what is going on in this analysis.
The Predictor Screening analysis runs a Bootstrap Forest partition model (with a default 100 decision trees) and then ranks the predictors based on their contribution to the model. In short, the Bootstrap Forest analysis is averaging together many decision trees. To create a tree, it takes a bootstrap sample of observations and recursively fits a model by making splits with a random set of predictors. This continues until a stopping rule is met. Then another tree is created with a new set of bootstrapped observations and random predictor splits. Once all the trees have been created, they are averaged into a "forest" of trees (thus the name).
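The loop described above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not JMP's algorithm: each "tree" makes only a single split (a real Bootstrap Forest keeps splitting until a stopping rule is met), the response is continuous, and all names, defaults, and data are made up.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, y, candidates):
    """Return (predictor, threshold, ss_reduction) for the best single split."""
    total = sse(y)
    best = (None, None, 0.0)
    for name in candidates:
        for threshold in sorted(set(r[name] for r in rows)):
            left = [yi for r, yi in zip(rows, y) if r[name] <= threshold]
            right = [yi for r, yi in zip(rows, y) if r[name] > threshold]
            if not left or not right:
                continue
            gain = total - sse(left) - sse(right)
            if gain > best[2]:
                best = (name, threshold, gain)
    return best

def forest_contributions(rows, y, n_trees=100, n_candidates=1, seed=1):
    """Sum each predictor's sum-of-squares reduction over many one-split trees."""
    rng = random.Random(seed)
    names = list(rows[0])
    contributions = {name: 0.0 for name in names}
    for _tree in range(n_trees):
        # Bootstrap sample of observations (drawn with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_y = [y[i] for i in idx]
        # Random subset of predictors considered for this tree's split.
        candidates = rng.sample(names, n_candidates)
        name, _threshold, gain = best_split(sample_rows, sample_y, candidates)
        if name is not None:
            contributions[name] += gain
    return contributions

# Hypothetical data: y depends on x_signal only; x_noise is unrelated.
data_rng = random.Random(0)
rows = [{"x_signal": data_rng.random(), "x_noise": data_rng.random()}
        for _ in range(60)]
y = [10.0 * r["x_signal"] + data_rng.gauss(0, 0.5) for r in rows]
contrib = forest_contributions(rows, y)
```

Because each tree sees different rows and a random subset of predictors, every predictor gets a chance to split, and the accumulated sum-of-squares reduction ends up much larger for `x_signal` than for `x_noise`.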
Back to Predictor Screening output, the Contribution Column provides the sum of squares for a continuous numeric response and G^2 for a categorical response. The portion column is the individual predictor contribution divided by the sum of all the contributions. This is not the same as the variance explained by each individual predictor.
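The Portion calculation described above is simple arithmetic, sketched here with made-up contribution values and hypothetical predictor names: each predictor's contribution is divided by the sum of all contributions, and predictors are ranked by the result.

```python
def portions(contributions):
    """Each predictor's contribution divided by the sum of all contributions."""
    total = sum(contributions.values())
    if total == 0:
        return {name: 0.0 for name in contributions}
    return {name: value / total for name, value in contributions.items()}

# Hypothetical Contribution values (sum of squares for a continuous response).
contrib = {"temperature": 120.0, "pressure": 60.0, "operator": 20.0}
portion = portions(contrib)
ranked = sorted(portion, key=portion.get, reverse=True)
```

The portions always sum to 1 across the predictors in the analysis, which is why they describe relative importance within the forest rather than the share of response variance each predictor explains.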
To learn more about these platforms, I recommend reading the Partition Model and Bootstrap Forest overviews in the documentation.
-Scott