Created:
Sep 19, 2018 12:10 AM
| Last Modified: Oct 24, 2018 9:37 AM
ADI Presentation Discovery 2018.jrn
1800 KB
Level: Intermediate Milo Page, JMP Research Statistician Developer, SAS
JMP Pro 14 includes a new Automated Data Imputation (ADI) utility, a versatile, empirically tuned, streaming, missing data imputation method. We recommend it for handling missing data as a pre-processing step to predictive model fitting. It empirically tunes to your data set to extract the underlying structure, even in the presence of missing data. It also respects training and validation partitions and interfaces seamlessly with predictive models. It is developed using powerful matrix completion methods with some added extensions for robustness and flexibility. This talk will focus on when ADI is appropriate and how to use it in JMP Pro 14. I will also outline a recommended workflow for processing data with missing values and demonstrate ADI’s performance on some examples.
Could you give me more details about the process reported in this slide below?
If I understood well, you are sayng to split data in training and test set before to impute the missing values. Once you divided the data, do you apply the ADI method only to training data. Correct?
Then what do you do on Validation and test set?
What do you mean with Streaming imputed validation/test set?
I can help with your questions, I was Milo's Ph.D. advisor for the research that led to the creation of ADI in JMP Pro.
If I understood well, you are saying to split data in training and test set before to impute the missing values. Once you divided the data, do you apply the ADI method only to training data.
The work flow is to go to the Make Validation Column platform to set up the datatable column that partitions the rows into training/validation sets. When you launch Explore Missing Values, set that column in the validation role. The ADI algorithm fits PCA-type models to the training set, and uses the validation set to tune the model for things like identifying the best rank of the dimension reduction.
Then what do you do on Validation and test set?
I recommend using the same Training/Validation or Training/Validation/Test partitioning of the data when I move on to fitting supervised learning algorithms like Neural Networks or Bootstrap Forest models at the next step in the modeling process. Imputation is a part of the modeling process, just as the predictive modeling component is. Using the same partitioning of the rows into Training/Validation/Test sets for the supervised learning step is a way to ensure that we have the same degree of independent assessment of goodness of fit for the entire modeling exercise.
What do you mean with Streaming imputed validation/test set?
Maybe our language isn't clear here. What happens is that ADI will save a formula column that creates imputed versions of the existing columns. This means that you can use Tables<<Concatenate to load new observations to the data table. These new observations will immediately be imputed, and the prediction formulas that you have saved will execute, giving predictions for the responses based on the imputed data, with no further work needed on the part of the user.
This process happens immediately to the training/validation/test portions of the the data ADI. So, you can go straight to modeling using the imputed versions of the columns without worrying too much about the missing cells. ADI is pretty smart. It uses the validation rows to determine the rank of the dimension reduction used. If the columns are independent, or too messy for a linear rank reduction, it figures that out and simply imputes the training mean of the columns.
'
var data = div.getElementsByClassName("video-js");
var script = document.createElement('script');
script.src = "https://players.brightcove.net/" + data_account + "/" + data_palyer + "_default/index.min.js";
for(var i=0;i< data.length;i++){
videodata.push(data[i]);
}
}
}
for(var i=0;i< videodata.length;i++){
document.getElementsByClassName('lia-vid-container')[i].innerHTML = videodata[i].outerHTML;
document.body.appendChild(script);
}
}
catch(e){
}
/* Re compile html */
$compile(rootElement.querySelectorAll('div.lia-message-body-content')[0])($scope);
}
if (code_l.toLowerCase() != newBody.getAttribute("slang").toLowerCase()) {
/* Adding Translation flag */
var tr_obj = $filter('filter')($scope.sourceLangList, function (obj_l) {
return obj_l.code.toLowerCase() === newBody.getAttribute("slang").toLowerCase()
});
if (tr_obj.length > 0) {
tr_text = "This post originally written in lilicon-trans-text has been computer translated for you. When you reply, it will also be translated back to lilicon-trans-text.".replace(/lilicon-trans-text/g, tr_obj[0].title);
try {
if ($scope.wootMessages[$rootScope.profLang] != undefined) {
tr_text = $scope.wootMessages[$rootScope.profLang].replace(/lilicon-trans-text/g, tr_obj[0].title);
}
} catch (e) {
}
} else {
//tr_text = "This message was translated for your convenience!";
tr_text = "This message was translated for your convenience!";
}
try {
if (!document.getElementById("tr-msz-" + value)) {
var tr_para = document.createElement("P");
tr_para.setAttribute("id", "tr-msz-" + value);
tr_para.setAttribute("class", "tr-msz");
tr_para.style.textAlign = 'justify';
var tr_fTag = document.createElement("IMG");
tr_fTag.setAttribute("class", "tFlag");
tr_fTag.setAttribute("src", "/html/assets/lingoTrFlag.PNG");
tr_fTag.style.marginRight = "5px";
tr_fTag.style.height = "14px";
tr_para.appendChild(tr_fTag);
var tr_textNode = document.createTextNode(tr_text);
tr_para.appendChild(tr_textNode);
/* Woot message only for multi source */
if(rootElement.querySelector(".lia-quilt-forum-message")){
rootElement.querySelector(".lia-quilt-forum-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-message-view-blog-topic-message")) {
rootElement.querySelector(".lia-message-view-blog-topic-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-blog-reply-message")){
rootElement.querySelector(".lia-quilt-blog-reply-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-tkb-message")){
rootElement.querySelector(".lia-quilt-tkb-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-tkb-reply-message")){
rootElement.querySelector(".lia-quilt-tkb-reply-message").insertBefore(tr_para,rootElement.querySelector(".lia-quilt-row.lia-quilt-row-footer"));
} else if(rootElement.querySelector(".lia-quilt-idea-message")){
rootElement.querySelector(".lia-quilt-idea-message").appendChild(tr_para);
}else if(rootElement.querySelector(".lia-quilt-column-alley-left")){
rootElement.querySelector(".lia-quilt-column-alley-left").appendChild(tr_para);
}
else {
if (rootElement.querySelectorAll('div.lia-quilt-row-footer').length > 0) {
rootElement.querySelectorAll('div.lia-quilt-row-footer')[0].appendChild(tr_para);
} else {
rootElement.querySelectorAll('div.lia-quilt-column-message-footer')[0].appendChild(tr_para);
}
}
}
} catch (e) {
}
}
} else {
/* Do not display button for same language */
// syncList.remove(value);
var index = $scope.syncList.indexOf(value);
if (index > -1) {
$scope.syncList.splice(index, 1);
}
}
}
}
}
}
angular.forEach(mszList_l, function (value) {
if (document.querySelectorAll('div.lia-js-data-messageUid-' + value).length > 0) {
var rootElements = document.querySelectorAll('div.lia-js-data-messageUid-' + value);
}else if(document.querySelectorAll('.lia-occasion-message-view .lia-component-occasion-message-view').length >0){
var rootElements = document.querySelectorAll('.lia-occasion-message-view .lia-component-occasion-message-view')[0].querySelectorAll('.lia-occasion-description')[0];
}else {
var rootElements = document.querySelectorAll('div.message-uid-' + value);
}
angular.forEach(rootElements, function (rootElement) {
if (value == '73706' && "TkbArticlePage" == "TkbArticlePage") {
rootElement = document.querySelector('.lia-thread-topic');
}
/* V1.1 Remove from UI */
if (document.getElementById("tr-msz-" + value)) {
document.getElementById("tr-msz-" + value).remove();
}
if (document.getElementById("tr-sync-" + value)) {
document.getElementById("tr-sync-" + value).remove();
}
/* XPath expression for subject and Body */
var lingoRBExp = "//lingo-body[@id = " + "'lingo-body-" + value + "'" + "]";
lingoRSExp = "//lingo-sub[@id = " + "'lingo-sub-" + value + "'" + "]";
/* Get translated subject of the message */
lingoRSXML = doc.evaluate(lingoRSExp, doc, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
for (var i = 0; i < lingoRSXML.snapshotLength; i++) {
/* Replace Reply/Comment subject with transalted subject */
var newSub = lingoRSXML.snapshotItem(i);
/*** START : extracting subject from source if selected language and source language is same **/
var sub_L = "";
if (newSub.getAttribute("slang").toLowerCase() == code_l.toLowerCase()) {
if (value == '73706') {
sub_L = decodeURIComponent($scope.sourceContent[value].subject);
}
else{
sub_L = decodeURIComponent($scope.sourceContent[value].subject);
}
} else {
sub_L = newSub.innerHTML;
}
/*** End : extracting subject from source if selected language and source language is same **/
/* This code is placed to remove the extra meta tag adding in the UI*/
try{
sub_L = sub_L.replace('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />','');
}
catch(e){
}
// if($scope.viewTrContentOnly || (newSub.getAttribute("slang").toLowerCase() != code_l.toLowerCase())) {
if ($scope.viewTrContentOnly) {
if ("TkbArticlePage" == "IdeaPage") {
if (value == '73706') {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
document.querySelector('.MessageSubject .lia-message-subject').innerHTML = sub_L;
}
}
}
if ("TkbArticlePage" == "TkbArticlePage") {
if (value == '73706') {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
var subTkbElement = document.querySelector('.lia-thread-subject');
if(subTkbElement){
document.querySelector('.lia-thread-subject').innerHTML = sub_L;
}
}
}
}
else if ("TkbArticlePage" == "BlogArticlePage") {
if (value == '73706') {
try {
if((sub_L != "") && (sub_L!= undefined) && (sub_L != "undefined")){
var subElement = rootElement.querySelector('.lia-blog-article-page-article-subject');
if(subElement) {
subElement.innerText = sub_L;
}
}
} catch (e) {
}
/* var subElement = rootElement.querySelectorAll('.lia-blog-article-page-article-subject');
for (var subI = 0; subI < subElement.length; subI++) {
if((sub_L != "") && (sub_L!= undefined) && (sub_L != "undefined")){
subElement[subI].innerHTML = sub_L;
}
} */
}
else {
try {
// rootElement.querySelectorAll('.lia-blog-article-page-article-subject').innerHTML= sub_L;
/** var subElement = rootElement.querySelectorAll('.lia-blog-article-page-article-subject');
for (var j = 0; j < subElement.length; j++) {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
subElement[j].innerHTML = sub_L;
}
} **/
} catch (e) {
}
}
}
else {
if (value == '73706') {
try{
/* Start: This code is written by iTalent as part of iTrack LILICON - 98 */
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
if(document.querySelectorAll('.lia-quilt-forum-topic-page').length > 0){
if(rootElement.querySelector('div.lia-message-subject').querySelector('h5')){
rootElement.querySelector('div.lia-message-subject').querySelector('h5').innerText = decodeURIComponent(sub_L);
} else {
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
} else {
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
}
/* End: This code is written by iTalent as part of iTrack LILICON - 98 */
}
catch(e){
console.log("subject not available for second time. error details: " + e);
}
} else {
try {
/* Start: This code is written by iTalent as part of LILICON - 98 reported by Ian */
if ("TkbArticlePage" == "IdeaPage") {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
document.querySelector('.lia-js-data-messageUid-'+ value).querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
}
else{
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
/* End: This code is written as part of LILICON - 98 reported by Ian */
}
}
} catch (e) {
console.log("Reply subject not available. error details: " + e);
}
}
}
// Label translation
var labelEle = document.querySelector("#labelsForMessage");
if (!labelEle) {
labelEle = document.querySelector(".LabelsList");
}
if (labelEle) {
var listContains = labelEle.querySelector('.label');
if (listContains) {
/* Commenting this code as bussiness want to point search with source language label */
// var tagHLink = labelEle.querySelectorAll(".label")[0].querySelector(".label-link").href.split("label-name")[0];
var lingoLabelExp = "//lingo-label/text()";
trLabels = [];
trLabelsHtml = "";
/* Get translated labels of the message */
lingoLXML = doc.evaluate(lingoLabelExp, doc, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
/* try{
for(var j=0;j,';
}
trLabelsHtml = trLabelsHtml+'