Stack side by side tables from excel and add header columns
Mar 14, 2025 08:33 AM(1370 views)
I have been struggling for 2 days pulling data from excel into JMP in the correct format.
I have hundreds of excel sheets, but the data is disorganised. There are many tables next to eachother (hundreds) in an excel sheet.
I want to stack the data across the different tables, but also make new columns with the associated header data. I have only been able to achieve this on one table, but can't replicate it across all tables, or sheets across different excel docs. The sheets also tend to have different tab names, but if i can just get it to work on one excel document that would be very good.
I modified the excel file you provided to be a bit simpler, but the idea would still be the same if it was more complicated. I will only handle one worksheet, but it is easy to loop over the sheets.
Starting data
Importing to JMP (for some reason JMP wants to drop the first empty column even though I haven't told it to do that, but that is fine)
Our starting data
At this point I will immediately go for JSL but you could easily do some cleanup using interactive features of JMP. Note that this solution does not deal with different types of issues in the data (missing headers, duplicated parameters) as I don't know what should be done to those
Names Default To Here(1);
dt = Open(
"$DOWNLOADS/Book1.xlsx",
Worksheets("Sheet1"),
Use for all sheets(0),
Concatenate Worksheets(0),
Create Concatenation Column(0),
Worksheet Settings(
1,
Has Column Headers(0),
Number of Rows in Headers(1),
Headers Start on Row(1),
Data Starts on Row(1),
Data Starts on Column(1),
Data Ends on Row(0),
Data Ends on Column(0),
Replicated Spanned Rows(0),
Replicated Spanned Headers(0),
Suppress Hidden Rows(0),
Suppress Hidden Columns(0),
Suppress Empty Columns(0),
Treat as Hierarchy(0),
Multiple Series Stack(0),
Import Cell Colors(0),
Limit Column Detect(1),
Column Separator String("-"),
Column Numeric Format(Index(0), Column Name(Column 1), Datatype("Character"))
)
);
// turn everything after this into functions/expressions which you can easily run on different sheets
// Take only useful data: header which belongs to parameters (first two rows)
// Parameter rows
header_rows = [1, 2];
param_rows = dt << Get Rows Where(Starts With(:Column 1, "Parameter"));
dt_headers = dt << subset(Rows(header_rows), Selected Columns(0), Output table("Data Header"));
dt_params = dt << subset(Rows(param_rows), Selected Columns(0), Output table("Data Parametere"));
Close(dt, no save);
// Drop empty columns
For Each({colname}, dt_params << Get Column Names("String"),
If(Col Number(Column(dt_params, colname)) == 0,
dt_params << Delete Column(colname)
);
);
dt_result = dt_params << Stack(
columns(dt_params << Get Column Names("String")),
Stack By Row(0),
Drop All Other Columns(1),
Number of Series(2),
Output Table("Result")
);
// Close(dt_params, no save);
// Cleanup some columns
dt_result << Delete Columns(1, 3);
Column(dt_result, 1) << Set Name("Name");
Column(dt_result, 2) << Set Name("Value");
// Add grouping for header joining
new_col = dt_result << New Column("Group", Numeric, Ordinal, Formula(
Col Cumulative Sum(1, :Name)
));
dt_result << run formulas;
new_col << Delete Formula;
headers = dt_headers[1, 0];
header_count = N Items(Associative Array(headers));
dt_headerstack = Eval(EvalExpr(dt_headers << Stack(
columns(dt_headers << Get Column Names("String")),
Stack By Row(0),
Number of Series(Expr(header_count)),
Output Table("header stack")
)));
// Close(dt_headers, no save);
// drop unnecessary columns
dt_headerstack << Delete Columns(1::N Cols(dt_headerstack)::2);
// Move header columns and remove extra header rows
dt_headerstack << Move Up;
rows_to_delete = dt_headerstack << Get Rows Where(Starts With(Column(dt_headerstack, 1)[], "Header"));
dt_headerstack << Delete Rows(rows_to_delete);
// Add grouping for update
new_col = dt_headerstack << New Column("Group", Numeric, Ordinal, Formula(
Row()
));
dt_headerstack << run formulas;
new_col << Delete Formula;
dt_result << Update(
With(dt_headerstack),
Match Columns(:Group = :Group)
);
Close(dt_headerstack, no save);
Close(dt_headers, no save); // close earlier
Close(dt_params, no save); // close earlier
Close(dt, no save); // close earlier
/*
sheetList = Get Excel Worksheets("$DOWNLOADS/Book1.xlsx"); // for sheet list to loop over
*/
Absolutely, i've attached some dummy data. There can be up to maybe 200 tables side-by-side, and i have added different worksheets within the same excel file just to show the tab names change, but actually these sheets are often in separate excel sheets within a folder. The yellow highlights are where new headers would be needed i believe, and the orange highlight is just there because in the actual data set there are a few hundred empty rows before this data starts below it.
Ideally i'd like to bring in all of the parameter (parameters 1-19, there are more when you scroll down in the excel) under a parameters header, and the values under a value header, and add columns to show that this data belongs to header1/2/3 (and new 4 ) values by adding columns as in the original post.
Ultimately the two sets of data (VCL and VL header columns) i will also be looking to save similarly with some identifier from header1/2/3 (and 4).
Let me know if you have any thought, it would be much appreciated!
It won't be easy, as to be honest, this is horribly formatted data. Going to the source and fixing the data there would be the best option but it is rarely an option.
I would start by pulling the data into JMP as raw as possible, so something like this
And from this I would start building a script as this data has too many issues to deal without using JSL (you can most likely get quite far without JSL, but it with JSL it will be much easier). Fixing the first line of headers is most likely the first thing you should start with as then you can start more easily separating the "tables" and their data
I modified the excel file you provided to be a bit simpler, but the idea would still be the same if it was more complicated. I will only handle one worksheet, but it is easy to loop over the sheets.
Starting data
Importing to JMP (for some reason JMP wants to drop the first empty column even though I haven't told it to do that, but that is fine)
Our starting data
At this point I will immediately go for JSL but you could easily do some cleanup using interactive features of JMP. Note that this solution does not deal with different types of issues in the data (missing headers, duplicated parameters) as I don't know what should be done to those
Names Default To Here(1);
dt = Open(
"$DOWNLOADS/Book1.xlsx",
Worksheets("Sheet1"),
Use for all sheets(0),
Concatenate Worksheets(0),
Create Concatenation Column(0),
Worksheet Settings(
1,
Has Column Headers(0),
Number of Rows in Headers(1),
Headers Start on Row(1),
Data Starts on Row(1),
Data Starts on Column(1),
Data Ends on Row(0),
Data Ends on Column(0),
Replicated Spanned Rows(0),
Replicated Spanned Headers(0),
Suppress Hidden Rows(0),
Suppress Hidden Columns(0),
Suppress Empty Columns(0),
Treat as Hierarchy(0),
Multiple Series Stack(0),
Import Cell Colors(0),
Limit Column Detect(1),
Column Separator String("-"),
Column Numeric Format(Index(0), Column Name(Column 1), Datatype("Character"))
)
);
// turn everything after this into functions/expressions which you can easily run on different sheets
// Take only useful data: header which belongs to parameters (first two rows)
// Parameter rows
header_rows = [1, 2];
param_rows = dt << Get Rows Where(Starts With(:Column 1, "Parameter"));
dt_headers = dt << subset(Rows(header_rows), Selected Columns(0), Output table("Data Header"));
dt_params = dt << subset(Rows(param_rows), Selected Columns(0), Output table("Data Parametere"));
Close(dt, no save);
// Drop empty columns
For Each({colname}, dt_params << Get Column Names("String"),
If(Col Number(Column(dt_params, colname)) == 0,
dt_params << Delete Column(colname)
);
);
dt_result = dt_params << Stack(
columns(dt_params << Get Column Names("String")),
Stack By Row(0),
Drop All Other Columns(1),
Number of Series(2),
Output Table("Result")
);
// Close(dt_params, no save);
// Cleanup some columns
dt_result << Delete Columns(1, 3);
Column(dt_result, 1) << Set Name("Name");
Column(dt_result, 2) << Set Name("Value");
// Add grouping for header joining
new_col = dt_result << New Column("Group", Numeric, Ordinal, Formula(
Col Cumulative Sum(1, :Name)
));
dt_result << run formulas;
new_col << Delete Formula;
headers = dt_headers[1, 0];
header_count = N Items(Associative Array(headers));
dt_headerstack = Eval(EvalExpr(dt_headers << Stack(
columns(dt_headers << Get Column Names("String")),
Stack By Row(0),
Number of Series(Expr(header_count)),
Output Table("header stack")
)));
// Close(dt_headers, no save);
// drop unnecessary columns
dt_headerstack << Delete Columns(1::N Cols(dt_headerstack)::2);
// Move header columns and remove extra header rows
dt_headerstack << Move Up;
rows_to_delete = dt_headerstack << Get Rows Where(Starts With(Column(dt_headerstack, 1)[], "Header"));
dt_headerstack << Delete Rows(rows_to_delete);
// Add grouping for update
new_col = dt_headerstack << New Column("Group", Numeric, Ordinal, Formula(
Row()
));
dt_headerstack << run formulas;
new_col << Delete Formula;
dt_result << Update(
With(dt_headerstack),
Match Columns(:Group = :Group)
);
Close(dt_headerstack, no save);
Close(dt_headers, no save); // close earlier
Close(dt_params, no save); // close earlier
Close(dt, no save); // close earlier
/*
sheetList = Get Excel Worksheets("$DOWNLOADS/Book1.xlsx"); // for sheet list to loop over
*/
This is excellent thank you! This works very well for the example file, i will start on some cleanup and see how far this gets me, but this looks like a great way forward with it, thanks!
Recommended Articles
'
var data = div.getElementsByClassName("video-js");
var script = document.createElement('script');
script.src = "https://players.brightcove.net/" + data_account + "/" + data_palyer + "_default/index.min.js";
for(var i=0;i< data.length;i++){
videodata.push(data[i]);
}
}
}
for(var i=0;i< videodata.length;i++){
document.getElementsByClassName('lia-vid-container')[i].innerHTML = videodata[i].outerHTML;
document.body.appendChild(script);
}
}
catch(e){
}
/* Re compile html */
$compile(rootElement.querySelectorAll('div.lia-message-body-content')[0])($scope);
}
if (code_l.toLowerCase() != newBody.getAttribute("slang").toLowerCase()) {
/* Adding Translation flag */
var tr_obj = $filter('filter')($scope.sourceLangList, function (obj_l) {
return obj_l.code.toLowerCase() === newBody.getAttribute("slang").toLowerCase()
});
if (tr_obj.length > 0) {
tr_text = "This post originally written in lilicon-trans-text has been computer translated for you. When you reply, it will also be translated back to lilicon-trans-text.".replace(/lilicon-trans-text/g, tr_obj[0].title);
try {
if ($scope.wootMessages[$rootScope.profLang] != undefined) {
tr_text = $scope.wootMessages[$rootScope.profLang].replace(/lilicon-trans-text/g, tr_obj[0].title);
}
} catch (e) {
}
} else {
//tr_text = "This message was translated for your convenience!";
tr_text = "This message was translated for your convenience!";
}
try {
if (!document.getElementById("tr-msz-" + value)) {
var tr_para = document.createElement("P");
tr_para.setAttribute("id", "tr-msz-" + value);
tr_para.setAttribute("class", "tr-msz");
tr_para.style.textAlign = 'justify';
var tr_fTag = document.createElement("IMG");
tr_fTag.setAttribute("class", "tFlag");
tr_fTag.setAttribute("src", "/html/assets/lingoTrFlag.PNG");
tr_fTag.style.marginRight = "5px";
tr_fTag.style.height = "14px";
tr_para.appendChild(tr_fTag);
var tr_textNode = document.createTextNode(tr_text);
tr_para.appendChild(tr_textNode);
/* Woot message only for multi source */
if(rootElement.querySelector(".lia-quilt-forum-message")){
rootElement.querySelector(".lia-quilt-forum-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-message-view-blog-topic-message")) {
rootElement.querySelector(".lia-message-view-blog-topic-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-blog-reply-message")){
rootElement.querySelector(".lia-quilt-blog-reply-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-tkb-message")){
rootElement.querySelector(".lia-quilt-tkb-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-tkb-reply-message")){
rootElement.querySelector(".lia-quilt-tkb-reply-message").insertBefore(tr_para,rootElement.querySelector(".lia-quilt-row.lia-quilt-row-footer"));
} else if(rootElement.querySelector(".lia-quilt-idea-message")){
rootElement.querySelector(".lia-quilt-idea-message").appendChild(tr_para);
} else if(rootElement.querySelector(".lia-quilt-idea-reply-message")){
rootElement.querySelector(".lia-quilt-idea-reply-message").insertBefore(tr_para,rootElement.querySelector(".lia-quilt-row.lia-quilt-row-footer"));
}else if(rootElement.querySelector(".lia-quilt-column-alley-left")){
rootElement.querySelector(".lia-quilt-column-alley-left").appendChild(tr_para);
}
else {
if (rootElement.querySelectorAll('div.lia-quilt-row-footer').length > 0) {
rootElement.querySelectorAll('div.lia-quilt-row-footer')[0].appendChild(tr_para);
} else {
rootElement.querySelectorAll('div.lia-quilt-column-message-footer')[0].appendChild(tr_para);
}
}
}
} catch (e) {
}
}
} else {
/* Do not display button for same language */
// syncList.remove(value);
var index = $scope.syncList.indexOf(value);
if (index > -1) {
$scope.syncList.splice(index, 1);
}
}
}
}
}
}
angular.forEach(mszList_l, function (value) {
if (document.querySelectorAll('div.lia-js-data-messageUid-' + value).length > 0) {
var rootElements = document.querySelectorAll('div.lia-js-data-messageUid-' + value);
}else if(document.querySelectorAll('.lia-occasion-message-view .lia-component-occasion-message-view').length >0){
var rootElements = document.querySelectorAll('.lia-occasion-message-view .lia-component-occasion-message-view')[0].querySelectorAll('.lia-occasion-description')[0];
}else {
var rootElements = document.querySelectorAll('div.message-uid-' + value);
}
angular.forEach(rootElements, function (rootElement) {
if (value == '847863' && "ForumTopicPage" == "TkbArticlePage") {
rootElement = document.querySelector('.lia-thread-topic');
}
/* V1.1 Remove from UI */
if (document.getElementById("tr-msz-" + value)) {
document.getElementById("tr-msz-" + value).remove();
}
if (document.getElementById("tr-sync-" + value)) {
document.getElementById("tr-sync-" + value).remove();
}
/* XPath expression for subject and Body */
var lingoRBExp = "//lingo-body[@id = " + "'lingo-body-" + value + "'" + "]";
lingoRSExp = "//lingo-sub[@id = " + "'lingo-sub-" + value + "'" + "]";
/* Get translated subject of the message */
lingoRSXML = doc.evaluate(lingoRSExp, doc, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
for (var i = 0; i < lingoRSXML.snapshotLength; i++) {
/* Replace Reply/Comment subject with transalted subject */
var newSub = lingoRSXML.snapshotItem(i);
/*** START : extracting subject from source if selected language and source language is same **/
var sub_L = "";
if(newSub.getAttribute("slang").toLowerCase() == code_l.toLowerCase()) {
if(value == '847863') {
if($scope.sourceContent[value]){
if($scope.sourceContent[value].subject != ''){
sub_L = decodeURIComponent($scope.sourceContent[value].subject);
}
}else{
sub_L = newSub.innerHTML;
}
}
else{
sub_L = decodeURIComponent($scope.sourceContent[value].subject);
}
} else {
sub_L = newSub.innerHTML;
}
/*** End : extracting subject from source if selected language and source language is same **/
/* This code is placed to remove the extra meta tag adding in the UI*/
try{
sub_L = sub_L.replace('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />','');
}
catch(e){
}
// if($scope.viewTrContentOnly || (newSub.getAttribute("slang").toLowerCase() != code_l.toLowerCase())) {
if ($scope.viewTrContentOnly) {
if ("ForumTopicPage" == "IdeaPage") {
if (value == '847863') {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
document.querySelector('.MessageSubject .lia-message-subject').innerHTML = sub_L;
}
}
}
if ("ForumTopicPage" == "TkbArticlePage") {
if (value == '847863') {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
var subTkbElement = document.querySelector('.lia-thread-subject');
if(subTkbElement){
document.querySelector('.lia-thread-subject').innerHTML = sub_L;
}
}
}
}
else if ("ForumTopicPage" == "BlogArticlePage") {
if (value == '847863') {
try {
if((sub_L != "") && (sub_L!= undefined) && (sub_L != "undefined")){
var subElement = rootElement.querySelector('.lia-blog-article-page-article-subject');
if(subElement) {
subElement.innerText = sub_L;
}
}
} catch (e) {
}
/* var subElement = rootElement.querySelectorAll('.lia-blog-article-page-article-subject');
for (var subI = 0; subI < subElement.length; subI++) {
if((sub_L != "") && (sub_L!= undefined) && (sub_L != "undefined")){
subElement[subI].innerHTML = sub_L;
}
} */
}
else {
try {
// rootElement.querySelectorAll('.lia-blog-article-page-article-subject').innerHTML= sub_L;
/** var subElement = rootElement.querySelectorAll('.lia-blog-article-page-article-subject');
for (var j = 0; j < subElement.length; j++) {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
subElement[j].innerHTML = sub_L;
}
} **/
} catch (e) {
}
}
}
else {
if (value == '847863') {
try{
/* Start: This code is written by iTalent as part of iTrack LILICON - 98 */
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
if(document.querySelectorAll('.lia-quilt-forum-topic-page').length > 0){
if(rootElement.querySelector('div.lia-message-subject').querySelector('h5')){
rootElement.querySelector('div.lia-message-subject').querySelector('h5').innerText = decodeURIComponent(sub_L);
} else {
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
} else {
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
}
/* End: This code is written by iTalent as part of iTrack LILICON - 98 */
}
catch(e){
console.log("subject not available for second time. error details: " + e);
}
} else {
try {
/* Start: This code is written by iTalent as part of LILICON - 98 reported by Ian */
if ("ForumTopicPage" == "IdeaPage") {
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
document.querySelector('.lia-js-data-messageUid-'+ value).querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
}
}
else{
if( (sub_L != "") && (sub_L != undefined) && (sub_L != "undefined") ){
rootElement.querySelector('.MessageSubject .lia-message-subject').innerText = sub_L;
/* End: This code is written as part of LILICON - 98 reported by Ian */
}
}
} catch (e) {
console.log("Reply subject not available. error details: " + e);
}
}
}
// Label translation
var labelEle = document.querySelector("#labelsForMessage");
if (!labelEle) {
labelEle = document.querySelector(".LabelsList");
}
if (labelEle) {
var listContains = labelEle.querySelector('.label');
if (listContains) {
/* Commenting this code as bussiness want to point search with source language label */
// var tagHLink = labelEle.querySelectorAll(".label")[0].querySelector(".label-link").href.split("label-name")[0];
var lingoLabelExp = "//lingo-label/text()";
trLabels = [];
trLabelsHtml = "";
/* Get translated labels of the message */
lingoLXML = doc.evaluate(lingoLabelExp, doc, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
/* try{
for(var j=0;j,';
}
trLabelsHtml = trLabelsHtml+'