I have been struggling for 2 days pulling data from excel into JMP in the correct format.
I have hundreds of excel sheets, but the data is disorganised. There are many tables next to eachother (hundreds) in an excel sheet.
I want to stack the data across the different tables, but also make new columns with the associated header data. I have only been able to achieve this on one table, but can't replicate it across all tables, or sheets across different excel docs. The sheets also tend to have different tab names, but if i can just get it to work on one excel document that would be very good.
Absolutely, i've attached some dummy data. There can be up to maybe 200 tables side-by-side, and i have added different worksheets within the same excel file just to show the tab names change, but actually these sheets are often in separate excel sheets within a folder. The yellow highlights are where new headers would be needed i believe, and the orange highlight is just there because in the actual data set there are a few hundred empty rows before this data starts below it.
Ideally i'd like to bring in all of the parameter (parameters 1-19, there are more when you scroll down in the excel) under a parameters header, and the values under a value header, and add columns to show that this data belongs to header1/2/3 (and new 4 ) values by adding columns as in the original post.
Ultimately the two sets of data (VCL and VL header columns) i will also be looking to save similarly with some identifier from header1/2/3 (and 4).
Let me know if you have any thought, it would be much appreciated!
It won't be easy, as to be honest, this is horribly formatted data. Going to the source and fixing the data there would be the best option but it is rarely an option.
I would start by pulling the data into JMP as raw as possible, so something like this
And from this I would start building a script as this data has too many issues to deal without using JSL (you can most likely get quite far without JSL, but it with JSL it will be much easier). Fixing the first line of headers is most likely the first thing you should start with as then you can start more easily separating the "tables" and their data
I modified the excel file you provided to be a bit simpler, but the idea would still be the same if it was more complicated. I will only handle one worksheet, but it is easy to loop over the sheets.
Starting data
Importing to JMP (for some reason JMP wants to drop the first empty column even though I haven't told it to do that, but that is fine)
Our starting data
At this point I will immediately go for JSL but you could easily do some cleanup using interactive features of JMP. Note that this solution does not deal with different types of issues in the data (missing headers, duplicated parameters) as I don't know what should be done to those
Names Default To Here(1);
dt =Open("$DOWNLOADS/Book1.xlsx",
Worksheets("Sheet1"),
Use for all sheets(0),
Concatenate Worksheets(0),
Create Concatenation Column(0),
Worksheet Settings(1,
Has Column Headers(0),
Number of Rows in Headers(1),
Headers Start on Row(1),
Data Starts on Row(1),
Data Starts on Column(1),
Data Ends on Row(0),
Data Ends on Column(0),
Replicated Spanned Rows(0),
Replicated Spanned Headers(0),
Suppress Hidden Rows(0),
Suppress Hidden Columns(0),
Suppress Empty Columns(0),
Treat as Hierarchy(0),
Multiple Series Stack(0),
Import Cell Colors(0),
Limit Column Detect(1),
Column Separator String("-"),
Column Numeric Format(Index(0),Column Name(Column 1), Datatype("Character"))));// turn everything after this into functions/expressions which you can easily run on different sheets// Take only useful data: header which belongs to parameters (first two rows)// Parameter rows
header_rows =[1,2];
param_rows = dt << Get Rows Where(Starts With(:Column 1,"Parameter"));
dt_headers = dt << subset(Rows(header_rows), Selected Columns(0), Output table("Data Header"));
dt_params = dt << subset(Rows(param_rows), Selected Columns(0), Output table("Data Parametere"));Close(dt, no save);// Drop empty columnsFor Each({colname}, dt_params << Get Column Names("String"),If(Col Number(Column(dt_params, colname))==0,
dt_params << Delete Column(colname)););
dt_result = dt_params << Stack(
columns(dt_params << Get Column Names("String")),
Stack By Row(0),
Drop All Other Columns(1),
Number of Series(2),
Output Table("Result"));// Close(dt_params, no save);// Cleanup some columns
dt_result << Delete Columns(1,3);Column(dt_result,1)<< Set Name("Name");Column(dt_result,2)<< Set Name("Value");// Add grouping for header joining
new_col = dt_result << New Column("Group", Numeric, Ordinal, Formula(Col Cumulative Sum(1,:Name)));
dt_result << run formulas;
new_col << Delete Formula;
headers = dt_headers[1,0];
header_count =N Items(Associative Array(headers));
dt_headerstack =Eval(EvalExpr(dt_headers << Stack(
columns(dt_headers << Get Column Names("String")),
Stack By Row(0),
Number of Series(Expr(header_count)),
Output Table("header stack"))));// Close(dt_headers, no save);// drop unnecessary columns
dt_headerstack << Delete Columns(1::N Cols(dt_headerstack)::2);// Move header columns and remove extra header rows
dt_headerstack << Move Up;
rows_to_delete = dt_headerstack << Get Rows Where(Starts With(Column(dt_headerstack,1)[],"Header"));
dt_headerstack << Delete Rows(rows_to_delete);// Add grouping for update
new_col = dt_headerstack << New Column("Group", Numeric, Ordinal, Formula(Row()));
dt_headerstack << run formulas;
new_col << Delete Formula;
dt_result << Update(
With(dt_headerstack),
Match Columns(:Group =:Group));Close(dt_headerstack, no save);Close(dt_headers, no save);// close earlierClose(dt_params, no save);// close earlierClose(dt, no save);// close earlier/*
sheetList = Get Excel Worksheets("$DOWNLOADS/Book1.xlsx"); // for sheet list to loop over
*/