Greetings,
I am in need of help scripting a multi-step process as I am unable to abstract a viable solution to achieve some data preparation work.
I have time series data that comes in as several spread sheets, containing several tabs, and each tab has data with column names that contain meta data contextualizing the data.
In the example below, the columns with "Descriptor" (columns 1,3 and 5) should really be named "Date", but instead contain Descriptors that are needed for analysis. Additionally, The Unit columns(2,4 and 6) should really be named data but contain the unit of measure.
Big:Small:Wet | mg | Bad:Good:Ugly | Hz | Port:Starboard:Speed | mL |
12/15/2021 | 0 | 12/15/2021 | 4 | 12/15/2021 | 8 |
12/16/2021 | 1 | 12/16/2021 | 5 | 12/16/2021 | 9 |
12/17/2021 | 2 | 12/17/2021 | 6 | 12/17/2021 | 10 |
12/18/2021 | 3 | 12/18/2021 | 7 | 12/18/2021 | 11 |
Ideally, the finished data table would have the following structure.
Date | DescriptorType1 | DescriptorType2 | DescriptorType3 | Unit | Data |
12/15/2021 | Big | Small | Wet | mg | 0 |
12/16/2021 | Big | Small | Wet | mg | 1 |
12/17/2021 | Big | Small | Wet | mg | 2 |
12/18/2021 | Big | Small | Wet | mg | 3 |
12/15/2021 | Bad | Good | Ugly | Hz | 4 |
12/16/2021 | Bad | Good | Ugly | Hz | 5 |
12/17/2021 | Bad | Good | Ugly | Hz | 6 |
12/18/2021 | Bad | Good | Ugly | Hz | 7 |
12/15/2021 | Port | Starboard | Speed | mL | 8 |
12/16/2021 | Port | Starboard | Speed | mL | 9 |
My attempts to pull the column headers into a separate table and re-join with nested for loops have not proven successful. Here is code that pulls and prepares the column name data for the example data included.
Names Default to Here(1);
//Get MainDT column Names
colNames = Data Table("Data") << Get Column Names;
//Create Empty Table
dtCols = New Table("DTCols");
//Put in Columns into the table
dtCols << New Column( "Columns", Character, Values(colNames));
//Label every other row for splitting
dtCols << New Column( "Data", formula( sequence(1,2,1)));
//Remove formula from Data column
dtCols:Data << Delete Formula;
//New column from which to split from
dtCols << New Column( "Type", formula(
If(
contains(:Columns, "Desc"), "Meta",
"Unit_Raw"
);
);
);
//Deleting formula for handling
dtCols:Type << Delete formula;
//Splitting the row based on the labelling
splitdtCols = dtCols << Split(
Split By( :Type ),
Split( :Columns ),
Sort by Column Property,
Output Table( "SplitDTCols" )
);
//Text to Columns by ":"
splitdtCols << text to columns(delimiter(":"), columns(:Meta));
The nested for loop (which only looks to put the full description back into the original table) is not only relatively slow (this is needed to loop through >1 million entries) but it only writes missing data into the columns.
//Meta column back into Data
For(i = 1, i <= N Items(colNames), i++,
For(j = 1, j <= N Rows(splitdtCols), j++,
//show(splitdtCols:Meta[j]);
If( eval(char(colNames[i])) == eval(char(splitdtCols:Meta[j])),
Data Table("Data") << New Column("Descriptor", set Values(eval(char(splitdtCols:Meta[j]))))
)
)
);
I am willing to abandon the approach of pulling out the column name data into a separate table. It really feels like there is a simpler approach I am missing.