- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Meta Data in Column Name to Row
Greetings,
I am in need of help scripting a multi-step process as I am unable to abstract a viable solution to achieve some data preparation work.
I have time series data that comes in as several spread sheets, containing several tabs, and each tab has data with column names that contain meta data contextualizing the data.
In the example below, the columns with "Descriptor" (columns 1,3 and 5) should really be named "Date", but instead contain Descriptors that are needed for analysis. Additionally, The Unit columns(2,4 and 6) should really be named data but contain the unit of measure.
Big:Small:Wet | mg | Bad:Good:Ugly | Hz | Port:Starboard:Speed | mL |
12/15/2021 | 0 | 12/15/2021 | 4 | 12/15/2021 | 8 |
12/16/2021 | 1 | 12/16/2021 | 5 | 12/16/2021 | 9 |
12/17/2021 | 2 | 12/17/2021 | 6 | 12/17/2021 | 10 |
12/18/2021 | 3 | 12/18/2021 | 7 | 12/18/2021 | 11 |
Ideally, the finished data table would have the following structure.
Date | DescriptorType1 | DescriptorType2 | DescriptorType3 | Unit | Data |
12/15/2021 | Big | Small | Wet | mg | 0 |
12/16/2021 | Big | Small | Wet | mg | 1 |
12/17/2021 | Big | Small | Wet | mg | 2 |
12/18/2021 | Big | Small | Wet | mg | 3 |
12/15/2021 | Bad | Good | Ugly | Hz | 4 |
12/16/2021 | Bad | Good | Ugly | Hz | 5 |
12/17/2021 | Bad | Good | Ugly | Hz | 6 |
12/18/2021 | Bad | Good | Ugly | Hz | 7 |
12/15/2021 | Port | Starboard | Speed | mL | 8 |
12/16/2021 | Port | Starboard | Speed | mL | 9 |
My attempts to pull the column headers into a separate table and re-join with nested for loops have not proven successful. Here is code that pulls and prepares the column name data for the example data included.
Names Default to Here(1);
//Get MainDT column Names
colNames = Data Table("Data") << Get Column Names;
//Create Empty Table
dtCols = New Table("DTCols");
//Put in Columns into the table
dtCols << New Column( "Columns", Character, Values(colNames));
//Label every other row for splitting
dtCols << New Column( "Data", formula( sequence(1,2,1)));
//Remove formula from Data column
dtCols:Data << Delete Formula;
//New column from which to split from
dtCols << New Column( "Type", formula(
If(
contains(:Columns, "Desc"), "Meta",
"Unit_Raw"
);
);
);
//Deleting formula for handling
dtCols:Type << Delete formula;
//Splitting the row based on the labelling
splitdtCols = dtCols << Split(
Split By( :Type ),
Split( :Columns ),
Sort by Column Property,
Output Table( "SplitDTCols" )
);
//Text to Columns by ":"
splitdtCols << text to columns(delimiter(":"), columns(:Meta));
The nested for loop (which only looks to put the full description back into the original table) is not only relatively slow (this is needed to loop through >1 million entries) but it only writes missing data into the columns.
//Meta column back into Data
For(i = 1, i <= N Items(colNames), i++,
For(j = 1, j <= N Rows(splitdtCols), j++,
//show(splitdtCols:Meta[j]);
If( eval(char(colNames[i])) == eval(char(splitdtCols:Meta[j])),
Data Table("Data") << New Column("Descriptor", set Values(eval(char(splitdtCols:Meta[j]))))
)
)
);
I am willing to abandon the approach of pulling out the column name data into a separate table. It really feels like there is a simpler approach I am missing.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Meta Data in Column Name to Row
Interactively this is easy to do with a Multiple series stack.
Use Tables->Stack and add all the columns to the Stack Columns list. Then click the checkbox for Multiple series stack. Set Number of Series to 2 and don't check Contiguous.
You'll get a table like this:
Rename the Data column to Date and the Label 2 column to Units.
Then, you can use Cols->Utilities->Text to Cols... to split the Label column into three columns using a colon (":") as the delimiter.
You'll get this:
If you need to script this, because this is something you need to automate or you do this a lot, you can get the script for the Stack from Source script:
stacked_dt = Data Table( "445470" ) << Stack(
columns(
:"Big:Small:Wet"n, :mg, :"Bad:Good:Ugly"n, :Hz, :"Port:Starboard:Speed"n,
:mL
),
Source Label Column( "Label" ),
Stacked Data Column( "Data" ),
Number of Series( 2 )
);
You'll need to use the <<Set Name message to change the names of the columns.
stacked_dt:"Data" << Set Name( "Date" );
stacked_dt:"Label 2" << Set Name( "Units" );
Finally Text to Columns is scriptable as well.
stacked_dt << Text To Columns(
delimiter( ":" ),
columns( :Label )
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Meta Data in Column Name to Row
Interactively this is easy to do with a Multiple series stack.
Use Tables->Stack and add all the columns to the Stack Columns list. Then click the checkbox for Multiple series stack. Set Number of Series to 2 and don't check Contiguous.
You'll get a table like this:
Rename the Data column to Date and the Label 2 column to Units.
Then, you can use Cols->Utilities->Text to Cols... to split the Label column into three columns using a colon (":") as the delimiter.
You'll get this:
If you need to script this, because this is something you need to automate or you do this a lot, you can get the script for the Stack from Source script:
stacked_dt = Data Table( "445470" ) << Stack(
columns(
:"Big:Small:Wet"n, :mg, :"Bad:Good:Ugly"n, :Hz, :"Port:Starboard:Speed"n,
:mL
),
Source Label Column( "Label" ),
Stacked Data Column( "Data" ),
Number of Series( 2 )
);
You'll need to use the <<Set Name message to change the names of the columns.
stacked_dt:"Data" << Set Name( "Date" );
stacked_dt:"Label 2" << Set Name( "Units" );
Finally Text to Columns is scriptable as well.
stacked_dt << Text To Columns(
delimiter( ":" ),
columns( :Label )
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Meta Data in Column Name to Row
This was exactly what I was hoping for. Something simple and that I overlooked. I was able to implement this pretty quickly.