Solved: Meta Data in Column Name to Row

danielrbiber · Jun 10, 2023 4:41 PM

Greetings,

I am in need of help scripting a multi-step process as I am unable to abstract a viable solution to achieve some data preparation work.

I have time series data that comes in as several spread sheets, containing several tabs, and each tab has data with column names that contain meta data contextualizing the data.

In the example below, the columns with "Descriptor" (columns 1,3 and 5) should really be named "Date", but instead contain Descriptors that are needed for analysis. Additionally, The Unit columns(2,4 and 6) should really be named data but contain the unit of measure.

Big:Small:Wet	mg	Bad:Good:Ugly	Hz	Port:Starboard:Speed	mL
12/15/2021	0	12/15/2021	4	12/15/2021	8
12/16/2021	1	12/16/2021	5	12/16/2021	9
12/17/2021	2	12/17/2021	6	12/17/2021	10
12/18/2021	3	12/18/2021	7	12/18/2021	11

Ideally, the finished data table would have the following structure.

Date	DescriptorType1	DescriptorType2	DescriptorType3	Unit	Data
12/15/2021	Big	Small	Wet	mg	0
12/16/2021	Big	Small	Wet	mg	1
12/17/2021	Big	Small	Wet	mg	2
12/18/2021	Big	Small	Wet	mg	3
12/15/2021	Bad	Good	Ugly	Hz	4
12/16/2021	Bad	Good	Ugly	Hz	5
12/17/2021	Bad	Good	Ugly	Hz	6
12/18/2021	Bad	Good	Ugly	Hz	7
12/15/2021	Port	Starboard	Speed	mL	8
12/16/2021	Port	Starboard	Speed	mL	9

My attempts to pull the column headers into a separate table and re-join with nested for loops have not proven successful. Here is code that pulls and prepares the column name data for the example data included.

Names Default to Here(1);

//Get MainDT column Names
colNames = Data Table("Data") << Get Column Names;

//Create Empty Table
dtCols = New Table("DTCols");

//Put in Columns into the table
dtCols << New Column( "Columns", Character, Values(colNames));

//Label every other row for splitting
dtCols << New Column( "Data", formula( sequence(1,2,1)));

//Remove formula from Data column
dtCols:Data << Delete Formula;

//New column from which to split from
dtCols << New Column( "Type", formula(
	If(
		contains(:Columns, "Desc"), "Meta",
		"Unit_Raw"
	   );
	);
);


//Deleting formula for handling
dtCols:Type << Delete formula;

//Splitting the row based on the labelling
splitdtCols = dtCols << Split(
	Split By( :Type ),
	Split( :Columns ),
	Sort by Column Property,
	Output Table( "SplitDTCols" )
);

//Text to Columns by ":"
splitdtCols << text to columns(delimiter(":"), columns(:Meta));

The nested for loop (which only looks to put the full description back into the original table) is not only relatively slow (this is needed to loop through >1 million entries) but it only writes missing data into the columns.

//Meta column back into Data
For(i = 1, i <= N Items(colNames), i++,
	For(j = 1, j <= N Rows(splitdtCols), j++,
		//show(splitdtCols:Meta[j]);
		If( eval(char(colNames[i])) == eval(char(splitdtCols:Meta[j])),
			Data Table("Data") << New Column("Descriptor", set Values(eval(char(splitdtCols:Meta[j]))))
		)
	)
);

I am willing to abandon the approach of pulling out the column name data into a separate table. It really feels like there is a simpler approach I am missing.

Jeff_Perkinson · Dec 15, 2021 02:51 PM

Interactively this is easy to do with a Multiple series stack.

Use Tables->Stack and add all the columns to the Stack Columns list. Then click the checkbox for Multiple series stack. Set Number of Series to 2 and don't check Contiguous.

You'll get a table like this:

Rename the Data column to Date and the Label 2 column to Units.

Then, you can use Cols->Utilities->Text to Cols... to split the Label column into three columns using a colon (":") as the delimiter.

You'll get this:

If you need to script this, because this is something you need to automate or you do this a lot, you can get the script for the Stack from Source script:

stacked_dt = Data Table( "445470" ) << Stack(
	columns(
		:"Big:Small:Wet"n, :mg, :"Bad:Good:Ugly"n, :Hz, :"Port:Starboard:Speed"n,
		:mL
	),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Number of Series( 2 )
);

You'll need to use the <<Set Name message to change the names of the columns.

stacked_dt:"Data" << Set Name( "Date" );
stacked_dt:"Label 2" << Set Name( "Units" );

Finally Text to Columns is scriptable as well.

stacked_dt << Text To Columns(
	delimiter( ":" ),
	columns( :Label )
);

-Jeff

View solution in original post

Jeff_Perkinson · Dec 15, 2021 02:51 PM

Interactively this is easy to do with a Multiple series stack.

Use Tables->Stack and add all the columns to the Stack Columns list. Then click the checkbox for Multiple series stack. Set Number of Series to 2 and don't check Contiguous.

You'll get a table like this:

Rename the Data column to Date and the Label 2 column to Units.

Then, you can use Cols->Utilities->Text to Cols... to split the Label column into three columns using a colon (":") as the delimiter.

You'll get this:

If you need to script this, because this is something you need to automate or you do this a lot, you can get the script for the Stack from Source script:

stacked_dt = Data Table( "445470" ) << Stack(
	columns(
		:"Big:Small:Wet"n, :mg, :"Bad:Good:Ugly"n, :Hz, :"Port:Starboard:Speed"n,
		:mL
	),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Number of Series( 2 )
);

You'll need to use the <<Set Name message to change the names of the columns.

stacked_dt:"Data" << Set Name( "Date" );
stacked_dt:"Label 2" << Set Name( "Units" );

Finally Text to Columns is scriptable as well.

stacked_dt << Text To Columns(
	delimiter( ":" ),
	columns( :Label )
);

-Jeff

danielrbiber · Dec 15, 2021 03:13 PM

This was exactly what I was hoping for. Something simple and that I overlooked. I was able to implement this pretty quickly.

Meta Data in Column Name to Row

Re: Meta Data in Column Name to Row

Re: Meta Data in Column Name to Row

Re: Meta Data in Column Name to Row