- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
I have an exported CSV that was formed from two or more different sources that have slightly different column names.
No row has data in more than one of those columns. Like on the screenshot below:
How do I combine all of them into one column?
One idea is to use
Combine Columns()
which will result in a Multiple Response column, but since data is non-intersecting, it will be just one number throughout the column. Then force data type to Numeric and modeling type to Continuous. But it produces warning\error.
Any cleaner way to do that?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
I would handle it this way
Names Default To Here( 1 );
dt = Current Data Table();
For Each Row(
:AA bb = Sum( :AA bb, :AA_bb_ )
);
Using the Sum() function ignores missing values, so you will get the values you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
It's a little more complicated than needed. This code works, too, without too much thinking. It will accept more than two column names, in case that's desired.
Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_"};
For Each Row( dt,
As Column(dict[1]) = Sum( Transform Each( {v, i}, dict, As Column( v ) ) )
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
I would handle it this way
Names Default To Here( 1 );
dt = Current Data Table();
For Each Row(
:AA bb = Sum( :AA bb, :AA_bb_ )
);
Using the Sum() function ignores missing values, so you will get the values you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
Jim,
Thanks a lot.
I'm going to use this solution.
I'm trying to script it now so that the column names are not hardcoded.
I'm not quite sure why I'm getting error on this:
For Each Row(
Column("AA bb") = 1234;
//Sum( :AA bb, :AA _bb_ )
);
Error is:
attempting to assign to an object that is not an L-value at row 1 in access or evaluation of 'Assign' , Column( "AA bb" ) = /*###*/1234 /*###*/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
Idea is to write a function and feed a dictionary to that function.
Something like this:
Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_"};
For Each Row(
Column(dict[1]) = Sum( Column(dict[1]), Column(dict[2]) )
);
Only this variant gives the following error:
need arguments to be scalars, matrices, or lists at row 1 in access or evaluation of 'Sum' , Sum/*###*/(Column( dict[1] ), Column( dict[2] ))
I could never wrap my head round the fact that sometimes Column() works as expected, sometimes it doesn't. Sometimes I need to use As Column(). Sometimes something else.
P.S. As Column() worked. Can somebody explain the difference between Column() and As Column()?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
Ok, my final snippet of code it here:
Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_"};
sumExpr = Expr( Sum() );
For Each( {value, index}, dict,
Insert Into( sumExpr, Name Expr( As Column( value ) ) )
);
Eval(
Substitute(
Expr(
For Each Row(
As Column( dict[1] ) = placeholder
)
),
Expr( placeholder ), Name Expr( sumExpr )
)
);
Any way to improve it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
It's a little more complicated than needed. This code works, too, without too much thinking. It will accept more than two column names, in case that's desired.
Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_"};
For Each Row( dt,
As Column(dict[1]) = Sum( Transform Each( {v, i}, dict, As Column( v ) ) )
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
My next iteration accounts for a case when column name is present in the dictionary, but not present in the table:
Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_", "AA cc"};
sumExpr = Expr( Sum() );
For Each( {value, index}, dict,
If( Try( Column( value ) << Get Name, "nocolumn" ) != "nocolumn",
Insert Into( sumExpr, Name Expr( As Column( value ) ) )
)
);
Name Expr(sumExpr);
Try(
Eval(
Substitute(
Expr(
For Each Row(
As Column( dict[1] ) = placeholder
)
),
Expr( placeholder ), Name Expr( sumExpr )
)
),
Write("No columns found")
)
How would you do that when using Transform Each?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
I would just add a Try().
Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_", "AA cc"};
For Each Row( dt, As Column( dict[1] ) = Sum( Transform Each( {v, i}, dict, Try( As Column( v ), . ) ) ) );
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
Column() refers to the column object, and As Column() refers to the value of that column in the current Row().
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?
Create a new column that will include both variables
Right-click on the column
Select Formula to open the Formula Editor.
In the Formula Editor, go to the Functions list on the left.
Under Character, find and select Concatenate.
Drag or select the first column into the first argument box of the Concatenate function.
Drag or select the second column into the next argument box.
Click OK or Apply to create the new combined column.