cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
miguello
Level VI

Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

I have an exported CSV that was formed from two or more different sources that have slightly different column names.

No row has data in more than one of those columns. Like on the screenshot below:

2024-12-06 19_43_09-Untitled 7 - JMP.png

 

How do I combine all of them into one column?

 

One idea is to use 

Combine Columns()

which will result in a Multiple Response column, but since data is non-intersecting, it will be just one number throughout the column. Then force data type to Numeric and modeling type to Continuous. But it produces warning\error.

 

Any cleaner way to do that?

 

2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

I would handle it this way

Names Default To Here( 1 );
dt = Current Data Table();
For Each Row(
	:AA bb = Sum( :AA bb, :AA_bb_ )
);

Using the Sum() function ignores missing values, so you will get the values you want. 

Jim

View solution in original post

mmarchandFSLR
Level IV

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

It's a little more complicated than needed.  This code works, too, without too much thinking.  It will accept more than two column names, in case that's desired.

 

Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_"};
For Each Row( dt,
	As Column(dict[1]) = Sum( Transform Each( {v, i}, dict, As Column( v ) ) )
);

View solution in original post

9 REPLIES 9
txnelson
Super User

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

I would handle it this way

Names Default To Here( 1 );
dt = Current Data Table();
For Each Row(
	:AA bb = Sum( :AA bb, :AA_bb_ )
);

Using the Sum() function ignores missing values, so you will get the values you want. 

Jim
miguello
Level VI

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

Jim, 

 

Thanks a lot.

I'm going to use this solution.

I'm trying to script it now so that the column names are not hardcoded.

I'm not quite sure why I'm getting error on this:

For Each Row(
	Column("AA bb") = 1234;
	//Sum( :AA bb, :AA _bb_ )
);

Error is:

attempting to assign to an object that is not an L-value at row 1 in access or evaluation of 'Assign' , Column( "AA bb" ) = /*###*/1234 /*###*/

miguello
Level VI

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

Idea is to write a function and feed a dictionary to that function. 

Something like this:

 

Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_"};

For Each Row(
	Column(dict[1]) = Sum( Column(dict[1]), Column(dict[2]) )
);

Only this variant gives the following error:

need arguments to be scalars, matrices, or lists at row 1 in access or evaluation of 'Sum' , Sum/*###*/(Column( dict[1] ), Column( dict[2] ))

 

I could never wrap my head round the fact that sometimes Column() works as expected, sometimes it doesn't. Sometimes I need to use As Column(). Sometimes something else.

 

P.S. As Column() worked. Can somebody explain the difference between Column() and As Column()?

miguello
Level VI

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

Ok, my final snippet of code it here:

Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_"};

sumExpr = Expr( Sum() );
For Each( {value, index}, dict,
	Insert Into( sumExpr, Name Expr( As Column( value ) ) )
);
	
Eval(
	Substitute(
			Expr(
				For Each Row(
					As Column( dict[1] ) = placeholder
				)
			),
		Expr( placeholder ), Name Expr( sumExpr )
	)
);
	

Any way to improve it?

mmarchandFSLR
Level IV

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

It's a little more complicated than needed.  This code works, too, without too much thinking.  It will accept more than two column names, in case that's desired.

 

Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_"};
For Each Row( dt,
	As Column(dict[1]) = Sum( Transform Each( {v, i}, dict, As Column( v ) ) )
);
miguello
Level VI

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

My next iteration accounts for a case when column name is present in the dictionary, but not present in the table:

Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_", "AA cc"};


sumExpr = Expr( Sum() );
For Each( {value, index}, dict,
	If( Try( Column( value ) << Get Name, "nocolumn" ) != "nocolumn", 
		Insert Into( sumExpr, Name Expr( As Column( value ) ) )
	)
);

Name Expr(sumExpr);
Try(	
Eval(
	Substitute(
			Expr(
				For Each Row(
					As Column( dict[1] ) = placeholder
				)
			),
		Expr( placeholder ), Name Expr( sumExpr )
	)
),
Write("No columns found")
)
	

How would you do that when using Transform Each?

mmarchandFSLR
Level IV

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

I would just add a Try().

 

Names Default To Here( 1 );
dt = Current Data Table();
dict = {"AA bb", "AA _bb_", "AA cc"};
For Each Row( dt, As Column( dict[1] ) = Sum( Transform Each( {v, i}, dict, Try( As Column( v ), . ) ) ) );
mmarchandFSLR
Level IV

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

Column() refers to the column object, and As Column() refers to the value of that column in the current Row().

Samira
Level II

Re: Data cleanup: How to combine several columns with slightly different names and non-intersecting data into one?

Try using concatenate tool :

Create a new column that will include both variables

Right-click on the column
Select Formula to open the Formula Editor.

In the Formula Editor, go to the Functions list on the left.
Under Character, find and select Concatenate.

Drag or select the first column into the first argument box of the Concatenate function.
Drag or select the second column into the next argument box.

Click OK or Apply to create the new combined column.