Choose Language Hide Translation Bar
Highlighted
Sburel
Level III

Concatenation of strings within a column based on content from another column

Hello,

 

I have a table with 2 columns  "column 1" and "column 2". I would like to find a script  in which the rows in column 2 are concatenated based on  items in column 1 (see 2nd table below) and placed in a new table

 

Column 1Column 2
Item 1Item 1
Item 1Item 2
Item 2Item 3
Item 2Item 1

 

 

Column 1Column 2
Item 1Item 1 | Item 2
Item 2Item 3 | Item 1

 

Any suggestions would be greatly appreciated.

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Jeff_Perkinson
Community Manager Community Manager

Re: Concatenation of strings within a column based on content from another column

Yes, variable number of vegetables will make this more complicated. It's probably easier to build the values of the concatenation in the script rather than a formula.

 

Take a look at this version of your script to see how I would do it.

 

orig = New Table( "Fruit vs Veggies",
	Add Rows( 48 ),
	New Column( "Fruit",
		Character,
		"Nominal",
		Set Values(
			{"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb",
			"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb",
			"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Chrnb2", "Chrnb2", "Chrnb2",
			"Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2",
			"Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2",
			"Chrnb2", "Chrnb2"}
		)
	),
	New Column( "Vegetable",
		Character,
		"Nominal",
		Set Values(
			{"nuclear chromosome", "cellular_component", "intracellular", "cell", "nucleus",
			"chromosome", "organelle", "immune system process", "biological_process",
			"anatomical structure development", "DNA metabolic process",
			"cellular nitrogen compound metabolic process", "molecular_function", "cytoplasm",
			"cytoskeleton", "DNA binding", "nucleotidyltransferase activity",
			"biosynthetic process", "response to stress", "cell death", "signal transduction",
			"cytoskeletal protein binding", "lyase activity", "enzyme binding",
			"homeostatic process", "ion binding", "protein complex",
			"cellular component assembly", "cellular nitrogen compound metabolic process",
			"transmembrane transport", "ion binding", "cell proliferation",
			"immune system process", "nervous system process", "response to stress",
			"cell-cell signaling", "molecular_function", "cell", "cell morphogenesis",
			"biological_process", "anatomical structure development", "cell differentiation",
			"transport", "signal transduction", "plasma membrane", "cellular_component",
			"signal transducer activity", "transmembrane transporter activity"}
		)
	),
	New Column( "Vegetable number",
		Numeric,
		"Nominal",
		Format( "Best", 9 ),
		Formula( Col Rank( :Vegetable, :Fruit ) )
	)
);

split = orig << Split(
	Split By( :Vegetable number ),
	Split( :Vegetable ),
	Group( :Fruit ),
	Sort by Column Property
);

vc = {};

For( i = 1, i <= N Row( split ), i++,
	vc[i] = Column( 2 )[i];
	For( j = 3, j <= N Col( split ), j++,
		If( Column( j )[i] != "",
			vc[i] = vc[i] || ", " || Column( j )[i]
		)
	);
);

split << New Column( "Veggies_concat", character, set values( vc ) );
-Jeff

View solution in original post

5 REPLIES 5
Highlighted
txnelson
Super User

Re: Concatenation of strings within a column based on content from another column

for each row( :Column 2 = :Column 1 || :Column 2);

// or if you want the "|" between the values it would be

for each row( :Column 2 = :Column 1 || "|" || :Column 2);
Jim
Highlighted
Jeff_Perkinson
Community Manager Community Manager

Re: Concatenation of strings within a column based on content from another column

@txnelson's solution may be what you want but your description and example don't make it clear to me what you're looking for.

 

I'm going to rename some things in your table to make it clear what I think you want (I might be wrong about what you want though) and then show you how to get it:

 

What you have:

 

Fruit Vegetable
apple asparagus
apple broccoli
banana brussel sprout
banana asparagus

 

What you want:

 

Fruit Vegetables
apple asparagus | broccoli
banana brussel sprout | asparagus

 

To get that can use Tables->Split to get a data table but you'll need one more column to identify which number Vegetable, in sequence each row is. So, something like this:

 

Fruit Vegetable Veg number
apple asparagus

1

apple broccoli 2
banana brussel sprout 1
banana asparagus 2

 

If you don't already have a Veg number column in your data table there are a few ways to create it, depending on exactly your situation, so ask back here for more help with this step.

 

Then use Tables->Split with Split By = Veg number, Split Columns = Vegetable, Group = Fruit.

 

2020-04-23_17-41-34.666.png

 

You'll get a table like this:

 

Fruit

1

2

apple

asparagus

broccoli

banana

brussel sprout

asparagus

 

Then you just concatenate columns 1 and 2 using the Formula Editor.

 

2020-04-23_17-50-09.989.png

 

I hope that helps. If not, please post clarification on exactly what you have and what you want.

 

-Jeff
Highlighted
Sburel
Level III

Re: Concatenation of strings within a column based on content from another column

Hi Jeff,

 

Thanks a lot for the quick reply. Your description of the problem is stop on.

However, the number of vegetable types might vary with each fruits which will results in a different  number of columns filled up. This leads to additional commas/separators (could be cleaned up) at the  end. Furthermore, once scripted it does not allow for automatically adjust for the varying number of different items (ie veggies) each time the script is executed.

 

Any suggestion on being able to automatically adjust the number of columns after the split steps would be appreciated.

 

Thanks a lot

 

Sebastien

 

New Table( "Fruit vs Veggies",
	Add Rows( 48 ),
	New Script(
		"Source",
		Data Table( "mart_export (1)" ) << Subset(
			Selected Rows( 0 ),
			Rows(
				[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
				21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
				39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
			),
			columns( :Fruit, :Vegetable )
		)
	),
	New Script(
		"Concatenate vegetables per fruits",
		dt = Data Table( "Fruit vs Veggies" );
		New Column( "Vegetable number",
			Character,
			"Nominal",
			formula( Col Rank( :Vegetable, :Fruit ) )
		);
		Data Table( "Fruit vs Veggies" ) << Split(
			Split By( :Vegetable number ),
			Split( :Vegetable ),
			Group( :Fruit ),
			Output Table( "Strange fruits" ),
			Sort by Column Property
		);
		Data Table( "Strange fruits" ) << New Column( "Vegggies_concat",
			Character,
			"Nominal",
			formula(
				:Name( "1" ) || ", " || :Name( "2" ) || ", " || :Name( "3" ) || ", " ||
				:Name( "4" ) || ", " || :Name( "5" ) || ", " || :Name( "6" ) || ", " ||
				:Name( "7" ) || ", " || :Name( "8" ) || ", " || :Name( "9" ) || ", " ||
				:Name( "10" ) || ", " || :Name( "11" ) || ", " || :Name( "12" ) || ", "
				 || :Name( "13" ) || ", " || :Name( "14" ) || ", " || :Name( "15" ) ||
				", " || :Name( "16" ) || ", " || :Name( "17" ) || ", " || :Name( "18" )
				 || ", " || :Name( "19" ) || ", " || :Name( "20" ) || ", " ||
				:Name( "21" ) || ", " || :Name( "22" ) || ", " || :Name( "23" ) || ", "
				 || :Name( "24" ) || ", " || :Name( "25" ) || ", " || :Name( "26" ) ||
				", " || :Name( "27" )
			)
		);
		Data Table( "Strange fruits" ) << Subset(
			Output Table( "Strange fruits_clean" ),
			All rows,
			columns( :Fruit, :Vegggies_concat )
		) << close window;
	),
	New Column( "Fruit",
		Character,
		"Nominal",
		Set Values(
			{"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb",
			"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb",
			"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb",
			"Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2",
			"Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2",
			"Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2"}
		)
	),
	New Column( "Vegetable",
		Character,
		"Nominal",
		Set Values(
			{"nuclear chromosome", "cellular_component", "intracellular", "cell",
			"nucleus", "chromosome", "organelle", "immune system process",
			"biological_process", "anatomical structure development",
			"DNA metabolic process", "cellular nitrogen compound metabolic process",
			"molecular_function", "cytoplasm", "cytoskeleton", "DNA binding",
			"nucleotidyltransferase activity", "biosynthetic process",
			"response to stress", "cell death", "signal transduction",
			"cytoskeletal protein binding", "lyase activity", "enzyme binding",
			"homeostatic process", "ion binding", "protein complex",
			"cellular component assembly",
			"cellular nitrogen compound metabolic process", "transmembrane transport",
			"ion binding", "cell proliferation", "immune system process",
			"nervous system process", "response to stress", "cell-cell signaling",
			"molecular_function", "cell", "cell morphogenesis", "biological_process",
			"anatomical structure development", "cell differentiation", "transport",
			"signal transduction", "plasma membrane", "cellular_component",
			"signal transducer activity", "transmembrane transporter activity"}
		)
	),
	New Column( "Vegetable number",
		Numeric,
		"Nominal",
		Format( "Best", 9 ),
		Formula( Col Rank( :Vegetable, :Fruit ) )
	)
)

 

Highlighted
Jeff_Perkinson
Community Manager Community Manager

Re: Concatenation of strings within a column based on content from another column

Yes, variable number of vegetables will make this more complicated. It's probably easier to build the values of the concatenation in the script rather than a formula.

 

Take a look at this version of your script to see how I would do it.

 

orig = New Table( "Fruit vs Veggies",
	Add Rows( 48 ),
	New Column( "Fruit",
		Character,
		"Nominal",
		Set Values(
			{"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb",
			"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb",
			"Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Polb", "Chrnb2", "Chrnb2", "Chrnb2",
			"Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2",
			"Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2", "Chrnb2",
			"Chrnb2", "Chrnb2"}
		)
	),
	New Column( "Vegetable",
		Character,
		"Nominal",
		Set Values(
			{"nuclear chromosome", "cellular_component", "intracellular", "cell", "nucleus",
			"chromosome", "organelle", "immune system process", "biological_process",
			"anatomical structure development", "DNA metabolic process",
			"cellular nitrogen compound metabolic process", "molecular_function", "cytoplasm",
			"cytoskeleton", "DNA binding", "nucleotidyltransferase activity",
			"biosynthetic process", "response to stress", "cell death", "signal transduction",
			"cytoskeletal protein binding", "lyase activity", "enzyme binding",
			"homeostatic process", "ion binding", "protein complex",
			"cellular component assembly", "cellular nitrogen compound metabolic process",
			"transmembrane transport", "ion binding", "cell proliferation",
			"immune system process", "nervous system process", "response to stress",
			"cell-cell signaling", "molecular_function", "cell", "cell morphogenesis",
			"biological_process", "anatomical structure development", "cell differentiation",
			"transport", "signal transduction", "plasma membrane", "cellular_component",
			"signal transducer activity", "transmembrane transporter activity"}
		)
	),
	New Column( "Vegetable number",
		Numeric,
		"Nominal",
		Format( "Best", 9 ),
		Formula( Col Rank( :Vegetable, :Fruit ) )
	)
);

split = orig << Split(
	Split By( :Vegetable number ),
	Split( :Vegetable ),
	Group( :Fruit ),
	Sort by Column Property
);

vc = {};

For( i = 1, i <= N Row( split ), i++,
	vc[i] = Column( 2 )[i];
	For( j = 3, j <= N Col( split ), j++,
		If( Column( j )[i] != "",
			vc[i] = vc[i] || ", " || Column( j )[i]
		)
	);
);

split << New Column( "Veggies_concat", character, set values( vc ) );
-Jeff

View solution in original post

Highlighted
Sburel
Level III

Re: Concatenation of strings within a column based on content from another column

Works like a charm.

Thanks a lot

Sebastien
Article Labels

    There are no labels assigned to this post.