cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
cchueng
Level III

Parse string of a column and fill another column

Hi, 

I have a table containing ProductName and I want to generate the OUTPUT column.  The OUTPUT column a concatenation of ProductID with Serial#. How do I accurately parse the Serial# and create the OUTPUT column?

Serial # examples:

  • 07-08_13 means Serial# 07, 08, 13
  • 09-12 means Serial# 09, 10, 11, 12
  • 01-03_05-08 means Serial# 01, 02, 03, 05, 06, 07, 08
  • 01_04_05-08 means Serial# 01, 04, 05, 06, 07, 08
  • 09_11_13_14 means Serial# 09,11,13,14

 

cchueng_5-1633542341785.png

 

ProductNameOUTPUT
AABBCC_07-08_13AABBCC-07
AABBCC_07-08_13AABBCC-08
AABBCC_07-08_13AABBCC-13
AABBCC_09-12AABBCC-09
AABBCC_09-12AABBCC-10
AABBCC_09-12AABBCC-11
AABBCC_09-12AABBCC-12
EEFFGG_02EEFFGG-02
HHIIJJ_01-07HHIIJJ-01
HHIIJJ_01-07HHIIJJ-02
HHIIJJ_01-07HHIIJJ-03
HHIIJJ_01-07HHIIJJ-04
HHIIJJ_01-07HHIIJJ-05
HHIIJJ_01-07HHIIJJ-06
HHIIJJ_01-07HHIIJJ-07
KKLLLMM_01-03_05-08KKLLLMM-01
KKLLLMM_01-03_05-08KKLLLMM-02
KKLLLMM_01-03_05-08KKLLLMM-03
KKLLLMM_01-03_05-08KKLLLMM-05
KKLLLMM_01-03_05-08KKLLLMM-06
KKLLLMM_01-03_05-08KKLLLMM-07
KKLLLMM_01-03_05-08KKLLLMM-08
NNOOPP_01_04_05-08NNOOPP-01
NNOOPP_01_04_05-08NNOOPP-04
NNOOPP_01_04_05-08NNOOPP-05
NNOOPP_01_04_05-08NNOOPP-06
NNOOPP_01_04_05-08NNOOPP-07
NNOOPP_01_04_05-08NNOOPP-08
QQRRSS_09_11_13_14QQRRSS_09
QQRRSS_09_11_13_14QQRRSS_11
QQRRSS_09_11_13_14QQRRSS_13
QQRRSS_09_11_13_14QQRRSS_14

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Parse string of a column and fill another column

It just had a minor error where a -1 was not removed when I restructured my code.

txnelson_0-1634012182394.png

Names Default To Here( 1 );
dt = Current Data Table();

dt2 = New Table( "The Output",
	New Column( "Product Name", character ),
	New Column( "ProductID", Character ),
	New Column( "OUTPUT", Character )
);

For( dtRow = 1, dtRow <= N Rows( dt ), dtRow++,
	If( dtRow == 1 | dt:ProductName[dtRow - 1] != dt:ProductName[dtRow],
		prodID = Word( 1, dt:ProductName[dtRow], "_" );
		remainder = Substr( dt:ProductName[dtRow], Length( prodID ) + 2 );
		While( Word( 1, remainder, "_" ) != "",
			theWord = Word( 1, remainder, "_" );
			If( Contains( theWord, "-" ),
				numDigets = Length( Word( 1, theWord, "-" ) ) - 1;
				zeros = Repeat( "0", numDigets );
				For( loop = Num( Word( 1, theWord, "-" ) ),
					loop <= Num( Word( 2, theWord, "-" ) ), loop++,
					dt2 << Add rows( 1 );
					theRow = N Rows( dt2 );
					dt2:ProductName[theRow] = dt:ProductName[dtRow];
					dt2:ProductID[theRow] = Word( 1, dt:ProductName[dtRow], "_-" );
					dt2:output[theRow] = dt2:ProductID[theRow] || "-" ||
					Substr( zeros, Length( Char( loop ) ) ) || Char( loop );
				);
			,
				dt2 << Add Rows( 1 );
				theRow = N Rows( dt2 );
				dt2:output[theRow] = prodID || "-" || Word( 1, theWord, "-" );
				dt2:ProductName[theRow] = dt:ProductName[dtRow];
				dt2:ProductID[theRow] = Word( 1, dt:ProductName[dtRow], "_-" );
			);
			remainder = Substr( remainder, Length( theWord ) + 2 );
		);
	)
);

Please make sure you take the time to study the code, so you completely understand how it is working.

Jim

View solution in original post

20 REPLIES 20
jthi
Super User

Re: Parse string of a column and fill another column

Most likely you could write a formula with Lag to do that (or script with JSL), but you can also attempt to do that interactively but it takes quite a few steps:

1. Select ProductName column

2. Go to Cols/Utilities/Text To columns and use both - and _ as delimiters

jthi_1-1633543923524.png

 

3. Go to Tables and Stack and add everything else besides first column to Stack Columns selection

jthi_2-1633543952690.png

jthi_3-1633543992153.png

 

5. Remove Label column

6. Select both columns which are left and go to Rows / Row Selection / Select Dublicate Rows and delete those rows

jthi_4-1633544046999.png

 

7. Select one cell with empty value and choose Select Matching Cells and delete those rows

8. Now choose both colums and right click on the column header and create formula

jthi_5-1633544143551.png

9. Open the just created formula and change the , to _

jthi_6-1633544215817.png

10. Rename and delete columns/formulas

11. If you need to have ProductName there, you should be able to keep it there if you don't accidentally delete it in first step like i did

jthi_7-1633544489056.png

If you have to have - or _ depending on the serialnumber you could handle it in the concatenate formula most likely

 

Edit: I checked the question a bit more accurately and this is a bit more complicated than I showed here. You will have to add calculations for some of the serial numbers because they are between numbers. This solution won't work in those because some rows will be removed when duplicates are removed.

-Jarmo

Re: Parse string of a column and fill another column

@jthi ,

 

You beat me to it.   Your explanation is more detailed and what I was thinking of too.

Chris Kirchberg, M.S.2
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com

Re: Parse string of a column and fill another column

Oh, your right. the - is interpreted as a range and the _ is interpreted as a true delimiter of items

Chris Kirchberg, M.S.2
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com
cchueng
Level III

Re: Parse string of a column and fill another column

Thanks but I prefer a jsl rather than the interactive delimiter. Your method did not address the scenario when the Serial # 09-12. It should output rows like below. Not just AABBCC-09 and AABBCC-12. The "-" means "to" meaning range. So 09-12 means 09, 10, 11, 12.

AABBCC_09-12AABBCC-09
AABBCC_09-12AABBCC-10
AABBCC_09-12AABBCC-11
AABBCC_09-12AABBCC-12
jthi
Super User

Re: Parse string of a column and fill another column

Ok, this might work. Requires JMP16 due to usage of For Each (can be replaced with For for other versions. This is heavily scripted solution, with little or no comments and will require specific starting table to work (needs OUTPUT column).

 

 

Names Default To Here(1);

dt = New Table("Untitled 4993",
	Add Rows(32),
	Compress File When Saved(1),
	New Column("ProductName",
		Character,
		"Nominal",
		Set Values(
			{"AABBCC_07-08_13", "AABBCC_07-08_13", "AABBCC_07-08_13", "AABBCC_09-12", "AABBCC_09-12", "AABBCC_09-12", "AABBCC_09-12", "EEFFGG_02",
			"HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "KKLLLMM_01-03_05-08",
			"KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08",
			"NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08",
			"QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14"}
		)
	),
	New Column("OUTPUT",
		Character(16),
		"Nominal",
		Set Values({"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""})
	)
);


Summarize(dt, uniqProd = by(:ProductName));

valList = {};
For Each({prod}, uniqProd,
	prodPart = Word(1, prod, "_");
	serialPart = Substr(prod, Contains(prod, "_") + 1); //remove start
	possibleRanges = Words(serialPart, "_"); //if they contain -
	For Each({val}, possibleRanges,
		If(Contains(val, "-"), //range
			rangeVals = Words(val, "-");
			For Each({val1}, Index(Num(rangeVals[1]), Num(rangeVals[2])), 
				valToAdd = Substr("0", Length(Char(val1))) || Char(val1);
				Insert Into(valList, prodPart || "-" || valToAdd)
			);
		,
			Insert Into(valList, prodPart || "-" || Substr("0", Length(Char(val))) || Char(val))
		)
	);
);

dt[0,"OUTPUT"] = valList;

Wondering if something like this could be done "easily" with JMP's own pattern matching @Craige_Hales 

 

-Jarmo
cchueng
Level III

Re: Parse string of a column and fill another column

Hi jthi,

Thanks for the script but my table is actually dynamic. Each day, I will pull up this table from a database with a different date range and different sets of ProductName will appear. So it is not always 32 rows. Your script can almost work but can you address the array size? Thanks.

jthi
Super User

Re: Parse string of a column and fill another column

It should be fairly easy. Do you have as many rows in the pulled data as you will have serial numbers?

-Jarmo
cchueng
Level III

Re: Parse string of a column and fill another column

Hi jthi,

Yes, the number of rows matched the number of serial numbers.

jthi
Super User

Re: Parse string of a column and fill another column

If that is the case, it should already be able to handle it. I did small modifications (creation of OUTPUT column if it is missing). Currently it does rely on the fact that the datatable is sorted by ProductNames, which could cause issues at some point, or then you can make sure the input table is sorted (easiest solution).

 

You should be able to use the For Each part can be even used without datatable as long as you have list variable named uniqProd which has ProductNames. After that is just figuring out how to add values to datatable.

 

Names Default To Here(1);

//Test table
/*
dt = New Table("Untitled 4993",
	Compress File When Saved(1),
	New Column("ProductName",
		Character,
		"Nominal",
		Set Values(
			{"AABBCC_07-08_13", "AABBCC_07-08_13", "AABBCC_07-08_13", "AABBCC_09-12", "AABBCC_09-12", "AABBCC_09-12", "AABBCC_09-12", "EEFFGG_02",
			"HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "KKLLLMM_01-03_05-08",
			"KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08",
			"NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08",
			"QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14"}
		)
	)
);
*/

dt = Current Data Table();

Summarize(dt, uniqProd = by(:ProductName));

valList = {};
For Each({prod}, uniqProd, //loop for each ProductName
	prodPart = Word(1, prod, "_");
	serialPart = Substr(prod, Contains(prod, "_") + 1); //remove start
	possibleRanges = Words(serialPart, "_"); //if they contain -
	For Each({val}, possibleRanges, //Loop for serialnumber inside the ProductName
		If(Contains(val, "-"), //range
			rangeVals = Words(val, "-");
			For Each({val1}, Index(Num(rangeVals[1]), Num(rangeVals[2])), 
				valToAdd = Substr("0", Length(Char(val1))) || Char(val1);
				Insert Into(valList, prodPart || "-" || valToAdd)
			);
		,
			Insert Into(valList, prodPart || "-" || Substr("0", Length(Char(val))) || Char(val))
		)
	);
);

//if starting table doesn't have OUTPUT column create it
If(!Contains(dt << Get Column Names("String"), "OUTPUT"), 
	dt << New Column("OUTPUT", Character, "Nominal");
);

//dt[0,"OUTPUT"] = valList; subscripting, maybe a bit more difficult to understand
dt:OUTPUT << Set Values(valList);

 

Simplified version what they script tries to do:

1. Script gets all unique values in ProductName column

2. Script loops over each of those in two part For Each loop

3. First loop loops over the unique values and splits them into Product and number parts

4. Serialnumber part is then looped again to split it into single values. These single values are concatenated with Product part and then the string is inserted into valList list (this 4. isn't exactly as simple as I explained here).

5. After both loops have finished you should have a list, which has all possible serialnumbers. Product part will be in alphabetical order.

6. This list is then added to OUTPUT if it exists, if it doesn't it is first created.

-Jarmo