cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
cchueng
Level II

Parse string of a column and fill another column

Hi, 

I have a table containing ProductName and I want to generate the OUTPUT column.  The OUTPUT column a concatenation of ProductID with Serial#. How do I accurately parse the Serial# and create the OUTPUT column?

Serial # examples:

  • 07-08_13 means Serial# 07, 08, 13
  • 09-12 means Serial# 09, 10, 11, 12
  • 01-03_05-08 means Serial# 01, 02, 03, 05, 06, 07, 08
  • 01_04_05-08 means Serial# 01, 04, 05, 06, 07, 08
  • 09_11_13_14 means Serial# 09,11,13,14

 

cchueng_5-1633542341785.png

 

ProductNameOUTPUT
AABBCC_07-08_13AABBCC-07
AABBCC_07-08_13AABBCC-08
AABBCC_07-08_13AABBCC-13
AABBCC_09-12AABBCC-09
AABBCC_09-12AABBCC-10
AABBCC_09-12AABBCC-11
AABBCC_09-12AABBCC-12
EEFFGG_02EEFFGG-02
HHIIJJ_01-07HHIIJJ-01
HHIIJJ_01-07HHIIJJ-02
HHIIJJ_01-07HHIIJJ-03
HHIIJJ_01-07HHIIJJ-04
HHIIJJ_01-07HHIIJJ-05
HHIIJJ_01-07HHIIJJ-06
HHIIJJ_01-07HHIIJJ-07
KKLLLMM_01-03_05-08KKLLLMM-01
KKLLLMM_01-03_05-08KKLLLMM-02
KKLLLMM_01-03_05-08KKLLLMM-03
KKLLLMM_01-03_05-08KKLLLMM-05
KKLLLMM_01-03_05-08KKLLLMM-06
KKLLLMM_01-03_05-08KKLLLMM-07
KKLLLMM_01-03_05-08KKLLLMM-08
NNOOPP_01_04_05-08NNOOPP-01
NNOOPP_01_04_05-08NNOOPP-04
NNOOPP_01_04_05-08NNOOPP-05
NNOOPP_01_04_05-08NNOOPP-06
NNOOPP_01_04_05-08NNOOPP-07
NNOOPP_01_04_05-08NNOOPP-08
QQRRSS_09_11_13_14QQRRSS_09
QQRRSS_09_11_13_14QQRRSS_11
QQRRSS_09_11_13_14QQRRSS_13
QQRRSS_09_11_13_14QQRRSS_14

 

 

 

 

 

 

20 REPLIES 20
cchueng
Level II

Re: Parse string of a column and fill another column

Hi jthi,

Your script will not work if my ProductName is a mixed of chars and numbers. I attached a table. This is how most of my productName will look like. Thanks for helping me out.

jthi
Super User

Re: Parse string of a column and fill another column

Quickly checking there are three different issues:

 

1. Breaking the "rule" that there is always same amount of ProductNames as should be serials. N4TA59_13-19 in data only once, should be 7 times
2. Serial number part not starting with _ (N4PX74-01-06)
3. Range not starting with _ (N4TA47_13-14-15-16-17-18) or these should be just single values

 

All these will require different fixes. Are these data quality issues with your data or should the script be able to handle all of these?

-Jarmo
cchueng
Level II

Re: Parse string of a column and fill another column

The ProductName data are manually entered. These are exceptions and will improve over time. Please ignore them.

jthi
Super User

Re: Parse string of a column and fill another column

Because the data is manually entered this will most likely not handle all special cases, but seems to be working for current values:

 

Names Default To Here(1);

//Test table
/*
dt = New Table("Untitled 4993",
	Compress File When Saved(1),
	New Column("ProductName",
		Character,
		"Nominal",
		Set Values(
			{"AABBCC_07-08_13", "AABBCC_07-08_13", "AABBCC_07-08_13", "AABBCC_09-12", "AABBCC_09-12", "AABBCC_09-12", "AABBCC_09-12", "EEFFGG_02",
			"HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "HHIIJJ_01-07", "KKLLLMM_01-03_05-08",
			"KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08", "KKLLLMM_01-03_05-08",
			"NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08", "NNOOPP_01_04_05-08",
			"QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14", "QQRRSS_09_11_13_14", "N4TA47_13-14-15-16-17-18", "N4TA59_13-19", "N4PX74-01-06"}
		)
	)
);
*/

dt = Current Data Table();
Summarize(dt, uniqProd = by(:ProductName));

newVals = Associative Array();
For Each({prodname_str}, uniqProd,
	prodPart = Word(1, prodname_str, "_-");
	serialPart = Substr(prodname_str, Length(prodPart) + 2);
	//find and remove next range from serialPart
	newVals[prodname_str] = {};
	While(Length(serialPart) > 0 & serialPart != "_" & serialPart != "-",
		nextRange = Regex(serialPart, "\d*-\d*");
		If(IsMissing(nextRange),
			break();
		);
		serialPart = Substitute(serialPart, nextRange, "");
		If(Contains({"-", "_" }, Substr(serialPart, 1, 1)),
			serialPart = Substr(serialPart, 2); //remove -_ if first character
		);
		
		nextRange = Words(nextRange, "-");
		rangeVals = Index(Num(nextRange[1]), Num(nextRange[2]));
		For Each({val}, rangeVals,
			val = char(val);
			val = Substr("00", Length(val) + 1) || val;
			Insert Into(newVals[prodname_str], prodPart||"-"||val);
		);
	);
	singleVals = Transform Each({val}, Words(serialPart, "_"),
		val = char(val);
		val = Substr("00", Length(val) + 1) || val;
		prodPart||"-"||val;
	);
	Insert Into(newVals[prodname_str], singleVals);
);

vals = {{}, {}};
For Each({{key, value}}, newVals,
	productname_list = Repeat({key}, N Items(value));
	Insert Into(vals[1], productname_list);
	Insert Into(vals[2], Sort List(value));
);

//add values to new table
dt = New Table("Productname",
	New Column("ProductName", Character, "Nominal", Set Values({})),
	New Column("OUTPUT", Character, "Nominal", Set Values({}))
);
dt:ProductName << Set Values(vals[1]);
dt:OUTPUT << Set Values(vals[2]);

 

 

-Jarmo

Re: Parse string of a column and fill another column

Hi @cchueng ,

There are a couple of ways. Some are scripting base (look for a reply from someone suggesting this with and example) or you could use successive rounds of Text to Columns (example found here Solved: How can I split text in a column into multiple columns? - JMP User Community) Then you could concatenate the desired columns by selecting them and then either use a column formula or use the New Formula Column option after right clicking on one of the selected columns. Look for the Character options for concat.  The column formula is more flexible and gives you a lot of options see the help on this topic (Create Formulas in JMP)

I hope that gives you some direction.

Best,

Chris Kirchberg, M.S.2
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com
cchueng
Level II

Re: Parse string of a column and fill another column

Hi Chris,

I have a script that will address some of the ProductName and sometimes will parse wrongly. Can you review and add codes to address those ProductNames that are not working so far?

 

Will work for below examples :

EEFFGG_02

HHIIJJ_01-07

KKLLLMM_01-03_05-08

 

Will not work yet for below examples :

AABBCC_07-08_13

NNOOPP_01_04_05-08

QQRRSS_09_11_13_14

dt=current data table();

(dt << New Column( "ProductID", Character )) << Set Selected;
(dt << New Column( "OUTPUT", Character )) << Set Selected;
 
//Parse string in ProductName column 

For Each Row(
    dt,
    //Assign the ProductID value
    :ProductID = Word( 1, :ProductName, "_-" ); //"_"
 
    //If the ProductName changes, capture the start and end
    If( :ProductName != Lag( :ProductName, 1 ),
       
        start = Num( Word( 2, :ProductName, "_-" ) );
        end = Num( Word( 3, :ProductName, "_-" ) );
            
    );
    
  //Assign the OUTPUT value
    :OUTPUT = :ProductID || "-" || Repeat( "0", 2 - Length( Char( start ) ) ) || Char( start );
    
    //After end value & there is an _, set start to value after _
    If( start == end & Contains( :ProductName, "_"),
        start = Num( Word(-2, :ProductName, "_-" ) ),
        start++);
);

txnelson
Super User

Re: Parse string of a column and fill another column

Here is a script that produces a new table that expands the rows as requested.  By placing the data into a new table, it bypasses the issue where there are not a sufficient number of rows for the expansion.  It does appear to handle all of the known patterns.

txnelson_0-1633945647193.png

Names Default To Here( 1 );
dt = Current Data Table();

dt2 = New Table( "The Output",
	New Column( "Product Name", character ),
	New Column( "ProductID", Character ),
	New Column( "OUTPUT", Character )
);

For( dtRow = 1, dtRow <= N Rows( dt ), dtRow++,
	If( dtRow == 1 | dt:ProductName[dtRow - 1] != dt:ProductName[dtRow],
		prodID = Word( 1, dt:ProductName[dtRow], "_" );
		remainder = Substr( dt:ProductName[dtRow], Length( prodID ) + 2 );
		While( Word( 1, remainder, "_" ) != "",
			theWord = Word( 1, remainder, "_" );
			If( Contains( theWord, "-" ),
				numDigets = Length( Word( 1, theWord, "-" ) ) - 1;
				zeros = Repeat( "0", numDigets );
				For( loop = Num( Word( 1, theWord, "-" ) ) + 1,
					loop <= Num( Word( 2, theWord, "-" ) ), loop++,
					dt2 << Add rows( 1 );
					theRow = N Rows( dt2 );
					dt2:ProductName[theRow] = dt:ProductName[dtRow];
					dt2:ProductID[theRow] = Word( 1, dt:ProductName[dtRow], "_-" );
					dt2:output[theRow] = dt2:ProductID[theRow] || "-" ||
					Substr( zeros, Length( Char( loop ) ) ) || Char( loop );
				);
			,
				dt2 << Add Rows( 1 );
				theRow = N Rows( dt2 );
				dt2:output[theRow] = prodID || "-" || Word( 1, theWord, "-" );
				dt2:ProductName[theRow] = dt:ProductName[dtRow];
				dt2:ProductID[theRow] = Word( 1, dt:ProductName[dtRow], "_-" );
			);
			remainder = Substr( remainder, Length( theWord ) + 2 );
		);
	)
);
Jim
cchueng
Level II

Re: Parse string of a column and fill another column

Thanks Jim but the OUTPUT appears to be incorrect. For example the first 3 lines, the OUTPUT should be :

AABBCC-07

AABBCC-08

AABBCC-13

txnelson
Super User

Re: Parse string of a column and fill another column

It just had a minor error where a -1 was not removed when I restructured my code.

txnelson_0-1634012182394.png

Names Default To Here( 1 );
dt = Current Data Table();

dt2 = New Table( "The Output",
	New Column( "Product Name", character ),
	New Column( "ProductID", Character ),
	New Column( "OUTPUT", Character )
);

For( dtRow = 1, dtRow <= N Rows( dt ), dtRow++,
	If( dtRow == 1 | dt:ProductName[dtRow - 1] != dt:ProductName[dtRow],
		prodID = Word( 1, dt:ProductName[dtRow], "_" );
		remainder = Substr( dt:ProductName[dtRow], Length( prodID ) + 2 );
		While( Word( 1, remainder, "_" ) != "",
			theWord = Word( 1, remainder, "_" );
			If( Contains( theWord, "-" ),
				numDigets = Length( Word( 1, theWord, "-" ) ) - 1;
				zeros = Repeat( "0", numDigets );
				For( loop = Num( Word( 1, theWord, "-" ) ),
					loop <= Num( Word( 2, theWord, "-" ) ), loop++,
					dt2 << Add rows( 1 );
					theRow = N Rows( dt2 );
					dt2:ProductName[theRow] = dt:ProductName[dtRow];
					dt2:ProductID[theRow] = Word( 1, dt:ProductName[dtRow], "_-" );
					dt2:output[theRow] = dt2:ProductID[theRow] || "-" ||
					Substr( zeros, Length( Char( loop ) ) ) || Char( loop );
				);
			,
				dt2 << Add Rows( 1 );
				theRow = N Rows( dt2 );
				dt2:output[theRow] = prodID || "-" || Word( 1, theWord, "-" );
				dt2:ProductName[theRow] = dt:ProductName[dtRow];
				dt2:ProductID[theRow] = Word( 1, dt:ProductName[dtRow], "_-" );
			);
			remainder = Substr( remainder, Length( theWord ) + 2 );
		);
	)
);

Please make sure you take the time to study the code, so you completely understand how it is working.

Jim
cchueng
Level II

Re: Parse string of a column and fill another column

Hi Jim,

Can it be done such that there is no dt2? Can you add the ProductID and Output columns in dt?

Thanks!