Solved: Re: Recode columns containing character value at the end of a column

saneal · Jul 1, 2021 05:45 PM

Hello,

I have a table like

ID	Column2
ABC001235G
ABC001234
ABC001235
ABC001234D
ABC001234F

I would like to recode the ID column such that all my values like 'ABC001234D' and 'ABC001234F' become 'ABC001234' - basically I want to drop that last letter in the statement. However, I cannot do this through manual process. There are too many rows to do and I couldn't possibly write it manually for each ID that is possible to be used in the script. Looking for a way to do this through scripting. Thanks!

jthi · Jul 2, 2021 03:14 AM

For this I would go with regex. I liked this at onepoint to get going with regex at least on some level: regexone also check out JMP Character Functions - Regex and Regex from scripting index.

Here is an example with two possible ways to remove end characters using Regex in JMP (they can be easily modified to support non-capital letters also):

Names Default To Here(1);

dt = New Table("Untitled 3",
	Add Rows(5),
	Compress File When Saved(1),
	New Column("ID",
		Character,
		"Nominal",
		Set Values(
			{"ABC001235G", "ABC001234", "ABC001235", "ABC001234D", "ABC001234F"}
		)
	)
);


dt << New Column("Regex1", Character, Nominal, Formula(Regex(:ID, "^[A-Z]+(\d+)")));
dt << New Column("Regex2", Character, Nominal, Formula(Regex(:ID, "[A-Z]*$", "", GLOBALREPLACE);));

Depending on the use case choose one (easy to test with the example code how they work with different strings):

First one will match 1 or more capital characters from the start of the string followed by any amount of digits (so it will ignore anything after the digits).

Second one will match zero or more capital characters from the end of string and replace them with empty string.

-Jarmo

View solution in original post

Kevin_Anderson · Jul 1, 2021 06:35 PM

Hi, saneal!

Here's a formula that might help you, but you will have to make some assumptions about the naming conventions and accuracy of the ID variable.

If your ID values consistently have 3 characters at the beginning, you could make a new column and use this formula:

Substr( :ID, 1, 3 ) || Regex( :ID, "\d+" )

This formula will allow for inconsistent length of numeric values. It's not too hard to adapt the formula for having inconsistent numbers of characters at the beginning of the ID values, either. But if someone ever mistypes the ID as 0 (the number) instead of O (the letter), or any other certain typographical errors, you may get unexpected results.

Good luck,

Kevin

ErraticAttack · Jul 1, 2021 08:28 PM

You can do this with the following formula:

val = Reverse( :ID );
While( Is Missing( Num( Substr( val, 1, 1 ) ) ),
	val = Substr( val, 2 )
);
Reverse( val );

This will remove all trailing non-numeric values

Jordan

jthi · Jul 2, 2021 03:14 AM