cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
saneal
Level III

Recode columns containing character value at the end of a column

Hello,

 

I have a table like

IDColumn2
ABC001235G 
ABC001234 
ABC001235 
ABC001234D 
ABC001234F 

 

I would like to recode the ID column such that all my values like 'ABC001234D' and 'ABC001234F' become 'ABC001234' - basically I want to drop that last letter in the statement. However, I cannot do this through manual process. There are too many rows to do and I couldn't possibly write it manually for each ID that is possible to be used in the script. Looking for a way to do this through scripting. Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Recode columns containing character value at the end of a column

For this I would go with regex. I liked this at onepoint to get going with regex at least on some level: regexone also check out JMP Character Functions - Regex and Regex from scripting index.

 

Here is an example with two possible ways to remove end characters using Regex in JMP (they can be easily modified to support non-capital letters also):

 

Names Default To Here(1);

dt = New Table("Untitled 3",
	Add Rows(5),
	Compress File When Saved(1),
	New Column("ID",
		Character,
		"Nominal",
		Set Values(
			{"ABC001235G", "ABC001234", "ABC001235", "ABC001234D", "ABC001234F"}
		)
	)
);


dt << New Column("Regex1", Character, Nominal, Formula(Regex(:ID, "^[A-Z]+(\d+)")));
dt << New Column("Regex2", Character, Nominal, Formula(Regex(:ID, "[A-Z]*$", "", GLOBALREPLACE);));


Depending on the use case choose one (easy to test with the example code how they work with different strings):

First one will match 1 or more capital characters from the start of the string followed by any amount of digits (so it will ignore anything after the digits).

Second one will match zero or more capital characters from the end of string and replace them with empty string.

 

-Jarmo

View solution in original post

3 REPLIES 3
Kevin_Anderson
Level VI

Re: Recode columns containing character value at the end of a column

Hi, saneal!

 

Here's a formula that might help you, but you will have to make some assumptions about the naming conventions and accuracy of the ID variable.

 

If your ID values consistently have 3 characters at the beginning, you could make a new column and use this formula:

 

Substr( :ID, 1, 3 ) || Regex( :ID, "\d+" )

 

This formula will allow for inconsistent length of numeric values.  It's not too hard to adapt the formula for having inconsistent numbers of characters at the beginning of the ID values, either.  But if someone ever mistypes the ID as 0 (the number) instead of O (the letter), or any other certain typographical errors, you may get unexpected results.

 

Good luck,

Kevin

ErraticAttack
Level VI

Re: Recode columns containing character value at the end of a column

You can do this with the following formula:

 

val = Reverse( :ID );
While( Is Missing( Num( Substr( val, 1, 1 ) ) ),
	val = Substr( val, 2 )
);
Reverse( val );

This will remove all trailing non-numeric values

Jordan
jthi
Super User

Re: Recode columns containing character value at the end of a column

For this I would go with regex. I liked this at onepoint to get going with regex at least on some level: regexone also check out JMP Character Functions - Regex and Regex from scripting index.

 

Here is an example with two possible ways to remove end characters using Regex in JMP (they can be easily modified to support non-capital letters also):

 

Names Default To Here(1);

dt = New Table("Untitled 3",
	Add Rows(5),
	Compress File When Saved(1),
	New Column("ID",
		Character,
		"Nominal",
		Set Values(
			{"ABC001235G", "ABC001234", "ABC001235", "ABC001234D", "ABC001234F"}
		)
	)
);


dt << New Column("Regex1", Character, Nominal, Formula(Regex(:ID, "^[A-Z]+(\d+)")));
dt << New Column("Regex2", Character, Nominal, Formula(Regex(:ID, "[A-Z]*$", "", GLOBALREPLACE);));


Depending on the use case choose one (easy to test with the example code how they work with different strings):

First one will match 1 or more capital characters from the start of the string followed by any amount of digits (so it will ignore anything after the digits).

Second one will match zero or more capital characters from the end of string and replace them with empty string.

 

-Jarmo