- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Recode columns containing character value at the end of a column
Hello,
I have a table like
ID | Column2 |
ABC001235G | |
ABC001234 | |
ABC001235 | |
ABC001234D | |
ABC001234F |
I would like to recode the ID column such that all my values like 'ABC001234D' and 'ABC001234F' become 'ABC001234' - basically I want to drop that last letter in the statement. However, I cannot do this through manual process. There are too many rows to do and I couldn't possibly write it manually for each ID that is possible to be used in the script. Looking for a way to do this through scripting. Thanks!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recode columns containing character value at the end of a column
For this I would go with regex. I liked this at onepoint to get going with regex at least on some level: regexone also check out JMP Character Functions - Regex and Regex from scripting index.
Here is an example with two possible ways to remove end characters using Regex in JMP (they can be easily modified to support non-capital letters also):
Names Default To Here(1);
dt = New Table("Untitled 3",
Add Rows(5),
Compress File When Saved(1),
New Column("ID",
Character,
"Nominal",
Set Values(
{"ABC001235G", "ABC001234", "ABC001235", "ABC001234D", "ABC001234F"}
)
)
);
dt << New Column("Regex1", Character, Nominal, Formula(Regex(:ID, "^[A-Z]+(\d+)")));
dt << New Column("Regex2", Character, Nominal, Formula(Regex(:ID, "[A-Z]*$", "", GLOBALREPLACE);));
Depending on the use case choose one (easy to test with the example code how they work with different strings):
First one will match 1 or more capital characters from the start of the string followed by any amount of digits (so it will ignore anything after the digits).
Second one will match zero or more capital characters from the end of string and replace them with empty string.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recode columns containing character value at the end of a column
Hi, saneal!
Here's a formula that might help you, but you will have to make some assumptions about the naming conventions and accuracy of the ID variable.
If your ID values consistently have 3 characters at the beginning, you could make a new column and use this formula:
Substr( :ID, 1, 3 ) || Regex( :ID, "\d+" )
This formula will allow for inconsistent length of numeric values. It's not too hard to adapt the formula for having inconsistent numbers of characters at the beginning of the ID values, either. But if someone ever mistypes the ID as 0 (the number) instead of O (the letter), or any other certain typographical errors, you may get unexpected results.
Good luck,
Kevin
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recode columns containing character value at the end of a column
You can do this with the following formula:
val = Reverse( :ID );
While( Is Missing( Num( Substr( val, 1, 1 ) ) ),
val = Substr( val, 2 )
);
Reverse( val );
This will remove all trailing non-numeric values
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recode columns containing character value at the end of a column
For this I would go with regex. I liked this at onepoint to get going with regex at least on some level: regexone also check out JMP Character Functions - Regex and Regex from scripting index.
Here is an example with two possible ways to remove end characters using Regex in JMP (they can be easily modified to support non-capital letters also):
Names Default To Here(1);
dt = New Table("Untitled 3",
Add Rows(5),
Compress File When Saved(1),
New Column("ID",
Character,
"Nominal",
Set Values(
{"ABC001235G", "ABC001234", "ABC001235", "ABC001234D", "ABC001234F"}
)
)
);
dt << New Column("Regex1", Character, Nominal, Formula(Regex(:ID, "^[A-Z]+(\d+)")));
dt << New Column("Regex2", Character, Nominal, Formula(Regex(:ID, "[A-Z]*$", "", GLOBALREPLACE);));
Depending on the use case choose one (easy to test with the example code how they work with different strings):
First one will match 1 or more capital characters from the start of the string followed by any amount of digits (so it will ignore anything after the digits).
Second one will match zero or more capital characters from the end of string and replace them with empty string.