cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
SpannerHead
Level IV

Count the numeric passages in a string

This is probably a very specific one but I have a string contained in a JMP cell and I need to be able to isolate a specific passage of numeric characters in the string.  The string contains both characters and numbers and the sequence can be variable.  For example if I have the cell entry

 

aaa1b222c3

 

and I want to zero in on the 222 passage, I considered a means to count the number of numeric passages as being 3 and then isolating the second of those.

 

I came up with a way to do this with Regex but it's too convolved to be worth posting.  I'm sure there's a more adaptive way of getting there.  All ideas appreciated.

 

Slán

 

SpannerHead

 


Slán



SpannerHead
2 ACCEPTED SOLUTIONS

Accepted Solutions
jthi
Super User

Re: Count the numeric passages in a string

This might be enough or you might have to add few extra conditions (check that it is either at the start/end or there is letter before/after)

Names Default To Here(1);

str = "Val1b200X300abcc1";
r = Regex Match(str, "(\d+)X(\d+)");
Show(Num(r[2]), Num(r[3]));
-Jarmo

View solution in original post

SpannerHead
Level IV

Re: Count the numeric passages in a string

Jarmo

 

Great stuff!  Here's what I came up with in the end.  Label is the column containing the string I'm trying to perform surgery on.

 

	New Column("Width", Character, "Nominal", Formula(Regex(:Label, "(\d+)X", "\1")), Set Selected);
	New Column("Length", Character, "Nominal", Formula(Regex(:Label, "X(\d+)", "\1")), Set Selected);

 

Seems to work.

 

Slán

 

SpannerHead


Slán



SpannerHead

View solution in original post

8 REPLIES 8
jthi
Super User

Re: Count the numeric passages in a string

Could you provide more examples of the possible strings and the parts you wish to isolate? Also what do you want to do with those parts? Extract them? Remove them? Check if the exist?

-Jarmo
SpannerHead
Level IV

Re: Count the numeric passages in a string

Jarmo

 

My overall goal is to create a new column that contains the numeric value as a cell input.  This derives from the fact that the original entry contains dimensional information and I'd like that to be automatically available in numeric form.  One of the numeric passages is a length, the other is a width.  There are other numbers included and I need a means to bypass those.  The actual string will look closer to

 

Val1b200X300

 

which is a 200 long by 300 wide device and I need new columns containing cells generated reflecting those values.  The additional numeric characters are not consistent in length or placement which adds complexity to the task.

 

Slán

 

SpannerHead

 


Slán



SpannerHead
jthi
Super User

Re: Count the numeric passages in a string

Are those always three characters long? Are they always the last two numbers? Are they always separated by "X"?

-Jarmo
SpannerHead
Level IV

Re: Count the numeric passages in a string

Jarmo

 

Partial violation of Murphy's law.  Not always the same length, not always the last 2 numbers but always separated by an X.

 

Slán

 

SpannerHead


Slán



SpannerHead
jthi
Super User

Re: Count the numeric passages in a string

This might be enough or you might have to add few extra conditions (check that it is either at the start/end or there is letter before/after)

Names Default To Here(1);

str = "Val1b200X300abcc1";
r = Regex Match(str, "(\d+)X(\d+)");
Show(Num(r[2]), Num(r[3]));
-Jarmo
SpannerHead
Level IV

Re: Count the numeric passages in a string

Jarmo

 

Great stuff!  Here's what I came up with in the end.  Label is the column containing the string I'm trying to perform surgery on.

 

	New Column("Width", Character, "Nominal", Formula(Regex(:Label, "(\d+)X", "\1")), Set Selected);
	New Column("Length", Character, "Nominal", Formula(Regex(:Label, "X(\d+)", "\1")), Set Selected);

 

Seems to work.

 

Slán

 

SpannerHead


Slán



SpannerHead
jthi
Super User

Re: Count the numeric passages in a string

This depends on your data, but you might want to add a small addition to your regex patterns: add the check for other number on other side of X to avoid issues where you would just have something like "1Xa"

Names Default To Here(1);

strs = {"A1X2", "A1Xa", "1X2", "a1X23", "aX2", "A1X2a"};

For Each({str}, strs,
	a1 = Regex(str, "(\d+)X", "\1");
	a2 = Regex(str, "X(\d+)", "\1");
	// vs
	b1 = Regex(str, "(\d+)X(\d+)", "\1");
	b2 = Regex(str, "(\d+)X(\d+)", "\2");
	
	Write("\!NCurrent string: ", str);
	Write("\!NFirst pattern matches: ", a1, " ", a2);
	Write("\!NSecond pattern matches: ", b1, " ", b2);
	Write("\!N");
);

Current string: A1X2
First pattern matches: 1 2
Second pattern matches: 1 2

Current string: A1Xa
First pattern matches: 1 .
Second pattern matches: . .

Current string: 1X2
First pattern matches: 1 2
Second pattern matches: 1 2

Current string: a1X23
First pattern matches: 1 23
Second pattern matches: 1 23

Current string: aX2
First pattern matches: . 2
Second pattern matches: . .

Current string: A1X2a
First pattern matches: 1 2
Second pattern matches: 1 2
-Jarmo
jthi
Super User

Re: Count the numeric passages in a string

Also my first understanding of the initial problem was to extract the longest sequence of numbers from a string, so here are few scripts for that (might be useful to someone).

 

1. Using Words() (similar idea could be used to extract non-numeric strings)

 

Names Default To Here(1);

str = "aaa1b222c3";

alphabets = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
nums = Words(str, alphabets);

lengths = Transform Each({num}, nums, Output("Matrix"), Length(num));

l_idx = Loc Max(lengths);
longest_num_seq = nums[l_idx];

show(nums, longest_num_seq); //nums = {"1", "222", "3"}, longest_num_seq = "222";

 

 

2. Using While + Regex

 

Names Default To Here(1);

str = "aaa1b222c3";

nums = {};
strcopy = str;
While(!IsMissing(match = Regex(strcopy, "\d+")),
	idx = Contains(strcopy, match);
	Insert Into(nums, match);
	Remove From(strcopy, idx, Length(match));
);

l_idx = Loc Max(lengths);
longest_num_seq = nums[l_idx];

show(nums, longest_num_seq); //nums = {"1", "222", "3"}, longest_num_seq = "222";

 

-Jarmo