cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
jswislar
Level III

Extracting a character string of varying lengths and positions

I want to extract character strings from a variable (Dx1), but the length and position of the strings varies, but they are separated by dashes. Sometimes there are multiple dashes. For example...

 

"7812-abcdef     -9"

"v7892-xyz    -0"

"812-fghkj    -0"

"17361-gf-jhyt   -9"

 

What I would like to do is create a new variable that extracts the digits/characters before the first dash, and then another variable that extracts the characters between the first and second dashes. So,

Var1      Var2

7812      abcdef

v7892    xyz

812        fghkj

17361    gf-jhyt

 

I tried following the example posted here, but it just extracted the dash. Changing the +1 to another number extacts that number of characters, but my data length varies.

 

https://community.jmp.com/t5/Uncharted/JSL-Character-String-Functions/ba-p/21323

 

Here's what I wrote. What am I doing wrong?

 

 

Trim(
Substr(
:Name( "Dx1" ),
Contains( :Name( "Dx1" ), "-" ),
1
)
)

 

 

2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: Extracting a character string of varying lengths and positions

You want to use the Word() function.  If you create a new column, "Var1", and place the below formula into the column, you will get the results you want

word(1,:Dx1,"-")

For Var2 it would be:

word(2,:Dx1,"-")

The Word() function is documented in the Scripting Index

     Help==>Scripting Index==>Word

Jim

View solution in original post

txnelson
Super User

Re: Extracting a character string of varying lengths and positions

65,000 rows is not a very large table for JMP to handle.  JMP can handle data tables with billions of rows.  What the issue is, is data inconsistancies in your data table.  The code assumes that at least one numeric field will be found after finding the character values for Var2.  Apparently there is a situation where a value of Dx1 does not end with numbers.  The following is a change in the format that adjusts for that issue

Final_Value = Word( 2, :Dx1, "-" );
i = 3;
(While( Is Missing( Num( Word( i, :Dx1, "-" ) ) ) == 1 & Word( i, :Dx1, "-" ) != "",
	Final_Value = Final_Value || "-" || Word( i, :Dx1, "-" );
	i++;
) ; Final_Value);
Jim

View solution in original post

14 REPLIES 14
uday_guntupalli
Level VIII

Re: Extracting a character string of varying lengths and positions

@jswislar : 
     One way to tackle this could be to use Contains and Substr functions 

a = {"abc-12","abc - 45","abc-  89"}; 
List1 = list(); 
List2 = list(); 
For(i = 1 , i <= N Items(a), i++,
		Pos = Contains(a[i],"-"); 
		Insert Into(List1,Substr(a[i],1,Pos-1)); 
		Insert Into(List2,Substr(a[i],Pos+1,Length(a[i]))); 
   );




Best
Uday
pmroz
Super User

Re: Extracting a character string of varying lengths and positions

The words function is your friend here.  Use the dash and space as delimiters and you can extract what you want easily.  Here's an example that uses formulas in a table:

dt = New Table( "Untitled", Add Rows( 4 ), 
	New Column( "dx1", Character, "Nominal",
		Set Values( {"7812-abcdef     -9", "v7892-xyz    -0", "812-fghkj    -0",
			"17361-gf-jhyt   -9"}
		)
	),
// Get the first word for Var1
	New Column( "Var1", Character, "Nominal", Formula( Words( :dx1, "- " )[1] ) ),

// Get the second word for Var1
	New Column( "Var2", Character, "Nominal", Formula( Words( :dx1, "- " )[2] ) )
);
txnelson
Super User

Re: Extracting a character string of varying lengths and positions

You want to use the Word() function.  If you create a new column, "Var1", and place the below formula into the column, you will get the results you want

word(1,:Dx1,"-")

For Var2 it would be:

word(2,:Dx1,"-")

The Word() function is documented in the Scripting Index

     Help==>Scripting Index==>Word

Jim
jswislar
Level III

Re: Extracting a character string of varying lengths and positions

Thanks! That works well.

jswislar
Level III

Re: Extracting a character string of varying lengths and positions

Is there a way to get WORD to ignore some dashes? Some of the second words I'm trying to extract are hyphenated, so I'm only getting a partial extraction. For example, for "938-ex-marine" I get "938" and "ex" rather than "ex-marine"

txnelson
Super User

Re: Extracting a character string of varying lengths and positions

JSL will allow the building and parsing of any these complex conditions.  It is up to the humans to determine what the rules are for such parsing.

 

In your example, what is the rule to determine what hyphenated values should be read as a combined value, and when to treat them as separate values?

Jim
jswislar
Level III

Re: Extracting a character string of varying lengths and positions

The only rule that I can think of that would be consistent would be that the second word appears beween numbers (#). Ultimately I want the phrase between the hyphens. So, "#-word     -#" and "#-word-word     -#" and "#-word-word word     -#"

txnelson
Super User

Re: Extracting a character string of varying lengths and positions

Here is a formula that will do what you stated as the rule....

Final_Value = Word( 2, :Dx1, "-" );
i = 3;
(While( Is Missing( Num( Word( i, :Dx1, "-" ) ) ) == 1,
	Final_Value = Final_Value || "-" || Word( i, :Dx1, "-" );
	i++;
) ; Final_Value);

However, in your sample above, the sample "17361-gf-jhyt   -9" ends up with a Var2 value of "gf-jhyt".  Is that what you want?

 

Regardless, I hope you can  see that you can construct as complex of a formula as you may need to parse the string into whatevery the correct values are.  The only limitation is that you need to be able to determine the rules that need to be used.

Jim
jswislar
Level III

Re: Extracting a character string of varying lengths and positions

Thanks, Jim. That is what I was looking for.
How do I run this script? I tried it as a column formula. That didn't work, but I didn't expect it to. I also opened a script window and ran it there, but it just stopped responding.