cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
jswislar
Level III

Extracting a character string of varying lengths and positions

I want to extract character strings from a variable (Dx1), but the length and position of the strings varies, but they are separated by dashes. Sometimes there are multiple dashes. For example...

 

"7812-abcdef     -9"

"v7892-xyz    -0"

"812-fghkj    -0"

"17361-gf-jhyt   -9"

 

What I would like to do is create a new variable that extracts the digits/characters before the first dash, and then another variable that extracts the characters between the first and second dashes. So,

Var1      Var2

7812      abcdef

v7892    xyz

812        fghkj

17361    gf-jhyt

 

I tried following the example posted here, but it just extracted the dash. Changing the +1 to another number extacts that number of characters, but my data length varies.

 

https://community.jmp.com/t5/Uncharted/JSL-Character-String-Functions/ba-p/21323

 

Here's what I wrote. What am I doing wrong?

 

 

Trim(
Substr(
:Name( "Dx1" ),
Contains( :Name( "Dx1" ), "-" ),
1
)
)

 

 

14 REPLIES 14
txnelson
Super User

Re: Extracting a character string of varying lengths and positions

The code in included is a Column Formula.  I have attached a data table with the Dx1 column you used in your example, along with the new Var1 and Var2 columns that have their Column Formulas specified. 

Jim
jswislar
Level III

Re: Extracting a character string of varying lengths and positions

Thanks, Jim. That is how I tried it previouslly. For some reason it seems to crash JMP Pro 13 when I run it. Maybe because my data table is 65,000 lines?

txnelson
Super User

Re: Extracting a character string of varying lengths and positions

65,000 rows is not a very large table for JMP to handle.  JMP can handle data tables with billions of rows.  What the issue is, is data inconsistancies in your data table.  The code assumes that at least one numeric field will be found after finding the character values for Var2.  Apparently there is a situation where a value of Dx1 does not end with numbers.  The following is a change in the format that adjusts for that issue

Final_Value = Word( 2, :Dx1, "-" );
i = 3;
(While( Is Missing( Num( Word( i, :Dx1, "-" ) ) ) == 1 & Word( i, :Dx1, "-" ) != "",
	Final_Value = Final_Value || "-" || Word( i, :Dx1, "-" );
	i++;
) ; Final_Value);
Jim
jswislar
Level III

Re: Extracting a character string of varying lengths and positions

Thanks again, Jim.

I've used many more rows than this. It was just a hypothesis.

Every case of DX1 ends in a number, but the number of spaces between what I'm extracting for Var2 and that number varies. Maybe that was causing it. Either way, your solution worked.

ms
Super User (Alumni) ms
Super User (Alumni)

Re: Extracting a character string of varying lengths and positions

The Regex() function could be an efficient alternative here if you experience stability or performance problems (and a while-loop that goes infinite in a column formula can be frustrating to debug).

 

//Formula for Var1
Regex(Dx1, "(^.*?)-", "\1");

//Formula for Var2
Regex(Dx1, "-(.+)-\d", "\1");