StarfruitBob
Level VI

Regex: computation speed & incomplete formula run

Hello,

 

I have two questions about regex.

  1. Below I've shown some code. I'm creating formula columns that parse information from another column. The formulas grab the correct information, but ALWAYS stop at row 3024. The datasets I work with can be tens of rows, or much larger.
    dt << New column( "Column 1", formula( regex( :Other column, "pattern" ) ) );
    dt << New column( "Column 2", formula( regex( :Other column, "pattern", "\1" ) ) );
    dt << New column( "Column 3", formula( num( regex( :Other column, "pattern" ) ) ) );

    dt << Rerun Formulas; // Doesn't help!
  2. When I time the script in the debugger, the timer states that the script completes in ~1 second and that these three lines of code take up the bulk of the time. There are other small things going on as well. Most often, however, JMP locks up when running it, and I have to either stop the script in the debugger or end the task in Task Manager.

I've used regex in a for() and for each() and had similar issues. Regex seems to like for() and for each() more, as the formula routinely finishes, but the script run locks up JMP and takes a very long time to complete, if it finishes at all.

This is very basic regex scripting. I've done much more complex things in for loops for other projects, with much larger datasets. The dataset itself was imported as a CSV, but this is the only difference I can spot between it and other datasets.  Any ideas as to what's happening? 

Learning every day!
1 ACCEPTED SOLUTION
Craige_Hales
Super User

Re: Regex: computation speed & incomplete formula run

I'd guess the pattern includes star or plus operations, nested in a way that takes a long time for certain data, and row 3024 has that data pattern.
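
To illustrate the kind of nesting meant here, the following is a minimal sketch, not the poster's actual pattern (which was not shared). The test string and both patterns are hypothetical, and how slow the nested version is depends on the regex engine and on the string length, so no timings are claimed.

```jsl
// Hypothetical input: a run of "a" followed by a character that makes the match fail.
s = Repeat( "a", 22 ) || "!";

// Nested quantifiers: (a+)+ lets a backtracking engine split the run of a's
// in very many ways before it can conclude that there is no match.
t0 = Tick Seconds();
r1 = Regex( s, "^(a+)+$" );
nestedSeconds = Tick Seconds() - t0;

// Same intent without the nesting: the match can fail after a single pass.
t0 = Tick Seconds();
r2 = Regex( s, "^a+$" );
flatSeconds = Tick Seconds() - t0;

// Both results should be missing (no match); compare the two elapsed times.
Show( r1, r2, nestedSeconds, flatSeconds );
```

If the nested version is noticeably slower, increasing the length of the run by a few characters should make the difference grow quickly, which is the behavior that can make a single row appear to hang a column formula.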
Craige

4 REPLIES
jthi
Super User

Re: Regex: computation speed & incomplete formula run

No idea what is happening, but if you don't need the formulas you can try using << Set Each Value.
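
A minimal sketch of that suggestion, assuming the same placeholder column name and "pattern" string used in the question. The values are computed once when the column is created, so there is no formula to re-evaluate later.

```jsl
dt = Current Data Table();

// Static numeric column: each row gets the value of the expression,
// and no formula is stored on the column.
dt << New Column( "Column 3", Numeric, "Continuous",
	<< Set Each Value( Num( Regex( :Other column, "pattern" ) ) )
);
```

This only changes when the regex is evaluated; a pattern that is slow on a particular row will still be slow on that row.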

-Jarmo
StarfruitBob
Level VI

Re: Regex: computation speed & incomplete formula run

Unfortunately, the reason I'm using regex is because the location, and value, of the num() of interest changes within the searched string, but its formatting stays the same. If I move the regex function to a different line and pass the number (string) retrieved to set each value( num() ) after the column creation to populate the rows, the regex() function still has the same issue.

Learning every day!
Craige_Hales
Super User

Re: Regex: computation speed & incomplete formula run

I'd guess the pattern includes star or plus operations, nested in a way that takes a long time for certain data, and row 3024 has that data pattern.
Craige
StarfruitBob
Level VI

Re: Regex: computation speed & incomplete formula run

Oh my gosh, this is embarrassing... I took a look at my regex pattern and started to compare with the strings at, and around, row 3024.  I think our database needs some fine tuning; the string has no match for the regex pattern in many of these rows.  Eek!
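
For anyone doing the same check, one possible way to list the rows where the pattern finds nothing is sketched below. It relies on Regex() returning missing when there is no match, and reuses the placeholder column name and "pattern" string from the question.

```jsl
dt = Current Data Table();

// Row numbers where the pattern does not match (Regex() returns missing).
noMatch = dt << Get Rows Where( Is Missing( Regex( :Other column, "pattern" ) ) );

// Report how many rows failed, then select them for inspection in the table.
Show( N Rows( noMatch ) );
If( N Rows( noMatch ) > 0,
	dt << Select Rows( noMatch )
);
```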

 

Thanks for leading me to do this, @Craige_Hales!

Learning every day!