lala
Level VIII

Why does regex replacement not remove empty lines?

The goal is to download a web page, keep only the visible text, and remove lines that contain only spaces.
I use the following JSL, but it does not remove the blank lines. Why?

Thanks!

u = "https://www.jmp.com/support/help/zh-cn/17.2/jmp/jsl-terminology.shtml";
txt = Load Text File( u );
a = Length( txt );
t1 = Regex( txt, "<(.[^>]{0,})>", "", globalreplace );
t2 = Regex( t1, "  ", "", globalreplace );
t2 = Regex( t2, "^( {0,})\!n", "", globalreplace );
t2 = Regex( t2, "\!n\!n", "", globalreplace );
3 ACCEPTED SOLUTIONS

Accepted Solutions
Craige_Hales
Super User

Re: Why does regex replacement not remove empty lines?

...and typically, I'd replace runs of white space with

t2 = Regex( t1, "[ \n\r]+", " ", globalreplace );

(a space is also in the character set, and the replacement is a space) to keep the output a little more like the input, with the runs of white space compressed to a single space. (I removed the other t2= statements as well.)

 &#32; Expression 2: Sum = 0; For( i = 1, i &lt;= 10, i++, 	Sum += i; 	Sh
ow( i, Sum ); ); You can format your script in any way that you like. However, th
e script editor can also format your script for you. The Scripting Guide uses th
e script editor’s default formatting for capitalization, spaces, returns, tabs, and s
o on. See Work with the Script Editor for more information about using the scri
pt editor. Note: The only white space exception is two-character operators (suc
h as &lt;= or ++). The operators cannot be separated by a space. 需要更多信
息?有问题?从 JMP 用户社区得到解答&#32;(community.jmp.com). 本网站在
启用 JavaScript 的情况下效果最佳。 ";
Craige


Craige_Hales
Super User

Re: Why does regex replacement not remove empty lines?

@Jeff_Perkinson 

The site is sending you an "unsupported browser" page rather than the data you are expecting (your new link is slightly different).

Talk with the JMP sales team about your needs. I believe some documentation is already translated.

 

This works:

 

u = "https://www.jmp.com/support/help/zh-cn/17.2/jmp/jsl-terminology.shtml";
txt = Load Text File( u );
Regex Match( txt, "JSLString.+?</p>" );

 

{
"JSLString\!">\!"Hello, World\!"</span>;</pre>
<pre id=\!"ww220404\!" class=\!"code\!"><span class=\!"JSLOperatorName\!">Show</span>( A );</pre>
<p id=\!"ww220406\!" class=\!"codeOutput\!">A = \!"Hello, World\!";</p>"
}

 

Craige


Craige_Hales
Super User

Re: Why does regex replacement not remove empty lines?

No, and maybe yes.

Add flag to Regex Match() to find all non-overlapping occurances of pattern 

Regex: add options for all flags 

You can use regex inside a pattern match to do this:

u = "https://www.jmp.com/support/help/zh-cn/17.2/jmp/jsl-terminology.shtml";
txt = Load Text File( u );
matches = {}; // record matches here
rc = Pat Match(
	txt,
	Pat Repeat(
		Pat Pos() >> pos // remember the position just before the JSLString text
		+ Pat Regex( "JSLString.+?</p>" ) >> str // keep the matched text in str
		+ Pat Test( // use the test to inject some JSL into the matcher
			matches[N Items( matches ) + 1] = Eval List( {pos, str} ); // record the match
			1; // the "test" succeeds
		) + Pat Arb() // skip over more and more arbitrary text
	) + Pat R Pos( 0 ) // make sure the match reaches the end of the text
);
Show( rc, nitems(matches) );
For( i = 1, i <= nitems(matches), i += 1,
	Show( i, matches[i] )
);

Also, your regex does not really fit the HTML: it will miss some code sections and combine others. HTML is not a convenient markup for reworking text this way. The script above reports 2 matches for the 3 JSLString occurrences on the page; the second and third are combined into one match because no </p> occurs between them.

rc = 1;
N Items(matches) = 2;
...first match...
i = 1;
matches[i] = {8591, "JSLString\!">\!"Hello, World\!"</span>;</pre>
          <pre id=\!"ww220404\!" class=\!"code\!"><span class=\!"JSLOperatorName\!">Show</span>( A );</pre>
          <p id=\!"ww220406\!" class=\!"codeOutput\!">A = \!"Hello, World\!";</p>"};
... second match...
i = 2;
matches[i] = {10559, "JSLString\!">\!"My Line Graph\!"</span> ),</pre>
          <pre id=\!"ww229621\!" class=\!"code\!">		Frame Size( <span class=\!"JSLNumber\!">300</span>, <span class=\!"JSLNumber\!">500</span> ),</pre>
          <pre id=\!"ww229622\!" class=\!"code\!">		<span class=\!"JSLOperatorName\!">Marker</span>( <span class=\!"JSLOperatorName\!">Marker State</span>( <span class=\!"JSLNumber\!">3</span> ), [<span class=\!"JSLNumber\!">11</span> <span class=\!"JSLNumber\!">44</span> <span class=\!"JSLNumber\!">77</span>], [<span class=\!"JSLNumber\!">75</span> <span class=\!"JSLNumber\!">25</span> <span class=\!"JSLNumber\!">50</span>] );</pre>
          <pre id=\!"ww229623\!" class=\!"code\!">		<span class=\!"JSLOperatorName\!">Pen Color</span>( <span 
...there was no </p> yet, so the second match continues...
class=\!"JSLString\!">\!"Blue\!" </span>);</pre>
          <pre id=\!"ww229612\!" class=\!"code\!">		<span class=\!"JSLOperatorName\!">Line</span>( [<span class=\!"JSLNumber\!">10 30 70</span>], [<span class=\!"JSLNumber\!">88 22 44</span>] ));</pre>
          <p id=\!"ww236173\!" class=\!"body\!">Note that the <span class=\!"code\!">Frame Size()</span> arguments <span class=\!"code\!">300</span> and <span class=\!"code\!">500</span> are not named. The position of these arguments implies meaning; the first argument is always the width, the second argument is always the height.</p>"};

Craige


14 REPLIES
lala
Level VIII

Re: Why does regex replacement not remove empty lines?

The same kind of JSL does remove the empty lines in a string like the one below, so I do not know what went wrong.

Thanks!

txt =
"-->\!n\!n\!n\!nJSL Terminology\!n\!n\!n\!n\!n\!n\!nScripting Guide > Introduction to Writing JSL Scripts > JSL Terminology";
t1 = Regex( txt, "\!n\!n+", "", globalreplace );
Craige_Hales
Super User

Re: Why does regex replacement not remove empty lines?

I used hex(t2) to see what was left in the string:

...0A0D0A0D0AE69CACE7BD91E7AB99E59CA8E590AFE794A8204A
61766153637269707420E79A84E68385E586B5E4B88BE69588E69E
9CE69C80E4BDB3E380820D0A0D0A0D0A0D0A0D0A0D0A0D0A0D0A"

The 0D0A pairs are CR LF pairs. \!n is JMP's escape for LF. I'd use this regex to match a run of any mix-and-match of CR and LF:

t2 = Regex( t2, "[\n\r]+", "", globalreplace );

Inside the regex pattern the escapes are written \n and \r, without the ! that JSL string escapes use.
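
As a minimal sketch of the difference, the following uses a short made-up string (an assumption standing in for the downloaded page) whose lines end in CR LF. The variable names are illustrative only.

```jsl
// Assumed sample text: two blank lines between the words, every line ending in CR LF
txt = "first\!r\!n\!r\!n\!r\!nsecond";
Show( Hex( txt ) ); // the 0D0A pairs appear between the two words

// "\!n\!n" asks for two adjacent LF characters.
// It finds no match here because a CR sits between every pair of LFs.
t_lf = Regex( txt, "\!n\!n", "", GLOBALREPLACE );
Show( t_lf );

// "[\n\r]+" matches any run of CR and LF characters, in any order
t_crlf = Regex( txt, "[\n\r]+", "", GLOBALREPLACE );
Show( t_crlf );
```

Comparing the two Show() results in the log should make it clear why the original "\!n\!n" replacement left the blank lines in place.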

Craige
Craige_Hales
Super User

Re: Why does regex replacement not remove empty lines?

or even this

t2 = Regex( t1, "\s+", " ", globalreplace );

which uses the regex "match any white space" class, \s. That class probably also includes a few other characters, such as form feed, that are unlikely to occur.
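
If the goal is to drop the blank lines but keep one line break between the remaining lines, rather than flattening everything to single spaces, a variation along these lines might work. It is only a sketch: the sample string is made up, and replacing each run with a single LF is an assumption about the desired output.

```jsl
// Assumed sample: trailing spaces, a spaces-only line, an empty line, and an indented line
txt = "line one  \!r\!n   \!r\!n\!r\!n  line two";

// "\s*[\n\r]\s*" matches a whole run of white space that contains at least one CR or LF.
// Each such run is replaced by a single LF, so blank lines disappear but line breaks remain.
// Runs of spaces inside a line (no CR or LF) are left alone.
t2 = Regex( txt, "\s*[\n\r]\s*", "\!n", GLOBALREPLACE );
Show( t2 );
```

Note that this also trims trailing spaces and leading indentation around each line break, which may or may not be wanted.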

Craige
lala
Level VIII

Re: Why does regex replacement not remove empty lines?

Thanks, Craige!

Your help taught me more about how JMP handles regex. Thank you so much for your time!
But the more complicated cases still do not work.
For example, take the text downloaded by this JSL:

20240509151922.png
How can I match the text from JSLString up to the first </p> after it, even when it spans multiple lines?
I want to replace it with something like the following. The text has many places like this, so it needs to be done in JSL.

2024-05-09_15-21-18.png

 

lala
Level VIII

Re: Why does regex replacement not remove empty lines?

I tried this in EmEditor and it works there, but the same pattern did not succeed in JSL.

Thanks Experts!

2024-05-09_15-28-54.png

lala
Level VIII

Re: Why does regex replacement not remove empty lines?

JSLString.+?</p>
Craige_Hales
Super User

Re: Why does regex replacement not remove empty lines?

That looks like it should work. I'd need a complete example to understand what it is doing.

In the JSL Regex function, . (period) matches any character, and that always includes a newline. + (plus) means repeat the . one or more times.

The reason you want to use .+? is to reluctantly (as few as possible) match one or more characters. Without the ? (question mark) the .+ would be greedy (as many as possible) and would probably skip forward too far.
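
A small sketch of greedy versus reluctant matching, using a made-up string (an assumption, not the real page source) that contains two JSLString sections and a line break:

```jsl
// Assumed sample: two "JSLString ... </p>" sections, the first one split across lines
s = "JSLString one\!r\!nmore</p> filler JSLString two</p>";

// Greedy: .+ takes as much as it can, so the match runs on to the last </p>
Show( Regex( s, "JSLString.+</p>" ) );

// Reluctant: .+? takes as little as it can, so the match stops at the first </p>
Show( Regex( s, "JSLString.+?</p>" ) );
```

Because . also matches the CR and LF characters, the reluctant version still crosses the line break inside the first section.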

 

Craige
lala
Level VIII

Re: Why does regex replacement not remove empty lines?

Thanks, Craige!

I want to capture and automatically translate JMP online help content about JSL into Chinese.

		u = "https://www.jmp.com/support/help/zh-cn/17.2/#page/jmp/jsl-terminology.shtml";
		txt = Load Text File( u );
		t1 = Regex( txt, "<(.[^>]{0,})>", "", globalreplace );
		t2 = Regex( t1, "[ \n\r][ \n\r]+", "\!n  ", globalreplace );


However, the JSL code sections in the help document do not need to be translated. I want to use the keyword "JSLOperatorName" in the web page source, together with the first </p> after it, to mark the span of code that should not be translated.

2024-05-09_22-10-00.png

Because a JSL code section on the page may span multiple lines, I want to use JSL regex to recognize these spans automatically.
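
One possible way to mark such spans, sketched under several assumptions: the sample string is a made-up stand-in for the page source, [[KEEP]] and [[/KEEP]] are hypothetical marker strings, and the marking runs before the tags are stripped. It relies on the \1 back reference in the Regex replacement string.

```jsl
// Assumed stand-in for the downloaded page source (not the real page)
txt = "<p>Intro text</p>\!r\!n<pre><span class=\!"JSLOperatorName\!">Show</span>( A );</pre>\!r\!n<p class=\!"codeOutput\!">A = 1;</p>\!r\!n<p>More text</p>";

// Wrap everything from JSLOperatorName up to the first following </p> in the markers.
// [[KEEP]] and [[/KEEP]] are hypothetical; any strings that never occur in the page would do.
marked = Regex( txt, "(JSLOperatorName.+?</p>)", "[[KEEP]]\1[[/KEEP]]", GLOBALREPLACE );
Show( marked );
```

As with any pattern of this kind, a code section that has no </p> before the next keyword would be merged into the following one, so the marked spans are worth checking on a real page.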

2024-05-09_22-11-18.png