lwx228
Level VII

Continue asking about Regex segmentation

I made a mistake. Code written like this does not give the right result.

c=Regex("abcdef{:[[ghi]]:[{op},{q}]}", ":\[\[]\([^\]]*)\]", "[\1]");

I need to get:

[{op},{q}]

How should I write this in JSL? Thank you very much!

 

 

 

2 ACCEPTED SOLUTIONS

Craige_Hales
Staff (Retired)

Re: Continue asking about Regex segmentation

JMP 15 does a pretty good job with this JSON data (rename the file from .txt to .json; it will be easier, @Justin_Chilton); JMP 14 does not appear to handle this JSON case correctly. Here's the JMP 15 import using Best Guess, with the stack option added:

[Image: Best Guess is under the red triangle]

You might want to do two imports like this:

[Image: get the little table of column names]

[Image: get the bigger table of data]

You can leave out columns you don't want. You could apply the column names from the little table to the big table, renaming value0 to 2018.

Craige


Craige_Hales
Staff (Retired)

Re: Continue asking about Regex segmentation

If the data is json or xml, you should use JMP's json/xml parser. It is not easy to write a parser for these nested structures, and regex alone is not enough. Importing the xml/json to a table and then finding the row and column is a great choice.
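A minimal sketch of the parser route, assuming a JMP version that provides Parse JSON(). The JSON text and the key names "name", "items" and "id" are made up for illustration and are not the poster's real data:

```jsl
Names Default To Here( 1 );

// Hypothetical JSON text, not the real file.
// \!" is the JSL escape for a double quote inside a string.
json = "{\!"name\!":\!"ghi\!",\!"items\!":[{\!"id\!":\!"op\!"},{\!"id\!":\!"q\!"}]}";

parsed = Parse JSON( json );   // JSON object -> associative array
items = parsed["items"];       // JSON array -> JSL list

For( i = 1, i <= N Items( items ), i++,
	item = items[i];           // each element is itself an associative array
	Show( item["id"] );
);
```

The nesting is handled by the parser, so no regex has to balance brackets or braces.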

 

If the data must be handled some other way, you could do it with just a simple JSL loop and a bunch of regex tests, something like this:

txt = Load Text File( ... );
ok = 1;
While( ok,
	// ... try various regex that remove a bit of data from txt ...
	// set ok to 0 if nothing works.
);

That works for small files and is easy to maintain. But a large file with many bits of data will be too slow, because each removal from a large txt string copies a lot of data. (It is an n^2 problem: doubling the file size quadruples the run time.)
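A small runnable sketch of that loop, under assumptions: the input string is a made-up stand-in for the loaded file, and a single pattern for {word} tokens stands in for the "various regex" tests.

```jsl
Names Default To Here( 1 );

txt = "x{op}y{q}z";   // hypothetical stand-in for the text from Load Text File()
found = {};
ok = 1;
While( ok,
	piece = Regex( txt, "\{\w+\}" );   // first {word} token; missing when there is none
	If( Is Missing( piece ),
		ok = 0,                        // nothing matched, so stop
		Insert Into( found, piece );
		pos = Contains( txt, piece );
		// Rebuilding txt without the token copies the string on every pass,
		// which is where the n^2 cost comes from.
		txt = Substr( txt, 1, pos - 1 ) || Substr( txt, pos + Length( piece ) );
	)
);
Show( found, txt );
```

For this input the loop should collect {op} and {q} and leave "xyz" in txt.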

 

For large files, it is best to write a single pattern match that does not modify the input string. You can use regular expressions or patterns within a patRepeat() pattern that matches the entire file, and you'll use patTest(... ;1) to run some JSL inside the match that saves the parsed results into a JSL matrix or list or associative array. Before JMP had a built-in json parser, @XanGregg and I wrote some JSL to parse json files. It is still a great example, but JMP's built-in parser is better.
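A rough sketch of the single-pattern approach, not taken from the linked posts. The input string and the {word} token format are assumptions for illustration; the point is only that txt is never modified and the results are saved from inside the match:

```jsl
Names Default To Here( 1 );

txt = "{op},{q},{r}";   // hypothetical input, not the poster's data
found = {};             // parsed results are collected here
item = "";

rc = Pat Match(
	txt,
	Pat Pos( 0 ) +                                   // anchor at the start
	Pat Repeat(
		(Pat Regex( "\{\w+\}" ) >> item) +           // capture one {word} token
		Pat Test( Insert Into( found, item ); 1 ) +  // save it; the trailing 1 lets the match continue
		(Pat Any( "," ) | Pat R Pos( 0 ))            // then a comma, or the end of the text
	) +
	Pat R Pos( 0 )                                   // require the whole string to match
);

Show( rc, found );
```

Because the JSL inside Pat Test() runs during matching, it can run again if the matcher backtracks, so the side effects are best kept simple.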

 

Here are some other posts that have JSL that parses a file using patMatch(... patRepeat( pattern ) ... )

https://community.jmp.com/t5/Uncharted/The-Other-Kind-of-Model/ba-p/231703 parsing a data file for a 3D model

https://community.jmp.com/t5/Uncharted/Web-Logs/ba-p/29389 parsing an Apache log file

https://community.jmp.com/t5/Uncharted/WordNet/ba-p/28984 parsing wordnet file

https://community.jmp.com/t5/Uncharted/Pronounce-Elephant/ba-p/21272 parsing the CMU phoneme file

 

I often use the >>log() function to debug complicated pattern matches.

https://community.jmp.com/t5/Uncharted/Backtracking-Secrets/ba-p/20984 using the >>log() for debugging

 

and

https://community.jmp.com/t5/Uncharted/Pattern-Matching/ba-p/21005 introduction to patMatch()

https://community.jmp.com/t5/Uncharted/Regex/ba-p/21008 introduction to regex()

 

Craige


14 REPLIES
lwx228
Level VII

Re: Continue asking about Regex segmentation

It doesn't work either:

c=Regex("abcdef{:[[ghi]]:[{op},{q}]}", ":\[\[]{([^\]]*)\]", "[{\1]");
txnelson
Super User

Re: Continue asking about Regex segmentation

Here is how I would do it:

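Names Default To Here( 1 );
x = "abcdef{:[[ghi]]:[{op},{q}]}";

// Word( 3, x, ":" ) is the third ":"-delimited piece: [{op},{q}]}
// Word( 1, ..., "]" ) keeps the text before the first "]", so "]" is appended back.
c = Word( 1, Word( 3, x, ":" ), "]" ) || "]";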
Jim
Ryan_Gilmore
Community Manager

Re: Continue asking about Regex segmentation

Here's a potential alternative:

 

c = Regex( "abcdef{:[[ghi]]:[{op},{q}]}", "\!\[(\{\w+\},*)+]" );

Re: Continue asking about Regex segmentation

Another form:

 

Names Default to Here( 1 );

string = "abcdef{:[[ghi]]:[{op},{q}]}";

c = Regex( string, "\{:.+:(.+)\}", "\1" );
Learn it once, use it forever!
lwx228
Level VII

Re: Continue asking about Regex segmentation

The code above still does not work on my actual data.
I don't think I have found the key.

[Image: 2019-12-05_08-33.png]

I would appreciate further guidance from the experts. Thanks!


Re: Continue asking about Regex segmentation

The actual text is much more complex than the example that you posed and we solved.

 

You posted this problem before. Back then I tried to convert the text to a JMP list first and then iterate over the items, rather than working with the whole text. JMP fought me the whole way and nothing I tried worked. Perhaps someone else knows how to get a list; it seems so simple. I think processing the strings with Regex() would be much more straightforward with the text broken up into list items.

Learn it once, use it forever!
lwx228
Level VII

Re: Continue asking about Regex segmentation

I don't know enough.

I thought it was just a matter of splitting the text:
start at :[{
and end at the second }]
so that I get [{...}] and everything between them.

It seems that my original understanding was too hasty.
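For the short sample string only, that start/end idea can be sketched without any regex, using Contains() and Substr(). This is an illustration under assumptions: the sample has just one }] after the start marker, so the sketch stops at the first one; real data with a second }] would need the search repeated from the first hit.

```jsl
Names Default To Here( 1 );
x = "abcdef{:[[ghi]]:[{op},{q}]}";

start = Contains( x, ":[{" );         // position of the ":" in ":[{"
stop  = Contains( x, "}]", start );   // position of the "}" in the next "}]"

// Take from the "[" after the colon through the "]" after stop.
c = Substr( x, start + 1, stop + 1 - start );
Show( c );
```

On the sample this should give [{op},{q}].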

Thanks!

lwx228
Level VII

Re: Continue asking about Regex segmentation

I got two regular-expression solutions from the VBA community, but I still cannot convert them to JSL code.
I am asking for expert guidance. Thanks!

Sub t()
    Set reg = CreateObject("vbscript.regexp")
    reg.Pattern = "^(?:.*?(\[.*?]))*|."
    reg.Global = True
    Cells(1, 2) = reg.Replace(Cells(1, 1).Value, "$1")
End Sub

2019-12-06_19-07.png

lwx228
Level VII

Re: Continue asking about Regex segmentation

The other pattern is:

 

"\[[^\[]+(?=}$)"