Re: Continue asking about Regex segmentation

Another idea is to design and write a parser for a nested data structure (list?) to stream over the text, build the data, and then collect the desired values.

Learn it once, use it forever!
Craige_Hales
Staff (Retired)

JMP 15 does a pretty good job with this JSON data (rename the file from .txt to .json, it will be easier @Justin_Chilton). JMP 14 does not appear to handle this JSON case correctly. Here's the JMP 15 import using Best Guess, with the stack option added:

[Image: Best Guess is under the red triangle]

You might want to do two imports like this:

[Image: get the little table of column names]

[Image: get the bigger table of data]

You can leave out columns you don't want. You could apply the column names from the little table to the big table, renaming value0 to 2018.

Craige

lwx228
Level VII

Thanks for the experts' help!

I downloaded JMP 15 and learned how to do this.
I compared the different ways to get the data, and I understand the difference between them.

 

2019-12-10_14-48.png

lwx228
Level VII

Because I need to loop through something like this.

I still want to learn how to use regular expressions to remove the excess data and keep the useful data for further processing with JSL. Thank you very much!

Craige_Hales
Staff (Retired)

If the data is JSON or XML, you should use JMP's JSON/XML parser. It is not easy to write a parser for these nested structures, and regex alone is not enough. Importing the XML/JSON to a table and then finding the row and column is a great choice.
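
As a minimal sketch of the built-in parser route (the JSON text and its key names here are made up for illustration, not taken from the file in this thread), Parse JSON() turns a JSON object into an associative array and a JSON array into a list, so values can be picked out by key and position:

```jsl
// Hypothetical JSON text; a real script would read it with Load Text File().
json = "{\!"name\!": \!"demo\!", \!"years\!": [2018, 2019]}";

// JSON objects become associative arrays, JSON arrays become lists.
parsed = Parse JSON( json );

name = parsed["name"];          // value stored under the "name" key
secondYear = parsed["years"][2]; // second item of the "years" list
Show( name, secondYear );
```

For table-shaped data, importing to a data table (as in the JMP 15 import above) is probably still the simpler choice.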

 

If the data must be handled some other way, you could do it with just a simple JSL loop and a bunch of regex tests, something like this:

txt = Load Text File( ... );
ok = 1;
While( ok,
	// ... try various regex that each remove a bit of data from txt ...
	// set ok to 0 if nothing works.
);

That works for small files and is easy to maintain. But a large file with many bits of data will be too slow, because each removal from a large txt string copies a lot of data. (It is an n^2 problem: doubling the file size makes the run time 4X longer.)
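
Here is one way that loop might look when filled in. This is only a sketch: the sample text, the name=/value= layout, and the variable names are assumptions made up for illustration. It relies on Regex Match() replacing the matched text in txt when a replacement string (here an empty one) is supplied, and returning an empty list when nothing matches.

```jsl
// Hypothetical sample text; a real script would use Load Text File() instead.
txt = "name=alpha; value=12; name=beta; value=7; ";
names = {};
values = {};

ok = 1;
While( ok,
	ok = 0; // stays 0 unless one of the tests below removes something

	// With a replacement string, Regex Match() removes the match from txt
	// and returns {whole match, capture group 1}; {} means no match.
	rc = Regex Match( txt, "^\s*name=(\w+);", "" );
	If( N Items( rc ) > 0,
		Insert Into( names, rc[2] );
		ok = 1;
	);

	rc = Regex Match( txt, "^\s*value=(\d+);", "" );
	If( N Items( rc ) > 0,
		Insert Into( values, Num( rc[2] ) );
		ok = 1;
	);
);

Show( names, values );
```

Each successful test shortens txt from the front, which is exactly the copying that makes this approach slow on large files.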

 

For large files, it is best to write a single pattern match that does not modify the input string. You can use regular expressions or patterns within a patRepeat() pattern that matches the entire file, and you'll use patTest(... ;1) to run some JSL inside the match that saves the parsed results into a JSL matrix or list or associative array. Before JMP had a built-in json parser, @XanGregg and I wrote some JSL to parse json files. It is still a great example, but JMP's built-in parser is better.
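
A small sketch of that single-match idea, using the same made-up name=/value= sample text (the data layout and variable names are assumptions, not anything from the files discussed here). The >? operator assigns the matched text immediately so that Pat Test() can use it while the match is still running, and the trailing ; 1 makes each test succeed:

```jsl
// Hypothetical sample text; a real script would use Load Text File() instead.
txt = "name=alpha; value=12; name=beta; value=7; ";
names = {};
values = {};

// One item of each kind. The Pat Test() comes after the "; " separator so
// the value is saved only once the whole item has matched.
nameItem = "name=" + (Pat Break( ";" ) >? s) + "; " +
	Pat Test( Insert Into( names, s ); 1 );
valueItem = "value=" + (Pat Span( "0123456789" ) >? s) + "; " +
	Pat Test( Insert Into( values, Num( s ) ); 1 );

// Anchor at both ends so the repeat has to consume the entire string;
// txt itself is never modified.
rc = Pat Match( txt,
	Pat Pos( 0 ) + Pat Repeat( nameItem | valueItem ) + Pat R Pos( 0 )
);

Show( rc, names, values );
```

Because Pat Test() runs during matching, a pattern that backtracks over a test can save the same item twice; keeping each item unambiguous, as here, avoids that.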

 

Here are some other posts that have JSL that parses a file using patMatch(... patRepeat( pattern ) ... )

https://community.jmp.com/t5/Uncharted/The-Other-Kind-of-Model/ba-p/231703 parsing a data file for a 3D model

https://community.jmp.com/t5/Uncharted/Web-Logs/ba-p/29389 parsing an Apache log file

https://community.jmp.com/t5/Uncharted/WordNet/ba-p/28984 parsing a WordNet file

https://community.jmp.com/t5/Uncharted/Pronounce-Elephant/ba-p/21272 parsing the CMU phoneme file

 

I often use the >>log() function to debug complicated pattern matches.

https://community.jmp.com/t5/Uncharted/Backtracking-Secrets/ba-p/20984 using the >>log() for debugging

 

and

https://community.jmp.com/t5/Uncharted/Pattern-Matching/ba-p/21005 introduction to patMatch()

https://community.jmp.com/t5/Uncharted/Regex/ba-p/21008 introduction to regex()

 

Craige
