<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to efficiently parse structured text strings of varying number of elements in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49265#M28002</link>
    <description>&lt;P&gt;At 145K rows, I don't think any of the parsing ideas will be too slow. You will probably want an indicator column (0 or 1) for each kind of violation. If you have JMP 13, you can use Text Explorer on the comments; I'd suggest concatenating all of a row's comments, without the constant violation descriptions, into a single character field.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 04 Jan 2018 20:30:28 GMT</pubDate>
    <dc:creator>Craige_Hales</dc:creator>
    <dc:date>2018-01-04T20:30:28Z</dc:date>
    <item>
      <title>How to efficiently parse structured text strings of varying number of elements</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49233#M27988</link>
      <description>&lt;P&gt;Happy New Year, everyone!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I started exploring this dataset from Kaggle:&amp;nbsp;&lt;A title="Chicago Restaurant Inspection" href="https://www.kaggle.com/chicago/chi-restaurant-inspections" target="_blank"&gt;https://www.kaggle.com/chicago/chi-restaurant-inspections&lt;/A&gt;. Each row is a restaurant inspection, with a single column holding the list of violations. Thankfully, the violations are delimited by "|". The number of violations varies from row to row. A sample is below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;40. REFRIGERATION AND METAL STEM THERMOMETERS PROVIDED AND CONSPICUOUS - Comments: STEM THERMOMETER NEEDED TO MONITOR FOOD TEMPERATURES. INSTRUCTED TO PROVIDE AND MAINTAIN. | 41. PREMISES MAINTAINED FREE OF LITTER, UNNECESSARY ARTICLES, CLEANING&amp;nbsp; EQUIPMENT PROPERLY STORED - Comments: OBSERVED A FEW ITEMS NOT ELEVATED 6INCHES OFF FLOOR IN REAR PREP AREA. MUST ELEVATE AND MAINTAIN. | 38. VENTILATION: ROOMS AND EQUIPMENT VENTED AS REQUIRED: PLUMBING: INSTALLED AND MAINTAINED - Comments: EXPOSED HAND HASHING SINK IN REAR KITCHEN FOOD PREP AREA DRAINING SLOW. MUST REPAIR AND MAINTAIN. | 32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND MAINTAINED - Comments: OBSERVED RUSTY FOOD STORAGE SHELVING ON BASEMENT. MUST REPAINT AND MAINTAIN. | 34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING METHODS USED - Comments: OBSERVED DIRT AND DEBRIS ON FOOR UNDERNEATH PALLETS ON BASEMENT. MUST CLEAN AND MAINTAIN.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Looking at the structure: ViolationID#. Violation_Type - Comments. 
I can think of two ways to parse these strings into ID#, Type, Comments:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1) Cols &amp;gt; Utilities &amp;gt; Text to Columns, then Tables &amp;gt; Stack, then delete blank rows, then parse the new column with Word(n, text, delimiters)&lt;/P&gt;&lt;P&gt;2) Use Words(Violations, "|") to create a list like the one below, then work on each element in the list&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="1 2 3 4 5 6 7"&gt;{"40. REFRIGERATION AND METAL STEM THERMOMETERS PROVIDED AND CONSPICUOUS - Comments: STEM THERMOMETER NEEDED TO MONITOR FOOD TEMPERATURES. INSTRUCTED TO PROVIDE AND MAINTAIN. ", " 41. PREMISES MAINTAINED FREE OF LITTER, UNNECESSARY ARTICLES, CLEANING&amp;nbsp; EQUIPMENT PROPERLY STORED - Comments: OBSERVED A FEW ITEMS NOT ELEVATED 6INCHES OFF FLOOR IN REAR PREP AREA. MUST ELEVATE AND MAINTAIN. ", " 38. VENTILATION: ROOMS AND EQUIPMENT VENTED AS REQUIRED: PLUMBING: INSTALLED AND MAINTAINED - Comments: EXPOSED HAND HASHING SINK IN REAR KITCHEN FOOD PREP AREA DRAINING SLOW. MUST REPAIR AND MAINTAIN. ", " 32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND MAINTAINED - Comments: OBSERVED RUSTY FOOD STORAGE SHELVING ON BASEMENT. MUST REPAINT AND MAINTAIN. ", " 34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING METHODS USED - Comments: OBSERVED DIRT AND DEBRIS ON FOOR UNDERNEATH PALLETS ON BASEMENT. MUST CLEAN AND MAINTAIN."}&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;I don't have a lot of experience using lists. 
&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;&lt;STRONG&gt;Would it be more efficient to work with the list {} or to use the Text to Columns approach, or is there some better way?&lt;/STRONG&gt; If the list approach is the most efficient, what is the most efficient way to do the equivalent of Tables &amp;gt; Stack for each row and each element in the list?&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="3"&gt;Alternatively, there is a column with InspectionID#. Would it be more efficient to break this into two tables using the InspectionID# as the key instead of creating a massive stacked table?&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2018 16:28:34 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49233#M27988</guid>
      <dc:creator>markschahl</dc:creator>
      <dc:date>2018-01-04T16:28:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently parse structured text strings of varying number of elements</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49265#M28002</link>
      <description>&lt;P&gt;At 145K rows, I don't think any of the parsing ideas will be too slow. You will probably want an indicator column (0 or 1) for each kind of violation. If you have JMP 13, you can use Text Explorer on the comments; I'd suggest concatenating all of a row's comments, without the constant violation descriptions, into a single character field.&lt;/P&gt;
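&lt;P&gt;As a rough illustration of the indicator idea, here is a minimal formula-column sketch for a single violation number. The column name V34 is hypothetical, and the choice of violation 34 and the "|" delimiter are taken from the sample in the question; one such column would be needed per violation number of interest.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;// Sketch of a formula for a hypothetical indicator column, e.g. "V34".
// Result is 1 if any "|"-delimited element of :Violations starts with "34.", else 0.
found = 0;
parts = Words( :Violations, "|" ); // split the row's text into violations
For( i = 1, i &amp;lt;= N Items( parts ), i++,
    If( Starts With( Trim( parts[i] ), "34." ),
        found = 1
    )
);
found;
&lt;/CODE&gt;&lt;/PRE&gt;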
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2018 20:30:28 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49265#M28002</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2018-01-04T20:30:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently parse structured text strings of varying number of elements</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49326#M28036</link>
      <description>&lt;P&gt;Just playing with the first 100 rows, here's a map:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Locations" style="width: 999px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/8863iACCBE882052287F4/image-size/large?v=v2&amp;amp;px=999" role="button" title="Graph Builder.png" alt="Locations" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;Locations&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Here's a formula column to get just the comment fields:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;a = :Violations;
b = "";
Pat Match( a, // full violation text
    Pat Repeat(
        Pat Arb() + "Comments:" 
         + ( Pat Break( "|" ) | Pat Rem()) &amp;gt;&amp;gt; c // the interesting bit
         + Pat Test( // not really a test, just some JSL to build text
            b = b || "|" || c; // build text here
            1; // always succeed the test
        )
    ),
    NULL, // no replacement
    FULLSCAN // get the final one
);
b;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Text Explorer can use the new column to form clusters: &lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clusters" style="width: 896px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/8868iE915E904A5F28360/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture.PNG" alt="clusters" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;clusters&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Plotting the comment cluster number against the actual inspection result looks promising: inspections with cluster&amp;nbsp;2 comments did not get failing grades, while those with cluster&amp;nbsp;4 comments failed. A typical comment from the passing group is "OBSERVED GREASE BUILD UP ON VENTILATION HOOD FILTERS. MUST CLEAN VENTILATION HOOD FILTERS" (pass), versus "NO HAND WASHING NOR REMOVAL OF FOOD GLOVES" from the failing group (fail).&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="clustered by comment text" style="width: 664px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/8866i7C841A070049B104/image-size/large?v=v2&amp;amp;px=999" role="button" title="Graph Builder2.png" alt="clustered by comment text" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;clustered by comment text&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;
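&lt;P&gt;On the original question of a separate table keyed by inspection, a minimal sketch using Words() rather than pattern matching might look like the script below. The column names "Inspection ID" and "Violations" and the output column names are assumptions, the split relies on the "number. description - Comments: text" layout shown in the sample, and the script has not been timed on the full 145K rows.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;// Sketch: build a second table with one row per violation, keyed by inspection.
// Column names "Inspection ID" and "Violations" are assumed; adjust as needed.
dt = Current Data Table();
key = " - Comments:";
ids = {}; nums = {}; vtypes = {}; comments = {};
For( r = 1, r &amp;lt;= N Rows( dt ), r++,
    id = Column( dt, "Inspection ID" )[r];
    parts = Words( Column( dt, "Violations" )[r], "|" );
    For( i = 1, i &amp;lt;= N Items( parts ), i++,
        v = Trim( parts[i] );
        dot = Contains( v, "." );  // position of the period after the number
        cpos = Contains( v, key ); // 0 when the element has no comment
        If( dot &amp;gt; 1, // skip elements that do not start with "number."
            num = Num( Substr( v, 1, dot - 1 ) );
            If( cpos &amp;gt; 0,
                vtype = Trim( Substr( v, dot + 1, cpos - dot - 1 ) );
                txt = Trim( Substr( v, cpos + Length( key ) ) ),
                // otherwise: description only, no comment text
                vtype = Trim( Substr( v, dot + 1 ) );
                txt = ""
            );
            Insert Into( ids, id );
            Insert Into( nums, num );
            Insert Into( vtypes, vtype );
            Insert Into( comments, txt );
        );
    );
);
New Table( "Violations by inspection",
    New Column( "Inspection ID", Numeric, Values( ids ) ),
    New Column( "Violation ID", Numeric, Values( nums ) ),
    New Column( "Violation Type", Character, Values( vtypes ) ),
    New Column( "Comments", Character, Values( comments ) )
);
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Keeping the violations in their own table and joining back on the inspection ID when needed avoids repeating every inspection-level column once per violation.&lt;/P&gt;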
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jan 2018 20:02:43 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49326#M28036</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2018-01-05T20:02:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to efficiently parse structured text strings of varying number of elements</title>
      <link>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49390#M28088</link>
      <description>&lt;P&gt;Craige:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks! Being a text analytics newbie, I have never used the Pattern Matching functions before. Nice solution to the problem. I will definitely spend some time learning about regex and pattern matching.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;BTW - text analytics would make a great Mastering JMP Webinar topic!&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jan 2018 18:28:04 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-to-efficiently-parse-structured-text-strings-of-varying/m-p/49390#M28088</guid>
      <dc:creator>markschahl</dc:creator>
      <dc:date>2018-01-08T18:28:04Z</dc:date>
    </item>
  </channel>
</rss>

