Happy New Year Everyone!
I started exploring this dataset from Kaggle: https://www.kaggle.com/chicago/chi-restaurant-inspections. Each row is a restaurant inspection, with a single column for list of violations. Thankfully, the violations are delimited by "|". Sample below. The number of violations varies.
40. REFRIGERATION AND METAL STEM THERMOMETERS PROVIDED AND CONSPICUOUS - Comments: STEM THERMOMETER NEEDED TO MONITOR FOOD TEMPERATURES. INSTRUCTED TO PROVIDE AND MAINTAIN. | 41. PREMISES MAINTAINED FREE OF LITTER, UNNECESSARY ARTICLES, CLEANING EQUIPMENT PROPERLY STORED - Comments: OBSERVED A FEW ITEMS NOT ELEVATED 6INCHES OFF FLOOR IN REAR PREP AREA. MUST ELEVATE AND MAINTAIN. | 38. VENTILATION: ROOMS AND EQUIPMENT VENTED AS REQUIRED: PLUMBING: INSTALLED AND MAINTAINED - Comments: EXPOSED HAND HASHING SINK IN REAR KITCHEN FOOD PREP AREA DRAINING SLOW. MUST REPAIR AND MAINTAIN. | 32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND MAINTAINED - Comments: OBSERVED RUSTY FOOD STORAGE SHELVING ON BASEMENT. MUST REPAINT AND MAINTAIN. | 34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING METHODS USED - Comments: OBSERVED DIRT AND DEBRIS ON FOOR UNDERNEATH PALLETS ON BASEMENT. MUST CLEAN AND MAINTAIN.
Looking at the structure: ViolationID#. Violation_Type - Comments. I can think of two ways to parse these strings into ID#, Type, Comments:
1) Cols > Utilities > Text to Columns, then Tables > Stack, then delete blank rows, then parse the new column with Word(text,delimiter)
2) Use Words(Violations, "|") to create a list like the one below then work on each of the elements in the list
{"40. REFRIGERATION AND METAL STEM THERMOMETERS PROVIDED AND CONSPICUOUS - Comments: STEM THERMOMETER NEEDED TO MONITOR FOOD TEMPERATURES. INSTRUCTED TO PROVIDE AND MAINTAIN. ", " 41. PREMISES MAINTAINED FREE OF LITTER, UNNECESSARY ARTICLES, CLEANING EQUIPMENT PROPERLY STORED - Comments: OBSERVED A FEW ITEMS NOT ELEVATED 6INCHES OFF FLOOR IN REAR PREP AREA. MUST ELEVATE AND MAINTAIN. ", " 38. VENTILATION: ROOMS AND EQUIPMENT VENTED AS REQUIRED: PLUMBING: INSTALLED AND MAINTAINED - Comments: EXPOSED HAND HASHING SINK IN REAR KITCHEN FOOD PREP AREA DRAINING SLOW. MUST REPAIR AND MAINTAIN. ", " 32. FOOD AND NON-FOOD CONTACT SURFACES PROPERLY DESIGNED, CONSTRUCTED AND MAINTAINED - Comments: OBSERVED RUSTY FOOD STORAGE SHELVING ON BASEMENT. MUST REPAINT AND MAINTAIN. ", " 34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING METHODS USED - Comments: OBSERVED DIRT AND DEBRIS ON FOOR UNDERNEATH PALLETS ON BASEMENT. MUST CLEAN AND MAINTAIN."}
I don't have a lot of experience using lists.
Would it be more efficient to work with the list {}, use the Text to Columns approach or is there some better way? If the list approach is the most efficient, what is the most efficient approach to do the equivalent to Tables > Stack for each row, element in the list?
Alternatively, there is a column with InspectionID#. Would it be more efficient to break this into two tables using the InspectionID# as the key instead of creating a massive stacked table?