I want to extract all the column names referenced in a JMP formula into a list, specifically to identify the formula's column dependencies. JMP formulas can be quite complex; for example, a formula might generate JSL code as output, and that code may itself contain column names. I am not interested in those names, because the formula does not depend on them. I am only interested in the columns it actually uses to compute the result. Is there a simple way to do this without programmatically parsing the formula?
Here is a sketch of my initial plan.
Convert the formula to a string and search for three elements: :, :\!", and \!" (\!" being the JSL escape sequence for a double quote).
- First, strip out all quoted strings. Find the first \!" that is not preceded by a :, then find its matching closing \!", again one that is not preceded by a : and is not a multiply escaped quote. Treat everything between the two as a string literal in the formula and remove it (or skip it). Repeat for the entire formula.
- Then search the remaining string for names:
- If : is found, read until a "name-breaking" character such as a comma is found. Store the text in between as one of the names, in string format.
- If :\!" is found, read until \!"n is found. Store the text in between as one of the names.
- Finally, deduplicate the list of names so each appears only once.
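To make the plan concrete, here is a minimal sketch of the scanning logic in Python rather than JSL, so it can be tested outside JMP. It assumes the formula has already been converted to text in which quotes inside string literals appear as \!". The names column_refs, skip_string_literal, and NAME_BREAKS are hypothetical, and the set of name-breaking characters is my guess, not JMP's actual grammar. It covers only the three patterns listed above.

```python
# Characters assumed to end an unquoted column name.
# This set is an illustrative guess, not JMP's real name grammar.
NAME_BREAKS = set(',()[]{}+-*/^=<>!&|;:"\n')


def skip_string_literal(text, start):
    """Given the index of an opening quote, return the index just past its closing quote."""
    i = start + 1
    while i < len(text):
        if text.startswith('\\!', i):
            i += 3              # skip the \! marker plus the escaped character
        elif text[i] == '"':
            return i + 1        # found the closing quote
        else:
            i += 1
    return len(text)            # unterminated literal: treat the rest as string


def column_refs(text):
    """Return the unique column names found in formula text, in first-seen order."""
    names = []
    i = 0
    while i < len(text):
        if text.startswith(':"', i):
            # Quoted name of the form :"..."n
            end = text.find('"n', i + 2)
            if end == -1:
                break           # malformed quoted name: stop scanning
            names.append(text[i + 2:end])
            i = end + 2
        elif text[i] == ':':
            # Plain name: read up to the next name-breaking character
            j = i + 1
            while j < len(text) and text[j] not in NAME_BREAKS:
                j += 1
            name = text[i + 1:j].strip()
            if name:
                names.append(name)
            i = j
        elif text[i] == '"':
            # String literal: skip it so names inside it are ignored
            i = skip_string_literal(text, i)
        else:
            i += 1
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(names))


formula = r'If( :age > 12, :"Body Weight"n * 2, "uses :height in \!"text\!"" ) + :age'
print(column_refs(formula))  # ['age', 'Body Weight']
```

Note that the sketch skips string literals during the scan instead of stripping them in a separate first pass; the effect should be the same, and a quote that opens a quoted name is never mistaken for the start of a string.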
Is there a more elegant way to do this?
If not, what can go wrong with the above? Are there other ways a column name can be specified in a formula?
I am willing to assume the formula was created by traditional point and click in the formula editor and does not contain hand-written JSL code. For example, a formula that uses Parse() to resolve a column reference would be out of scope, although it would be nice if the solution could handle that too :).