- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
How to split a column with unstructured string
Hi, experts
I like to split the column with unstructured string into bunches of columns. But I fail to make it because it is complicated.
my thinking is to split by line break, but can't make it from text to columns by delimiter. the alternative is to replace line break with any signals available in delimiter. the one option is to run Regex ( :Column 1, "^\R", "#",GLOBALREPLACE ), but it shows unexpected result.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
I had a similar idea : use substitute and search for 2x newlines.
To get the column names right, one could use the ":" as a separator, but:
- ":" is used also within the entries
- sample has a ":" - but misses a newline afterwards.
So I combened the two approaches:
- search 2x linebreak
- search ":"
to find the headers.
To separate the column name and the entry, one can stack the columns with the setting Contiguous.
Col2 is just used temporarily, so either it can be deleted at the end - or added as a Transform column.
Columns which are generated via Text to Columns from a Transform Column are the first ones in the table.
This makes it easy to select and stack them. Just count the new columns and select them by index.
With the Contiguous setting, some dummy columns are added, those are removed in the last step.
dt = Open( "$DOWNLOADS/example3.jmp" );
tmpCol = dt << Transform Column ( "split",
Character,
Formula(
Substitute(
Regex(
Substitute( :Column 1, "\!r\!n", "#", "Sample:", "Sample^\!n" ),
"#{2,}(.*?):",
"^\1^",
GLOBALREPLACE
),
"#", "\!r"
)
)
);
before = NItems(dt << get column names());
dt << Text to Columns(
columns (tmpCol),
Delimiters( "^" )
);
newCols = NItems(dt << get column names()) - before;
result = dt << Stack(
columns(
Eval(1::newCols)
),
Number of Series( Eval(newCols/2) ),
Contiguous,
Drop All Other Columns( 1 )
);
// Move column names up
result << Move up;
cols= result << get column names();
cols = Filter each({col}, cols, Contains(col << get name, "split"))
result << Delete Columns(cols);
Doesn't look nice and smooth ...
Plan B: set up a Turing machine to calculate the entries ; )
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
And if you love regex but still find JSL mostly incomprehensible, you can recode the column, and do a string replace (see red-triangle, "Replace string...") and do a regex replace to an oddball character ("%" is one I often use), then split the column (Cols -> Text to columns) on that character.
Then you can look like a JMP super-hero by binding alt-ctrl-shift-r to "Recode" and alt-ctrl-shift-s to "Text to Columns", and it becomes just a few keystrokes to do all of that.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
You can get this by substituting and then replacing double linebreaks (or JMP's \!N) with something (I used ¤
Names Default To Here(1);
dt = Open("$DOWNLOADS/example3.jmp");
dt << New Column("Col2", Character, Nominal, Formula(
Substitute(:Column 1, "\!N\!N", "¤")
));
And then one (or few more) extra steps are needed for the column names
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
I had a similar idea : use substitute and search for 2x newlines.
To get the column names right, one could use the ":" as a separator, but:
- ":" is used also within the entries
- sample has a ":" - but misses a newline afterwards.
So I combened the two approaches:
- search 2x linebreak
- search ":"
to find the headers.
To separate the column name and the entry, one can stack the columns with the setting Contiguous.
Col2 is just used temporarily, so either it can be deleted at the end - or added as a Transform column.
Columns which are generated via Text to Columns from a Transform Column are the first ones in the table.
This makes it easy to select and stack them. Just count the new columns and select them by index.
With the Contiguous setting, some dummy columns are added, those are removed in the last step.
dt = Open( "$DOWNLOADS/example3.jmp" );
tmpCol = dt << Transform Column ( "split",
Character,
Formula(
Substitute(
Regex(
Substitute( :Column 1, "\!r\!n", "#", "Sample:", "Sample^\!n" ),
"#{2,}(.*?):",
"^\1^",
GLOBALREPLACE
),
"#", "\!r"
)
)
);
before = NItems(dt << get column names());
dt << Text to Columns(
columns (tmpCol),
Delimiters( "^" )
);
newCols = NItems(dt << get column names()) - before;
result = dt << Stack(
columns(
Eval(1::newCols)
),
Number of Series( Eval(newCols/2) ),
Contiguous,
Drop All Other Columns( 1 )
);
// Move column names up
result << Move up;
cols= result << get column names();
cols = Filter each({col}, cols, Contains(col << get name, "split"))
result << Delete Columns(cols);
Doesn't look nice and smooth ...
Plan B: set up a Turing machine to calculate the entries ; )
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
thanks @jthi! for Make New Formula Column return references to the newly created columns :
Since JMP18, New Formula Column returns a list of the newly generated column
Text to Columns could return as well a list with references to the columns which were generated:
here is the Wish: saving list of cols created by text to column
Please note:
the workaround with the associative array and remove operation doesn't work here - it shuffles the columns by sorting then "alphabetically" with 10 <2 - which ruins the stack operation.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
And if you love regex but still find JSL mostly incomprehensible, you can recode the column, and do a string replace (see red-triangle, "Replace string...") and do a regex replace to an oddball character ("%" is one I often use), then split the column (Cols -> Text to columns) on that character.
Then you can look like a JMP super-hero by binding alt-ctrl-shift-r to "Recode" and alt-ctrl-shift-s to "Text to Columns", and it becomes just a few keystrokes to do all of that.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
@BHarris thanks for your advice. BTW, how do I do a regex replace linebreak with an oddball character over there?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
A linebreak is "\n" in regex speak. So search for "\n" (no quotes), and click "Use regular expressions", then replace with "%".
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How to split a column with unstructured string
this is a wonderful idea to approach.