cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
ReliabilityWolf
Level III

How to split a column with unstructured string

Hi, experts

I like to split the column with unstructured string into bunches of columns. But I fail to make it because it is complicated. 

my thinking is to split by line break, but can't make it from text to columns by delimiter. the alternative is to replace line break with any signals available in delimiter. the one option is to run Regex ( :Column 1, "^\R", "#",GLOBALREPLACE ), but it shows unexpected result. 

ReliabilityWolf_0-1725805251605.png

 

2 ACCEPTED SOLUTIONS

Accepted Solutions
hogi
Level XII

Re: How to split a column with unstructured string

I had a similar idea :  use substitute and search for 2x newlines.

 

To get the column names right, one could use the ":" as a separator, but:
- ":" is used also within the entries

- sample has a ":" - but misses a newline afterwards.

 

So I combened the two approaches: 
- search 2x linebreak

- search ":"

to find the headers.

 

To separate the column name and the entry, one can stack the columns with the setting Contiguous.

 

Col2 is just used temporarily, so either it can be deleted at the end - or added as a Transform column.

Columns which are generated via Text to Columns from a Transform Column are the first ones in the table.
This makes it easy to select and stack them. Just count the new columns and select them by index.

 

With the Contiguous setting, some dummy columns are added, those are removed in the last step.

 

hogi_1-1725812518874.png

 

dt = Open( "$DOWNLOADS/example3.jmp" );
tmpCol = dt << Transform Column ( "split",
	Character,
	Formula(
		Substitute(
				Regex(
					Substitute( :Column 1, "\!r\!n", "#", "Sample:", "Sample^\!n" ),
					"#{2,}(.*?):",
					"^\1^",
					GLOBALREPLACE
				),
			"#", "\!r"
		)
	)
);
before = NItems(dt << get column names());
dt << Text to Columns(
	columns (tmpCol),
	Delimiters( "^" )
);
newCols = NItems(dt << get column names()) - before;

result = dt << Stack(
	columns(
		Eval(1::newCols)
	),
	Number of Series( Eval(newCols/2) ),
	Contiguous,
	Drop All Other Columns( 1 )
);


// Move column names up
result << Move up;

cols= result << get column names();
cols = Filter each({col}, cols, Contains(col << get name, "split"))
result << Delete Columns(cols);

Doesn't look nice and smooth ...

Plan B: set up a Turing machine to calculate the entries ; )

View solution in original post

BHarris
Level VI

Re: How to split a column with unstructured string

And if you love regex but still find JSL mostly incomprehensible, you can recode the column, and do a string replace (see red-triangle, "Replace string...") and do a regex replace to an oddball character ("%" is one I often use), then split the column (Cols -> Text to columns) on that character.

 

Then you can look like a JMP super-hero by binding alt-ctrl-shift-r to "Recode" and alt-ctrl-shift-s to "Text to Columns", and it becomes just a few keystrokes to do all of that.

View solution in original post

8 REPLIES 8
ReliabilityWolf
Level III

Re: How to split a column with unstructured string

 
jthi
Super User

Re: How to split a column with unstructured string

You can get this by substituting and then replacing double linebreaks (or JMP's \!N) with something (I used ¤

Names Default To Here(1);

dt = Open("$DOWNLOADS/example3.jmp");

dt << New Column("Col2", Character, Nominal, Formula(
	Substitute(:Column 1, "\!N\!N", "¤")
));

jthi_0-1725807408662.png

And then one (or few more) extra steps are needed for the column names

-Jarmo
hogi
Level XII

Re: How to split a column with unstructured string

I had a similar idea :  use substitute and search for 2x newlines.

 

To get the column names right, one could use the ":" as a separator, but:
- ":" is used also within the entries

- sample has a ":" - but misses a newline afterwards.

 

So I combened the two approaches: 
- search 2x linebreak

- search ":"

to find the headers.

 

To separate the column name and the entry, one can stack the columns with the setting Contiguous.

 

Col2 is just used temporarily, so either it can be deleted at the end - or added as a Transform column.

Columns which are generated via Text to Columns from a Transform Column are the first ones in the table.
This makes it easy to select and stack them. Just count the new columns and select them by index.

 

With the Contiguous setting, some dummy columns are added, those are removed in the last step.

 

hogi_1-1725812518874.png

 

dt = Open( "$DOWNLOADS/example3.jmp" );
tmpCol = dt << Transform Column ( "split",
	Character,
	Formula(
		Substitute(
				Regex(
					Substitute( :Column 1, "\!r\!n", "#", "Sample:", "Sample^\!n" ),
					"#{2,}(.*?):",
					"^\1^",
					GLOBALREPLACE
				),
			"#", "\!r"
		)
	)
);
before = NItems(dt << get column names());
dt << Text to Columns(
	columns (tmpCol),
	Delimiters( "^" )
);
newCols = NItems(dt << get column names()) - before;

result = dt << Stack(
	columns(
		Eval(1::newCols)
	),
	Number of Series( Eval(newCols/2) ),
	Contiguous,
	Drop All Other Columns( 1 )
);


// Move column names up
result << Move up;

cols= result << get column names();
cols = Filter each({col}, cols, Contains(col << get name, "split"))
result << Delete Columns(cols);

Doesn't look nice and smooth ...

Plan B: set up a Turing machine to calculate the entries ; )

hogi
Level XII

Re: How to split a column with unstructured string

thanks @jthi! for Make New Formula Column return references to the newly created columns :
Since JMP18, New Formula Column returns a list of the newly generated column

 

Text to Columns could return as well a list with references to the columns which were generated:
here is the Wish: saving list of cols created by text to column 

Please note:
the workaround with the associative array and remove operation doesn't work here -  it shuffles the columns by sorting then "alphabetically" with 10 <2 - which ruins the stack operation.

 

 

BHarris
Level VI

Re: How to split a column with unstructured string

And if you love regex but still find JSL mostly incomprehensible, you can recode the column, and do a string replace (see red-triangle, "Replace string...") and do a regex replace to an oddball character ("%" is one I often use), then split the column (Cols -> Text to columns) on that character.

 

Then you can look like a JMP super-hero by binding alt-ctrl-shift-r to "Recode" and alt-ctrl-shift-s to "Text to Columns", and it becomes just a few keystrokes to do all of that.

ReliabilityWolf
Level III

Re: How to split a column with unstructured string

@BHarris thanks for your advice. BTW, how do I do a regex replace linebreak with an oddball character over there? 

ReliabilityWolf_0-1726285714494.png

 

BHarris
Level VI

Re: How to split a column with unstructured string

A linebreak is "\n" in regex speak.  So search for "\n" (no quotes), and click "Use regular expressions", then replace with "%".

ReliabilityWolf
Level III

Re: How to split a column with unstructured string

this is a wonderful idea to approach.