cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
New to using JMP? Hit the ground running with the Early User Edition of Discovery Summit. Register now, free of charge.
Register for our Discovery Summit 2024 conference, Oct. 21-24, where you’ll learn, connect, and be inspired.
Choose Language Hide Translation Bar
ReliabilityWolf
Level III

How to split a column with unstructured string

Hi, experts

I like to split the column with unstructured string into bunches of columns. But I fail to make it because it is complicated. 

my thinking is to split by line break, but can't make it from text to columns by delimiter. the alternative is to replace line break with any signals available in delimiter. the one option is to run Regex ( :Column 1, "^\R", "#",GLOBALREPLACE ), but it shows unexpected result. 

ReliabilityWolf_0-1725805251605.png

 

1 ACCEPTED SOLUTION

Accepted Solutions
hogi
Level XI

Re: How to split a column with unstructured string

I had a similar idea :  use substitute and search for 2x newlines.

 

To get the column names right, one could use the ":" as a separator, but:
- ":" is used also within the entries

- sample has a ":" - but misses a newline afterwards.

 

So I combened the two approaches: 
- search 2x linebreak

- search ":"

to find the headers.

 

To separate the column name and the entry, one can stack the columns with the setting Contiguous.

 

Col2 is just used temporarily, so either it can be deleted at the end - or added as a Transform column.

Columns which are generated via Text to Columns from a Transform Column are the first ones in the table.
This makes it easy to select and stack them. Just count the new columns and select them by index.

 

With the Contiguous setting, some dummy columns are added, those are removed in the last step.

 

hogi_1-1725812518874.png

 

dt = Open( "$DOWNLOADS/example3.jmp" );
tmpCol = dt << Transform Column ( "split",
	Character,
	Formula(
		Substitute(
				Regex(
					Substitute( :Column 1, "\!r\!n", "#", "Sample:", "Sample^\!n" ),
					"#{2,}(.*?):",
					"^\1^",
					GLOBALREPLACE
				),
			"#", "\!r"
		)
	)
);
before = NItems(dt << get column names());
dt << Text to Columns(
	columns (tmpCol),
	Delimiters( "^" )
);
newCols = NItems(dt << get column names()) - before;

result = dt << Stack(
	columns(
		Eval(1::newCols)
	),
	Number of Series( Eval(newCols/2) ),
	Contiguous,
	Drop All Other Columns( 1 )
);


// Move column names up
result << Move up;

cols= result << get column names();
cols = Filter each({col}, cols, Contains(col << get name, "split"))
result << Delete Columns(cols);

Doesn't look nice and smooth ...

Plan B: set up a Turing machine to calculate the entries ; )

View solution in original post

5 REPLIES 5
ReliabilityWolf
Level III

Re: How to split a column with unstructured string

 
jthi
Super User

Re: How to split a column with unstructured string

You can get this by substituting and then replacing double linebreaks (or JMP's \!N) with something (I used ¤

Names Default To Here(1);

dt = Open("$DOWNLOADS/example3.jmp");

dt << New Column("Col2", Character, Nominal, Formula(
	Substitute(:Column 1, "\!N\!N", "¤")
));

jthi_0-1725807408662.png

And then one (or few more) extra steps are needed for the column names

-Jarmo
hogi
Level XI

Re: How to split a column with unstructured string

I had a similar idea :  use substitute and search for 2x newlines.

 

To get the column names right, one could use the ":" as a separator, but:
- ":" is used also within the entries

- sample has a ":" - but misses a newline afterwards.

 

So I combened the two approaches: 
- search 2x linebreak

- search ":"

to find the headers.

 

To separate the column name and the entry, one can stack the columns with the setting Contiguous.

 

Col2 is just used temporarily, so either it can be deleted at the end - or added as a Transform column.

Columns which are generated via Text to Columns from a Transform Column are the first ones in the table.
This makes it easy to select and stack them. Just count the new columns and select them by index.

 

With the Contiguous setting, some dummy columns are added, those are removed in the last step.

 

hogi_1-1725812518874.png

 

dt = Open( "$DOWNLOADS/example3.jmp" );
tmpCol = dt << Transform Column ( "split",
	Character,
	Formula(
		Substitute(
				Regex(
					Substitute( :Column 1, "\!r\!n", "#", "Sample:", "Sample^\!n" ),
					"#{2,}(.*?):",
					"^\1^",
					GLOBALREPLACE
				),
			"#", "\!r"
		)
	)
);
before = NItems(dt << get column names());
dt << Text to Columns(
	columns (tmpCol),
	Delimiters( "^" )
);
newCols = NItems(dt << get column names()) - before;

result = dt << Stack(
	columns(
		Eval(1::newCols)
	),
	Number of Series( Eval(newCols/2) ),
	Contiguous,
	Drop All Other Columns( 1 )
);


// Move column names up
result << Move up;

cols= result << get column names();
cols = Filter each({col}, cols, Contains(col << get name, "split"))
result << Delete Columns(cols);

Doesn't look nice and smooth ...

Plan B: set up a Turing machine to calculate the entries ; )

hogi
Level XI

Re: How to split a column with unstructured string

thanks @jthi! for Make New Formula Column return references to the newly created columns :
Since JMP18, New Formula Column returns a list of the newly generated column

 

Text to Columns could return as well a list with references to the columns which were generated:
here is the Wish: saving list of cols created by text to column 

Please note:
the workaround with the associative array and remove operation doesn't work here -  it shuffles the columns by sorting then "alphabetically" with 10 <2 - which ruins the stack operation.

 

 

ReliabilityWolf
Level III

Re: How to split a column with unstructured string

this is a wonderful idea to approach.