Using several characters as delimiter for the Words function

anne_sa (Community Trekker, joined Feb 24, 2016)

Hello everyone,

I would like to know if it is possible to use a string with several characters as a delimiter for the Words function.

Here is a dummy example:

When I type:

words("Test1eeTest2","ee")

I would like to get:

{"Test1","Test2"}

Instead of:

{"T", "st1", "T", "st2"}

Is there a way to get that?

Thanks in advance for your help.

Best regards,

Accepted Solution

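Words treats every character in its delimiter argument as a separate delimiter, so "ee" just splits on e. You could use Regex to change each ee to a single character first, then split on that character. If you think @ does not occur in the data,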

words(regex("Test1eeTest2eeTest3eeTest4","ee","@",GLOBALREPLACE),"@")

{"Test1", "Test2", "Test3", "Test4"}

If you are not sure about @ but are pretty sure the ASCII Unit Separator character does not occur, you could use

words(regex("Test1eeTest2","ee","\!U001F",GLOBALREPLACE),"\!U001F")


The Unit Separator is unlikely to be in your data:

[Image: 11451_pastedImage_6.png]

The US is really a single character in spite of appearances.  Your font may show it differently, or not at all.

Craige
anne_sa

That works perfectly!

Thank you so much for your fast answer Craige@JMP.

Jeff_Perkinson (Community Manager, joined Jun 23, 2011)

Craige@JMP beat me to the punch with the general technique, but I'll put in a plug for the Substitute() function, which I find more readable than regular expressions. It's up to you which you choose to use.

words(substitute("Test1eeTest2", "ee", "!"),"!")

/*:

{"Test1", "Test2"}
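
The replace-then-split idea can also be wrapped in a small helper. This is only a sketch: splitOnString is a hypothetical name, not a JMP built-in, and it assumes the Unit Separator character never appears in the input text.

```jsl
// Hypothetical helper (illustrative name and signature, not part of JMP).
splitOnString = Function( {text, delim},
	{Default Local},
	sep = "\!U001F"; // ASCII Unit Separator, assumed absent from text
	// Replace each occurrence of the multi-character delimiter with sep,
	// then let Words split on that single character.
	Words( Substitute( text, delim, sep ), sep )
);

Show( splitOnString( "Test1eeTest2eeTest3", "ee" ) );
```

Keeping the placeholder character inside the function means callers never need to know which character was chosen.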

-Jeff
pmroz (Super User, joined Jun 23, 2011)

One advantage of regex is you can make it case-insensitive with the IGNORECASE argument.

words(regex("Test1eeTest2eeTest3EETest4","ee","@", IGNORECASE, GLOBALREPLACE),"@");

{"Test1", "Test2", "Test3", "Test4"}

Craige_Hales (Staff, joined Mar 21, 2013)

Substitute might be a lot better, depending on who will maintain the JSL later!

Craige
anne_sa

Thanks to all of you for sharing your advice and points of view!

Apart from the fact that Regex can be case-insensitive, could you please tell me what the difference is between the two functions?

Craige_Hales

The two functions barely overlap.  Regex only works with text, and can use very complicated patterns.  Substitute only works with very simple patterns, and can operate on text, lists, and expressions.

Regular Expressions

Character Functions

JSL Character String Functions
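
A rough sketch of that distinction follows. The inputs are made up for illustration: the pattern e{2,}, the list, and the expression are assumptions, not taken from the thread.

```jsl
// Regex: text only, but the pattern can be a full regular expression.
// "e{2,}" matches any run of two or more e's (illustrative pattern).
Show( Words( Regex( "Test1eeTest2eeeTest3", "e{2,}", "@", GLOBALREPLACE ), "@" ) );

// Substitute: literal matching only, but the source can be a list...
Show( Substitute( {a, b, c}, Expr( b ), Expr( x ) ) );

// ...or an expression, where one sub-expression is swapped for another.
Show( Substitute( Expr( x + y ), Expr( x ), Expr( z ) ) );
```

Substitute evaluates its arguments, which is why the pattern and replacement are wrapped in Expr() when they are names rather than strings.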

Craige
anne_sa

All right, thank you for these details, Craige@JMP!