anne_sa
Level VI

Using several characters as delimiter for the Words function

Hello everyone,

I would like to know if it is possible to use a string with several characters as a delimiter for the Words function.

Here is a dummy example:

When I type:

words("Test1eeTest2","ee")

I would like to get:

{"Test1","Test2"}

Instead of:

{"T", "st1", "T", "st2"}

Is there a way to get that?

Thanks in advance for your help.

Best regards,

Accepted Solution
Craige_Hales
Super User

Re: Using several characters as delimiter for the Words function

You could use Regex() to change the "ee" to another single character, then split on that character with Words(). If you think @ does not occur in the data:

words(regex("Test1eeTest2eeTest3eeTest4","ee","@",GLOBALREPLACE),"@")

{"Test1", "Test2", "Test3", "Test4"}

If you are not sure about @ but are pretty sure the ASCII Unit Separator character does not occur, you could use:

words(regex("Test1eeTest2","ee","\!U001F",GLOBALREPLACE),"\!U001F")


The Unit Separator is unlikely to be in your data.

The US is really a single character in spite of appearances. Your font may show it differently, or not at all.

Craige
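Both versions above still rely on a sentinel character that must never appear in the data. If you cannot guarantee that, a minimal sketch is to walk the string yourself: Contains() returns the 1-based position of the first occurrence of a substring (0 if absent), and Substr() cuts out the pieces. The function name `splitOn` is hypothetical (it is not a built-in JSL function), and this sketch drops empty pieces, such as the one produced by a leading delimiter; remove the two Length() checks if you need to keep them.

```jsl
// Hypothetical helper, not a built-in: split src on a multi-character
// delimiter without substituting a sentinel character first.
splitOn = Function( {src, delim},
	{Default Local},
	result = {};
	dlen = Length( delim );
	pos = Contains( src, delim );           // 1-based position, 0 if not found
	While( pos > 0,
		piece = Substr( src, 1, pos - 1 );  // text before the delimiter
		If( Length( piece ) > 0, Insert Into( result, piece ) );
		src = Substr( src, pos + dlen );    // keep everything after the delimiter
		pos = Contains( src, delim );
	);
	If( Length( src ) > 0, Insert Into( result, src ) );  // last piece
	result;
);

Show( splitOn( "Test1eeTest2eeTest3", "ee" ) );
```

Like Contains(), this matching is case-sensitive, and a run such as "eee" leaves a stray "e" at the start of the next piece, the same as the Regex() version.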

anne_sa
Level VI

Re: Using several characters as delimiter for the Words function

That works perfectly!

Thank you so much for your fast answer, Craige@JMP.

Jeff_Perkinson
Community Manager

Re: Using several characters as delimiter for the Words function

Craige@JMP beat me to the punch with the general technique, but I'll put in a plug for the Substitute() function, which I find more readable than regular expressions. It's up to you which one you choose.

words(substitute("Test1eeTest2", "ee", "!"),"!")

{"Test1", "Test2"}

-Jeff
pmroz
Super User

Re: Using several characters as delimiter for the Words function

One advantage of Regex() is that you can make it case-insensitive with the IGNORECASE argument.

words(regex("Test1eeTest2eeTest3EETest4","ee","@", IGNORECASE, GLOBALREPLACE),"@");

{"Test1", "Test2", "Test3", "Test4"}
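If you prefer to stay with Substitute(), a rough alternative is to pass several find/replace pairs in one call, one per spelling you expect. This is a sketch, assuming the delimiter only ever appears as "ee" or "EE"; Substitute() matches exactly, so a mixed spelling such as "eE" would not be split, which is where IGNORECASE is the better tool.

```jsl
// Substitute() takes several find/replace pairs; both spellings map to "!".
// Matching is exact, so only the spellings listed here are replaced.
txt = "Test1eeTest2EETest3";
parts = Words( Substitute( txt, "ee", "!", "EE", "!" ), "!" );
Show( parts );
```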

Craige_Hales
Super User

Re: Using several characters as delimiter for the Words function

Substitute might be a lot better, depending on who will maintain the JSL later!

Craige
anne_sa
Level VI

Re: Using several characters as delimiter for the Words function

Thanks to all of you for sharing your advice and points of view!

Apart from the fact that Regex() can be case-insensitive, could you please tell me what the difference is between the two functions?

Craige_Hales
Super User

Re: Using several characters as delimiter for the Words function

The two functions barely overlap. Regex() only works with text, but it can use very complicated patterns. Substitute() only works with very simple patterns (exact matches, no wildcards), but it can operate on text, lists, and expressions.
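A small sketch of each side of that difference, using made-up inputs: the pattern "e{2,}" (two or more consecutive e's) lets a single Regex() call split on both "ee" and "eee", which a literal Substitute() cannot do in one pair. Going the other way, Substitute() can rewrite an unevaluated expression, which Regex() cannot touch; Expr() is needed so the arguments are not evaluated first.

```jsl
// Regex(): one pattern covers a family of delimiters ("ee", "eee", ...).
parts = Words( Regex( "Test1eeTest2eeeTest3", "e{2,}", "@", GLOBALREPLACE ), "@" );
Show( parts );

// Substitute(): replaces x with 2 * z inside an expression, giving 2 * z + y.
Show( Substitute( Expr( x + y ), Expr( x ), Expr( 2 * z ) ) );
```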

Regular Expressions

Character Functions

JSL Character String Functions

Craige
anne_sa
Level VI

Re: Using several characters as delimiter for the Words function

Alright, thank you for these details, Craige@JMP!