lehaofeng
Level V

How to recognize Chinese characters in a string?

Hi, I have a problem with a regex.

For example:

regex("零件2305wu","[\u4e00-\u9fa5]+")

I want to extract "零件", but the output is "2305".

Thanks!

Accepted Solution
lehaofeng
Level V

Re: How to recognize Chinese characters in a string?

I seem to have found a solution:

 

regex("零件2305wu","[^\x00-\xff]+")

 

Craige_Hales
Super User

Re: How to recognize Chinese characters in a string?

Nice! I was going to suggest similar,

regex("零件2305wu","[^\x01-\x7f]+")

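The characters from x00 to x7F are ASCII; x00 might produce a note in the log, so it may be safer to start at x01. Characters from x80 to xFF are the Latin-1 supplement (accented Latin letters and a few symbols), and you might want to exclude them from the match as well, which is what your [^\x00-\xff] version does. Everything above xFF is the rest of Unicode, so either pattern matches any such character, not only Chinese ones.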

The [ square brackets ] make a character set, and a leading ^ means "not in this set". The hyphen between two characters means a range. The + means one or more.
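To see the difference between the two negated sets, here is a small sketch in Python's `re` module rather than JMP's Regex(). This is an assumption-laden stand-in: the two engines may differ in details, and the "café" test string is made up for illustration.

```python
import re

text = "零件2305wu"

# Negated set: one or more characters that are NOT in x01..x7F (ASCII).
non_ascii = re.search(r"[^\x01-\x7f]+", text).group(0)

# Stricter set: also excludes x80..xFF (the Latin-1 supplement).
non_latin1 = re.search(r"[^\x00-\xff]+", text).group(0)

print(non_ascii)   # 零件
print(non_latin1)  # 零件

# The two patterns only differ when accented Latin letters are present.
# "é" is xE9, so it is non-ASCII but still inside x00..xFF.
accented = "café零件"  # hypothetical input, not from the original question
with_accent = re.search(r"[^\x01-\x7f]+", accented).group(0)
without_accent = re.search(r"[^\x00-\xff]+", accented).group(0)

print(with_accent)     # é零件
print(without_accent)  # 零件
```

For the string in the question both patterns give the same answer; the choice only matters if the data can contain characters in the x80 to xFF range.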

 

Craige
lala
Level VIII

Re: How to recognize Chinese characters in a string?

tx=regex("零件2305wu","([一-﨩]{0,})");
Is this possible, and is it reasonable?

Thanks, experts!

Craige_Hales
Super User

Re: How to recognize Chinese characters in a string?

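一 (U+4E00, https://www.google.com/search?q=unicode+%E4%B8%80) is smaller than 﨩 (U+FA29, https://www.google.com/search?q=unicode+%EF%A8%A9), so it should be a valid range. It is a wide one, though: it covers about 11/16 of the 16-bit Unicode code points, including Hangul syllables and the private-use area, so it accepts far more than Chinese characters.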

{0,} means zero or more, just like *

 

It appears to work, keeping the first two characters and rejecting the last six.
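The points above can be checked with a short sketch, again in Python's `re` module as a stand-in for JMP's Regex(), so engine differences are possible. The Hangul test string and the variable names are illustrative assumptions.

```python
import re

text = "零件2305wu"

# Code points of the two range endpoints.
lo, hi = ord("一"), ord("﨩")
print(hex(lo), hex(hi))  # 0x4e00 0xfa29

# Share of the 16-bit code points covered by the range (roughly two thirds).
share = (hi - lo + 1) / 0x10000

# {0,} and * are the same quantifier: zero or more repetitions.
with_braces = re.match(r"[一-﨩]{0,}", text).group(0)
with_star = re.match(r"[一-﨩]*", text).group(0)
print(with_braces, with_star)  # 零件 零件

# Because zero repetitions are allowed, the pattern also "succeeds" with an
# empty match when the string does not start with a character in the range.
empty = re.match(r"[一-﨩]{0,}", "2305wu").group(0)
print(repr(empty))  # ''

# The range is wide enough to accept Hangul syllables (hypothetical input).
accepts_hangul = re.fullmatch(r"[一-﨩]+", "한글") is not None
print(accepts_hangul)  # True
```

If only Chinese characters should match, a narrower range such as the 一 to 龥 (4E00 to 9FA5) one from the original question is closer to the intent, and using + instead of {0,} avoids empty matches.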

Craige