lehaofeng
Level V

How to recognize Chinese characters in a string?

Hi, I have a problem with a regex.

For example:

regex("零件2305wu","[\u4e00-\u9fa5]+")

I want to extract "零件" (Chinese for "part"), but the output is "2305".
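One explanation reproduces the reported "2305" exactly. Suppose the `\u` escapes are not recognized and the backslash is dropped. The class then reads `[u4e00-u9fa5]`, and its `0-u` range matches digits but no Chinese characters. How JMP actually handles `\u` is an assumption, not something this thread confirms. The sketch below uses Python's `re` module, not JSL, to show both readings of the pattern:

```python
import re

text = "零件2305wu"

# Intended pattern: CJK Unified Ideographs U+4E00..U+9FA5.
# Python's re understands \uXXXX escapes in str patterns, so this matches.
intended = re.search(r"[\u4e00-\u9fa5]+", text).group()

# Assumed failure mode: "\u" not recognized, backslash dropped.
# [u4e00-u9fa5] then contains the range "0-u" (0x30..0x75): digits,
# uppercase letters and lowercase a..u, but not "w" and no CJK characters.
degraded = re.search(r"[u4e00-u9fa5]+", text).group()

print(intended)  # 零件
print(degraded)  # 2305
```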

Thanks!

Accepted Solutions
lehaofeng
Level V

Re: How to recognize Chinese characters in a string?

It seems I have found it:

 

regex("零件2305wu","[^\x00-\xff]+")
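The negated class works because it excludes every character from U+0000 to U+00FF, and the Chinese characters lie far above that range. Here is a minimal Python `re` sketch of the same pattern (not JMP's `Regex()`). It also shows that the class is not Chinese-specific: any character above U+00FF, such as "€", matches too.

```python
import re

# Negated class: any character whose code point is NOT in U+0000..U+00FF.
non_latin1 = re.compile(r"[^\x00-\xff]+")

first_run = non_latin1.search("零件2305wu").group()
# findall returns every matching run, not just the first one.
all_runs = non_latin1.findall("零件2305wu规格A")
# "€" is U+20AC, which is also above U+00FF, so it joins the run.
mixed = non_latin1.findall("价格€5")

print(first_run)  # 零件
print(all_runs)   # ['零件', '规格']
print(mixed)      # ['价格€']
```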

 

Craige_Hales
Super User

Re: How to recognize Chinese characters in a string?

Nice! I was going to suggest something similar:

regex("零件2305wu","[^\x01-\x7f]+")

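The characters from \x00 to \x7F are ASCII; \x00 might make a note in the log, so maybe start at \x01. The characters from \x80 to \xFF (Latin-1, e.g. accented letters such as é) are not ASCII but are close to it, and you might want to exclude them from the match as well, as your [^\x00-\xff] does. Everything above \xFF is non-ASCII Unicode, which includes the Chinese characters.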

The [ square brackets ] make a character set, and the leading ^ means not in this set. The minus means a range. The + means one or more.
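The difference between the two patterns only shows up on Latin-1 characters. The Python `re` sketch below (an illustration of the regex logic, not JSL) runs both on a hypothetical input that contains "é" (U+00E9):

```python
import re

text = "café零件"

# Excludes only ASCII (\x01-\x7f): Latin-1 letters such as "é" still match.
ascii_excluded = re.search(r"[^\x01-\x7f]+", text).group()

# Excludes the whole Latin-1 range (\x00-\xff): only the Chinese run matches.
latin1_excluded = re.search(r"[^\x00-\xff]+", text).group()

print(ascii_excluded)   # é零件
print(latin1_excluded)  # 零件
```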

 

Craige
lala
Level IX

Re: How to recognize Chinese characters in a string?

tx=regex("零件2305wu","([一-﨩]{0,})");
Is this possible, and is it reasonable?

Thanks, experts!

Craige_Hales
Super User

Re: How to recognize Chinese characters in a string?

一 (U+4E00, https://www.google.com/search?q=unicode+%E4%B8%80) is smaller than 﨩 (U+FA29, https://www.google.com/search?q=unicode+%EF%A8%A9), so it should be a valid range. It spans about 11/16 of the 65,536 characters in the Basic Multilingual Plane.

{0,} means zero or more, just like *.

 

It appears to work, keeping the first two characters and rejecting the last six.
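Here is a quick check of the numbers above. It uses Python's `re` rather than JMP's `Regex()`, so treat it as an illustration of regex semantics only. It also shows two things worth knowing before relying on `[一-﨩]{0,}`. First, the range is wider than Chinese: Korean Hangul syllables, for example, fall inside it. Second, `{0,}` is satisfied by an empty match when the string does not start with a matching character.

```python
import re

lo, hi = 0x4E00, 0xFA29  # code points of 一 (U+4E00) and 﨩 (U+FA29)
share_of_bmp = (hi - lo + 1) / 0x10000
print(f"{share_of_bmp:.3f}")  # 0.673, close to 11/16 = 0.6875

# Same class as the JSL pattern, written with \u escapes instead of literals.
pattern = re.compile("[\u4e00-\ufa29]{0,}")

chinese = pattern.search("零件2305wu").group()  # "零件"
hangul = pattern.search("한글").group()         # "한글": Korean is in the range too
empty = pattern.search("2305零件").group()      # "": zero repetitions already match at index 0
print(chinese, hangul, repr(empty))
```

Using `+` instead of `{0,}` avoids the empty match, because `search` then skips ahead to the first run of at least one character in the range.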

Craige
