hardner
Level VI

best practices around special characters in JSL?

example="blah blah Width = 25.0 �m ..some more stuff";

widthtext=regex(example,"Width\s=\s+\d+[.]\d+\s[\w?�]m");

widthtext=regex(example,"Width\s=\s+\d+[.]\d+\s[\w?\x{FFFD}]m");

Sorry, this is less a specific problem than a general issue I'm looking for recommendations on, as I've run into it in different contexts.

In this specific example I'm using loadtextfile() to get some text that I need to parse, and that text can have special characters. Here I'm using the variable "example" to show what that looks like to JMP: it has replaced something with a "mystery" symbol (in case it doesn't continue to show up here, it's a black diamond with a question mark in it). I know that symbol is Unicode FFFD, and I know that in this case it was originally a mu, for microns, though I'm not sure what encoding it had that JMP couldn't work out. I know there are options in the preferences for the charset to use when loading a text file, so maybe I could use trial and error until it really shows up as a mu. But since my script looks through a lot of different files, I'm not sure any single preference would avoid this problem of failing to work out a character. In any case, my issue is really with how to refer to any special character in my script, not with having "mystery" instead of mu.

In the example above I pasted in the character as it was shown in the JMP log in order to have it in JSL (the second line of code, the first RegEx). That works. contains() works too with the special character pasted in as an argument. But when I save the JSL and reopen it, the special character is replaced with ? and the code using the pasted symbol no longer works. Seemingly any special characters in JSL are at risk of getting lost.

My third line fixes that for RegEx. I can call out the character by number, so the script still runs after reopening, including the case where, instead of the hardcoded example, I open a text file that has those symbols in it. But what if I want to do other string operations? So far I have converted a contains() to a regex(), but are there other ways to use symbols in JSL?

I've had similar issues with the Greek letters JMP uses in its reports (like a sigma). I want to write code that tests for some text in the report, but if I paste in the symbol it gets replaced when the JSL is saved. These are symbols JMP itself chooses and uses, and yet I can't use them robustly in my JSL.

Accepted Solution
Craige_Hales
Super User

Re: best practices around special characters in JSL?

The diamond-question mark is the Unicode replacement character. 

JMP prefers Unicode, but can guess other character sets with varying degrees of success. 

JMP may also try to preserve the character set of a file when it is re-saved, but if the file had byte sequences that are not valid in the input character set, they become the replacement character, which in turn becomes a question mark in a conversion to a non-Unicode character set. (I think some variation of this is what happened in your example.)

It is possible you could select the right character set (preferences), and it is also possible the file is using mu from a non-Unicode character set while claiming to be a Unicode file with a Byte Order Mark.

There are a number of web sites that make it fairly easy to get the Unicode hex data; I've used Windows Character Map here:

[Screenshot: Unicode escape]

(edit) If you open the file again, I'd suggest trying Windows-1252 as a starting point. I think Load Text File has options for specifying the character set without using the preferences.

[Screenshot: Set it back to Best Guess when done]
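
As a rough sketch of the per-call approach mentioned above, something like the following might work. It assumes that Load Text File accepts a Charset() option and that the file really is Windows-1252; the path is hypothetical, and none of this has been checked against the poster's actual files.

```jsl
// Hypothetical path; substitute a real file.
path = "$DOCUMENTS/example.txt";

// Assumption: Charset() is accepted here, and "windows-1252" is the
// right guess for this file. Other files may need a different value.
txt = Load Text File( path, Charset( "windows-1252" ) );

// In Windows-1252 the micro sign is byte B5, which maps to U+00B5.
// Contains() returns the position of the match, or 0 if not found.
Show( Contains( txt, "\!U00B5" ) );
```

Note that the micro sign (U+00B5) and the Greek letter mu (U+03BC) are different code points, so a test for one will not match the other.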

Craige

hardner
Level VI

Re: best practices around special characters in JSL?

Thanks! The bit I was missing was how to refer to that character in JSL outside of regex in a way that wouldn't get replaced in the JSL file, and I see from your example that this is how:

"\!UFFFD"
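
A minimal sketch of using that escape in ordinary string functions, reusing the "example" string from the original question. The replacement of U+FFFD with a micro sign (U+00B5) is only an assumption about what the lost character was; it is not something JMP can recover on its own.

```jsl
// The escape keeps the script file plain ASCII, so it survives save and reopen.
example = "blah blah Width = 25.0 \!UFFFDm ..some more stuff";

// Contains() returns the position of the first match, or 0 if absent.
Show( Contains( example, "\!UFFFD" ) );

// Assumption: the lost character was a micro sign. Several different
// original characters can all end up as U+FFFD, so this is a guess.
cleaned = Substitute( example, "\!UFFFD", "\!U00B5" );
Show( cleaned );
```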

 

That raises a related question about identifying the character, though. It's not so much that I want JMP to find the right character set to open the file (a whole other issue); I want to refer to the character as JMP uses it in JSL, and not have the JSL corrupted when I close and reopen the file.

In this case I looked up the character pretty easily, found a code that identified it, and it worked in JMP. But say I have the character in the JMP log (maybe JMP did find and report a mu, or say it's the sigma JMP itself uses in a report; for code that parses some text, I only care what JMP is using as the characters in that text). Is there any direct way to get a code like "UFFFD" for the character JMP is actually using? Something like what_is_this_character("�"). When I run into this issue I can generally see and copy and paste the character as used by JMP, and that's the context in which I want to refer to it robustly in JSL.

Thanks!

Craige_Hales
Super User

Re: best practices around special characters in JSL?

Yes

 

x = "\!U266b";
Show( x ); // "♫"
y = Char To Blob( x, encoding = "utf-16be" );
Show( Hex( y ) ); // "266B"

 

If you are seeing the Unicode replacement character (the question mark in a black diamond), then you may also have other characters that are being mapped to the replacement character. If possible, find the right character set to avoid the many-to-one problem.

There may be a more clever way than the snippet above. The CharToBlob function converts the character to a 16-bit big-endian Unicode representation, and the Hex function converts the blob to printable hex.

 

If you don't specify the encoding charset for CharToBlob, it will default to UTF-8, and for this particular character you'll get a 3-byte value, "E299AB" (see the fileformat.info page for BEAMED EIGHTH NOTES).
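
The snippet above could be wrapped into a small helper along the lines of the what_is_this_character() idea from the question. This is only a sketch: the function name is hypothetical, and it assumes Char To Blob accepts the encoding name as a plain second argument.

```jsl
// Hypothetical helper: returns the UTF-16BE bytes of a string as hex text.
what is this character = Function( {ch},
	Hex( Char To Blob( ch, "utf-16be" ) )
);

// For a single character from the Basic Multilingual Plane, the result is
// its four-hex-digit code point, ready to paste after \!U in a script.
Show( what is this character( "\!U03C3" ) ); // Greek small letter sigma
```

Characters outside the Basic Multilingual Plane are encoded in UTF-16 as surrogate pairs, so for those the helper would return eight hex digits rather than a single code point.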

Craige
hardner
Level VI

Re: best practices around special characters in JSL?

Thanks!