ron_horne
Super User (Alumni)

concatenate direction changes direction when using arabic text

dear members of the community,
i have three columns - one character and two numeric - that i would like to concatenate into a long string as in the picture and the attachment.
for some reason, when the character column is written in Arabic, the concatenated result is in the wrong order.

 

"('" || :word || "', " || Char( :abc ) || ", " || Char( :def ) || ")"

ron_horne_0-1692699853894.png

this is not good for my application since i export the column as a matrix in the next stage to use in SQL and the whole process fails.
many thanks for your input.

 

2 ACCEPTED SOLUTIONS

Accepted Solutions
ron_horne
Super User (Alumni)

Re: concatenate direction changes direction when using arabic text

i did write to the support as @txnelson suggested. hope to hear from them soon.
what @Craige_Hales wrote made me think that the error in my case is order dependent, so i tried changing the concatenation order as follows. it looks like it works, but i am still not sure why or whether it is robust.

ron_horne_0-1692729122814.png

 

 

 


Craige_Hales
Super User

Re: concatenate direction changes direction when using arabic text

Please take a look at the attached table's extra columns and the formulas that create them.

Table with formulas to parse the Right to Left string

The first four columns are the originals. quotes_and_parens uses substitute() to change the apostrophe to a quotation mark and the parens to curly braces. The parse column shows the output of parsing quotes_and_parens. At this point it still looks hopeless, but look at parse 1,2,3: they are extracting items 1, 2, and 3 from the list parse produced, in the order that makes sense.
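As a rough analogue outside JMP, the same idea (parse the built string, then pull the items out by position) can be sketched in Python. This is only an illustration, not the formulas in the attached table: `ast.literal_eval` stands in for the substitute-and-parse step, and the numbers 12 and 34 are made-up values for the two numeric columns.

```python
import ast

# First word of the thread's example text, written with escapes so the
# source reads left to right in any editor.
word = "\u0648\u0627\u062a\u0633\u0627\u0628"

# Build the row text the same way as the column formula (12 and 34 are
# hypothetical stand-ins for :abc and :def).
row_text = "('" + word + "', " + str(12) + ", " + str(34) + ")"

# Parse the text back into a tuple and extract the items by position.
parsed = ast.literal_eval(row_text)
first, second, third = parsed

print(first == word, second, third)
```

The items come back in the order they were concatenated, however the combined string looks on screen.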

 

Your solution, placing the RtoL string last, will work the same and is equally robust and has the advantage of displaying correctly.

 

The mis-displayed result is because of the Unicode rules for mixing and matching RtoL and LtoR strings. The error only happens when the string is displayed. The internal string is correct, and parsing it is probably fine. If SQL can't parse it, that seems odd and hard to explain, assuming SQL supports Unicode otherwise.
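One way to check the claim that a SQL parser reads the stored order rather than the displayed order is a small Python sketch against an in-memory SQLite database. Everything here is an assumption for illustration: the table name `demo`, its column names, and the values 12 and 34 are made up, and SQLite stands in for whatever database is actually used.

```python
import sqlite3

# First word of the thread's example text, written with escapes.
word = "\u0648\u0627\u062a\u0633\u0627\u0628"

# Same shape as the concatenated column: ('word', abc, def).
values_clause = "('" + word + "', " + str(12) + ", " + str(34) + ")"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (word TEXT, abc INTEGER, def_col INTEGER)")

# The statement is built by plain concatenation, as in the original workflow.
conn.execute("INSERT INTO demo VALUES " + values_clause)

row = conn.execute("SELECT word, abc, def_col FROM demo").fetchone()
conn.close()

print(row == (word, 12, 34))
```

Building SQL by concatenation is only reasonable for trusted data. Where the database interface allows it, bound parameters avoid quoting problems altogether.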

 

I also found this confusing, especially when trying to select text that has portions of the selection reversed from other portions. Once I understood the reversing is being done at presentation time, it made more sense.

Craige


9 REPLIES
txnelson
Super User

Re: concatenate direction changes direction when using arabic text

Very interesting. I am sure it has to do with Arabic being written right to left. Therefore, concatenating something after an Arabic word would place it to the left of the word. I suggest that you go to JMP Support for assistance.

Jim
Craige_Hales
Super User

Re: concatenate direction changes direction when using arabic text

Yes. I don't know the rules, but

x="واتساب ويب";
write("a"||x||",b")
/*:
aواتساب ويب,b

compare to

x="واتساب ويب";
write("a"||x||",3")
/*:
aواتساب ويب,3

Character 3 vs character b.
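A possible explanation for the difference, sketched in Python rather than JSL: the Unicode Character Database gives every character a bidirectional class, and letters and digits fall into different classes. The sample names in the dictionary are just labels chosen for this sketch.

```python
import unicodedata

# Bidi classes from the Unicode Character Database:
# "L"  = strong left-to-right
# "AL" = strong Arabic letter (right-to-left)
# "EN" = European number (weak)
# "CS" = common separator (weak)
samples = {
    "a": "a",
    "b": "b",
    "3": "3",
    "comma": ",",
    "beh": "\u0628",  # Arabic letter beh, the last letter of the example text
}

bidi_classes = {name: unicodedata.bidirectional(ch) for name, ch in samples.items()}

for name, cls in bidi_classes.items():
    print(f"{name:>5}: {cls}")
```

The letter b is a strong left-to-right character, so it starts a new left-to-right run after the Arabic text. The digit 3 and the comma are weak, so the display algorithm can absorb them into the preceding right-to-left run, which would explain why they are drawn on the other side of the Arabic word.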

Craige
Craige_Hales
Super User

Re: concatenate direction changes direction when using arabic text

So it appears that the concatenated string (internally) is what you expect, but the presentation is different from what you expect.

capture.png

I don't have an answer, maybe someone else is familiar with right-to-left expected behaviors.

Craige
Craige_Hales
Super User

Re: concatenate direction changes direction when using arabic text

Not sure. You might not need to change anything. Or you might need to use

https://en.wikipedia.org/wiki/Left-to-right_mark

and maybe https://en.wikipedia.org/wiki/Right-to-left_mark (but I think not.)

For example

write(x);
write("\!n");
write("1",x,"\!U200E2");
/*:
واتساب ويب
1واتساب ويب‎2

\!U200E is JMP's representation of the Unicode Left-to-Right mark. You'd be adding that character to the string to make it print the way you expect, but it will mess up string compares later (maybe...who knows without testing...)
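The concern about string compares can be checked with a short Python sketch (not JSL, and not tested inside JMP). The helper `strip_lrm` is a hypothetical name introduced here. It shows one way to undo the mark before comparing.

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK, an invisible formatting character

word = "\u0648\u064a\u0628"  # second word of the thread's example text

plain = "1" + word + "2"
marked = "1" + word + LRM + "2"


def strip_lrm(text: str) -> str:
    """Remove left-to-right marks so strings compare as originally built."""
    return text.replace(LRM, "")


print(plain == marked)             # compares the two versions directly
print(len(marked) - len(plain))    # how many code points the mark adds
print(strip_lrm(marked) == plain)  # compares again after removing the mark
```

The mark is a real character in the string, so the marked and unmarked versions are not equal until it is stripped again.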

I still don't know the rules. Interested to know the answer too!

 

edit: when I said "you might not have to change anything" I meant some parsers might be perfectly happy with the text, which is probably laid out in exactly the character order you need in memory. I know it looks out of order when displayed, but that might not be how the SQL parser sees it.

 

more: from https://en.wikipedia.org/wiki/Bidirectional_text --

Unicode bidi support

The Unicode standard calls for characters to be ordered 'logically', i.e. in the sequence they are intended to be interpreted, as opposed to 'visually', the sequence they appear. This distinction is relevant for bidi support because at any bidi transition, the visual presentation ceases to be the 'logical' one. Thus, in order to offer bidi support, Unicode prescribes an algorithm for how to convert the logical sequence of characters into the correct visual presentation. For this purpose, the Unicode encoding standard divides all its characters into one of four types: 'strong', 'weak', 'neutral', and 'explicit formatting'.

Craige
ron_horne
Super User (Alumni)

Re: concatenate direction changes direction when using arabic text

This issue is persistent also when i attempt to get the column as a list, in this case the string is twisted in a different way.

ron_horne_0-1692734360644.png

 

 

 

ron_horne
Super User (Alumni)

Re: concatenate direction changes direction when using arabic text

Many thanks @Craige_Hales 
Being NOT WYSIWYG is very counterintuitive. Getting the column as a list was also an attempt to confirm that the final string will be parsable into SQL the way needed for acceptance. Visually, it didn't look like it was going to work, but it did!

ron_horne
Super User (Alumni)

Re: concatenate direction changes direction when using arabic text

i have received the following message from JMP support:

beginning of quote "


Hi Ron,

I discussed your issue with the developer responsible for string concatenation.  If you look at the Unicode characters that define the concatenated string, they are concatenated in the order you specified.  However, there are Unicode Bidirectional Text rules for displaying strings.  It is the display of the strings that indicates your unexpected direction.  Internally, the string is stored as expected.  Here is a short JSL example the developer provided to illustrate.

x = "ب";
y = "0";
xy = x || y;
Show(xy, Hex(xy, "utf-16be"));
// xy = "ب0";
// Hex(xy, "utf-16be") = "06280030";

U+0030 is the zero digit and comes second.

I also verified by saving the string as a text file and opening it in a 3rd party text editor that has an ASCII -> HEX converter. A screenshot of the results is below. After converting to Hex, I replaced the space, comma, and parentheses codes with their characters again so it is easier to follow. Notice the codes for 0 at the end of the Hex representation, as you would expect.

 

ron_horne_1-1693044450573.png


The developer also provided a link to a blog that may help you deal with this in your database query.  

" end of quote
Therefore, i think we now have a comprehensive answer which is in line with what  @Craige_Hales suggested.
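For anyone who wants to repeat the hex check outside JMP, here is a minimal Python sketch of the same test. It assumes the Arabic letter in the support example is beh (U+0628), which is what the `0628` in the quoted hex output corresponds to.

```python
x = "\u0628"  # ARABIC LETTER BEH
y = "0"       # DIGIT ZERO, U+0030
xy = x + y

# Encode as UTF-16 big-endian and show the bytes as hex, as in the JSL example.
hex_utf16be = xy.encode("utf-16-be").hex()

# List the code points in storage (logical) order.
code_points = [f"U+{ord(ch):04X}" for ch in xy]

print(hex_utf16be)
print(code_points)
```

The zero digit is the second code point in the string, whichever side of the Arabic letter a given display draws it on.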