concatenate direction changes direction when using arabic text
Dear members of the community,
I have three columns, one character and two numeric, that I would like to concatenate into one long string, as in the picture and the attachment.
For some reason, when the character column contains Arabic text, the concatenated result appears in the wrong order.
"('" || :word || "', " || Char( :abc ) || ", " || Char( :def ) || ")"
This is a problem for my application because in the next stage I export the column as a matrix to use in SQL, and the whole process fails.
Many thanks for your input.
Accepted Solutions
Re: concatenate direction changes direction when using arabic text
I did write to support as @txnelson suggested and hope to hear from them soon.
What @Craige_Hales wrote made me think that the error in my case is order dependent, so I tried changing the concatenation order as follows. It looks like it works, but I am still not sure why or whether it is robust.
Re: concatenate direction changes direction when using arabic text
Please take a look at the attached table's extra columns and the formulas that create them.
Table with formulas to parse the Right to Left string
The first four columns are the originals. quotes_and_parens uses substitute() to change the apostrophe to a quotation mark and the parens to curly braces. The parse column shows the output of parsing quotes_and_parens. At this point it still looks hopeless, but look at parse 1,2,3: they are extracting items 1, 2, and 3 from the list parse produced, in the order that makes sense.
Your solution, placing the RtoL string last, will work the same and is equally robust and has the advantage of displaying correctly.
The mis-displayed result comes from the Unicode rules for mixing RtoL and LtoR strings. The error appears only when the string is displayed. The internal string is correct, and parsing it is probably fine. If SQL can't parse it, that seems odd and hard to explain, assuming SQL otherwise supports Unicode.
I also found this confusing, especially when trying to select text that has portions of the selection reversed from other portions. Once I understood the reversing is being done at presentation time, it made more sense.
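For readers who want to try the substitute-and-parse idea outside JMP, here is a minimal Python sketch of the same steps. It is not the JSL from the attached table: the sample values and the helper name `parse_row` are made up for illustration, and it assumes the word itself contains no quotes or parentheses.

```python
import json

def parse_row(row_text):
    """Turn "('word', 1, 2)" into a list by rewriting it as JSON."""
    as_json = (
        row_text.replace("'", '"')   # apostrophes -> quotation marks
                .replace("(", "[")   # parens -> list brackets
                .replace(")", "]")
    )
    return json.loads(as_json)

word = "\u0648\u0627\u062a\u0633\u0627\u0628"  # Arabic sample text (assumed)
row = "('" + word + "', " + str(12) + ", " + str(34) + ")"

items = parse_row(row)
# Items come back in the order they were concatenated,
# whatever the on-screen rendering of `row` looks like.
print(items[0] == word, items[1], items[2])
```

The parsed list has the word first and the two numbers after it, which is the logical order of the string rather than its displayed order.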
Re: concatenate direction changes direction when using arabic text
Very interesting. I am sure it has to do with Arabic being written right to left. Concatenating something after an Arabic word would therefore place it to the left of the word. I suggest that you contact JMP Support for assistance.
Re: concatenate direction changes direction when using arabic text
Yes. I don't know the rules, but
x="واتساب ويب"; write("a"||x||",b") /*: aواتساب ويب,b
compare to
x="واتساب ويب"; write("a"||x||",3") /*: aواتساب ويب,3
Character 3 vs character b.
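One way to see why `b` and `3` behave differently is to look at each character's Unicode bidirectional class. The sketch below uses Python's standard `unicodedata` module rather than JSL; it only reports the classes and does not run the bidi algorithm itself.

```python
import unicodedata

# The Arabic sample from the post, written with escapes to keep the source readable.
x = "\u0648\u0627\u062a\u0633\u0627\u0628 \u0648\u064a\u0628"

def bidi_classes(text):
    """Return (character name, bidi class) pairs in logical order."""
    return [(unicodedata.name(ch), unicodedata.bidirectional(ch)) for ch in text]

for label, s in (("letter", "a" + x + ",b"), ("digit", "a" + x + ",3")):
    tail = bidi_classes(s)[-2:]  # the comma and the final character
    print(label, tail)
```

`b` is class `L` (a strong left-to-right character), `3` is `EN` (a European number, a weak type), the comma is `CS` (a common separator), and the Arabic letters are `AL`. Weak characters such as digits can be drawn as part of an adjacent right-to-left run, which fits with the digit case displaying differently from the letter case.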
Re: concatenate direction changes direction when using arabic text
So it appears that the concatenated string (internally) is what you expect, but the presentation is different from what you expect.
I don't have an answer, maybe someone else is familiar with right-to-left expected behaviors.
Re: concatenate direction changes direction when using arabic text
Not sure. You might not need to change anything. Or you might need to use
https://en.wikipedia.org/wiki/Left-to-right_mark
and maybe https://en.wikipedia.org/wiki/Right-to-left_mark (but I think not.)
For example
write(x); write("\!n"); write("1",x,"\!U200E2"); /*: واتساب ويب 1واتساب ويب2
\!U200E is JMP's representation of the Unicode Left-to-Right mark. You'd be adding that character to the string to make it print the way you expect, but it will mess up string compares later (maybe...who knows without testing...)
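The concern about string compares can be checked outside JMP. This Python sketch (not JSL, and not a statement about how JMP itself compares strings) shows that a Left-to-Right mark is a real, invisible character: it changes length and equality, so it would need to be stripped before comparing.

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

x = "\u0648\u0627\u062a\u0633\u0627\u0628 \u0648\u064a\u0628"  # Arabic sample from the thread
plain = "1" + x + "2"
marked = "1" + x + LRM + "2"

print(len(marked) - len(plain))          # the mark counts as one character
print(plain == marked)                   # so the two strings are not equal
print(plain == marked.replace(LRM, ""))  # stripping the mark restores equality
```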
I still don't know the rules. Interested to know the answer too!
edit: when I said "you might not have to change anything" I meant some parsers might be perfectly happy with the text, which is probably laid out in exactly the character order you need in memory. I know it looks out of order when displayed, but that might not be how the SQL parser sees it.
more: from https://en.wikipedia.org/wiki/Bidirectional_text --
Unicode bidi support
The Unicode standard calls for characters to be ordered 'logically', i.e. in the sequence they are intended to be interpreted, as opposed to 'visually', the sequence they appear. This distinction is relevant for bidi support because at any bidi transition, the visual presentation ceases to be the 'logical' one. Thus, in order to offer bidi support, Unicode prescribes an algorithm for how to convert the logical sequence of characters into the correct visual presentation. For this purpose, the Unicode encoding standard divides all its characters into one of four types: 'strong', 'weak', 'neutral', and 'explicit formatting'.
Re: concatenate direction changes direction when using arabic text
This issue also persists when I get the column as a list; in that case the string is twisted in a different way.
Re: concatenate direction changes direction when using arabic text
Many thanks @Craige_Hales
Not being WYSIWYG is very counterintuitive. Getting the column as a list was also an attempt to confirm that the final string would be parsable into SQL in the form needed for acceptance. Visually, it didn't look like it was going to work, but it did!
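A sketch of why the SQL step can succeed even though the string looks scrambled: the database reads characters in stored order, not display order. The example uses Python's built-in `sqlite3` module with an in-memory database. The table name `t`, the column names, and the values are assumptions, and the actual target database in this thread is not known.

```python
import sqlite3

word = "\u0648\u0627\u062a\u0633\u0627\u0628"  # Arabic sample (assumed)
# Same shape as the JMP formula: ('word', abc, def)
row = "('" + word + "', " + str(12) + ", " + str(34) + ")"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (word TEXT, abc INTEGER, def_ INTEGER)")
# Building SQL by concatenation mirrors the thread; for untrusted input,
# parameter binding ("VALUES (?, ?, ?)") is the safer choice.
conn.execute("INSERT INTO t VALUES " + row)
result = conn.execute("SELECT word, abc, def_ FROM t").fetchone()
conn.close()

print(result == (word, 12, 34))
```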
Re: concatenate direction changes direction when using arabic text
i have received the following message from JMP support:
beginning of quote "
Hi Ron,
I discussed your issue with the developer responsible for string concatenation. If you look at the Unicode characters that define the concatenated string, they are concatenated in the order you specified. However, there are Unicode Bidirectional Text rules for displaying strings. It is the display of the strings that indicates your unexpected direction. Internally, the string is stored as expected. Here is a short JSL example the developer provided to illustrate.
x = "ب"; y = "0"; xy = x || y; Show(xy, Hex(xy, "utf-16be"));
// xy = "ب0";
// Hex(xy, "utf-16be") = "06280030";
U+0030 is the zero digit and comes second.
I also verified by saving the string as a text file and opening it in a 3rd party text editor that has an ASCII -> HEX converter. A screenshot of the results is below. After converting to Hex, I replaced the space, comma, and parentheses codes with their characters again so it is easier to follow. Notice the code for 0 at the end of the Hex representation, as you would expect.
The developer also provided a link to a blog that may help you deal with this in your database query.
" end of quote
So I think we now have a comprehensive answer, in line with what @Craige_Hales suggested.
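The support example can be reproduced outside JMP as a cross-check. This Python sketch assumes the first character is U+0628 (ARABIC LETTER BEH), which is what the `0628` in the quoted hex output corresponds to.

```python
import unicodedata

x = "\u0628"  # ARABIC LETTER BEH, per the 0628 in the quoted hex
y = "0"
xy = x + y

# Encode as UTF-16 big-endian, as in the Hex(xy, "utf-16be") call.
hex_codes = xy.encode("utf-16-be").hex()
print(hex_codes)  # 06280030
print([unicodedata.name(ch) for ch in xy])
```

The digit zero (`0030`) is the second code unit, so the stored order matches the order of concatenation regardless of how a bidi-aware display draws it.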