<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word? in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459519#M70467</link>
    <description>&lt;P&gt;Thank you for extra info! Check out Shortest Edit Script from Scripting index:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jthi_0-1644415162212.png" style="width: 400px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/39807i577783639FCF3959/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jthi_0-1644415162212.png" alt="jthi_0-1644415162212.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;With that you might be able to do something, unless it is too slow. Pat Match could also maybe do this, but I haven't (yet) looked into how to use it properly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
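&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;// Edit script between this row's word and the reference word
editList = Shortest Edit Script(:List of sequences, "FEBRUARY");
// Concatenate the pieces tagged "Common", i.e. the characters both words share
common = "";
For(i = 1, i &amp;lt;= N Items(editList), i++,
	If(editList[i][1] == "Common",
		common = common || editList[i][2]
	)
);
// 1 (true) when more than 2 characters of the word are not shared with the reference
Length(common) &amp;lt; Length(:List of sequences) - 2;&lt;/CODE&gt;&lt;/PRE&gt;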
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jthi_1-1644415221426.png" style="width: 400px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/39808iDB1B5882DA8686B8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jthi_1-1644415221426.png" alt="jthi_1-1644415221426.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 09 Feb 2022 14:00:31 GMT</pubDate>
    <dc:creator>jthi</dc:creator>
    <dc:date>2022-02-09T14:00:31Z</dc:date>
    <item>
      <title>How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459463#M70457</link>
      <description>&lt;P&gt;Hi Everyone,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a column of words made up of combinations of different characters (e.g. AAAAAAAAAA, a word with 10 characters). In an adjacent column, I want to identify the words that differ from my reference word by more than 2 characters (e.g. A&lt;STRONG&gt;B&lt;/STRONG&gt;AAAA&lt;STRONG&gt;CE&lt;/STRONG&gt;AA). The position at which the letters differ doesn't matter, and some words can also be shorter or longer than my reference word.&lt;/P&gt;&lt;P&gt;So I am looking for a formula that distinguishes words with more than 2 characters different from my reference sequence from those with 2 or fewer, among words of the same length. I'm still a beginner in JMP and don't know if this is possible. Let me know if you have any suggestions :)&lt;/P&gt;</description>
      <pubDate>Sat, 10 Jun 2023 23:43:55 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459463#M70457</guid>
      <dc:creator>SaraHorta23</dc:creator>
      <dc:date>2023-06-10T23:43:55Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459484#M70458</link>
      <description>&lt;P&gt;Take a look at the attached table, which should give you a start. The heart of it is a Formula Column (called 'Decision' and denoted by the '+' sign in the columns panel of the table). Clicking on the '+' gives:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2022-02-09 at 10.15.32.png" style="width: 400px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/39802i023117DBA2CD851A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2022-02-09 at 10.15.32.png" alt="Screenshot 2022-02-09 at 10.15.32.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;You need some knowledge of JSL to understand this, and can use 'Help &amp;gt; Scripting Index' to figure out exactly how it works if you need to.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 10:19:47 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459484#M70458</guid>
      <dc:creator>ian_jmp</dc:creator>
      <dc:date>2022-02-09T10:19:47Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459485#M70459</link>
      <description>&lt;P&gt;Many ways to do this. Here is one additional option:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Names Default To Here(1);

/*
a = "ABAAAACEAA";
b = "AAAAAAAAAA";
Show(!IsMissing(Substitute(a, Left(a,1), "")));
Show(!IsMissing(Substitute(b, Left(b,1), "")));
*/
New Table("Untitled 2",
	Add Rows(3),
	New Column("Column 1", Character, "Nominal", Set Values({"ABAAAACEAA", "AAAAAAAAAA", "BBBBAAA"})),
	New Column("Column 2",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Formula(!Is Missing(Substitute(:Column 1, Left(:Column 1, 1), "")))
	)
);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Column 1 is the column that holds the character strings:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Formula(!Is Missing(Substitute(:Column 1, Left(:Column 1, 1), "")))&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In this simple example, the idea is to remove every occurrence of the string's first character; if any characters are left, we know the string contained more than one distinct character. If you have more than one value to replace, you should be able to use the same idea by modifying the Substitute function arguments.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 10:37:56 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459485#M70459</guid>
      <dc:creator>jthi</dc:creator>
      <dc:date>2022-02-09T10:37:56Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459487#M70461</link>
      <description>&lt;P&gt;Hi Ian,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for the feedback. I tried to implement the formula you wrote, but I don't see where I can define that my reference sequence is AAAAAAA and not something else. I replaced the "" in the items selection with "AAAAAAAA", but the result was not correct. Any suggestions on how I can adapt the formula you wrote?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 11:26:04 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459487#M70461</guid>
      <dc:creator>SaraHorta23</dc:creator>
      <dc:date>2022-02-09T11:26:04Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459488#M70462</link>
      <description>&lt;P&gt;Hi Jthi, thanks for the suggestion.&lt;/P&gt;&lt;P&gt;To be honest, I'm still trying to understand the formula :'). In the file I'm working on, I have only one reference word, and I have to screen more than a million rows containing every possible combination of position/letter. I assume that this formula would not work because I cannot define those values in the formula. Would it be possible to adapt it for the large database I have?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 11:36:43 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459488#M70462</guid>
      <dc:creator>SaraHorta23</dc:creator>
      <dc:date>2022-02-09T11:36:43Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459489#M70463</link>
      <description>&lt;P&gt;Looks like I misunderstood what you meant by "&lt;SPAN&gt;I want to identify in an adjacent column the words that have more than 2 characters different&lt;/SPAN&gt;". What I suggested would flag rows where each cell in the source column contains a string that is itself made up of more than two distinct characters (so there is no reference sequence or string).&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 12:46:33 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459489#M70463</guid>
      <dc:creator>ian_jmp</dc:creator>
      <dc:date>2022-02-09T12:46:33Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459490#M70464</link>
      <description>&lt;P&gt;We need some more examples. What is the number of differences for each test string below? (Don't over-specify; indicate where it isn't important.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ref = "AEIOUA" // trying to understand the rule for the repeated letter&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;test = "AEIOUA" // nDiff = 0 for identical string&lt;/P&gt;
&lt;P&gt;test = "AAEIOU" // nDiff = ? for same letters, reordered&lt;/P&gt;
&lt;P&gt;test = "AEIOU" // nDiff = ? for same letters, missing one on right&lt;/P&gt;
&lt;P&gt;test = "EIOUA" // nDiff = ? for same letters, missing one on left&lt;/P&gt;
&lt;P&gt;test = "EIOU" // nDiff = ? &lt;/P&gt;
&lt;P&gt;test = "AA" // dDiff = ?&lt;/P&gt;
&lt;P&gt;test = "AXA" // dDiff = ?&lt;/P&gt;
&lt;P&gt;test = "AXXXXA" // dDiff = ? for matching length&lt;/P&gt;
&lt;P&gt;test = "AEIUOA" // dDiff = ? for internal letters swapped&lt;/P&gt;
&lt;P&gt;test = "AEIOUEIOU" // nDiff = ? for repeated letters&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;These answers would help choose an algorithm:&lt;/P&gt;
&lt;P&gt;You said ~1e6 test words; how many ref words are you looking at?&lt;/P&gt;
&lt;P&gt;Will you need answers other than a "yes/no there are more than 2 differences"? Is a count of diffs going to be useful?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 12:56:40 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459490#M70464</guid>
      <dc:creator>Craige_Hales</dc:creator>
      <dc:date>2022-02-09T12:56:40Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459492#M70465</link>
      <description>&lt;P&gt;Hi Craige,&lt;/P&gt;&lt;P&gt;Thanks for helping out. I have 1 to 5e6 test words and only one reference word. I basically want to filter out words that have more than 2 letters different compared to the reference word, and therefore&amp;nbsp;I do not require more than yes/no or 0/1 as an answer. Going through your questions (with nDiff as the 0/1 answer):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;test = "AEIOUA" // nDiff = 0 for identical string&lt;/P&gt;&lt;P&gt;test = "AAEIOU" // nDiff = 1 - for same letters, reordered.&lt;/P&gt;&lt;P&gt;- The first A is the only one kept at the same position. (5 diff letters)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;test = "AEIOU" // nDiff = 1 for same letters, missing one on right&amp;nbsp;&lt;/P&gt;&lt;P&gt;test = "EIOUA" // nDiff = 1 for same letters, missing one on left&lt;/P&gt;&lt;P&gt;test = "EIOU" // nDiff = 1&lt;/P&gt;&lt;P&gt;test = "AA" // nDiff = 1&lt;/P&gt;&lt;P&gt;test = "AXA" // nDiff = 1&lt;/P&gt;&lt;P&gt;- The length is not the same as&amp;nbsp;AEIOUA. For this I applied the Len function to filter those words out.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;test = "AXXXXA" // nDiff = 1 for matching length&lt;/P&gt;&lt;P&gt;- The first and last A are kept at the same position. (4 diff letters)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;test = "AEI&lt;STRONG&gt;UO&lt;/STRONG&gt;A" // nDiff = 0 for internal letters swapped&lt;/P&gt;&lt;P&gt;Only 2 letters different from AEIOUA, so it will pass the criteria.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;test = "AEIOUEIOU" // nDiff = 1 for repeated letters&lt;/P&gt;&lt;P&gt;Long sequence. Removed using Len.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry, I was not clear when explaining my question/problem.
In addition to answering your questions, I recreated an example of what I'm looking for.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Reference sequence&lt;/TD&gt;&lt;TD&gt;FEBRUARY&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;List of sequences&lt;/TD&gt;&lt;TD&gt;Label - more than 2 characters different from reference?&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;FEB&lt;STRONG&gt;A&lt;/STRONG&gt;UARY&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;WAA&lt;/STRONG&gt;RUARY&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;YRAURBEF&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;FEBRU&lt;STRONG&gt;R&lt;/STRONG&gt;R&lt;STRONG&gt;R&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;FEBRUAR&lt;STRONG&gt;W&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;No&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Y&lt;/STRONG&gt;E&lt;STRONG&gt;AURBEF&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Y&lt;/STRONG&gt;E&lt;STRONG&gt;AUR&lt;/STRONG&gt;ARY&lt;/TD&gt;&lt;TD&gt;Yes&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
      <pubDate>Wed, 09 Feb 2022 13:23:14 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459492#M70465</guid>
      <dc:creator>SaraHorta23</dc:creator>
      <dc:date>2022-02-09T13:23:14Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459519#M70467</link>
      <description>&lt;P&gt;Thank you for extra info! Check out Shortest Edit Script from Scripting index:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jthi_0-1644415162212.png" style="width: 400px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/39807i577783639FCF3959/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jthi_0-1644415162212.png" alt="jthi_0-1644415162212.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;With that you might be able to do something, unless it is too slow. Pat Match could also maybe do this, but I haven't (yet) looked into how to use it properly.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
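&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;// Edit script between this row's word and the reference word
editList = Shortest Edit Script(:List of sequences, "FEBRUARY");
// Concatenate the pieces tagged "Common", i.e. the characters both words share
common = "";
For(i = 1, i &amp;lt;= N Items(editList), i++,
	If(editList[i][1] == "Common",
		common = common || editList[i][2]
	)
);
// 1 (true) when more than 2 characters of the word are not shared with the reference
Length(common) &amp;lt; Length(:List of sequences) - 2;&lt;/CODE&gt;&lt;/PRE&gt;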
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="jthi_1-1644415221426.png" style="width: 400px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/39808iDB1B5882DA8686B8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="jthi_1-1644415221426.png" alt="jthi_1-1644415221426.png" /&gt;&lt;/span&gt;&lt;/P&gt;
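&lt;P&gt;If differences are meant to be counted position by position, as the AAEIOU and AXXXXA answers earlier in the thread suggest, a character-by-character count may be closer to the rule than an edit script. A minimal sketch of such a column formula, assuming the reference word is hard-coded and that words of a different length should simply be flagged (the names ref, word and nDiff are only illustrative):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;ref = "FEBRUARY"; // reference word from the example table
word = :List of sequences; // value in the current row
If(Length(word) != Length(ref),
	1, // assumption: a different length counts as "more than 2 different"
	nDiff = 0;
	For(i = 1, i &amp;lt;= Length(ref), i++,
		// compare the characters at position i
		If(Substr(word, i, 1) != Substr(ref, i, 1),
			nDiff++
		)
	);
	nDiff &amp;gt; 2 // 1 when more than 2 positions differ, otherwise 0
);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Counted this way, the words in the example table should get the same Yes/No labels as listed there, but the two approaches can disagree on words whose letters are shifted or reordered, so it is worth checking both against a few known cases.&lt;/P&gt;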
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 14:00:31 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459519#M70467</guid>
      <dc:creator>jthi</dc:creator>
      <dc:date>2022-02-09T14:00:31Z</dc:date>
    </item>
    <item>
      <title>Re: How do I identify words (using formula in column) that have more than 2 characters different from my reference word?</title>
      <link>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459521#M70469</link>
      <description>&lt;P&gt;Thanks Jthi! Works nicely ;)&lt;/P&gt;</description>
      <pubDate>Wed, 09 Feb 2022 14:35:54 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/How-do-I-identify-words-using-formula-in-column-that-have-more/m-p/459521#M70469</guid>
      <dc:creator>SaraHorta23</dc:creator>
      <dc:date>2022-02-09T14:35:54Z</dc:date>
    </item>
  </channel>
</rss>

