<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Count the number of Ts in a sequence in Discussions</title>
    <link>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84789#M37955</link>
    <description>&lt;P&gt;In the spirit of 'other solutions' here's another brute force one:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;NamesDefaultToHere(1);

// Given a string and a single character, finds the longest sequence of that character
// and returns the length and starting position of that sequence. If the sequence
// occurs more than once, only the first is identified
findLongestRepeatedCharacter =
Function({str, char}, {Default Local},
	n = Length(str);
	count = 0;
	currentCount = 1;
	// Traverse the string except for the last character
	for (i = 1, i &amp;lt;= n-1, i++,
		thisChar = Substr(str, i, 1);
		nextChar = Substr(str, i+1, 1);
		// If the current character and the next are both 'char' ...
		if((thisChar == char &amp;amp; nextChar == char),
			// ... increment 'currentCount'
			currentCount++,
			// ... else if they're not ...
			if(currentCount &amp;gt; count,
				// ... record 'currentCount' if it's bigger than we've seen so far
				count = currentCount;
				);
			// ... and reset 'currentCount'
			currentCount = 1;
			);
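		);
	// A run that reaches the end of the string never takes the 'else' branch above, so check it here
	if(currentCount &amp;gt; count, count = currentCount);
	// Build the sequence we've found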
	seq = Repeat(char, count);
	// Find where it occurs
	pos = Munger(str, 1, seq);
	// Return the results
	if (pos == 0,
		EvalList({0, pos}),
		EvalList({count, pos})
		);
	);

// Try it out
str = "(N1:25252525)AACCAA(N1)GACGTTAACAGTTCTTTG";
Print(findLongestRepeatedCharacter(str, "T"));
Print(findLongestRepeatedCharacter(str, "A"));
Print(findLongestRepeatedCharacter(str, "X"));&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 30 Nov 2018 18:33:17 GMT</pubDate>
    <dc:creator>ian_jmp</dc:creator>
    <dc:date>2018-11-30T18:33:17Z</dc:date>
    <item>
      <title>Count the number of Ts in a sequence</title>
      <link>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84700#M37904</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;I am trying to find a formula that will count the highest number of times a letter is repeated consecutively in a sequence. I have attached an example where I am trying to write a formula for the poly Ts column that generates the count of consecutive Ts in a sequence.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-center" image-alt="Capture-2.PNG" style="width: 640px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/14657iF96440EE706B9930/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture-2.PNG" alt="Capture-2.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you,&lt;/P&gt;
&lt;P&gt;Pratish&lt;/P&gt;</description>
      <pubDate>Fri, 30 Nov 2018 15:46:17 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84700#M37904</guid>
      <dc:creator>padhikari</dc:creator>
      <dc:date>2018-11-30T15:46:17Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of Ts in a sequence</title>
      <link>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84717#M37909</link>
      <description>&lt;P&gt;A regular expression expert might have a nice pattern to scan and find all matches, but that is beyond my REGEX skills.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have provided two solutions that can be done using column formulas. Both might need some explanation.&lt;/P&gt;&lt;P&gt;The first uses nested character functions; the second uses the ShortestEditScript() function. By the way, you did not specify whether you are counting T sequences prior to (N1); both solutions use the entire string. The example table is attached, and explanations are below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 999px;"&gt;&lt;img src="https://community.jmp.com/t5/image/serverpage/image-id/14647iA0FA129AA905D83C/image-size/large?v=v2&amp;amp;px=999" role="button" title="image.png" alt="image.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Assume&amp;nbsp; &amp;nbsp; s2 = "(N1:25252525)AACCAA(N1)GACGTTAACAGTTCTTTG";&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Character functions: Words(), Sort List(), Reverse(), list[n], Length(). Words() splits s2 on every character other than T, so each item is a run of T's; because all items contain only T's, sorting them alphabetically also orders them by length.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;Length(Reverse(Sort List(words(s2,"ACGN():0123456789")))[1]);&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Here is the log output for the respective functions:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;//:*/
words(s2,"ACGN():0123456789")
/*:
{"TT", "TT", "TTT"}
//:*/
Reverse(Sort List(words(s2,"ACGN():0123456789")))
/*:
{"TTT", "TT", "TT"}
//:*/
Reverse(Sort List(words(s2,"ACGN():0123456789")))[1]
/*:
"TTT"
//:*/
Length(Reverse(Sort List(words(s2,"ACGN():0123456789")))[1]);
/*:
3&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;ShortestEditScript() is an interesting function. The script&amp;nbsp;&lt;STRONG&gt;9_Extra_ShortestEditDistance.jsl&lt;/STRONG&gt;,&amp;nbsp;written for &lt;U&gt;JSL Companion, Applications of the JMP Scripting Language Second Edition&lt;/U&gt;, documents four different methods for using this powerful and useful function. For this example, I am using Strings() and requesting matrix output. It would take too much space to document this completely in this forum, so I will just show the results and add a few comments.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt; msed = Shortest Edit Script( Strings(s2, Repeat("T",length(s2)),matrix(1)) );
 maximum(msed[loc(msed[0,1]==0),4]);&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The two strings being compared are s2 and a string of all T's created by the function Repeat("T", length(s2)).&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt; msed = Shortest Edit Script( Strings(s2, Repeat("T",length(s2)),matrix(1)) );

/*:
[-1 1 . 27,
  0 28 1 2,
 -1 30 . 5,
  0 35 3 2,
 -1 37 . 1,
  0 38 5 3,
 -1 41 . 1,
  1 . 8 34]
  
/* The matrix output is n x 4, where n = nrow(msed)
Column1:  -1 | 1 | 0, where -1--&amp;gt;remove, 1--&amp;gt;insert, 0--&amp;gt;common
Column2:  position in the 1st string .--&amp;gt;missing / not found
Column3:  position in the 2nd string .--&amp;gt;missing / not found
Column4:  length
*/&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;So now it is a matter of finding the rows whose 1st column is 0 (matches/common/T's), which can be done with the loc() function. The length of each matching sequence is in the 4th column, so just find the maximum. Note that msed[0,1] represents the 1st column of the matrix msed.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;loc(msed[0,1]==0)&lt;BR /&gt;/*:&lt;BR /&gt;[2, 4, 6] &lt;BR /&gt;//:*/&lt;BR /&gt;msed[loc(msed[0,1]==0),4]&lt;BR /&gt;/*:&lt;BR /&gt;[2, 2, 3]&lt;BR /&gt;//:*/&lt;BR /&gt; maximum(msed[loc(msed[0,1]==0),4]);&lt;BR /&gt;/*:&lt;BR /&gt;3&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;It will be interesting to see other solutions.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Nov 2018 09:51:08 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84717#M37909</guid>
      <dc:creator>gzmorgan0</dc:creator>
      <dc:date>2018-11-29T09:51:08Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of Ts in a sequence</title>
      <link>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84722#M37912</link>
      <description>&lt;P&gt;Here's a simple brute force approach; not sure of the performance relative to &lt;a href="https://community.jmp.com/t5/user/viewprofilepage/user-id/70"&gt;@gzmorgan0&lt;/a&gt;'s methods.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;s2 = "(N1:25252525)AACCAA(N1)GACGTTAACAGTTCTTTG";
len = length(s2);
tstring = repeat("T", len);
maxlen  = 0;
for (i = len, i &amp;gt;= 1, i--,
	if (contains(s2, tstring),
		maxlen = i;
		break();
		,
		tstring = substr(tstring, 2);
	);
);
show(maxlen);&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 29 Nov 2018 13:46:55 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84722#M37912</guid>
      <dc:creator>pmroz</dc:creator>
      <dc:date>2018-11-29T13:46:55Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of Ts in a sequence</title>
      <link>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84739#M37925</link>
      <description>&lt;P&gt;The table formula that uses ShortestEditScript() referenced s2 in one place where it should have referenced Sequence. A table with the corrected formula is attached.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Nov 2018 18:27:54 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84739#M37925</guid>
      <dc:creator>gzmorgan0</dc:creator>
      <dc:date>2018-11-29T18:27:54Z</dc:date>
    </item>
    <item>
      <title>Re: Count the number of Ts in a sequence</title>
      <link>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84789#M37955</link>
      <description>&lt;P&gt;In the spirit of 'other solutions' here's another brute force one:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-jsl"&gt;NamesDefaultToHere(1);

// Given a string and a single character, finds the longest sequence of that character
// and returns the length and starting position of that sequence. If the sequence
// occurs more than once, only the first is identified
findLongestRepeatedCharacter =
Function({str, char}, {Default Local},
	n = Length(str);
	count = 0;
	currentCount = 1;
	// Traverse the string except for the last character
	for (i = 1, i &amp;lt;= n-1, i++,
		thisChar = Substr(str, i, 1);
		nextChar = Substr(str, i+1, 1);
		// If the current character and the next are both 'char' ...
		if((thisChar == char &amp;amp; nextChar == char),
			// ... increment 'currentCount'
			currentCount++,
			// ... else if they're not ...
			if(currentCount &amp;gt; count,
				// ... record 'currentCount' if it's bigger than we've seen so far
				count = currentCount;
				);
			// ... and reset 'currentCount'
			currentCount = 1;
			);
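		);
	// A run that reaches the end of the string never takes the 'else' branch above, so check it here
	if(currentCount &amp;gt; count, count = currentCount);
	// Build the sequence we've found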
	seq = Repeat(char, count);
	// Find where it occurs
	pos = Munger(str, 1, seq);
	// Return the results
	if (pos == 0,
		EvalList({0, pos}),
		EvalList({count, pos})
		);
	);

// Try it out
str = "(N1:25252525)AACCAA(N1)GACGTTAACAGTTCTTTG";
Print(findLongestRepeatedCharacter(str, "T"));
Print(findLongestRepeatedCharacter(str, "A"));
Print(findLongestRepeatedCharacter(str, "X"));&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 30 Nov 2018 18:33:17 GMT</pubDate>
      <guid>https://community.jmp.com/t5/Discussions/Count-the-number-of-Ts-in-a-sequence/m-p/84789#M37955</guid>
      <dc:creator>ian_jmp</dc:creator>
      <dc:date>2018-11-30T18:33:17Z</dc:date>
    </item>
  </channel>
</rss>

