cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
ruskicar
Level III

Find the longest repeating pattern

Hi,

 

Can somebody explain how to find the longest repeating pattern in a column?

For example, let's suppose I have the following table:

 

Screenshot 2023-03-13 at 11.40.35.png

 

The pattern with the largest size is: "1 5 6 8 7",  and it occurs 3 times in this column. Notice that there is another pattern in the table ("1 5 6"), which occurs 5 times, but is not the largest.

I tried with Screening - Explore patterns, and although it return which values are most duplicated, I can't get the exact pattern. Or multiple patterns, with the number of occurrences. 

Maybe there is some option I should enable? Or should I use some other platform? Or if somebody can help me out with a JSL script.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Victor_G
Super User

Re: Find the longest repeating pattern

Hi @ruskicar,

 

One possible option is to look at the platform "Explore pattern", accessible through Analyze, Screening, Explore pattern as you mention.


If you click on the red triangle next to "Column 1" in the platform window, you have the option to "Select Longest Duplicate Sequences". It will highlight in your datatable 16 rows, corresponding to 2 times the sequence "1 5 6 8 7 1 5 6", which is the longest repeating sequence you have in your dataset, not "1 5 6 8 7".

 

I hope this answer will help you,

 

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics

View solution in original post

3 REPLIES 3
Victor_G
Super User

Re: Find the longest repeating pattern

Hi @ruskicar,

 

One possible option is to look at the platform "Explore pattern", accessible through Analyze, Screening, Explore pattern as you mention.


If you click on the red triangle next to "Column 1" in the platform window, you have the option to "Select Longest Duplicate Sequences". It will highlight in your datatable 16 rows, corresponding to 2 times the sequence "1 5 6 8 7 1 5 6", which is the longest repeating sequence you have in your dataset, not "1 5 6 8 7".

 

I hope this answer will help you,

 

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics
Craige_Hales
Super User

Re: Find the longest repeating pattern

@Victor_G  perfect.

 

I wrote this JSL and was unhappy with the answer it provides that includes overlapping duplicates.

// build a sample table
dtElements = New Table( "elements", New Column( "element", Numeric, setvalues( J( 20, 1, Random Integer( 1, 3 ) ) ) ) );
nr = N Rows( dtElements );

// build key table
maxKeyLen = 12; // too big will be slow, too small will miss something
dtKeys = New Table( "keys", addrows( nr ), New Column( "key", expression, Set Display Width( 239 ) ) );

For( irow = 1, irow <= nr, irow += 1,
	maxrow = Min( nr, irow + maxKeyLen );
	dtKeys:key[irow] = dtElements:element[irow :: maxrow];
);

// sort key table to put similar together
dtKeys << Sort( By( :key ), Replace Table, Order( Ascending ) );

// get the match length of adjacent rows
// ===========================================
// this code finds two runs of 1,1,1 in 1,1,1,1  <<<<<<<<<< IMPORTANT <<<<<<<<<<<
// ===========================================
dtKeys << New Column( "matchLength" );
For( irow = 2, irow <= nr, irow += 1,
	a = dtKeys:key[irow];
	b = dtKeys:key[irow - 1];
	result = Min( N Items( a ), N Items( b ) );
	For Each( {{aa, bb}, i}, across( a, b, Count( "Shortest" ) ),
		If( aa != bb,
			result = i - 1;
			Break();
		)
	);
	dtKeys:matchLength[irow] = result;
);

dtKeys << Sort( By( :matchLength ), Replace Table, Order( Descending ) );

Show( dtKeys:key[1][1 :: dtKeys:matchLength[1]] );

The JSL uses an Expression column, ForEach( Across, Count ) and handles both numeric and character input. But I don't think it will quite do what @ruskicar  wants.

 

The idea is to create a new column of values built from the next N rows. Sorting that column puts runs of similar starting patterns adjacent.

Craige
ruskicar
Level III

Re: Find the longest repeating pattern

Yes, thanks, this is what I was looking for. The example was a bit lousy I see, but the answer is correct :). Too many red triangles, so I didn't spot that option :).