I need some help writing a script to find the most common value across several matrices. I have a data table with hundreds of rows. Each row has a sequence column (eg ABCDABCD) and I have created a column to show the positions of a certain letter within that sequence (eg The location of A is [1,5,0,0]).
(I am fairly new at writing scripts so right now my matrices have 0 listed when there are no more positions containing that letter. If there is an obvious, more efficient way to find multiple positions within a string, please let me know. Otherwise, is there a way to exclude 0 as one of the modes?)
I now would like to look at all the positions called out in all rows and find the most common positions.
What I have:
row matrix column
1 [2, 3, 6, 8, 0, 0]
2 [2, 4, 5, 10, 0, 0]
3 [2, 7, 9, 10, 0, 0]
What I want:
A list showing the modes in order of most occurrences to least
position 2 - 3 occurrences
position 10 - 2 occurrences
I attached the script I currently have that contains the position finding formula,
Here is a script that uses a different method to get you to a point where you have all of the Letter values parsed into a new column from which you can do your analytics
dt=current data table(); // Create a new table with just the Sequence column in it dtbuild = dt << subset(columns("Sequence"), selected rows(0)); // Find the longest sequence Maxstring = col max(length(:sequence)); // Add a new column called Row Num that will let you keep track of what row // each letter came from dtBuild << New column("Row Num", formula(row())); // Convert the data into real values dtBuild:Row Num << delete property("formula"); // Create a new column for each possible position in the sequence number, and // strip off the ith character and place it into the column For(i=1,i<=Maxstring,i++, dtBuild << new column("C" || char(i),character, formula(substr(:sequence,eval(i),1))); column(dtBuild,i) << delete property("formula"); ); // Create a list of the columns in the dtBuild data table TheColumns = dtBuild << get column names(); // We are only going to stack the columns created that have bits of the // sequence in them, so get rid of the names of the first 2 column in the data table TheColumns = remove(TheColumns,1,2); // Stack all of the columns in to a data table that can be worked with dtFinal = dtBuild << Stack( columns( TheColumns), Source Label Column( "Position" ), Stacked Data Column( "Letter" ), Name( "Non-stacked columns" )(Keep( :Row Num )) ); // Get rid of the interium table close(dtBuild, nosave); // Delete rows that have no letter value dtFinal << select where(:Letter == ""); dtFinal << delete rows;
I need to be able to locate the positions of different letters by row and then compare the rows. For example I need to find the locations of all "A" in row 1's sequence and all the locations of "C" in row 2's sequence. Then I need to be able to see the most frequently occurring location in the table.
From the script you have given me (thank you by the way!) I still am unable to do this table wide comparison.
dt << select where(:(letter == "A" & :Row Num == 1) | (:Letter == "C" & :Row Num == 2);
Doesn't that give you your first requirement?
And your second requiremnt should be able to be calculated using Tabulate, if I am correctly understanding what you want