cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
View Original Published Thread

How to loop through rows from groups created by 2 columns

nthai
Level III

Hi all,

 

i have a pretty simple question but unable to find the exact answer from the forum. Here's my sample data table:

nthai_1-1675063870291.png

 

Basically it has 4 columns, and i need to loop through each of combination items generating from "group 1" and "group 2", but if and only when "item" # is different then open the all  file path in "path" column. For example on first 2 rows, we have the combination is "aA", and because items also different so the script need to open both file path 1 and 2.  

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User


Re: How to loop through rows from groups created by 2 columns

I'm not exactly sure I understand the problem correctly but below are three different options using, Col Sum/Col Cumulative Sum and Col Max(ColSum())

Names Default To Here(1);

dt = New Table("Untitled",
	Add Rows(8),
	Compress File When Saved(1),
	New Column("g1", Character, "Nominal", Set Values({"a", "a", "a", "a", "a", "c", "c", "c"})),
	New Column("g2", Character, "Nominal", Set Values({"A", "A", "B", "B", "B", "A", "A", "A"})),
	New Column("i", Numeric, "Continuous", Format("Best", 12), Set Values([1, 3, 2, 4, 4, 5, 7, 9])),
	New Column("p",
		Character,
		"Nominal",
		Set Values({"filepath1", "filepath2", "filepath3", "filepath4", "filepath4.1", "filepath5", "filepath6", "filepath7"})
	)
);

// depending how we handle duplicated g1+g2+i we would use different formulas
// if we won't open any
new_col = dt << New Column("ValidFile", Numeric, Continuous, << Set Each Value(
	Col Sum(1, :g1, :g2, :i); 
));

/*
// if we open the first one
new_col = dt << New Column("ValidFile", Numeric, Continuous, << Set Each Value(
	Col Cumulative Sum(1, :g1, :g2, :i); 
));
*/

/*
// if we don't open g1+g2 at all
new_col = dt << New Column("ValidFile", Numeric, Continuous, << Set Each Value(
	Col Max(Col Sum(1, :g1, :g2, :i), :g1, :g2)
));
*/

rows_to_open = dt << Get Rows Where(:ValidFile == 1);
filepaths_to_open = dt[rows_to_open, "p"];
show(filepaths_to_open);
// dt << Delete Column(new_col);

// To open files loop over filepaths_to_open
stop();
file_list = {};
For Each({file_path}, filepaths_to_open,
	Insert Into(file_list, Open(file_path));
);

jthi_0-1675116098849.png

Idea is that we pick the rows where ValidFile is 1 and then open those files. This image demonstrates the calculations for the three different formulas.

 

Edit:

Fixed old reference rows_to_open1 to rows_to_open

-Jarmo

View solution in original post

14 REPLIES 14
Ressel
Level VI


Re: How to loop through rows from groups created by 2 columns

Hi, the below at least starts what you want to do. You will only have to change the "Print( dtTemp:path[j], dtTemp:path[j + 1] )" part with the function to open a file.

Unfortunately, however, the way I wrote it, this script prints "file path 6" twice to the log, so it will also open "file path6" twice if modified accordingly. I am sure there is a way around it (like checking whether a file path has already been opened maybe), but it is getting late here. Anyway, if I figure out how to do the rest as well, I'll let you know. Feedback is appreciated!

 

dt = New Table( "today's challenge",
	Add Rows( 7 ),
	New Column( "group1",
		Character,
		"Nominal",
		Set Values( {"a", "a", "a", "a", "c", "c", "c"} )
	),
	New Column( "group2",
		Character,
		"Nominal",
		Set Values( {"A", "A", "B", "B", "A", "A", "A"} ),
		Set Display Width( 70 )
	),
	New Column( "item",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 3, 2, 4, 5, 7, 9] )
	),
	New Column( "path",
		Character,
		"Nominal",
		Set Selected,
		Set Values(
			{"file path 1", "file path 2", "file path 3", "file path 4",
			"file path 5", "file path 6", "file path 7"}
		)
	)
);

dt << New Column( "PairedGroup", formula( :group1 || :group2 ) );

PairedGroupList = Associative Array( dt:PairedGroup ) << Get Keys;

dt << Clear Column Selection;

For( i = 1, i <= N Items( PairedGroupList ), i++,
	dt << Select Where( :PairedGroup == PairedGroupList[i] );
	dtTemp = dt << Subset( Invisible, Selected Rows );
	For( j = 1, j <= N Rows( dtTemp ), j++,
		If( dtTemp:item[j] != dtTemp:item[j + 1],
			Print( dtTemp:path[j], dtTemp:path[j + 1] )
		)
	);
	Close( dtTemp, no save );
);

 

jthi
Super User


Re: How to loop through rows from groups created by 2 columns

@Ressel 

I think it would be easiest to re-think the looping structure, but you could make the current one also work they way you want without double printing: keep a list of what has been printed or figure out some sort of additional logic to prevent that from happening . I think the idea below should work:


For( i = 1, i <= N Items( PairedGroupList ), i++,
	dt << Select Where( :PairedGroup == PairedGroupList[i] );
	dtTemp = dt << Subset( Invisible, Selected Rows );
	For( j = 1, j < N Rows( dtTemp ), j++,
		If( dtTemp:item[j] != dtTemp:item[j + 1],
			Print(dtTemp:path[j]);
			If(j + 1 == N Rows( dtTemp ),
				Print(dtTemp:path[j + 1]);
			)
		)
	);
	Close( dtTemp, no save );
);

For example here you wouldn't loop over the last value, but check if the next value would be last, if it would be then print that also (if you did printthe current rows one).

-Jarmo
Ressel
Level VI


Re: How to loop through rows from groups created by 2 columns

@jthi, neat, as always. Good learning, thank you.

nthai
Level III


Re: How to loop through rows from groups created by 2 columns

Hi Ressel,

 

I've tried your solution, indeed it printed "file path6" twice as you mentioned. Another problem i saw is when i purposely changed row 5 item value from 5 to 7 in order to have 2 identical rows, it's printed the last row ("file path 6"). Is there any way to avoid that? I'm seeing alternative for loop from jthi also have same issue.

nthai_0-1675138466871.pngnthai_1-1675138484787.png

 

StarfruitBob
Level VI


Re: How to loop through rows from groups created by 2 columns

Please investigate NChooseK Matrix() and see if it's right for you.  In the scripting index it's described as: Creates a matrix of nChooseK(n,k) rows and K columns forming all the combinations of k integers from 1 to n.

 

I wonder if it can be changed to work with character data types?

 

Names Default To Here( 1 );
Print( NChooseK Matrix( 5, 3 ) );

__________________________
Prints:
[	1 2 3, 
	1 2 4, 
	1 2 5, 
	1 3 4, 
	1 3 5, 
	1 4 5, 
	2 3 4, 
	2 3 5, 
	2 4 5, 
	3 4 5]
Learning every day!
jthi
Super User


Re: How to loop through rows from groups created by 2 columns

I'm not exactly sure I understand the problem correctly but below are three different options using, Col Sum/Col Cumulative Sum and Col Max(ColSum())

Names Default To Here(1);

dt = New Table("Untitled",
	Add Rows(8),
	Compress File When Saved(1),
	New Column("g1", Character, "Nominal", Set Values({"a", "a", "a", "a", "a", "c", "c", "c"})),
	New Column("g2", Character, "Nominal", Set Values({"A", "A", "B", "B", "B", "A", "A", "A"})),
	New Column("i", Numeric, "Continuous", Format("Best", 12), Set Values([1, 3, 2, 4, 4, 5, 7, 9])),
	New Column("p",
		Character,
		"Nominal",
		Set Values({"filepath1", "filepath2", "filepath3", "filepath4", "filepath4.1", "filepath5", "filepath6", "filepath7"})
	)
);

// depending how we handle duplicated g1+g2+i we would use different formulas
// if we won't open any
new_col = dt << New Column("ValidFile", Numeric, Continuous, << Set Each Value(
	Col Sum(1, :g1, :g2, :i); 
));

/*
// if we open the first one
new_col = dt << New Column("ValidFile", Numeric, Continuous, << Set Each Value(
	Col Cumulative Sum(1, :g1, :g2, :i); 
));
*/

/*
// if we don't open g1+g2 at all
new_col = dt << New Column("ValidFile", Numeric, Continuous, << Set Each Value(
	Col Max(Col Sum(1, :g1, :g2, :i), :g1, :g2)
));
*/

rows_to_open = dt << Get Rows Where(:ValidFile == 1);
filepaths_to_open = dt[rows_to_open, "p"];
show(filepaths_to_open);
// dt << Delete Column(new_col);

// To open files loop over filepaths_to_open
stop();
file_list = {};
For Each({file_path}, filepaths_to_open,
	Insert Into(file_list, Open(file_path));
);

jthi_0-1675116098849.png

Idea is that we pick the rows where ValidFile is 1 and then open those files. This image demonstrates the calculations for the three different formulas.

 

Edit:

Fixed old reference rows_to_open1 to rows_to_open

-Jarmo
nthai
Level III


Re: How to loop through rows from groups created by 2 columns

Hi jthi,

 

Thank you for such a great answer with 3 different options. One of side note is i have to remove "1" from rows_to_open1 variable in order to make the code running. If you can correct this typo for future reference, that would be great!

 

Other than that, i have a question related to your answer. What is the purpose of value "1" in each of Col Sum, Col Cumulative Sum and Col Max(Col Sum? And i don't get why we can sum character value from columns g1 and g2?

 

jthi
Super User


Re: How to loop through rows from groups created by 2 columns

Edited my message to rename old variable. We aren't adding up character columns, we are using them from grouping (<byVar>). And we use 1 so we have something that will increase by 1 for each row in the group. It might be a good idea to create different types of formulas by using those functions to make it easier to understand how they work, scripting index also provides some examples.

Col Sum(name,<By var, ...>) 

-Jarmo
nthai
Level III


Re: How to loop through rows from groups created by 2 columns

Thank you jthi! One last question, how am i supposed to further modify formula to ignore if those columns g1, g2 and i only have 1 combination of valid file?

nthai_0-1675147621046.png