I have an auto-generated dataset and some of the columns, those that have a substring "X_Clean" in it, will have some null values. Sometimes there are 2, 3, 4 or even 5 different columns, depends on the dataset. I want to delete the rows only when ALL "n" of these "X_Clean" columns have null values but never if at least one of them is populated with a value.
I was thinking of doing a FOR loop to generate the column list of interest, and this works.
col_list = dt <<get column names(string);
for(i=nitems(col_list), i>0, i--,
if(!contains(col_list[i], "X_Clean"),
remove from(col_list, i);
));
But then I'm not sure of the next step. Because I can't do another FOR loop for each individual column, I need to either summarize the values of all these columns and if null then delete...or something else?
Here is an example of my table, in this case only the 3rd line of date would qualify to be deleted.
Entity | 2X_Clean | 3X_Clean | 4X_Clean |
A | 1 | 3 | 6 |
B | | 4 | 5 |
A | | | |
B | 4 | 1 | 3 |
A | 3 | 4 | 4 |
6 | 1 | 1 | 0 |
Here is how I would have done it if I could hard code specific columns, except I can't because they vary.
dt << Select Where(Is Missing(:Name( "2X_Clean" ) | :Name( "3X_Clean" ) | :Name( "4X_Clean" )))<<Delete Rows;
Any ideas? Thanks!