Loop? to count missing data from work with several columns and with column name...

Markku · Jun 9, 2023 3:02 PM

I have a question that I hope this community can help me with. I wish to add a column showing the consecutive days missing from the last day and backwards.

I also wish to add a column counting the days where the result is higher than 0.02 from 7 columns.

I have found a solution for both by using a series of if statements but wish to learn if it can be done in a more elegant way, maybe by using a loop.

Lastly the dates in the table is changing week to week so I wish this formula to be independent of the column names. Is this possible?

I have attached a similarly looking data table, with my solutions.

Thank you in advance

/Markku

txnelson · Dec 30, 2020 09:27 AM

Here is the alternative formula that I came up with

If( Row() == 1,
	start = 2;
	end = 8;
);
count = end;
While( Is Missing( Column( count )[Row()] ) == 1 & count >= start, count-- );
end - count;

Jim

ron_horne · Jan 2, 2021 09:22 PM

Dear @Markku ,

i also struggle to create dynamic scripts and parse them into a column formula. In the past i had a similar situation to yours, where the number of repeated measures (in your case batches repeat each day) could change from one observation (batch) to another.

my solution may look a bit rough for your needs but i think it should work.

please do check if it robust to different scenarios you may have before usage.

with your data table open run this script and inspect the results.

Names Default To Here( 1 );
dt = Current Data Table();

// get list of columns 
Cols = dt << get column names ( Continuous );
maxlist = {};

For( i = 1, i <= N Items( cols ), i++,
	If( Contains( cols[i], "Max" ),
		Insert Into( maxlist, cols[i] )
	)
);


dt2 = dt << Stack(
	columns(maxlist	),
	Source Label Column( "Time" ),
	Stacked Data Column( "Data" )
);

dt2 << Sort(
	By( :Batch#, :Time ),
	Order( Ascending, Descending ),
	replace table
);


dt2 <<New Column( "Index",
	Numeric,
	"Continuous",
	Format( "Best", 12 ),
	Formula( Sum( :Batch#[Index( 1, Row() )] == :Batch# ) )
);

dt2:Index << Delete Formula;


dt2 << Select where ( and ( ismissing (:Data), :Index == colmin (:Index, :Batch#)  ) );
dt2 << Go To (:Batch#);
dt2 << Select Matching Cells();
dt2 << invert row selection ();
dt2 << delete rows ();


dt2 << select where (! ismissing (:Data))<< delete rows ();

dt2 <<New Column( "Index2",
	Numeric,
	"Continuous",
	Format( "Best", 12 ),
	Formula( Sum( :Batch#[Index( 1, Row() )] == :Batch# ) )
);
dt2:Index2 << Delete Formula;

dt2 << select where (:Index > :Index2  ) << delete rows ();

dt3 = dt2 << Summary(
	Group( :Batch# ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 ),

);

close (dt2, no save);

dt3:N Rows << set name ("No of last days missing");

dt << new column ("No of last days missing", set each value (0));

dt << Update(
	With( dt3 ),
	Match Columns( :Batch# = :Batch# )
);

close (dt3, no save);

//
dt5 =  dt << Stack(
	columns(maxlist	),
	Source Label Column( "Time" ),
	Stacked Data Column( "Data" )
);

dt5 << select where (ismissing (:Data)) << delete rows;
dt5 << select where (:Data<=0.02) << delete rows;

dt6 = dt5 << Summary(
	Group( :Batch# ),
	Freq( "None" ),
	Weight( "None" ),
	Link to original data table( 0 )
);

close (dt5, no save);

dt6:N Rows << set name ("Days over 0.02");

dt << new column("Days over 0.02", set each value(0));
dt<< Update(
	With( dt6 ),
	Match Columns( :Batch# = :Batch# )
);

close (dt6, no save);

let us know if it works,

Ron

jthi · Jan 3, 2021 04:24 AM

My solution is most likely a bit more complicated than necessary (full script attached with example_new table creation). This solution expects the columns of interest to be named with "Max(something" (and should work even when there are more than seven "Max(" columns).

These both formula columns will loose their ability to re-calculate on data change and you will have to rerun them "manually" from tables red triangle if data changes on the datatable:

Formulas for columns:

//modified idea based on of @txnelson solution
tempCol = dt << New Column("# last days missing_community", Numeric, Nominal,
	Formula(
		Local({
			//get information on column names
			dt = Current Data Table(),
			colNames = dt << Get Column Names(String)
			},
			//get list of column names with only Max( in name
			For(i = N Items(colNames), i > 0, i--,
				If(!Contains(colNames[i], "Max("),
					Remove From(colNames, i);
				)
			);
			//get values on rows
			//this might make a mistake if columns before Max( have are missing values
			matLength = N Items(colNames) + 1;
			rowValues = dt[Row(), 0][1::matLength];
			count = 0;
			//calculate missing values from the end of matrix and stop on first found value
			While(Is Missing(rowValues[matLength]), count++; matLength--);
			count;
		)
	)
);

dt << New Column("#days higher than 0,02_community", Numeric, Nominal,
	Formula(
		Local({
			//get information on column names
			dt = Current Data Table(),
			colNames = dt << Get Column Names(String)
			},
			//get values on row and set values of colums without Max( in name to 0
			rowVals = dt[Row(), 0];
			For(i = N Items(colNames), i > 0, i--,
				If(!Contains(colNames[i], "Max("),
					rowVals[i] = 0;
				)
			);
			//create matrix with 0 and 1 based on if value is over 0.02
			rowVals_over = rowVals > 0.02;
			//get sum of previous matris to calculate Max( columns with values over 0.02
			Sum(rowVals_over);
		);
	)
);

-Jarmo

Loop? to count missing data from work with several columns and with column names which change names between the analysis (dates)

Re: Loop? to count missing data from work with several columns and with column names which change names between the analysis (dates)

Re: Loop? to count missing data from work with several columns and with column names which change names between the analysis (dates)

Re: Loop? to count missing data from work with several columns and with column names which change names between the analysis (dates)