Re: Replacing missing data by regressed values - Page 3

guilhem · Jul 8, 2016 10:39 AM

Hi

I have a time series for which the variable has missing values. I would like to replace the missing values by a linear model derived from the closest available values in time.

What would be the best way?

And if the script could do that to all the columns of my table (except the time), that would be even better

Thanks

Example

What I have

t	Output
0	10
1	.
2	.
4	30
10	.
20	10

What I want:

t	Output
0	10
1	15
2	20
4	30
10	22.5
20	10

txnelson · Apr 3, 2020 04:21 PM

If you want to make the value in the Interpreted column to be equal to 0 whenever the :Time value is 0, a change to the code something like below would work

For( i = 1, i <= N Rows( values ), i++,
	If( :Time[i] == 0, values[i]=0);  // **************
	If( Is Missing( values[i] ),
		counter++,
		If( counter > 0,
			For( k = 1, k <= counter, k++,
				base = values[i - counter - 1];
				incr = Abs( (base - values[i]) / (counter + 1) );
				If( base < values[i],
					values[(i - counter - 1) + k] = base + k * incr,
					values[(i - counter - 1) + k] = base - k * incr
				);
			);
			counter = 0;
		)
	)
);

Jim

tatersalad · Apr 3, 2020 09:01 PM

Jim,

Thanks so much. this works really well now. can you explain to me what the last "values[Row()];" does

txnelson · Apr 3, 2020 11:26 PM

The way JMP formulas work, along with JMP functions, is that the last variable created or referenced, is what is passed from the formula or function. Therefore, in your replacement formula, what is returned as the rows value, is the value from the "Values" matrix equal to the Row() number. And since the "Values" matrix is where the columns values had been gotten with the << get values message, and the interpolations were made on the "Values" matrix, by specifying
values[Row()]
the formula is moving the values one row at a time into the new column.
Another way to explain this, is to go through the process flow of the formula
All formulas loop through all rows in a data table from row 1 through the last row in the data table.
On the processing of row 1, all of the values from column "p (g/L)" are moved into a matrix for processing. Processing loops from element 1 in the matrix, through the end of the matrix, performing interpolations when necessary. By the end of the looping, all elements of the matrix are filled in. Then the value from the first element in the "Values" matrix, is returned to the formula as the value to be used for the 1st row in the column.
Then the row value is moved to run the formula for the 2nd row, and the only statements executed for the 2nd row is
values[Row()]
which returns to the formula the value of the columns 2 row, which is element 2 from the "Values" matrix.
The formula continues looping once for each subsequent row, only executing the statement
values{Row()]

I hope this makes sense

Jim