cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
guilhem
Level II

Replacing missing data by regressed values

Hi

I have a time series for which the variable has missing values. I would like to replace the missing values by a linear model derived from the closest available values in time.

What would be the best way?

And if the script could do that to all the columns of my table (except the time), that would be even better

Thanks

Example

What I have

tOutput
010
1.
2.
430
10.
2010

What I want:

tOutput
010
115
220
430
1022.5
20

10

22 REPLIES 22
txnelson
Super User

Re: Replacing missing data by regressed values

If you want to make the value in the Interpreted column to be equal to 0 whenever the :Time value is 0, a change to the code something like below would work

For( i = 1, i <= N Rows( values ), i++,
	If( :Time[i] == 0, values[i]=0);  // **************
	If( Is Missing( values[i] ),
		counter++,
		If( counter > 0,
			For( k = 1, k <= counter, k++,
				base = values[i - counter - 1];
				incr = Abs( (base - values[i]) / (counter + 1) );
				If( base < values[i],
					values[(i - counter - 1) + k] = base + k * incr,
					values[(i - counter - 1) + k] = base - k * incr
				);
			);
			counter = 0;
		)
	)
);
Jim
tatersalad
Level II

Re: Replacing missing data by regressed values

Jim,

 

Thanks so  much. this works really well now. can you explain to me what the last "values[Row()];" does

txnelson
Super User

Re: Replacing missing data by regressed values

The way JMP formulas work, along with JMP functions, is that the last variable created or referenced, is what is passed from the formula or function. Therefore, in your replacement formula, what is returned as the rows value, is the value from the "Values" matrix equal to the Row() number. And since the "Values" matrix is where the columns values had been gotten with the << get values message, and the interpolations were made on the "Values" matrix, by specifying
values[Row()]
the formula is moving the values one row at a time into the new column.
Another way to explain this, is to go through the process flow of the formula
All formulas loop through all rows in a data table from row 1 through the last row in the data table.
On the processing of row 1, all of the values from column "p (g/L)" are moved into a matrix for processing. Processing loops from element 1 in the matrix, through the end of the matrix, performing interpolations when necessary. By the end of the looping, all elements of the matrix are filled in. Then the value from the first element in the "Values" matrix, is returned to the formula as the value to be used for the 1st row in the column.
Then the row value is moved to run the formula for the 2nd row, and the only statements executed for the 2nd row is
values[Row()]
which returns to the formula the value of the columns 2 row, which is element 2 from the "Values" matrix.
The formula continues looping once for each subsequent row, only executing the statement
values{Row()]

I hope this makes sense
Jim