cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
guilhem
Level II

Replacing missing data by regressed values

Hi

I have a time series for which the variable has missing values. I would like to replace the missing values by a linear model derived from the closest available values in time.

What would be the best way?

And if the script could do that to all the columns of my table (except the time), that would be even better

Thanks

Example

What I have

tOutput
010
1.
2.
430
10.
2010

What I want:

tOutput
010
115
220
430
1022.5
20

10

22 REPLIES 22
tatersalad
Level II

Re: Replacing missing data by regressed values

Jim,

 

is there a difference between "rows" and "Rows"? could you please explain it. I'm a noob. 

txnelson
Super User

Re: Replacing missing data by regressed values

JMP is not case specific in functions and statements, so rows=Rows
Jim
tatersalad
Level II

Re: Replacing missing data by regressed values

Jim,

 

Thank you for clarifying. I;m having a problem with the script. it wont execute and interpolate the last interval.  i cant tell if it has something to do with the nesting of the loops or the "while" conditions. basically regardless of table length it cant fill in the last interval of missing data. 

col_name_list = Current Data Table() << get column names( string );      
If( !Contains( col_name_list, "Interpolated p (g/L)" ),      
	Current Data Table() << New Column(      
		"Interpolated p (g/L)",      
		Numeric,     
		Continuous,     
		Formula(
			If(
				Is Missing( :Name( "time (h)" ) ), 
					Empty(),
				:name( "time (h)" ) == 0,
					0,
				!Is Missing( :Name( "p (g/L)" ) ), 
				:name( "p (g/L)" ), 	
				dt = Current Data Table();
				For( rows = 1, rows <= (N Rows( dt )), rows++, 
					If( Rows == 1, counter = 0 );
					If( Is Missing( dt:name( "Interpolated p (g/L)" )[rows] ) == 1,
						counter++);
					If( counter > 0 & Is Missing( dt:name( "Interpolated p (g/L)" )[rows] ) == 0, 
					
						incr = (dt:name( "Interpolated p (g/L)" )[Rows] - dt:name( "Interpolated p (g/L)" )[Rows - counter - 1]) / (dt
						:name( "time (h)" )[Rows] - dt:name( "time (h)" )[Rows - counter - 1]);
						For( i = rows - counter, i <= rows - 1, i++,
							dt:name( "Interpolated p (g/L)" )[i] = dt:name( "Interpolated p (g/L)" )[i - 1] + (incr * (dt:name( "time (h)" )[i]
							-dt:name( "time (h)" )[i - 1]))
						);
						
						counter = 0;
					);
				);
			)
		)
			
	)
);

 

tatersalad
Level II

Re: Replacing missing data by regressed values

heres the table

txnelson
Super User

Re: Replacing missing data by regressed values

You are running into a recursive issue in the formula, when you try to change values within the formula column itself.  I have reworked your formula to do all of the work upfront using a matrix.  The resulting values are then used for each rows values 

Names Default To Here( 1 );

col_name_list = Current Data Table() << get column names( string );
If( !Contains( col_name_list, "Interpolated p (g/L)" ),
	Current Data Table() << New Column( "Interpolated p (g/L)",
		Numeric,
		Continuous,
		Formula(
			dt = Current Data Table();
			If( Row() == 1,
				counter = 0;
				values = dt:Name( "p (g/L)" ) << get values;
				If( Is Missing( values[1] ),
					values[1] = 0
				);
				counter = 0;
				For( i = 1, i <= N Rows( values ), i++,
					If( Is Missing( values[i] ),
						counter++,
						If( counter > 0,
							For( k = 1, k <= counter, k++,
								base = values[i - counter - 1];
								incr = Abs( (base - values[i]) / (counter + 1) );
								If( base < values[i],
									values[(i - counter - 1) + k] = base + k * incr,
									values[(i - counter - 1) + k] = base - k * incr
								);
							);
							counter = 0;
						)
					)
				);
			);
			values[Row()];
		)
			
	)
);
Jim
tatersalad
Level II

Re: Replacing missing data by regressed values

Jim,

 

Thanks so much i'll give it a shot. I'm curious though; going back to the original script i thought the recursive error came from row 3, because of it would go to calculate incr and get rows-counter-1 = 3-2-1 =0. i'd ignore the error message and it successfully interpolates the rest of the column, except for the very end.  adding more data to the table to be be interpolated moves the problem down. i thought they might be separate issues. 

txnelson
Super User

Re: Replacing missing data by regressed values

From my experiences, I have had too many issues when attempting to reference the actual formula column within the formula. Thus, the approach shown. I also was concerned with your formula from the standpoint of the number of times you are running through the data with your For() loop. My method steps through the data table only once....not counting the iteration replacement.
Jim
tatersalad
Level II

Re: Replacing missing data by regressed values

Jim,

 

I got the script to work. I had to add some math so as to account for the delta time for both the incr calculation for interpolation evaluation loop. final problem is that since i am planning on joining multiple run files together and running the script, I'll have multiple spots where time = 0 (time: 0,12,24,48,0,12,24,48...), thus the first interpolated value should be 0 for the given chunk of data. I'm not sure how to do this with the matrix notation. 

txnelson
Super User

Re: Replacing missing data by regressed values

From the code I supplied you, is this what you are asking about?


				If( Is Missing( values[1] ),
					values[1] = 0
				);
Jim
tatersalad
Level II

Re: Replacing missing data by regressed values

that does it for the first value, but what i need is everything "Time" is 0, the interpolated value is zero. the time column will have multiple 0's