Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Replacing missing data by regressed values

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Replacing missing data by regressed values

Jul 8, 2016 7:39 AM
(4163 views)

Hi

I have a time series for which the variable has missing values. I would like to replace the missing values by a linear model derived from the closest available values in time.

What would be the best way?

And if the script could do that to all the columns of my table (except the time), that would be even better

Thanks

Example

What I have

t | Output |
---|---|

0 | 10 |

1 | . |

2 | . |

4 | 30 |

10 | . |

20 | 10 |

What I want:

t | Output |
---|---|

0 | 10 |

1 | 15 |

2 | 20 |

4 | 30 |

10 | 22.5 |

20 | 10 |

22 REPLIES 22

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

try this for starters

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

Thanks Ron but I don’t see anything in this that does interpolation?

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

here is a script that will do the task

`Names Default To Here( 1 ); dt = Current Data Table(); For( rows = 1, rows <= N Rows( dt ), rows++, If( Rows == 1, counter = 0 ); If( Is Missing( dt:output[rows] ) == 1, counter = counter + 1, If( counter > 0, incr = (dt:output[Rows] - dt:output[Rows - counter - 1]) / (counter + 1); For( i = Rows - counter, i <= Rows - 1, i++, dt:output = dt:output[i - 1] + incr ); counter = 0; ) ); );`

```
// Set all default referenced memory variables to this unique application
Names Default To Here( 1 );
// Set the variable dt to be a pointer to the current active data table
dt = Current Data Table();
// Loop through the data table from row 1 to the last row and use a
// variable names "rows" as the index
For( rows = 1, rows <= N Rows( dt ), rows++,
// If this is row 1 initialize a variable named counter to 1
If( Rows == 1, counter = 0 );
// If the current row's value for the column named "Output" in the
// data table referenced by "dt" is missing (blank, null) increase
// the counter
If( Is Missing( dt:output[rows] ) == 1,
counter++
);
// If row is found with a non missing value for the column called Output,
// and the counter is > 0, which indicates previous rows with missing
// values, then calculate the required values and place them in their
// appropriate rows
If( counter > 0 & Is Missing( dt:output[rows] ) == 0,
// Calculate the incremental steps to use for the missing cells
incr = (dt:output[Rows] - dt:output[Rows - counter - 1]) / (counter + 1);
// Loop across the rows with missing values and fill in the data
For( i = Rows - counter, i <= Rows - 1, i++,
dt:output[i] = dt:output[i - 1] + incr
);
// reinitialize the counter
counter = 0;
);
);
```

Jim

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

I had a similar problem some years ago and wrote a script to do imputation by "local regression".

Here's an updated version:

```
dt = Current Data Table();
n = N Rows(dt);
x = Column(1);// Column("t")
For(i = 2, i <= N Col(dt), i++,
y = Column(i);
For Each Row(
If(1 < Row() < n & Is Missing(y[]),
// find previous non-missing row
Eval(Eval Expr(r1 = Max(dt << get rows where(!Is Missing(y[]) & Row() < Expr(Row())))));
// find next non-missing row
Eval(Eval Expr(r2 = Min(dt << get rows where(!Is Missing(y[]) & Row() > Expr(Row())))));
y[] = y[r1] + (y[r2] - y[r1]) / (x[r2] - x[r1]) * (x[] - x[r1]);
)
)
);
```

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

Spot on MS,

Cheers!

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

txnelson,

could you update the script with comments? I am new to JSL and need help understanding how it applies to my problem. I have the same need as OP but only have one column that i need to fill in. alternatively, I have used the following VBA code in the past in to fill in missing data in excel. I was wondering if there is something similar to the End(xlDown).value function in JSL. the offset is used to specify where the time point data is. Thanks!

```
Sub interpolatedata()
For Each cell In Range("k4:k29")
If cell.Value = "" Then
y2 = cell.End(xlDown).Value
x2 = cell.End(xlDown).Offset(0, -8).Value
y1 = cell.End(xlUp).Value
x1 = cell.End(xlUp).Offset(0, -8).Value
x = cell.Offset(0, -8).Value
y = (y1) + (x - x1) * (y2 - y1) / (x2 - x1)
cell.Value = y
End If
Next
```

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

Here is an updated/annotated version of my original code. All you should have to do, is to change the column name from "Output" to whatever the name of the column you want to change values to.

```
// Set all default referenced memory variables to this unique application
Names Default To Here( 1 );
// Set the variable dt to be a pointer to the current active data table
dt = Current Data Table();
// Loop through the data table from row 1 to the last row and use a
// variable names "rows" as the index
For( rows = 1, rows <= N Rows( dt ), rows++,
// If this is row 1 initialize a variable named counter to 1
If( Rows == 1, counter = 0 );
// If the current row's value for the column named "Output" in the
// data table referenced by "dt" is missing (blank, null) increase
// the counter
If( Is Missing( dt:output[rows] ) == 1,
counter++
);
// If row is found with a non missing value for the column called Output,
// and the counter is > 0, which indicates previous rows with missing
// values, then calculate the required values and place them in their
// appropriate rows
If( counter > 0 & Is Missing( dt:output[rows] ) == 0,
// Calculate the incremental steps to use for the missing cells
incr = (dt:output[Rows] - dt:output[Rows - counter - 1]) / (counter + 1);
// Loop across the rows with missing values and fill in the data
For( i = Rows - counter, i <= Rows - 1, i++,
dt:output[i] = dt:output[i - 1] + incr
);
// reinitialize the counter
counter = 0;
);
);
```

Jim

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

thanks it worked! you commented

`If this is row 1 initialize a variable named counter to 1`

, did you mean "counter to 0"?

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Replacing missing data by regressed values

Good catch....it should be

`If this is row 1 initialize a variable named counter to 0`

Jim