Here is another approach that interpolates the dropouts, using several (relatively simple) formula columns. Many of these could be eliminated by using a more complex interpolation formula, at the expense of easy troubleshooting and formula-writing. (See the attached table for formulas and graph scripts.)
The results seem consistent with what we want, at least on the sample dataset (red points will be dropped):
Here is how it works so you can judge whether the assumptions made hold for your data:
First, the zeros occurring due to "dropouts" are interpolated. Linear interpolation was chosen; splines and quadratics were considered, but their results were inferior to my eye.
Note the "Cum Zeros Behind" through "Interpolated Power" columns in rows 4-6 below.
"Cum Zeros Behind" is an inclusive lookback for zeros.
"Cum Zeros Ahead" is an inclusive look-ahead for zeros.
"Zero Block" is the number of contiguous zeros in the block.
"Groups for Zeros" starts at 1 and increases whenever a new group of zeros occurs.
Interpolation occurs when there is a short (fewer than 7) block of zeros, not starting on row 1.
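Since the attached table's formulas aren't reproduced inline here, a rough Python sketch of the interpolation step may help: find each run of consecutive zeros and, when the run is shorter than 7 samples and doesn't start at row 1, replace it with values linearly interpolated between the neighboring nonzero readings. (The function name is mine; the 7-zero cutoff is from the post.)

```python
def interpolate_dropouts(power, max_run=6):
    """Linearly interpolate short zero runs ("dropouts") in a power series."""
    out = list(power)
    i, n = 0, len(out)
    while i < n:
        if out[i] == 0:
            j = i
            while j < n and out[j] == 0:
                j += 1
            run = j - i
            # Interpolate only short runs with nonzero data on both sides
            if run <= max_run and i > 0 and j < n:
                lo, hi = out[i - 1], out[j]
                for k in range(run):
                    out[i + k] = lo + (hi - lo) * (k + 1) / (run + 1)
            i = j
        else:
            i += 1
    return out
```

Runs of 7 or more zeros, and any zeros at the very start of the data, are left untouched, mirroring the "true break" rule above.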
**Note:** In the sample data, the longest interpolated dropout spanned 4 zeros, while the shortest "true" break between intervals was 7 zeros. That is a slim margin for error, and a potentially significant drawback of this approach: it is important that the shortest "true" break be longer than the longest dropout.
As mentioned previously, you don't need most of these columns, although the formula for the interpolation will grow much uglier without them, and troubleshooting will be (much) more difficult.
After interpolation, only blocks of at least 7 consecutive zeros remain in the data (zeros possibly occurring in the first few rows excepted).
The "Interval" column groups the data. It starts at 1 and increments whenever:
- the timestamp changes by more than 100, or
- interpolated power moves from 0 to some nonzero value.
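The grouping rule above can be sketched in Python like so (the column/function names are mine; the 100-unit timestamp gap is from the post):

```python
def interval_column(timestamps, power):
    """Build the "Interval" grouping column: start at 1, increment on a
    timestamp gap > 100 or when power rises off zero."""
    intervals = [1]
    for i in range(1, len(timestamps)):
        new_interval = (timestamps[i] - timestamps[i - 1] > 100) or \
                       (power[i - 1] == 0 and power[i] != 0)
        intervals.append(intervals[-1] + (1 if new_interval else 0))
    return intervals
```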
The "Post-spike" column is a flag that is set during the power's ramp-down from the spike to a steady(ish) state.
It is (re)set to zero:
- at row 1
- whenever the interval column changes
- whenever a new block of zeros occurs
It is set to 1 whenever interpolated power drops from a value above 90 to a value of 90 or less. Based on the sample data, this threshold seemed reasonable.
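A sketch of the flag's set/reset logic, assuming the interval column has already been computed (the 90 threshold is from the post; the function name is mine):

```python
def post_spike_column(power, intervals, threshold=90):
    """Build the "Post-spike" flag: reset on interval change or a new
    zero block, latch to 1 on a downward crossing of the threshold."""
    flags = [0]  # reset at row 1
    for i in range(1, len(power)):
        if intervals[i] != intervals[i - 1] or \
           (power[i] == 0 and power[i - 1] != 0):
            flags.append(0)                 # interval change / new zero block
        elif power[i - 1] > threshold and power[i] <= threshold:
            flags.append(1)                 # ramp-down from the spike begins
        else:
            flags.append(flags[-1])         # otherwise latch previous state
    return flags
```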
We can now decide which data to ignore; that is the purpose of the "Ignore" column. We ignore data when any of the following are true:
- The maximum over the interval is < 400 (this is the "background noise" level; again, based on the provided data).
- Interpolated power moves above 100 post-spike (indicating we have moved into background noise rather than ramping down towards 0).
- Interpolated power is 0
"Final Power" is the same as "Interpolated Power", but set to missing when "Ignore" == 1.
"Interval Time" computes time relative to the interval (i.e., starts at 0 and increments from there). Like "Final Power", it is set to missing whenever "Ignore" == 1.
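The ignore test and the two masked columns can be sketched together in Python (thresholds 400 and 100 are from the post; the helper name and use of None for "missing" are my own):

```python
def final_columns(timestamps, power, intervals, post_spike):
    """Build "Final Power" and "Interval Time", masked where the
    "Ignore" conditions hold."""
    # Peak power per interval, used for the "background noise" test
    interval_max = {}
    for iv, p in zip(intervals, power):
        interval_max[iv] = max(interval_max.get(iv, 0), p)
    final_power, interval_time = [], []
    for t, p, iv, ps in zip(timestamps, power, intervals, post_spike):
        ignore = interval_max[iv] < 400 or (ps == 1 and p > 100) or p == 0
        start = timestamps[intervals.index(iv)]  # timestamp at interval start
        final_power.append(None if ignore else p)
        interval_time.append(None if ignore else t - start)
    return final_power, interval_time
```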
I did not disable formula evaluations and subset out the non-ignored data, but this is a likely next step.
It will be interesting to see how Dan breaks this next.
Cheers,
Brady