
JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

PowerCurve.png

Calling JMP workflow and JSL wizards: here's a Thursday challenge for you. The data set attached to this post is a set of power data from my running power meter, about 1000 points long. The goal is to identify and correctly tag the 8 motifs that occur in this signal, create a new column with the function ID, and write the solution so that it can automate the tagging of future data that follows the same envelope.

 

The data collection assumptions for this data set and the future are:

  • Prior to the start of a motif the power is 0.
  • There is a period before and after the motifs that will be non-zero (warm-up and warm-down). This is not part of the system under test and should not be tagged.
  • Each motif has an attack, sustain, decay and release. The shape of the function is identical from motif to motif. It goes from 0 to some peak over a period of samples and then ramps down to the baseline noise floor (walking power). The motif ID is over when the power reads 0 again.
  • There will be an unknown number of functions to identify in the future. There are 8 in the data set. The solution needs to be invariant to that.
  • The overall length of the functions as well as the peak can vary.

We are working on the left edge of the analytic workflow today (shown in green below). The data comes from a .csv file and, eventually, once we automate this workflow, from a folder of .csv files. Data Access is already in place; today's task is to perform the Data Blending and Cleaning tasks on this example file so that the data is ready when the workflow is expanded to other analytic capabilities in the future.

DataWorkflowChallenge.png

Leave your solutions in the comments.

 

 

Connect with me on LinkedIn: https://bit.ly/3MWgiXt
26 REPLIES

Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

It is surprising how far you can get with a single formula column:

 

dt << new column("Motif ID", nominal, formula(Col Cumulative Sum(
	Row() > 6 & Summation( i = 1, 6, Abs( Lag( :power, i ) ) ) == 0 & :power != 0
)));

However, there is still some cleanup to do... this approach creates a "0" group ID, which is the leading data, and does not cut off the last motif where we'd like. The "6" in the summation is because I designate a new group if at least 6 consecutive zeros are encountered prior to a power change.
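The hard-coded "6" can be lifted into a variable. The following is a minimal sketch, not part of the original reply: the name nZeros is hypothetical, and Eval( Eval Expr() ) is used so that the number is written into the stored formula instead of leaving a reference to a script variable.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Assumed parameter: how many consecutive zero-power rows must precede a motif start.
nZeros = 6;

// Eval Expr() replaces each Expr( nZeros ) with its current value (6 here),
// so the stored column formula contains a literal number.
Eval(
	Eval Expr(
		dt << New Column( "Motif ID", Nominal,
			Formula(
				Col Cumulative Sum(
					Row() > Expr( nZeros ) &
					Summation( i = 1, Expr( nZeros ), Abs( Lag( :power, i ) ) ) == 0 &
					:power != 0
				)
			)
		)
	)
);
```

With nZeros = 6 this should produce the same column as the formula above; changing the value changes how long a zero run has to be before a new group is started.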

 

There are of course many other ways to do this that are more sophisticated, involving comparisons to the background floors, attacks/decays, etc... but if a simple approach like this one works, great.

 

The script below incorporates this idea, but cleans up the dataset, as well. Ideally the values in the last few lines would be parameterized.

 

names default to here(1);

dt = current data table();

dtSub = dt << subset(selected rows(0), selected columns(0));

//create ID column
dtSub << new column("Motif ID", nominal, formula(Col Cumulative Sum(
	Row() > 6 & Summation( i = 1, 6, Abs( Lag( :power, i ) ) ) == 0 & :power != 0
)));
dtSub << run formulas; //make sure the formula has been evaluated before it is removed
dtSub:Motif ID << delete formula;

//the next 2 loops truncate the dataset, assuming it ends like the sample dataset ended.
for(i  = nrow(dtSub), abs(dtSub:power[i] - dtSub:power[i-1]) < 20, i--,
	dtSub << Delete Rows(i)
);

for(i  = nrow(dtSub), abs(dtSub:power[i] - dtSub:power[i-1]) > 5, i--,
	dtSub << Delete Rows(i)
);

//delete ID 0 group
dtSub << select where(:Motif ID == 0) << delete rows;

//delete 0 power rows, but not when they lie within a group
i = nrow(dtSub)-1;
while (i > 1,
	if(dtSub:Motif ID[i] < dtSub:MotifID[i+1] & dtSub:power[i] == 0,
		while( dtSub:power[i] == 0,
			dtSub << delete rows(i--);
		)
	);
	i--
);
	

 

 

brady_brady_0-1647542630658.png

 

Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

Nice, Brady! Great work. Interestingly, there are some data quality problems, too. I can fit a peak model to smooth those out. Maybe those points could be marked as missing, because they are not really 0, or interpolated?
Byron_JMP
Staff

Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

@brady_brady That's a pretty slick method for solving the problem.

I did it with three simple column formulas.

 

Byron_JMP_0-1647612669075.png

 

new column("time point", formula(If( Row() == 1,
	1,
	If( :power == 0 & Lag( :time point, 1 ) <= 50,
		Lag( :time point, 1 ) + 1,
		If( :power == 0,
			1,
			Lag( :time point, 1 ) + 1
		)
	)
)));

new column("segments", formula(
If(
	Row() == 1, 0,
	:time point == 1, Lag( :segments, 1 ) + 1,
	Lag( :segments, 1 )
)));


new column("cycle", formula(
If( :time point > 70 | :segments == 0,
	.,
	:segments
)));
JMP Systems Engineer, Health and Life Sciences (Pharma)

Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

Uh oh! A problem shows up when I try to scale this. We now have two days of data, so I ran Brady's script hoping to identify 16 motifs.

 

With new data coming in, I ran the script @brady_brady wrote. It correctly identified 16 motifs, but there is an oddity in the 8th one now. It looks like the algorithm is confused by the new data. I do have the time stamp; maybe that could help in making this scale?
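One way the time stamp might help is to label each session first, so that the warm-up of day 2 can be told apart from the last motif of day 1. This is only a sketch under assumptions that are not in the thread: the column name :timestamp is hypothetical and is assumed to hold a numeric date-time in seconds, and the one-hour gap is an arbitrary choice.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Assumed parameter: a pause longer than this many seconds starts a new session.
gapSeconds = 3600;

// :timestamp is a hypothetical column name; replace it with the real time column.
Eval(
	Eval Expr(
		dt << New Column( "Session", Nominal,
			Formula(
				Col Cumulative Sum(
					Row() == 1 |
					(:timestamp - Lag( :timestamp, 1 )) > Expr( gapSeconds )
				)
			)
		)
	)
);
```

The resulting Session column could then be used to run the tagging and the warm-up/warm-down trimming separately for each session.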

 

Warmup Of 2nd Day Problem.png
2days of Power.png

Off to try @Byron_JMP 's solution.


Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

Ooo! Nice @Byron_JMP . I like the formula column solution. Just tried to scale it. It fails in a different way than @brady_brady 's solution.

 

Motif 9 looks a little suspect...

2daysPowerByron.png


Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

Here it is with the third file concatenated with @Byron_JMP 's script. Looks like the same thing occurs. We're tagging part of the area to exclude as a motif. Good thing is that it seems to fail in the same spot or using the same failure mode. You can see it in the cycle group 9 and 18 and also in the sample/power plot.

3files.png
TimeSeriew9and18.png


Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

Assumptions matter. What are you willing to assume about the data? That's where your issues will lie.

 

1) how many cooldown/warmups are possible? Do you know the mean power for these? Does it vary?

2) Motifs can vary in length... will they, however, always be of similar length, whatever that is? In the sample data thus far, this has been the case... will it remain the case going forward?

3) What about zeros? Should they be included as the leading part of the motif, or not? All of them, a certain number of them, none of them?

4) What about zeros that occur mid-motif? Are these legitimate? What about non-zero points mid-motif? How can we tell whether these are legitimate? Should we try?

 

etc., etc., etc.

 

Cheers,

Brady

 

Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

Hmm. Good questions. Always good to explore assumptions when thinking about a data problem. And also ask the people who own the system that is generating the data!

 

1) Only one cooldown/warmup per session. The warmup doesn't really vary that much: the mean is ~250 ± 30 watts, unless I've had a few cups of coffee...

WarmupDistribution.png

2) As you can see I really can only hold my top speed for 6-10 seconds and then the power starts dropping. Maybe in the future they will be longer/wider. We'll have to explore that.

3) Zeros after the walking period shouldn't be included. They are a reset period. It's largely 0 watts prep from start. Sprint for a period of time. Slow down to a stop. Walk back. Return to zero for the set of the next interval.

4) Zeros during the motif are bad data and should be interpolated. I've not stopped. It's just that the sensor has dropped out. They should be fixed, marked as missing or interpolated through otherwise I could see it causing some issues with the subsequent analysis.
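The treatment of mid-motif zeros in point 4 could be sketched as follows. This is not from the thread; it assumes that a run of at most maxGap zeros with non-zero power on both sides is a sensor dropout, which also catches short dropouts in the warm-up. maxGap and the column name "power interp" are hypothetical.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Assumed parameter: zero runs this short are sensor dropouts, longer runs are real resets.
maxGap = 5;

p = dt:power << Get Values; // column vector of power readings
n = N Rows( p );

r = 2;
While( r < n,
	If( p[r] == 0 & p[r - 1] != 0,
		// find the last row e of this zero run
		e = r;
		While( e < n, If( p[e + 1] == 0, e++, Break() ) );
		len = e - r + 1;
		// interpolate only short runs that have a reading after them
		If( e < n & len <= maxGap,
			For( k = r, k <= e, k++,
				p[k] = p[r - 1] + (p[e + 1] - p[r - 1]) * (k - r + 1) / (len + 1)
			)
		);
		r = e + 1,
		r++
	)
);

// keep the raw column and write the repaired signal to a new one
col = dt << New Column( "power interp", Numeric, Continuous );
col << Set Values( p );
```

To mark the dropouts as missing instead of interpolating, the assignment inside the For loop can be replaced with p[k] = . and the rest left unchanged.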

 

jthi
Super User

Re: JMP Workflow Challenge 1: Motif Extraction and Identification from Continuous Power Data

I have done something a bit similar at work, where I was counting temperature cycles for a thermal shock oven. I think in that case I did it with a couple of different formula columns. I'll have to take a look at the code at work and see if it could be modified for this case.

 

If I remember correctly, my idea was to detect the "start point" of the cycle (0 power in this case, taking the last of those values), then check that there are enough values over some threshold (here it could be 5 values over 350), and after that check where the cycle ends (a flat part with ~20 values between 80 and 100). And then have some running variables (motif_id, cycle_ongoing, ...) which I would increase and reset as needed.
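The idea described above might look roughly like this as a script. It is a sketch built only from the description, not the code used at work: the thresholds are the example numbers from the text, all variable names other than motif_id and cycle_ongoing are hypothetical, and a candidate that returns to 0 before reaching the peak level is simply dropped.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

// Assumed thresholds, taken from the description above; tune them for real data.
peakLevel = 350; peakCount = 5;
floorLow = 80; floorHigh = 100; floorCount = 20;

p = dt:power << Get Values;
n = N Rows( p );
id = J( n, 1, . ); // motif ID per row, missing = not tagged

motif_id = 0;
cycle_ongoing = 0; // 1 once the candidate has enough high-power samples
start = 0;         // first row of the current candidate, 0 = no candidate
nHigh = 0;         // samples above peakLevel in the current candidate
nFlat = 0;         // consecutive samples inside the floor band

For( r = 2, r <= n, r++,
	If( start == 0,
		// a candidate starts where the power leaves 0
		If( p[r - 1] == 0 & p[r] > 0, start = r; nHigh = 0; nFlat = 0 ),
		// drop a candidate that falls back to 0 before it was confirmed
		If( !cycle_ongoing & p[r] == 0, start = 0 )
	);
	If( start > 0,
		If( p[r] > peakLevel, nHigh++ );
		If( p[r] >= floorLow & p[r] <= floorHigh, nFlat++, nFlat = 0 );
		// confirm the candidate and tag the rows seen so far
		If( !cycle_ongoing & nHigh >= peakCount,
			cycle_ongoing = 1;
			motif_id++;
			For( k = start, k < r, k++, id[k] = motif_id );
		);
		If( cycle_ongoing,
			id[r] = motif_id;
			// the cycle ends after a long enough flat part
			If( nFlat >= floorCount, start = 0; cycle_ongoing = 0 )
		);
	);
);

col = dt << New Column( "Motif ID (state machine)", Numeric, Nominal );
col << Set Values( id );
```

Because the warm-up averages about 250 watts, a candidate that begins there should never collect 5 samples above 350 and so stays untagged, provided those example thresholds hold for the real data.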

-Jarmo