Caozheng0115 · Aug 7, 2025 01:50 PM

Hi, all
I am a new member of the community. I have been assigned a task to perform recurrence analysis on our data. I have never used JMP or done any recurrence analysis, so I will surely ask some stupid questions.
The data we have are robot repairs. We want to see the repair event pattern for the bots, and I was told to use recurrence analysis (I don't know why).

So we have around 15000 robots. I used the 'bot_id', 'accumulated_transactions', 'site', 'cost', 'bot_type' and 'component' columns. 'site' gives the different locations where the robots work. 'cost' is a 0/1 value (1 means a repair, and 0 means the last seen event of the bot). 'component' gives the different parts under repair. We want to see the effect of site, bot_type, and component on the MCF and the repair pattern. I primarily use the Log Linear NHPP model.

I have some concerns with the data. We have around 1000 loaner bots that travel among all sites. If I put the 'site' column in the 'Grouping' role to see the MCF for different sites, does that affect the result? My understanding is that each site will then treat a loaner bot as an independent bot. Should I exclude those 1000 or so loaner bots if I want to group by 'site'?

The data I received have a 'cost' value of 0 at the last event for each site and bot. So if a bot worked at two sites, there will be two 0 values for this bot, each at the last event per site. Does this affect how JMP counts active bots when it calculates the MCF value? My understanding is that a 0 cost value means the end of the bot. If there are multiple 0 values for a bot, how is that handled?

I have other questions but want to stop now before this gets too long.
Appreciate your help.

yusuke_ono · Aug 7, 2025 10:26 PM

I think your "site" variable is a so-called "time-varying covariate" or "time-varying stress" in statistical jargon. If you want to include time-varying covariates in your recurrence model, you need to add a row at each time when the level of one of the time-varying covariates changes. In that row, the value of the cost column is set to zero. For example, if the ID1 robot moved from A to B, from B to C, and from C to somewhere else at time=1, 4 and 5, respectively, you need to prepare the following table.

bot_id site time cost
1 A 1 0
1 B 4 0
1 C 5 0
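
As a minimal sketch of how such rows sit next to the repair rows, the following JSL builds a made-up table for one robot. The repair times (2, 3 and 4.5), the table name, and the column names are assumptions for illustration only; they are not taken from the poster's data.

```jsl
Names Default To Here( 1 );

// Hypothetical data for robot 1 (values invented for illustration):
// it leaves site A at time 1, site B at time 4 and site C at time 5,
// and is repaired at times 2 and 3 (site B) and 4.5 (site C).
dt = New Table( "site change sketch",
	New Column( "bot_id", Numeric, Nominal, Set Values( [1, 1, 1, 1, 1, 1] ) ),
	New Column( "site", Character, Nominal, Set Values( {"A", "B", "B", "B", "C", "C"} ) ),
	New Column( "time", Numeric, Continuous, Set Values( [1, 2, 3, 4, 4.5, 5] ) ),
	// cost = 1 on repair rows, cost = 0 on the rows that mark a site change
	New Column( "cost", Numeric, Continuous, Set Values( [0, 1, 1, 0, 1, 0] ) )
);
```

Each Cost=0 row carries the site the robot is leaving, which matches the three rows in the table above.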

Your "component" variable seems to be a so-called "competing risk" in statistical jargon. JMP's Recurrence Analysis does not support competing risks. You can do only the following analyses.
(1) Set cost=1 whenever any kind of failure occurs.
(2) Set cost=1 only when a specific failure occurs.

The attached data table shows a toy example for only one robot. The cost is set by method (1) above. Run the table script in the attached data table. The Profiler shows the estimated cumulative failure counts if the robot always works at one specific site (for example, always at site="A").

I am very afraid that I do not understand your situation well. But I hope my comments above give you some ideas for your analysis.

Yusuke Ono (Senior Tester at JMP Japan)

Caozheng0115 · Aug 8, 2025 6:59 AM

Thanks for the fast response. I understand your point on including a 0 cost row for each site change. I should provide more details.
I think the concern now is how JMP handles 0 cost. It seems that JMP will exclude a bot earlier than it should if the bot has a 0 cost at an early time. With fewer bots considered active by JMP, the MCF value increases. Following is what I see. The top one is the data with one 0 cost per bot. The bottom one is the data with multiple 0 costs (one per bot per site). The bottom MCF value is greater because JMP excludes the bots with a 0 cost too early. That is my understanding, which might be wrong.

I asked ChatGPT the same question and the following is the answer. I felt that one 0 cost per bot, at the last event seen, is what I should use. Let me know what you think.

Option A — Simple & Robust (recommended first)

Keep one row per event (Cost=1) with the Site value active at that event, and one final Cost=0 row per bot at the bot’s last observed transaction.
This approach uses the event rows to carry the time-varying Site value. It’s simple, avoids multiple right-censor rows per bot, and JMP will correctly use the Site value that was in effect at each event.

When to use: most common; sufficient if you only need the covariate at event times (which is usually the case).
Pros: easy, avoids JMP misinterpreting multiple 0s.
Cons: doesn’t explicitly model intervals with no events where the covariate changed — but that’s rarely necessary for NHPP estimation.

Data shape

Each row is either:

Event row: BotID, AccumulatedTransactions (event time), Cost = 1, Site = site_at_event
Final censor row (one per bot): BotID, AccumulatedTransactions (last observed), Cost = 0, Site = <last site or NA>

How JMP uses this

When you add Site as a column affecting Scale/Shape, JMP uses the Site value on each event row to estimate how Site shifts the log-rate and/or shape. Because each event row carries the site at that time, you have effectively encoded a time-varying covariate.

yusuke_ono · Aug 8, 2025 07:37 PM

@Caozheng0115 -san,

CORRECTION FOR MY FIRST REPLY

Your "component" variable seems to be a so-called "competing risk" in statistical jargon. JMP's Recurrence Analysis does not support competing risks. You can do only the following analyses.
(1) Set cost=1 whenever any kind of failure occurs.
(2) Set cost=1 only when a specific failure occurs.

If you use the Cause column in the launch dialog, you can do the analysis in (2) with the Recurrence Analysis platform. The attached data table shows an example.

Yusuke Ono (Senior Tester at JMP Japan)

Caozheng0115 · Aug 11, 2025 08:48 AM

Hi, thanks for your fast reply. Could you explain why option 2 and not option 1? In your sample data, it looks like both ways (one column for cost, or an individual cost column for each of the two causes) work and generate the same results. I guess that if I want to show one cause in figure 1 and the other cause in figure 2, I need to make new Cost columns for the different causes so that the results are consistent. Is that what you are suggesting? How about a similar situation for Grouping? For instance, if I group by site, is there a way to show the result and figure for a single site that is the same as the one generated from data that include multiple sites?

I checked the sample data you sent. It looks like you used Mode as the cause and site as a scale effect in the first saved analysis. The model you used is the proportional NHPP. Is there a reason for that choice of model? My understanding is that the proportional NHPP model assumes a common baseline MCF curve for all causes, scaled up or down by multiplicative factors. In my case, the effects of the different causes are not assumed to be proportional, so I used the Log linear model. Let me know if that is the model I should use.

I have trouble understanding the fit curves generated. Is the red fit curve for Fail 1? The fit curve is really far away from the red MCF curve. Why is that happening? There are two blue fit curves. Which one is the fit curve for Fail2? Why are there two fit curves generated? What is the green fit curve?

Or are the fit curves for different sites? But then the red one has only one curve instead of two. I don't understand why that happened.
Thanks.

yusuke_ono · Aug 8, 2025 07:01 PM

Thank you for your reply.

In this reply, I would like to explain when the rows with Cost=0 are needed.

There are two kinds of situations where we need rows with Cost=0. One is to represent the last time at which you observed each robot. The other is to represent a change in time-varying covariates.

If you observed until "t", and the robot did not fail at the last observed time "t", you need to include a last row with Cost=0. This situation is called "right-censoring" in statistical jargon. In recurrence analysis, right-censoring is usual because we can repair each robot, so we cannot observe the "real death" or "complete death" of the robot. For example, if the last time you observed the ID001 robot is 100, and the robot did not fail at t=100, you need to include the following row.

ID t Cost
001 ... ...
001 ... ...
001 100 0

This row is needed to tell JMP until when this robot was alive.

If you observed until "t", and the robot failed at that time "t", you do not need to include a last row with Cost=0. Instead, you need to include a last row with Cost=1. For example, if the last time you observed the ID001 robot is 100, and the robot failed at t=100, you need to include the following row with Cost=1.

ID t Cost
001 ... ...
001 ... ...
001 100 1
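
To see how many Cost=0 rows each robot has in a given table, one option is to count them per ID. The JSL below is a minimal sketch on a made-up table; the table, the column names bot_id, time and cost, and the helper column is_zero are all assumptions for illustration.

```jsl
Names Default To Here( 1 );

// Hypothetical table: robot 1 has one Cost=0 row, robot 2 has two.
dt = New Table( "zero row check",
	New Column( "bot_id", Numeric, Nominal, Set Values( [1, 1, 1, 2, 2, 2, 2] ) ),
	New Column( "time", Numeric, Continuous, Set Values( [10, 25, 40, 5, 12, 30, 45] ) ),
	New Column( "cost", Numeric, Continuous, Set Values( [1, 1, 0, 1, 0, 1, 0] ) )
);

// Helper column (hypothetical name): 1 on rows with Cost=0, otherwise 0.
dt << New Column( "is_zero", Numeric, Continuous, Set Each Value( :cost == 0 ) );

// One row per robot, with the number of Cost=0 rows and the last observed time.
sm = dt << Summary( Group( :bot_id ), Sum( :is_zero ), Max( :time ) );
```

Robots with a count above 1 in the summary table are the ones for which it is worth checking whether the extra Cost=0 rows really mark covariate changes.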

Rows with Cost=0 are also needed to represent a change of covariates. For example, if the covariate X changes from A to B at time=50, then the following row with Cost=0 is needed.

ID t X Cost
001 ... ... ...
001 50 A 0
001 ... B ...

This row is needed to tell JMP about the change of covariates. As I wrote in the previous reply, this kind of covariate is called a "time-varying covariate".

The MCF Plot is affected by whether you include the last observed row with Cost=0 or not. The MCF Plot is not affected by whether you include the row with Cost=0 for time-varying covariates or not. But the results by Fit Model are affected by both types of rows with Cost=0.
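
Why the last observed row matters for the MCF Plot can be seen from the usual nonparametric (Nelson-type) estimator, written out below. This is a sketch under the assumption that the MCF Plot is based on this standard estimator; the symbols t_j, c_j, tau_i and r are introduced here for illustration and the two robots in the arithmetic are made up.

```latex
% t_j : time of the j-th event (over all units), c_j : its cost
% tau_i : last observed time of unit i, r(t) : number of units still observed at t
\[
\hat{M}(t) \;=\; \sum_{j:\; t_j \le t} \frac{c_j}{r(t_j)},
\qquad
r(t_j) \;=\; \#\{\, i : \tau_i \ge t_j \,\}
\]

% Made-up example: robot 1 observed until 100 (events at 20 and 60),
% robot 2 observed until 40 (event at 10), all costs equal to 1.
\[
\hat{M}(10) = \tfrac{1}{2}, \qquad
\hat{M}(20) = \tfrac{1}{2} + \tfrac{1}{2} = 1, \qquad
\hat{M}(60) = 1 + \tfrac{1}{1} = 2
\]

% If robot 2 were instead treated as observed only until 15, then r(20) = 1:
\[
\hat{M}(20) = \tfrac{1}{2} + \tfrac{1}{1} = 1.5, \qquad
\hat{M}(60) = 1.5 + \tfrac{1}{1} = 2.5
\]
```

In this arithmetic, ending a unit's observation too early shrinks the denominator and raises the estimate at later event times.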

Is this reply what you expected? If not, please reply again.

Yusuke Ono (Senior Tester at JMP Japan)

Caozheng0115 · Aug 11, 2025 6:31 AM

Thanks a lot. This is very clear. You are a life saver.
I have one more follow-up question on this. If I want to compare the effect of sites, should I use site in the Grouping role, as a scale effect, or as a shape effect? I want to show the MCF for each site, the fit curve for each site, and the parameter estimates (γ and δ) as well.
I saw two options:

1. Do not include site as Grouping; enter it as a scale and shape effect. Then I can compare the parameters of the model (Log linear NHPP) among the different sites. But only one MCF curve, for the total, is generated with this method.
2. Include site as Grouping, and enter it as a scale and shape effect as well (I don't know if I can do that). This will generate an MCF for each site, a fit curve for each site, and model parameters to compare the effect among sites. The issue I found is that Grouping will generate an individual model for each site (I could be wrong), and comparing the effects of site using different models is not a good approach.
What should I do? Should I generate the individual curves with option 2 and use the parameter estimate table from option 1? Then the curves I show will not exactly match the parameter table.
Thanks a lot.

yusuke_ono · Aug 11, 2025 08:56 PM

Thank you again for your reply.

Although I am afraid that I misunderstand your situation, I think option (1) must be correct.

As far as I understand, I think your Site variable is a time-varying covariate.

If so, you should NOT set the Site variable as the Grouping column.

If you set SITE as the Grouping column in the following table, ...

ID SITE TIME
1 1 15
1 2 31
1 1 49
1 2 57
1 1 102

... then JMP recognizes two IDs.

ID SITE TIME
1-1 1 15
2-1 2 31
1-1 1 49
2-1 2 57
1-1 1 102

So, (if I do not misunderstand your situation and if your Site variable is a time-varying covariate) you should NOT include the Site variable in the Grouping column.
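
To gauge how many robots would be split in this way, one can count the sites each ID appears in. The JSL below is a minimal sketch on a made-up table that extends the small example above with a second robot; the table name and the variable names pairs and sitesPerBot are assumptions for illustration.

```jsl
Names Default To Here( 1 );

// Hypothetical table: robot 1 alternates between two sites, robot 2 stays at site 1.
dt = New Table( "grouping check",
	New Column( "ID", Numeric, Nominal, Set Values( [1, 1, 1, 1, 1, 2, 2] ) ),
	New Column( "SITE", Numeric, Nominal, Set Values( [1, 2, 1, 2, 1, 1, 1] ) ),
	New Column( "TIME", Numeric, Continuous, Set Values( [15, 31, 49, 57, 102, 20, 80] ) )
);

// One row per (ID, SITE) pair. Each pair would act as a separate unit
// if SITE were put in the Grouping column.
pairs = dt << Summary( Group( :ID, :SITE ) );

// One row per ID. The N Rows column of this table counts the sites per robot,
// so a value above 1 marks a robot that works at more than one site.
sitesPerBot = pairs << Summary( Group( :ID ) );
```

Robots with more than one site in the second summary table are the ones for which Site varies within the ID.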

There is one additional note.

The nonparametric estimates in the MCF Plot, which is the first plot you see before fitting the model, do not consider time-varying covariates. And the curve estimated by the model shows the curve when the time-varying covariate is fixed at each Site value. So, we cannot use the MCF Plot to check whether the model fits well if the model has time-varying covariates.

The following script shows a simple toy example. In this simulated data, the scale (theta) and shape (beta) parameters change after the first 30 time points. The Site variable shows the change, and this Site is the time-varying covariate. Just for simplicity, I use the Power Nonhomogeneous Poisson Process.

The time-varying covariate, Site, should not be included in the Grouping column.

Names Default To Here( 1 );
Random Reset( 111111 );

n = 30;     // rows per robot and site: n - 1 events plus one Cost=0 row
nid = 100;  // number of robots
data = J( 2 * n * nid, 4, . );  // columns: ID, Site, Time, Cost

// Shape (b) and scale (theta) parameters at site 1 and site 2
b1 = 3;
theta1 = 5;
b2 = 8;
theta2 = 8;

k = 0;
For( id = 1, id <= nid, id++,
    x = 0;
    // Site 1: n - 1 events generated with (b1, theta1)
    For( i = 1, i <= n - 1, i++,
        x = (x ^ b1 + Random Exp() * theta1 ^ b1) ^ (1 / b1);
        k++;
        data[k, 1] = id;
        data[k, 2] = 1;
        data[k, 3] = x;
        data[k, 4] = 1;
    );
    // Cost=0 row marking the change from site 1 to site 2
    k++;
    data[k, 1] = id;
    data[k, 2] = 1;
    data[k, 3] = x;
    data[k, 4] = 0;

    // Site 2: n - 1 events generated with (b2, theta2)
    For( i = 1, i <= n - 1, i++,
        x = (x ^ b2 + Random Exp() * theta2 ^ b2) ^ (1 / b2);
        k++;
        data[k, 1] = id;
        data[k, 2] = 2;
        data[k, 3] = x;
        data[k, 4] = 1;
    );
    // Cost=0 row for the last observed time of this robot
    k++;
    data[k, 1] = id;
    data[k, 2] = 2;
    data[k, 3] = x;
    data[k, 4] = 0;
);

dt = As Table( data, <<Column Names( {"ID", "Site", "Time", "Cost"} ) );
Column( dt, 1 ) << Modeling Type( "Nominal" );
Column( dt, 2 ) << Modeling Type( "Nominal" );

// Site enters as a scale and shape effect, not as a Grouping column
dt << Recurrence Analysis(
    Y( :Time ), Label( :ID ), Cost( :Cost ),
    Event Plot( 1 ),
    Fit Model(
        Scale Effects( :Site ),
        Shape Effects( :Site ),
        Run Model,
        Model Type( "Power Nonhomogeneous Poisson Process" )
    )
);

In this simple toy example, the fitted model is the true model, so the model fits very well. But the MCF Plot cannot show the goodness of fit.
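
The update line for x in the script can be read as inverse-transform sampling of the next event time. The derivation below is a sketch that assumes the script intends a cumulative intensity of the form (t/theta)^beta; the symbols Lambda, E and x_new are introduced here for illustration.

```latex
% Assumed cumulative intensity of the power NHPP with scale theta and shape beta
\[
\Lambda(t) = \left(\frac{t}{\theta}\right)^{\beta}
\]

% For a Poisson process, the increase of Lambda between successive events
% is a standard exponential variable E (Random Exp() in the script).
\[
\Lambda(x_{\mathrm{new}}) - \Lambda(x) = E, \qquad E \sim \mathrm{Exp}(1)
\]

% Solving for the next event time gives the update used in the script.
\[
\left(\frac{x_{\mathrm{new}}}{\theta}\right)^{\beta}
  = \left(\frac{x}{\theta}\right)^{\beta} + E
\;\Longrightarrow\;
x_{\mathrm{new}} = \left( x^{\beta} + E\,\theta^{\beta} \right)^{1/\beta}
\]
```

In the second loop the same update is applied with (b2, theta2), starting from the last time reached at site 1, which is how the simulated intensity changes when the site changes.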

In short, do not include time-varying covariates in the Grouping column, and do not use the MCF Plot to check the goodness of fit if your model has time-varying covariates.

Yusuke Ono (Senior Tester at JMP Japan)

Caozheng0115 · Aug 21, 2025 11:29 AM

Hi,

Thanks for your reply, this is really helpful. Somehow I missed the email from the JMP community reminding me to look at your response. Sorry for the late reply.
I will NOT use site as GROUPING role.

I have performed the analysis and presented some of the results. My manager is happy. So thanks a lot.

The next step is to use the COST (actual money) in the MCF to see the average money per bot repair.

So I am planning to put the money amount instead of the 1 (indicating a repair) in the Cost role column. The 0 will still be kept to indicate censor status.

Is that the right way to study the average money per bot repair? Following is the data sample with actual money in the Cost role column.
I am using JMP 18 standard.
Thanks.

yusuke_ono · Aug 21, 2025 10:57 PM

Thank you again for your reply, @Caozheng0115 -san

You can specify any positive costs. In nonparametric estimates, the cumulative costs are calculated. In parametric estimation, costs are used as weights in the log-likelihood function. (If you would like to know the mathematical details, I would be glad to explain more, so please let me know.)

I am very afraid that I misunderstood your situation and that I might have failed to explain "time-varying effect". In your last example, the "Site" column is not a "time-varying effect". "Time-varying" effects are effects whose values change within each ID. In the following data, the Site column is a time-varying effect.

I am also afraid that I misunderstand the difference between the robots. I assumed that all robots are the same version, type and grade, so I did not set the "ID" as a grouping effect or a model effect. (For example, if ID=001, ID=002 and ID=003 are C-3PO, R2-D2 and BB-8 in Star Wars, respectively, these three robots are completely different kinds. So, I think we should put the ID column into the Grouping column and also include it as a model effect.)

In your last example, the "Site" is completely confounded with the "ID". So, we cannot separate the effect of Site from ID. (For example, if C-3PO worked only in a desert, R2-D2 worked only near the sea, and BB-8 worked only inside a comfortable building, we cannot separate the ID effect from the place effect.)

Although I may still misunderstand your situation, the following is an example of an analysis for your last example. I calculate the nonparametric cumulative costs for ID=001 and 002 separately by putting the ID into the Grouping column. I exclude the data for ID=003 from my analysis because there is only one failure. I fit the parametric Loglinear Poisson process model for each ID. Based on this model, the cumulative cost at Time=300 is estimated to be 13300.37 for ID=001 and 550.007 for ID=002.

I attach the data table for your last example, which I used for the above analysis.
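
For readers who want to see how a cumulative value at a given time follows from fitted parameters, the relation below is a sketch. It assumes the Loglinear model has intensity exp(gamma + delta t), with gamma and delta being the parameter names mentioned earlier in the thread; the fitted values behind the two estimates above are in the attached table and are not reproduced here.

```latex
% Assumed loglinear intensity (cost per unit time when costs are used as weights)
\[
\lambda(t) = e^{\gamma + \delta t}
\]

% Cumulative value up to time t, for delta not equal to 0
\[
M(t) = \int_{0}^{t} \lambda(s)\, ds
     = \frac{e^{\gamma}}{\delta}\left( e^{\delta t} - 1 \right)
\]

% Limiting case delta = 0 (constant intensity)
\[
M(t) = e^{\gamma}\, t
\]
```

Under this assumption, plugging the fitted gamma and delta for one ID and t = 300 into M(t) gives that ID's estimated cumulative cost at Time=300.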

Yusuke Ono (Senior Tester at JMP Japan)
