Ceg1
Level II

Life distribution interval censoring compare distributions plot

Hi,

I have a question about the Life Distribution platform, specifically about the Compare Distributions outline box with interval-censored data.

I would like to illustrate my question with an example. I created some dummy numbers and fit a Weibull distribution to them. The behaviors described below also occur with my actual data.

Ceg1_1-1676278776488.png

 

My general question is: how does JMP calculate the Y-axis (probability) positions for interval-censored data?

I wonder why the two points with start values of 120 and 200 are plotted at the same height on the Y axis.

Next, why are some uncensored observations marked on the plot with a single marker (e.g., point 4), some with two markers (e.g., points 1 and 3, shown with red ellipses), and some not marked at all (e.g., the missing point 2)?

Finally, why can I select only point 1 on this plot with the cursor, while the other points cannot be selected in any way except directly through the data table? Additionally, I can move only the label of point 1; the others are inactive.

 

Thank you for your help,

Regards,

Ceg1

1 ACCEPTED SOLUTION

peng_liu
Staff

Re: Life distribution interval censoring compare distributions plot

I believe you are using JMP 16 or an earlier version. A bug in those versions is associated with the marker selection issue you ran into.

There is also a change in JMP 17, so if you run the analysis there you will see a different plot. The difference reflects a change to the y-axis positions of those points. But let me go back to the beginning to explain what is going on.

 

1. When data are interval censored, the "Nonparametric Estimate" of the distribution uses the so-called Turnbull estimator. You can find the numerical result in the "Nonparametric Estimate" outline node. Here is the screenshot from JMP 16; the Turnbull estimate is in the third column.

peng_liu_0-1676337097427.png

2. The markers that you have questions about are associated with the estimate. This is the tricky part. Traditionally, markers are associated with data. But here the markers in this plot are associated with the nonparametric estimate, which is a model. This explains why some data points do not seem to appear in the plot.

3. Now let me explain how the Turnbull estimate is plotted.

3.1 First, what you show here is one of two representations of the nonparametric estimate plot. For Turnbull, it is easier to start with the other representation, which represents the Turnbull estimate more rigorously. To see it, turn off the "Show Points" option in the menu in JMP 16; see the next screenshot:

peng_liu_1-1676337456268.png

After turning it off, you should see the following plot. There is one red dot, and three red horizontal lines. This style is known as the "step-function" representation of a nonparametric estimate.

peng_liu_2-1676337513758.png peng_liu_0-1676337097427.png

The dot and the lines correspond to the Turnbull estimate. Let me explain them one at a time, with the Nonparametric Estimate table and the plot side by side so we don't have to scroll up and down. The first row in the table says that from the time origin (here, 0) to 60, the probability estimate is 0. Because the Y-axis uses the Weibull probability scale, this line does not show up. If you change the Y-axis to linear, you should see that additional line from 0 to 60 at y=0. The second row says that from time 100 to time 100, the probability estimate is 0.19047620. In other words, the line collapses to a dot, and that is the red dot. The third through fifth rows define three individual lines at their respective probability estimates. Notice that the third and fourth lines have the same probability estimate, which is why those two lines sit at the same level.

3.2 Now toggle back to see the markers. I put them side by side, and it is now more obvious where the y-axis positions of the markers come from.

peng_liu_2-1676337513758.png peng_liu_3-1676338269125.png
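As a rough illustration of the step-function reading described in 3.1, here is a minimal Python sketch. It is not JMP's implementation. It assumes the Weibull probability scale maps a probability p to ln(-ln(1 - p)), which is undefined at p = 0, and it uses only the two table rows quoted above plus one row whose end value is a made-up placeholder. The function names are hypothetical.

```python
import math

def weibull_scale(p):
    """Map a probability to the Weibull probability-plot axis, ln(-ln(1 - p)).

    Returns None when the transform is not finite (p <= 0 or p >= 1),
    which is why a step at probability 0 cannot be drawn on this scale.
    """
    if p <= 0.0 or p >= 1.0:
        return None
    return math.log(-math.log(1.0 - p))

def describe_step(start, end, estimate):
    """Classify one row of the Nonparametric Estimate table for plotting."""
    if weibull_scale(estimate) is None:
        return "not drawn"          # visible only on a linear Y-axis
    if start == end:
        return "dot"                # the step collapses to a single point
    return "line"                   # a horizontal segment from start to end

rows = [
    (0.0, 60.0, 0.0),               # first row quoted in the post
    (100.0, 100.0, 0.19047620),     # second row quoted in the post
    (120.0, 150.0, 0.57142857),     # start and estimate from the post; end=150 is a placeholder
]

for start, end, estimate in rows:
    print(start, end, estimate, describe_step(start, end, estimate))
```

With these rows, the first is skipped on the Weibull scale, the second is the red dot, and the third is a horizontal line.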

In addition, to accommodate the JMP tradition that markers are brushable, the software tries to associate the estimate (the model) with the data in a way that makes as much sense as possible. I will explain this using JMP 17, because the bug in JMP 16 and the change in JMP 17 would make an explanation of the JMP 16 marker-style plot more confusing. I am switching gears to JMP 17 in what follows, and restarting the item numbers to be clear.

 

1. In JMP 17, the Nonparametric Estimate report for this data is the following. Notice that the third column is now named "Midpoint Estimate", and there is an additional last column, "Turnbull Estimate". So the table moves what was in the third column to the last position and puts "Midpoint Estimate" in the third column.

peng_liu_4-1676338730136.png

2. JMP 17 has a new submenu for nonparametric plot options.

peng_liu_5-1676338864687.png

3. The following three screenshots correspond to the first three options. I did not paste the one for "None".

peng_liu_7-1676338969017.png peng_liu_8-1676338990052.png peng_liu_9-1676339025263.png

4. As you may guess, the "Step Function" plot did not change. The "Points" plot, the marker version, changed. More specifically, the markers' y-axis positions changed, and the new positions correspond to the third column, "Midpoint Estimate", in the above table. Here is what the Midpoint Estimate is. Look at the Midpoint Estimate and Turnbull Estimate on the second row. The Midpoint value 0.09523810 is the average of 0 and 0.19047620, the first and second row values under Turnbull. Now look at the third row. Its Midpoint value 0.38095239 is the average of 0.19047620 and 0.57142857, the second and third row values under Turnbull. And so on.

peng_liu_4-1676338730136.png

Besides the y-axis positions, I also need to point out the x-axis positions of those markers. They are the beginnings of the steps in the step-function representation: 100, 120, 190, and 220, the values in the first column, Start.

5. The Midpoint estimate was already used for plotting when data are only right censored. The same choice was not made for interval-censored data in previous versions.

6. The Midpoint estimate is also known as the "midpoint adjustment". Such an adjustment is not unique; there are other kinds in the literature. The midpoint adjustment is crucial for plotting right-censored data, because otherwise overlaying the marker version of the nonparametric estimate gives the misleading impression that a parametric estimate is biased. The adjustment is not crucial for interval-censored data. A decision was made during the JMP 17 development cycle to make the two situations consistent.

7. Now, maybe the most mind-twisting part is the association between the four markers and the data points. It exists to accommodate the tradition that markers are associated with data and are brushable, so the behavior is implementation dependent. It has to do with the x-axis positions of the markers. In JMP 17, if you brush the first marker on the lower left, whose x-axis position is 100, you should see three rows highlighted in the data table. Those three rows all have 100 within their censoring intervals. That is the rule of association. Meanwhile, you should see the second and third markers get highlighted as well. Their x-axis positions are 120 and 190. Highlighting goes both ways, between the data table and the plot, and the censoring intervals of the highlighted rows happen to include 120 and 190 as well.

peng_liu_0-1676341508085.png
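The midpoint arithmetic from item 4 can be checked with a few lines of Python. This is only a sketch of the averaging rule as described in this post, using the three Turnbull values quoted there. The function name is hypothetical, and the first table row is left out because the post does not say how its midpoint is defined.

```python
def midpoint_estimates(turnbull):
    """Average each Turnbull estimate with the estimate on the previous row.

    Returns one value per row starting from the second row.
    """
    return [(prev + cur) / 2.0 for prev, cur in zip(turnbull, turnbull[1:])]

# Turnbull estimates for the first three rows, as quoted in the post.
turnbull = [0.0, 0.19047620, 0.57142857]

for row, value in enumerate(midpoint_estimates(turnbull), start=2):
    print(f"row {row}: midpoint estimate = {value:.8f}")
```

The values printed for rows 2 and 3 should agree with the 0.09523810 and 0.38095239 shown in the JMP 17 table, up to rounding in the last digit.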

 

In summary, the markers in the plot are associated with the nonparametric estimate. They are linked to data points by matching the markers' x-axis positions against the observations' censoring intervals. The change from JMP 16 to JMP 17 should not affect any existing decisions, but the platform is a little more consistent going forward and provides more options to accommodate different preferences.
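To make the association rule concrete, here is a small Python sketch of the brushing behavior described in item 7. It is an illustration, not JMP's code. The marker positions 100, 120, 190, and 220 come from the post, but the censoring intervals are hypothetical, chosen only so that three rows contain 100. Treating the intervals as closed and the function names are also assumptions.

```python
def rows_for_marker(x, intervals):
    """Indices of rows whose censoring interval [lo, hi] contains marker position x."""
    return [i for i, (lo, hi) in enumerate(intervals) if lo <= x <= hi]

def markers_for_rows(rows, marker_xs, intervals):
    """Marker positions that fall inside the interval of at least one selected row."""
    return [
        x for x in marker_xs
        if any(intervals[i][0] <= x <= intervals[i][1] for i in rows)
    ]

marker_xs = [100, 120, 190, 220]      # Start values from the post

# Hypothetical (start, end) censoring intervals, NOT the data from the thread.
intervals = [
    (100, 100),   # exact observation at 100
    (60, 120),
    (90, 190),
    (200, 250),
]

selected_rows = rows_for_marker(100, intervals)            # brush the marker at x = 100
highlighted = markers_for_rows(selected_rows, marker_xs, intervals)

print("rows selected:", selected_rows)
print("markers highlighted:", highlighted)
```

With these made-up intervals, brushing the marker at 100 selects the first three rows, and those rows in turn highlight the markers at 100, 120, and 190 but not the one at 220, which mirrors the behavior described above.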

 

3 REPLIES 3

Re: Life distribution interval censoring compare distributions plot

The groups are estimated separately.

 

What are the groups? What is the row-wise membership in each group?

Ceg1
Level II

Re: Life distribution interval censoring compare distributions plot

Thank you peng_liu for this exhaustive information and examples. It is very illustrative.