## Are these equivalent ways to model the response variable?

Community Trekker

Joined:

Jun 10, 2016

Hello,

My example involves a repeated measures experiment where a sample data point is collected from each subject every 10 minutes, for a total of 10 measurements (t0, t1, ..., t9).

As a simple example, the response variable of interest is: the interval of time between consecutive eye blinks.

My question... are the following two definitions of the response variable equivalent?

1) The number of eye blinks recorded in each 10-minute time window

2) The average time between consecutive blinks in each 10-minute time window

Also, would these be classified as Poisson, binomial, or are they normally distributed?

Thank you,

JP

1 ACCEPTED SOLUTION

Community Trekker

Joined:

Mar 27, 2015

Solution

I think you should not treat those two responses the same. Just compare the two scenarios:

Person 1 has a very regular frequency of blinking.

Person 2 has a higher frequency of blinking, but one time, for whatever reason, she had a long time between two blinks.

[Image: Data; Means and Std. Devs]

Both might end up with the same number of blinks but they have different mean times between blinks.
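To make the comparison concrete, here is a minimal sketch in Python. The blink timestamps are hypothetical (they are not the data from the post), and `summarize` is just an illustrative helper name. It computes both candidate responses for two people observed in the same window.

```python
from statistics import mean

def summarize(blink_times):
    """Return (number of blinks, mean gap between consecutive blinks)."""
    gaps = [later - earlier for earlier, later in zip(blink_times, blink_times[1:])]
    return len(blink_times), mean(gaps)

# Hypothetical blink timestamps in seconds, all inside one 100 s window.
person_1 = list(range(0, 101, 10))        # regular: a blink every 10 s
person_2 = list(range(0, 46, 5)) + [85]   # every 5 s, then one 40 s pause

count_1, gap_1 = summarize(person_1)
count_2, gap_2 = summarize(person_2)

print("Person 1:", count_1, "blinks, mean gap", gap_1, "s")
print("Person 2:", count_2, "blinks, mean gap", gap_2, "s")
```

With these made-up numbers the two people have the same blink count but different mean gaps, so the two responses do not carry the same information.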

The number of blinks is probably Poisson distributed.

The average time between blinks might be approximated by a normal distribution, but it is probably closer to something like a Weibull or gamma distribution, as times cannot be negative.
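One way to explore these distribution guesses is a small simulation. The sketch below assumes an idealized constant-rate process in which gaps between blinks are exponential (the shape-1 special case of both the gamma and the Weibull); real blink data may well deviate from this. The rate, window length, and function name are assumptions chosen for illustration.

```python
import random
from statistics import mean, variance

def simulate_window(rate, window, rng):
    """Return the gaps between blinks that fall inside one window.

    Gaps are drawn from an exponential distribution with the given rate,
    which is an idealized assumption, not a claim about real blinking.
    """
    elapsed, gaps = 0.0, []
    while True:
        gap = rng.expovariate(rate)
        if elapsed + gap > window:
            break
        elapsed += gap
        gaps.append(gap)
    return gaps

rng = random.Random(0)          # fixed seed so the run is repeatable
rate, window = 0.25, 600.0      # assumed: 0.25 blinks/s, 10-minute window

windows = [simulate_window(rate, window, rng) for _ in range(2000)]
counts = [len(gaps) for gaps in windows]
all_gaps = [gap for gaps in windows for gap in gaps]

# For Poisson counts, the mean and variance should both be near rate * window.
print("count mean:", mean(counts), "count variance:", variance(counts))
# Gaps are never negative; their mean should be near 1 / rate.
print("gap mean:", mean(all_gaps), "smallest gap:", min(all_gaps))
```

Under this assumption the counts per window behave like a Poisson variable (mean close to variance), while the individual gaps are right-skewed and bounded below by zero.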

5 REPLIES

Community Trekker

Joined:

Jun 10, 2016

Person 1 could likely have more than 10 blinks, as there are 15 seconds unaccounted for (9*5 + 40 = 85s vs. 10*10 = 100s) if the test interval was 100s. This assumes that the potential 11th blink for person 1 doesn't come more than 15 seconds after the 10th, in which case it would contribute to the next time interval.

That being said, your point was clear and makes a lot of sense.

Super User

Joined:

Jun 4, 2014

I agree that the two methods are not the same. Here is an example that may be a little extreme, but is useful: Suppose you are tracking recordable injuries per month at your company location via process behavior charts. Some months there are zero injuries, some months there are 1, or 2, or 3, etc. If the average number of recordable injuries per month is low (< 6), then the data is "chunky" (see Dr. Donald Wheeler's articles on chunky data), and the control chart method does not yield useful results/conclusions. HOWEVER, if you track by "Days between recordable injuries", the control chart works quite well.

What is the distribution of the data?  What does it matter?  The I-MR chart does not require the data to have a particular distribution!

Steve
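For readers who have not computed an I-MR (individuals and moving range) chart by hand, here is a rough sketch of the usual limit calculation: the center line plus or minus 2.66 times the average moving range. The days-between values are hypothetical, `imr_limits` is an illustrative name, and flooring the lower limit at zero is a choice made here because a time between events cannot be negative.

```python
from statistics import mean

def imr_limits(values):
    """Return center line and natural process limits for an individuals chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "center": center,
        "lower": max(0.0, center - 2.66 * mr_bar),  # floored at 0 for time data
        "upper": center + 2.66 * mr_bar,
        "mr_upper": 3.27 * mr_bar,                  # approximate upper range limit
    }

# Hypothetical days between consecutive recordable injuries.
days_between = [30, 45, 22, 60, 38, 51, 27, 44]
limits = imr_limits(days_between)

for name, value in limits.items():
    print(name, round(value, 2))
```

Points falling outside the lower or upper limit would be the ones worth investigating.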

Community Trekker

Joined:

Jun 10, 2016

Thank you. I will look into the literature of 'chunky' data.

Joined:

Jun 5, 2014

If you are ultimately trying to build regression models with lots of 'zeros' in the response variable, there is a family of regression techniques known generically as 'zero inflated'. The zero-inflated modeling capability is part of JMP Pro in the Fit Model -> Generalized Regression personality, Distribution: ZI Binomial (and others). Here is a link to the relevant sections of the JMP online documentation as well:

http://www.jmp.com/support/help/13/Distribution.shtml
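As a rough illustration of what 'zero inflated' means, the sketch below writes out the textbook zero-inflated Poisson probability function in Python. This is not JMP's implementation or fitting algorithm; the parameter values and function names are assumptions picked only to show how the extra zeros enter the model.

```python
import math

def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson: pi_zero is the probability of an extra zero."""
    base = (1 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base

lam, pi_zero = 2.0, 0.3   # hypothetical parameter values
probs = [zip_pmf(k, lam, pi_zero) for k in range(50)]

print("P(0) plain Poisson:", round(poisson_pmf(0, lam), 4))
print("P(0) zero-inflated:", round(probs[0], 4))
print("total probability: ", round(sum(probs), 6))
```

The zero-inflated version puts more probability on zero than the plain Poisson with the same rate, which is the behavior these models are meant to capture.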