Apr 29, 2020 3:31 PM
(1459 views)

Dear members of the forum,

I am analyzing this data set with a dependent binary variable (Target), using a logistic regression for hypothesis testing rather than prediction on this real-life data.

I would like to use the model parameterization to assess whether a constant trend over time (years) is the stronger effect, or whether the momentary change is more meaningful as a breaking point.

When the two variables are used separately, the models are similar, with perhaps a better fit using the continuous Year variable. On the other hand, the technical change is the reason for modeling in the first place, so it can’t be ignored.

Once they are put together with the interaction, all parameters are insignificant (perhaps because of multicollinearity, small sample size, too much variance, or all of these together), which gives the false impression of no change in propensity over time.

Please do let me know if you have any suggestions in terms of model specification or comparisons for reaching a clearer conclusion.

Attached is the data table with a script for the alternative models.

1 ACCEPTED SOLUTION

Use the Logistic Regression platform. Use **date**, not year. Create a binary variable for **period** (*before*, *after*). Fit a model where the linear predictor is **date**, **period**, and **period*date**. Change the period values based on date: imagine that the observations are sorted by date in ascending order. Set all rows to **period** = *after* for the first fit. Now advance period to the second date, so **period** = *before* for row 1 and the remaining rows stay set as **period** = *after*. Progress until **period** = *before* for all rows. Collect the AICc criterion for each fit. Plot the criterion versus date to see where the change occurred. Evaluate that model as you normally would.

Note that you might want to avoid the boundary issue with all rows having the same level for period.
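
For readers who want to try the scan outside JMP, here is a minimal Python sketch of the same idea. It is not the Logistic Regression platform: the data are synthetic, the fit is a hand-rolled Newton-Raphson routine standing in for the platform, and every name (`solve`, `fit_logistic`, `scan_change_points`, `min_rows`) is hypothetical. The `min_rows` guard is one possible way to avoid the boundary issue, and the 0/1 coding of **period** is an assumption.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood of a logistic model (linear predictor clamped for safety)."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        eta = max(-30.0, min(30.0, eta))
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll


def fit_logistic(X, y, iters=20, ridge=1e-8):
    """Newton-Raphson logistic fit; the tiny ridge only keeps the Hessian invertible."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            eta = max(-30.0, min(30.0, eta))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += (yi - p) * xi[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * xi[i] * xi[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta, log_likelihood(X, y, beta)


def aicc(loglik, k, n):
    """Corrected AIC for k estimated parameters and n observations."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def scan_change_points(dates, y, cuts, min_rows=20):
    """Refit date + period + period*date for each candidate cut; return (cut, AICc) pairs."""
    center = sum(dates) / len(dates)  # centering the date reduces collinearity
    results = []
    for cut in cuts:
        period = [1.0 if d >= cut else 0.0 for d in dates]  # 0 = before, 1 = after
        n_after = int(sum(period))
        if min(n_after, len(dates) - n_after) < min_rows:
            continue  # skip cuts that leave one period nearly empty
        X = [[1.0, d - center, p, p * (d - center)] for d, p in zip(dates, period)]
        _, ll = fit_logistic(X, y)
        results.append((cut, aicc(ll, 4, len(y))))
    return results


# Synthetic data (an assumption): flat propensity before 2005, rising afterwards.
rng = random.Random(0)
dates = sorted(rng.uniform(1995.0, 2015.0) for _ in range(200))
y = []
for d in dates:
    eta = -1.0 + (0.2 * (d - 2005.0) if d >= 2005.0 else 0.0)
    y.append(1.0 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0.0)

cuts = [float(year) for year in range(1998, 2013)]
results = scan_change_points(dates, y, cuts)
best_cut, best_aicc = min(results, key=lambda r: r[1])
print("candidate change point with the lowest AICc:", best_cut)
```

Plotting the `(cut, AICc)` pairs in `results` gives the criterion-versus-date picture described above. With noisy data the minimum need not fall exactly on the true change date.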

I am not sure if this approach will help. Just an idea.

Learn it once, use it forever!

6 REPLIES 6

When I look at your models, it looks like year and change are highly significant individually. When you add the cross (interaction) effect, none of the parameter estimates look significant, but the overall model is. But I don't understand the sense of putting an interaction between year and change in your model: the change variable just reports whether the year is before or after some intervention time, so it doesn't make much sense to me to include the interaction. I suppose what you are asking is whether the year has a different effect on the target in the Pre and Post periods. I think the more direct way to examine this is to Fit Y (target) by X (year) and put the change (Pre and Post) in the By box. When you do this you will see that year is far from significant in the Pre period but quite significant in the Post period. I think that answers your question. If you need a test of whether the slope (nonlinear here) is significantly different in the two periods, I'm not exactly sure what the appropriate test is, but I think it is evident that the effect of year differs significantly in the two periods.

Re: Parameterization choice and model comparison in logistic regression.

Thank you @dale_lehman for the advice.

That is the thing: ideally, I would be able to assess whether there is an overall trend over time as well as a breaking point at 2005. The use of both the year and the dummy variable was supposed to allow exactly that. This way I would have an estimate for the slope in each period as well as a dummy for continuity between them.

I can estimate the slope for the post period directly by reversing the value ordering of “Change” as in the attached output. I get the same insignificance. Splitting the data, I get an overall significant model with an insignificant slope, which to me shows this trend is not very robust. Am I right?
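
To make the parameterization concrete, the model with both variables and their interaction can be written as follows. This is a sketch that assumes a 0/1 indicator for Change (0 = pre, 1 = post); the software may code nominal effects differently, in which case the estimates are scaled and shifted but the reasoning is the same. The symbols $\beta_0,\dots,\beta_3$ and $\text{Change}'$ are notation introduced here, not names from the attached script.

```latex
\begin{aligned}
\operatorname{logit} P(\text{Target}=1)
  &= \beta_0 + \beta_1\,\text{Year} + \beta_2\,\text{Change}
     + \beta_3\,(\text{Year}\times\text{Change}) \\[4pt]
\text{slope in the pre period}  &= \beta_1 \\
\text{slope in the post period} &= \beta_1 + \beta_3 \\[4pt]
\text{with } \text{Change}' = 1-\text{Change}:\quad
\operatorname{logit} P(\text{Target}=1)
  &= (\beta_0+\beta_2) + (\beta_1+\beta_3)\,\text{Year}
     - \beta_2\,\text{Change}' - \beta_3\,(\text{Year}\times\text{Change}')
\end{aligned}
```

Under this coding, the coefficient on Year is the pre-period slope, and after reversing the coding it becomes the post-period slope $\beta_1+\beta_3$. The interaction coefficient $\beta_3$ is the difference between the two slopes, so a test of $\beta_3 = 0$ is a test of whether the trend changed.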

From all the alternative models, could you suggest a logical path to determine whether there is a trend in each period and whether there is a breaking point (a change in trend)?

I think you are too focused on finding a low p-value. It looks like your data shows what you are looking for - the year has no clear effect prior to 2005, but a significant effect after. This is readily seen by running two separate logistic regressions - target as a function of year, in the pre- and post- periods. What more do you need?

I hope that this reply follows the first three!

The problem sounds like 'change point analysis' and the approach is very similar to those used in this technique. You have a sliding definition of period 1 and period 2 and use a metric like minimum AICc to decide the point where the behavior (e.g., mean, slope, et cetera) changed. That is, as the change point varies, you re-fit the model and capture the metric, then plot the metric versus the change point.
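
For reference, the AICc mentioned here is the small-sample corrected AIC. The symbols below ($\hat{L}$ for the maximized likelihood, $k$ for the number of estimated parameters, $n$ for the number of observations) are standard notation, not names from the thread:

```latex
\mathrm{AICc} = -2\ln\hat{L} + 2k + \frac{2k(k+1)}{n-k-1}
```

If every candidate change point is fitted with the same terms (for example intercept, date, period, and their interaction, so $k = 4$) on the same $n$ rows, the penalty is identical across fits, and the change point with the lowest AICc is simply the one with the highest log-likelihood.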

Thank you for your answer. I was hoping there was another way of introducing the Year and Change variables in a model, since this is not the classical ANCOVA where two independent variables represent completely different dimensions (e.g., age and gender).

Thanks for the lead, but I am not at all sure my data is appropriate. Could you tell me how to go about it in terms of platform and variable roles?

The reason for my doubt: this is real data from political science (the numbers are real, the variable names are changed), so observations are not exactly sequential. They do have an exact date, but it is not meaningful within the year. I am also not sure I should be looking for the best-fit split in the data. The technical change may have taken time to “kick in”, but there is no interest in estimating that lag (which would be very important in process monitoring). I just need a way of estimating the probability of Target over time and comparing the two periods (pre and post) while controlling for the other variables.

I was hoping I could crudely estimate the trend using the Year variable and test whether there is a change in it, or perhaps gain insight that it is just a drop between pre and post.

This discussion is helping me go through the whole thinking process of the modeling and the context. Thanks a lot!

