bobby
Level II

"Response must have at least two levels" comes up when adding a frequency in logistic regression

Hi,

I'm trying to run a logistic regression with species presence/absence data as the response variable (across several years), and various environmental conditions (e.g. temperature) as predictors. I also have abundance data (number of individuals counted each day when the species was present, zero if none observed, null if no survey was conducted), which I would like to include as a frequency.

I can run a logistic regression model perfectly fine without including the abundance data. However, when I include the abundance data as a frequency (or as a weighting - tried both) I get the error "Response variable must have at least two levels".

I have my response variable set as a nominal variable (0 for absent, 1 for present, and • for null i.e. no survey conducted). The abundance data column is set to continuous, and the predictors are all continuous.

I thought that the problem may lie in the huge amount of null values in the dataset (the species is only active for a few months of the year, so is not monitored most of the time). However, I still got the same error when I excluded all of the null values.

To see if the issue occurred specifically with the abundance data I tried putting something irrelevant but continuous (Year of Observation) in as a pretend 'frequency', and JMP had no problems running that.

I'm using JMP Student Edition 19.0.1.

Thanks in advance! :)

~ Bobby

MRB3855
Super User

Re: "Response must have at least two levels" comes up when adding a frequency in logistic regression

Hi @bobby: Welcome to the community. Without seeing your data I can only speculate, but it sounds like abundance is always zero for either presence or absence.

bobby
Level II

Re: "Response must have at least two levels" comes up when adding a frequency in logistic regression

Hi @MRB3855 - thanks! And yes, abundance is always zero for absence.

dlehman1
Level VI

Re: "Response must have at least two levels" comes up when adding a frequency in logistic regression

Showing the data would certainly make it much easier to answer, but I'll speculate here. When you use abundance as either a frequency or a weight, every row where abundance is zero gets a frequency (or weight) of zero, and those are exactly the rows where the response is absence. Rows where abundance is missing will be ignored in the logistic regression as well. So the only rows that are actually used are those where the fish is present. It sounds like using abundance as a weight/frequency is erasing the absence level of the presence/absence variable, so that only presence appears as a value for the response variable.

I do wonder about how you are treating abundance however.  Aren't you trying to predict abundance as a continuous variable, but with a large number of zeroes?  It almost sounds like a censored data problem or at least a problem where you want to predict the size of the population when a bunch of values are zero and the others are various levels of a continuous variable.  It is hard for me to suggest how to best model this situation, but it doesn't sound like using abundance as a frequency or weight is the correct way to do it.

bobby
Level II

Re: "Response must have at least two levels" comes up when adding a frequency in logistic regression

Hi @dlehman1, thank you! Unfortunately I'm not sure whether I can post the real data, so I'm erring on the side of caution, but the table below is a mock-up of the kind of data I've got (although the real dataset has a lot more zeroes and null observations). You're right about the abundance data: lots of zeroes, and then some days with non-zero counts (which is the continuous variable).

Regarding the model: the fish start emerging in summer, but the exact dates vary with environmental factors (as does the size of the population once the fish have emerged). I'm trying to determine which factors are most highly correlated with fish presence, with a view to ultimately being able to predict when fish will start emerging. While I'm not concerned about predicting the population size, I'm bringing the observed abundance in as an indicator of how many more fish emerge as the season goes on, so that the model captures a bit more of the actual relationship between fish emerging and the environmental factors (as a side note, fish observed are collected, so there's no issue with re-collection). I hope I'm explaining that well enough. Let me know if it doesn't make sense! (I'm newish to stats 😁)

Typing this out, it occurred to me that perhaps one way to go about it is to disregard the presence/absence measure and just work with abundance? Which will still require handling the zeroes somehow, so I'll have a play around with that...

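| Year | Month | Day | Fish present/absent (1/0) | Fish abundance (# fish observed) | Mean daily water temperature (°C) | Cumulative summer water temperature (°C) | Creek depth (m) |
|------|-------|-----|---------------------------|----------------------------------|-----------------------------------|------------------------------------------|-----------------|
| 2016 | 1 | 29 | • | • | 12.0 | 803.0 | 0.6 |
| 2016 | 1 | 30 | 0 | 0 | 12.8 | 815.8 | 0.6 |
| 2016 | 1 | 31 | 1 | 3 | 14.2 | 830.0 | 0.7 |
| 2016 | 2 | 1 | 1 | 7 | 14.7 | 844.7 | 0.7 |
| 2016 | 2 | 2 | 0 | 0 | 13.3 | 858.0 | 0.5 |
| 2016 | 2 | 3 | 1 | 2 | 14.1 | 872.1 | 0.7 |
| 2016 | 2 | 4 | 0 | 0 | 12.6 | 884.7 | 0.8 |
| 2016 | 2 | 5 | • | • | 12.9 | 897.6 | 0.6 |
| 2016 | 2 | 6 | • | • | 13.4 | 811.0 | 0.7 |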

Victor_G
Super User

Re: "Response must have at least two levels" comes up when adding a frequency in logistic regression

Hi @bobby,

Welcome to the Community!

@dlehman1 identified the problem in your situation: when you use the abundance column as a Freq or Weight, the zeros in this column cause the rows where the fish are absent to be excluded from the logistic regression fit. It's the same as if those rows weren't in the table at all, so the response is left with a single level (present).
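
To make the mechanism concrete, here is a minimal Python sketch using the six surveyed rows from the mock-up table earlier in the thread. It assumes, as described above, that a frequency behaves like a per-row repeat count, so a row with a frequency of zero contributes nothing to the fit. The helper name `effective_levels` is hypothetical, and this is an illustration, not JMP's actual implementation.

```python
# Surveyed rows from the mock-up table: (present, abundance)
rows = [(0, 0), (1, 3), (1, 7), (0, 0), (1, 2), (0, 0)]

def effective_levels(rows, use_freq):
    """Return the response levels that still carry weight in the fit.

    Assumption: a frequency acts as a repeat count, so a row with
    frequency 0 contributes nothing (illustrative, not JMP's code).
    """
    levels = set()
    for present, abundance in rows:
        freq = abundance if use_freq else 1
        if freq > 0:
            levels.add(present)
    return sorted(levels)

print(effective_levels(rows, use_freq=False))  # [0, 1]
print(effective_levels(rows, use_freq=True))   # [1]
```

With abundance as the frequency, only level 1 survives, which matches the "Response must have at least two levels" message.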

I would recommend splitting your modeling task in two:

  1. First, using only the absence/presence column to detect in which conditions the fish are present or absent with a logistic regression.
  2. Second, for cases when fish are observed, try to model the number of fish (probably with a Poisson Generalized Regression since you're dealing with counts).

Maybe the two tasks could be done in one if you predict only the fish abundance (number of fish) with a zero-inflated Poisson distribution, which handles both the zeros and the non-zero counts.

I think the two modeling approaches (logistic regression and Poisson regression) answer slightly different questions: which conditions prevent or enable the presence of fish (logistic regression) vs. how the different conditions may increase the number of fish present (Poisson regression).
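
For readers who have not met the zero-inflated Poisson before, the following is a rough Python sketch of its probability mass function: with probability `pi` the count is a "structural" zero, otherwise it is drawn from a Poisson with mean `lam`. The names `zip_pmf`, `pi`, and `lam` and the numeric values are assumptions chosen for illustration; they are not estimates from the data in this thread.

```python
import math

def zip_pmf(k, pi, lam):
    """P(count = k) under a zero-inflated Poisson.

    pi  : probability of a structural zero (illustrative name)
    lam : mean of the Poisson count process
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from two sources: structural zeros and Poisson zeros.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

pi, lam = 0.5, 4.0  # illustrative values only
total = sum(zip_pmf(k, pi, lam) for k in range(100))
mean = sum(k * zip_pmf(k, pi, lam) for k in range(100))
print(zip_pmf(0, pi, lam), total, mean)
```

The probabilities sum to 1 (up to rounding), and the mean works out to `(1 - pi) * lam`, so extra zeros pull the expected count down without changing the shape of the non-zero part.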

Hope this answer will help you,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
dlehman1
Level VI

Re: "Response must have at least two levels" comes up when adding a frequency in logistic regression

@Victor_G has proposed good solutions, I believe. I am not very familiar with those specific techniques, but I would mention that this sort of problem looks like a frequency-severity problem (the name comes from insurance modeling). You have two things to predict: whether or not fish are present (analogous to whether or not an insurance claim is made), and then, conditional on fish being present, the number of fish (the severity of the claim). One of these is a nominal variable and the other is continuous, hence the idea of approaching this as a two-step problem. I defer to Victor_G's suggested modeling methods, but I wanted to mention that I see this as a common sort of problem.
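
As a small worked illustration of the frequency-severity split, the Python sketch below uses the six surveyed days from the mock-up table earlier in the thread. It computes the two pieces separately with plain averages, which stand in here for the two fitted models, and checks that their product equals the overall mean count. The variable names are illustrative only.

```python
# Surveyed days from the mock-up table: (present, abundance)
rows = [(0, 0), (1, 3), (1, 7), (0, 0), (1, 2), (0, 0)]

counts_when_present = [n for present, n in rows if present == 1]

# Frequency part: how often are fish present at all?
p_present = len(counts_when_present) / len(rows)

# Severity part: how many fish, given that some are present?
mean_when_present = sum(counts_when_present) / len(counts_when_present)

expected_count = p_present * mean_when_present
overall_mean = sum(n for _, n in rows) / len(rows)

print(p_present, mean_when_present, expected_count, overall_mean)
# 0.5 4.0 2.0 2.0
```

In a real analysis each part would depend on the predictors (temperature, creek depth, and so on) instead of being a single average.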

bobby
Level II

Re: "Response must have at least two levels" comes up when adding a frequency in logistic regression

Thanks - that's a nice example!

bobby
Level II

Re: "Response must have at least two levels" comes up when adding a frequency in logistic regression

Hi @Victor_G ,

Thank you! That sounds like a good way to go about it - much appreciated :)
