
Comparing results

I need help in determining whether the rate constants I obtained from non-linear regression are different. I have 2 sets of samples from the same location, but each set was under a different condition (i.e., a different flowrate), and I'm trying to determine if changing the flowrate makes a difference to the rate constants. I fit the data points with an equation, and the non-linear regression gives me the rate constant based on the fitted curve, along with its approximate standard error. Does anyone know how I can determine if changing the variable (i.e., flowrate) affects the result (rate constant)?

Re: Comparing results

I think I'd approach it by running a nonlinear regression analysis on each data set separately, and recording the residual sum of squares and corresponding degrees of freedom (SSE and DFE in the Nonlinear Fit Solution window) you get from each of the two analyses of variance. The combined residual sum of squares will be just the sum of the two SSE values, and the combined residual mean square (MSE) will be just that divided by the sum of the two DFE values.

Secondly, run an analysis on the combined data set (i.e. assuming a single flow rate parameter) and again take a note of the residual sum of squares and the degrees of freedom you get. The total residual sum of squares from the separate analyses above should be less than the figure you now get because you originally fitted a more complex version of the same model; the real question is whether it's significantly less.

To test it, work out the difference between the two, and divide that difference by the difference in the two residual degrees of freedom (which should be exactly 1, since one model has just one parameter more than the other one): that gives you the residual mean square of the difference. Compare that with the residual mean square of the more complex model (because that's the best estimate you've got of pure random variation) in an F test. If the result is significant, that means there's a significant difference between the two parameters, i.e. the two flow rates - because it means that including an additional parameter has resulted in significantly more variation being explained by the model.
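The arithmetic above can be sketched in a few lines of code. This is a minimal example with made-up SSE and DFE figures; substitute the values you read off your own Nonlinear Fit reports. It assumes the models differ by exactly one parameter, as described above.

```python
from scipy import stats

# Hypothetical values from the two separate Nonlinear Fit reports
# (one per flow rate) -- substitute your own SSE and DFE figures.
sse1, dfe1 = 0.42, 10   # data set 1
sse2, dfe2 = 0.51, 12   # data set 2
sse_sep = sse1 + sse2   # combined residual SS of the separate fits
dfe_sep = dfe1 + dfe2   # combined residual df

# Hypothetical values from the single fit to the pooled data (simpler model)
sse_comb, dfe_comb = 1.30, 23

# Extra sum of squares explained by allowing a separate rate constant
extra_ss = sse_comb - sse_sep
extra_df = dfe_comb - dfe_sep   # = number of extra parameters (here 1)

# F test: extra mean square vs. residual mean square of the complex model
f_stat = (extra_ss / extra_df) / (sse_sep / dfe_sep)
p_value = stats.f.sf(f_stat, extra_df, dfe_sep)
print(f"F({extra_df}, {dfe_sep}) = {f_stat:.3f}, p = {p_value:.4f}")
```

A small p-value here would indicate that letting the rate constant differ between flow rates explains significantly more variation than a single shared value.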

Re: Comparing results

Hi David,

Many thanks for your reply. Can I clarify on what you suggested:

> To test it, work out the difference between the two,
> and divide that difference by the difference in the
> two residual degrees of freedom (which should be
> exactly 1, since one model has just one parameter
> more than the other one): that gives you the residual
> mean square of the difference.

I assume the "difference between the two" that you mentioned means the difference between the residual sum of squares (SSE) from the combined data set analysis and the combined SSE obtained from the separate analyses. The difference between the two residual degrees of freedom is 2, however. I'm not sure if it's because I don't have an equal number of data points for each set.

Would the results be valid if I have unequal variance between the two data sets?

Re: Comparing results

Hi - no, the difference between the two residual degrees of freedom should just be the difference between the number of parameters being fitted under the more complex model and the number being fitted under the simpler one. The number of data points in the two data sets shouldn't have any effect on that difference.

So for example, if the underlying model you were fitting were y=A*exp(-alpha*t), and you wanted to know if the estimate of alpha was the same for two different data sets, say of sizes N1 and N2, then the simpler model would have just two parameters (A and alpha), whereas the more complex model would have three (A, alpha1 and alpha2). I'm assuming here that A is the same under both models: if it isn't, then there would be a difference of 2 between the two residual degrees of freedom.

You should find then that the simpler model would have 2 df for the regression, (N1+N2) df for the total sum of squares, and (N1+N2-2) for the residual. The more complex model would have 3 df for the regression, (N1+N2) df for the total, and (N1+N2-3) for the residual. (Incidentally, I haven't got an "(N-1)" df term for the total sum of squares - which is what you'd normally expect to see in an analysis of variance table for an ordinary linear regression - because I'm not fitting a constant term here. Usually only the slope of a regression line is being tested: the overall mean is subtracted from the data beforehand so that the slope can be estimated independently of the intercept, and it's that subtraction of a constant that gives rise to the "(N-1)" total degrees of freedom in the ANOVA table.)
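To make the two-model comparison concrete, here's a sketch using simulated data (not your data - the true A and alphas below are invented for illustration). The simpler model fits one shared alpha; the more complex one fits separate alphas with a shared A, so the residual degrees of freedom differ by exactly 1, matching the df accounting above.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Simulated example: two data sets of unequal size (N1=15, N2=20),
# same A but different alphas -- all values are assumptions.
t1 = np.linspace(0, 4, 15)
t2 = np.linspace(0, 4, 20)
y1 = 3.0 * np.exp(-0.8 * t1) + rng.normal(0, 0.05, t1.size)
y2 = 3.0 * np.exp(-1.2 * t2) + rng.normal(0, 0.05, t2.size)

t = np.concatenate([t1, t2])
g = np.concatenate([np.zeros(t1.size), np.ones(t2.size)])  # group indicator
y = np.concatenate([y1, y2])

def simple(t, A, alpha):                  # shared alpha: 2 parameters
    return A * np.exp(-alpha * t)

def complex_model(X, A, alpha1, alpha2):  # separate alphas: 3 parameters
    t, g = X
    alpha = np.where(g == 0, alpha1, alpha2)
    return A * np.exp(-alpha * t)

p_s, _ = curve_fit(simple, t, y, p0=[1, 1])
p_c, _ = curve_fit(complex_model, (t, g), y, p0=[1, 1, 1])

n = t.size
sse_s = np.sum((y - simple(t, *p_s)) ** 2)             # df = N1+N2-2
sse_c = np.sum((y - complex_model((t, g), *p_c)) ** 2)  # df = N1+N2-3

# Extra-sum-of-squares F test: 1 extra parameter in the numerator df
f_stat = (sse_s - sse_c) / (sse_c / (n - 3))
print(f"F(1, {n - 3}) = {f_stat:.2f}")
```

The complex model's SSE is necessarily no larger than the simple model's; the F statistic asks whether the reduction is more than you'd expect from pure random variation.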

If the residual variances of the two data sets are clearly unequal, that'll violate the assumptions that are implicit in the ANOVA calculations, so I think I'd see if there's any normalizing transformation that could be applied to the data before it's analyzed to minimize any heterogeneity present. In the example of the model I described above (which is just an exponential decay curve), if you plotted out such a data set you'd probably find that the data became less variable as t increases, so it would make sense to log the data prior to analysis anyway. But of course that would change the model you'd want to fit, since if you log the original equation you get ln(y) = ln(A) - alpha*t, which is just a simple linear regression in ln(A) and alpha. In that instance, logging the data would make the problem a lot easier to solve, in addition to being the right thing to do if applying such a transformation has the effect of normalizing the residual variation. I don't know exactly what equation you want to fit to your data here, but it could well be that a suitably-chosen transformation could help in your situation also.
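To illustrate how logging turns the exponential decay into a simple linear regression, here's a short sketch on noise-free illustrative data (A = 2 and alpha = 0.5 are arbitrary assumed values):

```python
import numpy as np

# ln(y) = ln(A) - alpha*t, so a straight-line fit to (t, ln(y))
# recovers the parameters directly. Noise-free data for clarity.
t = np.linspace(0.1, 5, 25)
y = 2.0 * np.exp(-0.5 * t)   # assumed A = 2, alpha = 0.5

slope, intercept = np.polyfit(t, np.log(y), 1)
alpha_hat = -slope
A_hat = np.exp(intercept)
print(f"A = {A_hat:.3f}, alpha = {alpha_hat:.3f}")
```

With real (noisy) data the same fit applies, and on the log scale the residual variance is often much closer to constant, which is the point of the transformation.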

I'm sorry that's a bit long-winded, but does it help? (BTW, a good place to find an example of nonlinear optimisation being performed is the "Algae Mitscherlich" data set in the JMP online help: you'll find examples of three different models being fitted in which some of the alphas and betas are assumed to be the same across the various fits. There are 120 data points in total, of which 8 are automatically excluded, giving residual degrees of freedom of 112 minus the number of parameters fitted in each case.)