Dose-Response Curve Fitting for Ill-Behaved Data (2020-US-30MP-641)

Martin Kane, Managing Scientist, Exponent

Analytical methods for pharmaceutical development often require the use of dose-response curves and the fitting of an appropriate statistical model. Common functions used are the Rodbard and Hill functions, which are different parameterizations of the four-parameter logistic function. This presentation will discuss using JMP and the JMP Scripting Language to fit these non-linear functions, even when they are ill-behaved. Real-world data will be used to demonstrate how to use JMP’s various non-linear fitting routines and possible methods of dealing with messy data.

Auto-generated transcript...

Speaker	Transcript
mkane	Okay. Hi everybody, my name is Martin Kane. I'm with Exponent and I'm here to talk about dose-response curve fitting for ill-behaved data
	here at the 2020 JMP Discovery Summit. First of all, I'd like to thank the conference advisory committee for inviting me to give this talk.
	And I really appreciate the opportunity and a chance to share the learnings that I've done through JMP. I use JMP all the time every day.
	And as a consultant, it becomes my primary tool for performing analysis. So this is something that I have been working on recently and thought it would be a good thing to share.
	So let's get into this. First of all,
	there's a disclaimer. The ideas in these slides belong to me, Martin Kane, and do not necessarily represent those of my company Exponent.
	So with that being said,
	what are we going to talk about? So what are dose-response models? What shape do they often follow? Typical statistical models for those.
	How do we access these models in JMP? Difference between curve fit and nonlinear. What are the benefits of each? And what are the drawbacks of each? That's an area I will spend some time on.
	And we'll talk about initial values as well and the importance of having good solid initial values for these nonlinear models.
	I have a demonstration and I will then use the data in that demonstration to look at ill-behaved data. What does that mean, and what can we do about it using the curve fit and nonlinear platforms in JMP?
	Okay.
	Dose-response models. So they can come in both linear and nonlinear formats.
	Typical models, though, are based on what we call the 3, 4 or 5 parameter logistic models. Those are very typical and there are many of them; there aren't just three.
	So what are the shapes of some of these models? Obviously linear, linear, excuse me, is a straight line, just a standard regression where we've plotted
	some sort of response concentration often versus...our log concentrations on the x axis versus the y axis, which is some sort of a response.
	And I will talk about it in just the next slide, but oftentimes this is based on some sort of fluorescence.
	And those values can be quite large, in terms of their range. So it's not uncommon to take the log values of those as well.
	We can have the exponential type model. This might work in some portion of the dose-response curve, but often is not sufficient for the entire curve. But more common than not is some form of...
	some sort of parameter logistic, and this example shows the four parameter logistic, the Hill function. This can also be called in a slightly different orientation, the Rodbard function.
	There's several different versions that have this same general shape, where we have some sort of upper asymptote, some sort of lower asymptote.
	There is a
	center point along this curve, somewhere halfway between the upper and lower asymptote, which we call the EC50 or IC50
	value. That point is on the x axis, is how we use that. And then we also have a slope to this curve and this slope is here in this linear section. It's not truly linear, but it's close to linear.
	And the slope is the fourth parameter, and down at the bottom, that's the a parameter in this equation.
	d is our upper asymptote, that's the top; c is the lower asymptote at the bottom;
	I mentioned a was the slope; and b is the IC50 or EC50 value, that's the halfway point and the x axis concentration for that. So this is the equation of a typical 4 parameter logistic function.
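The slide's equation is not captured in the transcript, so the following is only a minimal sketch of one common way to write the Hill form with the parameter roles described above (a slope, b EC50/IC50 on the log concentration axis, c lower asymptote, d upper asymptote). The exact algebraic form and the parameter values are assumptions for illustration.

```python
def hill_4pl(x, a, b, c, d):
    """Four-parameter logistic (assumed Hill form) on a log10 concentration axis.

    a: slope (growth rate), b: inflection point (log10 EC50/IC50),
    c: lower asymptote, d: upper asymptote.
    """
    return c + (d - c) / (1.0 + 10.0 ** (-a * (x - b)))


# Hypothetical parameter values, for illustration only.
a, b, c, d = 1.5, 0.0, 0.55, 2.0
midpoint = hill_4pl(b, a, b, c, d)       # response at x = b
far_left = hill_4pl(-20.0, a, b, c, d)   # approaches c when a > 0
far_right = hill_4pl(20.0, a, b, c, d)   # approaches d when a > 0
print(midpoint, far_left, far_right)
```

At x = b the response sits halfway between c and d, which is why b is read as the EC50 or IC50 on the x axis.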
	Okay. So, in the world of biologics and pharmaceuticals, a lot of the standard test method format is based on what's called an assay.
	And assays themselves are nothing more than a test method for biologics, for pharmaceuticals. In this particular case what we see here is a standard 96-well plate
	that's used for these types of assays. Each of these little circles actually represents a well, actually a little divot in a plastic plate where materials can be put.
	And so we can fill up all 96 wells, which will have some sort of binding agent on the bottom of the wells and some sort of fluorescence
	material in it as well. And so, once these are put under a certain wavelength of light, they will fluoresce. And depending on how much binding takes place, you'll get different fluorescence values.
	And like I said those fluorescence values can go anywhere from 10 to 100,000 or maybe even a million. It just depends on the format. It can be quite large, though.
	Typically on a plate, though, we will put seven or eight different concentrations of a curve, and the curve would be,
	as we showed in the previous slide, representative of one single material with various amounts for the concentrations.
	Typically we start at the top of the plate and we put the highest concentration and we might serially dilute that concentration down to the wells below it.
	So if the top starts out at, say, a value of 16, we might do a 4-to-1 serial dilution. And so we end up with four in the next column, one in the next row, excuse me,
	one-fourth, one-16th, and so on down the line. So we end up with serially diluted material going down the plate. Oftentimes there are duplicates, so columns five and six might have the same material, just in there twice, and that's so that we can get some form of variability in our curve.
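The dilution scheme just described can be sketched in a few lines. The function name and the choice of eight steps are assumptions for illustration; the starting value of 16 and the 4-to-1 factor come from the example above.

```python
import math


def serial_dilution(top, factor, steps):
    """Return `steps` concentrations, each 1/factor of the one before it."""
    return [top / factor ** i for i in range(steps)]


# Top well at 16, diluted 4-to-1 down an eight-row column.
plate_column = serial_dilution(top=16.0, factor=4.0, steps=8)

# On a log10 axis the wells are evenly spaced, which is why dose-response
# curves are usually plotted against log concentration.
log_conc = [math.log10(c) for c in plate_column]
spacing = [log_conc[i] - log_conc[i + 1] for i in range(len(log_conc) - 1)]
print(plate_column)
print(spacing)
```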
	Oftentimes as well, when we're running an assay for a biologic or pharmaceutical, we will test multiple doses at the same time.
	So the point there is that on the same plate, at the same time, we have various doses that we're trying to compare to one another.
	So for instance, columns 11 and 12 might have one dose, call it at one milligram per kilogram, and columns 9 and 10 might have a different dose at say, .1 milligrams per kilogram. And we might be wanting to compare those different doses at the same time.
	The other thing to mention is JMP has the ability to test for what's called parallelism. I'll talk about that. And built in, there are functions for testing parallelism using either the F test or the chi-square method.
	Okay, so let's take a look at some data in JMP and get right into it.
	So here we go. Right over here on the left I have a JMP journal that I'm going to use for this demo. And this is for nonlinear bio assay materials and for ill-behaved data.
	The two platforms, as I mentioned earlier, that I will be discussing are the curve fit and the nonlinear platforms. So let's
	open up our sample data. Let's pull up some sample data. Now I initially had wanted to use actual data from a client, but they declined to let me do that.
	But the data that JMP has built in, in the bio assay sample data set, works just fine. It's very similar to what I would have used and we can use that.
	So first of all, let's take a look and see what we have. We have some sort of concentration, as I mentioned, the serial dilution. In this particular case, it looks like each
	row down is three-fourths of the row above it. There's some sort of log concentration. That's just log 10 of the concentration.
	Formulation looks like it has various formulations, or those could just as easily be doses. And toxicity, that may be the y value, our response could be fluorescence or log of fluorescence, something like that. So if we take a look at this data using
	Graph Builder, just because it's easy, we put toxicity in the y axis and maybe we put the log concentration on the x axis.
	We can see that there is a similar looking function to that 4 parameter logistic that I mentioned earlier, except it's reversed in terms of its direction. That's not a problem.
	The cubic spline that's used fits the data quite well. We can remove that and just look at the data.
	Now, obviously there's a lot of data here. Looks like there's four values per concentration. Oh, there was a formulation that we haven't talked about yet.
	And that, we could take that and we could do various things, right, in Graph Builder. We can put that in group y, and we can get four different curves out of this. Let's use cubic spline.
	standard, Test A, Test B, and Test C.
	Now you can change the colors. Those are harder to see because we have a single curve fit. We could also just take it and put it in the overlay area,
	which is the most common area that I typically use, and what this does is this actually fits an individual cubic spline for each of my various formulations.
	That's kind of nice. And we can see that three of them are quite similar, except for one is different. The green one. The green one is Test B.
	Okay, so let's remember that, Test B is different than the others. And standard is just one of the various formulations that are being looked at. So it looks like they're trying to compare three different tests against the standard, which is
	interesting. Okay, so I'm going to close this down. Now
	what sort of curve fit functions do we have for these nonlinears? So under analyze, specialized modeling, you have two that we can use. One is called curve fit and one is called nonlinear. Both of these will work for nonlinear data.
	And let's start with the curve fit function.
	So in the curve fit, we want to put some sort of y response toxicity in our y, and log concentration for our regressor. And initially, if I just say okay,
	what we get looks just like what we had in Graph Builder. There's one exception here and that exception is I can come up here under the red triangle
	linear, quadratic, cubic and so on. Sigmoidals
	logistics, probits, Gompertz. It doesn't tell you what the functions really look like or what their equations are. You have to know the right one. Well, I happen to know that I want the Hill function,
	and that's hidden here in the sigmoid curves, logistic curves, and here it is, fit logistic 4 parameter Hill. That's the function that I would like to use. So
	I can click on that and I get what looks a lot like what I had in Graph Builder, except now I actually have parameter estimates down at the bottom.
	Remember, we had a lower asymptote; an upper asymptote; a growth rate, which is the slope;
	and the inflection point, that's the EC50 value, that's the point halfway between the top and the bottom on the x axis. And those are the estimates. This is nice, but I really want separate graphs for each of my
	different
	formulations, so I'm going to redo this. I'm going to relaunch this except I'm going to add formulation. Now I could put formulation in the by category or in the group category. If I put in the by category and I click OK,
	I get four separate curves. That's not bad. And if I hold down my control key and I click on the red triangle for any of them, and I go to sigmoid logistic, 4 parameter Hill,
	what it will do is it will actually fit a curve for each of the four separately and give me the estimates for each of the four. So there's the first, one standard; here's Test A,
	its estimates; there's Test B with its estimates. Notice the asymptotes are different. And Test C. This is nice, but still not exactly what I'm looking for.
	So I'm going to actually close this. I'm going to
	relaunch the analysis, except in this particular time, I'm going to take the formulation and put in the group category. And once again, by doing that, now I see all four are kind of overlaid on top of each other. And I can come up here and click on sigmoid, logistic, 4 parameter hill,
	and now what it shows me is the four different curve fits overlaid on top of each other in the plot.
	I can also get the parameter estimates for those four right here. So these should be identical to what we saw in the last screen.
	But visually now, I can take a look at these plots for the four overlaid on top of each other and see how they look. Do they look similar to each other or not?
	So this is, this is pretty good.
	I mean, this is, this might be good enough for what you might need. And if you want to pull these estimates out of this particular parameter...
	parameter estimates, I can right click on it and I can say make into a data table, which then allows me to take this data table with the estimates in it, and I could do something with that, whatever I happened to want.
	So that's, that's good to know. I'm going to close that out.
	So let's take a quick look at the nonlinear platform. So, analyze, specialized modeling, nonlinear. This looks similar. And if I put toxicity in my y response, I'll say formulation in my group, and log concentration in my x, and I say, okay.
	I get, oh wait, this fits...this says fit curve. We just did a fit curve, didn't we? And yes we did actually. This is identical to if I come up here under analyze, screening, fit curve, I say recall and I say, okay,
	they are identical. If I don't do anything different in the nonlinear platform, I actually end up in the fit curve platform. So what can we do that's different than the specialized modeling nonlinear?
	I'm going to say recall and pull everything back in. Notice there's a model library on the left. And also notice this x could be a predictor formula, not just x values.
	So if I click on model library, I have a lot of models that I could choose from. And once again,
	I don't really know what these are. But notice, if I click on one, I can get a function, I can even say show graph. And it gives me a picture of it. Oh, that looks like something like I'm looking for. But it's kind of flipped in the wrong direction. So,
	this, this might work for me. Um, but what I don't see here is one called the Hill function. I see the Rodbard function. That's the five parameter, Richard. Where was it? Rodbard models here, that's similar, but there's no Hill function in all of it. So it's not exactly what I want.
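A short sketch of why the Rodbard entry is "similar": under the commonly written forms assumed below (the talk does not show either equation), the Rodbard function on the raw concentration axis and the Hill function on the log10 concentration axis trace the same curve once the parameters are remapped. The parameter values are hypothetical.

```python
import math


def hill_4pl(log_x, a, b, c, d):
    """Assumed Hill form on the log10 concentration axis."""
    return c + (d - c) / (1.0 + 10.0 ** (-a * (log_x - b)))


def rodbard_4pl(x, a, b, c, d):
    """Assumed Rodbard form on the raw concentration axis (x > 0).

    a: response as x -> 0, d: response as x -> infinity (for b > 0),
    c: concentration at the midpoint, b: steepness.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)


def rodbard_to_hill(a, b, c, d):
    """Map Rodbard (a, b, c, d) to Hill (a, b, c, d) under the assumed forms."""
    return (-b, math.log10(c), d, a)


rodbard_params = (2.0, 1.3, 0.8, 0.55)  # hypothetical values
hill_params = rodbard_to_hill(*rodbard_params)

# The two parameterizations agree at every concentration checked.
differences = [
    abs(rodbard_4pl(conc, *rodbard_params) - hill_4pl(math.log10(conc), *hill_params))
    for conc in (0.05, 0.8, 12.0)
]
print(differences)
```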
	But it does allow me to do some things. The one thing that the nonlinear platform lets me do
	is it allows me to actually
	specify parameters themselves and lock them in.
	So what do I mean by that? Well, let's just say that I go to model library and I say logistic, 4 parameter. And I say,
	is it make formula, I believe? Toxicity here. Log concentration here. Oops.
	Formulation log concentration here and I say, okay, and this is standard.
	Nice. Okay.
	It actually does fit 4 parameters using a function that's not quite the function that I'm looking for.
	And you know this is not bad. But here, notice I can actually, like, change these different parameters using the sliders. That's kind of interesting.
	So I can say make formula and what it did is it actually put a formula on my data table here. If I take a look at the formula, it's over here and it has parameters with initial values and it has this big long equation for all four fit in the formula.
	So instead I can actually come back here now and I can put that in my x predictor and I can say okay.
	And what it does is it comes up and shows me all these, you know, the four functions and what the initial values were. If you click Go, it'll actually try and fit these.
	And notice that actually it did fit them in a count of four iterations. Pretty quick, actually, where there's a limit of 60, it fit them and fit them well. But notice I have in here the ability to change and I can change via sliders down here.
	Okay.
	Or I could change up here using actual...I could type in actual values, but I can change the current values and I can lock them in. This can be rather helpful and I will demonstrate that next with my ill-behaved data.
	I'm going to close this out and I'm going to close, get rid of this particular column.
	Okay, so we have this data set, our initial data set still there. Let's take a look at our ill-behaved data now.
	What I'm gonna do is I'm going to get rid of every, every other row of data and all of the low end concentration data points. So I'm going to push this button, thin data and eliminate lowest points. Every other one is gone, and now it's going to get rid of all the lowest ones.
	So what exactly does that do to our data? If we take a look at it. Let's just go over to fit model really quick, specialized, fit curve, excuse me, fit curve. And so recall.
	We only have the highest five points on the curve now. If I come up here and I say fit curve, and I do sigmoid, logistic...sigmoid, logistic, Hill.
	And it actually fit curves to those five data points separately. But notice something, I bring this up and bring this way out.
	Notice my lower asymptotes here are just completely different from one another. There's a third one. And if I keep going, eventually I'll get to the fourth one, which is way down here at like minus 80. So four lower asymptotes that are just completely different.
	So it doesn't make any sense, right? It fits the top part of the curves well, but it really, it really doesn't fit the bottom part of the curves well at all. The tops look pretty good but the bottoms don't.
	So rather than extrapolating, oftentimes when we're running these assays,
	the client, user, what they'll do is they'll put blanks, so material on the plate that has no concentration on purpose. This is usually some sort of background material that's in the assay itself.
	And in this particular case, I happen to know that blanks were used and the average of those blanks, as I say, down here was .5.
	So the average was .55. So really somewhere over here by the time we get to .55, all of the lower asymptotes should actually come down to .55.
	But I can't, I can't change that here in the fit curve platform. Ah, but I can in the nonlinear platform.
	But I don't have the Hill function in the nonlinear platform, so this gets kind of confusing.
	But we can get around this problem. Using the fit curve platform, once I fit my model, the logistic 4P Hill, I can actually save a formula.
	I can save a prediction formula. Or I can save a parametric prediction formula.
	And there's a difference here. The prediction formula saves these exact functions just exactly the way that they were, and the parametric prediction formula is a little bit different. In this particular case,
	this is the parametric prediction formula. If we take a look at it, what it shows is, it shows here are the four equations.
	Just as I thought they would be, so you know, if the formulation is standard, use this; if it's Test A use this. And down here are all the parameters. And actually, if one takes a look at this, if I was to copy this and paste it into another document...
	I do that really quick and I come up here and I say File, New, excuse me....file, new, will a journal work?
	Journal
	doesn't work. That's ok.
	So let's
	have a new script.
	Paste. There we go. Notice at the top are all of my parameters.
	At the bottom is actually the formulation...the formula that we were seeing over here on the right. So everything comes over, but these are initial values with the function itself. I just wanted to show that because it's kind of hidden unless you understand the parameters.
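The structure of that saved formula, a block of named parameters with starting values followed by one equation per formulation, can be mimicked outside JMP. This is only a sketch: the Hill form is assumed, and every number below is hypothetical because the talk does not list the fitted estimates.

```python
# Hypothetical starting values per formulation (a slope, b EC50 on the log10
# axis, c lower asymptote, d upper asymptote). Not the estimates from the demo.
PARAMS = {
    "standard": {"a": 1.4, "b": 0.10, "c": 0.55, "d": 2.0},
    "test A": {"a": 1.5, "b": 0.12, "c": 0.60, "d": 2.1},
    "test B": {"a": 1.1, "b": 0.30, "c": 0.20, "d": 1.6},
    "test C": {"a": 1.4, "b": 0.08, "c": 0.50, "d": 2.0},
}


def predict(formulation, log_conc):
    """Pick the parameter set for the formulation, then evaluate the curve."""
    p = PARAMS[formulation]
    return p["c"] + (p["d"] - p["c"]) / (1.0 + 10.0 ** (-p["a"] * (log_conc - p["b"])))


print(predict("standard", 0.10))
```

Keeping the parameters separate from the equation is what lets a fitting routine treat them as adjustable, or hold some of them fixed.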
	But with this, I can now come over...
	come over (let me back up. Sorry.) to analyze, specialized modeling, nonlinear, and I can use this predictor in my x value.
	My y can be my toxicity; formulation is my group, I can say okay and now here are those same five data points per curve.
	And the strange looking plots, but notice I have the ability, again, to change things. I can...I know that the c parameter is my lower asymptote. So I can change each of these to .55 and I can lock that in.
	So .55.
	By doing this, you're seeing the curves actually changing. It's not, it's not fitting them yet, but it's allowing us to actually force a value that we believe is the correct value.
	So I want to bring this up and bring this over. What you'll see
	is that it made all of them .55 for the lower asymptotes. And now I could click on go, which actually is then going to be the fitting,
	what you see is that in just seven iterations it fit the rest of the parameters to those four curves, such that the lower asymptotes are all .55.
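The idea of locking the lower asymptote and fitting only the remaining parameters can be illustrated outside JMP. The sketch below is not JMP's algorithm: it uses a small hand-written damped Gauss-Newton loop, an assumed Hill form, and synthetic noise-free data with hypothetical true values. Only the locked value of .55 comes from the talk.

```python
def hill_4pl(x, a, b, c, d):
    """Assumed Hill form on a log10 concentration axis."""
    z = max(-300.0, min(300.0, -a * (x - b)))  # clamp to avoid float overflow
    return c + (d - c) / (1.0 + 10.0 ** z)


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    out = [0.0] * n
    for i in reversed(range(n)):
        out[i] = (m[i][n] - sum(m[i][k] * out[k] for k in range(i + 1, n))) / m[i][i]
    return out


def fit_least_squares(model, xs, ys, start, max_iter=200):
    """Damped Gauss-Newton least squares with a forward-difference Jacobian."""
    p, lam, n = list(start), 1e-3, len(start)

    def sse(q):
        return sum((y - model(x, *q)) ** 2 for x, y in zip(xs, ys))

    best = sse(p)
    for _ in range(max_iter):
        resid = [y - model(x, *p) for x, y in zip(xs, ys)]
        jac = []
        for x in xs:
            base = model(x, *p)
            row = []
            for j in range(n):
                h = 1e-6 * max(1.0, abs(p[j]))
                q = p[:]
                q[j] += h
                row.append((model(x, *q) - base) / h)
            jac.append(row)
        jtj = [[sum(r[i] * r[j] for r in jac) for j in range(n)] for i in range(n)]
        jtr = [sum(r[i] * e for r, e in zip(jac, resid)) for i in range(n)]
        for i in range(n):
            jtj[i][i] += lam * (jtj[i][i] + 1e-12)  # damping term
        step = solve(jtj, jtr)
        trial = [pi + si for pi, si in zip(p, step)]
        trial_sse = sse(trial)
        if trial_sse < best:  # accept the step and relax the damping
            p, best, lam = trial, trial_sse, lam / 10.0
            if max(abs(s) for s in step) < 1e-10:
                break
        else:  # reject the step and damp harder
            lam *= 10.0
            if lam > 1e12:
                break
    return p


LOCKED_C = 0.55  # blank average used as the lower asymptote


def locked_model(x, a, b, d):
    """Hill curve with c held fixed, so only a, b and d are estimated."""
    return hill_4pl(x, a, b, LOCKED_C, d)


# Synthetic data: five high-end points only, hypothetical true a, b, d.
xs = [-0.2, 0.1, 0.4, 0.7, 1.0]
ys = [hill_4pl(x, 1.5, 0.0, LOCKED_C, 2.0) for x in xs]
a_hat, b_hat, d_hat = fit_least_squares(locked_model, xs, ys, start=(1.0, 0.2, 1.8))
print(a_hat, b_hat, d_hat)
```

Wrapping the full model in a function that hard-codes c is the same move as locking the parameter in the nonlinear platform: the optimizer never sees c, so it cannot drift to an implausible value when the low-concentration data are missing.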
	That's great. That's exactly what I want in this particular case. And so once again under the red triangle, I can save a formula.
	In this case, I can't save the parametric prediction. I can just save that prediction, but I can do that. So I can use that over here. And where might I use that?
	Remember, I said that if I come over here under Graph Builder and I was to put my toxicity to my y axis; log concentration on the x;
	formulation, maybe in my overlay, this is what I see. I can also bring over here
	the formula...the formula...yes...yes...itself. And sometimes this can be useful. In this case they look really really similar. What I can do is I actually can take away the curve from my points and the smoother can be on the formulation itself.
	So this is the direction. And so actually, these are the curves that belong to the function themselves, not the smoother. And so this is one way to actually show the correct curves for this data set, even when the data is ill-behaved.
	And I'm saying ill-behaved here because we just don't have a lower asymptote but we have something that we can use in place of it.
	So that's what I wanted to show you. And I think this is really kind of cool. The thing is,
	you have to be able to go back and forth, or you need to at times, between the fit curve and the nonlinear platforms to get what you really want to get out of JMP, out of
	the functionality that you really need, but I think this is this is really great. So you could clear the row states, for instance, and you could actually show all of the data (I guess I should have wrapped this up, but I didn't), various formulations and overlay, log concentration
	And these are the curves, but you could actually use the prediction formula instead in this
	to get the actual formulas that you want. And this is, this is really kind of nice to see.
	It's not something that is really talked about. There is a
	link, a blog link in the JMP discussions that Mark Bailey and somebody else put out, I think, two years ago that describes this methodology, a lot of this methodology.
	I just found it yesterday, long after I already figured it out myself, but I thought it was worth sharing to everybody, how we can fit these nonlinear models, especially in the dose-response world or
	these different biologics or pharmaceuticals, especially with all the talk these days of COVID-19 and there's a lot of work going on in this area. So with that,
	I guess I want to say thank you. Last but not least, I have some contact information. I am Martin in the JMP discussion forums and I post out there, somewhat frequently.
	My email address is also listed down here if you have any questions for me. So with that, thank you very much and I appreciate your time. Happy to take any questions. Thank you.

Presented At Discovery Summit Americas 2020

Presenter

Martin Kane
