When you collect data from measurements over time or other dimensions, you might want to focus on the shape of the data. Examples include dissolution profiles of drug tablets or distributions of measurements from sensors. Functional data analysis and regression-based models are alternative options for analyzing such data. Regression models can be nonlinear or multivariate or both. This presentation compares various approaches, emphasizing pros and cons, and also offers the option to combine them. The underlying framework supporting this work is information quality, which permits us to consider the level of information quality provided by the two approaches and the possible advantages in combining them. The presentation combines case studies and a JMP demo.

 


    Hi, I'm Ron Kenett.

    This is a joint talk with Chris Gotwalt.

    The talk is on Functional Data Analysis and Nonlinear Regression Models.

    And in order to examine the options

    and what we get out of this type of analysis,

    we will take an information quality perspective.

    In a sense, this is a follow up to a talk we gave last year

    at the same Discovery Summit.

    So I will start with simple examples

    to introduce FDA and Nonlinear Regression.

    And then Chris will cover a substantially more complex example of optimization,

    which includes a mixture experiment designed to match a reference profile.

    So the story starts with data on tablets that are dissolved,

    and measurements are taken at different time points:

    every five minutes up to 20 minutes (5, 10, 15, and 20 minutes),

    then ten minutes later at 30 minutes, and then 15 minutes later at 45 minutes.

    We have 12 tablets that are our product

    and 12 tablets that are the reference.

    Our goal is to have a product that matches the reference.

    And in this type of data, we have a profile,

    and we consider two options,

    FDA and NLR.

    In Chris' example, we'll talk about something called the F2,

    which is a third option for analyzing this type of data.

    So here's what it looks like in Graph Builder.

    On the left, we have the reference profiles.

    On the right, we have the test tablets.

    This is an example from my book on modern industrial statistics,

    the book with Shelley Zacks, which is now in its third edition.

    So on the left you can see there is a tablet that seems

    a bit different.

    It's labeled T5R.

    And if we run a functional data analysis of this data,

    T5R does look different.

    We see that the growth part is different:

    it has slow but consistent growth.

    It does not have the shape that we see in the other dissolution curves.

    This was done with a quadratic B-spline with 1 knot,

    and the quadratic was, in this case,

    fitting the data better than the cubic.

    This is a bit of an unusual situation.

    So because of the shapes, the quadratic B-spline was a better fit.

    If we look at T1R,

    the first tablet, which has yet another shape,

    it shoots up and then it stays there.

    So basically, the tablet has dissolved.

    Obviously, beyond a high level of dissolution,

    there's not much left to dissolve.

    So T1R and T5R seem different,

    and T5R stands out more than T1R.

    And yes, T5R does stand out in the cluster analysis

    on the functional principal components.

    So here we see how the scatter plot

    of the first two functional principal component scores reflects

    what we observe visually.

    And T1R, which is next to T2R,

    is in a different cluster.

    We can proceed with a nonlinear regression approach.

    Here we are fitting a three-parameter Gompertz model

    whose parameters are

    the asymptote, the growth rate, and the inflection point.
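    As a point of reference, here is a minimal sketch of one common three-parameter Gompertz parameterization; the exact form used by JMP's Fit Curve platform may differ, the function name is illustrative, and only the T5R and T1R growth rates quoted later in the talk are taken from the data.

    ```python
    import numpy as np

    def gompertz_3p(t, asymptote, growth_rate, inflection):
        # Assumed form: f(t) = a * exp(-exp(-b * (t - c)))
        # a = asymptote, b = growth rate, c = inflection point
        t = np.asarray(t, dtype=float)
        return asymptote * np.exp(-np.exp(-growth_rate * (t - inflection)))

    t = np.linspace(0.0, 45.0, 200)  # dissolution times in minutes
    # Slow, consistent growth (T5R-like: rate ~0.075, inflection ~11.5) versus
    # faster growth (T1R-like rate ~0.21); the asymptote and the second
    # inflection value are made-up illustration values.
    slow = gompertz_3p(t, asymptote=100.0, growth_rate=0.075, inflection=11.5)
    fast = gompertz_3p(t, asymptote=100.0, growth_rate=0.21, inflection=5.0)
    ```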

    This is the model and when we fit the profiles,

    we again see that T5R stands out.

    So we have the same qualitative impression that we had with FDA.

    Now we have these three parameters listed,

    and because we now have a model, we can run a profiler on it.

    This is where T1R stands.

    So by running the profiler on the different tablets,

    we can also see how similar or different they look.

    This is the table that maps out the parameters: the asymptotes, growth rates, and inflection points.

    So T1R has a growth rate of .21.

    T5R, the tablet that stood out,

    has a growth rate of .075,

    a very slow growth rate,

    consistent but slow.

    Its inflection point is 11.5, way on the right.

    So we can see the difference through these parameter values.

    We can also pick out two tablets that stand out for their growth rates:

    T2R at 1.77,

    and T8R, with almost no growth.

    We'll get back to T2R and T8R in a minute.

    If we take the principal components of this three-parameter space,

    so we consider the parameters as if they were the measurements,

    we can run a multivariate control chart.

    We can see T1R, this is the first one,

    and we can see T5R, this is the blue one, the fifth one.

    These we already saw.

    They are within the control limits of the T-square

    multivariate statistical distance control chart.

    And T2R and T8R, which I highlighted before, now stand out,

    and we can see qualitatively why.
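    To make the statistic behind that chart concrete, here is a minimal Python sketch of the T-square (Hotelling) distance computed from a table of the three fitted parameters. It only illustrates the distance itself, not how JMP sets the control limits, and the function name and parameter values are hypothetical.

    ```python
    import numpy as np

    def t_square(params):
        # params: one row per tablet, columns = asymptote, growth rate, inflection point
        x = np.asarray(params, dtype=float)
        diff = x - x.mean(axis=0)
        cov = np.cov(x, rowvar=False)
        # Pseudo-inverse guards against a near-singular covariance with few tablets.
        cov_inv = np.linalg.pinv(cov)
        # T^2_i = (x_i - x_bar)' S^{-1} (x_i - x_bar)
        return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

    # Hypothetical parameter table (asymptote, growth rate, inflection point)
    tablets = [[99.8, 0.21, 5.2], [100.3, 1.77, 3.9], [98.7, 0.075, 11.5],
               [100.1, 0.02, 6.4], [99.5, 0.30, 5.0], [100.6, 0.25, 4.6]]
    print(t_square(tablets))
    ```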

    This is the model-dependent approach in the guidance documents

    that is used for modeling dissolution curves.

    In running such an analysis from an information quality perspective,

    the first question to ask is: what is the goal of the analysis?

    And then we can consider the method of analysis.

    Here we're using nonlinear regression and functional data analysis.

    Chris will get into how this is combined

    with data derived from experimental design.

    We have a utility function, and the information quality

    is the utility of applying a method f to data X, conditional on the goal g.
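    In symbols, as the information quality framework is usually written, that definition reads

    $$\mathrm{InfoQ}(f, X, g) = U\big(f(X \mid g)\big),$$

    where $U$ is the utility function, $f$ the method of analysis, $X$ the data, and $g$ the analysis goal.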

    It is evaluated with eight dimensions.

    And here Chris will again talk about data resolution and data structure.

    So Chris, the floor is yours.

    Thanks Ron.

    Now I'm going to give an example that is a little bit more complicated

    than the first one.

    In Ron's example,

    he was comparing the dissolution curves of test tablets

    to those from a set of reference tablets.

    In that situation, the expectation is that

    the curves should generally be following the same path.

    And he showed how to find anomalous curves

    that deviate from the rest of the population.

    In this second example, we also have a reference dissolution curve.

    But we are analyzing data from a designed experiment

    where the goal is to find a formulation of two polymer additives

    and the amount of force used in the tablet production process

    that leads to a close match

    to the reference batch's dissolution curve.

    The graph you see here shows

    the data from the reference curve that we want to match.

    To do this, I'm going to demonstrate three analyses of this data

    that use different methods and models to find factor settings

    that will best match the reference curve.

    In the first analysis,

    I'm going to summarize each of the DoE curves

    down to a single metric called F2

    that is typically used in dissolution curve analysis,

    a measure of agreement with the reference batch.

    Then, I'll use standard DoE methods to model that F2 response

    and then find the factor settings that are predicted

    to best agree with the reference.

    In the second analysis,

    I'll use a functional DoE modeling approach

    where I model the curves using B-splines,

    extract functional principal component scores, and model them.

    I'll load the reference batch as a target function

    in the Functional Data Explorer platform

    and then use the FDoE profiler

    to find the closest match recommended by that model.

    These first two approaches use little subject matter information

    about these types of tablets.

    In the third analysis,

    I'll model the curves using a nonlinear model

    that was known to fit this type of tablet well

    and use the Curve DoE option in the Fit Curve platform

    to model the relationship between the DoE factors

    and the shape of the curve.

    I want to credit Clay Barker for adding this capability to JMP Pro 16.

    I think it has a lot of promise for modeling curves

    whose general shape can be assumed to be known in advance

    to come from one of the supported nonlinear models.

    At the end, verification batches were made

    using the recommended formulation settings for each of the three analyses,

    and we compared them to a new reference batch.

    What we found was that the nonlinear regression-based approach

    led to the closest match to the reference.

    What we see here is a scatter plot matrix of the four factors

    in the designed experiment.

    There was a mixture constraint between the two polymers,

    as well as a constraint on the total amount of polymer

    and the proportions of the individual polymers.

    Here's a look at some of the raw data from the experiment.

    At the top of the table, we have data from the reference that we wish to match.

    There are 16 DoE formulations or batches in the experiment.

    We can only see data from two of them in this picture, though.

    There were six tablets per formulation.

    There were four dissolution measurements per tablet.

    Here we see plots of the dissolution curves for each of the 16 DoE formulations

    with the dissolution curve of the reference batch here at the lower right.

    Now, I'm going to do a quick preliminary information quality assessment

    using the questions that you'll find in the spreadsheet

    that you can download from the JMP user community page.

    The first part of the assessment is related to the data resolution.

    In this case, I think we're looking pretty good.

    The data scale is well aligned

    with the stated goal because it's a designed experiment.

    The measuring devices seem to be reliable and precise,

    and the data analysis is definitely going to be suitable

    for the data aggregation level,

    and we'll be illustrating different kinds of data aggregation

    as we extract features from these dissolution curves.

    As far as the data structure goes, we're in pretty good shape.

    The data is certainly aligned with the stated goal,

    we don't have any problems with outliers or missing values,

    and the analysis methods are all suitable for the data structure,

    although we do see some variation

    in the quality of the results depending on the type of analysis we do.

    As far as data integration goes, this is a pretty simple analysis.

    We have multiple responses,

    and we're exploring different ways of combining them into extracted features.

    So there's a common workflow to all three of the analyses I'm going to be showing.

    First, we have to get the data into a form

    that is analyzable by the platform that we're using.

    Then there's a round of feature extraction.

    Then we model those features.

    That's where there's a lot of difference between the methods,

    and then we use the profiler in different ways

    to find a formulation that closely matches the reference.

    First, I'm going to go over the F2 analysis.

    F2 is a standard measure of agreement of a dissolution curve

    relative to a reference dissolution curve.

    In the formula,

    the Rs are the means of the reference curve at each time point,

    and the Ts are the means of the non-reference curve.

    The convention is to say that the two curves are equivalent

    when F2 is greater than or equal to 50.
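    For concreteness, here is a minimal Python sketch of that F2 similarity factor as it is conventionally defined; the function name and the example profiles are illustrative and not part of the JMP demo.

    ```python
    import numpy as np

    def f2_similarity(reference_means, test_means):
        # reference_means, test_means: mean percent dissolved at the same
        # time points (the Rs and Ts described above).
        r = np.asarray(reference_means, dtype=float)
        t = np.asarray(test_means, dtype=float)
        mean_sq_diff = np.mean((r - t) ** 2)
        # Conventional definition: F2 = 50 * log10(100 / sqrt(1 + mean squared difference))
        return 50.0 * np.log10(100.0 / np.sqrt(1.0 + mean_sq_diff))

    # Curves are conventionally declared equivalent when F2 >= 50.
    print(f2_similarity([20, 45, 70, 85, 92, 95], [22, 48, 68, 83, 93, 96]))
    ```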

    It's important to point out

    that I'm including this F2-based analysis

    not just as an example of a dissolution DoE analysis,

    but more broadly as an example of how reducing a response

    that is inherently a curve down to a single number

    leads to a much lower quality analysis

    and results at the end

    than a procedure that treats curves as first-class citizens.

    So now I'm going to share the F2 analysis of the dissolution DoE data.

    The first thing we have to do is calculate the batch means

    of the dissolution curves at the different time points.

    Then we create a formula column

    that calculates the F2 dissolution curve agreement statistic

    for each of these curves relative to the reference batch,

    and we model the F2 using the DoE factors as inputs

    and use the profiler to find the factor settings that match the reference.

    Before the analysis,

    we use the Tables Summary feature

    to calculate the means of the dissolution measurements

    by batch and across each of the times.

    We can save ourselves a little bit of work by using all of the DoE factors here

    as grouping variables so they'll be carried through

    into the subsequent table.

    Now we have a 17-row data set

    and we hide and exclude the reference batch.

    Now take note of the values of the dissolution means for the reference

    because we're going to use those

    when we create a formula column that calculates the F2 agreement metric

    for each of the batches relative to the reference batch

    and now we're going to be able to use this F2 formula column

    as a response to be modeled.

    We use the model script created by the DoE platform

    to set up our model for us.

    We place F2 as our response variable

    and we're going to analyze this data today

    using the generalized regression platform in JMP Pro.

    When we get into the platform

    we see that it has automatically done a standard least squares analysis

    because it found that there were enough degrees of freedom

    in the data for it to do so

    and it's given us an AICc of 155.6.

    I'm going to see if we can do better by trying

    a best subsets reduction of the model

    and when we do that, we see

    that the AICc of that best subsets fit goes down to 136.

    Smaller is better with the AICc,

    and a difference of 20 is pretty substantial,

    so I would conclude that the normal best subsets fit is a better model

    than the standard least squares one.

    I'm going to try one more thing, though,

    and fit a log normal distribution with best subsets to the data.

    When I do that, the AICc goes down a little bit further to 130.6.

    That's a modest difference, but it's good enough

    that I'm going to conclude that we are going to work

    with the LogNormal,

    especially because we know that we're working with a strictly positive response

    and the LogNormal distribution fits data that is strictly positive.
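    As a reminder of the selection criterion being compared here (the standard definition, not anything specific to JMP), the corrected AIC for a model with $k$ estimated parameters fit to $n$ observations is

    $$\mathrm{AICc} = -2\log \hat{L} + 2k + \frac{2k(k+1)}{n-k-1},$$

    so smaller values indicate a better trade-off between fit and complexity.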

    From there, the analysis is pretty straightforward,

    so I'm going to jump straight ahead to using the profiler.

    F2 is an agreement metric that we want to maximize.

    So we get into the profiler,

    we turn on Desirability functions and have them set to maximize,

    and then maximize desirability to find the combination of factor settings

    that this model says gives us the closest match to the reference,

    and that would be at this combination of factor settings that we see here.

    Now the F2 analysis is complete,

    and we're going to go into the second analysis,

    the functional DoE analysis.

    For this analysis,

    we're going to work with the data in a stacked format

    where all of the dissolution measurements have been combined into a single column,

    and we have a time column as well.

    The first thing we do is go into the Functional Data Explorer platform.

    In the platform launch,

    we put dissolution as our response, time is our X,

    the batch column as our ID,

    and we supply the four DoE factors as supplementary variables.

    Once we're in the platform,

    we take a look at the data using the initial data plot.

    This particular data set doesn't need any clean up or alignment options,

    but we are going to go ahead and load the reference dissolution curve

    as a target function.

    For relatively simple functions like these,

    I typically use B-splines for my functional model.

    When we do that, we see our B-Spline model fit,

    and the initial fit that has come up is a cubic model that is behaving poorly.

    It's interpolating the data points well,

    but kind of doing crazy things in between them.

    So I'm going to change from the default recommended model

    over to a quadratic Spline model instead of the cubic one.

    We do that by simply clicking on Quadratic over here

    in the right of the B-Spline model fit.

    We'll see that this quadratic model fits the data well.

    A functional principal components analysis is automatically calculated,

    and we see that the Functional Data Explorer platform has found

    three functional principal components.

    The leading one is very dominant,

    explaining 97.9 percent of the functional variation.

    And it looks like this is a level-shift-up-or-down kind of shape component.

    The second one looks like a rate component,

    and the third one almost looks like a quadratic.

    Looking a little closer at this quadratic B-Spline model fit,

    we see that this model is fitting the individual dissolution curves

    pretty well.

    So now we're ready to do our functional DoE analysis.

    Each of our individual dissolution curves has been approximated now

    by an underlying mean function common to all the batches

    plus a batch dependent FPC score times the first eigenfunction,

    plus another batch dependent FPC score times the second eigenfunction,

    and so on with the third one.
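    Written out, with symbols chosen here just for illustration, each batch's curve is approximated as

    $$y_i(t) \approx \mu(t) + s_{i1}\,\phi_1(t) + s_{i2}\,\phi_2(t) + s_{i3}\,\phi_3(t),$$

    where $\mu(t)$ is the shared mean function, $\phi_k(t)$ are the eigenfunctions, and $s_{ik}$ are the batch-dependent FPC scores.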

    What we're going to do is set up individual DoE models

    for each of these functional principal component scores

    as responses using our DoE factors as inputs.

    The Functional Data Explorer platform, of course,

    makes all this simple and kind of ties it up into a bow for us.

    And when I say that,

    it ties it up in a bow for us, what I really mean is the FDoE profiler.

    So this pane here shows our predicted trajectory

    of dissolution as a function of time,

    and then we can see how that trajectory would change

    by altering the DoE factors.

    That relationship with the DoE factors

    comes from these three generalized regression models

    for each of our functional principal component scores.

    If we want, we can open those up

    and we can look at the relationship

    between the DoE factors and that functional principal component score,

    and we could even alter the model

    by moving around to other ones in the solution path.

    I just want to point out that it's possible to change the DoE model

    for an FPC score.

    In the interest of time,

    I'm just going to have to move on and not demonstrate that, though.

    We have diagnostic plots,

    the most important one probably being the Actual by Predicted Plot.

    This has our actual dissolution measurements on the Y-axis

    and the predicted dissolution values from the functional DoE model on the X-axis.

    And as always, we want to see that plot

    have data points tight along the 45 degree line.

    And in this case, I think this model looks pretty good.

    We don't want to see any patterns in our residuals,

    and I'm not seeing any bad ones here.

    So this model looks pretty good, and we're going to work with it.

    So I've already explained how this pane right here

    represents the predicted dissolution curve as a function of time

    and the individual DoE factors.

    Now, these other two rows here are because

    we've loaded the reference as a target function.

    So this row is the difference

    of the predicted dissolution curve from the target reference curve.

    And then the bottom pane here is the integrated distance

    of the predicted curve from the target.

    When we maximize desirability in this profiler,

    it gives us the combination of factor settings

    that minimize this integrated distance from the target.
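    In other words, the profiler is searching for factor settings $x$ that solve something like

    $$x^{*} = \arg\min_{x} \int \big(\hat{y}(t; x) - y_{\mathrm{target}}(t)\big)^{2}\, dt,$$

    where $\hat{y}(t; x)$ is the predicted dissolution curve at settings $x$ and $y_{\mathrm{target}}(t)$ is the reference curve loaded as the target function; the exact distance JMP integrates is not spelled out in the talk, so take this as a sketch of the idea.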

    So I'm going to do that by bringing up maximize desirability.

    And now we see the results of the functional DoE analysis,

    where we have identified .725 of polymer A,

    .275 of polymer B,

    a total polymer of about .17,

    and a compression force of about 1700

    as the settings that minimize the distance between our predicted curve and the reference.

    Now, we've done two analyses.

    Both of those analyses have recommended that we go

    to the lowest setting of polymer A and the highest setting of polymer B.

    They differ in their recommendations

    for what total polymer amount to use and how much compression force to use.

    The third analysis I'm going to do is the Curve DoE analysis.

    This is going to be structured pretty similarly

    in some ways to the functional DoE analysis,

    in that we're going to use the same version of the data

    where dissolution measurements are all in one column

    and we have a time column.

    But we don't have a built-in target function option

    in the Fit Curve platform yet.

    So the first thing we have to do is fit just the reference batch

    and save its prediction formula back to the table.

    Then we do a Curve DoE analysis,

    which is largely similar to a functional DoE analysis

    in that we're extracting features from the curves

    and modeling those features.

    Then we go to the profiler under the Graph menu to find settings that best match the reference.

    The nonlinear model that we're going to be using is

    a three parameter Weibull Growth Curve,

    which has a long history in the analysis of dissolution curves.

    Weibull growth curves have an asymptote parameter a

    that represents the value as time goes to infinity.

    There's what's called the inflection point parameter,

    which I see as a scaling factor that kind of stretches out

    or squeezes in the entirety of the curve.

    And then there's also a growth rate parameter

    that dictates the shape of the curve.
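    Here is a minimal sketch of one common three-parameter Weibull growth parameterization matching that description, with an asymptote, a time-scale parameter playing the role of the inflection point, and a shape parameter for the growth rate; the exact form used by JMP's Fit Curve platform may be parameterized differently, and the names and values here are illustrative.

    ```python
    import numpy as np

    def weibull_growth(t, asymptote, inflection, growth_rate):
        # Assumed form: f(t) = a * (1 - exp(-(t / b) ** c))
        # a = asymptote, b = time scale ("inflection point" in the talk),
        # c = shape ("growth rate" in the talk)
        t = np.asarray(t, dtype=float)
        return asymptote * (1.0 - np.exp(-((t / inflection) ** growth_rate)))

    t = np.linspace(0.0, 45.0, 200)  # dissolution times in minutes
    curve = weibull_growth(t, asymptote=100.0, inflection=10.0, growth_rate=1.5)
    ```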

    What I think is really valuable about using this model

    relative to the functional DoE model or the F2 type analysis

    is that we're going to be modeling features extracted from the data

    that have real scientific meaning,

    especially the asymptote and inflection point parameters.

    Now Curve DoE analysis doesn't have a target matching capability

    like the Functional Data Explorer.

    So we begin the analysis by excluding all of the DoE rows in the data table.

    These are represented with the set column equal to A.

    So I select a cell there,

    select matching cells, and then hide and exclude those rows

    so that I only have the reference batch not excluded.

    Then I go to the Fit Curve platform,

    load it up, get in there, fit the Weibull growth model,

    and then I save that prediction formula back to the table.

    Once we complete the Curve DoE analysis,

    we're going to compare the Curve DoE prediction formula

    to this reference predictor

    to find combinations of the factor settings

    that get us as close to this curve as possible.

    So now we unhide and unexclude the DoE batches

    and go back into the Fit Curve platform.

    Just like in the Functional Data Explorer platform,

    we're going to load up the DoE factors as supplementary variables.

    Now that we're in the platform, we can fit our Weibull growth model.

    The initial fit here looks pretty good.

    Looks like we're capturing the shape of the dissolution curves.

    One thing I like to do next is to make a parameter table.

    This creates a data table with our fitted nonlinear regression parameters.

    I like to look at these in the distribution platform

    to see if there are outliers in there or anything unusual.

    I also like to look at the patterns in the multivariate platform;

    it just gives you a better sense of what's going on

    with the nonlinear model fit.

    Once we know that everything is looking pretty good,

    we can do our curve DoE analysis

    and this looks very much like the functional DoE analysis from before.

    We have a profiler that shows

    the relationship between dissolution and time

    and how that relationship changes as a function of our DoE factors.

    And then we also have a generalized regression model

    for each of those three parameters that we can take a look at individually.

    The first thing I would do before trying to use the model in any way

    is look at the Actual by Predicted Plot,

    so that's what we see here.

    These are the predicted values, incorporating

    both the time model from the mean function and the eigenfunctions,

    as well as the DoE models on the nonlinear regression parameters.

    This looks pretty good.

    Because there is a fairly easy interpretation

    for the Weibull growth model parameters,

    it can be useful and interesting to open up

    the individual model fits for these parameters.

    For example, here are the coefficients for the inflection point model.

    Because inflection point is a strictly positive quantity,

    a LogNormal best subsets model has been fit to the data

    by the generalized regression platform.

    We see that the mixture main effects have been forced in

    and that the compression force by polymer A interaction

    is the only other term in the model.

    What this means is that

    if we hold the polymer proportions constant

    and increase the compression force,

    we would expect a larger value of the inflection point.

    One would observe this as a tablet that takes longer to dissolve,

    which is exactly what we would expect to have happen.

    We can save the Curve DoE prediction formula back to the table,

    and we can see in all its gory detail

    how the models for the asymptote, inflection point,

    and growth rate are combined with time to come up with our overall prediction

    for the dissolution curve.

    Fortunately, with JMP we don't have to look at the formula too closely, though,

    because we have profilers that

    let us see the relationships in a visual way

    rather than an algebraic one.

    To solve our problem of finding the combination of factors

    that give us the dissolution curve

    that would be closest to the reference,

    I created a formula column that calculates

    the percentage difference of the predicted curve,

    which takes the DoE factors into consideration, from the reference.
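    The exact formula column isn't shown on screen, but as an illustration (the names are hypothetical and the choice of denominator is an assumption), it is something along these lines:

    ```python
    def percent_difference(predicted, reference):
        # Percent difference of the Curve DoE prediction from the reference
        # prediction at the same time point, both evaluated from their saved
        # prediction formulas.
        return 100.0 * (predicted - reference) / reference
    ```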

    The last step of the analysis is

    to bring up this percent difference response

    in the profiler that is under the graph menu,

    being sure to check the Expand Intermediate formulas option.

    This led to a profiler where we're able to see the percent difference

    from the reference as a function of time and the DoE factors.

    I've shaded the region where the difference

    is less than one percent in green.

    By manually adjusting the factors,

    I was able to find settings where the predicted curve

    is less than one percent from the reference across all time values.

    This looks really good, but in practice I bet that this is overly optimistic.

    Here we see the optimal values of the factor settings for all three analyses.

    The curve DoE analysis is in the interior of the range for the polymers.

    The optimal value for total polymer is .16,

    which is close to the functional DoE analysis result,

    and compression force is in between the optimal values

    recommended by the F2 analysis and the functional DoE analysis.

    After this, we made new formulations based on the recommended factor settings

    from each of these models and measured their dissolution curves

    as well as took a new set of measurements from the reference.

    Here we see a summary of the final results from the verification runs.

    The new reference dissolution curve is in black,

    and the Curve DoE curve in green is the closest curve to it,

    followed by the FDoE curve in blue.

    The result of modeling F2 is in red, and it did the poorest overall.

    This should perhaps not be too surprising.

    The F2 approach was the simplest,

    reducing the data down to a single metric, and it did the poorest.

    The functional DoE model had to empirically

    derive the shapes of the curves

    and then model three features of those shapes,

    essentially using more of the information in the data.

    The curve DoE led to the best formulation

    because it used the data efficiently via some prior knowledge

    about the parametric form of the dissolution curves.

    We see that the results of the F2-based analysis

    are not equivalent with the new reference batch,

    while the approaches that treated curves as first-class objects are equivalent.

    What this means is that the F2 approach would have required

    at least another round of DoE runs,

    and so an inefficient analysis leads to an inefficient use

    of time and resources.

    I'm going to close the presentation

    with a retrospective InfoQ assessment of the results.

    Overall, we found that the Curve DoE prediction

    generalized the best to new data,

    but was the most difficult analysis to perform.

    I want to note that if we didn't have a known nonlinear model

    to work with that fit the data well, we could not have done that analysis.

    The functional DoE analysis and the F2-based approach

    can be used more broadly in other situations.

    The profiler leads to excellent communication scores

    for all three analyses.

    The ability to see how the shape of the dissolution curve changes

    with the DoE factors in the functional and curve-based approaches

    leads me to give them a better communication score.

    I see the curve DoE approach

    having the highest communication score by a little bit

    because we're directly modeling more meaningful parameters

    than the functional DoE approach.

    That's all we have for you today.

    I want to thank you for your time, interest, and attention.
