
Using JMP Prediction Profiler to Assess Optimal SNP-Assay Allelic Discrimination in Corn Samples (2021-US-EPO-843)

Level: Intermediate

 

Madisun Hembree, Engineer II, Product Development, Thermo Fisher Scientific

 

Because the field of in vitro diagnostics (IVD) is ever-evolving, new technologies in the field require a process for measuring their benefit. Optimization requires defining the output variables and testing their hypothesized key factor variables. Experiments can then identify the factor settings that produce the most desirable output values. Single nucleotide polymorphism (SNP) genotyping assays, one example of an IVD, detect and distinguish SNP variations in a sample. Optimizing these assays requires maximizing the discrimination between allele types. Before evaluating new formulations or process changes, it is important to establish baseline allelic discrimination patterns with all control materials.

We can use JMP’s Fit Model, Prediction Profiler, and Simulator to create a toolset that quantifies the desirability of different conditions for multiple key factor variables. This study aims to create a toolset that considers two key factor variables, corn kernel type and SNP-assay type, and characterizes their effect on three output variables: call rate, cluster spread, and cluster to NTC distance. This toolset can then be used to validate new formulations or process changes in the future.

 

 

Auto-generated transcript...

 


Speaker

Transcript

Jason Wiggins, JMP Hey, Madisun.
Madisun Hembree Sorry, I was panicking.
  I thought it was my fault. I was like, oh no.
Jason Wiggins, JMP No, it's funny: to set these up, there's a list of, I don't know, maybe 20 different things that you have to get right.
Madisun Hembree In order for everything to work.
Jason Wiggins, JMP Yeah, and, you know, the more steps you have in a system, the more opportunities there are to mess it up.
Madisun Hembree Yeah.
Jason Wiggins, JMP Yeah, so anyway, I think we're good. I see you.
Madisun Hembree Perfect.
Jason Wiggins, JMP Yeah, it doesn't look like there are any weird camera artifacts.
  I would say, if there were a way to dim the light in the background, that could be helpful, but...
  Oh, I know what's going on.
Madisun Hembree Is that too dark?
Jason Wiggins, JMP No, I think that's just fine.
Madisun Hembree Okay.
Jason Wiggins, JMP That's better, yeah, that's better.
Madisun Hembree So I have the sunlight, and I just turned off the overhead light.
Jason Wiggins, JMP Are you signing in from your office?
Madisun Hembree Yeah, I'm signing in from my office. I tried to, but there are too many people working from home.
  Everyone has a meeting at the same time, so I figured I'd grab a conference room.
Jason Wiggins, JMP Perfect. All right, before we get started, one of my tasks is to remind you that this recording will be posted online on the Community Discovery page. Is that OK with you?
Madisun Hembree Yes, I got all of the materials approved by my legal team, so we're all set.
Jason Wiggins, JMP Great. I just noticed that we are actually recording, but our conversation will be edited out.
  Okay, you are ready to go whenever you would like.
Madisun Hembree Okay, then I'll just pull it up, and let's see if that opens.
  OK.
Jason Wiggins, JMP OK, I see your screen. You're up.
Madisun Hembree Do you see that? Does my face blur out or gray-box any of the presentation?
Jason Wiggins, JMP It doesn't look like it on my end, no.
  And I'm pretty sure this is what people viewing the video will see. You're presenting, and I see your poster.
Madisun Hembree Okay.
  Let me take it step by step.
  Okay, I'll just do a quick introduction of myself, and then I'll start the presentation.
  I think that'll be good.
  Okay.
Madisun Hembree Hi guys, my name is Madisun Hembree. I am a product development engineer at Thermo Fisher Scientific, and I work in the genetic sciences division.
  Today I'm going to talk about how I used the JMP Prediction Profiler to assess optimal SNP-assay allelic discrimination in corn samples. To give you a little introduction first,
  when I'm assessing my assay's performance, or, if you think about it, the performance of really any system, there are two types of variables that need to be defined.
  First are your key factor variables. These are the variables you change, testing them under different conditions.
  In my case, I'm testing different SNP-assay types that target different genes and running them across multiple corn kernel types.
  The second type of variable is your output variables. These are the variables you use to measure which combinations of your key factor variables
  are optimal based on your user-defined needs. In my case, I'm measuring the call rate, cluster spread, and cluster to NTC distance of my data, and
  I'll explain what those metrics mean in the next section. This poster looks at the benchmark experiments I use to baseline
  the assay's performance with my starting materials. Since I have several output variables that are weighted a little differently, it's hard to balance them by hand. The JMP Prediction Profiler lets you set
  a desirability function that takes your user-defined needs into account and mathematically ranks the desirability of your materials, so JMP can help us a lot here.
  Next, a little bit about what a SNP assay is.
  We have DNA; it holds all of our genetic information.
  The most common form of genetic variation is the SNP, or single nucleotide polymorphism, where a single DNA building block, a nucleotide,
  is swapped out. When we run SNP assays, we're looking to genotype
  our samples. When you look at that data, you see a graph similar to the one on the right, which is an allelic discrimination plot. Our ability to genotype samples into different categories
  is essential to getting the information we need from each sample. In my experiment, I want to determine which combinations of my key factor variables give me
  an optimal discrimination plot. To do that, I'm going to use three key metrics. The first is call rate.
  So the call rate is the percentage of successful genotype calls per passing SNP.
  I want that percentage to be really high, so I want to maximize it: ideally, between about 80% and 100% of our samples are successfully genotyped.
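As a rough illustration of the call-rate metric, the Python sketch below computes the percentage of wells that received a genotype call for a single SNP. The "Undetermined" label, the example calls, and excluding NTC wells before the calculation are assumptions made for illustration, not details from the poster; the 80 to 100% band is the target stated above.

```python
from typing import Sequence

NO_CALL = "Undetermined"  # hypothetical label for a well with no genotype call


def call_rate(calls: Sequence[str], no_call: str = NO_CALL) -> float:
    """Percentage (0-100) of sample wells that received a genotype call for one SNP."""
    if not calls:
        raise ValueError("no calls supplied")
    called = sum(1 for c in calls if c != no_call)
    return 100.0 * called / len(calls)


# Hypothetical calls for one passing SNP (NTC wells assumed already excluded).
calls = ["Allele 1", "Allele 2", "Both", "Allele 1", NO_CALL]
rate = call_rate(calls)
print(f"{rate:.1f}%")         # 80.0%
print(80.0 <= rate <= 100.0)  # True: inside the 80-100% target band
```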
  The second metric is cluster spread, shown here in the red category.
  It is the standard deviation of the distances between each data point and the center of its cluster. We want that number
  to be minimized. We want it to be really small so that we can discern between the different groups you can see in red, green, and blue.
  The smaller it is, the easier it is for us to tell those groups apart and successfully genotype our samples.
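A minimal sketch of cluster spread as defined above, assuming each well is a point on a two-axis allelic discrimination plot and that a cluster's center is the mean of its points. The axis meaning, the use of the sample standard deviation, and the example points are all assumptions for illustration.

```python
import math
import statistics

Point = tuple[float, float]  # assumed axes: (Allele 1 signal, Allele 2 signal)


def centroid(points: list[Point]) -> Point:
    """Center of a cluster, taken here as the mean of its coordinates."""
    return (statistics.fmean(x for x, _ in points),
            statistics.fmean(y for _, y in points))


def cluster_spread(points: list[Point]) -> float:
    """Standard deviation of each point's distance to its cluster center."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    center = centroid(points)
    distances = [math.dist(p, center) for p in points]
    return statistics.stdev(distances)  # sample SD (n - 1); an assumption


tight = [(1.0, 0.1), (1.1, 0.1), (0.9, 0.1), (1.0, 0.2)]
loose = [(1.0, 0.1), (2.0, 0.1), (0.4, 0.1), (1.0, 1.0)]
print(cluster_spread(tight) < cluster_spread(loose))  # True
```

Following the definition literally, points that all sit the same distance from the center give a spread of zero, because only the variability of the distances is measured.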
  Lastly, there is the cluster to NTC distance. This is the mean distance from all of the data points in the cluster to the NTC, which is our no-template control.
  We want to make sure that our genotype cluster is far away from the control, which has no sample. But beyond a certain point, moving the cluster farther away
  doesn't add as much benefit, so the relationship between distance and desirability isn't really linear. JMP can also
  weigh the importance of each of these variables. When we decide how desirable our combinations are, we want call rate and cluster spread to count more than cluster to NTC distance. It's still important, but not as important as the other two variables.
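The cluster to NTC distance can be sketched the same way. This assumes the NTC location is the centroid of the NTC wells on the same two-axis plot; the presentation does not say how the NTC point is defined, and the coordinates below are made up.

```python
import math
import statistics

Point = tuple[float, float]  # assumed axes: (Allele 1 signal, Allele 2 signal)


def cluster_to_ntc_distance(cluster: list[Point], ntc_wells: list[Point]) -> float:
    """Mean distance from each point in a genotype cluster to the NTC location."""
    if not cluster or not ntc_wells:
        raise ValueError("cluster and NTC wells must be non-empty")
    # Assumption: the NTC location is the centroid of the NTC wells.
    ntc = (statistics.fmean(x for x, _ in ntc_wells),
           statistics.fmean(y for _, y in ntc_wells))
    return statistics.fmean(math.dist(p, ntc) for p in cluster)


ntc_wells = [(-0.1, 0.0), (0.1, 0.0)]  # hypothetical NTC signals
cluster = [(3.0, 0.0), (0.0, 4.0)]     # hypothetical genotype cluster
print(cluster_to_ntc_distance(cluster, ntc_wells))  # 3.5
```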
  So, going back: I ran my experiments, using my SNP assays on all my different corn kernels, and ran them
  on a PCR instrument. I pulled that data, and from it I manually calculated the call rate, cluster spread, and cluster to NTC distance
  for all the different combinations of my key factor variables. I entered those results into a JMP data table, and now I'm going to give you a little demo of how I use the Prediction Profiler to generate a desirability function.
  As you can see here, I have both of my key factor variables as well as all my output variables. I'm going to go to Analyze and click Fit Model.
  From there, I take all of my output variables and put them in the Y role, and then I add all of my key factor variables as model effects. Since I'm really only interested in the Prediction Profiler, I'm going to choose Minimal Report, just so I don't see as many graphs.
  When I run this, to pull up the Prediction Profiler, you go to the Least Squares Fit menu and click Profiler to populate it at the bottom.
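For readers who want to see what Fit Model is estimating, this Python sketch fits one response with the two key factor variables as main effects only (leaving out an interaction term is an assumption). It uses a shortcut that equals the least-squares fit only when every kernel and assay combination is run exactly once; JMP's Fit Model solves the general least-squares problem and does not need that restriction. The kernel and assay names and the call rates are hypothetical.

```python
from statistics import fmean


def main_effects_fit(table: dict[tuple[str, str], float]) -> dict[tuple[str, str], float]:
    """Least-squares predictions of an additive (main effects only) model.

    table maps (kernel, assay) -> observed response. The shortcut below is the
    least-squares solution only for a balanced design with one run per cell.
    """
    kernels = sorted({k for k, _ in table})
    assays = sorted({a for _, a in table})
    if len(table) != len(kernels) * len(assays):
        raise ValueError("expected every kernel x assay combination exactly once")
    grand = fmean(table.values())
    kernel_mean = {k: fmean(table[(k, a)] for a in assays) for k in kernels}
    assay_mean = {a: fmean(table[(k, a)] for k in kernels) for a in assays}
    # Prediction = grand mean + kernel effect + assay effect, written via the means.
    return {(k, a): kernel_mean[k] + assay_mean[a] - grand
            for k in kernels for a in assays}


# Hypothetical call rates (%) for two kernels x two assays.
observed = {("Kernel 1", "Assay A"): 90.0, ("Kernel 1", "Assay B"): 80.0,
            ("Kernel 2", "Assay A"): 96.0, ("Kernel 2", "Assay B"): 90.0}
predicted = main_effects_fit(observed)
print(predicted[("Kernel 2", "Assay A")])  # 97.0
```

The profiler traces plot predictions from a fitted model like this one, which is why a combination's predicted value can differ slightly from its observed value.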
  This is basically graphing all of our results. You'll see all of our output variables on the Y axis and all of our key factor variables below. The next step is to set our desirability functions.
  To do that, we go under Optimization and Desirability, and you'll see that JMP automatically populates these desirability curves on the right. JMP sets them all to maximize by default, and you'll see this desirability
  mapping below. What you want to do is set those based on your defined needs. First, we talked about the call rate.
  Like I said before, I want to maximize that, with the highest desirability between 80% and 100%.
  I'll set my high value to be very desirable. Desirability runs from zero to one, so I want 100% to be the most desirable.
  I'll set the middle value to 80%, so that as the call rate drops below 80%, it becomes
  a lot more undesirable. That means the desirability curve is not going to be linear: we want the call rate to stay between 80 and
  100%. I'm going to leave its importance at 1, and we'll see a little later on how importance is set.
  Next is our cluster spread. As we discussed, we really want to minimize this. Right now,
  the values on the left are just the range of my experimental data, so I'm going to
  adjust them a little to a cleaner range of zero to two. Since I simply want to minimize the spread, I'm going to keep the desirability curve fairly linear.
  Values close to zero, as small as they can get, are very desirable, and larger values are less desirable. I'm going to keep the same importance as the call rate.
  The very last one is the cluster to NTC distance that we talked about. This is one that
  we want to maximize. The distances in our data range from zero to 12, so I'm going to set
  the desirabilities so that distances toward 12 are the most desirable. At six, it's a little less desirable, but not a lot.
  Below that, desirability decreases further. But the most important thing is that I'm going to decrease this response's importance.
  I'm going to change it to 0.5, which changes how much this response weighs in the overall desirability calculation. As
  you can see here, JMP has populated the desirability curves, and some of them are linear while others are curved.
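To make the three curve shapes concrete, the sketch below builds each desirability curve from (response, desirability) control points. JMP draws smooth curves through the points you set; linear interpolation between them is a simplification used here only to keep the example short. The desirability values at the middle points (0.8 at an 80% call rate, 0.8 at a distance of 6) and the low endpoints are hypothetical, since the presentation describes them only qualitatively.

```python
from bisect import bisect_left


def desirability(x: float, points: list[tuple[float, float]]) -> float:
    """Piecewise-linear desirability through (response, desirability) control points.

    points must be sorted by response value; responses beyond the first or
    last control point are clamped to that point's desirability.
    """
    xs = [p for p, _ in points]
    ds = [d for _, d in points]
    if x <= xs[0]:
        return ds[0]
    if x >= xs[-1]:
        return ds[-1]
    i = bisect_left(xs, x)  # first control point with response >= x
    x0, x1, d0, d1 = xs[i - 1], xs[i], ds[i - 1], ds[i]
    return d0 + (d1 - d0) * (x - x0) / (x1 - x0)


# Hypothetical control points echoing the settings described above.
CALL_RATE = [(60.0, 0.0), (80.0, 0.8), (100.0, 1.0)]    # maximize, steep drop below 80%
CLUSTER_SPREAD = [(0.0, 1.0), (2.0, 0.0)]               # minimize, linear
CLUSTER_TO_NTC = [(0.0, 0.0), (6.0, 0.8), (12.0, 1.0)]  # maximize, diminishing returns

print(desirability(70.0, CALL_RATE))      # 0.4
print(desirability(1.0, CLUSTER_SPREAD))  # 0.5
```

Putting most of the desirability gain below the middle point is what makes the call-rate and NTC curves bend, while the two-point spread curve stays a straight line.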
  Next, we select Maximize Desirability. When we do, a desirability value appears down here. Right now it's showing that my corn kernel three
  with my corn 11 SNP assay gives me the most desirable settings. Something you can also do is save that formula, so we can look into
  how it is calculated. When I sort it, JMP needs to open a new data table; just click Continue to let it do that.
  Once that opens, you'll see the list again, and when I scroll over, you'll see the desirability values
  ranked from highest to lowest. We can also see how that value is calculated by opening the formula.
  The most important part is how importance is set.
  You can see here, under call rate, cluster spread, and cluster to NTC, that the desirability value is built from all of these responses together, and each of their terms is multiplied by a scalar multiplier.
  Notice that the cluster to NTC term is multiplied by 0.2, whereas the other two are multiplied by 0.4.
  That means the desirability value for each combination of corn kernel type and SNP assay type weights call rate and cluster spread
  more heavily, giving them more importance than the cluster to NTC distance.
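Assuming the overall desirability is an importance-weighted geometric mean (the form JMP's documentation describes for combining responses), the 0.4, 0.4, and 0.2 multipliers are each importance divided by the total importance, 1 + 1 + 0.5 = 2.5. In log form, the weighted terms are added together. The sketch below reproduces that arithmetic; the response names and the individual desirability values are hypothetical.

```python
import math


def normalized_weights(importance: dict[str, float]) -> dict[str, float]:
    """Scale importance values so they sum to 1."""
    total = sum(importance.values())
    return {name: value / total for name, value in importance.items()}


def overall_desirability(d: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted geometric mean: exp(sum of w_i * ln d_i)."""
    if any(d[name] <= 0.0 for name in weights):
        return 0.0  # a single fully undesirable response zeroes the result
    return math.exp(sum(w * math.log(d[name]) for name, w in weights.items()))


importance = {"call_rate": 1.0, "cluster_spread": 1.0, "cluster_to_ntc": 0.5}
weights = normalized_weights(importance)
print(weights)  # {'call_rate': 0.4, 'cluster_spread': 0.4, 'cluster_to_ntc': 0.2}

# Hypothetical individual desirabilities for two combinations.
weak_ntc = {"call_rate": 0.9, "cluster_spread": 0.9, "cluster_to_ntc": 0.4}
weak_call_rate = {"call_rate": 0.4, "cluster_spread": 0.9, "cluster_to_ntc": 0.9}
print(overall_desirability(weak_ntc, weights) > overall_desirability(weak_call_rate, weights))  # True
```

With these weights, a weak cluster to NTC score lowers the overall value less than an equally weak call rate, which is the behavior the importance settings are meant to produce.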
  These multipliers are the key to controlling which variables count more when calculating the desirability value. So now I have a ranked list of desirability values.
  But why are these useful? This was a baseline experiment for me to understand my assay's performance. When I go on to do any optimization, say of my reaction mixtures, I might want to change the sequence of my SNP assay or try to
  optimize my multiplex master mix.
  To do that, I need to know which materials I want to use, because I didn't know which genes were present in those corn kernels.
  Now I have this list, and I can scroll to the bottom and find the ones that are really undesirable. Some may not have that gene at all, and I can now remove those based on the desirability ranking.
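Screening the ranked list can also be scripted once the desirability values are exported from the data table. In this sketch, the combination names, the scores, and the 0.10 cutoff are hypothetical; in practice, the cutoff is a judgment call about which materials are worth carrying forward.

```python
Combo = tuple[str, str]  # (kernel, assay); hypothetical labels


def rank_and_flag(scores: dict[Combo, float], threshold: float):
    """Sort combinations by desirability (highest first) and flag those below threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    flagged = [combo for combo, d in ranked if d < threshold]
    return ranked, flagged


# Hypothetical desirability scores and cutoff.
scores = {("Kernel A", "Assay X"): 0.92,
          ("Kernel B", "Assay X"): 0.61,
          ("Kernel C", "Assay Y"): 0.03}
ranked, flagged = rank_and_flag(scores, threshold=0.10)
print(ranked[0][0])  # ('Kernel A', 'Assay X')
print(flagged)       # [('Kernel C', 'Assay Y')]
```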
  Others may have lower desirability where I think, you know, I can optimize my assay
  to improve the result. So now I can choose from that list which materials I want to use going forward. Doing that by hand would have been,
  for me, pretty much impossible. I wouldn't really know how to do it mathematically or how to weight some output variables more than others.
  JMP quickly gave me that list to work from as I begin my optimization process, in whatever category I need.
  So that was my poster presentation. I would like to give thanks to my mentors, Joyce Wile and Ferrier Le, for providing me with the materials, as well as providing ongoing support. And I hope you guys enjoyed the presentation. Thank you.