Dear Dr. DOE, isn’t Discrete Numeric just Categorical?

Dr. DOE describes discrete numeric factors and why many people who could use them aren't.

Full Transcript (Automatically Generated)

Hi, everybody. Before I get started, I'm going to encourage you to go over to the comment area for JMP On Air; we'd love to know what DOE questions you have. Okay, so let's get to the question we have today: is discrete numeric the same thing as categorical? Aren't they interchangeable? While they might look the same, and under the right conditions you can fit equivalent models, they are definitely not the same. In fact, if I force a discrete numeric variable to be categorical, I might be asking for more observations than I really need. Vice versa, if I take a categorical variable and treat it as discrete numeric, I might have too few resources to get the information that I want.

The main underlying difference is the relationship that the factor has to my response. With discrete numeric, despite there being discrete levels that I'm looking at, I assume that there's still an underlying continuous relationship: even though I might not be able to observe values between my levels, they still exist theoretically, and I could still build that relationship. The important aspect of that is that when I fit my data, I can create a model that's much more efficient, one that might require fewer observations. So let me demonstrate that visually, so you can see what I'm talking about.

Okay, so I'm going to start visually with the same data. To make it very simple, we're going to look at one factor with five levels. Here's the data as if I were to visualize it as continuous.

Here, I visualize it as categorical. Again, same data, same levels. On the left-hand side it's continuous; think of it as different concentrations that I'm looking at. On the right-hand side it's categorical; think of that as some sort of discrete value, something like shift, supplier, detergent type, and so on. Now, as I mentioned, the important aspect of a discrete numeric variable is that there's a continuous relationship underlying it. If I look at the data on the left-hand side, I can say, well, what I have there is a line, and knowing a little bit about math, I don't need all five of those levels. I can estimate that line with just two of the levels. If I were to design this as a DOE, I'd get the high and the low levels. Even if I design my experiment with those additional levels, I can take levels two, three, and four and repurpose them for something else, for example to create a better estimate of my experimental noise.
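To make the point about two levels concrete, here is a minimal sketch in plain Python. The data values and the helper name `fit_line` are made up for illustration and are not from the talk. It estimates the line from only the low and high levels, then uses all five levels so that the three extra runs become error degrees of freedom for a noise estimate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical data: five levels of a numeric factor, one response per level.
levels = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [5.1, 7.9, 11.2, 13.8, 17.0]

# Only the low and high levels are needed to estimate the line.
intercept_2, slope_2 = fit_line(
    [levels[0], levels[-1]], [response[0], response[-1]]
)

# With all five levels, the three extra runs become error degrees of freedom.
intercept_5, slope_5 = fit_line(levels, response)
residuals = [y - (intercept_5 + slope_5 * x) for x, y in zip(levels, response)]
error_df = len(levels) - 2  # runs minus (intercept + slope)
noise_variance = sum(r ** 2 for r in residuals) / error_df
```

With a categorical factor there is no such shortcut: each of the five levels needs its own estimate, so nothing is left over for error unless runs are added.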

So the important thing here is that if I make certain assumptions about the underlying relationship between my discrete numeric variable and my response, I don't necessarily need all those observations. That's much different from the case on the right. When I'm dealing with categorical values, I need all five of those levels. If I lose one of those five levels, I've lost information.

Now, the nice thing about discrete numeric, as with any numeric variable, is that I don't necessarily have to have a straight line. Again, on the right I've got categorical variables, and I need all five levels. On the left, I can probably fit a reasonably good model looking at just three of the levels. Those of you who've had experience designing and analyzing experiments know I could fit some curvature by looking at a high and a low setting and then something in the middle, so three observations. Where I start to have problems, where discrete numeric starts to benefit from the additional levels and starts to look more and more like a categorical variable, is when I run into some of these. Think of these almost as edge cases: these are when the typical model that I fit with a DOE doesn't work as well. In this first example I have data that is asymptotic. Sometimes when I have asymptotic data, your typical DOE model might not fit the data quite as well. However, the nice thing is I can always try. From the standpoint of practicality, I don't have to have the right model. I don't have to have the perfect model. I just have to have a model that fits well enough to answer my question.

So that's the first case. The second case where I might have potential problems trying to fit your standard underlying continuous models is when I run into some sort of edge condition where I'm falling off a cliff or running into a cliff. Again, I've got this drop-off that's very precipitous. If I were to have designed this as discrete numeric, I can always try to fit that model, and it will fit a model. The other nice benefit is that with discrete numeric, given enough resources, I can always default to what's a categorical-equivalent model.

All right, let's actually jump into the Custom Designer to take a look at discrete numeric and categorical, and let's contrast that with just straight continuous. I think that makes a very nice bridge to show you how these two concepts are linked. Here I've got my continuous variable, and we'll stick that in the back. Here I've got my categorical variable and my discrete numeric variable. Let me go ahead and click the Continue button. A couple of things you might notice. One is that with discrete numeric I get a lot more components in the model: I've got a squared term, a cubic term, and a fourth-power term. You'll also notice that I've got fewer runs than in the case of the categorical. In the case of the categorical I only have one factor, but there are five implicit levels there. So you see right off the bat that not only the minimum number of runs but also the default number of runs is much smaller for the discrete numeric. This gets back to what I showed you earlier: by default, in the case of a discrete numeric variable, all I really need at the extreme is some measure of my intercept and some measure of my slope.

All right, so those are the two. I'll need two points at a minimum to measure that line, to get the slope of that line. If I ask for more, I'll be able to get some of those additional "if possible" terms into my model. With the categorical, I'm out of luck: I need to have at least five observations, preferably more, to be able to get some estimate of error.
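The counting argument in this part of the talk can be sketched in a few lines of Python. This only restates the arithmetic described above for a single factor; it is not how JMP computes run counts, and the function names are hypothetical.

```python
def discrete_numeric_terms(n_levels):
    """Split the one-factor polynomial model into required and optional terms."""
    required = ["Intercept", "X"]  # a line: always estimated
    # Higher powers up to (n_levels - 1) are estimated only if runs allow.
    if_possible = [f"X^{power}" for power in range(2, n_levels)]
    return required, if_possible


def categorical_parameters(n_levels):
    """A categorical factor needs one parameter per level."""
    return n_levels  # intercept + (n_levels - 1) level effects


required, if_possible = discrete_numeric_terms(5)
min_runs_discrete_numeric = len(required)         # two points define the line
min_runs_categorical = categorical_parameters(5)  # every level must be run
```

For the five-level factor in the demonstration, this gives a minimum of two runs for discrete numeric against five for categorical, with the squared, cubic, and fourth-power terms waiting in the "if possible" list.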

All right, let me relate that back to the continuous case. This is not discrete numeric; it's just a continuous variable. How does this tie these two concepts together? What I'm going to do with the continuous variable is build the same model that we saw with discrete numeric. I'm going to add a squared term, a cubic term, and a fourth-power term.

When I do that, you'll notice that what I have is DOEs that give me models requiring the same resources. This is what I said earlier: if I were to fit my data to these two models, I would get different but equivalent models. So this underlines that I'm really looking at the same information, just in slightly different ways. Now, how does this relate back to discrete numeric variables? If you remember, with discrete numeric, three of those effects in the model were considered "if possible". So if I were to just right-click and say If Possible, now I'm back to my discrete numeric variable.
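One way to see why the full polynomial model and the categorical model are "different but equivalent" is that a fourth-degree polynomial through five levels reproduces every level mean exactly, which is precisely what a categorical fit returns. The sketch below shows this with Lagrange interpolation and exact fractions. The level means are invented, and Lagrange interpolation is used here only as a convenient stand-in for fitting the quartic; it is not what the Custom Designer does internally.

```python
from fractions import Fraction


def lagrange_value(xs, ys, x):
    """Evaluate the unique degree-(len(xs) - 1) polynomial through (xs, ys) at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                weight *= Fraction(x - xj) / (xi - xj)
        total += yi * weight
    return total


# Hypothetical level means for a five-level factor with an asymptotic shape.
levels = [1, 2, 3, 4, 5]
level_means = [2, 6, 8, 9, 9]

# At the design levels, the quartic returns the level means themselves,
# which are the fitted values of the categorical model.
fitted_at_levels = [lagrange_value(levels, level_means, x) for x in levels]

# Unlike the categorical model, the polynomial is also defined between levels.
between = lagrange_value(levels, level_means, Fraction(5, 2))
```

The two fits agree at every level that was run; they differ only in that the numeric model also offers a prediction between levels, which is the assumption of an underlying continuous relationship.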

Okay, so that's the relationship in going from something that's discrete numeric to categorical and relating the two with one another. Let me clean up my space a bit.

So why do I bother with discrete numeric in the first place? Well, we've seen one potential use: the case where I hope the data is fairly well behaved, that it's linear or maybe has some curvature, but I'm not really sure if I've come to the edge. So I want to have those additional data points in there, just in case. There's another case, and I think this is a really nice result of why discrete numeric variables are in there. From a historical standpoint, one of the reasons discrete numeric was put into the DOE platform back in JMP version 10 was that the way you would get that third level and additional levels into a design was to put a squared term or a cubic term in the model. Unless you were a mathematician or statistician, or you had a lot of experience with DOE, you might not have realized that. So in version 10, the discrete numeric variable was added so that people can say, well, I want a third level, I want a fourth level or additional levels in my design, and give them to me if possible.

Now, one of the nice results of having discrete numeric is this. Prior to JMP version 10, before discrete numeric, when I asked for that third level, it would always put it in the center of my design. The reason is that the Custom Designer is based on optimality criteria, and if I were to ask for a third level for a set of factors, the optimal place to put it is right in the center of the design. Well, what happens when I can't run that observation at the center of my design? Or what if my standard conditions, the conditions I want to test against, were slightly off center? How would I get that data point into my model? From JMP version 10 on, the answer is I use discrete numeric. I've got a design that I have started, so let me pull that up.

All right, let me go to my custom designer.

So here's my Custom Designer. Again, we're going to keep this simple, but this extends to as many discrete numeric variables as you want, and I can include categorical and continuous variables in here, but we're going to focus on using this to get a not-so-centered center point. I've got two variables, A and B, both discrete numeric. I want to run A at three levels and B at four levels. To make things simple, and to relate them back to a sort of standard design, we're going to assume that the low and high settings for A and B are in standard units: negative one for the low value and one for the high value. However, when I run A and B, I don't necessarily run at the center. I don't run at (0, 0); my standard operating conditions are A equal to 0.25 and B equal to 0.5. So I can specify those values in my design. In addition, I might want to run a point at B equals zero as well. The great thing about discrete numeric is that flexibility: I can set those values to wherever I can run, or wherever I would like to run, my experiment. Now, I'm just going to stick with the defaults in terms of the model. I'm going to increase the number of runs just so I can cover my design space a little bit better. What I wind up getting, if I ask for A at three levels and B at four levels, given those factor settings, is this kind of design. Again, my idea here is that typically I might be running my standard operating conditions somewhere around here, but I can fill that space however I would like to.
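The factor settings described here can be written down as a small candidate grid. The Python sketch below only enumerates the level combinations that the two discrete numeric factors allow; it does not reproduce how the Custom Designer selects runs from them, and the variable names are illustrative.

```python
from itertools import product

# Levels from the example: low and high in standard units, plus the
# off-center operating conditions (A = 0.25, B = 0.5) and an extra B = 0.
a_levels = [-1, 0.25, 1]    # three levels
b_levels = [-1, 0, 0.5, 1]  # four levels

# Every (A, B) combination the design is allowed to use.
candidate_points = list(product(a_levels, b_levels))

standard_conditions = (0.25, 0.5)
classic_center = (0, 0)

has_standard_conditions = standard_conditions in candidate_points
has_classic_center = classic_center in candidate_points
```

The grid contains the off-center operating point but not the classic (0, 0) center point, which is the flexibility the discrete numeric factor type provides.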
Okay, so this is probably a good place to take a break and ask: are there any questions? I'm not seeing any in the Q&A window, but I've got about five minutes left, so if you've got questions, please ask them. You always have the opportunity to go to the comment area on the JMP Community if you have questions after the episode or if you have some additional questions. "Don, would you like a two-minute hold for everyone to ask questions?" "Sure, let's do that. Great."

"Back to you, Don." Okay, thank you, Julian. I've got a couple of questions, so let me start with the first one, and that's just to summarize the relationship, the difference between regular continuous and discrete numeric. The way I like to think of it is that discrete numeric is a special case of continuous where I am forced to run my design at particular levels, because that's all that's available, or it's all I have the time or the resources to run. As it turns out, I can always take that continuous variable and set up my model in such a way as to get what the discrete numeric would have given me. Discrete numeric is an easier way to do it; it sets up a lot of the things I would otherwise have had to do by hand. So again, discrete numeric is a special case of continuous.

The second question I'm probably going to have to take offline: "Do I avoid trials based on typical DOE design?" I'm not exactly sure what you mean by avoiding trials. The example they gave was Box-Behnken designs. That certainly will take us far afield. Maybe that's a good next episode: what do I do when I've got restrictions on my design space? What if I can't run a particular corner, or I have certain combinations that don't work? But I certainly want to encourage that person to hunt me out in the community, and I can add a little bit more once I know what you mean by avoiding trials.

Do I have examples of a completed design and analysis using these concepts? I do, but it would take me far too long to find one and show it here. It happens a lot. The situation I remember from the past where I could have used something like this is where I had a very nonlinear response: we were looking at X-ray damage relative to the amount of dosing, and it was very asymptotic. So I probably would have used discrete numeric to find out where the bend in the hockey stick is. It was a case where one dose caused some damage and two was just absolutely too much for everything. So it would have been nice to be able to space those out using something like discrete numeric.

Okay. Someone says they prefer categorical over discrete numeric in JMP. Again, categorical is different from discrete numeric. With categorical, I can't get in between the levels; they're just not defined. With discrete numeric, I'm believing that there's a continuous relationship between those discrete numeric variables and my response or responses. So they really are two slightly, but importantly, different animals. Okay, looks like I'm at the top of the hour. I want to leave the next person enough time, so I'm going to have to hand it back to you, Julian.

PatrickGiuliano · ‎04-15-2020

@DonMcCormack, Thank you for this wonderful contribution! Look forward to future posts and JMP On-Air Sessions. I'd be interested in getting a focused mini-dive into classical designs (e.g. resolution III, IV) versus DSD (advantages and disadvantages and how to run them). Maybe this topic would be too broad?

How about this: "What designs should I consider when most of my predictors are two-level categorical factors (e.g. pass/fail)?" How about if all of my predictors are two- and/or three-level categorical factors?

Similar to this focused session where you show the concepts and the implementation in JMP with very simple and generic data structure. I actually find this more useful sometimes than example data b/c I can get lost in the specific problem context.

Thanks again.

gchesterton · ‎09-24-2020

Quite helpful and relevant to a post I just posted on the JMP community discussion board.

JMP On Air