and I'm joined by my colleague, Olivia Lippincott.
Olivia and I have given a presentation before called What Model When.
If you want to take a closer look at that presentation,
you can take a look at the link in the community post.
But today, we want to talk about something a little bit more.
Yeah, today we want to think about modeling type
and how modeling type impacts the analysis
for each of the four model goals that we talked about previously.
Right, and we're actually going to use the same data.
This is data that we pulled from Redfin that represents
the housing market in the Cincinnati area.
Here we're trying to look
at the price of homes relative to their square footage,
the number of beds, the number of baths,
and so on and so forth.
Previously, we've answered the question, what model when?
It really depends on what model you're going to choose
based on your goal for the analysis.
For segment, we're trying to examine relationships
where there's no intended response;
explain, we're trying to explain a relationship
and look at the underlying factors and how those affect the response;
predict, we're trying to predict future outcomes
or the response in new situations;
and identify, we're trying to find important variables.
Right.
Now let's bring the modeling type into the picture.
Both your responses and your factors can have different modeling types.
In JMP, there are three main modeling types:
continuous, nominal, and ordinal.
Continuous modeling type is represented by this blue triangle icon here,
and this refers to numeric data only.
The nominal modeling type is represented by this red icon,
and this is numeric or character data where values belong to categories,
but the order is not important.
The ordinal modeling type is represented by this green icon,
and this can be either numeric or character data as well.
But in this case, values belong to ordered categories.
When you're doing an analysis in JMP,
you want to make sure you set up the correct modeling type,
because JMP will choose the correct analysis for you
depending on the modeling type.
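As a rough illustration of the rules just described, here is a minimal Python sketch (not JMP code or JMP's API). The function name `is_valid_modeling_type` and the sample columns are hypothetical; the only rule encoded is the one stated above: continuous needs numeric data, while nominal and ordinal accept numeric or character data.

```python
from numbers import Number

MODELING_TYPES = ("continuous", "nominal", "ordinal")


def is_valid_modeling_type(values, modeling_type):
    """Return True if the column's values are compatible with the modeling type."""
    if modeling_type not in MODELING_TYPES:
        raise ValueError(f"unknown modeling type: {modeling_type!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    if modeling_type == "continuous":
        return all_numeric  # continuous: numeric data only
    return True  # nominal and ordinal: numeric or character data


# Hypothetical columns for illustration only.
beds = [2, 3, 4, 5]
price_band = ["Low", "Medium", "High"]

print(is_valid_modeling_type(beds, "continuous"))        # numeric column
print(is_valid_modeling_type(price_band, "continuous"))  # character column
print(is_valid_modeling_type(price_band, "ordinal"))
```

A numeric column such as the number of beds passes for all three types, which is why the choice between them is a modeling decision rather than something the data forces.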
Andrea, I have a game for us to play.
It's called Name That Analysis.
Do you want to play?
Absolutely. I love games.
Awesome.
Here's your first question.
We want to identify
which features of a home are most important
to determining the price.
For example, square footage and number of bathrooms
can explain a large amount of the variation in price,
but other features are less important.
All right, Olivia.
I think you're making this first question easy for me.
Is the answer identify?
Let's see.
Yeah, you're right.
I did make that one a little bit easy to get us going,
but that is identify to find important variables within there.
There's a couple of different places in JMP
where we can use tools to identify if that's our modeling goal.
You'll find them under the Analyze menu, in Screening, Predictive Modeling, and Fit Model:
tools like Predictor Screening, Bootstrap Forest,
Generalized Regression, and Stepwise Selection.
For modeling type,
when we're looking at the goal of identify,
it's not going to affect things much.
JMP is going to do the correct analysis
as long as your modeling types are set appropriately.
We took a look at this and we took both the response and the factors
and changed them from continuous to nominal
and looked at which factors came up as most important.
While the order of the factors varied, the dominant factors stayed the same.
All right.
It looks like if our goal is to identify important factors,
really, the exact modeling type we're using
isn't impacting things that much, it looks like, Olivia.
Right.
Our conclusions on which variables are important
aren't going to change much based on the modeling type.
All right.
Well, that is good to know.
I have a question for you.
Are you ready?
I'm ready.
All right, here is your question.
Let's say we want to build a model to predict house prices.
This model will be based on many important predictor variables we have in our data.
For example, we want to predict
the price of a house that we want to put on the market.
Which goal do you think we're working with here?
Okay, so it's not like question one where we're trying to see
which factors are most important to predict housing prices.
We're just really trying to get that final housing price prediction.
I'm going to go with predict.
All right, let's see if you're right.
Yes, you are right.
The goal of this analysis is predict.
There's lots of different platforms in JMP where you can build models for prediction.
Within each of those platforms in JMP where you can build the prediction models,
JMP will do the correct analysis for you,
depending on the modeling type of your response.
Here we have a table
of different modeling types for our responses:
continuous, nominal, and ordinal.
For a continuous response,
this is the typical one that we were talking about, right?
We want to predict the price of a home that we're going to put on the market.
Now, when we're building this type of model with a continuous response,
well, we want to know how powerful that model is.
What's the predictive power of that model?
We can use RSquared and the Root Average Squared Error
to diagnose that model.
Now, for a nominal and ordinal model, it's a little bit different.
For a model with a nominal response, we have categories as the response.
In this example, we're looking
at whether the price will be above or below $1 million.
That's what we want to predict.
For the ordinal response, here we have an ordered category.
We want to predict whether the price of the house
is going to be low, medium, or high.
For the nominal and ordinal examples,
again, we can look at RSquared and Root Average Squared Error
to evaluate those models.
But there's other things that we can use to evaluate those models,
like the misclassification rate and the area under the ROC curve.
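To make these diagnostics concrete, here is a small Python sketch using the textbook definitions of RSquared, Root Average Squared Error (RASE), and misclassification rate. The function names are hypothetical, and JMP may report variants of these measures for categorical responses, so treat this as an approximation of the ideas rather than JMP's exact calculations.

```python
import math


def r_squared(actual, predicted):
    """1 - SS_residual / SS_total for a continuous response."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rase(actual, predicted):
    """Root Average Squared Error: square root of the mean squared error."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def misclassification_rate(actual_labels, predicted_labels):
    """Fraction of rows whose predicted category differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual_labels, predicted_labels))
    return wrong / len(actual_labels)


# Made-up values for illustration; these are not from the Redfin data.
prices = [300.0, 450.0, 520.0, 610.0]
fitted = [320.0, 430.0, 540.0, 600.0]
print(r_squared(prices, fitted), rase(prices, fitted))
print(misclassification_rate(["Low", "High", "High"], ["Low", "Low", "High"]))
```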
Of course, our favorite tool in JMP
to take a look at our prediction model is the Prediction Profiler.
Let's take a look at the difference between the Prediction Profiler
for the modeling types of our responses.
For the continuous response, we can see that on the Y-axis,
we have the mean prediction plus or minus the confidence interval
given the value of the model factors here on each of the X-axes.
For the nominal and ordinal logistic models,
what we see on the Y-axis
is the probability of the response being in a certain category.
For the nominal logistic model, we have the probability
that the house is either going to be above or below a million dollars.
For this ordinal logistic model,
we can see the probability of having a low, medium, or high price.
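For readers who want to see where those category probabilities come from, here is a hedged Python sketch. It uses one common parameterization: a logistic curve for a two-level nominal response, and cumulative logits for an ordinal response. All coefficients, level names, and function names below are hypothetical, and the exact parameterization JMP uses may differ.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def nominal_two_level_probs(x, intercept, slope, levels):
    """Probability of each of two unordered categories at factor value x."""
    p_first = logistic(intercept + slope * x)
    return {levels[0]: p_first, levels[1]: 1.0 - p_first}


def ordinal_probs(x, thresholds, slope, levels):
    """Category probabilities from cumulative logits.

    Assumes P(Y <= level_j) = logistic(threshold_j + slope * x), with
    thresholds in increasing order and one fewer threshold than levels.
    """
    if len(thresholds) != len(levels) - 1:
        raise ValueError("need exactly one threshold between adjacent levels")
    cumulative = [logistic(t + slope * x) for t in thresholds] + [1.0]
    probs = {}
    previous = 0.0
    for level, cum in zip(levels, cumulative):
        probs[level] = cum - previous  # difference of adjacent cumulative probs
        previous = cum
    return probs


# Hypothetical coefficients; x is square footage in thousands.
print(nominal_two_level_probs(3.0, -6.0, 1.5, ("Over $1M", "Under $1M")))
print(ordinal_probs(3.0, (2.0, 4.0), -1.0, ("Low", "Medium", "High")))
```

In both cases the probabilities across categories add up to 1 at any factor setting, which is what the profiler's Y-axis is showing.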
Okay, so it sounds like the goal of what we want to predict
is also important when we're talking about that prediction goal,
whether we want to treat price as continuous
and get the predictions of the exact prices out of there,
or if we want to treat it as a category.
Right.
You just need to get
that response variable set up and your data set the correct way,
and then, of course, assign the correct modeling type,
and JMP is going to build the correct model for you.
All right, Andrea.
Are you ready for your next question?
I'm ready. Let's go.
Okay.
We want to quantify
the effect on home prices from additional bedrooms.
For example, on average,
every additional bedroom adds about $97,000 to the total home cost.
Adding a bedroom adds $97,000?
Man, Cincinnati is a tough housing market.
That's crazy.
All right, well, so let's see.
What are we trying to do here?
We're trying to quantify the effect here.
I think what we're trying to do is explain
that effect that bedrooms have on the price of a house.
I'm going to say explain.
You're correct.
Yeah, we're trying to describe the relationships.
In explain, we use the parameter estimates taken from the model equation
to quantify those relationships between the factors and the responses.
Typically, in JMP, we use tools under Fit Model
like Standard Least Squares, Logistic and Ordinal Regression,
and Generalized Regression.
Modeling type can really impact
how a factor's relationship with the response variable is interpreted.
We took a look, and we were looking
at how does the number of beds affect the housing price?
We changed beds from continuous, to nominal, to ordinal,
and saw what that relationship was.
We can see under the continuous, that's where we've got
that every additional bedroom adds about $97,000 to the total home price.
That prediction profiler shows a linear relationship
when we treat beds as continuous.
But when we treat beds as nominal or ordinal,
there's not that straight linear relationship going on.
We see a spike in price going from 4 to 5 bedrooms compared to going from 2 to 3 bedrooms.
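The contrast between the two treatments can be sketched in a few lines of Python. The data below are made up for illustration (they are not the Redfin data), and the function names are hypothetical. Treating beds as continuous fits one slope for every extra bedroom, while treating beds as categorical estimates a separate mean for each number of bedrooms.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def level_means(x, y):
    """Mean response at each distinct level of x (beds as categorical)."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return {level: sum(v) / len(v) for level, v in sorted(groups.items())}


# Made-up prices in thousands of dollars, two homes per bedroom count.
beds = [2, 2, 3, 3, 4, 4, 5, 5]
price = [200, 220, 250, 270, 300, 320, 500, 520]

intercept, slope = fit_line(beds, price)
print("continuous: one slope per added bedroom =", slope)
print("categorical: mean price per level =", level_means(beds, price))
```

With these toy numbers, the single slope averages over a jump at the top end that only the per-level means reveal.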
Right.
I see with nominal and ordinal,
the prediction profiler looks almost exactly the same,
so it must be the same model.
However, I'm seeing with the parameter estimates,
they look a little bit different between nominal and ordinal.
What's going on there?
Yeah, so nominal and ordinal factors
are coded differently within the regression,
so the parameter estimates are different.
For nominal, that intercept,
we think of that as the mean house price across all the different bedrooms,
and each of those parameter estimates
is how much that number of beds increases or decreases that mean house price.
But for ordinal, because order matters,
we think of the intercept as if there are zero bedrooms
and each of those parameter estimates
is the effect of adding an additional bedroom onto the price.
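Here is a hedged Python sketch of why the two codings give different parameter estimates but the same predictions. It assumes a model with beds as the only factor, uses made-up level means, and takes the ordinal baseline to be the lowest number of bedrooms present in the data. The function names are hypothetical, and this is an illustration of the idea, not JMP's implementation.

```python
def nominal_effect_coding(level_means):
    """Intercept = average of the level means; each effect = level mean - intercept."""
    intercept = sum(level_means.values()) / len(level_means)
    effects = {level: mean - intercept for level, mean in level_means.items()}
    return intercept, effects


def ordinal_step_coding(level_means):
    """Intercept = mean at the lowest level; each step = change to the next level."""
    levels = sorted(level_means)
    intercept = level_means[levels[0]]
    steps = {
        (lo, hi): level_means[hi] - level_means[lo]
        for lo, hi in zip(levels, levels[1:])
    }
    return intercept, steps


def predict_nominal(intercept, effects, level):
    return intercept + effects[level]


def predict_ordinal(intercept, steps, level):
    # Add up every step taken to climb from the lowest level to this one.
    return intercept + sum(s for (lo, hi), s in steps.items() if hi <= level)


# Made-up mean prices (thousands of dollars) by number of bedrooms.
means = {2: 210.0, 3: 260.0, 4: 310.0, 5: 510.0}
nom_intercept, effects = nominal_effect_coding(means)
ord_intercept, steps = ordinal_step_coding(means)
print("nominal:", nom_intercept, effects)
print("ordinal:", ord_intercept, steps)
for level in means:
    print(level, predict_nominal(nom_intercept, effects, level),
          predict_ordinal(ord_intercept, steps, level))
```

Both codings reproduce the same mean price at every level, which matches the observation that the two profilers look almost identical even though the parameter estimates differ.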
All right.
Modeling type is really going to affect my parameter estimates.
I really need to think about exactly what do I want to explain
as a part of this model when I'm doing this analysis.
Yes.
All right.
Are you ready for the final question, Olivia?
Yeah, bring it on.
All right, here's the question.
Let's say we want to identify groups of homes
that are similar based on a list of possible characteristics.
In other words,
we want to identify market segments based on things like square footage,
location, number of bedrooms, et cetera.
Which goal do you think this is?
I think you're trying to trick me with that identify,
and I'm not going to fall for it.
Okay.
But there are no responses within this question.
I think we're looking at clustering.
I'm going to say segment.
Okay.
Well, you're right, Olivia.
I did try and trick you a little bit because I really wanted to win.
But you're right, that's the key thing here,
is that there are no responses here in this analysis.
We are definitely looking at segment.
When our goal is segment,
we can use a couple of different clustering tools.
We can do Hierarchical Clustering,
K-Means Clustering, or Latent Class Analysis.
It's important to keep in mind that with Hierarchical Clustering,
you can include all of the modeling types:
continuous, nominal, and ordinal.
But for K-Means Clustering,
you can only include variables that are continuous.
For Latent Class Analysis,
you can only include nominal or ordinal variables.
In our case here, when we're looking
at the number of bedrooms, lot size, year built, and square feet,
we have a combination of continuous and nominal variables.
Hierarchical Clustering may be the best clustering tool to use in this scenario.
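The rules just described can be written down as a small lookup. This Python sketch only encodes what was said above about which modeling types each clustering tool accepts; the function name and the assignment of modeling types to the example columns are assumptions for illustration.

```python
def eligible_clustering_methods(modeling_types):
    """List the clustering tools that can take all of the given variables."""
    types = set(modeling_types)
    if not types:
        raise ValueError("need at least one variable")
    unknown = types - {"continuous", "nominal", "ordinal"}
    if unknown:
        raise ValueError(f"unknown modeling types: {sorted(unknown)}")

    methods = ["Hierarchical Clustering"]  # accepts all modeling types
    if types <= {"continuous"}:
        methods.append("K-Means Clustering")  # continuous variables only
    if types <= {"nominal", "ordinal"}:
        methods.append("Latent Class Analysis")  # nominal or ordinal only
    return methods


# Assumed modeling types for the example columns (not stated in the talk).
columns = {
    "Beds": "nominal",
    "Lot Size": "continuous",
    "Year Built": "continuous",
    "Square Feet": "continuous",
}
print(eligible_clustering_methods(columns.values()))
```

With a mix of continuous and nominal variables, only Hierarchical Clustering remains on the list.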
It looks like with that parallel plot with Hierarchical Clustering,
maybe we could call Cluster 6 Amazing Location.
Yes.
If you think a large lot size is an amazing location,
yeah, we can definitely call that segment Amazing Location Homes.
Well, all right, Olivia,
despite me giving you a trick in that last question,
it looks like we ended up with a tie here again.
We'll have to rematch again soon.
Absolutely.
We talked about what model when, and that really,
what model you choose depends on your goal for the analysis,
whether it's segment, explain, predict, or identify.
Yeah, in terms of modeling type, again,
JMP is going to do the correct analysis for you,
especially with your responses.
If you're setting them up with the correct modeling type,
JMP is going to do the correct analysis for you.
If your goal is explain,
you might need to think a little bit about which modeling type to use,
depending on how you want to explain
the effect of something like the number of bedrooms.
Thank you, Olivia.
This is so much fun.
Let's do it again next year.