I'm a Data Engineer at JLL.
I have my Master's degree in Analytics from Texas A&M.
My capstone project was on RFQ Process Optimization
for Pre-engineered Metal Buildings.
I got to know JMP and SAS in school at Texas A&M
and found them to be wonderful tools for analysis.
My capstone project, though, was done in both JMP and Python;
I have done it in both.
Basically, the objective of my project is to predict the weight
of a pre-engineered metal building from various geometric
and geographic requirements.
Let's start with an introduction to metal buildings.
Pre-engineered metal buildings are structures
that use steel frames to support the roof and wall panels.
They can be customized to the customer's requirements,
and they can be commercial or residential.
Because they are pre-engineered, their dimensions are precise.
They are also flexible:
the design can be changed to match customer needs
and customer requirements.
What we have to do in this project
is estimate the weight in order to predict the price.
The steel structure is a critical factor in metal building pricing.
We want to see if we can predict that weight;
if we can, then we can predict the price
and, eventually, improve the RFQ process.
Basically, let's take a look at the RFQ process:
what is the process we want to optimize,
and what exactly is the current system doing?
As I mentioned, the business challenge at hand in this project
is to improve the RFQ process for metal buildings:
improve the response time,
reduce the server cost,
and mitigate missed opportunities for the company.
To send a quote back to a client,
they follow a set process with several steps.
They review the inquiries coming into the company; specify the purpose, size,
layout, and design requirements of the building;
check all of the technical specifications,
such as load capacity, roofing, insulation, and other factors,
to prepare an accurate quote that covers all of the materials, labor, and cost;
calculate the price; and send the quote back to the customer.
Their current system processes the RFQs roughly like this.
They currently have three servers,
and each server processes about 480 RFQs per day.
Each RFQ takes about 180 seconds to process,
and the cost per RFQ is approximately 75 cents.
Dividing cost by time,
that comes to about $0.004 per second to process an RFQ.
At 480 RFQs per day per server,
that is $360 per day per server to process the RFQs.
With three servers,
the total is about $1,080 per day for all three servers
to process the RFQs.
Over 365 days a year, that is roughly $400,000 in server
and maintenance cost for the company to process all of the RFQs.
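The cost arithmetic above can be sketched in a few lines of Python; all figures are the ones quoted in this presentation:

```python
# Current RFQ processing cost model, using the figures from the talk.
servers = 3
rfqs_per_day_per_server = 480
seconds_per_rfq = 180
cost_per_rfq = 0.75  # dollars

cost_per_second = cost_per_rfq / seconds_per_rfq                # ~ $0.004/s
daily_cost_per_server = rfqs_per_day_per_server * cost_per_rfq  # $360/day
daily_cost_total = daily_cost_per_server * servers              # $1,080/day
annual_cost = daily_cost_total * 365                            # ~ $394,200/yr
print(round(cost_per_second, 4), daily_cost_per_server, round(annual_cost))
# → 0.0042 360.0 394200
```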
Basically, we want to see if we can improve this process
for the company by predicting the weight from the historical data.
The methodology I used in this project
follows the standard machine-learning workflow for prediction:
data collection and preprocessing, feature engineering
and selection, and model selection.
Let's take a look.
We started by defining exactly what the business problem is:
predicting the weight of the metal building.
For data collection and preprocessing,
about 182 SQL tables were provided to me
as a subset of historical data from the manufacturer.
Around 500,000 records were included in those tables,
with more than 500 features, US data only.
I performed the ETL process,
cleaned my data set and found abnormal data in SAS, Python, and JMP,
and integrated all of my data into one table
with the help of these three tools.
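As a rough sketch of that integration step, here is how joining several of the SQL tables on a shared key into one modeling table might look; the table and column names are hypothetical, not the manufacturer's actual schema:

```python
import sqlite3

# Hypothetical sketch: join per-building SQL tables on a shared project key
# into one modeling table. Schema and values are illustrative only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE frames  (project_id INTEGER, width_ft REAL);
    CREATE TABLE loads   (project_id INTEGER, snow_load_psf REAL);
    CREATE TABLE weights (project_id INTEGER, weight_lb REAL);
    INSERT INTO frames  VALUES (1, 40), (2, 60);
    INSERT INTO loads   VALUES (1, 20), (2, 35);
    INSERT INTO weights VALUES (1, 18000), (2, 31000);
""")
rows = con.execute("""
    SELECT f.project_id, f.width_ft, l.snow_load_psf, w.weight_lb
    FROM frames f
    JOIN loads   l ON l.project_id = f.project_id
    JOIN weights w ON w.project_id = f.project_id
""").fetchall()
print(rows)  # one integrated row per project
```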
My next step was feature selection and extraction.
For the preliminary feature selection, I started with an industry study
and expert judgment to eliminate the features
that were clearly unrelated to our target.
Then I ran a correlation analysis to examine the correlations
and find multicollinearity between my features,
which eliminated more features.
Then I did descriptive analysis and visualization to check
the relationships in my data, which I did in JMP, Python,
and Tableau.
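A minimal sketch of the correlation screen, assuming we simply flag feature pairs whose absolute Pearson correlation exceeds a threshold (the feature names and values here are illustrative, not from the real data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

features = {
    "width_ft":  [40, 60, 80, 100],
    "area_sqft": [2000, 3000, 4000, 5000],  # proportional to width here
    "snow_load": [20, 35, 15, 30],
}
names = list(features)
# Flag highly correlated pairs as multicollinearity candidates.
flagged = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
           if abs(pearson(features[a], features[b])) > 0.9]
print(flagged)  # → [('width_ft', 'area_sqft')]
```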
The next step for me was to explore models and algorithms
based on the characteristics of my data.
Since my target is an interval (continuous) target,
I started looking into stepwise linear regression, random forest,
decision tree, and several other algorithms.
Eventually, I trained and evaluated different models in both Python and JMP,
compared their performance metrics,
and performed k-fold cross-validation for all of my models.
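A hedged sketch of the Python side of that evaluation, assuming scikit-learn and synthetic data in place of the real tables (JMP's ASE corresponds to the mean squared error here):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data: three geometry-like features
# and a weight that grows with the floor area. Illustrative only.
rng = np.random.default_rng(0)
X = rng.uniform(20, 120, size=(200, 3))   # width, length, eave height
y = 4.0 * X[:, 0] * X[:, 1] + 50.0 * X[:, 2] + rng.normal(0, 500, 200)

tree = DecisionTreeRegressor(max_depth=6, random_state=0)
# 5-fold cross-validation: ASE (mean squared error) and R-squared per fold.
ase = -cross_val_score(tree, X, y, cv=5, scoring="neg_mean_squared_error")
r2 = cross_val_score(tree, X, y, cv=5, scoring="r2")
print(ase.mean(), r2.mean())
```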
If we take a look here in the middle,
we can see the models I trained on my data.
Some of them are in Python, some are in JMP.
To compare these models and see which one is best,
I compared them based on the ASE (average squared error),
to check which model performs better,
and on the R-squared, to see how close our forecast,
our prediction, is to the actual data.
This is the residual plot for all of the models.
If you go back here,
you can see that out of all of the models I ran,
two models are the best:
a decision tree in Python and a boosted tree in JMP.
If we take a look at both of them, we can see that most of the features
both selected are the same, and even the feature importances are close.
Looking at the ASE,
the ASE for the decision tree is about 4,000,
but for the boosted tree it is about 2,000, which is much better.
The R-squared for the boosted tree is also a little higher.
If we look at the residuals
from both tools, we can see the residuals are really good:
close to the actual values and centered around zero.
Both models perform well and predict well.
It means that we can predict the weight
of the building from the historical data that we have.
Let's see how this will impact the business.
Okay, we have a good model, and it is performing well.
If the business wants to implement this model,
what are the benefits for them?
The first chart is the one
I already presented to you as their current system.
We know they have three servers,
each server processes 480 RFQs per day,
and it costs $360 per day per server.
Suppose we want to process the same number of RFQs, 480.
This time, with the model implemented,
we can lower the processing time to one second,
because the model runs instantly and returns the price.
The cost per RFQ-second stays the same;
only the processing time drops to one second
from 180 seconds.
The total cost falls to about $2 per day per server.
With three servers, that is $6 per day.
Over 365 days a year
with all three servers running, that is about $2,000.
Compared with the current cost
they are paying for servers and processing time,
that is almost a 99% cost saving for the business and the company.
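The projected numbers above, worked through with the same per-second rate as the current system:

```python
# Projected cost with the model quoting in ~1 second (figures from the talk).
cost_per_second = 0.75 / 180                      # same rate as today
new_daily_per_server = 480 * 1 * cost_per_second  # $2 per day per server
new_annual = new_daily_per_server * 3 * 365       # ~ $2,190 for 3 servers
current_annual = 480 * 0.75 * 3 * 365             # ~ $394,200 today
saving = 1 - new_annual / current_annual          # ~ 99.4% cost saving
print(round(new_annual), round(saving, 3))
# → 2190 0.994
```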
But surprisingly, this is not the only cost they can save.
The other benefit is that they can capture more opportunities,
because right now they are missing some.
Currently, one server processes 480 RFQs per day.
But if we lower the processing time to one second,
it could process about 80,000 RFQs.
Assuming their win rate is 4%,
the total number they can win per day right now is about 19.
With the model implemented, because they process more RFQs,
they can win more and bring in more opportunities.
In fact, they are missing about 3,000 opportunities per day.
That is probably more than their capacity,
but it is one of the points
they can think about for their business development.
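The throughput comparison behind that missed-opportunity figure, again using the numbers from the talk:

```python
# Throughput and wins per day, before and after (figures from the talk).
capacity_seconds = 480 * 180  # 86,400 s of processing per server per day
rfqs_with_model = 80_000      # ~capacity at 1 s per RFQ, as quoted
win_rate = 0.04

wins_now = round(480 * win_rate)                     # about 19 wins per day
wins_with_model = round(rfqs_with_model * win_rate)  # 3,200 per day
missed = wins_with_model - wins_now                  # ~3,000 missed per day
print(capacity_seconds, wins_now, wins_with_model, missed)
# → 86400 19 3200 3181
```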
That is my project: we built a good predictive model, and we can see
there is going to be a huge business impact for the company.
I hope you enjoyed it, and thank you so much to Texas A&M
and to Dr. Jones, my capstone coach.