Choose Language Hide Translation Bar

RFQ Process Optimization with JMP and Python - (2023-US-PO-1442)

The request for quotation (RFQ) process can be optimized using machine learning algorithms to predict the price of pre-engineered metal buildings (PEMBs) based on historical data. This process involves collecting data on PEMBs, preprocessing it by removing missing values and handling outliers, selecting relevant features through feature engineering, and splitting the data into training and testing sets. Popular machine learning algorithms such as linear regression, decision trees, random forest, or neural networks can be used to build the model, and performance metrics like mean squared error (MSE) and root mean squared error (RMSE) are used to assess the model.

 

Python and JMP are two tools that can be used for this predictive modeling. Python is a popular programming language with data analysis and machine learning libraries. At the same time, JMP is a commercial software package for statistical analysis and data visualization. Choosing between the two depends on specific project requirements, user skills, and available resources.

 

By building a predictive model for PEMBs, organizations can more accurately estimate the weight of future projects, leading to better budgeting, planning, and resource allocation.

 

 

Hello. My  name  is  Najmeh  Dehghani.

I'm  a  Data  Engineer  at  JLL.

I  have  my  Master  Degree  in Analytics from  Texas  A&M.

My  capstone  project is  in  RFQ  Process  Optimization

for  Pre-engineered  Metal  Buildings.

I  get  to  know  JMP  and  SAS in  school  at  Texas  A&M.

I  founded  a  wonderful  tool to  do  analysis

and  perform  analysis  with  that.

But  actually,  my  capstone  project is  in  both  in  JMP  and  also  Python.

I  have  done  it  in  both.

Basically,  the  objective  of  my  project is  to  predicting  the  weight

of  pre-engineered  metal  building   by  considering  various  geometric

and  geographic  requirements.

Let's  have  an  introduction to  the  metal  buildings.

Metal  buildings,  basically  pre-engineered metal  buildings

are  structures  that  utilize  steel  frames to  support  the  roof  and  wall  panel.

We  can  do  the  customizations  on  them based  on  the  customer's  requirement.

They  can  be  commercial  or  residential.

They  have  an  accuracy because  they  are  pre-engineered

and  the  dimensions  are  precise.

They  have  flexibility.

We  can  change  the  design based  on  the  customer  needs

and  customer  requirements.

What  exactly  on  this  project  we  have  to  do

is  that  estimate  the  weight to  predict  the  price.

The  steel  structure  is  a  critical  factor on  the  metal  building  pricing.

We  want  to  see if  we  can  predict  that  weight.

Eventually,  we  can  predict  the  price

and  eventually, we  can  work  on  the  RFQ  process.

Basically,  let's  take a  look  at  the  RFQ  process.

What  is  the  RFQ  process that  we  want  to  optimize

and  what  is  exactly their  current  system  is  doing.

As  I  mentioned,  the  business  challenges on  hand  on  this  project

is  to  improve  the  RFQ  process for  metal  buildings

and  to  see  if  we  can  improve the  response  time,

reduce  the  server  cost,

and  mitigate  the  missing opportunities  for  the  company.

To  send  out  the  RFQ  to  their  client,

basically,  they  are  going to  choose  some  process,  some  steps.

They  review  the  inquiries  that  coming to  the  company,  specify  the  purpose,  size,

layout,  design  requirement of  the  building.

They  check  all of  the  technical  specification,

the  load  capacity,  roofing and  insulation  and  other  factors

to  prepare  accurate  quotes  that  cover  all of  the  materials,  labor  and  cost  for  them,

calculate  the  price  and  send  out the  RFQ  back  to  the  customers.

Their  current  system  that  processing the  RFQ  for  them  basically  is  like  this.

They  are  right  now  having  three  servers.

Each  server  is  processing about  480  RFQ  per  day.

The  time  each  RFQ  will  take to  process  is  about  180  seconds.

The  cost  per  RFQ is  approximately  75  cents.

Dividing  all  of  these  together,

we  can  come  up  with  a  number of  $0.004  per  second  to  process  RFQ.

To  consider  480  per  day  per  server,

it's  going  to  be  $360  per  day to  process  the  RFQs.

They  have  three  servers.

Totally,  it's  going  to  be about  $1,000  per  day  for  all  three  servers

to  process  the  RFQs.

If  we  consider  365  days  in  a  year, it's  going  to  be  about  $400,000  server

and  maintenance  cost  for  the  company to  process  all  of  the  RFQs.

Basically,  we  want  to  see if  we  can  improve  this  process

for  the  company  by  predicting  the  weight based  of  the  historical  data.

The  methods  I  have  used  on  this  project

probably  based  all  of  the  machine learning  projects  for  predicting.

It's  going  to  be  data  collection and  preprocessing,  feature  engineering

and  selection,  and  model  selection.

Let's  take  a  look.

We  started  by  defining  exactly what  is  the  business  problem.

The  business  problem  is  that  predicting the  weight  of  the   metal building.

The  data  collection and  pre-processing  is  that

how  I  handle  this  part  is  that about  182  SQL  tables  were  provided  to  me

as  a  part  of  subset  of  historical data  from  the  manufacturer.

Around  500,000  record  was  included in  those  tables,

more  than  500  features,  US  data  only.

I  have  performed  the  ETL  process

and  cleaning  my  data  set  fine  abnormal data  in  SAS,  Python  and  JMP

and  integrated  all of  my  data  in  one  table

with  the  help  of  these  three  tools.

My  next  step  was  to, the  feature  selection  or  extraction.

The  preliminary  feature  selection, I  performed  it  starting  industrial  study

and  also  getting  the  expertise  judgment to  eliminate  the  features

that  it  was  really unrelated  to  our  target.

Then  I  started  correlation  analysis to  examine  the  correlation

and  find  the  multicollinearity between  my  features

to  eliminate  more  features.

Then  descriptive  analysis and  visualization  to  check  all

of  the  relationship  between  my  data as  I  have  done  in  JMP,  Python,

and  Tableau.

The  next  step  for  me  was  to  start figuring  out  or  exploring  the  models,

the  algorithms  based on  the  characteristic  of  my  data.

As  my  data  was, my  target  is  an  integral  target,

so  I  started  to  look  into  the  linear regression  stepwise,  random  forest,

decision  tree, and  some  other  more  algorithm.

Eventually,  I  trained  and  evaluate different  models  in  both  Python  and  JMP.

I  compared  the  performance  metrics and  did  the  cross  validation

and  k-fold  validation  for  all of  my  models.

If  we  take  a  look  at  here  in  the  middle,

we  can  see  the  models that  I  trained  my  data  on  it.

Some  of  them  are  in  Python, some  of  them  are  in  JMP.

To  compare  this  model to  see  which  one  is  the  best,

I  had  to  compare  the  models based  on  the  ASE,

and  also  to  check  which  model is  performing  better

and  also  check  the  R-squared to  see  how  close  our  forecast,

our  prediction  going to  be  to  the  actual  data.

This  is  the  residual  plot for  all  of  the  models.

If  you  go  back  here,

you  can  see  out  of  all of  the  models  that  I  had  run,

I  have  two  models  that  are  the  best,

a  decision  tree  in  Python and  boosted  tree  in  JMP.

If  we  take  a  look  at  both  of  them, we  can  see  most  of  the  features

that  both  selected  are  the  same,  even the  importance  are  close  to  each  other.

Looking  at  the  ASE,

we  can  see  the  ASE for  decision  tree  is  about  4,000,

but  for  the  boosted  tree  is about  2,000,  which  is  much  more  better.

Also,  R-squared  for  the  boosted tree  is  a  little  bit  higher.

The  residual,  if  we  look  at  the  residuals

from  both  softwares, we  can  see  the  residual  is  really  good,

close  to  the  actual  values,  or  from  Python is  close  to  is  around  zero  values.

Both  model  are  really  good and  they're  predicting  good.

It  means  that  we  can  predict  the  weight

of  the  building  based of  this  historical  data  that  we  have.

Let's  see  how  it's  going to  impact  the  business  here.

Okay,  we  have  a  good  model. This  model  is  performing  well.

Let's  see  if  the  business want  to  implement  this  model,

what  are  the  benefits  for  them?

The  first  chart  we  can  see  was  the  chart

that  I  already  presented  to  you as  their  current  system.

We  know  they  are  each  server, they  have  three  servers

and  each  server  is processing  480  RFQ  per  day

that  totally  it  costs $360  per  day  for  each  server.

Consider  that  we  want  to  process the  same  amount  of  the  RFQs,  480.

But  at  this  time, with  implementing  the  model,

we  can  lower  down the  process  time  to  one  second

because  the  model  going  to  just  instantly running  and  giving  the  price  to  us.

Considering  the  cost  going  to  be  per  RFQ going  to  be  the  same,

cost  per  second  going  to  be  the  same, just  processing  time  will  be  one  second

out  of  180  seconds.

The  total  cost  will  lower  down to   $2  per  day  per  server.

Then  they  have  three  servers, it's  going  to  be  $6.

If  we  consider, it's  going  to  be  365  days  per  year

that  all  three  servers  are  running, it's  going  to  be  about  $2,000.

Comparing  with  current  cost,

they  are  paying  for  the  server and  processing  time.

It's  almost  99%  cost  saving for  the  business  and  the  company.

But  surprisingly,  this  is  not the  only  cost  they  can  save.

The  other  cost  they  can  save  is or  they  can  bring  more  opportunities.

They  are  missing  some  opportunities.

Right  now,  they  are  processing 480  RFQs  per  day,  one  server.

But  if  we  lower  down the  processing  time  to  one  second,

they  can  process  80,000  RFQs.

Considering  the  win  rate for  them  will  be  4%.

Right  now,  the  total  number they  can  win  per  day  is  going  to  be  19.

But  with  implementing  the  model because  they  are  processing  more  RFQ,

so  they  can  win  more and  they  can  bring  more  opportunities.

Actually,  basically  they  are  missing about  3,000  opportunities  per  day.

Probably  this  is  going  to  be  more than  their  capacity,

but  this  is  one  of  the  points

that  they  can  think about  their  business  development.

This  is  my  project  about  they  could  have predict  a  good  model  and  eventually

we  can  see  they're  going  to  be a  huge  business  impact  for  the  company.

I hope  you  enjoy  it and  thank  you  so  much  to  Texas  A&M

and  Dr.  Jones  as  my  capstone  coach.