Coding with Continuous and Mixture Variables to Explore More of the Input Space (2022-US-45MP-1103)
Honorable Mention

 

Micol Federica Tresoldi, Senior Research Statistician, Dow Chemical
Xinjie Tong, Senior Research Statistician, Dow Chemical

 

This case study investigates chemical mixtures to achieve optimal properties using design of experiment (DOE) data. The formulation space consists of four input variables: Chemical A Type, Chemical B Type, Chemical C Type, and Chemical D Content. The first three variables represent different compositions for making Chemical A, B and C, respectively, and as such, can be coded both as categorical factors, as well as continuous mixture variables.

 

We created the DOE treating them as categorical due to the experimental constraints. However, at the data analysis stage, even after considering thousands of simulated hypothetical formulations, none of them was predicted to meet the desired properties. At that point, to be able to identify promising subregions, we needed to overcome the discreteness of the space. So, we recoded those categorical factors as continuous and mixture variables, derived the equivalent regression model, and reran the simulations. Indeed, under certain assumptions, this coding strategy enables one to interpolate and consider compositions not present in the original DOE.

 

In this presentation, we demonstrate how to use the JMP Pro 16 Profiler simulation feature together with Graph Builder to achieve an extensive and insightful exploration of the formulation space, an approach applicable to diverse fields.

 

 

Hello  everyone.

My  name  is  Micol Tresoldi.

Today  my  talk  will  be  about  coding with  continuous  and  mixture  variables

to  explore  more  of  the  input  space.

Before  I  jump  into  the  topic  though,

I'd like to give you a brief outline of what my presentation will look like.

I'll  start  by  sharing  and  presenting to  you  a  little  bit  of  a  general  idea

of  what  the  object  of  the  project  was and  the  objective  that  was  driving  it,

and  then  I'll  pass  on to  present  the  initial  approach

that  we  took  initially to  pursue  this  objective.

I'll  then  show  you  though,

that  following  this  initial  approach, we  do  encounter  some  problem.

At  that  point  at  the  problem  stage,

we'll  need  to  go  back to  the  beginning  of  the  problem  setting,

and  try  to  look  at  that from  a  slightly  different  perspective,

in  a  way  that  we  can figure  out  an  alternative  way

of  looking  at  our  input  variables.

In  doing  this, we'll  be  altering  our  data  structure.

But I'll show you how we can actually build an equivalent statistical model

in  a  way  that  we  will  not  be  in  need of  going  and  collect  any  additional  data,

but actually we'll be able to re-analyze the exact same data,

and  still  be  able  to  hopefully overcome  our  initial  problem

and  find  some  useful  directions  to  go.

This  is  the  overview  of  the  presentation.

Let  me  start  by  giving  you the  general  idea  of  the  project.

When the clients first reached out to us, they had something in mind

in  terms  of  having  some  ingredients they  needed  to  mix  together,

in  a  way  that  the  final  formulation exhibited  some  optimal  properties.

More  specifically,

any  formulation  was  going  to  be  judged upon  two  properties,

and  each  of  these  properties  had  to  meet some  certain  optimality  criteria.

As I just stated, the problem itself is pretty general in its nature.

We'll  have  some  ingredients, using  the  common  analogy,

we  can  think  about  ourselves in  the  kitchen  having  some  ingredients

and having to figure out a way to mix them

in a way that, at the end, our cake will look nice

and  also  taste  good.

This  is  the  general  framework.

Now,  let  me  give  you  some  more  details about  this  specific  cake.

The  recipe  calls for  four  ingredients,

Factor  A,  Factor  B, Factor  C,  and  Factor  D.

For  Factor  A,  B,  and  C,

actually  the  amount  to  put  in  the  recipe is  being  predetermined.

We  don't  have  freedom  there.

On  the  other  hand, what  we  need  to  decide  though

is  how  we're  going  to  make those  ingredients,  if  you  like.

There  are  multiple  ways of  making  those  ingredients

because  we  have  multiple  raw  materials that  we  can  employ

to  arrive  to  those  ingredients.

Then  only  after  having  these  ingredients ready  for  using,

we  can  actually  employ  them in  the  final  recipe.

This  is  for  Factor  A,  B, and C.

For  Factor  D  on  the  other  hand, there  is  only  one  raw  material  we  can  use.

Only  one  way  of  making  it.

What  we  need  to  decide  is  how  much

of Factor D we're going to put in the final recipe.

Just  to  recap, in  terms  of  decision -making  problem,

we'll  need  to  decide  four  things,

how  we  make  Factor  A, how  we  make  Factor  B,

how  we  make  Factor  C,

and  how  much  of  Factor  D we're  going  to  put  in  the  recipe.

Okay,  now  I'll  need  to  be a  little  more  specific

in  giving  you  some  more  details

about  how,  what  were  these  ways of  making  Factor  A,  B,  and  C.

The  client,  when  they  came  to  us,

they  had  relatively  few  options  in  mind for  this.

For  Factor  A,  they  wanted to  consider  two  raw  materials.

either  only  using  raw  material  A1, or  only  using  raw  material A2.

For  factor  B,  once  again, only  two  raw  materials,  B 1  and  B 2.

The  possible  ways  of  making  factor  B,

it was either the two pure blends of B1 and B2,

or  a  50-50  blend  of  B 1  and  B2.

For Factor C, three raw materials are available for making it.

Again, either the three pure blends, C1, C2, C3,

or, as a fourth option, a 50-50 blend of C1 and C2.

With  respect  to  the  Factor  D  quantity,

which I'm going to denote from now on by X1,

they  wanted  to  test  four  possible  levels.

Four  possible  amounts, five,  10,  15,  and  20.
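To make the size of this discrete design space concrete, here is a minimal Python sketch (not part of the talk, which uses JMP) that enumerates every combination of the levels just described. The level labels, such as "B1/B2 50-50", are illustrative names and not the ones used in the actual data table.

```python
from itertools import product

# Levels described in the talk; the string labels are hypothetical names.
factor_a = ["A1", "A2"]
factor_b = ["B1", "B2", "B1/B2 50-50"]
factor_c = ["C1", "C2", "C3", "C1/C2 50-50"]
x1_levels = [5, 10, 15, 20]  # tested amounts of Factor D

# Every combination of one level per input variable.
candidates = list(product(factor_a, factor_b, factor_c, x1_levels))
print(len(candidates))  # 2 * 3 * 4 * 4 combinations
```

With these levels there are only 96 distinct combinations of the tested settings, which is what makes the categorical view of the space so coarse.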

Regarding  the  response  variables, those  are  slightly  more  straightforward

in  the  sense  that we  only  have  two  of  them,

both are continuous variables.

Each of them, as I was mentioning in the beginning,

had to meet a certain optimality threshold, optimality criteria.

Y1 had to be above 17. Y2 had to be above 2.6.

Now  we  have  on  our  left,

our  input  variables  that  we  need  to  decide how  to  maneuver  and  vary

in  making  the  recipe.

On  the  right  side,

we  have  the  properties that  we're  interested  in.

What  we  decided  to  do  was

to  propose  our  clients to  do  designed  experiment

in  a  way  that  we  would  go  out and  make  some  of  these  recipes,

make  some  of  these  formulations and  be  able,  from  the  collected  data,

after  recording  the  properties for   [inaudible 00:06:32] ,

actual  observed  formulation,

to  understand and  infer  the  relationships  undergoing

that  were  linking the  input  variables.

How  we  were  making  our recipe  and  response  variables.

How  the  properties  actually were  executing  themselves

for  different combinations  of  the  inputs.

Ultimately, the objective of the project was, in fact,

to figure out whether there was an optimal recipe,

meaning a recipe whose properties

both  met  their  respective optimality  criteria.

Given  this  framework, given  this  setting,

now  it's  pretty  clear  that X 1  is  going  to  be  a  quantitative  variable.

But  how  about  Factor  A,  B,  and  C?

Given the fact that we can mix these raw materials,

are we going to treat them as categorical or are we going to treat them as numeric?

At  this  stage,  because  the  client  was particularly  interested  in

observing  the  performance  of  these specific  compositions  of  the  raw  materials

for  making  the  various Factors  A,  B,  and  C,

we  decided  to  accommodate  their  requests and  coded  them  as  categorical  variables

in  a  way  that  we  were  sure that  those  specific  compositions

were  going  to  show  up in  the  design  of  experiment.

Again, categorical variables means that, in this case,

each level of the categorical variable corresponds to a possible way

of  making  the  ingredient  or  factor.

We  end  up  with  three  categorical  variables

with  two,  three, and  four  levels  respectively.

Now,  turns  out  that  actually

this  categorical  coding  approach was  also  pretty  helpful  in  the  discussion

of  how  we  wanted to  specify  the  statistical  model

that,  in  principle,  was  supposed  to,

or  at  least  assumed to  be  comprehensive  enough

to  describe  and  capture the  relationship  undergoing

between  the  factors  and  the  responses, the  properties.

For the client, it was particularly easy,

having this categorical coding, to identify and specify

what kind of interaction terms they were expecting to see

to be relevant in explaining the relationships.

The  final  statistical  model

that we ended up specifying for the design of experiment

comprised main effects, two-way interactions, all of them,

quadratic and cubic terms for the continuous variable

with  the  addition  of  the  interaction of  the  quadratic  with  one  of  the  factors.

Now,  of  course,

we  also  had  some  constraint in  the  number  of  experiments  available.

Because  we  obviously  don't  have infinite  amount  of  resources,

so  we  put  a  constraint  of  51  runs,

and this is the DOE that JMP gave us,

able to estimate the statistical model we just specified

and also stay within the constraints

on our resources.

Now  with  this, the  only  thing  that  was  left  to  do

was go and make these 51 formulations.

Imagine that we're super quick, and everything is magic,

and we have already gone

and made all of our formulations and collected the data.

Now  we  are  in  good  shape  for

estimating  the  Gaussian  model that  we  specified.

These  are  the  results for  the  first property, Y 1.

We can see that there is a pretty good fit between predicted and actual values.

Also, if we look at the metrics reported in the model summary,

those  look  pretty  satisfactory.

The same is true if we look now at the second property, Y2,

again,  pretty  good  fit.

We  are  happy  with  our  models, and  we  think  we  did  a  good  job

in  capturing  the  relationship.

Now remember that what we really want to discover is whether, in fact,

there is any optimal recipe that can meet both criteria for our properties.

How are we going to do this?

How are we going to establish if such an optimal recipe exists or not?

Well,  in   JMP Pro 16, this  is  a  super  easy  task,

because we can simulate thousands of potential alternative recipes

by  using  the  Profiler  Feature  options.

For  each  of  these  hypothetical  recipes,

we  can  automatically  have in  the  same  table

the  predicted  mean  value for  the  two  properties,

so  that  it  comes  super natural and  super  easy

to  see  if  there  is  any  optimal  recipe.

Just  to  give  you  an  idea how  quick  that  is,

I  want  to  show  you  live, how  we  can  do  this.

This  is  my  DOE  categorical  table, where  I  have  my  Factor  A,  B,  and  C.

X1 is my only quantitative input variable.

I  have  my  recorded  values for  the  two  properties,   Y1  and   Y2.

Imagine now that we have already run and estimated the model

and  saved  the  prediction  formulas for  the  two  variables  here.

We  can  go  here and  highlight  these  two  columns.

Go to Graph, select Profiler,

and  then  put  those  two  prediction  formulas in  the  Y  prediction  formula  box

and  click  OK.

This  is  the  usual  way we  get  a  profiler  dialogue  box.

In fact, we can easily play around, changing the various

levels of the inputs in a way that we can actually see

how  this  impacts  our  predictions for  the  two  properties.

However,  what  I  want  to  show  you  today

is  how  we  can  actually  ask, going  to  the  red  triangle,

ask  JMP  to  output  a  random  table, and  we  can  make  it  as  big  as  we  like.

I'm  going  to  start  with  30,000  rows, just  to  start,

I'll  show  you,  see,

it really took no time for JMP to give us these 30,000 rows,

where each row corresponds to a hypothetical recipe

that  we  haven't  necessarily  seen in  the  DOE.

This  is  the  power of  having  this  feature  in  JMP,

that  we  can  explore  the  input  space in  literally  no  time.

Now,  if  we  are  interested  in  seeing

whether  there  is  one  recipe that  is  optimal,

then  we  can  go  here,  Graph  Builder,

and put the predicted values for Y1, the predicted values for Y2.

And then just to aid our visualization,

I'm going to put a vertical line

in correspondence of the optimal threshold for Y2,

and likewise a horizontal line marking the optimal threshold for Y1.

This  upper  quadrant denotes  the  optimal  region,

because  both  properties  are satisfying  the  optimality  criteria.

Unfortunately, we can see from here

that  we  don't  find  any  recipe that  is,  in  fact,

able  to  satisfy  both  the  criteria.
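The check being made visually in Graph Builder can also be written down explicitly. The following is a minimal Python sketch, not the JMP workflow itself: the thresholds are the ones stated in the talk, while the column names "Pred Y1" and "Pred Y2" and the three example rows are hypothetical values chosen only to illustrate the logic.

```python
# Optimality thresholds stated in the talk.
Y1_MIN = 17.0
Y2_MIN = 2.6

def in_optimal_region(y1_pred, y2_pred):
    """True only if BOTH predicted properties exceed their thresholds."""
    return y1_pred > Y1_MIN and y2_pred > Y2_MIN

def count_optimal(rows):
    """Count simulated recipes that fall in the optimal quadrant."""
    return sum(in_optimal_region(r["Pred Y1"], r["Pred Y2"]) for r in rows)

# Hypothetical predicted values, for illustration only (not the talk's data).
example_rows = [
    {"Pred Y1": 18.2, "Pred Y2": 2.1},  # Y1 passes, Y2 fails
    {"Pred Y1": 15.0, "Pred Y2": 2.9},  # Y2 passes, Y1 fails
    {"Pred Y1": 16.5, "Pred Y2": 2.5},  # both fail
]
print(count_optimal(example_rows))
```

A recipe counts as optimal only when it passes both thresholds at once, which is why a table can contain many rows that pass one criterion and still yield no optimal recipe.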

This  is  like, okay,  not  very  good  news.

Now  let  me  go  back to  my  presentation  very  quick.

We  can  see,  in  fact,

that we don't have any predicted properties lying in this quadrant

with  the  happy  green  smiley.

What  do  we  do  at  this  point?

Do  we  give  up? Of  course  not.

What  we  can  do  is,  in  fact, go  back  to  the  beginning  of  the  problem

and  try  to  see  if  we  can  change any  of  our  initial  choices

that  we  first made in  approaching  the  problem.

In  particular,  you  might  be  remembering that  we  were  undecided

whether  we  would  treat the  Factor  A,  B,  and  C

as  categorical  or  as  numeric.

So  far  we  have  treated them  as  categorical.

So  far,  factor  A  as  being a  categorical  variable,  with  two  levels,

either  only  using  A1  or  only  using  A2.

However, in fact, the client was open

to mixing the raw materials to make Factor A.

So  that  was  an  option.

Then  what  we  can  think  of

is substituting this Factor A with a variable

that now I call A1 Content, which is a quantitative variable,

which  represents how  much  of   A1  I'm  going  to  put

into  the  mixture  of   A1  and   A2 for  making  Factor  A.

The  translation,

the  conversion  between  categorical  levels and  numerical  values,

it's almost immediate.

If I'm only using A1, I'm going to use 100% of A1 in my mixture,

so I can code A1 Content to be equal to one.

On  the  opposite  side, if  I'm  only  using   A2,

this  means  that I  have  zero A1  Content  in  my  mixture,

and  therefore   A1  Content is  going  to  be  equal  to  zero.

You  might  have  guessed  that  implicitly,

we  are  also  defining   A2  Content to  be  equal  to   1 -   A1  Content.

But  we  don't  really  need  that

because  we  are  only  looking at  two  mixture  variables.

Why  are  we  doing  this?

Well, the advantage is clear.

With   Factor A,

we  were  constrained  in  looking  at  either A1  Content  to  be  equal  to  zero  or  one.

Now that we're considering continuous coding,

the   A1  Content  can  take any  value  between  zero  and  one.

This,  of  course,

represents  an  enormous  jump in  the  flexibility  of  our  model

and an infinite one, in the sense that now we are open

to literally infinitely more mixtures and infinitely more ways of making Factor A.

Likewise,  Factor  B  is categorical  with  three  levels.

So  far  it's  been  this  way,

coded  only  B1,  only   B2 or  50-50  blend.

But  following  the  similar  logic,

we  can  now  introduce  a B1  Content, continuous  variable.

Again, the conversion is going to be exactly the same.

A 50-50 blend of B1 and B2 will be converted to 0.5

because I'm using 50% of B1 and 50% of B2.

Again, B2 Content is 1 - B1 Content.

Again, the advantage is that we're not bound to jump from zero to 0.5,

or from zero to one necessarily,

but  we  can  explore the  whole  spectrum  of  values

from  zero  to  one.

Factor C is slightly more tricky,

because we do have three possible raw materials to mix up.

At this stage,

we need to introduce not just one, but actually three continuous variables

that, besides being continuous, also have the mixture constraint.

Meaning that, at all times, they need to sum to one.

But  the  conversion

between  the  levels  of  Factor  C and  the  three  new  mixture  variables

follows  exactly  the  same  logic.

That's  super  easy.

This  is  just  a  visualization  of  how we  do  the  conversion  of  the  levels.

This is how the DOE points that we already have the data on

(we don't need anything else)

sit within the continuous coding space.
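The conversion from categorical levels to continuous and mixture values is just a lookup. Here is a minimal Python sketch of it; the level labels and the `recode` helper are hypothetical names introduced for illustration, while the numeric values follow the conversion described in the talk.

```python
# Lookup tables from categorical level to continuous coding.
# Level labels are illustrative; the numeric values follow the talk.
A1_CONTENT = {"A1": 1.0, "A2": 0.0}
B1_CONTENT = {"B1": 1.0, "B2": 0.0, "B1/B2 50-50": 0.5}
C_MIXTURE = {  # (C1, C2, C3), which must always sum to one
    "C1": (1.0, 0.0, 0.0),
    "C2": (0.0, 1.0, 0.0),
    "C3": (0.0, 0.0, 1.0),
    "C1/C2 50-50": (0.5, 0.5, 0.0),
}

def recode(a_level, b_level, c_level):
    """Translate one DOE row from categorical to continuous coding."""
    c1, c2, c3 = C_MIXTURE[c_level]
    return {
        "A1 Content": A1_CONTENT[a_level],
        "B1 Content": B1_CONTENT[b_level],
        "C1": c1,
        "C2": c2,
        "C3": c3,
    }

print(recode("A2", "B1/B2 50-50", "C1/C2 50-50"))
```

Because the mapping is one-to-one on the levels that were actually run, the existing DOE rows can be recoded without collecting any new data.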

Now, the only more

involved step in passing from the categorical coding

to a continuous coding is how, in fact,

we convert the statistical model

that  we  use  to  design  the  experiment and  then  to  analyze  the  data.

How  are  we  going  to  do  this?

Well,  the  easier  way is  to  just  do  it  in  many  small  steps.

What we're going to do is start with our main effects model

and, little by little, add the different factors.

We  start  with   Factor A, which  had  only  two  levels.

Now  in  the  continuous  coding,  what  we're  going  to  put  is   A1  Content.

We're  only  going  to  put the  linear  term  of  this   A1  Content.

In fact, we only had one coefficient for Factor A in the categorical coding model.

Likewise, now we're going to have one single coefficient for A1 Content.

Now, if you don't believe me that this is an equivalent model,

I'm  going  to  show  you a  couple  of  examples.

Imagine that  we  want  to  figure  out

the  impact  of  using  only   A2 for  making   Factor A,

then  that  means that   A1  content  is  zero.  Fine.

From  the  categorical  coding  model,

we're  going  to  just look  at  the  intercept  term,

because  this  extra  term refers  to  when  we  use   A1.

On  this  other  side,

for  continuous  coding  model,

we're  going  to  put  the  intercept, of  course,

and  then  the   A1  Content  coefficient, but  now  we  will  multiply  it  by  zero

because   A1  Content  is  zero.

Not  even  doing  any  math,

you  can  really  see  that these  two  numbers  are  exactly  the  same.

Similarly, if we want to see what's the impact of using only A1,

now at this time, A1 Content is going to be equal to one.

Now for categorical,

I'm going to sum up the intercept term plus the Factor A coefficient

accounting for the difference in the levels of the factor.

On  this  other  side though,

we  are  going  to  always include  the  intercept.

At this point, we'll multiply the A1 Content coefficient by one

because  the  content  is  one.

Again, not  even  any  math,

the  two  numbers  here  are  the  same as  the  two  numbers  here.

Exactly  equivalent.

Now  with  Factor  B,  we  had  three  levels.

How  are  we  going  to  do  that?

Well,  because  it  has  three  levels, now  we  can't  just  add  the  linear  term,

but  we  also  need to  add  the  quadratic  term.

We  had  two  coefficients  before,  and we're  going  to  have  two  coefficients

also now  with  the  continuous  coding.

Again, if you don't believe me that this is an equivalent model,

we can work out at least one example, which works exactly as before.

If I only have B2, B1 Content is zero,

which means that the two coefficients are going to have zero weight

in computing the impact.

Therefore, the two numbers are, in fact, the same.

I'm  not  going  to  go  into  this  again,

only  B 1  is  equivalent to   B1 Content  equal  to one.

The most interesting is

this one, which at least requires you to do some summation,

where B1 Content is going to be 0.5, because we are considering the 50-50 blend.

You  can  verify  easily

that  these  two  numbers  here  summed  up

are  equivalent to  this  other  side  of  the  equation

where  we  put  0.5  and  0.5  squared,

because  now  our  B 1 Content is  equal  to  0.5.
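The reason a linear plus a quadratic term is enough for a three-level factor is that a quadratic passes exactly through any three points. The Python sketch below illustrates this with NumPy under stated assumptions: the three level predictions (10, 13, 12) are made-up numbers, and the categorical model is thought of with B2 as the baseline level; JMP's internal parameterization of nominal effects may differ, but the predictions at the three levels would be the same.

```python
import numpy as np

# Hypothetical predictions of the categorical model at the three levels
# (illustrative numbers only): only B2, 50-50 blend, only B1.
b1_content = np.array([0.0, 0.5, 1.0])
level_predictions = np.array([10.0, 13.0, 12.0])

# Continuous coding: intercept + linear + quadratic term in B1 Content.
design = np.column_stack([np.ones(3), b1_content, b1_content**2])
beta = np.linalg.solve(design, level_predictions)

def predict(b1):
    """Prediction of the continuous-coded model at any B1 Content."""
    return beta[0] + beta[1] * b1 + beta[2] * b1**2

# Same answers at the three original levels...
print([float(predict(v)) for v in b1_content])
# ...and now also a prediction in between, e.g. a 25/75 blend.
print(float(predict(0.25)))
```

The in-between prediction is only as trustworthy as the assumption that the response really is quadratic in B1 Content over the whole range from zero to one.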

Now  for  Factor  C,  we  had  four  levels.

We  particularly  remember, we  had  three  possible  raw  materials.

We  had  to  introduce three  mixture  variables.

Every time we have to deal with mixture variables, things

are slightly complicated

because they become perfectly collinear with any constant term.

In  putting  the  C 1,  C 2,  and  C 3,

the  sum  of  them  deletes  or requires  us  to  delete  the  constant  term.

But  other  than  that, everything  follows  pretty  much  the  same.

We  had  three  coefficients  here,

and we're still going to have three coefficients here

because  we  have  four, but  we  are  getting  rid  of  the  intercept.

So  still  the  same  balance.
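Both points made here, the collinearity with the constant term and the matching number of parameters, can be checked numerically. The following is a minimal NumPy sketch, assuming the four mixture-model terms are C1, C2, C3 and the C1*C2 blending term; the four level predictions are hypothetical numbers used only for illustration.

```python
import numpy as np

# Rows: only C1, only C2, only C3, 50-50 blend of C1 and C2.
mix = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
])
c1, c2, c3 = mix.T

# An intercept is redundant: C1 + C2 + C3 always equals one,
# so four columns carry only three independent directions.
with_intercept = np.column_stack([np.ones(4), c1, c2, c3])
rank_with_intercept = np.linalg.matrix_rank(with_intercept)

# Mixture model without intercept: C1, C2, C3 and the C1*C2 term
# (assumed here to be the fourth term of the talk's model).
design = np.column_stack([c1, c2, c3, c1 * c2])

# Hypothetical level predictions from the categorical model.
level_predictions = np.array([4.0, 6.0, 5.0, 7.0])
coef = np.linalg.solve(design, level_predictions)

print(rank_with_intercept)
print(coef)
```

Four levels give four numbers to reproduce, and the four mixture terms reproduce them exactly, which is the same balance as intercept plus three coefficients in the categorical coding.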

Again, I'm not going to go through all of the examples,

but you're more than welcome to look at the slides offline

and check that those, in fact, always give you the same answers.

These  are  all  the  examples.

Now  with  so  much  work,  we  have  found the  conversion  of  the  main  effects.

This is how we actually convert each separate factor

into the new continuous variables.

Now  our  original  model,  though, included  more  than  just  main  effects.

In  fact,  we  had  the  two -way  interactions.

Now  the  idea  here  is  that every  time  Factor A  appears,

I'm  going  to  substitute  it with   A1  Content.

Every  time  Factor  B  appears,

I'm  going  to  substitute  it

with the two terms, B1 Content and B1 Content squared.

Likewise  for  Factor  C,

I'm  going  to  substitute  it with  the  four  terms  that  I've  put  here.

The  same  holds when  I'm  interacting  with  X 1,

and  everything is  very  much  in  the  same  flavor,

logically  follows  the  same  scheme.

The  only  caution that  you  want  to  be  aware  of

and  be  particularly  attentive  about

is that every time you interact with the three mixture variables,

where  those  are your  three  mixture  variables,

those  main  effects  that  you  originally  had now  need  to  be  excluded  from  the  model,

otherwise,  the  model  won't  be  feasible.

That's  the  only  caution that  you  need  to  be  careful  about.
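The reason for this caution can be seen with a small rank check. In the NumPy sketch below, X1 is used as the example factor being interacted with the mixture variables, and the six rows are hypothetical design points chosen for illustration: since C1 + C2 + C3 = 1, the three interaction columns X1*C1, X1*C2 and X1*C3 always add up to X1 itself.

```python
import numpy as np

# Hypothetical design points: mixture settings (C1, C2, C3) and X1 amounts.
mix = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
x1 = np.array([5.0, 10.0, 15.0, 20.0, 20.0, 5.0])

# Interaction columns X1*C1, X1*C2, X1*C3.
interactions = mix * x1[:, None]

# Keeping the X1 main effect adds a column that is the sum of the other three.
with_main_effect = np.column_stack([x1, interactions])

rank_with_main = np.linalg.matrix_rank(with_main_effect)   # 4 columns
rank_without_main = np.linalg.matrix_rank(interactions)    # 3 columns
print(rank_with_main, rank_without_main)
```

With the main effect included, four columns have rank three, so the coefficients cannot be estimated uniquely; dropping the main effect leaves three independent columns and the same fitted predictions.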

Other  than  that,  we're  ready  to  go.

We've  got  our  equivalent  continuous  model.

Now what we can do is, in fact, again, verify that everything is still the same.

I  get  exactly  the  same  predictions,

either  using  the  categorical  coding or  going  and  using  the  continuous  coding.

Now you might ask me, why are you going into so much trouble

and going through so much mess if things are exactly the same?

Well,  the  advantage  is  immediate  to  see,

and  you  can  really  appreciate  it if  you  start  looking  at  the  profilers.

This is how the profiler looks when you use the categorical coding.

You  have  to  jump  between the  different  levels.

You  don't  have  the  faintest  idea what  can  happen  in  between.

With the  continuous  coding on the  other  hand,

that's  exactly  what  you  can  do.

You  can  explore  way  more of  the  different  possible  ways

of  making  the  various  ingredients Factor A,  B,  and  C

in  a  way  that  before it  was  just  out  of  bounds.

In technical terms, this means that we have way more power of interpolation.

This doesn't come free, of course.

The price you pay is, in fact,

that you are implicitly making some assumptions.

The assumptions regard the way that the various new continuous variables

that  we  have  introduced are  related  to  the  responses.

In  a  way,  we  are  implicitly  assuming  that

the relationship between A1 Content and our properties is linear,

the relationship with B1 Content is quadratic, and so forth.

If  you  think  that  those  assumptions don't  really  hold  in  your  case,

then  of  course,  the  whole  procedure is  questionable.

You  don't  want  to  pursue  this.

But  if  you  don't  have  any  reason why  you  wouldn't  believe  this,

or  at  least  why  you  wouldn't at  least  explore  this  possibility,

then,  now  we  can  go  back and  do  the  same  exercise

and  explore  again  the  input  space, but  with  way  more  flexibility.

Again, let's see if we can find an optimal recipe

with this new continuous mixture coding.

How are we going to do it? Well, in exactly the same way.

I'm going to  use  the  JMP  profiler  feature

and  use  the  simulation and  see  if  we  can  find  anything.

Now  let  me  go  here.

This  is  my  DOE  continuous  table.

Continuous,  because  now  you  can  see  that

these are all coded as continuous variables.

They have the blue triangle next to them.

C1, C2, C3 also have these stars,

indicating  that  they're  coded as  mixture  variables  in  JMP.

Now  imagine  that  again,

we  have  already  fitted  our  model with  the  fit  model  platform.

We  saved  our  prediction  formulas now  with  the  continuous  coding.

What we're going to do is the same thing:

Graph, Profiler, select those, and here we go.

Here  is  our  prediction  profiler.

Now  we  can  play  way  more  with  the  profiler and  see  all  different  combinations

without  having  to  jump between  different  options.

Now,  once  again, red  triangle,  output  random  table.

Just for making things fair, I'm going to ask for 30,000 rows.

Again, no time, literally the blink of an eye.

JMP gives you a 30,000-row table

where  now  every recipe is again, sorry.

Every row  is  again, a  potential  hypothetical  recipe

that  we  haven't  really  seen, necessarily  seen  in  our  DOE

but is still feasible,

because  it  still  respects  the  constraint that  we  had  at  the  beginning.
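For readers without JMP, the idea of a random table over the continuous space can be sketched in Python with NumPy. This is an illustration under assumptions, not a description of how JMP generates its table: it draws A1 Content and B1 Content uniformly on [0, 1], X1 uniformly on the tested range [5, 20], and (C1, C2, C3) uniformly on the simplex via a Dirichlet(1, 1, 1) distribution so that the mixture constraint holds. The function name `random_recipes` is hypothetical.

```python
import numpy as np

def random_recipes(n_rows, seed=0):
    """Draw hypothetical recipes that respect the mixture constraint.

    Assumed sampling scheme (not necessarily JMP's): uniform draws for
    the continuous factors, Dirichlet(1, 1, 1) for the mixture variables.
    """
    rng = np.random.default_rng(seed)
    c = rng.dirichlet([1.0, 1.0, 1.0], size=n_rows)  # rows sum to one
    return {
        "A1 Content": rng.uniform(0.0, 1.0, n_rows),
        "B1 Content": rng.uniform(0.0, 1.0, n_rows),
        "C1": c[:, 0],
        "C2": c[:, 1],
        "C3": c[:, 2],
        "X1": rng.uniform(5.0, 20.0, n_rows),
    }

table = random_recipes(30_000)
print(len(table["X1"]))
```

Each row could then be passed through the saved prediction formulas for Y1 and Y2 and compared against the two thresholds, exactly as with the categorical table.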

Once  again,

to  figure  out whether  something  good  is  happening,

or at least whether within these 30,000 formulations,

we  do  find  something  that  is  optimal.

I'm  going  to  construct  the  same  graph.

Now you can see that our points are all dispersed and are not aligned anymore.

Again, adding the axes just to aid our visualization.

This  is  the  nice  thing.

With  this  way  of  coding and  looking  at  more  of  the  input  space,

we do find a few formulations that seem to be promising.

Of course, we need to keep in mind that these are predicted values.

Everything  is  still  relying  on  our  data,   on  our  statistical  model  analysis,

but  is  still  more  promising  than  before.

We do find something in the optimal region defined by these two axes.

Quickly,  going  back  to  my  presentation,

I  want  to  draw  a  final  conclusion  here,

which  is,  in  fact,  that, using  the  categorical  coding,

we  couldn't  find  any  recipe that  at  least  on  the  predictive  side,

could,  in  fact, meet  both  the  optimality  criteria.

While, once we turn to

figuring  out  how  to  code these  different  categorical  variables

into  continuous  and  mixture  variables

and  exploit  the  JMP  power  of  giving  us thousands  and  thousands  of  formulations,

we  do  find  a  few that  in  fact  meet  the  specs.

We were happy that at least we could go back to our clients and say,

look,  instead  of giving  up  on  your  project,

try  to  make  these  formulations and  see  how,  in  fact,

whether  the  actual  properties do  meet  your  criteria  or  not,

but  at  least  it  gives  us  some  directions of  improvement  where  to  go.

With  this, I'd  like  to  end  my  presentation.

Thank  my  colleague,   Xinjie Tong

and  all  of  my  collaborators at  Dow  Chemical.

Thank  you,  all  of  you for  watching  my  presentation.

I'll  be  more  than  happy

to  answer  any  questions you  might  have  at  this  point.

Thank  you.