Synergy Between Design of Experiments and Machine Learning for Enhanced Domain Expertise
The aim of the study was to explore a simplex mixture of three raw materials using a design of experiments (DOE) and to characterize it in terms of price, viscosity, and stability at different temperatures. Because an a priori model was difficult to postulate, and because part of the experimental domain might produce unstable formulations (which yield no measurable response and could therefore compromise both the DOE and the subsequent analysis of the results), a space-filling design with an excluded zone was used. An initial modelling with machine learning models (support vector machines (SVM) and Gaussian processes) was carried out, but some regions of the experimental space were poorly described because viscosity values were missing (e.g., viscosity too low to measure, or unstable formulations). Using domain expertise together with a local data imputation method based on k-nearest neighbors, the models were corrected and gave satisfactory results, providing a better representation and understanding of the experimental space and enabling the identification of a promising formulation candidate.
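The abstract does not detail the algorithms used, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions: a greedy maximin criterion stands in for the space-filling design, a hypothetical excluded zone (more than 60 % of component 3) stands in for the unstable region, and missing responses are filled by an unweighted mean of the k nearest observed runs in composition space. The names `simplex_grid`, `maximin_design`, and `knn_impute` are hypothetical helpers, and the way the authors combined domain expertise with the imputation is not encoded here.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# A mixture is a composition (x1, x2, x3) with x1 + x2 + x3 = 1 and xi >= 0.
Point = Tuple[float, float, float]


def simplex_grid(step: float = 0.05) -> List[Point]:
    """Enumerate candidate mixtures on a regular lattice covering the 3-component simplex."""
    n = round(1 / step)
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(n + 1) for j in range(n + 1 - i)]


def maximin_design(candidates: Sequence[Point], n_runs: int,
                   excluded: Callable[[Point], bool]) -> List[Point]:
    """Greedy maximin space-filling design that never selects points in the excluded zone."""
    allowed = [p for p in candidates if not excluded(p)]
    if n_runs > len(allowed):
        raise ValueError("not enough allowed candidates")
    design = [allowed[0]]  # arbitrary but deterministic starting point
    while len(design) < n_runs:
        # Pick the allowed candidate whose nearest design point is farthest away.
        best = max(allowed, key=lambda p: min(math.dist(p, q) for q in design))
        design.append(best)
    return design


def knn_impute(points: Sequence[Point], values: Sequence[Optional[float]],
               k: int = 3) -> List[float]:
    """Replace each missing response (None) by the mean of its k nearest observed runs."""
    observed = [(p, v) for p, v in zip(points, values) if v is not None]
    if not observed:
        raise ValueError("no observed responses to impute from")
    filled = []
    for p, v in zip(points, values):
        if v is not None:
            filled.append(v)
            continue
        # Neighbours are ranked by Euclidean distance in composition space.
        nearest = sorted(observed, key=lambda pv: math.dist(p, pv[0]))[:k]
        filled.append(sum(val for _, val in nearest) / len(nearest))
    return filled


# Hypothetical excluded zone: formulations with more than 60 % of component 3
# are assumed unstable here, purely for illustration.
design = maximin_design(simplex_grid(0.05), n_runs=10,
                        excluded=lambda p: p[2] > 0.6)
```

A real study would typically use a finer candidate set or a dedicated design optimizer. For tabular data, scikit-learn's `KNNImputer` (in `sklearn.impute`) offers a comparable nearest-neighbour imputation.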