Data Mining for asymmetric data sets under the curse of dimensionality
Finding the most influential yield predictors in a big, noisy data set from a semiconductor fab
Introduction:
Semiconductor manufacturing is one of the most technologically complex manufacturing processes. Because of the large number of process steps and sensors, the industry faces a torrent of data. In addition to the sheer volume of production data, the imbalance between passing and failing parts makes the data set difficult to analyze.
With so much data, the standard technique of analyzing one variable at a time can fail, because many manufacturing variables exert an influence at the same time.
Data by itself isn’t useful. To be useful, it must be converted into actionable information that drives yield and product-quality improvement.
This is where machine learning (ML) comes in.
To avoid model overfitting, the dimensionality of the samples must also be reduced. In other words, to increase the signal-to-noise ratio of the available data, we need to reduce the number of features before applying any ML model. Once the interesting patterns have been extracted from the database, they are validated against the engineers' experience.
The entire process was implemented using the JSL (JMP Scripting Language) features of JMP 13.
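To make the idea of reducing the number of features before modeling more concrete, here is a minimal sketch in Python (not JSL, and not the implementation described in this document). It assumes a simple univariate filter: each feature is ranked by the standardized difference between its mean on failing parts and its mean on passing parts, and only the top-ranked features are kept. The data are synthetic, and every name in it (`make_synthetic_lot`, `separation_score`, `screen_features`, the 5% fail rate, the feature indices 3 and 17) is a hypothetical chosen for illustration.

```python
import math
import random
import statistics


def make_synthetic_lot(n_parts=1000, n_fail=50, n_features=50,
                       informative=(3, 17), shift=3.0, seed=0):
    """Build an imbalanced synthetic data set (assumption: 5% failing parts).

    Every feature is standard Gaussian noise, except that the features listed
    in `informative` are shifted by `shift` on failing parts.
    """
    rng = random.Random(seed)
    labels = [1] * n_fail + [0] * (n_parts - n_fail)  # 1 = fail, 0 = pass
    rng.shuffle(labels)
    rows = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        if y == 1:
            for j in informative:
                row[j] += shift
        rows.append(row)
    return rows, labels


def separation_score(values, labels):
    """Absolute difference of class means divided by the pooled std deviation."""
    passed = [v for v, y in zip(values, labels) if y == 0]
    failed = [v for v, y in zip(values, labels) if y == 1]
    pooled = math.sqrt(
        (statistics.variance(passed) + statistics.variance(failed)) / 2.0
    )
    if pooled == 0.0:
        return 0.0  # a constant feature carries no information
    return abs(statistics.mean(failed) - statistics.mean(passed)) / pooled


def screen_features(rows, labels, keep):
    """Score every feature column and return the indices of the `keep` best."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    scores = [separation_score(col, labels) for col in columns]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


rows, labels = make_synthetic_lot()
selected, scores = screen_features(rows, labels, keep=2)
print("features kept for modeling:", sorted(selected))
```

Because each class mean is computed separately, the score is not dominated by the much larger passing population. A filter of this kind looks at one feature at a time, so it is only a first screening step; interactions between variables still have to be captured by the ML model fitted on the reduced feature set.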