Link to Kaggle notebook for this project: https://www.kaggle.com/code/adrianacosta0/data-science-for-semiconductor-process-reliability

I set up this mini-tutorial to demonstrate simulating a CVD SiO2 growth process on a wafer, focusing on process reliability and variation across multiple chambers. The key steps include:

1) Process Simulation: Created a simplified model for deposition rate, incorporating various process parameters like pressure, temperature, and gas flow rates.

2) Data Generation: Simulated data for 10 chambers processing 10 wafers each, including realistic hardware/sensor variations and an intentional outlier chamber.

3) Visualization: Plotting deposition rates across wafers, revealing chamber-to-chamber variations.

4) Principal Component Analysis (PCA): Applied PCA to visualize chamber clustering based on process conditions, effectively identifying the outlier chamber.

5) Machine Learning for Variation Source: Implemented a Random Forest classifier to identify key input parameters explaining the variation between normal and outlier chambers.

6) Drift Analysis: Introduced an artificial drift in deposition rate and analyzed its impact using PCA and sensor data visualization.

7) Predictive Modeling: Used a Random Forest regressor to predict deposition rates and identify the most influential process parameters.