Data-Driven Optimization of Semiconductor Processes and Forecasting
This project focuses on enhancing semiconductor manufacturing efficiency by applying data mining techniques across four distinct datasets. Each dataset corresponds to a unique aspect of semiconductor manufacturing and analysis, including performance benchmarking, manufacturing analysis, wafer fault detection, and economic forecasting related to semiconductor shortages.
The project employs the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology combined with Agile-Scrum practices to ensure a structured yet flexible approach. The main objective is to identify key factors influencing outcomes such as oxide thickness, yield, and defect rates, ultimately improving production processes and strategic decision-making.
This project uses Python 3.12.4 and is managed with a virtual environment to ensure that all dependencies are correctly isolated.
To create and activate the virtual environment, run the following commands in your terminal:
python3.12 -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
After activating the virtual environment, install the required packages using pip:
pip install -r requirements.txt
The requirements.txt file includes the following dependencies with specific versions:
absl-py==2.1.0
anyio==4.4.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
astunparse==1.6.3
async-lru==2.0.4
attrs==24.2.0
Babel==2.15.0
beautifulsoup4==4.12.3
bleach==6.1.0
certifi==2024.7.4
cffi==1.17.0
charset-normalizer==3.3.2
cmdstanpy==1.2.4
colorama==0.4.6
comm==0.2.2
contourpy==1.2.1
cycler==0.12.1
Cython==3.0.11
debugpy==1.8.5
decorator==5.1.1
defusedxml==0.7.1
executing==2.0.1
fastjsonschema==2.20.0
flatbuffers==24.3.25
fonttools==4.53.1
fqdn==1.5.1
gast==0.6.0
google-pasta==0.2.0
grpcio==1.65.5
h11==0.14.0
h5py==3.11.0
holidays==0.54
httpcore==1.0.5
httpx==0.27.0
idna==3.7
imbalanced-learn==0.12.3
importlib_resources==6.4.3
ipykernel==6.29.5
ipython==8.26.0
ipywidgets==8.1.3
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.4
joblib==1.4.2
json5==0.9.25
jsonpointer==3.0.0
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter_client==8.6.2
jupyter-console==6.6.3
jupyter_core==5.7.2
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter_server==2.14.2
jupyter_server_terminals==0.5.3
jupyterlab==4.2.4
jupyterlab_pygments==0.3.0
jupyterlab_server==2.27.3
jupyterlab_widgets==3.0.11
keras==3.5.0
kiwisolver==1.4.5
libclang==18.1.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
matplotlib-inline==0.1.7
mdurl==0.1.2
meson==1.5.1
mistune==3.0.2
ml-dtypes==0.4.0
namex==0.0.8
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
ninja==1.11.1.1
notebook==7.2.1
notebook_shim==0.2.4
numpy==1.26.4
opt-einsum==3.3.0
optree==0.12.1
overrides==7.7.0
packaging==24.1
pandas==2.2.2
pandocfilters==1.5.1
parso==0.8.4
patsy==0.5.6
pillow==10.4.0
pip==24.2
platformdirs==4.2.2
pmdarima==2.0.4
prometheus_client==0.20.0
prompt_toolkit==3.0.47
prophet==1.1.5
protobuf==4.25.4
psutil==6.0.0
pure_eval==0.2.3
pycparser==2.22
Pygments==2.18.0
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-json-logger==2.0.7
pytz==2024.1
pywin32==306
pywinpty==2.0.13
PyYAML==6.0.2
pyzmq==26.1.0
qtconsole==5.5.2
QtPy==2.4.1
referencing==0.35.1
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.1
rpds-py==0.20.0
scikit-learn==1.5.1
scipy==1.14.0
seaborn==0.13.2
Send2Trash==1.8.3
setuptools==72.1.0
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
stack-data==0.6.3
stanio==0.5.1
statsmodels==0.14.0
tensorboard==2.17.1
tensorboard-data-server==0.7.2
tensorflow-intel==2.17.0
termcolor==2.4.0
terminado==0.18.1
threadpoolctl==3.5.0
tinycss2==1.3.0
tornado==6.4.1
tqdm==4.66.5
traitlets==5.14.3
types-python-dateutil==2.9.0.20240316
typing_extensions==4.12.2
tzdata==2024.1
uri-template==1.3.0
urllib3==2.2.2
wcwidth==0.2.13
webcolors==24.6.0
webencodings==0.5.1
websocket-client==1.8.0
Werkzeug==3.0.3
wheel==0.43.0
widgetsnbextension==4.0.11
wrapt==1.16.0
xgboost==2.1.1
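Note: These pinned versions record the working environment. If a newer release of any package causes compatibility issues, fall back to the versions listed here. The list includes Windows-specific packages (pywin32, pywinpty, tensorflow-intel), so on Linux or macOS those entries will likely need to be removed or replaced before installation.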
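To see how an existing environment differs from the pinned list, a small check along the following lines may help. It is a minimal sketch, not part of the project: the helper names parse_pins and find_mismatches are hypothetical, it assumes requirements.txt sits in the current directory, and it only understands plain name==version lines (no environment markers or extras).

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text):
    """Return {package: version} for every `name==version` line in the text."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Compare pinned versions with what the active environment reports."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    req_file = Path("requirements.txt")  # assumed location
    if req_file.exists():
        pins = parse_pins(req_file.read_text())
        for name, (wanted, found) in sorted(find_mismatches(pins).items()):
            print(f"{name}: pinned {wanted}, installed {found or 'missing'}")
```

Run inside the activated virtual environment, this prints one line per package whose installed version differs from the pin, and nothing when the environment matches.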
The project adheres to the CRISP-DM framework, which is particularly suitable for handling complex datasets from various sources. The six stages of CRISP-DM—Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment—are iteratively followed to ensure the project addresses the critical challenges in semiconductor manufacturing.
The project produced predictive models and insights that can improve semiconductor manufacturing efficiency, forecast economic trends, and optimize production processes. Key findings include validation of Moore's Law in the performance benchmarking data, identification of the features that most affect yield, and identification of the sensors most important for wafer fault detection.
Further exploration could involve refining the models by incorporating more external factors, addressing class imbalance more effectively, and extending the economic forecasting models to include additional economic indicators.
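As one possible starting point for the class imbalance item, rare classes can be given larger weights during training. The sketch below computes inverse-frequency weights, n_samples / (n_classes * class_count), which is the same heuristic scikit-learn applies for class_weight="balanced". It is an illustration under assumed labels (0 = pass, 1 = fail), not the approach used in this project, and the function name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 0 = pass, 1 = fail, with failures deliberately rare.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

Weights like these can be passed to estimators that accept per-class or per-sample weights. Resampling methods are an alternative that changes the training data instead of the loss.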
The following datasets were used in this project, and their respective authors are credited below:
ChipPerformance.csv
FeatureSelection.csv
WaferFaultRates.csv
SemiconductorShortage.csv