MUGA LAB Undergraduate Research – Experimental Pipeline

This directory contains the core experimental workflow for the MUGA LAB undergraduate research program on Calibration and Distillation of Modern Non-Image Neural Networks.

Each file represents a distinct and reproducible experimental stage, designed to align with corresponding sections of the undergraduate thesis (Methodology → Results → Discussion).

All experiments are fully tracked with MLflow 3.0 and run on CUDA, MPS (Apple Silicon), or CPU backends.
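Before any stage runs, it is typically enough to pick the fastest available backend and point MLflow at the local tracking store. The snippet below is only a sketch: the experiment name is a placeholder, and the tracking URI mirrors the path used in the execution guide further down.

```python
import mlflow
import torch

def select_device() -> torch.device:
    """Pick the fastest available backend: CUDA, then MPS (Apple Silicon), then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

# Placeholder tracking location and experiment name; adjust to your setup.
mlflow.set_tracking_uri("../../results/mlruns")
mlflow.set_experiment("muga_lab_undergrad")

print(f"Running on {select_device()}")
```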


📂 Directory Overview

| File | Stage | Description |
|------|-------|-------------|
| `01_baseline_tuning.py` | Stage 1 | Tunes a baseline MLP using Optuna or DEHB and logs the best configuration to MLflow. |
| `02_temperature_scaling.py` | Stage 2 | Applies post-hoc temperature scaling to the tuned model, optimizes the temperature T*, and evaluates calibration metrics (ECE, MCS, NLL). |
| `03_distillation_experiment.py` | Stage 3 | Performs knowledge distillation from the calibrated teacher to a smaller student model, logging teacher and student performance. |
| `04_cross_architecture_eval.py` | Stage 4 | Compares calibration and accuracy across multiple architectures (Baseline, Calibrated, Distilled) and generates LaTeX-ready tables. |
| `05_calibration_summary.py` | Stage 5 | Aggregates metrics from all experiments, computes mean ± std, and exports CSV + LaTeX summary tables for thesis reporting. |
| `__init__.py` | | Marks the directory as a Python package for structured imports. |

Workflow Summary

The MUGA LAB undergraduate pipeline follows a five-stage methodology:

  1. Model Optimization (Stage 1)
    Optimize an MLP model on tabular data using Optuna or DEHB.
    Output: best hyperparameters + model artifact logged to MLflow.

  2. Calibration (Stage 2)
    Calibrate the tuned model using temperature scaling (see the temperature-scaling sketch after this list).
    Output: optimal temperature T*, improved ECE and NLL metrics.

  3. Distillation (Stage 3)
    Distill the calibrated teacher's knowledge into a smaller student network (see the distillation-loss sketch after this list).
    Output: distilled student model + teacher/student metrics.

  4. Cross-Architecture Comparison (Stage 4)
    Evaluate multiple trained models across architectures or training regimes.
    Output: comparative LaTeX table of calibration and accuracy metrics.

  5. Summary and Reporting (Stage 5)
    Consolidate all experiments into a final quantitative report.
    Output: CSV + LaTeX tables for the Results & Discussion chapters.
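To make Stage 2 concrete: temperature scaling fits a single scalar T* on held-out logits by minimizing NLL while the network's weights stay frozen, so only the confidence of the predictions changes. The sketch below assumes PyTorch and hypothetical variable names; it is not the literal implementation in `02_temperature_scaling.py`.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fit the scalar T* minimizing NLL on validation logits (post-hoc calibration)."""
    # Optimize log T so the temperature stays positive; logits are detached from the model.
    log_t = torch.zeros(1, requires_grad=True)
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# Usage: calibrated probabilities = softmax(val_logits / T_star)
# T_star = fit_temperature(val_logits, val_labels)
```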
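Stage 3 usually combines a hard-label cross-entropy term with a temperature-softened KL term between teacher and student logits (Hinton-style distillation). The following is a minimal sketch of that standard loss; the exact weighting and temperature used in `03_distillation_experiment.py` may differ.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """alpha * softened KL(teacher || student) + (1 - alpha) * hard cross-entropy."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps the soft-loss gradients on the same scale as the hard loss.
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```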


Integration with Other MUGA LAB Modules

| Dependency | Description |
|------------|-------------|
| `mlp_tuner_tabular_mlflow.py` | Core MLP tuning logic and MLflow integration. |
| `calibration_metrics.py` | Implements Expected Calibration Error (ECE), Miscalibration Score (MCS), and NLL (a minimal ECE sketch follows below). |
| `reliability_diagram_utils.py` | Generates reliability diagrams and logs them to MLflow. |
| `../utils/seed_sensitivity_utils.py` | Provides multi-seed reproducibility utilities for robustness analysis. |
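For reference, Expected Calibration Error partitions predictions into confidence bins and averages the per-bin gap between accuracy and mean confidence, weighted by bin size. A minimal sketch assuming equal-width bins (`calibration_metrics.py` may use a different binning scheme or number of bins):

```python
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """ECE = sum_b (|B_b| / N) * |acc(B_b) - conf(B_b)| over equal-width confidence bins."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
    return float(ece)
```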

Execution Guide

Example Pipeline

```bash
# 1. Baseline tuning
python 01_baseline_tuning.py --train ../../data/train.csv --target label --search optuna

# 2. Calibration (replace <RUN_ID> with the run ID logged by the previous stage)
python 02_temperature_scaling.py --model_uri runs:/<RUN_ID>/model \
    --train ../../data/train.csv --target label

# 3. Distillation
python 03_distillation_experiment.py --teacher_uri runs:/<RUN_ID>/model \
    --train ../../data/train.csv --target label

# 4. Cross-architecture evaluation
python 04_cross_architecture_eval.py \
    --models runs:/<RUN_ID_1>/model runs:/<RUN_ID_2>/model runs:/<RUN_ID_3>/model \
    --labels Baseline Calibrated Distilled \
    --train ../../data/train.csv --target label

# 5. Summary aggregation
python 05_calibration_summary.py --mlflow_uri ../../results/mlruns --output_dir ../../reports/summary
```
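Stage 5 reads the logged metrics back out of the MLflow tracking store. Below is a minimal sketch of that kind of aggregation using `mlflow.search_runs`; the experiment name and metric keys are placeholders, not the exact names used by `05_calibration_summary.py`.

```python
import mlflow

# Point MLflow at the same tracking store used by the pipeline.
mlflow.set_tracking_uri("../../results/mlruns")

# Collect all runs of a (hypothetical) experiment into a pandas DataFrame.
runs = mlflow.search_runs(experiment_names=["muga_lab_undergrad"])

# Mean ± std of logged metrics across runs, ready for a LaTeX summary table.
summary = runs[["metrics.ece", "metrics.nll"]].agg(["mean", "std"])
print(summary.to_latex(float_format="%.4f"))
```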