Abstract
Unsupervised defect detection is crucial for industrial inspection, but teacher–student (T–S) frameworks tend to overfit a single teacher’s feature manifold, which leads to poor generalization on subtle anomalies. We introduce TAD++, a dual-path distillation framework that combines heterogeneous Teacher–Assistant–Student (T–A–S) guidance with a pseudo-defect inverse-distillation branch. A compact assistant, structurally distinct from the teacher, co-distills the student alongside the teacher, thereby mitigating single-teacher bias. In parallel, the inverse-distillation path tasks the student with reconstructing normal appearances from defect-injected inputs; this objective acts as a regularizer against anomaly leakage. A dynamic attention weighting module adaptively fuses the two guidance signals. Crucially, the assistant, the inverse branch, and the weighting module are used only during training. As a result, although TAD++ relies on multi-phase optimization, it adds no inference latency or memory overhead relative to standard T–S baselines. On MVTec AD, BTAD, and VisA, TAD++ achieves consistent improvements in both image-level detection and pixel-level localization, and extensive ablations confirm the contribution of the heterogeneous dual-path design.