Abstract
Tool wear detection in mechanical machining is essential for ensuring product quality and improving production efficiency. However, the field faces challenges such as scarce annotated data and interference from complex working conditions, which make it difficult to deploy advanced detection models. To address the fundamental mismatch between model capacity and data availability, this paper proposes a data-efficient hybrid detection architecture named MD-YOLOV12. The architecture integrates the rich general-purpose visual representations learned by the self-supervised vision transformer DINOv3 into the YOLOv12 object detection framework. Specifically, we perform feature enhancement at two key locations, input preprocessing and the middle layer of the backbone network, thereby strengthening the model's ability to perceive and recognize tool wear features without relying on large amounts of annotated data. To validate the method, we constructed a dedicated tool wear detection dataset of 8083 high-resolution images, carefully annotated into three categories: "No Wear," "Moderate Wear," and "Severe Wear." Extensive experiments show that the proposed MD-YOLOV12 surpasses existing state-of-the-art methods on the tool wear detection task, providing a viable technical pathway for data-efficient industrial vision applications.
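As an informal illustration of the mid-backbone feature enhancement described above, the sketch below shows one plausible way frozen DINOv3 patch tokens could be projected and fused into a YOLO-style convolutional feature map. The module name, channel widths, and the residual 1x1-projection design are assumptions for exposition, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DinoFeatureFusion(nn.Module):
    """Hypothetical fusion block: reshapes frozen ViT patch tokens into a
    feature map, projects them to the channel width of a YOLO backbone
    stage, and adds them residually. Dimensions are illustrative only."""

    def __init__(self, vit_dim: int = 768, cnn_channels: int = 256):
        super().__init__()
        self.proj = nn.Conv2d(vit_dim, cnn_channels, kernel_size=1)
        self.norm = nn.BatchNorm2d(cnn_channels)

    def forward(self, cnn_feat: torch.Tensor, vit_tokens: torch.Tensor) -> torch.Tensor:
        # cnn_feat:   (B, C, H, W) mid-backbone feature map from the detector
        # vit_tokens: (B, N, D) patch tokens from a frozen DINOv3-style encoder
        b, n, d = vit_tokens.shape
        s = int(n ** 0.5)  # assume a square patch grid (N = s * s)
        vit_map = vit_tokens.transpose(1, 2).reshape(b, d, s, s)
        # Resample the token map to the detector's spatial resolution.
        vit_map = F.interpolate(vit_map, size=cnn_feat.shape[-2:],
                                mode="bilinear", align_corners=False)
        return cnn_feat + self.norm(self.proj(vit_map))


if __name__ == "__main__":
    fusion = DinoFeatureFusion(vit_dim=768, cnn_channels=256)
    cnn_feat = torch.randn(2, 256, 40, 40)   # dummy mid-backbone feature map
    vit_tokens = torch.randn(2, 196, 768)    # dummy 14x14 grid of patch tokens
    print(fusion(cnn_feat, vit_tokens).shape)  # torch.Size([2, 256, 40, 40])
```

This kind of additive fusion keeps the detector's original pathway intact, so the pretrained self-supervised features act as an auxiliary signal rather than replacing the learned convolutional features.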