Abstract:The performance of visual anomaly inspection in industrial quality control is often constrained by the scarcity of real anomalous samples. Consequently, anomaly synthesis techniques have been developed to enlarge training sets and enhance downstream inspection. However, existing methods either suffer from poor integration caused by inpainting or fail to provide accurate masks. To address these limitations, we propose GroundingAnomaly, a novel few-shot anomaly image generation framework. Our framework introduces a Spatial Conditioning Module that leverages per-pixel semantic maps to enable precise spatial control over the synthesized anomalies. Furthermore, a Gated Self-Attention Module is designed to inject conditioning tokens into a frozen U-Net via gated attention layers. This carefully preserves pretrained priors while ensuring stable few-shot adaptation. Extensive evaluations on the MVTec AD and VisA datasets demonstrate that GroundingAnomaly generates high-quality anomalies and achieves state-of-the-art performance across multiple downstream tasks, including anomaly detection, segmentation, and instance-level detection.




Abstract:Large language models (LLMs) primarily trained on English texts, often face biases and inaccuracies in Chinese contexts. Their limitations are pronounced in fields like Traditional Chinese Medicine (TCM), where cultural and clinical subtleties are vital, further hindered by a lack of domain-specific data, such as rheumatoid arthritis (RA). To address these issues, this paper introduces Hengqin-RA-v1, the first large language model specifically tailored for TCM with a focus on diagnosing and treating RA. We also present HQ-GCM-RA-C1, a comprehensive RA-specific dataset curated from ancient Chinese medical literature, classical texts, and modern clinical studies. This dataset empowers Hengqin-RA-v1 to deliver accurate and culturally informed responses, effectively bridging the gaps left by general-purpose models. Extensive experiments demonstrate that Hengqin-RA-v1 outperforms state-of-the-art models, even surpassing the diagnostic accuracy of TCM practitioners in certain cases.




Abstract:In recently years, a significant amount of research has been conducted on applying deep learning methods for glaucoma classification and detection. However, the explainability of those established machine learning models remains a big concern. In this research, in contrast, we learn from cognitive science concept and study how ophthalmologists judge glaucoma detection. Simulating experts' efforts, we propose a hierarchical decision making system, centered around a holistic set of carefully designed biomarker-oriented machine learning models. While biomarkers represent the key indicators of how ophthalmologists identify glaucoma, they usually exhibit latent inter-relations. We thus construct a time series model, named TRI-LSTM, capable of calculating and uncovering potential and latent relationships among various biomarkers of glaucoma. Our model is among the first efforts to explore the intrinsic connections among glaucoma biomarkers. We monitor temporal relationships in patients' disease states over time and to capture and retain the progression of disease-relevant clinical information from prior visits, thereby enriching biomarker's potential relationships. Extensive experiments over real-world dataset have demonstrated the effectiveness of the proposed model.