Abstract: Image tokenizers are central to modern vision models, since these models often operate in latent spaces. An ideal latent space must be simultaneously compact and generation-friendly: it should capture an image's essential content compactly while remaining easy to model with generative approaches. In this work, we introduce a novel regularizer that aligns latent spaces with these two objectives. The key idea is to guide tokenizers to mimic the hidden-state dynamics of state-space models (SSMs), thereby transferring their critical property, frequency awareness, to latent features. Grounded in a theoretical analysis of SSMs, our regularizer enforces the encoding of fine spatial structures and frequency-domain cues into compact latent features, leading to more effective use of representation capacity and improved generative modelability. Experiments demonstrate that our method improves generation quality in diffusion models while incurring only a minimal loss in reconstruction fidelity.
Abstract: Deep-learning-based denoising methods have significantly improved Low-Dose CT (LDCT) image quality. However, existing models often over-smooth important anatomical details because their attention mechanisms are purely data-driven. To address this challenge, we propose BioAtt, a novel LDCT denoising framework. Its key innovation is attending to anatomical prior distributions extracted from the pretrained vision-language model BiomedCLIP. These priors guide the denoising model to focus on anatomically relevant regions, suppressing noise while preserving clinically relevant structures. We highlight three main contributions. First, BioAtt outperforms baseline and attention-based models in SSIM, PSNR, and RMSE across multiple anatomical regions. Second, the framework introduces a new architectural paradigm by embedding anatomical priors directly into spatial attention. Finally, BioAtt's attention maps provide visual confirmation that the improvements stem from anatomical guidance rather than increased model complexity.
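As a rough illustration of what embedding a prior into spatial attention could look like, the minimal Python sketch below adds a prior map as a bias to the attention logits before a sigmoid, so positions the prior marks as anatomically relevant receive larger attention weights. The abstract does not specify BioAtt's actual attention formulation; the function `prior_guided_attention`, the weight `alpha`, and the single-channel H x W nested lists are hypothetical simplifications.

```python
import math


def prior_guided_attention(features, prior, alpha=1.0):
    """Reweight a single-channel H x W feature map with prior-biased spatial attention.

    Hypothetical sketch, not BioAtt's published formulation:
      features -- H x W nested list of floats (stand-in for a CNN feature map)
      prior    -- H x W nested list of floats in [0, 1] (stand-in for an
                  anatomical prior map, e.g. one derived from BiomedCLIP)
      alpha    -- hypothetical weight controlling how strongly the prior biases attention
    """
    out = []
    for f_row, p_row in zip(features, prior):
        row = []
        for f, p in zip(f_row, p_row):
            # Simplification: the feature itself serves as the data-driven logit
            # (a real model would compute it with learned layers). The prior is
            # added as a bias, and the sum is squashed to a weight in (0, 1).
            weight = 1.0 / (1.0 + math.exp(-(f + alpha * p)))
            row.append(f * weight)
        out.append(row)
    return out


if __name__ == "__main__":
    feats = [[1.0, 1.0], [0.5, 0.5]]
    prior = [[0.0, 1.0], [0.0, 1.0]]  # right column marked as anatomically relevant
    for row in prior_guided_attention(feats, prior, alpha=2.0):
        print([round(v, 3) for v in row])
```

With `alpha=0` the prior has no effect and the sketch reduces to purely data-driven attention, which is the behavior the abstract identifies as prone to over-smoothing.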




Abstract: Recent advances in deep-learning-based denoising methods have improved Low-Dose CT (LDCT) image quality. However, due to distinct HU distributions and diverse anatomical characteristics, a single model often struggles to generalize across multiple anatomies. To address this limitation, we introduce the Agent-Integrated Denoising Experts (A-IDE) framework, which integrates three anatomical region-specialized RED-CNN models managed by a decision-making LLM agent. The agent analyzes semantic cues from BiomedCLIP to dynamically route each incoming LDCT scan to the most appropriate expert model. We highlight three major advantages of our approach. First, A-IDE excels in heterogeneous, data-scarce environments. Second, the framework automatically prevents overfitting by distributing tasks among multiple experts. Finally, our LLM-driven agentic pipeline eliminates the need for manual intervention. Experimental evaluations on the Mayo-2016 dataset confirm that A-IDE achieves superior RMSE, PSNR, and SSIM compared to a single unified denoiser.
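The routing step can be approximated as a nearest-prompt lookup. This minimal Python sketch assumes the scan and a text prompt for each anatomical region have already been embedded (for example, with BiomedCLIP) and replaces the LLM agent's decision with a plain argmax over cosine similarities. The region names, the toy three-dimensional embeddings, and the `route` and `cosine` helpers are hypothetical, and the expert callables are placeholders for the RED-CNN denoisers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def route(image_emb, region_embs, experts):
    """Return (region, expert) for the region prompt most similar to the scan.

    Hypothetical stand-in for the LLM agent's routing decision:
      image_emb   -- embedding of the incoming LDCT scan (assumed precomputed)
      region_embs -- dict mapping region name -> embedding of a region text prompt
      experts     -- dict mapping region name -> denoiser callable
    """
    scores = {region: cosine(image_emb, emb) for region, emb in region_embs.items()}
    best = max(scores, key=scores.get)
    return best, experts[best]


if __name__ == "__main__":
    # Toy embeddings and identity "denoisers"; real ones would come from
    # BiomedCLIP and the region-specialized RED-CNN models.
    region_embs = {"head": [1.0, 0.0, 0.0], "chest": [0.0, 1.0, 0.0], "abdomen": [0.0, 0.0, 1.0]}
    experts = {region: (lambda scan: scan) for region in region_embs}
    region, expert = route([0.1, 0.9, 0.2], region_embs, experts)
    print(region)  # chest
```

An argmax keeps the sketch deterministic and easy to test; the agent described in the abstract instead reasons over the semantic cues, which this sketch does not attempt to reproduce.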




Abstract: The aim of this work is to propose a new paradigm that imparts intelligence to metal parts by fusing metal additive manufacturing and artificial intelligence (AI). Our digital metal part classifies its status through real-time data processing with a convolutional neural network (CNN). The training data for the CNN are collected from a strain gauge embedded in the metal part by the laser powder bed fusion process. We implement this approach using additive manufacturing and demonstrate a self-cognitive metal part that recognizes partial screw loosening, malfunction, and impacts from external objects. The results indicate that the metal part can recognize subtle changes in multiple fixation states under repetitive compression with 89.1% accuracy on the test sets. The proposed strategy shows promising potential for contributing to hyper-connectivity in the next generation of digital metal-based mechanical systems.