Abstract:Recent advancements in Low-Light Image Enhancement (LLIE) have focused heavily on Diffusion Probabilistic Models, which achieve high perceptual quality but suffer from significant computational latency (often exceeding 2-4 seconds per image). Conversely, traditional CNN-based baselines offer real-time inference but struggle with "over-smoothing," failing to recover fine structural details in extreme low-light conditions. This creates a practical gap in the literature: the lack of a model that provides generative-level texture recovery at edge-deployable speeds. In this paper, we address this trade-off by proposing a hybrid Attention U-Net GAN. We demonstrate that the heavy iterative sampling of diffusion models is not strictly necessary for texture recovery. Instead, by integrating Attention Gates into a lightweight U-Net backbone and training within a conditional adversarial framework, we can approximate the high-frequency fidelity of generative models in a single forward pass. Extensive experiments on the SID dataset show that our method achieves a best-in-class LPIPS score of 0.112 among efficient models, significantly outperforming efficient baselines (SID, EnlightenGAN) while maintaining an inference latency of 0.06s. This represents a 40x speedup over latent diffusion models, making our approach suitable for near real-time applications.
Abstract:Model stealing attacks pose an existential threat to Machine Learning as a Service (MLaaS), allowing adversaries to replicate proprietary models for a fraction of their training cost. While Data-Free Model Extraction (DFME) has emerged as a stealthy vector, it remains fundamentally constrained by the "Cold Start" problem: GAN-based adversaries waste thousands of queries converging from random noise to meaningful data. We propose DiMEx, a framework that weaponizes the rich semantic priors of pre-trained Latent Diffusion Models to bypass this initialization barrier entirely. By employing Random Embedding Bayesian Optimization (REMBO) within the generator's latent space, DiMEx synthesizes high-fidelity queries immediately, achieving 52.1 percent agreement on SVHN with just 2,000 queries - outperforming state-of-the-art GAN baselines by over 16 percent. To counter this highly semantic threat, we introduce the Hybrid Stateful Ensemble (HSE) defense, which identifies the unique "optimization trajectory" of latent-space attacks. Our results demonstrate that while DiMEx evades static distribution detectors, HSE exploits this temporal signature to suppress attack success rates to 21.6 percent with negligible latency.




Abstract:Fine-grained visual classification aims to recognize objects belonging to multiple subordinate categories within a super-category. However, this remains a challenging problem, as appearance information alone is often insufficient to accurately differentiate between fine-grained visual categories. To address this, we propose a novel and unified framework that leverages meta-information to assist fine-grained identification. We tackle the joint learning of visual and meta-information through cross-contrastive pre-training. In the first stage, we employ three encoders for images, text, and meta-information, aligning their projected embeddings to achieve better representations. We then fine-tune the image and meta-information encoders for the classification task. Experiments on the NABirds dataset demonstrate that our framework effectively utilizes meta-information to enhance fine-grained recognition performance. With the addition of meta-information, our framework surpasses the current baseline on NABirds by 7.83%. Furthermore, it achieves an accuracy of 84.44% on the NABirds dataset, outperforming many existing state-of-the-art approaches that utilize meta-information.