Jingwei Guan

ZeroSense: How Vision matters in Long Context Compression

Mar 12, 2026

Yonghan Gao, Zehong Chen, Lijian Xu, Jingzhi Chen, Jingwei Guan, Xingyu Zeng

Abstract: Recent visual-text compression (VTC) methods, typified by DeepSeek-OCR, report impressively high token compression ratios for long-context modeling tasks by leveraging text-to-image rendering. However, existing evaluation protocols rely heavily on downstream task performance. Such metrics fail to measure text preservation accurately because of the strong inherent linguistic priors of Multimodal Large Language Models (MLLMs). In this work, we introduce a new evaluation framework that decouples MLLMs' capabilities from the evaluation so that VTC quality can be assessed faithfully. Within this framework, we further introduce the ZeroSense Benchmark to ensure low semantic correlation among test samples. By eliminating contextual dependencies, our benchmark guarantees that the evaluation results purely reflect VTC quality, unaffected by the semantic inference capabilities of downstream models. Extensive experiments across multiple datasets demonstrate that VTC quality and downstream task accuracy diverge significantly, highlighting the necessity of our decoupled evaluation framework.

LDPM: Towards undersampled MRI reconstruction with MR-VAE and Latent Diffusion Prior

Nov 05, 2024

Xingjian Tang, Jingwei Guan, Linge Li, Youmei Zhang, Mengye Lyu, Li Yan

[Figures 1-4 for LDPM: Towards undersampled MRI reconstruction with MR-VAE and Latent Diffusion Prior]

Abstract: Diffusion models, as powerful generative models, have found a wide range of applications, including MRI reconstruction. However, most existing diffusion model-based MRI reconstruction methods operate directly in pixel space, which makes their optimization and inference computationally expensive. Latent diffusion models were introduced to address this problem in natural image processing, but directly applying them to MRI reconstruction still faces many challenges, including the lack of control over the generated results, the adaptability of the Variational AutoEncoder (VAE) to MRI, and the exploration of applicable data consistency in latent space. To address these challenges, a Latent Diffusion Prior based undersampled MRI reconstruction (LDPM) method is proposed. A sketcher module is utilized to provide appropriate control and to balance the quality and fidelity of the reconstructed MR images. A VAE adapted for MRI tasks (MR-VAE) is explored, which can serve as the backbone for future MR-related tasks. Furthermore, a variation of the DDIM sampler, called the Dual-Stage Sampler, is proposed to achieve high-fidelity reconstruction in the latent space. The proposed method achieves competitive results on fastMRI datasets, and the effectiveness of each module is demonstrated in ablation experiments.
