Kaicheng Yang

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Feb 09, 2026

Q-DiT4SR: Exploration of Detail-Preserving Diffusion Transformer Quantization for Real-World Image Super-Resolution

Feb 01, 2026

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Jan 15, 2026

ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder

Oct 22, 2025

Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval

Sep 11, 2025

Region-based Cluster Discrimination for Visual Representation Learning

Jul 26, 2025

Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs

Apr 24, 2025

Decoupled Global-Local Alignment for Improving Compositional Understanding

Apr 23, 2025

QArtSR: Quantization via Reverse-Module and Timestep-Retraining in One-Step Diffusion based Image Super-Resolution

Mar 07, 2025

RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm

Feb 18, 2025