Xuan Dong

Gene

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

May 29, 2025
Via arXiv

Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency

May 20, 2025
Via arXiv

Multi-Grained Compositional Visual Clue Learning for Image Intent Recognition

Apr 25, 2025
Via arXiv

Consistent Diffusion: Denoising Diffusion Model with Data-Consistent Training for Image Restoration

Dec 17, 2024
Via arXiv

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Dec 12, 2024
Via arXiv

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Jun 11, 2024
Via arXiv

ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig

Apr 16, 2024
Via arXiv

View Transition based Dual Camera Image Fusion

Dec 18, 2023
Via arXiv

A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals

Jul 31, 2020
Via arXiv

Ground-truth dataset and baseline evaluations for image base-detail separation algorithms

Feb 18, 2016
Via arXiv