Wangmeng Zuo

Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning

Jun 12, 2025

Image Demoiréing Using Dual Camera Fusion on Mobile Phones

Jun 10, 2025

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Jun 09, 2025

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

Jun 04, 2025

MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM

May 30, 2025

LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents

May 28, 2025

Multi-Timescale Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning

May 26, 2025

Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning

May 26, 2025

Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving

May 23, 2025

High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution

May 11, 2025