Yunze Man

Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

May 29, 2025
Via arXiv

AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark

Apr 14, 2025
Via arXiv

PaintScene4D: Consistent 4D Scene Generation from Text Prompts

Dec 05, 2024
Via arXiv

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Dec 02, 2024
Via arXiv

SceneCraft: Layout-Guided 3D Scene Generation

Oct 11, 2024
Via arXiv

Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

Sep 05, 2024
Via arXiv

Floating No More: Object-Ground Reconstruction from a Single Image

Jul 26, 2024
Via arXiv

Situational Awareness Matters in 3D Vision Language Reasoning

Jun 11, 2024
Via arXiv

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

Oct 19, 2023
Via arXiv

DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception

May 05, 2023
Via arXiv