Hongyuan Zhu

Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

Jun 16, 2025

Towards Rich Emotions in 3D Avatars: A Text-to-3D Avatar Generation Benchmark

Dec 03, 2024

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

Oct 20, 2024

PointCloud-Text Matching: Benchmark Datasets and a Baseline

Mar 28, 2024

Contributing Dimension Structure of Deep Feature for Coreset Selection

Jan 29, 2024

Direct Distillation between Different Domains

Jan 12, 2024

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

Dec 17, 2023

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

Nov 30, 2023

Exploit the antenna response consistency to define the alignment criteria for CSI data

Oct 10, 2023

Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention

Sep 17, 2023