Chuang Gan

SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization

Oct 28, 2024

UniMuMo: Unified Text, Music and Motion Generation

Oct 06, 2024

Compositional Physical Reasoning of Objects and Events from Videos

Aug 02, 2024

FlexAttention for Efficient High-Resolution Vision-Language Models

Jul 29, 2024

Disentangled Acoustic Fields For Multimodal Physical Scene Understanding

Jul 16, 2024

ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

Jun 12, 2024

CoNav: A Benchmark for Human-Centered Collaborative Navigation

Jun 04, 2024

Physically Compatible 3D Object Modeling from a Single Image

Jun 03, 2024

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

May 30, 2024

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

May 17, 2024