Haoyuan Shi

AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Jun 12, 2025

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

May 08, 2025

AI Awareness

Apr 25, 2025

VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

Apr 23, 2025

CSHNet: A Novel Information Asymmetric Image Translation Method

Jan 17, 2025

Controllable Edge-Type-Specific Interpretation in Multi-Relational Graph Neural Networks for Drug Response Prediction

Sep 03, 2024

DRExplainer: Quantifiable Interpretability in Drug Response Prediction with Directed Graph Convolutional Network

Aug 22, 2024

Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

Aug 19, 2024

VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

Jun 17, 2024

TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction

May 27, 2024