Yang Shi

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

Jun 10, 2025

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

May 27, 2025

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Apr 14, 2025

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Apr 07, 2025

Scalable Overload-Aware Graph-Based Index Construction for 10-Billion-Scale Vector Similarity Search

Feb 28, 2025

AI Models Still Lag Behind Traditional Numerical Models in Predicting Sudden-Turning Typhoons

Feb 22, 2025

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents

Jan 21, 2025

Political-LLM: Large Language Models in Political Science

Dec 09, 2024

Way to Specialist: Closing Loop Between Specialized LLM and Evolving Domain Knowledge Graph

Nov 28, 2024

Image-Based Visual Servoing for Enhanced Cooperation of Dual-Arm Manipulation

Oct 28, 2024