Zhiyuan Feng

HiSpatial: Taming Hierarchical 3D Spatial Understanding in Vision-Language Models

Mar 26, 2026
Via arXiv

Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning

Mar 11, 2026
Via arXiv

AD-MIR: Bridging the Gap from Perception to Persuasion in Advertising Video Understanding via Structured Reasoning

Feb 07, 2026
Via arXiv

Multimodal Multi-Agent Empowered Legal Judgment Prediction

Jan 21, 2026
Via arXiv

What Should I Cite? A RAG Benchmark for Academic Citation Prediction

Jan 21, 2026
Via arXiv

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

Oct 24, 2025
Via arXiv

MIRA: Medical Time Series Foundation Model for Real-World Health Data

Jun 09, 2025
Via arXiv

HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

Jun 04, 2025
Via arXiv

TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image

Mar 17, 2025
Via arXiv