Zhenyu Yang

MobileWorld: Benchmarking Autonomous Mobile Agents in Agent-User Interactive, and MCP-Augmented Environments

Dec 22, 2025
Via arXiv

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding

Nov 07, 2025
Via arXiv

C-MAG: Cascade Multimodal Attributed Graphs for Supply Chain Link Prediction

Aug 13, 2025
Via arXiv

Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation

Aug 12, 2025
Via arXiv

Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification

Apr 15, 2025
Via arXiv

Improved Visual-Spatial Reasoning via R1-Zero-Like Training

Apr 01, 2025
Via arXiv

H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding

Mar 31, 2025
Via arXiv

An Explainable Neural Radiomic Sequence Model with Spatiotemporal Continuity for Quantifying 4DCT-based Pulmonary Ventilation

Mar 31, 2025
Via arXiv

LandMarkSystem Technical Report

Mar 27, 2025
Via arXiv

Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens

Mar 12, 2025
Via arXiv