Song Guo

VisionCreator: A Native Visual-Generation Agentic Model with Understanding, Thinking, Planning and Creation

Mar 03, 2026
Via arXiv

DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern

Mar 02, 2026
Via arXiv

UNICBench: UNIfied Counting Benchmark for MLLM

Feb 28, 2026
Via arXiv

Mesh-Pro: Asynchronous Advantage-guided Ranking Preference Optimization for Artist-style Quadrilateral Mesh Generation

Feb 28, 2026
Via arXiv

HALO: A Unified Vision-Language-Action Model for Embodied Multimodal Chain-of-Thought Reasoning

Feb 24, 2026
Via arXiv

EMFormer: Efficient Multi-Scale Transformer for Accumulative Context Weather Forecasting

Feb 01, 2026
Via arXiv

WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Nov 12, 2025
Via arXiv

Learning by Neighbor-Aware Semantics, Deciding by Open-form Flows: Towards Robust Zero-Shot Skeleton Action Recognition

Nov 12, 2025
Via arXiv

Predicting the Future by Retrieving the Past

Nov 08, 2025
Via arXiv

DiEP: Adaptive Mixture-of-Experts Compression through Differentiable Expert Pruning

Sep 19, 2025
Via arXiv