
Zirui Wang

FrontierCS: Evolving Challenges for Evolving Intelligence

Dec 17, 2025

Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots

Nov 12, 2025

Towards Adaptable Humanoid Control via Adaptive Motion Tracking

Oct 16, 2025

COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization

Oct 08, 2025

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Sep 19, 2025

YOLO-Count: Differentiable Object Counting for Text-to-Image Generation

Aug 01, 2025

UniTracker: Learning Universal Whole-Body Motion Tracker for Humanoid Robots

Jul 10, 2025

Active View Selector: Fast and Accurate Active View Selection with Cross Reference Image Quality Assessment

Jun 24, 2025

Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset

Jun 04, 2025

VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models

May 23, 2025