Yizhuo Li

Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Aug 27, 2025

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

Jul 28, 2025

Aligning Latent Spaces with Flow Priors

Jun 05, 2025

Beamforming Design for Beyond Diagonal RIS-Aided Cell-Free Massive MIMO Systems

Mar 10, 2025

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Dec 05, 2024

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

Dec 05, 2024

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Dec 05, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Dec 03, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining

Oct 30, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

Jul 13, 2023