Cai Zhou

CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos

Jan 07, 2026

What Affects the Effective Depth of Large Language Models?

Dec 16, 2025

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Oct 10, 2025

Thought calibration: Efficient and confident test-time scaling

May 23, 2025

FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model

Oct 17, 2024

Towards Stable, Globally Expressive Graph Representations with Laplacian Eigenvectors

Oct 13, 2024

Geometric Representation Condition Improves Equivariant Molecule Generation

Oct 04, 2024

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

Aug 16, 2024

Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM

Aug 14, 2024

On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers

Apr 04, 2024