
Saining Xie

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

Oct 04, 2024

Fast Encoding and Decoding for Implicit Video Representation

Sep 28, 2024

On Scaling Up 3D Gaussian Splatting Training

Jun 26, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Jun 24, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

May 17, 2024

MoDE: CLIP Data Experts via Clustering

Apr 24, 2024

V-IRL: Grounding Virtual Intelligence in Real Life

Feb 05, 2024

Deconstructing Denoising Diffusion Models for Self-Supervised Learning

Jan 25, 2024

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Jan 16, 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Jan 11, 2024