Jan Kautz

NVIDIA

Flextron: Many-in-One Flexible Large Language Model

Jun 11, 2024

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

Jun 04, 2024

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model

Jun 03, 2024

X-VILA: Cross-Modality Alignment for Large Language Model

May 29, 2024

OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning

May 02, 2024

LITA: Language Instructed Temporal-Localization Assistant

Mar 27, 2024

FoVA-Depth: Field-of-View Agnostic Depth Estimation for Cross-Dataset Generalization

Jan 24, 2024

AM-RADIO: Agglomerative Model -- Reduce All Domains Into One

Dec 21, 2023

GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning

Dec 18, 2023

VILA: On Pre-training for Visual Language Models

Dec 14, 2023