Alert button
Picture for David A. Ross

David A. Ross

Alert button

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Add code
Bookmark button
Alert button
Mar 02, 2024
Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi

Figure 1 for SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
Figure 2 for SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
Figure 3 for SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
Figure 4 for SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
Viaarxiv icon

VideoPrism: A Foundational Visual Encoder for Video Understanding

Add code
Bookmark button
Alert button
Feb 20, 2024
Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

Viaarxiv icon

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Add code
Bookmark button
Alert button
Oct 09, 2023
Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

Viaarxiv icon

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Add code
Bookmark button
Alert button
Jul 03, 2023
Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

Figure 1 for SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Figure 2 for SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Figure 3 for SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Figure 4 for SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Viaarxiv icon

$IC^3$: Image Captioning by Committee Consensus

Add code
Bookmark button
Alert button
Feb 16, 2023
David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

Figure 1 for $IC^3$: Image Captioning by Committee Consensus
Figure 2 for $IC^3$: Image Captioning by Committee Consensus
Figure 3 for $IC^3$: Image Captioning by Committee Consensus
Figure 4 for $IC^3$: Image Captioning by Committee Consensus
Viaarxiv icon

Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

Add code
Bookmark button
Alert button
Dec 20, 2022
Vivek Rathod, Bryan Seybold, Sudheendra Vijayanarasimhan, Austin Myers, Xiuye Gu, Vighnesh Birodkar, David A. Ross

Figure 1 for Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features
Figure 2 for Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features
Figure 3 for Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features
Figure 4 for Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features
Viaarxiv icon

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Add code
Bookmark button
Alert button
Dec 10, 2022
Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi

Figure 1 for REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Figure 2 for REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Figure 3 for REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Figure 4 for REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Viaarxiv icon

What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

Add code
Bookmark button
Alert button
May 12, 2022
David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny

Figure 1 for What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
Figure 2 for What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
Figure 3 for What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
Figure 4 for What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
Viaarxiv icon

Learn to Dance with AIST++: Music Conditioned 3D Dance Generation

Add code
Bookmark button
Alert button
Feb 02, 2021
Ruilong Li, Shan Yang, David A. Ross, Angjoo Kanazawa

Figure 1 for Learn to Dance with AIST++: Music Conditioned 3D Dance Generation
Figure 2 for Learn to Dance with AIST++: Music Conditioned 3D Dance Generation
Figure 3 for Learn to Dance with AIST++: Music Conditioned 3D Dance Generation
Figure 4 for Learn to Dance with AIST++: Music Conditioned 3D Dance Generation
Viaarxiv icon