"Text": models, code, and papers

ImagenHub: Standardizing the evaluation of conditional image generation models

Oct 02, 2023
Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen

Via arXiv

Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

Oct 14, 2023
Jiachen Li, Qiaozi Gao, Michael Johnston, Xiaofeng Gao, Xuehai He, Suhaila Shakiah, Hangjie Shi, Reza Ghanadan, William Yang Wang

Via arXiv

JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models

Aug 09, 2023
Peike Li, Boyu Chen, Yao Yao, Yikai Wang, Allen Wang, Alex Wang

Via arXiv

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

Aug 19, 2023
Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu

Via arXiv

OmniControl: Control Any Joint at Any Time for Human Motion Generation

Oct 12, 2023
Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, Huaizu Jiang

Via arXiv

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

Oct 12, 2023
Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang

Via arXiv

Learning to Act from Actionless Videos through Dense Correspondences

Oct 12, 2023
Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum

Via arXiv

On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition

Oct 12, 2023
Nick Rossenbach, Benedikt Hilmes, Ralf Schlüter

Via arXiv

Effects of Human Adversarial and Affable Samples on BERT Generalizability

Oct 13, 2023
Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor

Via arXiv

KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection

Oct 13, 2023
Sehyun Choi, Tianqing Fang, Zhaowei Wang, Yangqiu Song

Via arXiv