Alert button
Picture for Bowen Shi

Bowen Shi

Alert button

XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception

Mar 21, 2024
HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, Changhan Wang

Viaarxiv icon

Towards Privacy-Aware Sign Language Translation at Scale

Feb 14, 2024
Phillip Rust, Bowen Shi, Skyler Wang, Necati Cihan Camgöz, Jean Maillard

Viaarxiv icon

UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

Jan 18, 2024
Bowen Shi, Peisen Zhao, Zichen Wang, Yuhang Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian, Xiaopeng Zhang

Viaarxiv icon

Audiobox: Unified Audio Generation with Natural Language Prompts

Dec 25, 2023
Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu

Viaarxiv icon

AiluRus: A Scalable ViT Framework for Dense Prediction

Nov 02, 2023
Jin Li, Yaoming Wang, Xiaopeng Zhang, Bowen Shi, Dongsheng Jiang, Chenglin Li, Wenrui Dai, Hongkai Xiong, Qi Tian

Figure 1 for AiluRus: A Scalable ViT Framework for Dense Prediction
Figure 2 for AiluRus: A Scalable ViT Framework for Dense Prediction
Figure 3 for AiluRus: A Scalable ViT Framework for Dense Prediction
Figure 4 for AiluRus: A Scalable ViT Framework for Dense Prediction
Viaarxiv icon

Generative Pre-training for Speech with Flow Matching

Oct 25, 2023
Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu

Viaarxiv icon

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

Sep 05, 2023
Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan

Figure 1 for Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Figure 2 for Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Figure 3 for Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Figure 4 for Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
Viaarxiv icon

Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods

Aug 23, 2023
Bowen Shi

Figure 1 for Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods
Figure 2 for Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods
Figure 3 for Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods
Figure 4 for Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods
Viaarxiv icon

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Aug 10, 2023
Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux

Figure 1 for EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Figure 2 for EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Figure 3 for EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Figure 4 for EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Viaarxiv icon