Hassan Akbari

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dec 21, 2023
Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold, Lu Jiang

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

May 10, 2023
Hassan Akbari, Dan Kondratyuk, Yin Cui, Rachel Hornung, Huisheng Wang, Hartwig Adam

Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization

Nov 03, 2022
Junru Wu, Yi Liang, Feng Han, Hassan Akbari, Zhangyang Wang, Cong Yu

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Sep 16, 2022
Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Apr 22, 2021
Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language

Nov 18, 2020
Hassan Akbari, Hamid Palangi, Jianwei Yang, Sudha Rao, Asli Celikyilmaz, Roland Fernandez, Paul Smolensky, Jianfeng Gao, Shih-Fu Chang

Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

Nov 28, 2018
Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang

Lip2AudSpec: Speech reconstruction from silent lip movements video

Oct 26, 2017
Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani
