Po-Yao Huang

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Mar 25, 2024
Puyuan Peng, Po-Yao Huang, Daniel Li, Abdelrahman Mohamed, David Harwath

Adversarially Masked Video Consistency for Unsupervised Domain Adaptation

Mar 24, 2024
Xiaoyu Zhu, Junwei Liang, Po-Yao Huang, Alex Hauptmann

FLAP: Fast Language-Audio Pre-training

Nov 02, 2023
Ching-Feng Yeh, Po-Yao Huang, Vasu Sharma, Shang-Wen Li, Gargi Ghosh

Demystifying CLIP Data

Oct 02, 2023
Hu Xu, Saining Xie, Xiaoqing Ellen Tan, Po-Yao Huang, Russell Howes, Vasu Sharma, Shang-Wen Li, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Sep 19, 2023
Yuan Tseng, Layne Berry, Yi-Ting Chen, I-Hsiang Chiu, Hsuan-Hao Lin, Max Liu, Puyuan Peng, Yi-Jen Shih, Hung-Yu Wang, Haibin Wu, Po-Yao Huang, Chun-Mao Lai, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng, Hung-yi Lee

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Jun 01, 2023
Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer

DINOv2: Learning Robust Visual Features without Supervision

Apr 14, 2023
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski

Diffusion Models as Masked Autoencoders

Apr 06, 2023
Chen Wei, Karttikeya Mangalam, Po-Yao Huang, Yanghao Li, Haoqi Fan, Hu Xu, Huiyu Wang, Cihang Xie, Alan Yuille, Christoph Feichtenhofer

STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

Mar 31, 2023
Xiaoyu Zhu, Po-Yao Huang, Junwei Liang, Celso M. de Melo, Alexander Hauptmann

CiT: Curation in Training for Effective Vision-Language Data

Jan 05, 2023
Hu Xu, Saining Xie, Po-Yao Huang, Licheng Yu, Russell Howes, Gargi Ghosh, Luke Zettlemoyer, Christoph Feichtenhofer
