
Wenliang Dai

Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Oct 14, 2022

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

Jul 06, 2022

Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation

Mar 30, 2022

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

Jan 17, 2022

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

Jan 11, 2022

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Jan 07, 2022

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization

Oct 01, 2021

Greenformer: Factorization Toolkit for Efficient Deep Neural Networks

Sep 14, 2021

Weakly-supervised Multi-task Learning for Multimodal Affect Recognition

Apr 23, 2021

Multimodal End-to-End Sparse Model for Emotion Recognition

Mar 27, 2021