Qilang Ye

Answering Diverse Questions via Text Attached with Key Audio-Visual Clues

Mar 11, 2024
Via arXiv

CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios

Mar 07, 2024
Via arXiv