Yaoting Wang

School of Computer Science and Technology, Tiangong University; Tianjin Key Laboratory of Autonomous Intelligence Technology and Systems

On Path to Multimodal Generalist: General-Level and General-Bench

May 07, 2025

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Mar 16, 2025

AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs

Jan 03, 2025

Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation

Jul 16, 2024

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

Jul 15, 2024

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

Jul 15, 2024

Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer

Sep 18, 2023

Cross-Attention is Not Enough: Incongruity-Aware Multimodal Sentiment Analysis and Emotion Recognition

May 23, 2023

Aesthetic Quality Assessment for Group photograph

Feb 04, 2020