Xinyuan Qian

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

Apr 01, 2024
Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

Oct 23, 2023
Jinzheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

Oct 17, 2023
Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

Audio Visual Speaker Localization from EgoCentric Views

Sep 28, 2023
Jinzheng Zhao, Yong Xu, Xinyuan Qian, Wenwu Wang

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

May 23, 2023
Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin

Ripple sparse self-attention for monaural speech enhancement

May 15, 2023
Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

Mar 29, 2023
Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li

A Miniaturised Camera-based Multi-Modal Tactile Sensor

Mar 06, 2023
Kaspar Althoefer, Yonggen Ling, Wanlin Li, Xinyuan Qian, Wang Wei Lee, Peng Qi

Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception

Sep 05, 2022
Jiadong Wang, Xinyuan Qian, Haizhou Li

Iterative Sound Source Localization for Unknown Number of Sources

Jun 24, 2022
Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang
