Yongxin Zhu

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Jun 18, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

Jun 03, 2024

Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices

Nov 22, 2023

DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation

Oct 26, 2023

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA

Apr 04, 2023

Difformer: Empowering Diffusion Model on Embedding Space for Text Generation

Dec 19, 2022

Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation

May 22, 2022

Handling noise in image deblurring via joint learning

Jan 27, 2020