Haoyu Song

CLARITY: Contextual Linguistic Adaptation and Accent Retrieval for Dual-Bias Mitigation in Text-to-Speech Generation

Nov 14, 2025

RGB-Event based Pedestrian Attribute Recognition: A Benchmark Dataset and An Asymmetric RWKV Fusion Framework

Apr 14, 2025

VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition

Dec 28, 2024

A Stack-Propagation Framework for Low-Resource Personalized Dialogue Generation

Oct 26, 2024

SNN-PAR: Energy Efficient Pedestrian Attribute Recognition via Spiking Neural Networks

Oct 10, 2024

MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection

Aug 19, 2024

Language Models are General-Purpose Interfaces

Jun 13, 2022

Visually-Augmented Language Modeling

May 20, 2022

CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment

Mar 14, 2022