Wen Wang

Two Birds With One Stone: Enhancing Communication and Sensing via Multi-Functional RIS

Oct 09, 2024
Via arXiv

Unified Audio Event Detection

Sep 13, 2024
Via arXiv

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Aug 29, 2024
Via arXiv

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts

Aug 19, 2024
Via arXiv

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation

Aug 01, 2024
Via arXiv

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

Jul 23, 2024
Via arXiv

FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

Jul 06, 2024
Via arXiv

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

Jun 19, 2024
Via arXiv

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

Jun 17, 2024
Via arXiv

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Jun 17, 2024
Via arXiv