Ming Cheng

Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models

Jun 06, 2025

Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs

May 27, 2025

Sci-LoRA: Mixture of Scientific LoRAs for Cross-Domain Lay Paraphrasing

May 24, 2025

Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge

May 22, 2025

NewsNet-SDF: Stochastic Discount Factor Estimation with Pretrained Language Model News Embeddings via Adversarial Networks

May 11, 2025

Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis

Mar 15, 2025

Visual Zero-Shot E-Commerce Product Attribute Value Extraction

Feb 21, 2025

Learning Musical Representations for Music Performance Question Answering

Feb 10, 2025

Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding

Feb 09, 2025

Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation

Nov 21, 2024