Kai Li

Department of Computer Science and Technology, Tsinghua University, Beijing, China

Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation

May 19, 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

May 19, 2025

DIMM: Decoupled Multi-hierarchy Kalman Filter for 3D Object Tracking

May 18, 2025

SepPrune: Structured Pruning for Efficient Deep Speech Separation

May 17, 2025

Undermining Federated Learning Accuracy in EdgeIoT via Variational Graph Auto-Encoders

Apr 14, 2025

Using machine learning method for variable star classification using the TESS Sectors 1-57 data

Apr 01, 2025

LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text

Mar 25, 2025

Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation

Mar 20, 2025

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

Mar 12, 2025

Collective Behavior Clone with Visual Attention via Neural Interaction Graph Prediction

Mar 10, 2025