Kai Li

Department of Computer Science and Technology, Tsinghua University, Beijing, China

Hi-Agent: Hierarchical Vision-Language Agents for Mobile Device Control

Oct 16, 2025
Via arXiv

AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs

Oct 08, 2025
Via arXiv

Text2Move: Text-to-moving sound generation via trajectory prediction and temporal alignment

Sep 26, 2025
Via arXiv

Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning

Jul 09, 2025
Via arXiv

Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization

Jun 11, 2025
Via arXiv

Segment Concealed Objects with Incomplete Supervision

Jun 10, 2025
Via arXiv

A Fast and Lightweight Model for Causal Audio-Visual Speech Separation

Jun 07, 2025
Via arXiv

Zero-Trust Foundation Models: A New Paradigm for Secure and Collaborative Artificial Intelligence for Internet of Things

May 26, 2025
Via arXiv

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline

May 25, 2025
Via arXiv

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

May 22, 2025
Via arXiv