Tian Tan

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

May 07, 2026
Via arXiv

HumanScore: Benchmarking Human Motions in Generated Videos

Apr 22, 2026
Via arXiv

Parallel OctoMapping: A Scalable Framework for Enhanced Path Planning in Autonomous Navigation

Mar 23, 2026
Via arXiv

Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens

Feb 13, 2026
Via arXiv

LikeBench: Evaluating Subjective Likability in LLMs for Personalization

Dec 15, 2025
Via arXiv

Not All Documents Are What You Need for Extracting Instruction Tuning Data

May 18, 2025
Via arXiv

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Jul 05, 2024
Via arXiv

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

Jun 22, 2024
Via arXiv

Text-aware Speech Separation for Multi-talker Keyword Spotting

Jun 18, 2024
Via arXiv

Can Large Language Models Understand Spatial Audio?

Jun 12, 2024
Via arXiv