Teng Wang

Reinforcing Video Reasoning with Focused Thinking

May 30, 2025
Via arXiv

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

May 27, 2025
Via arXiv

CP-Router: An Uncertainty-Aware Router Between LLM and LRM

May 26, 2025
Via arXiv

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

May 08, 2025
Via arXiv

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

Mar 25, 2025
Via arXiv

Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models

Mar 19, 2025
Via arXiv

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Nov 29, 2024
Via arXiv

BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving

Nov 26, 2024
Via arXiv

ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

Oct 13, 2024
Via arXiv

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Oct 10, 2024
Via arXiv