Bin Wang

and Other Contributors

GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition

Jun 09, 2025
Via arXiv

Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs

Jun 07, 2025
Via arXiv

Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis

May 29, 2025
Via arXiv

HMAD: Advancing E2E Driving with Anchored Offset Proposals and Simulation-Supervised Multi-target Scoring

May 29, 2025
Via arXiv

Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition

May 29, 2025
Via arXiv

Scaling-up Perceptual Video Quality Assessment

May 28, 2025
Via arXiv

TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization

May 26, 2025
Via arXiv

TimeCF: A TimeMixer-Based Model with adaptive Convolution and Sharpness-Aware Minimization Frequency Domain Loss for long-term time series forecasting

May 23, 2025
Via arXiv

IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models

May 22, 2025
Via arXiv

Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems

May 21, 2025
Via arXiv