Junxiao Xue

RePose: A Real-Time 3D Human Pose Estimation and Biomechanical Analysis Framework for Rehabilitation

Jan 02, 2026

Disentangling Hardness from Noise: An Uncertainty-Driven Model-Agnostic Framework for Long-Tailed Remote Sensing Classification

Jan 01, 2026

AD-AVSR: Asymmetric Dual-stream Enhancement for Robust Audio-Visual Speech Recognition

Aug 11, 2025

A Trustworthy Method for Multimodal Emotion Recognition

Aug 11, 2025

eMotions: A Large-Scale Dataset and Audio-Visual Fusion Network for Emotion Analysis in Short-form Videos

Aug 09, 2025

HOLA: Enhancing Audio-visual Deepfake Detection via Hierarchical Contextual Aggregations and Efficient Pre-training

Jul 30, 2025

ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations

May 20, 2025

Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering

Dec 30, 2024

3A-YOLO: New Real-Time Object Detectors with Triple Discriminative Awareness and Coordinated Representations

Dec 10, 2024

Pilot-guided Multimodal Semantic Communication for Audio-Visual Event Localization

Dec 09, 2024