Xiaolin Hu

Department of Computer Science and Technology, Tsinghua University, Beijing, China

FGNet: Leveraging Feature-Guided Attention to Refine SAM2 for 3D EM Neuron Segmentation

Nov 17, 2025
Via arXiv

StepProof: Step-by-step verification of natural language mathematical proofs

Jun 12, 2025
Via arXiv

A Fast and Lightweight Model for Causal Audio-Visual Speech Separation

Jun 07, 2025
Via arXiv

Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers

May 26, 2025
Via arXiv

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

May 22, 2025
Via arXiv

Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation

May 19, 2025
Via arXiv

GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing

May 08, 2025
Via arXiv

LLaVA-CMoE: Towards Continual Mixture of Experts for Large Vision-Language Models

Mar 27, 2025
Via arXiv

Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization

Feb 24, 2025
Via arXiv

ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

Jan 06, 2025
Via arXiv