Xintong Li

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning

Feb 27, 2026

WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning

Feb 19, 2026

AMPS: Adaptive Modality Preference Steering via Functional Entropy

Feb 13, 2026

SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes

Jan 09, 2026

EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning

Jan 05, 2026

A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models

Apr 09, 2025

ASRL: A robust loss function with potential for development

Apr 09, 2025

Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning

Mar 10, 2025

Active Learning for Direct Preference Optimization

Mar 03, 2025

From Selection to Generation: A Survey of LLM-based Active Learning

Feb 17, 2025