Bei Li

Dissecting Long Reasoning Models: An Empirical Study

Jun 05, 2025
Via arXiv

Selecting Demonstrations for Many-Shot In-Context Learning via Gradient Matching

Jun 05, 2025
Via arXiv

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

Mar 09, 2025
Via arXiv

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

Feb 20, 2025
Via arXiv

Optimizing Speech Multi-View Feature Fusion through Conditional Computation

Jan 14, 2025
Via arXiv

SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Jan 07, 2025
Via arXiv

Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment

Dec 30, 2024
Via arXiv

Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization

Dec 02, 2024
Via arXiv

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

Nov 05, 2024
Via arXiv

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models

Oct 08, 2024
Via arXiv