Tianjian Li

SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning

May 05, 2025
Via arXiv

Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets

Oct 06, 2024
Via arXiv

Benchmarking Language Model Creativity: A Case Study on Code Generation

Jul 12, 2024
Via arXiv

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

Apr 05, 2024
Via arXiv

Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models

Oct 02, 2023
Via arXiv

Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning

May 31, 2023
Via arXiv

Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution

May 27, 2023
Via arXiv