Rei Higuchi

Inference-Aware Meta-Alignment of LLMs via Non-Linear GRPO

Feb 02, 2026
Via arXiv

A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning

Feb 02, 2026
Via arXiv

Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models

May 12, 2025
Via arXiv

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Apr 24, 2025
Via arXiv