Suchir Salhan

LangMAP: A Language-Adaptive Approach to Tokenization

Jun 23, 2026
Via arXiv

Modelling the Diachronic Emergence of Phoneme Frequency Distributions

Mar 10, 2026
Via arXiv

The Distribution of Phoneme Frequencies across the World's Languages: Macroscopic and Microscopic Information-Theoretic Models

Mar 03, 2026
Via arXiv

BabyLM Turns 4 and Goes Multilingual: Call for Papers for the 2026 BabyLM Workshop

Feb 24, 2026
Via arXiv

Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction

Oct 23, 2025
Via arXiv

Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies

Oct 30, 2024
Via arXiv