Benoît Sagot

ALMAnaCH

CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data

Jan 25, 2026
Via arXiv

When the Gold Standard isn't Necessarily Standard: Challenges of Evaluating the Translation of User-Generated Content

Dec 19, 2025
Via arXiv

Gaperon: A Peppered English-French Generative Language Model Suite

Oct 29, 2025
Via arXiv

TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation

Aug 12, 2025
Via arXiv

ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance

Apr 11, 2025
Via arXiv

Explicit Learning and the LLM in Machine Translation

Mar 12, 2025
Via arXiv

KréyoLID From Language Identification Towards Language Mining

Mar 09, 2025
Via arXiv

Compositional Translation: A Novel LLM-based Approach for Low-resource Machine Translation

Mar 06, 2025
Via arXiv

Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression

Mar 04, 2025
Via arXiv

Diachronic Document Dataset for Semantic Layout Analysis

Nov 15, 2024
Via arXiv