Guokan Shang

École Polytechnique, Linagora

Beyond Random Sampling: Efficient Language Model Pretraining via Curriculum Learning

Jun 12, 2025
Via arXiv

LLM as a Broken Telephone: Iterative Generation Distorts Information

Feb 27, 2025
Via arXiv

Benchmarking Linguistic Diversity of Large Language Models

Dec 13, 2024
Via arXiv

Graph Linearization Methods for Reasoning on Graphs with Large Language Models

Oct 25, 2024
Via arXiv

Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect

Sep 26, 2024
Via arXiv

Leveraging Discourse Structure for Extractive Meeting Summarization

May 21, 2024
Via arXiv

FREDSum: A Dialogue Summarization Corpus for French Political Debates

Dec 08, 2023
Via arXiv

The Claire French Dialogue Dataset

Nov 28, 2023
Via arXiv

Automatic Analysis of Substantiation in Scientific Peer Reviews

Nov 20, 2023
Via arXiv

The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text

Nov 16, 2023
Via arXiv