Hinrich Schütze

Shammie

How far can bias go? -- Tracing bias from pretraining data to alignment

Nov 28, 2024

Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning

Nov 12, 2024

Derivational Morphology Reveals Analogical Generalization in Large Language Models

Nov 12, 2024

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages

Oct 31, 2024

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

Oct 08, 2024

Better Call SAUL: Fluent and Consistent Language Model Editing with Generation Regularization

Oct 03, 2024

LangSAMP: Language-Script Aware Multilingual Pretraining

Sep 26, 2024

EMMA-500: Enhancing Massively Multilingual Adaptation of Large Language Models

Sep 26, 2024

How Transliterations Improve Crosslingual Alignment

Sep 25, 2024

CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation

Sep 03, 2024