Volker Stampa

A Family of LLMs Liberated from Static Vocabularies

Mar 16, 2026
Via arXiv

Aleph-Alpha-GermanWeb: Improving German-language LLM pre-training with model-based data curation and synthetic data generation

Apr 24, 2025
Via arXiv