The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining

Add code
Oct 02, 2025
Figure 1 for The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
Figure 2 for The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
Figure 3 for The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
Figure 4 for The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining

Share this with someone who'll enjoy it:

View paper onarxiv icon

Share this with someone who'll enjoy it: