Steven Y. Feng

Michael Pokorny

To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining

Apr 01, 2026
Via arXiv

Baby Scale: Investigating Models Trained on Individual Children's Language Input

Mar 31, 2026
Via arXiv

Bringing Up a Bilingual BabyLM: Investigating Multilingual Language Acquisition Using Small-Scale Models

Mar 31, 2026
Via arXiv

A Unified Definition of Hallucination, Or: It's the World Model, Stupid

Dec 25, 2025
Via arXiv

Humanity's Last Exam

Jan 24, 2025
Via arXiv

Is Child-Directed Speech Effective Training Data for Language Models?

Aug 07, 2024
Via arXiv

The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences

Jun 14, 2024
Via arXiv

CHARD: Clinical Health-Aware Reasoning Across Dimensions for Text Generation Models

Oct 09, 2022
Via arXiv

PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation

Sep 16, 2022
Via arXiv

PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically

Sep 13, 2022
Via arXiv