Enrico Shippole

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Jun 05, 2025
Via arXiv

Bridging the Data Provenance Gap Across Text, Speech and Video

Dec 19, 2024
Via arXiv

Consent in Crisis: The Rapid Decline of the AI Data Commons

Jul 24, 2024
Via arXiv

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

Jan 21, 2024
Via arXiv

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

Nov 04, 2023
Via arXiv

YaRN: Efficient Context Window Extension of Large Language Models

Aug 31, 2023
Via arXiv