Picture for Manan Dey

Manan Dey

MVEB: Massive Video Embedding Benchmark

Add code
Jun 12, 2026
Viaarxiv icon

MMTEB: Massive Multilingual Text Embedding Benchmark

Add code
Feb 19, 2025
Viaarxiv icon

StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs

Add code
Dec 23, 2024
Figure 1 for StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs
Figure 2 for StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs
Figure 3 for StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs
Figure 4 for StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs
Viaarxiv icon

Bridging the Data Provenance Gap Across Text, Speech and Video

Add code
Dec 19, 2024
Figure 1 for Bridging the Data Provenance Gap Across Text, Speech and Video
Figure 2 for Bridging the Data Provenance Gap Across Text, Speech and Video
Figure 3 for Bridging the Data Provenance Gap Across Text, Speech and Video
Figure 4 for Bridging the Data Provenance Gap Across Text, Speech and Video
Viaarxiv icon

Consent in Crisis: The Rapid Decline of the AI Data Commons

Add code
Jul 24, 2024
Figure 1 for Consent in Crisis: The Rapid Decline of the AI Data Commons
Figure 2 for Consent in Crisis: The Rapid Decline of the AI Data Commons
Figure 3 for Consent in Crisis: The Rapid Decline of the AI Data Commons
Figure 4 for Consent in Crisis: The Rapid Decline of the AI Data Commons
Viaarxiv icon

StarCoder 2 and The Stack v2: The Next Generation

Add code
Feb 29, 2024
Figure 1 for StarCoder 2 and The Stack v2: The Next Generation
Figure 2 for StarCoder 2 and The Stack v2: The Next Generation
Figure 3 for StarCoder 2 and The Stack v2: The Next Generation
Figure 4 for StarCoder 2 and The Stack v2: The Next Generation
Viaarxiv icon

StarCoder: may the source be with you!

Add code
May 09, 2023
Figure 1 for StarCoder: may the source be with you!
Figure 2 for StarCoder: may the source be with you!
Figure 3 for StarCoder: may the source be with you!
Figure 4 for StarCoder: may the source be with you!
Viaarxiv icon

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Add code
Mar 07, 2023
Figure 1 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 2 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 3 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Figure 4 for The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
Viaarxiv icon

SantaCoder: don't reach for the stars!

Add code
Jan 09, 2023
Figure 1 for SantaCoder: don't reach for the stars!
Figure 2 for SantaCoder: don't reach for the stars!
Figure 3 for SantaCoder: don't reach for the stars!
Figure 4 for SantaCoder: don't reach for the stars!
Viaarxiv icon

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Add code
Nov 09, 2022
Viaarxiv icon