
Noah A. Smith

Paul G. Allen School of Computer Science & Engineering, University of Washington, Allen Institute for Artificial Intelligence

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Sep 25, 2024

OLMoE: Open Mixture-of-Experts Language Models

Sep 03, 2024

Toward a More Complete OMR Solution

Aug 31, 2024

Risks and NLP Design: A Case Study on Procedural Document QA

Aug 16, 2024

Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models

Aug 12, 2024

Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?

Jul 24, 2024

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

Jul 11, 2024

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Jul 08, 2024

Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects

Jun 27, 2024

Evaluating Copyright Takedown Methods for Language Models

Jun 26, 2024