Tomasz Limisiewicz

MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization

Jul 11, 2024
Via arXiv

MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling

Mar 15, 2024
Via arXiv

Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models

Jan 19, 2024
Via arXiv

Debiasing Algorithm through Model Adaptation

Oct 29, 2023
Via arXiv

Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation

Sep 30, 2023
Via arXiv

Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages

May 26, 2023
Via arXiv

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Nov 09, 2022
Via arXiv

You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models

Oct 13, 2022
Via arXiv

Don't Forget About Pronouns: Removing Gender Bias in Language Models Without Losing Factual Gender Information

Jun 21, 2022
Via arXiv

A Balanced Data Approach for Evaluating Cross-Lingual Transfer: Mapping the Linguistic Blood Bank

May 09, 2022
Via arXiv