Multilingual Text Classification


Multilingual text classification is the task of assigning text documents written in different languages to predefined categories.
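As a rough illustration of the definition above, the following minimal sketch assigns short English and Spanish texts to shared categories. It assumes a character n-gram representation with one summed profile per category and cosine similarity for scoring. This is a simple baseline chosen for illustration, not a method taken from any of the papers listed here. The function names (`char_ngrams`, `train`, `classify`) and the toy training examples are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return a bag of character n-grams (no language-specific tokenizer needed)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(examples):
    """Sum the n-gram counts of all training texts into one profile per category."""
    profiles = {}
    for text, label in examples:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def classify(text, profiles):
    """Return the category whose profile is most similar to the text."""
    feats = char_ngrams(text)
    return max(profiles, key=lambda label: cosine(feats, profiles[label]))


# Toy training set (illustrative only): two languages, one shared label set.
EXAMPLES = [
    ("the team won the football match", "sports"),
    ("el equipo ganó el partido de fútbol", "sports"),
    ("the bank raised interest rates", "finance"),
    ("el banco subió los tipos de interés", "finance"),
]

profiles = train(EXAMPLES)
print(classify("football match", profiles))
print(classify("el banco", profiles))
```

Character n-grams are used here only because they work without a tokenizer for each language. Systems like those in the papers below typically rely on multilingual transformer encoders or embeddings instead.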

From Synthetic to Native: Benchmarking Multilingual Intent Classification in Logistics Customer Service

Mar 24, 2026
Via arXiv

SozKZ: Training Efficient Small Language Models for Kazakh from Scratch

Mar 21, 2026
Via arXiv

MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages

Mar 21, 2026
Via arXiv

Long-Context Encoder Models for Polish Language Understanding

Mar 12, 2026
Via arXiv

MUTEX: Leveraging Multilingual Transformers and Conditional Random Fields for Enhanced Urdu Toxic Span Detection

Mar 05, 2026
Via arXiv

FEAST: Retrieval-Augmented Multi-Hierarchical Food Classification for the FoodEx2 System

Mar 03, 2026
Via arXiv

Enhancing Multilingual Embeddings via Multi-Way Parallel Text Alignment

Feb 25, 2026
Via arXiv

Evaluating Cross-Lingual Classification Approaches Enabling Topic Discovery for Multilingual Social Media Data

Feb 19, 2026
Via arXiv

MAEB: Massive Audio Embedding Benchmark

Feb 17, 2026
Via arXiv

MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust Check-Worthiness Detection Models

Feb 18, 2026
Via arXiv