Vietnamese Datasets


ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition

Jun 05, 2025
Via arXiv

A Survey on Vietnamese Document Analysis and Recognition: Challenges and Future Directions

Jun 05, 2025
Via arXiv

Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization

May 30, 2025
Via arXiv

OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature

May 28, 2025
Via arXiv

WriteViT: Handwritten Text Generation with Vision Transformer

May 19, 2025
Via arXiv

ViMRHP: A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation

May 12, 2025
Via arXiv

Towards Cultural Bridge by Bahnaric-Vietnamese Translation Using Transfer Learning of Sequence-To-Sequence Pre-training Language Model

May 16, 2025
Via arXiv

Validation of a 24-hour-ahead Prediction model for a Residential Electrical Load under diverse climate

May 01, 2025
Via arXiv

GreenMind: A Next-Generation Vietnamese Large Language Model for Structured and Logical Reasoning

Apr 23, 2025
Via arXiv

ViQA-COVID: COVID-19 Machine Reading Comprehension Dataset for Vietnamese

Apr 21, 2025
Via arXiv