Jan Buys

MzansiText and MzansiLM: An Open Corpus and Decoder-Only Language Model for South African Languages

Mar 21, 2026
Via arXiv

The Learning Dynamics of Subword Segmentation for Morphologically Diverse Languages

Nov 19, 2025
Via arXiv

A Systematic Analysis of Subwords and Cross-Lingual Transfer in Multilingual Translation

Mar 29, 2024
Via arXiv

Triples-to-isiXhosa (T2X): Addressing the Challenges of Low-Resource Agglutinative Data-to-Text Generation

Mar 12, 2024
Via arXiv

Multipath parsing in the brain

Jan 31, 2024
Via arXiv

Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation

May 11, 2023
Via arXiv

Subword Segmental Language Modelling for Nguni Languages

Oct 12, 2022
Via arXiv

Low-Resource Language Modelling of South African Languages

Apr 01, 2021
Via arXiv

Canonical and Surface Morphological Segmentation for Nguni Languages

Apr 01, 2021
Via arXiv

BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle

Sep 20, 2019
Via arXiv