Naoaki Okazaki

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Jul 04, 2024

Social Bias Evaluation for Large Language Models Requires Prompt Variations

Jul 03, 2024

Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities

Apr 27, 2024

Building a Large Japanese Web Corpus for Large Language Models

Apr 27, 2024

Building a Japanese Document-Level Relation Extraction Dataset Assisted by Cross-Lingual Transfer

Apr 25, 2024

Sampling-based Pseudo-Likelihood for Membership Inference Attacks

Apr 17, 2024

An Analysis of BPE Vocabulary Trimming in Neural Machine Translation

Mar 30, 2024

Likelihood-based Mitigation of Evaluation Bias in Large Language Models

Mar 01, 2024

Two Counterexamples to Tokenization and the Noiseless Channel

Feb 29, 2024

Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction

Feb 28, 2024