Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Satoshi Nakamura

Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model

Dec 20, 2024

Yosuke Yamagishi, Yuta Nakamura, Tomohiro Kikuchi, Yuki Sonoda, Hiroshi Hirakawa, Shintaro Kano, Satoshi Nakamura, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe

Figure 1 for Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model

Figure 2 for Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model

Figure 3 for Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model

Figure 4 for Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model

Abstract:Background: Recent advances in large language models highlight the need for high-quality multilingual medical datasets. While Japan leads globally in CT scanner deployment and utilization, the lack of large-scale Japanese radiology datasets has hindered the development of specialized language models for medical imaging analysis. Objective: To develop a comprehensive Japanese CT report dataset through machine translation and establish a specialized language model for structured finding classification. Additionally, to create a rigorously validated evaluation dataset through expert radiologist review. Methods: We translated the CT-RATE dataset (24,283 CT reports from 21,304 patients) into Japanese using GPT-4o mini. The training dataset consisted of 22,778 machine-translated reports, while the validation dataset included 150 radiologist-revised reports. We developed CT-BERT-JPN based on "tohoku-nlp/bert-base-japanese-v3" architecture for extracting 18 structured findings from Japanese radiology reports. Results: Translation metrics showed strong performance with BLEU scores of 0.731 and 0.690, and ROUGE scores ranging from 0.770 to 0.876 for Findings and from 0.748 to 0.857 for Impression sections. CT-BERT-JPN demonstrated superior performance compared to GPT-4o in 11 out of 18 conditions, including lymphadenopathy (+14.2%), interlobular septal thickening (+10.9%), and atelectasis (+7.4%). The model maintained F1 scores exceeding 0.95 in 14 out of 18 conditions and achieved perfect scores in four conditions. Conclusions: Our study establishes a robust Japanese CT report dataset and demonstrates the effectiveness of a specialized language model for structured finding classification. The hybrid approach of machine translation and expert validation enables the creation of large-scale medical datasets while maintaining high quality.

* Dataset available at https://huggingface.co/datasets/YYama0/CT-RATE-JPN

Via

Access Paper or Ask Questions

Findings of the IWSLT 2024 Evaluation Campaign

Nov 07, 2024

Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico(+35 more)

Abstract:This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

* IWSLT 2024; 59 pages

Via

Access Paper or Ask Questions

A Transformer Framework for Simultaneous Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Nov 06, 2024

Bin Wu, Shinnosuke Takamichi, Sakriani Sakti, Satoshi Nakamura

Figure 1 for A Transformer Framework for Simultaneous Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Figure 2 for A Transformer Framework for Simultaneous Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Figure 3 for A Transformer Framework for Simultaneous Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Figure 4 for A Transformer Framework for Simultaneous Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Abstract:Marmoset, a highly vocalized primate, has become a popular animal model for studying social-communicative behavior and its underlying mechanism comparing with human infant linguistic developments. In the study of vocal communication, it is vital to know the caller identities, call contents, and vocal exchanges. Previous work of a CNN has achieved a joint model for call segmentation, classification, and caller identification for marmoset vocalizations. However, the CNN has limitations in modeling long-range acoustic patterns; the Transformer architecture that has been shown to outperform CNNs, utilizes the self-attention mechanism that efficiently segregates information parallelly over long distances and captures the global structure of marmoset vocalization. We propose using the Transformer to jointly segment and classify the marmoset calls and identify the callers for each vocalization.

Via

Access Paper or Ask Questions

A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Oct 30, 2024

Bin Wu, Sakriani Sakti, Shinnosuke Takamichi, Satoshi Nakamura

Figure 1 for A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Figure 2 for A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Figure 3 for A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Figure 4 for A Neural Transformer Framework for Simultaneous Tasks of Segmentation, Classification, and Caller Identification of Marmoset Vocalization

Abstract:Marmoset, a highly vocalized primate, has become a popular animal model for studying social-communicative behavior and its underlying mechanism. In the study of vocal communication, it is vital to know the caller identities, call contents, and vocal exchanges. Previous work of a CNN has achieved a joint model for call segmentation, classification, and caller identification for marmoset vocalizations. However, the CNN has limitations in modeling long-range acoustic patterns; the Transformer architecture that has been shown to outperform CNNs, utilizes the self-attention mechanism that efficiently segregates information parallelly over long distances and captures the global structure of marmoset vocalization. We propose using the Transformer to jointly segment and classify the marmoset calls and identify the callers for each vocalization.

Via

Access Paper or Ask Questions

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

Jul 22, 2024

Xi Chen, Songyang Zhang, Qibing Bai, Kai Chen, Satoshi Nakamura

Abstract:We introduces LLaST, a framework for building high-performance Large Language model based Speech-to-text Translation systems. We address the limitations of end-to-end speech translation(E2E ST) models by exploring model architecture design and optimization techniques tailored for LLMs. Our approach includes LLM-based speech translation architecture design, ASR-augmented training, multilingual data augmentation, and dual-LoRA optimization. Our approach demonstrates superior performance on the CoVoST-2 benchmark and showcases exceptional scaling capabilities powered by LLMs. We believe this effective method will serve as a strong baseline for speech translation and provide insights for future improvements of the LLM-based speech translation framework. We release the data, code and models in https://github.com/openaudiolab/LLaST.

Via

Access Paper or Ask Questions

A Word Order Synchronization Metric for Evaluating Simultaneous Interpretation and Translation

Jul 09, 2024

Mana Makinae, Katsuhito Sudoh, Mararu Yamada, Satoshi Nakamura

Figure 1 for A Word Order Synchronization Metric for Evaluating Simultaneous Interpretation and Translation

Figure 2 for A Word Order Synchronization Metric for Evaluating Simultaneous Interpretation and Translation

Figure 3 for A Word Order Synchronization Metric for Evaluating Simultaneous Interpretation and Translation

Figure 4 for A Word Order Synchronization Metric for Evaluating Simultaneous Interpretation and Translation

Abstract:Simultaneous interpretation (SI), the translation of one language to another in real time, starts translation before the original speech has finished. Its evaluation needs to consider both latency and quality. This trade-off is challenging especially for distant word order language pairs such as English and Japanese. To handle this word order gap, interpreters maintain the word order of the source language as much as possible to keep up with original language to minimize its latency while maintaining its quality, whereas in translation reordering happens to keep fluency in the target language. This means outputs synchronized with the source language are desirable based on the real SI situation, and it's a key for further progress in computational SI and simultaneous machine translation (SiMT). In this work, we propose an automatic evaluation metric for SI and SiMT focusing on word order synchronization. Our evaluation metric is based on rank correlation coefficients, leveraging cross-lingual pre-trained language models. Our experimental results on NAIST-SIC-Aligned and JNPC showed our metrics' effectiveness to measure word order synchronization between source and target language.

Via

Access Paper or Ask Questions

NAIST Simultaneous Speech Translation System for IWSLT 2024

Jun 30, 2024

Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Haotian Tan, Makoto Sakai, Sakriani Sakti(+2 more)

Figure 1 for NAIST Simultaneous Speech Translation System for IWSLT 2024

Figure 2 for NAIST Simultaneous Speech Translation System for IWSLT 2024

Figure 3 for NAIST Simultaneous Speech Translation System for IWSLT 2024

Figure 4 for NAIST Simultaneous Speech Translation System for IWSLT 2024

Abstract:This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign: English-to-{German, Japanese, Chinese} speech-to-text translation and English-to-Japanese speech-to-speech translation. We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART. We trained this model with two decoding policies, Local Agreement (LA) and AlignAtt. The submitted models employ the LA policy because it outperformed the AlignAtt policy in previous models. Our speech-to-speech translation method is a cascade of the above speech-to-text model and an incremental text-to-speech (TTS) module that incorporates a phoneme estimation model, a parallel acoustic model, and a parallel WaveGAN vocoder. We improved our incremental TTS by applying the Transformer architecture with the AlignAtt policy for the estimation model. The results show that our upgraded TTS module contributed to improving the system performance.

* IWSLT 2024 system paper

Via

Access Paper or Ask Questions

LLMs Are Zero-Shot Context-Aware Simultaneous Translators

Jun 19, 2024

Roman Koshkin, Katsuhito Sudoh, Satoshi Nakamura

Figure 1 for LLMs Are Zero-Shot Context-Aware Simultaneous Translators

Figure 2 for LLMs Are Zero-Shot Context-Aware Simultaneous Translators

Figure 3 for LLMs Are Zero-Shot Context-Aware Simultaneous Translators

Figure 4 for LLMs Are Zero-Shot Context-Aware Simultaneous Translators

Abstract:The advent of transformers has fueled progress in machine translation. More recently large language models (LLMs) have come to the spotlight thanks to their generality and strong performance in a wide range of language tasks, including translation. Here we show that open-source LLMs perform on par with or better than some state-of-the-art baselines in simultaneous machine translation (SiMT) tasks, zero-shot. We also demonstrate that injection of minimal background information, which is easy with an LLM, brings further performance gains, especially on challenging technical subject-matter. This highlights LLMs' potential for building next generation of massively multilingual, context-aware and terminologically accurate SiMT systems that require no resource-intensive training or fine-tuning.

Via

Access Paper or Ask Questions

Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory

Jun 13, 2024

Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura

Abstract:This study examines the effect of grammatical features in automatic essay scoring (AES). We use two kinds of grammatical features as input to an AES model: (1) grammatical items that writers used correctly in essays, and (2) the number of grammatical errors. Experimental results show that grammatical features improve the performance of AES models that predict the holistic scores of essays. Multi-task learning with the holistic and grammar scores, alongside using grammatical features, resulted in a larger improvement in model performance. We also show that a model using grammar abilities estimated using Item Response Theory (IRT) as the labels for the auxiliary task achieved comparable performance to when we used grammar scores assigned by human raters. In addition, we weight the grammatical features using IRT to consider the difficulty of grammatical items and writers' grammar abilities. We found that weighting grammatical features with the difficulty led to further improvement in performance.

* Accepted to BEA2024

Via

Access Paper or Ask Questions

Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation

Jun 13, 2024

Kosuke Doi, Yuka Ko, Mana Makinae, Katsuhito Sudoh, Satoshi Nakamura

Abstract:This paper analyzes the features of monotonic translations, which follow the word order of the source language, in simultaneous interpreting (SI). The word order differences are one of the biggest challenges in SI, especially for language pairs with significant structural differences like English and Japanese. We analyzed the characteristics of monotonic translations using the NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation Dataset and found some grammatical structures that make monotonic translation difficult in English-Japanese SI. We further investigated the features of monotonic translations through evaluating the output from the existing speech translation (ST) and simultaneous speech translation (simulST) models on NAIST English-to-Japanese Chunk-wise Monotonic Translation Evaluation Dataset as well as on existing test sets. The results suggest that the existing SI-based test set underestimates the model performance. We also found that the monotonic-translation-based dataset would better evaluate simulST models, while using an offline-based test set for evaluating simulST models underestimates the model performance.

* Accepted to IWSLT2024

Via

Access Paper or Ask Questions