Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Amit Barman

From Scratch to Fine-Tuned: A Comparative Study of Transformer Training Strategies for Legal Machine Translation

Dec 21, 2025

Amit Barman, Atanu Mandal, Sudip Kumar Naskar

Abstract:In multilingual nations like India, access to legal information is often hindered by language barriers, as much of the legal and judicial documentation remains in English. Legal Machine Translation (L-MT) offers a scalable solution to this challenge by enabling accurate and accessible translations of legal documents. This paper presents our work for the JUST-NLP 2025 Legal MT shared task, focusing on English-Hindi translation using Transformer-based approaches. We experiment with 2 complementary strategies, fine-tuning a pre-trained OPUS-MT model for domain-specific adaptation and training a Transformer model from scratch using the provided legal corpus. Performance is evaluated using standard MT metrics, including SacreBLEU, chrF++, TER, ROUGE, BERTScore, METEOR, and COMET. Our fine-tuned OPUS-MT model achieves a SacreBLEU score of 46.03, significantly outperforming both baseline and from-scratch models. The results highlight the effectiveness of domain adaptation in enhancing translation quality and demonstrate the potential of L-MT systems to improve access to justice and legal transparency in multilingual contexts.

Via

Access Paper or Ask Questions

Convolutional Neural Networks can achieve binary bail judgement classification

Jan 25, 2024

Amit Barman, Devangan Roy, Debapriya Paul, Indranil Dutta, Shouvik Kumar Guha, Samir Karmakar, Sudip Kumar Naskar

Figure 1 for Convolutional Neural Networks can achieve binary bail judgement classification

Figure 2 for Convolutional Neural Networks can achieve binary bail judgement classification

Figure 3 for Convolutional Neural Networks can achieve binary bail judgement classification

Figure 4 for Convolutional Neural Networks can achieve binary bail judgement classification

Abstract:There is an evident lack of implementation of Machine Learning (ML) in the legal domain in India, and any research that does take place in this domain is usually based on data from the higher courts of law and works with English data. The lower courts and data from the different regional languages of India are often overlooked. In this paper, we deploy a Convolutional Neural Network (CNN) architecture on a corpus of Hindi legal documents. We perform a bail Prediction task with the help of a CNN model and achieve an overall accuracy of 93\% which is an improvement on the benchmark accuracy, set by Kapoor et al. (2022), albeit in data from 20 districts of the Indian state of Uttar Pradesh.

* Accepted on 20th International Conference on Natural Language Processing (ICON)

Via

Access Paper or Ask Questions

Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech Detection

Jan 19, 2024

Atanu Mandal, Gargi Roy, Amit Barman, Indranil Dutta, Sudip Kumar Naskar

Figure 1 for Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech Detection

Figure 2 for Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech Detection

Figure 3 for Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech Detection

Figure 4 for Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech Detection

Abstract:With the recent surge and exponential growth of social media usage, scrutinizing social media content for the presence of any hateful content is of utmost importance. Researchers have been diligently working since the past decade on distinguishing between content that promotes hatred and content that does not. Traditionally, the main focus has been on analyzing textual content. However, recent research attempts have also commenced into the identification of audio-based content. Nevertheless, studies have shown that relying solely on audio or text-based content may be ineffective, as recent upsurge indicates that individuals often employ sarcasm in their speech and writing. To overcome these challenges, we present an approach to identify whether a speech promotes hate or not utilizing both audio and textual representations. Our methodology is based on the Transformer framework that incorporates both audio and text sampling, accompanied by our very own layer called "Attentive Fusion". The results of our study surpassed previous state-of-the-art techniques, achieving an impressive macro F1 score of 0.927 on the Test Set.

* Accepted in 20th International Conference on Natural Language Processing (ICON)

Via

Access Paper or Ask Questions