Tan-Minh Nguyen

RMDM: A Multilabel Fakenews Dataset for Vietnamese Evidence Verification

Sep 16, 2023
Hai-Long Nguyen, Thi-Kieu-Trang Pham, Thai-Son Le, Tan-Minh Nguyen, Thi-Hai-Yen Vuong, Ha-Thanh Nguyen

In this study, we present a novel and challenging multilabel Vietnamese dataset (RMDM) designed to assess the performance of large language models (LLMs) in verifying electronic information related to legal contexts, focusing on fake news as potential input for electronic evidence. The RMDM dataset comprises four labels: real, mis, dis, and mal, representing real information, misinformation, disinformation, and mal-information, respectively. By including these diverse labels, RMDM captures the complexities of different fake news categories and offers insights into the abilities of different language models to handle various types of information that could be part of electronic evidence. The dataset consists of a total of 1,556 samples, with 389 samples for each label. Preliminary tests on the dataset using GPT-based and BERT-based models reveal variations in the models' performance across different labels, indicating that the dataset effectively challenges the ability of various language models to verify the authenticity of such information. Our findings suggest that verifying electronic information related to legal contexts, including fake news, remains a difficult problem for language models, warranting further attention from the research community to advance toward more reliable AI models for potential legal applications.

* ISAILD@KSE 2023 
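
The abstract above reports that model performance varies across the four labels. A minimal sketch of how such a per-label comparison could be computed is given below. It assumes each sample carries exactly one of the four labels, and the gold and predicted labels are made-up toy values, not results from the paper; the function name `per_label_f1` is hypothetical.

```python
from collections import Counter

# The four RMDM labels named in the abstract.
LABELS = ["real", "mis", "dis", "mal"]

def per_label_f1(gold, pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in LABELS:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

# Toy, made-up labels purely for illustration (assumption, not paper data).
gold = ["real", "mis", "dis", "mal", "real", "mis"]
pred = ["real", "dis", "dis", "mal", "mis", "mis"]

scores = per_label_f1(gold, pred)
macro_f1 = sum(scores.values()) / len(LABELS)
```

Reporting the per-label scores next to the macro average makes it visible when a model handles one category (for example `mal`) well while confusing others.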

NOWJ1@ALQAC 2023: Enhancing Legal Task Performance with Classic Statistical Models and Pre-trained Language Models

Sep 16, 2023
Tan-Minh Nguyen, Xuan-Hoa Nguyen, Ngoc-Duy Mai, Minh-Quan Hoang, Van-Huan Nguyen, Hoang-Viet Nguyen, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong

This paper describes the NOWJ1 Team's approach for the Automated Legal Question Answering Competition (ALQAC) 2023, which focuses on enhancing legal task performance by integrating classical statistical models and pre-trained language models (PLMs). For the document retrieval task, we implement a pre-processing step to overcome input limitations and apply learning-to-rank methods to consolidate features from various models. The question-answering task is split into two sub-tasks: sentence classification and answer extraction. We incorporate state-of-the-art models to develop distinct systems for each sub-task, utilizing both classical statistical models and PLMs. Experimental results demonstrate the promising potential of our proposed methodology in the competition.

* ISAILD@KSE 2023 
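
The retrieval step above consolidates features from several models. The sketch below illustrates that general idea with a fixed weighted sum of min-max normalized scores, which is a simpler stand-in for learning-to-rank (where the weights would be learned from relevance judgments). The feature names, weights, document IDs, and scores are all hypothetical.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(feature_scores, weights):
    """Rank documents by a weighted sum of normalized per-model scores."""
    normalized = {name: min_max(s) for name, s in feature_scores.items()}
    docs = set().union(*(s.keys() for s in feature_scores.values()))
    fused = {
        doc: sum(weights[name] * normalized[name].get(doc, 0.0) for name in normalized)
        for doc in docs
    }
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))

# Made-up scores from a lexical model and a neural model (assumptions).
feature_scores = {
    "lexical": {"art_1": 12.0, "art_2": 7.0, "art_3": 2.0},
    "neural": {"art_1": 0.2, "art_2": 0.9, "art_3": 0.1},
}
weights = {"lexical": 0.4, "neural": 0.6}  # hand-picked, not learned

ranking = fuse(feature_scores, weights)
```

Normalizing before combining matters because lexical and neural scores live on different scales; without it, the model with the larger raw range would dominate the sum.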

Constructing a Knowledge Graph for Vietnamese Legal Cases with Heterogeneous Graphs

Sep 16, 2023
Thi-Hai-Yen Vuong, Minh-Quan Hoang, Tan-Minh Nguyen, Hoang-Trung Nguyen, Ha-Thanh Nguyen

This paper presents a knowledge graph construction method for legal case documents and related laws, aiming to organize legal information efficiently and enhance various downstream tasks. Our approach consists of three main steps: data crawling, information extraction, and knowledge graph deployment. First, the data crawler collects a large corpus of legal case documents and related laws from various sources, providing a rich database for further processing. Next, the information extraction step employs natural language processing techniques to extract entities such as courts, cases, domains, and laws, as well as their relationships from the unstructured text. Finally, the knowledge graph is deployed, connecting these entities based on their extracted relationships, creating a heterogeneous graph that effectively represents legal information and caters to users such as lawyers, judges, and scholars. The established baseline model leverages unsupervised learning methods, and by incorporating the knowledge graph, it demonstrates the ability to identify relevant laws for a given legal case. This approach opens up opportunities for various applications in the legal domain, such as legal case analysis, legal recommendation, and decision support.

* ISAILD@KSE 2023 
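
The heterogeneous graph described above connects typed entities (courts, cases, domains, laws) through extracted relationships. A minimal in-memory sketch of such a structure follows. The class name `HeteroGraph`, the relation names (`heard_by`, `in_domain`, `cites`), and all node IDs are hypothetical illustrations, not the schema used in the paper.

```python
from collections import defaultdict

class HeteroGraph:
    """Tiny heterogeneous graph: typed nodes and typed, directed edges."""

    def __init__(self):
        self.node_type = {}             # node_id -> entity type
        self.edges = defaultdict(set)   # (source, relation) -> set of targets

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, source, relation, target):
        if source not in self.node_type or target not in self.node_type:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(source, relation)].add(target)

    def neighbors(self, source, relation, node_type=None):
        """Targets reachable from `source` via `relation`, optionally filtered by type."""
        targets = self.edges.get((source, relation), set())
        return sorted(
            t for t in targets if node_type is None or self.node_type[t] == node_type
        )

# Made-up entities using the four entity types named in the abstract.
graph = HeteroGraph()
graph.add_node("case:001", "case")
graph.add_node("court:A", "court")
graph.add_node("domain:civil", "domain")
graph.add_node("law:L1", "law")
graph.add_node("law:L2", "law")
graph.add_edge("case:001", "heard_by", "court:A")
graph.add_edge("case:001", "in_domain", "domain:civil")
graph.add_edge("case:001", "cites", "law:L1")
graph.add_edge("case:001", "cites", "law:L2")

related_laws = graph.neighbors("case:001", "cites", node_type="law")
```

Keeping the relation as part of the edge key makes lookups such as "which laws does this case cite" a single dictionary access, which is the kind of query the relevant-law identification task relies on.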

NOWJ at COLIEE 2023 -- Multi-Task and Ensemble Approaches in Legal Information Processing

Jun 08, 2023
Thi-Hai-Yen Vuong, Hai-Long Nguyen, Tan-Minh Nguyen, Hoang-Trung Nguyen, Thai-Binh Nguyen, Ha-Thanh Nguyen

This paper presents the NOWJ team's approach to the COLIEE 2023 Competition, which focuses on advancing legal information processing techniques and applying them to real-world legal scenarios. Our team tackles the four tasks in the competition, which involve legal case retrieval, legal case entailment, statute law retrieval, and legal textual entailment. We employ state-of-the-art machine learning models and innovative approaches, such as BERT, Longformer, the BM25 ranking algorithm, and multi-task learning models. Although our team did not achieve state-of-the-art results, our findings provide valuable insights and pave the way for future improvements in legal information processing.

* COLIEE 2023 
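
BM25 is one of the ranking components named above. The following is a minimal sketch of the standard Okapi BM25 scoring formula, assuming whitespace tokenization and common default parameters (`k1=1.5`, `b=0.75`). The function name, the parameter values, and the toy documents are assumptions for illustration and say nothing about how the team configured their system.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            f = term_freq[term]
            if f == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Length normalization: longer documents are penalized via b.
            norm = f + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up toy corpus (assumption, not COLIEE data).
docs = [
    "the tenant must pay rent monthly",
    "the court dismissed the appeal",
    "rent increases require written notice to the tenant",
]
scores = bm25_scores("tenant rent", docs)
ranking = sorted(range(len(docs)), key=lambda i: -scores[i])
```

In this toy corpus the first and third documents both contain each query term once, so the shorter first document ranks higher because of length normalization.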

LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and generative models for Vietnamese multi-document summarization

Apr 11, 2023
Tan-Minh Nguyen, Thai-Binh Nguyen, Hoang-Trung Nguyen, Hai-Long Nguyen, Tam Doan Thanh, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong

Multi-document summarization is challenging because the summaries should not only describe the most important information from all documents but also provide a coherent interpretation of the documents. This paper proposes a method for multi-document summarization based on cluster similarity. For the extractive method, we use a hybrid model based on a modified version of the PageRank algorithm and a text correlation mechanism. After generating summaries by selecting the most important sentences from each cluster, we apply BARTpho and ViT5 to construct the abstractive models. Both extractive and abstractive approaches were considered in this study. The proposed method achieves competitive results in the VLSP 2022 competition.

* In Proceedings of the 9th International Workshop on Vietnamese Language and Speech Processing (VLSP 2022) 
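
The extractive step above builds on PageRank over sentences. The sketch below shows plain (unmodified) PageRank on a sentence-similarity graph, using Jaccard word overlap as the edge weight. The paper's modifications and its text correlation mechanism are not specified in the abstract and are not reproduced here; the similarity measure, function names, and toy sentences are assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted adjacency matrix."""
    n = len(weights)
    rank = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        # Nodes with no outgoing weight spread their rank uniformly.
        dangling = sum(rank[j] for j in range(n) if out_sum[j] == 0)
        new_rank = []
        for i in range(n):
            incoming = sum(
                rank[j] * weights[j][i] / out_sum[j] for j in range(n) if out_sum[j] > 0
            )
            new_rank.append((1 - damping) / n + damping * (incoming + dangling / n))
        rank = new_rank
    return rank

# Made-up sentences standing in for one document cluster (assumption).
sentences = [
    "flood damaged roads in the province",
    "the province reported flood damage to roads and bridges",
    "bridges in the province were closed after the flood",
    "a local festival opens next week",
]
n = len(sentences)
weights = [
    [jaccard(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
    for i in range(n)
]
rank = pagerank(weights)
# Keep the two highest-ranked sentences, restored to document order.
summary_ids = sorted(sorted(range(n), key=lambda i: -rank[i])[:2])
```

The unrelated festival sentence shares no words with the others, so it receives the lowest rank and is left out of the extract, which could then be passed to an abstractive model.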