Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yen-Lung Tsai

Tree-Based Text Retrieval via Hierarchical Clustering in RAGFrameworks: Application on Taiwanese Regulations

Jun 16, 2025

Chia-Heng Yu, Yen-Lung Tsai

Abstract:Traditional Retrieval-Augmented Generation (RAG) systems employ brute-force inner product search to retrieve the top-k most similar documents, then combined with the user query and passed to a language model. This allows the model to access external knowledge and reduce hallucinations. However, selecting an appropriate k value remains a significant challenge in practical applications: a small k may fail to retrieve sufficient information, while a large k can introduce excessive and irrelevant content. To address this, we propose a hierarchical clustering-based retrieval method that eliminates the need to predefine k. Our approach maintains the accuracy and relevance of system responses while adaptively selecting semantically relevant content. In the experiment stage, we applied our method to a Taiwanese legal dataset with expert-graded queries. The results show that our approach achieves superior performance in expert evaluations and maintains high precision while eliminating the need to predefine k, demonstrating improved accuracy and interpretability in legal text retrieval tasks. Our framework is simple to implement and easily integrates with existing RAG pipelines, making it a practical solution for real-world applications under limited resources.

* 19 pages, 5 figures, Code available at https://github.com/arthur422tp/hierachical

Via

Access Paper or Ask Questions

Performance Prediction in Major League Baseball by Long Short-Term Memory Networks

Jun 20, 2022

Hsuan-Cheng Sun, Tse-Yu Lin, Yen-Lung Tsai

Abstract:Player performance prediction is a serious problem in every sport since it brings valuable future information for managers to make important decisions. In baseball industries, there already existed variable prediction systems and many types of researches that attempt to provide accurate predictions and help domain users. However, it is a lack of studies about the predicting method or systems based on deep learning. Deep learning models had proven to be the greatest solutions in different fields nowadays, so we believe they could be tried and applied to the prediction problem in baseball. Hence, the predicting abilities of deep learning models are set to be our research problem in this paper. As a beginning, we select numbers of home runs as the target because it is one of the most critical indexes to understand the power and the talent of baseball hitters. Moreover, we use the sequential model Long Short-Term Memory as our main method to solve the home run prediction problem in Major League Baseball. We compare models' ability with several machine learning models and a widely used baseball projection system, sZymborski Projection System. Our results show that Long Short-Term Memory has better performance than others and has the ability to make more exact predictions. We conclude that Long Short-Term Memory is a feasible way for performance prediction problems in baseball and could bring valuable information to fit users' needs.

* 25 pages, 1 figures, 18 tables

Via

Access Paper or Ask Questions

An Application of HodgeRank to Online Peer Assessment

Mar 07, 2018

Tse-Yu Lin, Yen-Lung Tsai

Figure 1 for An Application of HodgeRank to Online Peer Assessment

Abstract:Bias and heterogeneity in peer assessment can lead to the issue of unfair scoring in the educational field. To deal with this problem, we propose a reference ranking method for an online peer assessment system using HodgeRank. Such a scheme provides instructors with an objective scoring reference based on mathematics.

* 7 pages. To appear in MLRec 2018

Via

Access Paper or Ask Questions

Variational Grid Setting Network

Oct 26, 2017

Yu-Neng Chuang, Zi-Yu Huang, Yen-Lung Tsai

Figure 1 for Variational Grid Setting Network

Figure 2 for Variational Grid Setting Network

Figure 3 for Variational Grid Setting Network

Figure 4 for Variational Grid Setting Network

Abstract:We propose a new neural network architecture for automatic generation of missing characters in a Chinese font set. We call the neural network architecture the Variational Grid Setting Network which is based on the variational autoencoder (VAE) with some tweaks. The neural network model is able to generate missing characters relatively large in size ($256 \times 256$ pixels). Moreover, we show that one can use very few samples for training data set, and get a satisfied result.

* 8 pages, 6 figures

Via

Access Paper or Ask Questions