Jonghwi Kim

MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries

May 22, 2025

Jonghwi Kim, Deokhyung Kang, Seonjeong Hwang, Yunsu Kim, Jungseul Ok, Gary Lee

Abstract: Although bilingual speakers frequently use mixed-language queries in web search, Information Retrieval (IR) research on such queries remains scarce. To address this, we introduce MiLQ (Mixed-Language Query test set), the first public benchmark of mixed-language queries, which is confirmed to be realistic and highly preferred. Experiments show that multilingual IR models perform moderately on MiLQ and inconsistently across native, English, and mixed-language queries, and they suggest that code-switched training data could help build IR models that handle such queries robustly. Meanwhile, intentionally mixing English into queries proves to be an effective strategy for bilinguals searching English documents, which our analysis attributes to enhanced token matching compared to native queries.

* 16 pages, 9 figures
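
The token-matching explanation can be made concrete with a small sketch that measures how many query tokens literally appear in an English document. It assumes whitespace tokenization and a plain overlap ratio, which are illustrative simplifications rather than the paper's analysis; the functions `tokens` and `overlap_ratio`, the document, and both queries are hypothetical.

```python
def tokens(text: str) -> set[str]:
    # Lowercased whitespace tokenization (hypothetical simplification;
    # real IR models use subword tokenizers).
    return set(text.lower().split())


def overlap_ratio(query: str, document: str) -> float:
    # Fraction of distinct query tokens that also occur in the document.
    q = tokens(query)
    return len(q & tokens(document)) / len(q) if q else 0.0


# Hypothetical English document, plus a Korean-native query and a
# Korean-English mixed query expressing the same information need.
doc = "transformer attention mechanism explained with examples"
native_query = "트랜스포머 어텐션 메커니즘 설명"
mixed_query = "transformer attention mechanism 설명"

print(overlap_ratio(native_query, doc))  # 0.0
print(overlap_ratio(mixed_query, doc))   # 0.75
```

Under these assumptions the native query shares no surface tokens with the English document, while the mixed query shares three of its four tokens, which is the kind of lexical advantage the abstract points to.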

GuRE: Generative Query REwriter for Legal Passage Retrieval

May 19, 2025

Daehee Kim, Deokhyung Kang, Jonghwi Kim, Sangwon Ryu, Gary Geunbae Lee

Abstract: Legal Passage Retrieval (LPR) systems are crucial because they help practitioners save time when drafting legal arguments. However, LPR remains underexplored. One primary reason is the significant vocabulary mismatch between queries and target passages. To address this, we propose a simple yet effective method, the Generative query REwriter (GuRE), which leverages the generative capabilities of Large Language Models (LLMs) by training an LLM for query rewriting. The rewritten queries help retrievers find target passages by mitigating vocabulary mismatch. Experimental results show that GuRE significantly improves performance in a retriever-agnostic manner, outperforming all baseline methods. Further analysis reveals that different training objectives lead to distinct retrieval behaviors, making GuRE more suitable than direct retriever fine-tuning for real-world applications. Code is available at github.com/daehuikim/GuRE.

* 14 pages, 9 figures
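
The retriever-agnostic setup described above can be sketched as a pipeline in which the query is rewritten before any retriever sees it. This is not GuRE's implementation: `hypothetical_rewriter` is a fixed lookup table standing in for a fine-tuned LLM, `overlap_retriever` is a toy term-overlap ranker, and the query and passages are invented to show vocabulary mismatch.

```python
from typing import Callable, List

# Any function mapping (query, passages) to a ranking of passage indices.
Retriever = Callable[[str, List[str]], List[int]]


def overlap_retriever(query: str, passages: List[str]) -> List[int]:
    # Toy lexical retriever: rank passages by the number of shared lowercase terms.
    q = set(query.lower().split())
    scores = [len(q & set(p.lower().split())) for p in passages]
    return sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)


def hypothetical_rewriter(query: str) -> str:
    # Stand-in for a trained LLM rewriter (illustration only): maps a
    # layperson query to statutory vocabulary, else returns it unchanged.
    rewrites = {"can my landlord keep my deposit": "lessor retain security deposit"}
    return rewrites.get(query, query)


def retrieve_with_rewrite(
    query: str,
    passages: List[str],
    retriever: Retriever,
    rewriter: Callable[[str], str] = hypothetical_rewriter,
) -> List[int]:
    # The rewrite step is independent of the retriever, so any retriever can be plugged in.
    return retriever(rewriter(query), passages)


passages = [
    "lessor may retain the security deposit to cover damages",  # target passage
    "my landlord can keep records of rent payments",            # lexical distractor
]
query = "can my landlord keep my deposit"
print(overlap_retriever(query, passages))                         # [1, 0]
print(retrieve_with_rewrite(query, passages, overlap_retriever))  # [0, 1]
```

Because `retrieve_with_rewrite` only depends on the `Retriever` signature, swapping in a different retriever requires no change to the rewriting step, which mirrors the retriever-agnostic claim in spirit.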

Multi-Facet Blending for Faceted Query-by-Example Retrieval

Dec 02, 2024

Heejin Do, Sangwon Ryu, Jonghwi Kim, Gary Geunbae Lee

Abstract: With the growing demand to fit fine-grained user intents, faceted query-by-example (QBE), which retrieves similar documents conditioned on specific facets, has gained recent attention. However, due to the lack of facet-level relevance datasets, prior approaches mainly depend on document-level comparisons using basic indicators such as citations; this limits their use to citation-based domains and fails to capture the intricacies of facet constraints. In this paper, we propose a multi-facet blending (FaBle) augmentation method, which exploits modularity, decomposing and recomposing documents to explicitly synthesize facet-specific training sets. We automatically decompose documents into facet units and generate (ir)relevant pairs by leveraging LLMs' intrinsic ability to distinguish them; dynamically recomposing the units then yields facet-wise, relevance-informed document pairs. Our modularization eliminates the need for predefined facet knowledge or labels. Further, to demonstrate FaBle's efficacy in a new domain beyond citation-based scientific paper retrieval, we release a benchmark dataset for educational exam item QBE. FaBle augmentation on 1K documents remarkably improves training of facet-conditional embeddings.
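
The decompose-and-recompose idea can be sketched as follows, assuming documents are already decomposed into a dict of facet units and that the relevant/irrelevant unit lists stand in for LLM judgments. The facet names, `recompose`, `facet_pairs`, and all texts are hypothetical; this is one simple recomposition scheme, not FaBle's actual procedure.

```python
from typing import Dict, List, Tuple

# A decomposed document: facet name -> text unit (hypothetical format).
Doc = Dict[str, str]


def recompose(base: Doc, facet: str, unit: str) -> Doc:
    # Return a copy of `base` whose `facet` unit is replaced by `unit`.
    doc = dict(base)
    doc[facet] = unit
    return doc


def facet_pairs(anchor: Doc, other: Doc, facet: str,
                relevant: List[str], irrelevant: List[str]) -> List[Tuple[Doc, Doc, str, int]]:
    # Build (anchor, candidate, facet, label) tuples. Each candidate takes its
    # non-target facets from `other`, so the label reflects only `facet`:
    # 1 if the swapped-in unit was judged relevant to the anchor's unit, else 0.
    pairs = []
    for unit in relevant:
        pairs.append((anchor, recompose(other, facet, unit), facet, 1))
    for unit in irrelevant:
        pairs.append((anchor, recompose(other, facet, unit), facet, 0))
    return pairs


anchor = {"background": "Dense retrievers struggle with rare terms.",
          "method": "We add a lexical matching module to a bi-encoder.",
          "result": "Recall improves on long-tail queries."}
other = {"background": "Exam items need consistent difficulty.",
         "method": "Items are clustered by topic embeddings.",
         "result": "Clusters align with curriculum units."}
# Hypothetical LLM judgments for the "method" facet of `anchor`.
relevant = ["A bi-encoder is combined with sparse term matching."]
irrelevant = ["Documents are summarized before indexing."]

pairs = facet_pairs(anchor, other, "method", relevant, irrelevant)
for _, candidate, facet, label in pairs:
    print(facet, label, candidate["method"])
```

Mixing units from unrelated documents keeps the non-target facets uninformative, so a model trained on such pairs has to rely on the conditioned facet to predict relevance.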

Nonlinear Model Predictive Control with Obstacle Avoidance Constraints for Autonomous Navigation in a Canal Environment

Jul 19, 2023

Changyu Lee, Dongha Chung, Jonghwi Kim, Jinwhan Kim

Abstract: In this paper, we describe the development of autonomous navigation capabilities for a small cruise boat operating in a canal environment and present the results of a field experiment conducted in the Pohang Canal, South Korea. Nonlinear model predictive control (NMPC) was used for online trajectory planning and tracking control of the cruise boat in a narrow passage of the canal. To account for the nonlinear characteristics of the boat dynamics, system identification was performed using experimental data from various test maneuvers, such as acceleration-deceleration and zigzag trials. To efficiently represent the obstacle structures in the canal environment, we parameterized the canal walls as line segments derived from point cloud data captured by an onboard LiDAR sensor and used them as obstacle avoidance constraints. The proposed method was implemented in a single NMPC layer, and its real-world performance was verified through experimental runs in the Pohang Canal.
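
The wall representation above can be illustrated with a minimal sketch, assuming the LiDAR points have already been grouped by wall. Each group is fitted with a line segment (total least squares along the principal axis, an assumed fitting choice), and each segment yields one inequality value g_i = d_i - margin that an NMPC solver would keep nonnegative at every predicted step. The function names and the safety margin are hypothetical, and the optimizer itself is omitted.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_segment(points: List[Point]) -> Tuple[Point, Point]:
    # Fit a 2-D segment by total least squares: take the principal axis of
    # the points, then clip it to the extent of their projections.
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis angle of the scatter
    ux, uy = math.cos(theta), math.sin(theta)
    ts = [(p[0] - cx) * ux + (p[1] - cy) * uy for p in points]
    return (cx + min(ts) * ux, cy + min(ts) * uy), (cx + max(ts) * ux, cy + max(ts) * uy)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    # Euclidean distance from p to the closest point on segment ab.
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def wall_constraints(position: Point, segments: List[Tuple[Point, Point]],
                     safety_margin: float) -> List[float]:
    # One value per wall; the controller requires every value to be >= 0.
    return [point_segment_distance(position, a, b) - safety_margin for a, b in segments]


wall_points = [(0.0, 5.0), (2.5, 5.0), (5.0, 5.0), (7.5, 5.0), (10.0, 5.0)]
segment = fit_segment(wall_points)
print(segment)                                                     # ((0.0, 5.0), (10.0, 5.0))
print(wall_constraints((4.0, 2.0), [segment], safety_margin=1.5))  # [1.5]
```

Representing each wall by two endpoints keeps the number of constraints small (one per segment rather than one per LiDAR point), which is the efficiency argument the abstract makes for the line-segment parameterization.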

Pohang Canal Dataset: A Multimodal Maritime Dataset for Autonomous Navigation in Restricted Waters

Mar 09, 2023

Dongha Chung, Jonghwi Kim, Changyu Lee, Jinwhan Kim

Abstract: This paper presents a multimodal maritime dataset intended to facilitate autonomous navigation in restricted waters, along with the procedure used to collect it. The dataset comprises measurements obtained using various perception and navigation sensors, including a stereo camera, an infrared camera, an omnidirectional camera, three LiDARs, a marine radar, a global positioning system (GPS), and an attitude and heading reference system (AHRS). The data were collected along a 7.5-km-long route that includes a narrow canal, inner and outer ports, and near-coastal areas in Pohang, South Korea, under diverse weather and visual conditions. The dataset and its detailed description are available for free download at https://sites.google.com/view/pohang-canal-dataset.

* Submitted to IJRR as a data paper for review
