Yifan Liu

Peter

Elaborative Subtopic Query Reformulation for Broad and Indirect Queries in Travel Destination Recommendation

Oct 02, 2024

Qianfeng Wen, Yifan Liu, Joshua Zhang, George Saad, Anton Korikov, Yury Sambale, Scott Sanner

Abstract: In Query-driven Travel Recommender Systems (RSs), it is crucial to understand the user intent behind challenging natural language (NL) destination queries such as the broadly worded "youth-friendly activities" or the indirect description "a high school graduation trip". Such queries are challenging due to the wide scope and subtlety of potential user intents that confound the ability of retrieval methods to infer relevant destinations from available textual descriptions such as WikiVoyage. While query reformulation (QR) has proven effective in enhancing retrieval by addressing user intent, existing QR methods tend to focus only on expanding the range of potentially matching query subtopics (breadth) or elaborating on the potential meaning of a query (depth), but not both. In this paper, we introduce Elaborative Subtopic Query Reformulation (EQR), a large language model-based QR method that combines both breadth and depth by generating potential query subtopics with information-rich elaborations. We also release TravelDest, a novel dataset for query-driven travel destination RSs. Experiments on TravelDest show that EQR achieves significant improvements in recall and precision over existing state-of-the-art QR methods.

* 9 pages, 7 figures, The 1st Workshop on Risks, Opportunities, and Evaluation of Generative Models in Recommender Systems (ROEGEN@RecSys 2024), October 2024, Bari, Italy
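
The breadth-plus-depth idea in the EQR abstract above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the subtopics and elaborations are hard-coded stand-ins for LLM output, the destination texts are invented, and the token-overlap score with max aggregation is a placeholder, not the retriever or scoring used in the paper.

```python
from collections import Counter

# Hypothetical LLM output for the query "youth-friendly activities":
# each subtopic (breadth) is paired with an elaboration (depth).
SUBTOPICS = {
    "theme parks": "roller coasters and water rides for teenagers",
    "outdoor adventure": "hiking trails, kayaking and zip lines for young travellers",
}

# Invented destination descriptions standing in for a WikiVoyage-style corpus.
DESTINATIONS = {
    "Orlando": "famous theme parks with roller coasters and water rides",
    "Queenstown": "outdoor adventure hub with hiking trails, kayaking and zip lines",
    "Geneva": "banking centre with watch museums and diplomatic institutions",
}


def tokens(text):
    """Lower-case bag of words; commas are treated as separators."""
    return Counter(text.lower().replace(",", " ").split())


def overlap(a, b):
    """Number of shared tokens between two texts (toy relevance score)."""
    return sum((tokens(a) & tokens(b)).values())


def rank(subtopics, destinations):
    """Rank destinations by their best-matching elaborated subtopic."""
    scores = {
        name: max(overlap(f"{sub} {elab}", desc) for sub, elab in subtopics.items())
        for name, desc in destinations.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank(SUBTOPICS, DESTINATIONS)
print(ranking)
```

Taking the maximum over subtopics lets a destination rank highly when it matches any one interpretation of the broad query; a sum or weighted average would be an equally plausible choice.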

Human Mobility Modeling with Limited Information via Large Language Models

Sep 26, 2024

Yifan Liu, Xishun Liao, Haoxuan Ma, Brian Yueshuai He, Chris Stanford, Jiaqi Ma

Abstract: Understanding human mobility patterns has traditionally been a complex challenge in transportation modeling. Due to the difficulties in obtaining high-quality training datasets across diverse locations, conventional activity-based models and learning-based human mobility modeling algorithms are particularly limited by the availability and quality of datasets. Furthermore, current research mainly focuses on the spatial-temporal travel pattern but lacks an understanding of the semantic information between activities, which is crucial for modeling the interdependence between activities. In this paper, we propose an innovative Large Language Model (LLM) empowered human mobility modeling framework. Our proposed approach significantly reduces the reliance on detailed human mobility statistical data, utilizing basic socio-demographic information of individuals to generate their daily mobility patterns. We have validated our results using the NHTS and SCAG-ABM datasets, demonstrating the effective modeling of mobility patterns and the strong adaptability of our framework across various geographic locations.

Enhancing Socially-Aware Robot Navigation through Bidirectional Natural Language Conversation

Sep 08, 2024

Congcong Wen, Yifan Liu, Geeta Chandra Raju Bethala, Zheng Peng, Hui Lin, Yu-Shen Liu, Yi Fang

Abstract: Robot navigation is an important research field with applications in various domains. However, traditional approaches often prioritize efficiency and obstacle avoidance, neglecting a nuanced understanding of human behavior or intent in shared spaces. With the rise of service robots, there's an increasing emphasis on endowing robots with the capability to navigate and interact in complex real-world environments. Socially aware navigation has recently become a key research area. However, existing work either predicts pedestrian movements or simply emits alert signals to pedestrians, falling short of facilitating genuine interactions between humans and robots. In this paper, we introduce the Hybrid Soft Actor-Critic with Large Language Model (HSAC-LLM), an innovative model designed for socially-aware navigation in robots. This model seamlessly integrates deep reinforcement learning with large language models, enabling it to predict both continuous and discrete actions for navigation. Notably, HSAC-LLM facilitates bidirectional interaction based on natural language with pedestrian models. When a potential collision with pedestrians is detected, the robot can initiate or respond to communications with pedestrians, obtaining and executing subsequent avoidance strategies. Experimental results in 2D simulation, the Gazebo environment, and the real-world environment demonstrate that HSAC-LLM not only efficiently enables interaction with humans but also exhibits superior performance in navigation and obstacle avoidance compared to state-of-the-art DRL algorithms. We believe this innovative paradigm opens up new avenues for effective and socially aware human-robot interactions in dynamic environments. Videos are available at https://hsacllm.github.io/.

When 3D Partial Points Meets SAM: Tooth Point Cloud Segmentation with Sparse Labels

Sep 03, 2024

Yifan Liu, Wuyang Li, Cheng Wang, Hui Chen, Yixuan Yuan

Abstract: Tooth point cloud segmentation is a fundamental task in many orthodontic applications. Current research mainly focuses on fully supervised learning which demands expensive and tedious manual point-wise annotation. Although recent weakly-supervised alternatives are proposed to use weak labels for 3D segmentation and achieve promising results, they tend to fail when the labels are extremely sparse. Inspired by the powerful promptable segmentation capability of the Segment Anything Model (SAM), we propose a framework named SAMTooth that leverages such capacity to complement the extremely sparse supervision. To automatically generate appropriate point prompts for SAM, we propose a novel Confidence-aware Prompt Generation strategy, where coarse category predictions are aggregated with confidence-aware filtering. Furthermore, to fully exploit the structural and shape clues in SAM's outputs for assisting the 3D feature learning, we advance a Mask-guided Representation Learning that re-projects the generated tooth masks of SAM into 3D space and constrains these points of different teeth to possess distinguished representations. To demonstrate the effectiveness of the framework, we conduct experiments on the public dataset and surprisingly find with only 0.1% annotations (one point per tooth), our method can surpass recent weakly supervised methods by a large margin, and the performance is even comparable to the recent fully-supervised methods, showcasing the significant potential of applying SAM to 3D perception tasks with sparse labels. Code is available at https://github.com/CUHK-AIM-Group/SAMTooth.

* To appear at MICCAI24
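
The abstract above describes aggregating coarse category predictions with confidence-aware filtering to obtain point prompts. A minimal sketch of one way that could look follows; it is not taken from the SAMTooth code. The function name `point_prompts`, the 0.8 threshold, the input layout, and the use of a per-class centroid as the aggregate are all assumptions for illustration.

```python
from collections import defaultdict


def point_prompts(points, threshold=0.8):
    """Return one 2D point prompt per predicted class.

    points: iterable of (x, y, predicted_class, confidence), where (x, y)
    is assumed to be the point's projection into the image plane.
    Points below the confidence threshold are discarded (the filtering
    step); the survivors of each class are averaged (the aggregation
    step, here a plain centroid, which is an assumption).
    """
    kept = defaultdict(list)
    for x, y, cls, conf in points:
        if conf >= threshold:
            kept[cls].append((x, y))
    return {
        cls: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for cls, pts in kept.items()
    }


# Hypothetical coarse predictions for two teeth (classes 1 and 2).
coarse = [
    (0.0, 0.0, 1, 0.90),
    (2.0, 2.0, 1, 0.95),
    (10.0, 10.0, 1, 0.30),  # low confidence, filtered out
    (5.0, 5.0, 2, 0.85),
]
prompts = point_prompts(coarse)
print(prompts)
```

Filtering before averaging keeps a single low-confidence outlier from dragging the prompt away from the tooth it is meant to mark.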

Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions

Aug 27, 2024

Yifan Liu, Yike Li, Dong Wang

Abstract: Media bias significantly shapes public perception by reinforcing stereotypes and exacerbating societal divisions. Prior research has often focused on isolated media bias dimensions such as political bias or racial bias, neglecting the complex interrelationships among various bias dimensions across different topic domains. Moreover, we observe that models trained on existing media bias benchmarks fail to generalize effectively on recent social media posts, particularly in certain bias identification tasks. This shortfall primarily arises because these benchmarks do not adequately reflect the rapidly evolving nature of social media content, which is characterized by shifting user behaviors and emerging trends. In response to these limitations, our research introduces a novel dataset collected from YouTube and Reddit over the past five years. Our dataset includes automated annotations for YouTube content across a broad spectrum of bias dimensions, such as gender, racial, and political biases, as well as hate speech, among others. It spans diverse domains including politics, sports, healthcare, education, and entertainment, reflecting the complex interplay of biases across different societal sectors. Through comprehensive statistical analysis, we identify significant differences in bias expression patterns and intra-domain bias correlations across these domains. By utilizing our understanding of the correlations among various bias dimensions, we lay the groundwork for creating advanced systems capable of detecting multiple biases simultaneously. Overall, our dataset advances the field of media bias identification, contributing to the development of tools that promote fairer media consumption. The comprehensive awareness of existing media bias fosters more ethical journalism, promotes cultural sensitivity, and supports a more informed and equitable public discourse.

* Accepted to ASONAM 2024

AI Transparency in Academic Search Systems: An Initial Exploration

Aug 02, 2024

Yifan Liu, Peter Sullivan, Luanne Sinnamon

Abstract: As AI-enhanced academic search systems become increasingly popular among researchers, investigating their AI transparency is crucial to ensure trust in the search outcomes, as well as the reliability and integrity of scholarly work. This study employs a qualitative content analysis approach to examine the websites of a sample of 10 AI-enhanced academic search systems identified through university library guides. The assessed level of transparency varies across these systems: five provide detailed information about their mechanisms, three offer partial information, and two provide little to no information. These findings indicate that the academic community is recommending and using tools with opaque functionalities, raising concerns about research integrity, including issues of reproducibility and researcher responsibility.

HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

Jul 17, 2024

Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng(+1 more)

Abstract: Generating human-object interactions (HOIs) is critical with the tremendous advances of digital avatars. Existing datasets are typically limited to humans interacting with a single object while neglecting the ubiquitous manipulation of multiple objects. Thus, we propose HIMO, a large-scale MoCap dataset of full-body human interacting with multiple objects, containing 3.3K 4D HOI sequences and 4.08M 3D HOI frames. We also annotate HIMO with detailed textual descriptions and temporal segments, benchmarking two novel tasks of HOI synthesis conditioned on either the whole text prompt or the segmented text prompts as fine-grained timeline control. To address these novel tasks, we propose a dual-branch conditional diffusion model with a mutual interaction module for HOI synthesis. Besides, an auto-regressive generation pipeline is also designed to obtain smooth transitions between HOI segments. Experimental results demonstrate the generalization ability to unseen object geometries and temporal compositions.

* Project page: https://lvxintao.github.io/himo, accepted by ECCV 2024

GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

Jul 08, 2024

Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan

Abstract: Recent advances in learning multi-modal representation have witnessed success in biomedical domains. While established techniques enable handling multi-modal information, challenges arise when they are extended to various clinical modalities and practical modality-missing settings due to the inherent modality gaps. To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o), which embeds the numerous disparate clinical modalities into a unified representation, completes the deficient embedding of missing modality and reformulates the cross-modal learning with a graph-based aggregation. Specifically, we establish a heterogeneous graph embedding to explicitly capture the diverse semantic properties on both the modality-specific features (nodes) and the cross-modal relations (edges). Then, we design a modality-prompted completion that enables completing the inadequate graph representation of missing modality through a graph prompting mechanism, which generates hallucination graphic topologies to steer the missing embedding towards the intact representation. Through the completed graph, we meticulously develop a knowledge-guided hierarchical cross-modal aggregation consisting of a global meta-path neighbouring to uncover the potential heterogeneous neighbors along the pathways driven by domain knowledge, and a local multi-relation aggregation module for the comprehensive cross-modal interaction across various heterogeneous relations. We assess the efficacy of our methodology on rigorous benchmarking experiments against prior state-of-the-arts. In a nutshell, GTP-4o presents an initial foray into the intriguing realm of embedding, relating and perceiving the heterogeneous patterns from various clinical modalities holistically via a graph theory. Project page: https://gtp-4-o.github.io/.

* Accepted by ECCV2024

EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting

Jul 01, 2024

Chenxin Li, Brandon Y. Feng, Yifan Liu, Hengyu Liu, Cheng Wang, Weihao Yu, Yixuan Yuan

Abstract: 3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually the case in real-world clinical scenarios. To tackle this sparsity challenge, we propose a framework leveraging the prior knowledge from multiple foundation models during the reconstruction process, dubbed as EndoSparse. Experimental results indicate that our proposed strategy significantly improves the geometric and appearance quality under challenging sparse-view conditions, including using only three views. In rigorous benchmarking experiments against state-of-the-art methods, EndoSparse achieves superior results in terms of accurate geometry, realistic appearance, and rendering efficiency, confirming the robustness to sparse-view limitations in endoscopic reconstruction. EndoSparse signifies a steady step towards the practical deployment of neural 3D reconstruction in real-world clinical scenarios. Project page: https://endo-sparse.github.io/.

* Accepted by MICCAI2024

GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting

Jul 01, 2024

Chenxin Li, Hengyu Liu, Zhiwen Fan, Wuyang Li, Yifan Liu, Panwang Pan, Yixuan Yuan

Abstract: Recent advancements in large generative models and real-time neural rendering using point-based techniques pave the way for a future of widespread visual data distribution through sharing synthesized 3D assets. However, while standardized methods for embedding proprietary or copyright information, either overtly or subtly, exist for conventional visual content such as images and videos, this issue remains unexplored for emerging generative 3D formats like Gaussian Splatting. We present GaussianStego, a method for embedding steganographic information in the rendering of generated 3D assets. Our approach employs an optimization framework that enables the accurate extraction of hidden information from images rendered using Gaussian assets derived from large models, while maintaining their original visual quality. We conduct preliminary evaluations of our method across several potential deployment scenarios and discuss issues identified through analysis. GaussianStego represents an initial exploration into the novel challenge of embedding customizable, imperceptible, and recoverable information within the renders produced by current 3D generative models, while ensuring minimal impact on the rendered content's quality.

* Project website: https://gaussian-stego.github.io/
