Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yifan Liu

Peter

LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction

Jun 23, 2024

Hengyu Liu, Yifan Liu, Chenxin Li, Wuyang Li, Yixuan Yuan

Abstract:The advent of 3D Gaussian Splatting (3D-GS) techniques and their dynamic scene modeling variants, 4D-GS, offers promising prospects for real-time rendering of dynamic surgical scenarios. However, the prerequisite for modeling dynamic scenes by a large number of Gaussian units, the high-dimensional Gaussian attributes and the high-resolution deformation fields, all lead to serve storage issues that hinder real-time rendering in resource-limited surgical equipment. To surmount these limitations, we introduce a Lightweight 4D Gaussian Splatting framework (LGS) that can liberate the efficiency bottlenecks of both rendering and storage for dynamic endoscopic reconstruction. Specifically, to minimize the redundancy of Gaussian quantities, we propose Deformation-Aware Pruning by gauging the impact of each Gaussian on deformation. Concurrently, to reduce the redundancy of Gaussian attributes, we simplify the representation of textures and lighting in non-crucial areas by pruning the dimensions of Gaussian attributes. We further resolve the feature field redundancy caused by the high resolution of 4D neural spatiotemporal encoder for modeling dynamic scenes via a 4D feature field condensation. Experiments on public benchmarks demonstrate efficacy of LGS in terms of a compression rate exceeding 9 times while maintaining the pleasing visual quality and real-time rendering efficiency. LGS confirms a substantial step towards its application in robotic surgical services.

* Accepted by MICCAI 2024. Project page: https://lgs-endo.github.io/

Via

Access Paper or Ask Questions

Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

Jun 14, 2024

Zhenrui Yue, Huimin Zeng, Lanyu Shang, Yifan Liu, Yang Zhang, Dong Wang

Figure 1 for Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

Figure 2 for Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

Figure 3 for Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

Figure 4 for Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

Abstract:The rapid propagation of misinformation poses substantial risks to public interest. To combat misinformation, large language models (LLMs) are adapted to automatically verify claim credibility. Nevertheless, existing methods heavily rely on the embedded knowledge within LLMs and / or black-box APIs for evidence collection, leading to subpar performance with smaller LLMs or upon unreliable context. In this paper, we propose retrieval augmented fact verification through the synthesis of contrasting arguments (RAFTS). Upon input claims, RAFTS starts with evidence retrieval, where we design a retrieval pipeline to collect and re-rank relevant documents from verifiable sources. Then, RAFTS forms contrastive arguments (i.e., supporting or refuting) conditioned on the retrieved evidence. In addition, RAFTS leverages an embedding model to identify informative demonstrations, followed by in-context prompting to generate the prediction and explanation. Our method effectively retrieves relevant documents as evidence and evaluates arguments from varying perspectives, incorporating nuanced information for fine-grained decision-making. Combined with informative in-context examples as prior, RAFTS achieves significant improvements to supervised and LLM baselines without complex prompts. We demonstrate the effectiveness of our method through extensive experiments, where RAFTS can outperform GPT-based methods with a significantly smaller 7B LLM.

* Accepted to ACL 2024

Via

Access Paper or Ask Questions

LLMs for User Interest Exploration: A Hybrid Approach

May 25, 2024

Jianling Wang, Haokai Lu, Yifan Liu, He Ma, Yueqi Wang, Yang Gu, Shuzhou Zhang, Ningren, Han, Shuchao Bi(+3 more)

Figure 1 for LLMs for User Interest Exploration: A Hybrid Approach

Figure 2 for LLMs for User Interest Exploration: A Hybrid Approach

Figure 3 for LLMs for User Interest Exploration: A Hybrid Approach

Figure 4 for LLMs for User Interest Exploration: A Hybrid Approach

Abstract:Traditional recommendation systems are subject to a strong feedback loop by learning from and reinforcing past user-item interactions, which in turn limits the discovery of novel user interests. To address this, we introduce a hybrid hierarchical framework combining Large Language Models (LLMs) and classic recommendation models for user interest exploration. The framework controls the interfacing between the LLMs and the classic recommendation models through "interest clusters", the granularity of which can be explicitly determined by algorithm designers. It recommends the next novel interests by first representing "interest clusters" using language, and employs a fine-tuned LLM to generate novel interest descriptions that are strictly within these predefined clusters. At the low level, it grounds these generated interests to an item-level policy by restricting classic recommendation models, in this case a transformer-based sequence recommender to return items that fall within the novel clusters generated at the high level. We showcase the efficacy of this approach on an industrial-scale commercial platform serving billions of users. Live experiments show a significant increase in both exploration of novel interests and overall user enjoyment of the platform.

Via

Access Paper or Ask Questions

Semantic Trajectory Data Mining with LLM-Informed POI Classification

May 20, 2024

Yifan Liu, Chenchen Kuai, Haoxuan Ma, Xishun Liao, Brian Yueshuai He, Jiaqi Ma

Figure 1 for Semantic Trajectory Data Mining with LLM-Informed POI Classification

Figure 2 for Semantic Trajectory Data Mining with LLM-Informed POI Classification

Figure 3 for Semantic Trajectory Data Mining with LLM-Informed POI Classification

Figure 4 for Semantic Trajectory Data Mining with LLM-Informed POI Classification

Abstract:Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly enhance the quality of trajectory mining. However, integrating these insights is challenging, as many POIs have incomplete feature information, and current learning-based POI algorithms require the integrity of datasets to do the classification. In this paper, we introduce a novel pipeline for human travel trajectory mining. Our approach first leverages the strong inferential and comprehension capabilities of large language models (LLMs) to annotate POI with activity types and then uses a Bayesian-based algorithm to infer activity for each stay point in a trajectory. In our evaluation using the OpenStreetMap (OSM) POI dataset, our approach achieves a 93.4% accuracy and a 96.1% F-1 score in POI classification, and a 91.7% accuracy with a 92.3% F-1 score in activity inference.

Via

Access Paper or Ask Questions

SOMTP: Self-Supervised Learning-Based Optimizer for MPC-Based Safe Trajectory Planning Problems in Robotics

May 15, 2024

Yifan Liu, You Wang, Guang Li

Abstract:Model Predictive Control (MPC)-based trajectory planning has been widely used in robotics, and incorporating Control Barrier Function (CBF) constraints into MPC can greatly improve its obstacle avoidance efficiency. Unfortunately, traditional optimizers are resource-consuming and slow to solve such non-convex constrained optimization problems (COPs) while learning-based methods struggle to satisfy the non-convex constraints. In this paper, we propose SOMTP algorithm, a self-supervised learning-based optimizer for CBF-MPC trajectory planning. Specifically, first, SOMTP employs problem transcription to satisfy most of the constraints. Then the differentiable SLPG correction is proposed to move the solution closer to the safe set and is then converted as the guide policy in the following training process. After that, inspired by the Augmented Lagrangian Method (ALM), our training algorithm integrated with guide policy constraints is proposed to enable the optimizer network to converge to a feasible solution. Finally, experiments show that the proposed algorithm has better feasibility than other learning-based methods and can provide solutions much faster than traditional optimizers with similar optimality.

Via

Access Paper or Ask Questions

DRepMRec: A Dual Representation Learning Framework for Multimodal Recommendation

Apr 17, 2024

Kangning Zhang, Yingjie Qin, Ruilong Su, Yifan Liu, Jiarui Jin, Weinan Zhang, Yong Yu

Figure 1 for DRepMRec: A Dual Representation Learning Framework for Multimodal Recommendation

Figure 2 for DRepMRec: A Dual Representation Learning Framework for Multimodal Recommendation

Figure 3 for DRepMRec: A Dual Representation Learning Framework for Multimodal Recommendation

Figure 4 for DRepMRec: A Dual Representation Learning Framework for Multimodal Recommendation

Abstract:Multimodal Recommendation focuses mainly on how to effectively integrate behavior and multimodal information in the recommendation task. Previous works suffer from two major issues. Firstly, the training process tightly couples the behavior module and multimodal module by jointly optimizing them using the sharing model parameters, which leads to suboptimal performance since behavior signals and modality signals often provide opposite guidance for the parameters updates. Secondly, previous approaches fail to take into account the significant distribution differences between behavior and modality when they attempt to fuse behavior and modality information. This resulted in a misalignment between the representations of behavior and modality. To address these challenges, in this paper, we propose a novel Dual Representation learning framework for Multimodal Recommendation called DRepMRec, which introduce separate dual lines for coupling problem and Behavior-Modal Alignment (BMA) for misalignment problem. Specifically, DRepMRec leverages two independent lines of representation learning to calculate behavior and modal representations. After obtaining separate behavior and modal representations, we design a Behavior-Modal Alignment Module (BMA) to align and fuse the dual representations to solve the misalignment problem. Furthermore, we integrate the BMA into other recommendation models, resulting in consistent performance improvements. To ensure dual representations maintain their semantic independence during alignment, we introduce Similarity-Supervised Signal (SSS) for representation learning. We conduct extensive experiments on three public datasets and our method achieves state-of-the-art (SOTA) results. The source code will be available upon acceptance.

* 8 pages, 9 figures

Via

Access Paper or Ask Questions

An Aligning and Training Framework for Multimodal Recommendations

Mar 20, 2024

Yifan Liu, Kangning Zhang, Xiangyuan Ren, Yanhua Huang, Jiarui Jin, Yingjie Qin, Ruilong Su, Ruiwen Xu, Weinan Zhang

Figure 1 for An Aligning and Training Framework for Multimodal Recommendations

Figure 2 for An Aligning and Training Framework for Multimodal Recommendations

Figure 3 for An Aligning and Training Framework for Multimodal Recommendations

Figure 4 for An Aligning and Training Framework for Multimodal Recommendations

Abstract:With the development of multimedia applications, multimodal recommendations are playing an essential role, as they can leverage rich contexts beyond user interactions. Existing methods mainly regard multimodal information as an auxiliary, using them to help learn ID features; however, there exist semantic gaps among multimodal content features and ID features, for which directly using multimodal information as an auxiliary would lead to misalignment in representations of users and items. In this paper, we first systematically investigate the misalignment issue in multimodal recommendations, and propose a solution named AlignRec. In AlignRec, the recommendation objective is decomposed into three alignments, namely alignment within contents, alignment between content and categorical ID, and alignment between users and items. Each alignment is characterized by a specific objective function and is integrated into our multimodal recommendation framework. To effectively train our AlignRec, we propose starting from pre-training the first alignment to obtain unified multimodal features and subsequently training the following two alignments together with these features as input. As it is essential to analyze whether each multimodal feature helps in training, we design three new classes of metrics to evaluate intermediate performance. Our extensive experiments on three real-world datasets consistently verify the superiority of AlignRec compared to nine baselines. We also find that the multimodal features generated by AlignRec are better than currently used ones, which are to be open-sourced.

* 11 pages, add some necessary explanations, revise typos

Via

Access Paper or Ask Questions

Endora: Video Generation Models as Endoscopy Simulators

Mar 17, 2024

Chenxin Li, Hengyu Liu, Yifan Liu, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan

Figure 1 for Endora: Video Generation Models as Endoscopy Simulators

Figure 2 for Endora: Video Generation Models as Endoscopy Simulators

Figure 3 for Endora: Video Generation Models as Endoscopy Simulators

Figure 4 for Endora: Video Generation Models as Endoscopy Simulators

Abstract:Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for machine learning. Despite progress in generating 2D medical images, the complex domain of clinical video generation has largely remained untapped.This paper introduces \model, an innovative approach to generate medical videos that simulate clinical endoscopy scenes. We present a novel generative model design that integrates a meticulously crafted spatial-temporal video transformer with advanced 2D vision foundation model priors, explicitly modeling spatial-temporal dynamics during video generation. We also pioneer the first public benchmark for endoscopy simulation with video generation models, adapting existing state-of-the-art methods for this endeavor.Endora demonstrates exceptional visual quality in generating endoscopy videos, surpassing state-of-the-art methods in extensive testing. Moreover, we explore how this endoscopy simulator can empower downstream video analysis tasks and even generate 3D medical scenes with multi-view consistency. In a nutshell, Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research, setting a substantial stage for further advances in medical content generation. For more details, please visit our project page: https://endora-medvidgen.github.io/.

* Project page: https://endora-medvidgen.github.io/

Via

Access Paper or Ask Questions

Diffusion Model-Based Image Editing: A Survey

Feb 27, 2024

Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Shifeng Chen, Liangliang Cao

Figure 1 for Diffusion Model-Based Image Editing: A Survey

Figure 2 for Diffusion Model-Based Image Editing: A Survey

Figure 3 for Diffusion Model-Based Image Editing: A Survey

Figure 4 for Diffusion Model-Based Image Editing: A Survey

Abstract:Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve into a thorough analysis and categorization of these works from multiple perspectives, including learning strategies, user-input conditions, and the array of specific editing tasks that can be accomplished. In addition, we pay special attention to image inpainting and outpainting, and explore both earlier traditional context-driven and current multimodal conditional methods, offering a comprehensive analysis of their methodologies. To further evaluate the performance of text-guided image editing algorithms, we propose a systematic benchmark, EditEval, featuring an innovative metric, LMM Score. Finally, we address current limitations and envision some potential directions for future research. The accompanying repository is released at https://github.com/SiatMMLab/Awesome-Diffusion-Model-Based-Image-Editing-Methods.

Via

Access Paper or Ask Questions

Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models

Feb 08, 2024

Feihu Jin, Yifan Liu, Ying Tan

Figure 1 for Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models

Figure 2 for Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models

Figure 3 for Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models

Figure 4 for Zero-Shot Chain-of-Thought Reasoning Guided by Evolutionary Algorithms in Large Language Models

Abstract:Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks and exhibited impressive reasoning abilities by applying zero-shot Chain-of-Thought (CoT) prompting. However, due to the evolving nature of sentence prefixes during the pre-training phase, existing zero-shot CoT prompting methods that employ identical CoT prompting across all task instances may not be optimal. In this paper, we introduce a novel zero-shot prompting method that leverages evolutionary algorithms to generate diverse promptings for LLMs dynamically. Our approach involves initializing two CoT promptings, performing evolutionary operations based on LLMs to create a varied set, and utilizing the LLMs to select a suitable CoT prompting for a given problem. Additionally, a rewriting operation, guided by the selected CoT prompting, enhances the understanding of the LLMs about the problem. Extensive experiments conducted across ten reasoning datasets demonstrate the superior performance of our proposed method compared to current zero-shot CoT prompting methods on GPT-3.5-turbo and GPT-4. Moreover, in-depth analytical experiments underscore the adaptability and effectiveness of our method in various reasoning tasks.

* 17 pages, 5 figures, 16 tables

Via

Access Paper or Ask Questions