Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lei Li

HyperAI Team, Xiaomi Corporation

The Role of Deductive and Inductive Reasoning in Large Language Models

Oct 03, 2024

Chengkun Cai, Xu Zhao, Haoliang Liu, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei Li

Figure 1 for The Role of Deductive and Inductive Reasoning in Large Language Models

Figure 2 for The Role of Deductive and Inductive Reasoning in Large Language Models

Figure 3 for The Role of Deductive and Inductive Reasoning in Large Language Models

Figure 4 for The Role of Deductive and Inductive Reasoning in Large Language Models

Abstract:Large Language Models (LLMs) have achieved substantial progress in artificial intelligence, particularly in reasoning tasks. However, their reliance on static prompt structures, coupled with limited dynamic reasoning capabilities, often constrains their adaptability to complex and evolving problem spaces. In this paper, we propose the Deductive and InDuctive(DID) method, which enhances LLM reasoning by dynamically integrating both deductive and inductive reasoning within the prompt construction process. Drawing inspiration from cognitive science, the DID approach mirrors human adaptive reasoning mechanisms, offering a flexible framework that allows the model to adjust its reasoning pathways based on task context and performance. We empirically validate the efficacy of DID on established datasets such as AIW and MR-GSM8K, as well as on our custom dataset, Holiday Puzzle, which presents tasks about different holiday date calculating challenges. By leveraging DID's hybrid prompt strategy, we demonstrate significant improvements in both solution accuracy and reasoning quality, achieved without imposing substantial computational overhead. Our findings suggest that DID provides a more robust and cognitively aligned framework for reasoning in LLMs, contributing to the development of advanced LLM-driven problem-solving strategies informed by cognitive science models.

* 4 figures

Via

Access Paper or Ask Questions

TypedThinker: Typed Thinking Improves Large Language Model Reasoning

Oct 02, 2024

Danqing Wang, Jianxin Ma, Fei Fang, Lei Li

Abstract:Despite significant advancements in the reasoning capabilities of Large Language Models (LLMs), the lack of diverse reasoning solutions often makes them trapped in a limited solution search area. In this paper, we propose TypedThinker, a novel framework that enhances LLMs' problem-solving abilities by incorporating multiple reasoning types (deductive, inductive, abductive, and analogical). Our analysis across four benchmarks reveals that different reasoning types uniquely solve distinct sets of problems, highlighting the importance of diverse thinking approaches. TypedThinker addresses two key challenges: selecting appropriate reasoning types for given problems and effectively implementing specific reasoning types. Through self-training on successful experiences, TypedThinker learns an implicit policy for reasoning type selection and application. Experimental results demonstrate significant improvements over baseline models, with accuracy increases of 3.4% for Mistral 7B and 16.7% for LLaMA3 8B across four reasoning benchmarks. Notably, TypedThinker shows effective generalization to new benchmarks and can further enhance the reasoning capability of powerful models like GPT-4o. The code is released at https://github.com/dqwang122/ThinkHub.

* work in process

Via

Access Paper or Ask Questions

Multimodal Trajectory Prediction for Autonomous Driving on Unstructured Roads using Deep Convolutional Network

Sep 27, 2024

Lei Li, Zhifa Chen, Jian Wang, Bin Zhou, Guizhen Yu, Xiaoxuan Chen

Figure 1 for Multimodal Trajectory Prediction for Autonomous Driving on Unstructured Roads using Deep Convolutional Network

Figure 2 for Multimodal Trajectory Prediction for Autonomous Driving on Unstructured Roads using Deep Convolutional Network

Figure 3 for Multimodal Trajectory Prediction for Autonomous Driving on Unstructured Roads using Deep Convolutional Network

Figure 4 for Multimodal Trajectory Prediction for Autonomous Driving on Unstructured Roads using Deep Convolutional Network

Abstract:Recently, the application of autonomous driving in open-pit mining has garnered increasing attention for achieving safe and efficient mineral transportation. Compared to urban structured roads, unstructured roads in mining sites have uneven boundaries and lack clearly defined lane markings. This leads to a lack of sufficient constraint information for predicting the trajectories of other human-driven vehicles, resulting in higher uncertainty in trajectory prediction problems. A method is proposed to predict multiple possible trajectories and their probabilities of the target vehicle. The surrounding environment and historical trajectories of the target vehicle are encoded as a rasterized image, which is used as input to our deep convolutional network to predict the target vehicle's multiple possible trajectories. The method underwent offline testing on a dataset specifically designed for autonomous driving scenarios in open-pit mining and was compared and evaluated against physics-based method. The open-source code and data are available at https://github.com/LLsxyc/mine_motion_prediction.git

* 11 pages,6 figures

Via

Access Paper or Ask Questions

You Only Speak Once to See

Sep 27, 2024

Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li

Abstract:Grounding objects in images using visual cues is a well-established approach in computer vision, yet the potential of audio as a modality for object recognition and grounding remains underexplored. We introduce YOSS, "You Only Speak Once to See," to leverage audio for grounding objects in visual scenes, termed Audio Grounding. By integrating pre-trained audio models with visual models using contrastive learning and multi-modal alignment, our approach captures speech commands or descriptions and maps them directly to corresponding objects within images. Experimental results indicate that audio guidance can be effectively applied to object grounding, suggesting that incorporating audio guidance may enhance the precision and robustness of current object grounding methods and improve the performance of robotic systems and computer vision applications. This finding opens new possibilities for advanced object recognition, scene understanding, and the development of more intuitive and capable robotic systems.

* 7 pages, 4 figures, submitted to ICASSP 2025

Via

Access Paper or Ask Questions

Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label

Sep 14, 2024

Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li, Xugang Lu

Figure 1 for Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label

Figure 2 for Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label

Figure 3 for Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label

Figure 4 for Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label

Abstract:Domain gap often degrades the performance of speaker verification (SV) systems when the statistical distributions of training data and real-world test speech are mismatched. Channel variation, a primary factor causing this gap, is less addressed than other issues (e.g., noise). Although various domain adaptation algorithms could be applied to handle this domain gap problem, most algorithms could not take the complex distribution structure in domain alignment with discriminative learning. In this paper, we propose a novel unsupervised domain adaptation method, i.e., Joint Partial Optimal Transport with Pseudo Label (JPOT-PL), to alleviate the channel mismatch problem. Leveraging the geometric-aware distance metric of optimal transport in distribution alignment, we further design a pseudo label-based discriminative learning where the pseudo label can be regarded as a new type of soft speaker label derived from the optimal coupling. With the JPOT-PL, we carry out experiments on the SV channel adaptation task with VoxCeleb as the basis corpus. Experiments show our method reduces EER by over 10% compared with several state-of-the-art channel adaptation algorithms.

* 5 pages, 3 figures, submitted to ICASSP 2025

Via

Access Paper or Ask Questions

Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification

Sep 14, 2024

Wenhao Yang, Jianguo Wei, Wenhuan Lu, Xugang Lu, Lei Li

Figure 1 for Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification

Figure 2 for Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification

Figure 3 for Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification

Figure 4 for Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification

Abstract:Knowledge distillation (KD) is widely used in audio tasks, such as speaker verification (SV), by transferring knowledge from a well-trained large model (the teacher) to a smaller, more compact model (the student) for efficiency and portability. Existing KD methods for SV often mirror those used in image processing, focusing on approximating predicted probabilities and hidden representations. However, these methods fail to account for the multi-level temporal properties of speech audio. In this paper, we propose a novel KD method, i.e., Integrated Multi-level Knowledge Distillation (IML-KD), to transfer knowledge of various temporal-scale features of speech from a teacher model to a student model. In the IML-KD, temporal context information from the teacher model is integrated into novel Integrated Gradient-based input-sensitive representations from speech segments with various durations, and the student model is trained to infer these representations with multi-level alignment for the output. We conduct SV experiments on the VoxCeleb1 dataset to evaluate the proposed method. Experimental results demonstrate that IML-KD significantly enhances KD performance, reducing the Equal Error Rate (EER) by 5%.

* 5 pages, 3 figures, submitted to ICASSP 2025

Via

Access Paper or Ask Questions

Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints

Sep 13, 2024

Shan Chen, Jiale Zhou, Lei Li

Figure 1 for Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints

Figure 2 for Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints

Figure 3 for Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints

Figure 4 for Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints

Abstract:3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in scene synthesis and novel view synthesis tasks. Typically, the initialization of 3D Gaussian primitives relies on point clouds derived from Structure-from-Motion (SfM) methods. However, in scenarios requiring scene reconstruction from sparse viewpoints, the effectiveness of 3DGS is significantly constrained by the quality of these initial point clouds and the limited number of input images. In this study, we present Dust-GS, a novel framework specifically designed to overcome the limitations of 3DGS in sparse viewpoint conditions. Instead of relying solely on SfM, Dust-GS introduces an innovative point cloud initialization technique that remains effective even with sparse input data. Our approach leverages a hybrid strategy that integrates an adaptive depth-based masking technique, thereby enhancing the accuracy and detail of reconstructed scenes. Extensive experiments conducted on several benchmark datasets demonstrate that Dust-GS surpasses traditional 3DGS methods in scenarios with sparse viewpoints, achieving superior scene reconstruction quality with a reduced number of input images.

Via

Access Paper or Ask Questions

LT3SD: Latent Trees for 3D Scene Diffusion

Sep 12, 2024

Quan Meng, Lei Li, Matthias Nießner, Angela Dai

Figure 1 for LT3SD: Latent Trees for 3D Scene Diffusion

Figure 2 for LT3SD: Latent Trees for 3D Scene Diffusion

Figure 3 for LT3SD: Latent Trees for 3D Scene Diffusion

Figure 4 for LT3SD: Latent Trees for 3D Scene Diffusion

Abstract:We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation. Recent advances in diffusion models have shown impressive results in 3D object generation, but are limited in spatial extent and quality when extended to 3D scenes. To generate complex and diverse 3D scene structures, we introduce a latent tree representation to effectively encode both lower-frequency geometry and higher-frequency detail in a coarse-to-fine hierarchy. We can then learn a generative diffusion process in this latent 3D scene space, modeling the latent components of a scene at each resolution level. To synthesize large-scale scenes with varying sizes, we train our diffusion model on scene patches and synthesize arbitrary-sized output 3D scenes through shared diffusion generation across multiple scene patches. Through extensive experiments, we demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation and for probabilistic completion for partial scene observations.

* Project page: https://quan-meng.github.io/projects/lt3sd/ Video: https://youtu.be/AJ5sG9VyjGA

Via

Access Paper or Ask Questions

Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

Sep 05, 2024

Shen Chen, Jiale Zhou, Lei Li

Figure 1 for Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

Figure 2 for Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

Figure 3 for Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

Figure 4 for Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

Abstract:3D Gaussian Splatting (3DGS) has emerged as a promising approach for 3D scene representation, offering a reduction in computational overhead compared to Neural Radiance Fields (NeRF). However, 3DGS is susceptible to high-frequency artifacts and demonstrates suboptimal performance under sparse viewpoint conditions, thereby limiting its applicability in robotics and computer vision. To address these limitations, we introduce SVS-GS, a novel framework for Sparse Viewpoint Scene reconstruction that integrates a 3D Gaussian smoothing filter to suppress artifacts. Furthermore, our approach incorporates a Depth Gradient Profile Prior (DGPP) loss with a dynamic depth mask to sharpen edges and 2D diffusion with Score Distillation Sampling (SDS) loss to enhance geometric consistency in novel view synthesis. Experimental evaluations on the MipNeRF-360 and SeaThru-NeRF datasets demonstrate that SVS-GS markedly improves 3D reconstruction from sparse viewpoints, offering a robust and efficient solution for scene understanding in robotics and computer vision applications.

Via

Access Paper or Ask Questions

Personalized Topology-Informed 12-Lead ECG Electrode Localization from Incomplete Cardiac MRIs for Efficient Cardiac Digital Twins

Aug 25, 2024

Lei Li, Hannah Smith, Yilin Lyu, Julia Camps, Blanca Rodriguez, Abhirup Banerjee, Vicente Grau

Abstract:Cardiac digital twins (CDTs) offer personalized \textit{in-silico} cardiac representations for the inference of multi-scale properties tied to cardiac mechanisms. The creation of CDTs requires precise information about the electrode position on the torso, especially for the personalized electrocardiogram (ECG) calibration. However, current studies commonly rely on additional acquisition of torso imaging and manual/semi-automatic methods for ECG electrode localization. In this study, we propose a novel and efficient topology-informed model to fully automatically extract personalized ECG electrode locations from 2D clinically standard cardiac MRIs. Specifically, we obtain the sparse torso contours from the cardiac MRIs and then localize the electrodes from the contours. Cardiac MRIs aim at imaging of the heart instead of the torso, leading to incomplete torso geometry within the imaging. To tackle the missing topology, we incorporate the electrodes as a subset of the keypoints, which can be explicitly aligned with the 3D torso topology. The experimental results demonstrate that the proposed model outperforms the time-consuming conventional method in terms of accuracy (Euclidean distance: $1.24 \pm 0.293$ cm vs. $1.48 \pm 0.362$ cm) and efficiency ($2$~s vs. $30$-$35$~min). We further demonstrate the effectiveness of using the detected electrodes for \textit{in-silico} ECG simulation, highlighting their potential for creating accurate and efficient CDT models. The code will be released publicly after the manuscript is accepted for publication.

* 12 pages

Via

Access Paper or Ask Questions