Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuang Li

FSMR: A Feature Swapping Multi-modal Reasoning Approach with Joint Textual and Visual Clues

Mar 29, 2024

Shuang Li, Jiahua Wang, Lijie Wen

Abstract:Multi-modal reasoning plays a vital role in bridging the gap between textual and visual information, enabling a deeper understanding of the context. This paper presents the Feature Swapping Multi-modal Reasoning (FSMR) model, designed to enhance multi-modal reasoning through feature swapping. FSMR leverages a pre-trained visual-language model as an encoder, accommodating both text and image inputs for effective feature representation from both modalities. It introduces a unique feature swapping module, enabling the exchange of features between identified objects in images and corresponding vocabulary words in text, thereby enhancing the model's comprehension of the interplay between images and text. To further bolster its multi-modal alignment capabilities, FSMR incorporates a multi-modal cross-attention mechanism, facilitating the joint modeling of textual and visual information. During training, we employ image-text matching and cross-entropy losses to ensure semantic consistency between visual and language elements. Extensive experiments on the PMR dataset demonstrate FSMR's superiority over state-of-the-art baseline models across various performance metrics.

Via

Access Paper or Ask Questions

On A Class of Greedy Sparse Recovery Algorithms -- A High Dimensional Approach

Feb 25, 2024

Gang Li, Qiuwei Li, Shuang Li, Wu Angela Li

Figure 1 for On A Class of Greedy Sparse Recovery Algorithms -- A High Dimensional Approach

Figure 2 for On A Class of Greedy Sparse Recovery Algorithms -- A High Dimensional Approach

Figure 3 for On A Class of Greedy Sparse Recovery Algorithms -- A High Dimensional Approach

Figure 4 for On A Class of Greedy Sparse Recovery Algorithms -- A High Dimensional Approach

Abstract:Sparse signal recovery deals with finding the sparest solution of an under-determined linear system $x = Qs$. In this paper, we propose a novel greedy approach to addressing the challenges from such a problem. Such an approach is based on a characterization of solutions to the system, which allows us to work on the sparse recovery in the $s$-space directly with a given measure. With $l_2$-based measure, two OMP-type algorithms are proposed, which significantly outperform the classical OMP algorithm in terms of recovery accuracy while maintaining comparable computational complexity. An $l_1$-based algorithm, denoted as $\text{Alg}_{GBP}$ (greedy basis pursuit) algorithm, is derived. Such an algorithm significantly outperforms the classical BP algorithm. A CoSaMP-type algorithm is also proposed to further enhance the performance of the two proposed OMP-type algorithms. The superior performance of our proposed algorithms is demonstrated through extensive numerical simulations using synthetic data as well as video signals, highlighting their potential for various applications in compressed sensing and signal processing.

Via

Access Paper or Ask Questions

Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation

Feb 03, 2024

Yiling Kuang, Chao Yang, Yang Yang, Shuang Li

Figure 1 for Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation

Figure 2 for Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation

Figure 3 for Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation

Figure 4 for Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation

Abstract:In high-stakes systems such as healthcare, it is critical to understand the causal reasons behind unusual events, such as sudden changes in patient's health. Unveiling the causal reasons helps with quick diagnoses and precise treatment planning. In this paper, we propose an automated method for uncovering "if-then" logic rules to explain observational events. We introduce temporal point processes to model the events of interest, and discover the set of latent rules to explain the occurrence of events. To achieve this, we employ an Expectation-Maximization (EM) algorithm. In the E-step, we calculate the likelihood of each event being explained by each discovered rule. In the M-step, we update both the rule set and model parameters to enhance the likelihood function's lower bound. Notably, we optimize the rule set in a differential manner. Our approach demonstrates accurate performance in both discovering rules and identifying root causes. We showcase its promising results using synthetic and real healthcare datasets.

* Accepted by AISTATS 2024

Via

Access Paper or Ask Questions

Deep Generative Modeling for Financial Time Series with Application in VaR: A Comparative Review

Jan 18, 2024

Lars Ericson, Xuejun Zhu, Xusi Han, Rao Fu, Shuang Li, Steve Guo, Ping Hu

Abstract:In the financial services industry, forecasting the risk factor distribution conditional on the history and the current market environment is the key to market risk modeling in general and value at risk (VaR) model in particular. As one of the most widely adopted VaR models in commercial banks, Historical simulation (HS) uses the empirical distribution of daily returns in a historical window as the forecast distribution of risk factor returns in the next day. The objectives for financial time series generation are to generate synthetic data paths with good variety, and similar distribution and dynamics to the original historical data. In this paper, we apply multiple existing deep generative methods (e.g., CGAN, CWGAN, Diffusion, and Signature WGAN) for conditional time series generation, and propose and test two new methods for conditional multi-step time series generation, namely Encoder-Decoder CGAN and Conditional TimeVAE. Furthermore, we introduce a comprehensive framework with a set of KPIs to measure the quality of the generated time series for financial modeling. The KPIs cover distribution distance, autocorrelation and backtesting. All models (HS, parametric and neural networks) are tested on both historical USD yield curve data and additional data simulated from GARCH and CIR processes. The study shows that top performing models are HS, GARCH and CWGAN models. Future research directions in this area are also discussed.

Via

Access Paper or Ask Questions

Evolutionary Alternating Direction Method of Multipliers for Constrained Multi-Objective Optimization with Unknown Constraints

Jan 02, 2024

Shuang Li, Ke Li, Wei Li, Ming Yang

Figure 1 for Evolutionary Alternating Direction Method of Multipliers for Constrained Multi-Objective Optimization with Unknown Constraints

Figure 2 for Evolutionary Alternating Direction Method of Multipliers for Constrained Multi-Objective Optimization with Unknown Constraints

Figure 3 for Evolutionary Alternating Direction Method of Multipliers for Constrained Multi-Objective Optimization with Unknown Constraints

Figure 4 for Evolutionary Alternating Direction Method of Multipliers for Constrained Multi-Objective Optimization with Unknown Constraints

Abstract:Constrained multi-objective optimization problems (CMOPs) pervade real-world applications in science, engineering, and design. Constraint violation has been a building block in designing evolutionary multi-objective optimization algorithms for solving constrained multi-objective optimization problems. However, in certain scenarios, constraint functions might be unknown or inadequately defined, making constraint violation unattainable and potentially misleading for conventional constrained evolutionary multi-objective optimization algorithms. To address this issue, we present the first of its kind evolutionary optimization framework, inspired by the principles of the alternating direction method of multipliers that decouples objective and constraint functions. This framework tackles CMOPs with unknown constraints by reformulating the original problem into an additive form of two subproblems, each of which is allotted a dedicated evolutionary population. Notably, these two populations operate towards complementary evolutionary directions during their optimization processes. In order to minimize discrepancy, their evolutionary directions alternate, aiding the discovery of feasible solutions. Comparative experiments conducted against five state-of-the-art constrained evolutionary multi-objective optimization algorithms, on 120 benchmark test problem instances with varying properties, as well as two real-world engineering optimization problems, demonstrate the effectiveness and superiority of our proposed framework. Its salient features include faster convergence and enhanced resilience to various Pareto front shapes.

* 29 pages, 17 figures

Via

Access Paper or Ask Questions

Inferring Heterogeneous Treatment Effects of Crashes on Highway Traffic: A Doubly Robust Causal Machine Learning Approach

Jan 01, 2024

Shuang Li, Ziyuan Pu, Zhiyong Cui, Seunghyeon Lee, Xiucheng Guo, Dong Ngoduy

Figure 1 for Inferring Heterogeneous Treatment Effects of Crashes on Highway Traffic: A Doubly Robust Causal Machine Learning Approach

Figure 2 for Inferring Heterogeneous Treatment Effects of Crashes on Highway Traffic: A Doubly Robust Causal Machine Learning Approach

Figure 3 for Inferring Heterogeneous Treatment Effects of Crashes on Highway Traffic: A Doubly Robust Causal Machine Learning Approach

Figure 4 for Inferring Heterogeneous Treatment Effects of Crashes on Highway Traffic: A Doubly Robust Causal Machine Learning Approach

Abstract:Highway traffic crashes exert a considerable impact on both transportation systems and the economy. In this context, accurate and dependable emergency responses are crucial for effective traffic management. However, the influence of crashes on traffic status varies across diverse factors and may be biased due to selection bias. Therefore, there arises a necessity to accurately estimate the heterogeneous causal effects of crashes, thereby providing essential insights to facilitate individual-level emergency decision-making. This paper proposes a novel causal machine learning framework to estimate the causal effect of different types of crashes on highway speed. The Neyman-Rubin Causal Model (RCM) is employed to formulate this problem from a causal perspective. The Conditional Shapley Value Index (CSVI) is proposed based on causal graph theory to filter adverse variables, and the Structural Causal Model (SCM) is then adopted to define the statistical estimand for causal effects. The treatment effects are estimated by Doubly Robust Learning (DRL) methods, which combine doubly robust causal inference with classification and regression machine learning models. Experimental results from 4815 crashes on Highway Interstate 5 in Washington State reveal the heterogeneous treatment effects of crashes at varying distances and durations. The rear-end crashes cause more severe congestion and longer durations than other types of crashes, and the sideswipe crashes have the longest delayed impact. Additionally, the findings show that rear-end crashes affect traffic greater at night, while crash to objects has the most significant influence during peak hours. Statistical hypothesis tests, error metrics based on matched "counterfactual outcomes", and sensitive analyses are employed for assessment, and the results validate the accuracy and effectiveness of our method.

* 38 pages, 13 figures, 8 tables

Via

Access Paper or Ask Questions

Few Clicks Suffice: Active Test-Time Adaptation for Semantic Segmentation

Dec 04, 2023

Longhui Yuan, Shuang Li, Zhuo He, Binhui Xie

Figure 1 for Few Clicks Suffice: Active Test-Time Adaptation for Semantic Segmentation

Figure 2 for Few Clicks Suffice: Active Test-Time Adaptation for Semantic Segmentation

Figure 3 for Few Clicks Suffice: Active Test-Time Adaptation for Semantic Segmentation

Figure 4 for Few Clicks Suffice: Active Test-Time Adaptation for Semantic Segmentation

Abstract:Test-time adaptation (TTA) adapts the pre-trained models during inference using unlabeled test data and has received a lot of research attention due to its potential practical value. Unfortunately, without any label supervision, existing TTA methods rely heavily on heuristic or empirical studies. Where to update the model always falls into suboptimal or brings more computational resource consumption. Meanwhile, there is still a significant performance gap between the TTA approaches and their supervised counterparts. Motivated by active learning, in this work, we propose the active test-time adaptation for semantic segmentation setup. Specifically, we introduce the human-in-the-loop pattern during the testing phase, which queries very few labels to facilitate predictions and model updates in an online manner. To do so, we propose a simple but effective ATASeg framework, which consists of two parts, i.e., model adapter and label annotator. Extensive experiments demonstrate that ATASeg bridges the performance gap between TTA methods and their supervised counterparts with only extremely few annotations, even one click for labeling surpasses known SOTA TTA methods by 2.6% average mIoU on ACDC benchmark. Empirical results imply that progress in either the model adapter or the label annotator will bring improvements to the ATASeg framework, giving it large research and reality potential.

* 15 pages, 10 figures

Via

Access Paper or Ask Questions

Language Semantic Graph Guided Data-Efficient Learning

Nov 15, 2023

Wenxuan Ma, Shuang Li, Lincan Cai, Jingxuan Kang

Figure 1 for Language Semantic Graph Guided Data-Efficient Learning

Figure 2 for Language Semantic Graph Guided Data-Efficient Learning

Figure 3 for Language Semantic Graph Guided Data-Efficient Learning

Figure 4 for Language Semantic Graph Guided Data-Efficient Learning

Abstract:Developing generalizable models that can effectively learn from limited data and with minimal reliance on human supervision is a significant objective within the machine learning community, particularly in the era of deep neural networks. Therefore, to achieve data-efficient learning, researchers typically explore approaches that can leverage more related or unlabeled data without necessitating additional manual labeling efforts, such as Semi-Supervised Learning (SSL), Transfer Learning (TL), and Data Augmentation (DA). SSL leverages unlabeled data in the training process, while TL enables the transfer of expertise from related data distributions. DA broadens the dataset by synthesizing new data from existing examples. However, the significance of additional knowledge contained within labels has been largely overlooked in research. In this paper, we propose a novel perspective on data efficiency that involves exploiting the semantic information contained in the labels of the available data. Specifically, we introduce a Language Semantic Graph (LSG) which is constructed from labels manifest as natural language descriptions. Upon this graph, an auxiliary graph neural network is trained to extract high-level semantic relations and then used to guide the training of the primary model, enabling more adequate utilization of label knowledge. Across image, video, and audio modalities, we utilize the LSG method in both TL and SSL scenarios and illustrate its versatility in significantly enhancing performance compared to other data-efficient learning approaches. Additionally, our in-depth analysis shows that the LSG method also expedites the training process.

* Accepted by NeurIPS 2023

Via

Access Paper or Ask Questions

Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation

Oct 31, 2023

Binhui Xie, Shuang Li, Qingju Guo, Chi Harold Liu, Xinjing Cheng

Figure 1 for Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation

Figure 2 for Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation

Figure 3 for Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation

Figure 4 for Annotator: A Generic Active Learning Baseline for LiDAR Semantic Segmentation

Abstract:Active learning, a label-efficient paradigm, empowers models to interactively query an oracle for labeling new data. In the realm of LiDAR semantic segmentation, the challenges stem from the sheer volume of point clouds, rendering annotation labor-intensive and cost-prohibitive. This paper presents Annotator, a general and efficient active learning baseline, in which a voxel-centric online selection strategy is tailored to efficiently probe and annotate the salient and exemplar voxel girds within each LiDAR scan, even under distribution shift. Concretely, we first execute an in-depth analysis of several common selection strategies such as Random, Entropy, Margin, and then develop voxel confusion degree (VCD) to exploit the local topology relations and structures of point clouds. Annotator excels in diverse settings, with a particular focus on active learning (AL), active source-free domain adaptation (ASFDA), and active domain adaptation (ADA). It consistently delivers exceptional performance across LiDAR semantic segmentation benchmarks, spanning both simulation-to-real and real-to-real scenarios. Surprisingly, Annotator exhibits remarkable efficiency, requiring significantly fewer annotations, e.g., just labeling five voxels per scan in the SynLiDAR-to-SemanticKITTI task. This results in impressive performance, achieving 87.8% fully-supervised performance under AL, 88.5% under ASFDA, and 94.4% under ADA. We envision that Annotator will offer a simple, general, and efficient solution for label-efficient 3D applications. Project page: https://binhuixie.github.io/annotator-web

* NeurIPS 2023. Project page at https://binhuixie.github.io/annotator-web/

Via

Access Paper or Ask Questions

Shape-centered Representation Learning for Visible-Infrared Person Re-identification

Oct 30, 2023

Shuang Li, Jiaxu Leng, Ji Gan, Mengjingcheng Mo, Xinbo Gao

Figure 1 for Shape-centered Representation Learning for Visible-Infrared Person Re-identification

Figure 2 for Shape-centered Representation Learning for Visible-Infrared Person Re-identification

Figure 3 for Shape-centered Representation Learning for Visible-Infrared Person Re-identification

Figure 4 for Shape-centered Representation Learning for Visible-Infrared Person Re-identification

Abstract:Current Visible-Infrared Person Re-Identification (VI-ReID) methods prioritize extracting distinguishing appearance features, ignoring the natural resistance of body shape against modality changes. Initially, we gauged the discriminative potential of shapes by a straightforward concatenation of shape and appearance features. However, two unresolved issues persist in the utilization of shape features. One pertains to the dependence on auxiliary models for shape feature extraction in the inference phase, along with the errors in generated infrared shapes due to the intrinsic modality disparity. The other issue involves the inadequately explored correlation between shape and appearance features. To tackle the aforementioned challenges, we propose the Shape-centered Representation Learning framework (ScRL), which focuses on learning shape features and appearance features associated with shapes. Specifically, we devise the Shape Feature Propagation (SFP), facilitating direct extraction of shape features from original images with minimal complexity costs during inference. To restitute inaccuracies in infrared body shapes at the feature level, we present the Infrared Shape Restitution (ISR). Furthermore, to acquire appearance features related to shape, we design the Appearance Feature Enhancement (AFE), which accentuates identity-related features while suppressing identity-unrelated features guided by shape features. Extensive experiments are conducted to validate the effectiveness of the proposed ScRL. Achieving remarkable results, the Rank-1 (mAP) accuracy attains 76.1%, 71.2%, 92.4% (72.6%, 52.9%, 86.7%) on the SYSU-MM01, HITSZ-VCM, RegDB datasets respectively, outperforming existing state-of-the-art methods.

Via

Access Paper or Ask Questions