Li Sun

RecursiveDet: End-to-End Region-based Recursive Object Detection

Jul 25, 2023
Jing Zhao, Li Sun, Qingli Li

End-to-end region-based object detectors like Sparse R-CNN usually have multiple cascade bounding box decoding stages, which refine the current predictions according to their previous results. Model parameters within each stage are independent, incurring a huge cost. In this paper, we find that the general setting of independent decoding stages is actually redundant. By simply sharing parameters to make a recursive decoder, the detector already obtains a significant improvement. The recursive decoder can be further enhanced by positional encoding (PE) of the proposal box, which makes it aware of the exact locations and sizes of the input bounding boxes and thus adaptive to proposals from different stages during the recursion. Moreover, we design a centerness-based PE to distinguish RoI feature elements and dynamic convolution kernels at different positions within the bounding box. To validate the effectiveness of the proposed method, we conduct intensive ablations and build the full model on three recent mainstream region-based detectors. RecursiveDet achieves obvious performance boosts with even fewer model parameters and only slightly increased computation cost. Code is available at https://github.com/bravezzzzzz/RecursiveDet.
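The core idea of recursion with shared parameters can be illustrated with a minimal numpy sketch: one set of stage parameters is reused across all decoding iterations, and the proposal box is re-encoded at every step. The sinusoidal PE and the linear residual refinement below are illustrative stand-ins, not the paper's actual modules.

```python
import numpy as np

def box_positional_encoding(box, dim=8):
    # Sinusoidal PE of (cx, cy, w, h) -- a DETR-style scheme standing in for
    # the paper's proposal-box PE.
    freqs = 1.0 / (100.0 ** (np.arange(dim // 2) / (dim // 2)))
    angles = np.outer(box, freqs)                      # (4, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1).ravel()

def decode_stage(feat, box, params):
    """One decoding stage: refine the box from RoI features plus the PE of
    the current box. Refinement is a toy linear map, not dynamic convolution."""
    pe = box_positional_encoding(box)
    delta = params @ np.concatenate([feat, pe])
    return box + 0.1 * np.tanh(delta)                  # small residual update

rng = np.random.default_rng(0)
feat = rng.normal(size=16)                             # toy RoI feature
params = rng.normal(scale=0.1, size=(4, 16 + 32))      # ONE shared stage

box = np.array([0.5, 0.5, 0.2, 0.3])                   # (cx, cy, w, h) proposal
for _ in range(6):                                     # recursion reuses params
    box = decode_stage(feat, box, params)
```

Because the box PE is recomputed each iteration, the shared stage can behave differently on early coarse proposals and later refined ones, which is the intuition behind making one parameter set serve all stages.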

* Accepted by ICCV 2023 

NDT-Map-Code: A 3D global descriptor for real-time loop closure detection in lidar SLAM

Jul 17, 2023
Lizhou Liao, Li Sun, Xinhui Bai, Zhenxing You, Hongyuan Yuan, Chunyun Fu

Loop-closure detection, also known as place recognition, aims to identify previously visited locations and is an essential component of a SLAM system. Existing research on lidar-based loop closure relies heavily on dense point clouds and 360° FOV lidars. This paper proposes an out-of-the-box NDT (Normal Distribution Transform) based global descriptor, NDT-Map-Code, designed for both on-road driving and underground valet parking scenarios. NDT-Map-Code can be extracted directly from the NDT map without a dense point cloud, resulting in excellent scalability and low maintenance cost. The NDT representation is leveraged to identify representative patterns, which are further encoded according to their spatial location (bearing, range, and height). Experimental results on the NIO underground parking lot dataset and the KITTI dataset demonstrate that our method achieves significantly better performance than the state-of-the-art.
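The encoding-by-spatial-location idea can be sketched as a toy histogram descriptor over (bearing, range, height) bins. The bin counts, limits, and normalization below are illustrative assumptions, not the paper's actual NDT-based encoding.

```python
import numpy as np

def ndt_style_descriptor(points, n_bearing=8, n_range=4, n_height=2,
                         max_range=50.0, max_height=5.0):
    """Toy global descriptor: a normalized histogram of 3D points over
    (bearing, range, height) bins, loosely mimicking encoding by spatial
    location. Parameters are illustrative, not the paper's."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    bearing = (np.arctan2(y, x) + np.pi) / (2 * np.pi)            # (0, 1]
    rad = np.clip(np.hypot(x, y) / max_range, 0.0, 1.0 - 1e-9)
    hgt = np.clip((z + max_height) / (2 * max_height), 0.0, 1.0 - 1e-9)
    b = np.minimum((bearing * n_bearing).astype(int), n_bearing - 1)
    idx = (b * n_range + (rad * n_range).astype(int)) * n_height \
          + (hgt * n_height).astype(int)
    desc = np.bincount(idx, minlength=n_bearing * n_range * n_height)
    return desc / max(desc.sum(), 1)                   # normalize to sum 1

pts = np.random.default_rng(1).normal(size=(1000, 3)) * [10.0, 10.0, 1.0]
desc = ndt_style_descriptor(pts)
```

A fixed-length descriptor like this can be compared between the current scan and stored map entries (e.g., by cosine or L1 distance) for real-time candidate retrieval, which is what makes such encodings cheap to maintain.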

* 8 pages, 9 figures, 2 tables 

From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

May 23, 2023
Li Sun, Florian Luisier, Kayhan Batmanghelich, Dinei Florencio, Cha Zhang

Current state-of-the-art models for natural language understanding require a preprocessing step that converts raw text into discrete tokens. This process, known as tokenization, relies on a pre-built vocabulary of words or sub-word morphemes. Such a fixed vocabulary limits the model's robustness to spelling errors and its capacity to adapt to new domains. In this work, we introduce a novel open-vocabulary language model that adopts a hierarchical two-level approach: one at the word level and another at the sequence level. Concretely, we design an intra-word module that uses a shallow Transformer architecture to learn word representations from their characters, and a deep inter-word Transformer module that contextualizes each word representation by attending over the entire word sequence. Our model thus operates directly on character sequences with explicit awareness of word boundaries, but without a biased sub-word or word-level vocabulary. Experiments on various downstream tasks show that our method outperforms strong baselines. We also demonstrate that our hierarchical model is robust to textual corruption and domain shift.
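The two-level structure can be sketched in a few lines of numpy: characters are pooled into word vectors, and the word vectors are then contextualized over the whole sequence. The mean-pooling, the unparameterized attention, and the deterministic character embeddings below are drastic simplifications standing in for the paper's shallow and deep Transformer modules.

```python
import numpy as np

CHAR_DIM = 8

def char_embedding(c):
    # Deterministic toy embedding keyed on the code point, standing in for a
    # learned character table.
    return np.random.default_rng(ord(c)).normal(size=CHAR_DIM)

def intra_word(word):
    # Shallow intra-word module, reduced to mean-pooling character embeddings
    # (the paper uses a small Transformer here).
    return np.mean([char_embedding(c) for c in word], axis=0)

def inter_word(word_vecs):
    # Deep inter-word module, reduced to one head of unparameterized
    # self-attention over the whole word sequence.
    X = np.stack(word_vecs)                          # (n_words, CHAR_DIM)
    scores = X @ X.T / np.sqrt(X.shape[1])
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ X

text = "tokenization limits robustness"
words = text.split()                                 # explicit word boundaries
ctx = inter_word([intra_word(w) for w in words])
```

Note that no vocabulary lookup appears anywhere: any character string yields a word vector, which is what makes the approach open-vocabulary and tolerant of misspellings.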

* Accepted to ACL 2023 Main Conference 

SINCERE: Sequential Interaction Networks representation learning on Co-Evolving RiEmannian manifolds

May 06, 2023
Junda Ye, Zhongbao Zhang, Li Sun, Yang Yan, Feiyang Wang, Fuxin Ren

Sequential interaction networks (SIN) are commonly adopted in applications such as recommendation systems, search engines, and social networks to describe the mutual influence between users and items/products. Efforts on representing SIN have mainly focused on capturing the dynamics of networks in Euclidean space, and recently plenty of work has extended to hyperbolic geometry for implicit hierarchical learning. Previous approaches that learn the embedding trajectories of users and items achieve promising results. However, a range of fundamental issues remain open. For example, is it appropriate to place user and item nodes in one identical space regardless of their inherent discrepancy? Instead of residing in a single fixed-curvature space, how should the representation spaces evolve when new interactions occur? To explore these issues, we propose SINCERE, a novel method representing Sequential Interaction Networks on Co-Evolving RiEmannian manifolds. SINCERE not only takes the user and item embedding trajectories in their respective spaces into account, but also emphasizes the evolution of the spaces themselves, i.e., how curvature changes over time. Specifically, we introduce a fresh cross-geometry aggregation which allows us to propagate information across different Riemannian manifolds without breaking conformal invariance, and a curvature estimator which is delicately designed to predict global curvatures effectively from current local Ricci curvatures. Extensive experiments on several real-world datasets demonstrate the promising performance of SINCERE over state-of-the-art sequential interaction prediction methods.
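To give a concrete feel for curvature-dependent representation spaces, the sketch below computes geodesic distance on the Poincaré ball via the standard Möbius gyrovector operation. This is generic constant-curvature machinery commonly used in hyperbolic deep learning, not SINCERE's cross-geometry aggregation itself.

```python
import numpy as np

def mobius_add(x, y, c):
    # Möbius addition on the Poincaré ball of curvature -c (c > 0), the
    # standard gyrovector-space operation used in hyperbolic deep learning.
    xy, x2, y2 = x @ y, x @ x, y @ y
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def poincare_distance(x, y, c=1.0):
    """Geodesic distance on the Poincaré ball; as c -> 0 it approaches twice
    the Euclidean distance, so the curvature parameter smoothly interpolates
    geometries."""
    diff = mobius_add(-x, y, c)
    return 2.0 / np.sqrt(c) * np.arctanh(np.sqrt(c) * np.linalg.norm(diff))

x = np.array([0.1, 0.2])
y = np.array([-0.3, 0.05])
d_hyp = poincare_distance(x, y)            # hyperbolic (c = 1)
d_flat = poincare_distance(x, y, c=1e-12)  # near-Euclidean limit
```

Because distance (and hence any distance-based loss) depends continuously on c, curvature can itself be treated as a time-varying quantity to estimate, which is the flavor of "co-evolving" spaces the abstract describes.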

* Accepted by ACM The Web Conference 2023 (WWW) 

Contrastive Graph Clustering in Curvature Spaces

May 05, 2023
Li Sun, Feiyang Wang, Junda Ye, Hao Peng, Philip S. Yu

Graph clustering is a longstanding research topic that has achieved remarkable success with deep learning methods in recent years. Nevertheless, we observe that several important issues largely remain open. On the one hand, graph clustering from a geometric perspective is appealing but has rarely been explored, as it lacks a promising space for geometric clustering. On the other hand, contrastive learning boosts deep graph clustering but usually struggles with either graph augmentation or hard sample mining. To bridge this gap, we rethink the problem of graph clustering from a geometric perspective and, to the best of our knowledge, make the first attempt to introduce a heterogeneous curvature space to the graph clustering problem. Correspondingly, we present a novel end-to-end contrastive graph clustering model named CONGREGATE, addressing geometric graph clustering with Ricci curvatures. To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space in which deep representations are generated via the product of the proposed fully Riemannian graph convolutional nets. Thereafter, we train the graph clusters with an augmentation-free reweighted contrastive approach that pays particular attention to both hard negatives and hard positives in our curvature space. Empirical results on real-world graphs show that our model outperforms state-of-the-art competitors.
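The idea of a reweighted, augmentation-free contrastive objective can be sketched as an InfoNCE-style loss where high-similarity negatives (the hard ones) receive larger weight in the denominator. The specific weighting scheme below is an illustrative guess at the general technique, not CONGREGATE's actual objective, and it works in Euclidean space rather than a curvature space.

```python
import numpy as np

def reweighted_contrastive_loss(z, pos_idx, tau=0.5):
    """InfoNCE-style loss without augmentations: each anchor i has a
    designated positive pos_idx[i]; negatives are reweighted so that
    high-similarity (hard) negatives contribute more. Illustrative only."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-norm embeddings
    sim = z @ z.T / tau
    n = len(z)
    total = 0.0
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[[i, pos_idx[i]]] = False                  # negatives only
        s_pos, s_neg = sim[i, pos_idx[i]], sim[i, mask]
        w = np.exp(s_neg - s_neg.max())
        w = w / w.mean()                               # hard negatives get w > 1
        denom = np.exp(s_pos) + np.sum(w * np.exp(s_neg))
        total += -np.log(np.exp(s_pos) / denom)
    return total / n

rng = np.random.default_rng(4)
z = rng.normal(size=(6, 8))
pos_idx = [1, 0, 3, 2, 5, 4]            # each embedding paired with a positive
loss = reweighted_contrastive_loss(z, pos_idx)
```

Designating positives directly (here by index) rather than generating them through graph augmentation is what makes the approach augmentation-free.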

* Accepted by IJCAI'23 

DrasCLR: A Self-supervised Framework of Learning Disease-related and Anatomy-specific Representation for 3D Medical Images

Mar 15, 2023
Ke Yu, Li Sun, Junxiang Chen, Max Reynolds, Tigmanshu Chaudhary, Kayhan Batmanghelich

Large-scale volumetric medical images with annotation are rare, costly, and time-prohibitive to acquire. Self-supervised learning (SSL) offers a promising pre-training and feature-extraction solution for many downstream tasks, as it uses only unlabeled data. Recently, SSL methods based on instance discrimination have gained popularity in the medical imaging domain. However, SSL pre-trained encoders may use many clues in the image to discriminate an instance that are not necessarily disease-related. Moreover, pathological patterns are often subtle and heterogeneous, so the desired method must represent anatomy-specific features that are sensitive to abnormal changes in different body parts. In this work, we present a novel SSL framework, named DrasCLR, for 3D medical imaging that overcomes these challenges. We propose two domain-specific contrastive learning strategies: one aims to capture subtle disease patterns inside a local anatomical region, and the other aims to represent severe disease patterns that span larger regions. We formulate the encoder using a conditional hyper-parameterized network, in which the parameters are dependent on the anatomical location, to extract anatomically sensitive features. Extensive experiments on large-scale computed tomography (CT) datasets of lung images show that our method improves the performance of many downstream prediction and segmentation tasks. The patient-level representation improves the performance of the patient survival prediction task. We show how our method can detect emphysema subtypes via dense prediction, and we demonstrate that fine-tuning the pre-trained model can significantly reduce annotation efforts without sacrificing emphysema detection accuracy. Our ablation study highlights the importance of incorporating anatomical context into the SSL framework.
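The conditional hyper-parameterized encoder can be illustrated with a tiny 1-D toy: a fixed linear map turns a location code into convolution kernels, so different anatomical locations get different filters. The sizes, the one-hot location encoding, and the names below are illustrative assumptions, not DrasCLR's architecture.

```python
import numpy as np

def hyper_conv1d_weights(loc, in_ch=2, out_ch=2, k=3, seed=0):
    """Toy conditional hyper-network: a fixed linear map turns an anatomical
    location embedding into 1-D convolution kernels, so the filters vary
    with location."""
    gen = np.random.default_rng(seed)
    W = gen.normal(scale=0.1, size=(out_ch * in_ch * k, loc.size))
    return (W @ loc).reshape(out_ch, in_ch, k)

def conv1d(x, kernels):
    # x: (in_ch, length); 'valid' cross-correlation summed over input channels.
    out = []
    for kc in kernels:
        acc = sum(np.convolve(x[c], kc[c][::-1], mode="valid")
                  for c in range(x.shape[0]))
        out.append(acc)
    return np.stack(out)

loc_a = np.array([1.0, 0.0, 0.0])   # hypothetical "region A" one-hot code
loc_b = np.array([0.0, 1.0, 0.0])   # hypothetical "region B"
x = np.random.default_rng(2).normal(size=(2, 10))
y_a = conv1d(x, hyper_conv1d_weights(loc_a))
y_b = conv1d(x, hyper_conv1d_weights(loc_b))
```

Running the same input through location-conditioned kernels yields different responses per region, which is the mechanism for extracting anatomy-specific features with a single shared hyper-network.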

* Added some recent references 

MVKT-ECG: Efficient Single-lead ECG Classification on Multi-Label Arrhythmia by Multi-View Knowledge Transferring

Jan 28, 2023
Yuzhen Qin, Li Sun, Hui Chen, Wei-qiang Zhang, Wenming Yang, Jintao Fei, Guijin Wang

The widespread emergence of smart devices for ECG has sparked demand for intelligent single-lead ECG-based diagnostic systems. However, it is challenging to develop a single-lead ECG interpretation model for multi-disease diagnosis due to the lack of some key disease information. In this work, we propose inter-lead Multi-View Knowledge Transferring of ECG (MVKT-ECG) to boost single-lead ECG's ability for multi-label disease diagnosis. This training strategy transfers superior disease knowledge from multiple different views of ECG (e.g., 12-lead ECG) to a single-lead-based ECG interpretation model, mining details in single-lead ECG signals that are easily overlooked by neural networks. MVKT-ECG uses this lead variety as a supervision signal within a teacher-student paradigm, where a teacher that observes multi-lead ECG educates a student that observes only single-lead ECG. Since the mutual disease information between the single-lead and multi-lead ECG plays a key role in knowledge transferring, we present a new disease-aware Contrastive Lead-information Transferring (CLT) to improve this mutual disease information. Moreover, we modify traditional Knowledge Distillation into multi-label disease Knowledge Distillation (MKD) to make it applicable to multi-label disease diagnosis. Comprehensive experiments verify that MVKT-ECG delivers excellent performance in improving the diagnostic effect of single-lead ECG.
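Why standard KD needs modifying for the multi-label case can be seen in a small sketch: softmax-based distillation assumes one class per example, whereas arrhythmia labels can co-occur, so a per-label sigmoid cross-entropy between softened teacher and student outputs is a natural replacement. The formulation below is a generic sketch of that idea, not the paper's exact MKD.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multilabel_kd_loss(student_logits, teacher_logits, T=2.0):
    """Toy multi-label distillation: per-label binary cross-entropy between
    temperature-softened sigmoid outputs, replacing the usual softmax-based
    KD so that independent labels can co-occur."""
    p_t = sigmoid(teacher_logits / T)      # soft targets (multi-lead teacher)
    p_s = sigmoid(student_logits / T)      # predictions (single-lead student)
    eps = 1e-12
    bce = -(p_t * np.log(p_s + eps) + (1 - p_t) * np.log(1 - p_s + eps))
    return bce.mean()

teacher = np.array([2.0, -1.5, 0.5, -3.0])   # logits over 4 arrhythmia labels
student = np.array([1.0, -1.0, 0.0, -2.0])
loss = multilabel_kd_loss(student, teacher)
```

The loss is minimized when the student's per-label probabilities match the teacher's soft targets, so gradients pull the single-lead model toward the multi-lead model's label-wise confidence rather than a single winning class.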

Measuring tail risk at high-frequency: An $L_1$-regularized extreme value regression approach with unit-root predictors

Jan 03, 2023
Julien Hambuckers, Li Sun, Luca Trapin

We study tail risk dynamics in high-frequency financial markets and their connection with trading activity and market uncertainty. We introduce a dynamic extreme value regression model accommodating both stationary and local unit-root predictors to appropriately capture the time-varying behaviour of the distribution of high-frequency extreme losses. To characterize trading activity and market uncertainty, we consider several volatility and liquidity predictors, and propose a two-step adaptive $L_1$-regularized maximum likelihood estimator to select the most appropriate ones. We establish the oracle property of the proposed estimator for selecting both stationary and local unit-root predictors, and show its good finite-sample properties in an extensive simulation study. Studying the high-frequency extreme losses of nine large liquid U.S. stocks using 42 liquidity and volatility predictors, we find the severity of extreme losses to be well predicted by low levels of price impact in periods of high volatility of liquidity and volatility.
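The flavor of an $L_1$-penalized extreme value regression can be sketched with a generalized Pareto (GPD) likelihood whose log-scale is linear in the predictors, plus a weighted lasso penalty on the coefficients. The specific parameterization, penalty placement, and weights below are illustrative assumptions, not the paper's estimator (which additionally handles local unit-root predictors and a two-step adaptive scheme).

```python
import numpy as np

def gpd_nll_l1(params, exceedances, X, lam, weights):
    """Penalized negative log-likelihood for a GPD model whose log-scale is
    linear in the predictors X, with a weighted L1 penalty on the regression
    coefficients (intercept unpenalized). Illustrative sketch only."""
    xi, beta = params[0], params[1:]
    sigma = np.exp(X @ beta)                       # time-varying GPD scale
    z = 1.0 + xi * exceedances / sigma
    if np.any(z <= 0):
        return np.inf                              # outside the support
    nll = np.sum(np.log(sigma) + (1.0 / xi + 1.0) * np.log(z))
    return nll + lam * np.sum(weights * np.abs(beta[1:]))

# Simulate exceedances from a GPD with covariate-dependent scale.
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true = np.array([0.2, 0.0, 0.3])                   # (xi, intercept, slope)
sigma = np.exp(X @ true[1:])
u = rng.uniform(1e-9, 1.0, size=n)
y = sigma / true[0] * (u ** (-true[0]) - 1.0)      # inverse-CDF GPD draws
weights = np.ones(1)                               # adaptive weights (toy)
nll_true = gpd_nll_l1(true, y, X, lam=1.0, weights=weights)
```

Minimizing this objective over `params` (e.g., with a generic optimizer) shrinks coefficients of uninformative predictors toward zero, which is the mechanism behind selecting among many liquidity and volatility predictors.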


Towards Long-term Autonomy: A Perspective from Robot Learning

Jan 02, 2023
Zhi Yan, Li Sun, Tomas Krajnik, Tom Duckett, Nicola Bellotto

In the future, service robots are expected to operate autonomously for long periods of time without human intervention. Much work striving for this goal has emerged alongside the development of robotics, in both hardware and software. Today we believe that an important underpinning of long-term robot autonomy is the ability of robots to learn on site and on the fly, especially when they are deployed in changing environments or need to traverse different environments. In this paper, we examine the problem of long-term autonomy from the perspective of robot learning, especially in an online way, and discuss in tandem its premise, "data", and the subsequent "deployment".

* Accepted by AAAI-23 Bridge Program on AI & Robotics 