Jie Liu

A Monkey Swing Counting Algorithm Based on Object Detection

Mar 12, 2023
Hao Chen, Zhe-Ming Lu, Jie Liu

This paper proposes a deep learning-based algorithm for counting monkey head swings. There are very few papers on monkey detection, and even fewer on monkey swing counting. This research addresses that gap and attempts to count monkey head swings using deep learning. The paper extends the traditional object detection algorithm: by analyzing the results of object detection, we localize the monkey's actions over a period of time. We analyze the task of counting monkey head swings and propose a standard that accurately describes a monkey swinging its head. Under the guidance of this standard, the head-swing count on 50 monkey movement videos reaches 94%.
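
The abstract does not state the standard used to define a head swing. As a rough illustration of how per-frame detection results could be turned into a count, the sketch below counts direction reversals of the detected head's horizontal center. The input format, the reversal rule, and the `min_travel` threshold are all assumptions made for this example, not the authors' method.

```python
from typing import Sequence


def count_head_swings(x_centers: Sequence[float], min_travel: float = 10.0) -> int:
    """Count direction reversals of a tracked head position.

    x_centers: horizontal center of the detected head box in each frame
        (hypothetical input; any object detector could supply it).
    min_travel: displacement in pixels required before a movement counts.
        This threshold is an assumption, not a value from the paper.
    """
    swings = 0
    direction = 0  # +1 moving right, -1 moving left, 0 not yet known
    anchor = x_centers[0] if x_centers else 0.0  # last extreme position
    for x in x_centers[1:]:
        delta = x - anchor
        if direction == 0:
            # Wait for the first movement large enough to set a direction.
            if abs(delta) >= min_travel:
                direction = 1 if delta > 0 else -1
                anchor = x
        elif delta * direction > 0:
            # Still moving the same way: push the extreme further out.
            anchor = x
        elif abs(delta) >= min_travel:
            # Moved far enough the opposite way: count one reversal.
            swings += 1
            direction = -direction
            anchor = x
    return swings


if __name__ == "__main__":
    print(count_head_swings([0, 20, 40, 20, 0, 20, 40]))
```

The threshold filters out small detection jitter, so a head that wobbles by a pixel or two between frames is not counted as swinging.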

ESCL: Equivariant Self-Contrastive Learning for Sentence Representations

Mar 09, 2023
Jie Liu, Yixuan Liu, Xue Han, Chao Deng, Junlan Feng

Previous contrastive learning methods for sentence representations often focus on insensitive transformations to produce positive pairs, but neglect the role of sensitive transformations that are harmful to semantic representations. Therefore, we propose an Equivariant Self-Contrastive Learning (ESCL) method to make full use of sensitive transformations, which encourages the learned representations to be sensitive to certain types of transformations with an additional equivariant learning task. Meanwhile, in order to improve practicability and generality, ESCL simplifies the implementations of traditional equivariant contrastive methods to share model parameters from the perspective of multi-task learning. We evaluate our ESCL on semantic textual similarity tasks. The proposed method achieves better results while using fewer learning parameters compared to previous methods.

* Accepted by ICASSP 2023

An Empirical Study of Uniform-Architecture Knowledge Distillation in Document Ranking

Feb 08, 2023
Xubo Qin, Xiyuan Liu, Xiongfeng Zheng, Jie Liu, Yutao Zhu

Although BERT-based ranking models are commonly used in commercial search engines, they are usually time-consuming for online ranking tasks. Knowledge distillation, which aims at learning a smaller model with performance comparable to a larger model, is a common strategy for reducing online inference latency. In this paper, we investigate the effect of different loss functions on uniform-architecture distillation of BERT-based ranking models. Here "uniform-architecture" denotes that both teacher and student models use a cross-encoder architecture, while the student models include small-scale pre-trained language models. Our experimental results reveal that the optimal distillation configuration for ranking tasks differs considerably from that for general natural language processing tasks. Specifically, when the student models use a cross-encoder architecture, a pairwise loss on hard labels is critical for training student models, whereas distillation objectives on intermediate Transformer layers may hurt performance. These findings emphasize the necessity of carefully designing a distillation strategy (for cross-encoder student models) tailored for document ranking with pairwise training samples.
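
To make the idea of combining a pairwise hard-label loss with a distillation term concrete, here is a minimal sketch. The abstract does not give the exact loss functions, so the pairwise logistic loss, the margin-matching squared error used as the soft term, and the weight `alpha` are all assumptions chosen for illustration.

```python
import math


def pairwise_hard_loss(s_pos: float, s_neg: float) -> float:
    """Pairwise logistic loss on hard labels.

    Small when the student scores the relevant document (s_pos) above
    the non-relevant one (s_neg). Written in a numerically stable form
    of log(1 + exp(-margin)).
    """
    margin = s_pos - s_neg
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def margin_distill_loss(s_pos: float, s_neg: float,
                        t_pos: float, t_neg: float) -> float:
    """Squared gap between student and teacher score margins.

    A stand-in soft-label term (assumption), not the paper's objective.
    """
    return ((s_pos - s_neg) - (t_pos - t_neg)) ** 2


def student_loss(s_pos: float, s_neg: float,
                 t_pos: float, t_neg: float, alpha: float = 0.5) -> float:
    """Hard pairwise loss plus a weighted distillation term.

    alpha is a hypothetical mixing weight.
    """
    hard = pairwise_hard_loss(s_pos, s_neg)
    soft = margin_distill_loss(s_pos, s_neg, t_pos, t_neg)
    return hard + alpha * soft


if __name__ == "__main__":
    # Student agrees with the teacher's ordering and margin.
    print(student_loss(2.0, 0.0, 2.0, 0.0))
    # Student reverses the ordering and is penalized by both terms.
    print(student_loss(0.0, 2.0, 2.0, 0.0))
```

Both terms operate only on final ranking scores, which matches the abstract's observation that intermediate-layer objectives may hurt rather than help.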

Few-shot Semantic Segmentation with Support-induced Graph Convolutional Network

Jan 09, 2023
Jie Liu, Yanqi Bao, Wenzhe Ying, Haochen Wang, Yang Gao, Jan-Jakob Sonke, Efstratios Gavves

Few-shot semantic segmentation (FSS) aims to segment novel objects with only a few annotated samples and has made great progress recently. Most existing FSS models focus on feature matching between support and query to tackle FSS. However, the appearance variations between objects from the same category can be extremely large, leading to unreliable feature matching and query mask prediction. To this end, we propose a Support-induced Graph Convolutional Network (SiGCN) to explicitly excavate latent context structure in query images. Specifically, we propose a Support-induced Graph Reasoning (SiGR) module to capture salient query object parts at different semantic levels with a Support-induced GCN. Furthermore, an instance association (IA) module is designed to capture high-order instance context from both support and query instances. By integrating the two proposed modules, SiGCN can learn a rich query context representation and is thus more robust to appearance variations. Extensive experiments on PASCAL-5i and COCO-20i demonstrate that our SiGCN achieves state-of-the-art performance.

* Accepted at BMVC 2022 as an oral presentation

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Jan 06, 2023
Jie Liu, Yixiao Zhang, Jie-Neng Chen, Junfei Xiao, Yongyi Lu, Bennett A. Landman, Yixuan Yuan, Alan Yuille, Yucheng Tang, Zongwei Zhou

An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embedding learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.

CCFL: Computationally Customized Federated Learning

Dec 28, 2022
Hao Zhang, Tingting Wu, Siyao Cheng, Jie Liu

Federated learning (FL) is a method to train a model with distributed data from numerous participants such as IoT devices. It inherently assumes a uniform capacity among participants. However, participants have diverse computational resources in practice due to different conditions such as different energy budgets or executing parallel unrelated tasks. It is necessary to reduce the computation overhead for participants with insufficient computational resources; otherwise they would be unable to finish the full training process. To address this computation heterogeneity, we propose a strategy for estimating local models without computationally intensive iterations. Based on it, we propose Computationally Customized Federated Learning (CCFL), which allows each participant to determine whether to perform conventional local training or model estimation in each round based on its current computational resources. Both theoretical analysis and exhaustive experiments indicate that CCFL has the same convergence rate as FedAvg without resource constraints. Furthermore, CCFL can be viewed as a computation-efficient extension of FedAvg that retains model performance while considerably reducing computation overhead.

ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency

Dec 02, 2022
Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem, that is, the ever-changing targets at every iteration when multiple agents update their policies at the same time. Starting from first principles, in this paper we manage to solve the non-stationarity problem by proposing bidirectional action-dependent Q-learning (ACE). Central to the development of ACE is the sequential decision-making process wherein only one agent is allowed to take action at a time. Within this process, each agent maximizes its value function given the actions taken by the preceding agents at the inference stage. In the learning phase, each agent minimizes the TD error that is dependent on how the subsequent agents have reacted to its chosen action. Given the design of bidirectional dependency, ACE effectively turns a multi-agent MDP into a single-agent MDP. We implement the ACE framework by identifying the proper network representation to formulate the action dependency, so that the sequential decision process is computed implicitly in one forward pass. To validate ACE, we compare it with strong baselines on two MARL benchmarks. Empirical experiments demonstrate that ACE outperforms the state-of-the-art algorithms on Google Research Football and StarCraft Multi-Agent Challenge (SMAC) by a large margin. In particular, on SMAC tasks, ACE achieves a 100% success rate on almost all the hard and super-hard maps. We further study extensive research problems regarding ACE, including extension, generalization, and practicability. Code is made available to facilitate further research.

* Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)
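
The inference-stage procedure described in the abstract, where agents act one at a time and each maximizes its value given the actions of preceding agents, can be sketched as below. In ACE the value function is a learned network; here a small hand-written reward table (`REWARD`) and the helper `toy_q` are hypothetical stand-ins so the example runs on its own.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Actions = Tuple[int, ...]

# Hypothetical joint reward for two agents with two actions each.
REWARD: Dict[Actions, float] = {
    (0, 0): 1.0,
    (0, 1): 0.0,
    (1, 0): 0.0,
    (1, 1): 2.0,
}


def toy_q(prefix: Actions) -> float:
    """Stand-in value of a partial joint action: best reward reachable
    if the remaining agents complete it optimally."""
    remaining = 2 - len(prefix)
    return max(REWARD[prefix + rest]
               for rest in product(range(2), repeat=remaining))


def sequential_greedy(n_agents: int, n_actions: int,
                      q_value: Callable[[Actions], float]) -> Actions:
    """Agents choose in a fixed order; each picks the action with the
    highest value given the actions already chosen by earlier agents."""
    chosen: Actions = ()
    for _ in range(n_agents):
        best = max(range(n_actions), key=lambda a: q_value(chosen + (a,)))
        chosen = chosen + (best,)
    return chosen


if __name__ == "__main__":
    print(sequential_greedy(2, 2, toy_q))
```

Because each agent conditions on the choices already made, the joint decision reduces to a sequence of single-agent choices. The sketch covers only inference; the learning phase with its TD error is not shown.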

MSV Challenge 2022: NPU-HC Speaker Verification System for Low-resource Indian Languages

Dec 02, 2022
Yue Li, Li Zhang, Namin Wang, Jie Liu, Lei Xie

This report describes the NPU-HC speaker verification system submitted to the O-COCOSDA Multi-lingual Speaker Verification (MSV) Challenge 2022, which focuses on developing speaker verification systems for low-resource Asian languages. We participate in the I-MSV track, which aims to develop speaker verification systems for various Indian languages. In this challenge, we first explore different neural network frameworks for low-resource speaker verification. Then we leverage vanilla fine-tuning and weight transfer fine-tuning to transfer the out-domain pre-trained models to the in-domain Indian dataset. Specifically, the weight transfer fine-tuning aims to constrain the distance of the weights between the pre-trained model and the fine-tuned model, which takes advantage of the previously acquired discriminative ability from the large-scale out-domain datasets and avoids catastrophic forgetting and overfitting at the same time. Finally, score fusion is adopted to further improve performance. Together with the above contributions, we obtain 0.223% EER on the public evaluation set, ranking 2nd place on the leaderboard. On the private evaluation set, the EER of our submitted system is 2.123% and 0.630% for the constrained and unconstrained sub-tasks of the I-MSV track, leading to the 1st and 3rd place in the ranking, respectively.

* 6 pages, submitted to the 9th International Workshop on Vietnamese Language and Speech Processing
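
The abstract describes weight transfer fine-tuning as constraining the distance between pre-trained and fine-tuned weights. One simple way to express such a constraint is a squared L2 penalty added to the task loss, sketched below on plain Python lists. The choice of squared L2 distance, the `strength` coefficient, and the SGD update are assumptions for illustration; the report's exact formulation is not given in the abstract.

```python
from typing import List, Tuple


def weight_transfer_penalty(weights: List[float], pretrained: List[float],
                            strength: float = 0.01) -> Tuple[float, List[float]]:
    """Squared L2 distance to the pre-trained weights and its gradient.

    strength is a hypothetical regularization coefficient.
    """
    penalty = strength * sum((w - p) ** 2 for w, p in zip(weights, pretrained))
    grad = [2.0 * strength * (w - p) for w, p in zip(weights, pretrained)]
    return penalty, grad


def sgd_step(weights: List[float], task_grad: List[float],
             pretrained: List[float], lr: float = 0.1,
             strength: float = 0.01) -> List[float]:
    """One gradient step on task loss plus the distance penalty."""
    _, penalty_grad = weight_transfer_penalty(weights, pretrained, strength)
    return [w - lr * (g + pg)
            for w, g, pg in zip(weights, task_grad, penalty_grad)]


if __name__ == "__main__":
    pretrained = [0.0, 0.0]
    weights = [1.0, -1.0]
    # With zero task gradient, the penalty alone pulls the weights
    # back toward the pre-trained values.
    print(sgd_step(weights, [0.0, 0.0], pretrained, lr=0.1, strength=0.5))
```

The penalty is zero when the fine-tuned weights equal the pre-trained ones and grows as they drift apart, which is how this kind of term discourages forgetting the out-domain model.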

From Coarse to Fine: Hierarchical Pixel Integration for Lightweight Image Super-Resolution

Nov 30, 2022
Jie Liu, Chao Chen, Jie Tang, Gangshan Wu

Image super-resolution (SR) serves as a fundamental tool for the processing and transmission of multimedia data. Recently, Transformer-based models have achieved competitive performance in image SR. They divide images into fixed-size patches and apply self-attention on these patches to model long-range dependencies among pixels. However, this architecture design originated in high-level vision tasks and lacks design guidelines drawn from SR knowledge. In this paper, we aim to design a new attention block whose insights come from the interpretation of Local Attribution Map (LAM) for SR networks. Specifically, LAM presents a hierarchical importance map where the most important pixels are located in a fine area of a patch and some less important pixels are spread in a coarse area of the whole image. To access pixels in the coarse area, instead of using a very large patch size, we propose a lightweight Global Pixel Access (GPA) module that applies cross-attention with the most similar patch in an image. In the fine area, we use an Intra-Patch Self-Attention (IPSA) module to model long-range pixel dependencies in a local patch, and then a $3\times3$ convolution is applied to process the finest details. In addition, a Cascaded Patch Division (CPD) strategy is proposed to enhance the perceptual quality of recovered images. Extensive experiments suggest that our method outperforms state-of-the-art lightweight SR methods by a large margin. Code is available at https://github.com/passerer/HPINet.

* SOTA lightweight image super-resolution. To appear at AAAI 2023
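
The GPA module is described as attending to the most similar patch in the image. The selection step alone can be sketched as a nearest-neighbor search over patch feature vectors. Cosine similarity and the flat-list patch representation are assumptions for this example; the abstract does not say which similarity measure is used, and the cross-attention itself is not shown.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_patch(patches: List[Sequence[float]], query_index: int) -> int:
    """Index of the patch most similar to patches[query_index],
    excluding the query patch itself. Returns -1 if there is no other patch."""
    query = patches[query_index]
    best_index, best_score = -1, -math.inf
    for i, patch in enumerate(patches):
        if i == query_index:
            continue
        score = cosine(query, patch)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    # Hypothetical 2-D patch features.
    patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(most_similar_patch(patches, 0))
```

Restricting attention to a single matched patch gives access to distant pixels without enlarging the patch size, which is the motivation the abstract gives for GPA.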

TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis

Nov 21, 2022
Yilan Zhang, Fengying Xie, Jianqi Chen, Jie Liu

Multi-modal skin lesion diagnosis (MSLD) has achieved remarkable success with modern computer-aided diagnosis technology based on deep convolutions. However, information aggregation across modalities in MSLD remains challenging due to severely unaligned spatial resolutions (dermoscopic image and clinical image) and heterogeneous data (dermoscopic image and patients' meta-data). Limited by their intrinsic local attention, most recent MSLD pipelines using pure convolutions struggle to capture representative features in shallow layers, so fusion across modalities is usually done at the end of the pipeline, even at the last layer, leading to insufficient information aggregation. To tackle the issue, we introduce a pure transformer-based method, which we refer to as "Throughout Fusion Transformer (TFormer)", for sufficient information integration in MSLD. Unlike existing approaches based on convolutions, the proposed network uses a transformer as its feature extraction backbone, yielding more representative shallow features. We then carefully design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks to fuse information across different image modalities in a stage-by-stage way. With the aggregated information of image modalities, a multi-modal transformer post-fusion (MTP) block is designed to integrate features across image and non-image data. This strategy of fusing the image modalities first and the heterogeneous data afterwards lets us divide and conquer the two major challenges while ensuring inter-modality dynamics are effectively modeled. Experiments conducted on the public Derm7pt dataset validate the superiority of the proposed method. Our TFormer outperforms other state-of-the-art methods. Ablation experiments also suggest the effectiveness of our designs.
