Hang Yu

How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization

Sep 22, 2023
Hai Zhang, Hang Yu, Junqiao Zhao, Di Zhang, Chang Huang, Hongtu Zhou, Xiao Zhang, Chen Ye

Designing and deriving effective model-based reinforcement learning (MBRL) algorithms with a performance improvement guarantee is challenging, mainly due to the high coupling between model learning and policy optimization. Many prior methods that rely on return discrepancy to guide model learning ignore the impact of model shift, which can lead to performance deterioration due to excessive model updates. Other methods use a performance difference bound to consider model shift explicitly. However, these methods rely on a fixed threshold to constrain model shift, resulting in a heavy dependence on the threshold and a lack of adaptability during the training process. In this paper, we theoretically derive an optimization objective that unifies model shift and model bias and then formulate a fine-tuning process. This process adaptively adjusts the model updates to obtain a performance improvement guarantee while avoiding model overfitting. Building on these results, we develop a straightforward algorithm, USB-PO (Unified model Shift and model Bias Policy Optimization). Empirical results show that USB-PO achieves state-of-the-art performance on several challenging benchmark tasks.
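
As a rough illustration of the fine-tuning process described above, here is a minimal sketch in which model bias is measured as prediction error on real transitions and model shift is approximated by the distance to a pre-update snapshot of the model; the paper's actual unified objective and adaptation rule differ, and all names below are illustrative.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy dynamics model: predicts the next state from a (state, action) vector.
model = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

def fine_tune(batch, steps=10, lam=1.0, shift_tol=0.1):
    """Fine-tune the model while penalizing drift from a pre-update snapshot."""
    old_model = copy.deepcopy(model)                 # reference point for model shift
    for _ in range(steps):
        pred = model(batch["sa"])
        bias = F.mse_loss(pred, batch["ns"])         # model bias: error on real data
        with torch.no_grad():
            ref = old_model(batch["sa"])
        shift = F.mse_loss(pred, ref)                # crude proxy for model shift
        loss = bias + lam * shift                    # "unified" objective (sketch only)
        opt.zero_grad(); loss.backward(); opt.step()
        if shift.item() > shift_tol:                 # naive adaptive damping
            lam *= 1.5

batch = {"sa": torch.randn(256, 6), "ns": torch.randn(256, 4)}
fine_tune(batch)
```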


NeuSort: An Automatic Adaptive Spike Sorting Approach with Neuromorphic Models

Apr 20, 2023
Hang Yu, Yu Qi, Gang Pan

Spike sorting, which classifies the spiking events of different neurons from single-electrode recordings, is an essential and widely used step in neural data processing and analysis. The recent development of brain-machine interfaces enables online control of external devices and closed-loop neuroprosthetics using single-unit activity, making online spike sorting desirable. Most existing spike sorters work in an offline manner, i.e., sorting after data collection. However, offline spike sorters usually suffer from performance degradation in online tasks due to the instability of neural signals: in an online process, neuronal properties can change over time (such as waveform deformations), and new neurons can appear. A static spike sorter therefore requires periodic recalibration to maintain its performance. This study proposes a novel online spike sorter based on neuromorphic models (NeuSort), which can adaptively adjust itself to cope with changes in neural signals. NeuSort can robustly track individual neurons' activities against waveform deformations and automatically recognize newly appearing neurons in real time. The adaptation ability of NeuSort is achieved by online parameter updates of the neuromorphic model, according to a plasticity learning rule inspired by biological neural systems. Experimental results on both synthetic and real neural signal datasets demonstrate that NeuSort can classify spiking events automatically and cope with non-stationary situations in neural signals. NeuSort also provides ultra-low-energy computation with neuromorphic chips.
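
The adaptation logic can be mimicked with ordinary template matching. The following is a minimal sketch, assuming normalized waveform templates updated by a Hebbian-style running rule and unmatched waveforms spawning new units; NeuSort itself realizes this with a spiking neuromorphic model, and all names and thresholds here are illustrative.

```python
import numpy as np

class OnlineSorter:
    """Toy adaptive online sorter: templates drift with the data via a
    Hebbian-style running update, and unmatched waveforms allocate new units."""

    def __init__(self, threshold=0.5, lr=0.05):
        self.templates = []        # one waveform template per putative neuron
        self.threshold = threshold
        self.lr = lr               # plasticity rate

    def sort(self, waveform):
        w = waveform / (np.linalg.norm(waveform) + 1e-9)
        if self.templates:
            sims = [float(w @ t) for t in self.templates]
            k = int(np.argmax(sims))
            if sims[k] > self.threshold:
                # Plasticity-style update: nudge the template toward the spike.
                t = self.templates[k] + self.lr * (w - self.templates[k])
                self.templates[k] = t / np.linalg.norm(t)
                return k
        self.templates.append(w)   # a new neuron appeared: allocate a unit
        return len(self.templates) - 1

sorter = OnlineSorter()
labels = [sorter.sort(np.random.randn(32)) for _ in range(100)]
```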


BALANCE: Bayesian Linear Attribution for Root Cause Localization

Jan 31, 2023
Chaoyu Chen, Hang Yu, Zhichao Lei, Jianguo Li, Shaokang Ren, Tingkai Zhang, Silin Hu, Jianchao Wang, Wenhui Shi

Root Cause Analysis (RCA) plays an indispensable role in distributed data system maintenance and operations, as it bridges the gap between fault detection and system recovery. Existing works mainly study multidimensional localization or graph-based root cause localization. This paper opens up the possibility of exploiting the recently developed framework of explainable AI (XAI) for RCA. In particular, we propose BALANCE (BAyesian Linear AttributioN for root CausE localization), which formulates RCA through the lens of attribution in XAI and seeks to explain the anomalies in the target KPIs by the behavior of the candidate root causes. BALANCE consists of three innovative components. First, we propose a Bayesian multicollinear feature selection (BMFS) model that predicts the target KPIs given the candidate root causes in a forward manner, promoting sparsity while accounting for the correlation among the candidate root causes. Second, we introduce attribution analysis to compute the attribution score for each candidate in a backward manner. Third, we merge the estimated root causes related to each KPI when there are multiple KPIs. We extensively evaluate BALANCE on one synthetic dataset as well as three real-world RCA tasks, namely bad SQL localization, container fault localization, and fault type diagnosis for Exathlon. Results show that BALANCE outperforms state-of-the-art (SOTA) methods in terms of accuracy with the lowest running time, achieving at least $6\%$ higher accuracy than SOTA methods on the real tasks. BALANCE has been deployed in production to tackle real-world RCA problems, and the online results further advocate its usage for real-time diagnosis in distributed data systems.

* Accepted by SIGMOD 2023; 15 pages 
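
The forward-then-backward recipe can be approximated with off-the-shelf tools. Below is a minimal sketch that substitutes scikit-learn's ARD regression for the paper's BMFS model (which additionally handles multicollinearity) and scores each candidate by coefficient magnitude times its deviation during the anomaly window; the data, window size, and scoring rule are all illustrative.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Hypothetical data: rows are time points, columns are candidate root-cause
# metrics, y is the anomalous target KPI.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=200)

model = ARDRegression().fit(X, y)          # forward step: sparse Bayesian fit

# Backward step: attribute the KPI anomaly to each candidate as
# |coefficient| * |deviation of that candidate in the anomaly window|.
deviation = np.abs(X[-20:].mean(axis=0) - X[:-20].mean(axis=0))
scores = np.abs(model.coef_) * deviation
ranking = np.argsort(scores)[::-1]         # top entries = likely root causes
print(ranking[:3])
```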

6-DoF Robotic Grasping with Transformer

Jan 29, 2023
Zhenjie Zhao, Hang Yu, Hang Wu, Xuebo Zhang

Robotic grasping aims to detect graspable points and their corresponding gripper configurations in a particular scene, and is fundamental for robot manipulation. Existing work has demonstrated the potential of using a transformer model for robotic grasping, which can efficiently learn both global and local features. However, such methods are still limited to grasp detection on a 2D plane. In this paper, we extend a transformer model to 6-Degree-of-Freedom (6-DoF) robotic grasping, making it more flexible and suitable for tasks that concern safety. The key designs of our method are a serialization module that turns a 3D voxelized space into a sequence of feature tokens that a transformer model can consume, and skip-connections that merge multiscale features effectively. In particular, our method takes a Truncated Signed Distance Function (TSDF) as input. After serializing the TSDF, a transformer model encodes the sequence, producing a set of aggregated hidden feature vectors through multi-head attention. We then decode the hidden features into per-voxel feature vectors through deconvolution and skip-connections, and these voxel features are used to regress the parameters of grasping actions. On a recently proposed pile and packed grasping dataset, we show that our transformer-based method surpasses existing methods by about 5% in terms of success rates and declutter rates. We further evaluate running time and generalization ability to demonstrate the superiority of the proposed method.
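
The serialization step is the part most easily shown in code. The sketch below, with illustrative grid, patch, and width values rather than the paper's, patchifies a TSDF volume with a strided 3D convolution and encodes the resulting token sequence with a standard transformer encoder; the deconvolution decoder, skip-connections, and grasp regression heads are omitted.

```python
import torch
import torch.nn as nn

class TSDFEncoder(nn.Module):
    """Serialize a TSDF volume into tokens and encode them with attention."""

    def __init__(self, grid=40, patch=8, dim=128):
        super().__init__()
        n = (grid // patch) ** 3
        # A 3D conv with stride=patch turns each voxel patch into one token.
        self.tokenize = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n, dim))   # learned positions
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, tsdf):                              # tsdf: (B, 1, G, G, G)
        tokens = self.tokenize(tsdf).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tokens + self.pos)            # multi-head attention

feats = TSDFEncoder()(torch.randn(2, 1, 40, 40, 40))     # -> (2, 125, 128)
```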


AvoidBench: A high-fidelity vision-based obstacle avoidance benchmarking suite for multi-rotors

Jan 18, 2023
Hang Yu, Guido C. H. E. de Croon, Christophe De Wagter

Obstacle avoidance is an essential topic in the field of autonomous drone research. When choosing an avoidance algorithm, many different options are available, each with its advantages and disadvantages. As there is currently no consensus on testing methods, it is quite challenging to compare performance between algorithms. In this paper, we propose AvoidBench, a benchmarking suite that evaluates the performance of vision-based obstacle avoidance algorithms by subjecting them to a series of tasks. Thanks to the high-fidelity multi-rotor dynamics of RotorS and the virtual scenes of Unity3D, AvoidBench can run realistic simulated flight experiments. Going beyond current drone simulators, we propose and implement both performance and environment metrics to reveal the suitability of obstacle avoidance algorithms for environments of different complexity. To illustrate AvoidBench's usage, we compare three algorithms: Ego-planner, MBPlanner, and Agile-autonomy. The observed trends are validated with real-world obstacle avoidance experiments.
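
A benchmarking suite of this kind boils down to running every algorithm over the same task set and aggregating comparable metrics. The sketch below is a purely hypothetical harness in that spirit; AvoidBench's actual performance and environment metrics are defined in the paper.

```python
import statistics

def benchmark(planners, tasks):
    """Fly each planner through the same tasks; aggregate comparable metrics."""
    results = {}
    for name, plan in planners.items():
        runs = [plan(t) for t in tasks]          # each run yields (success, seconds)
        ok = [sec for success, sec in runs if success]
        results[name] = {
            "success_rate": len(ok) / len(runs),
            "mean_time": statistics.mean(ok) if ok else float("inf"),
        }
    return results

# Toy usage with stand-in planners:
planners = {
    "straight_line": lambda t: (t < 3, 4.0),     # succeeds only on easy tasks
    "conservative":  lambda t: (True, 6.5),      # always succeeds, but slower
}
print(benchmark(planners, tasks=range(5)))
```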


Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning

Mar 09, 2022
Zhenhailong Wang, Hang Yu, Manling Li, Han Zhao, Heng Ji

Despite achieving state-of-the-art zero-shot performance, existing vision-language models, e.g., CLIP, still fall short on domain-specific classification tasks, e.g., Fungi Classification. In the context of few-shot transfer learning, traditional fine-tuning fails to prevent a highly expressive model from exploiting spurious correlations in the training data. On the other hand, although model-agnostic meta-learning (MAML) is a natural alternative for transfer learning, the expensive computation due to implicit second-order optimization limits its use on large-scale models and datasets. In this work, we aim to further improve the generalization of existing vision-language models on unseen tasks via a simple yet efficient fine-tuning strategy based on uniform task sampling. We term our method Model-Agnostic Multitask Fine-tuning (MAMF). Compared with MAML, MAMF discards the bi-level optimization and uses only first-order gradients, which makes it easily scalable and computationally efficient. Owing to the uniform task sampling procedure, MAMF consistently outperforms classical fine-tuning for few-shot transfer learning on five benchmark datasets. Empirically, we further discover that the effectiveness of first-order MAML is highly dependent on the zero-shot performance of the pretrained model, and that our simple algorithm can outperform first-order MAML on more challenging datasets with low zero-shot performance.

* 7 pages, 6 figures, under review 
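
Since MAMF drops the bi-level optimization, its core loop is just uniform task sampling plus plain first-order updates. A minimal sketch with a toy model and synthetic tasks (all names and shapes are illustrative, not the paper's setup):

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy classifier standing in for a pretrained vision-language model head.
model = nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def make_task():
    """A synthetic few-shot task: a small batch of examples and labels."""
    return torch.randn(8, 16), torch.randint(0, 4, (8,))

tasks = [make_task() for _ in range(5)]

for step in range(100):
    x, y = random.choice(tasks)                 # uniform task sampling
    loss = F.cross_entropy(model(x), y)         # ordinary first-order update,
    opt.zero_grad(); loss.backward(); opt.step()  # no inner/outer MAML loop
```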

Improving Embedded Knowledge Graph Multi-hop Question Answering by introducing Relational Chain Reasoning

Oct 25, 2021
Weiqiang Jin, Hang Yu, Xi Tao, Ruiping Yin

Knowledge Base Question Answering (KBQA) aims to answer user questions from a knowledge base (KB) by identifying the reasoning relations between topic entity and answer. As a complex branch task of KBQA, multi-hop KGQA requires reasoning over multi-hop relational chains preserved in the KG to arrive at the right answer. Despite the successes made in recent years, the existing works on answering multi-hop complex questions face the following challenges: i) they suffer from poor performance due to the neglect of the explicit relational chain order and the relational types reflected in user questions; ii) they fail to consider implicit relations between the topic entity and the answer implied in the structured KG, because of the limited neighborhood size constraints in subgraph retrieval-based algorithms. To address these issues in multi-hop KGQA, we propose a novel model in this paper, namely Relational Chain-based Embedded KGQA (Rce-KGQA), which simultaneously utilizes the explicit relational chain described in natural language questions and the implicit relational chain stored in the structured KG. Our extensive empirical study on two open-domain benchmarks proves that our method significantly outperforms state-of-the-art counterparts like GraftNet, PullNet and EmbedKGQA. Comprehensive ablation experiments also verify the effectiveness of our method for multi-hop KGQA tasks. We have made our model's source code available on GitHub: https://github.com/albert-jin/Rce-KGQA.

* 10 pages, 5 figures; 36 references; This work was carried out during the first author's master's studies at Shanghai University. This work is also partially supported by an anonymous Natural Research Foundation. We would like to thank Hang Yu for providing helpful discussions and valuable recommendations 
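
The two-signal idea, combining an embedding-based answer score with how well the relation chain along a KG path matches the chain predicted from the question, can be sketched as follows. All data structures, names, and the scoring formula here are illustrative stand-ins, not the model's actual components.

```python
def chain_overlap(predicted_chain, path_relations):
    """Fraction of the predicted relation chain matched in order along a path."""
    i = 0
    for rel in path_relations:
        if i < len(predicted_chain) and rel == predicted_chain[i]:
            i += 1
    return i / max(len(predicted_chain), 1)

def score(candidate, predicted_chain, kg_paths, embed_score, alpha=0.5):
    """Embedding plausibility plus the best relational-chain match (sketch)."""
    best = max(chain_overlap(predicted_chain, p) for p in kg_paths[candidate])
    return embed_score[candidate] + alpha * best

# Toy usage: which candidate better answers "What is the capital of France?"
kg_paths = {"Paris": [["capital_of"]], "Lyon": [["located_in"]]}
embed_score = {"Paris": 0.9, "Lyon": 0.4}
print(score("Paris", ["capital_of"], kg_paths, embed_score))
```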

CvT-ASSD: Convolutional vision-Transformer Based Attentive Single Shot MultiBox Detector

Oct 24, 2021
Weiqiang Jin, Hang Yu

Due to the success of Bidirectional Encoder Representations from Transformers (BERT) in natural language processing (NLP), the multi-head attention transformer has become increasingly prevalent in computer vision (CV) research. However, applying it to complex tasks such as object detection and semantic segmentation remains challenging. Although several transformer-based architectures like DETR and ViT-FRCNN have been proposed for object detection, they inevitably decrease discrimination accuracy and computational efficiency because of the enormous number of learnable parameters and the heavy computational complexity incurred by the traditional self-attention operation. To alleviate these issues, we present a novel object detection architecture, named Convolutional vision Transformer Based Attentive Single Shot MultiBox Detector (CvT-ASSD), built on top of the Convolutional vision Transformer (CvT) with the efficient Attentive Single Shot MultiBox Detector (ASSD). We provide comprehensive empirical evidence showing that CvT-ASSD achieves good system efficiency and performance when pretrained on large-scale detection datasets such as PASCAL VOC and MS COCO. Code has been released in a public GitHub repository at https://github.com/albert-jin/CvT-ASSD.

* 9 pages; 5 figures; conference: IEEE ICTAI; Acknowledgment: The research reported in this paper was supported in part by the National Natural Science Foundation of China under the grant 91746203 and the Outstanding Academic Leader Project of Shanghai under the grant No.20XD1401700 
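
CvT's central trick, replacing linear token projections with convolutions so that spatial structure is preserved, can be sketched briefly. The block below is a simplified, hypothetical version with illustrative dimensions; the real CvT-ASSD stacks several such stages and feeds multi-scale feature maps to the ASSD detection head.

```python
import torch
import torch.nn as nn

class ConvAttention(nn.Module):
    """Self-attention whose query projection is a depthwise convolution."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # conv projection
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                           # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.dw(x).flatten(2).transpose(1, 2)   # conv-projected queries
        kv = x.flatten(2).transpose(1, 2)           # keys/values as plain tokens
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(b, c, h, w)

y = ConvAttention()(torch.randn(2, 64, 14, 14))     # same spatial layout out
```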

AP-10K: A Benchmark for Animal Pose Estimation in the Wild

Aug 28, 2021
Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, Dacheng Tao

Accurate animal pose estimation is an essential step towards understanding animal behavior and can potentially benefit many downstream applications, such as wildlife conservation. Previous works focus only on specific animals while ignoring the diversity of animal species, limiting their generalization ability. In this paper, we propose AP-10K, the first large-scale benchmark for general animal pose estimation, to facilitate research in this area. AP-10K consists of 10,015 images, collected and filtered from 23 animal families and 60 species following the taxonomic rank, with high-quality keypoint annotations labeled and checked manually. Based on AP-10K, we benchmark representative pose estimation models on three tracks: (1) supervised learning for animal pose estimation, (2) cross-domain transfer learning from human pose estimation to animal pose estimation, and (3) intra- and inter-family domain generalization for unseen animals. The experimental results provide sound empirical evidence of the superiority of learning from diverse animal species in terms of both accuracy and generalization ability, opening new directions for future research in animal pose estimation. AP-10K is publicly available at https://github.com/AlexTheBad/AP10K.
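
AP-10K follows the COCO keypoint annotation format, so standard COCO tooling applies. A minimal sketch for browsing the annotations; the annotation file path below is an assumption, so check the repository for the actual split names.

```python
from pycocotools.coco import COCO

# Hypothetical path; the repository documents the real annotation layout.
coco = COCO("annotations/ap10k-train-split1.json")

img_ids = coco.getImgIds()
print(f"{len(img_ids)} images")

# Inspect the keypoint annotations of the first image.
ann_ids = coco.getAnnIds(imgIds=img_ids[:1])
for ann in coco.loadAnns(ann_ids):
    kps = ann["keypoints"]            # flat list: [x1, y1, v1, x2, y2, v2, ...]
    print(ann["category_id"], len(kps) // 3, "keypoints")
```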


BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening

May 13, 2021
Wenqi Shao, Hang Yu, Zhaoyang Zhang, Hang Xu, Zhenguo Li, Ping Luo

This work presents a probabilistic channel pruning method to accelerate Convolutional Neural Networks (CNNs). Previous pruning methods often zero out unimportant channels during training in a deterministic manner, which reduces the CNN's learning capacity and results in suboptimal performance. To address this problem, we develop a probability-based pruning algorithm, called batch whitening channel pruning (BWCP), which can stochastically discard unimportant channels by modeling the probability of a channel being activated. BWCP has several merits. (1) It simultaneously trains and prunes CNNs from scratch in a probabilistic way, exploring a larger network space than deterministic methods. (2) BWCP is empowered by the proposed batch whitening tool, which can, both empirically and theoretically, increase the activation probability of useful channels while keeping unimportant channels unchanged, without adding any extra parameters or computational cost at inference. (3) Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet with various network architectures show that BWCP outperforms its counterparts by achieving better accuracy under limited computational budgets. For example, ResNet50 pruned by BWCP suffers only a 0.70\% Top-1 accuracy drop on ImageNet while reducing the FLOPs of the plain ResNet50 by 43.1\%.

* 19 pages, 7 figures, 6 tables 
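
The activation-probability view can be sketched with batch normalization statistics: after BN, a channel's pre-activation is roughly Gaussian with mean beta and standard deviation |gamma|, so the probability that ReLU leaves it active is Phi(beta/|gamma|). The toy gate below samples a Bernoulli keep/prune mask from that probability; BWCP's batch whitening step, which reshapes these probabilities, is omitted here.

```python
import torch
import torch.nn as nn

def channel_mask(bn: nn.BatchNorm2d):
    """Sample a keep/prune mask from each channel's activation probability."""
    with torch.no_grad():
        normal = torch.distributions.Normal(0.0, 1.0)
        # P(post-ReLU activation > 0) under the BN-induced Gaussian (sketch).
        p_active = normal.cdf(bn.bias / (bn.weight.abs() + 1e-9))
        return torch.bernoulli(p_active)      # 1 = keep channel, 0 = prune

bn = nn.BatchNorm2d(16)
x = torch.randn(4, 16, 8, 8)
mask = channel_mask(bn)
y = torch.relu(bn(x)) * mask.view(1, -1, 1, 1)  # stochastically gated channels
```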