Kai Lu

Shoggoth: Towards Efficient Edge-Cloud Collaborative Real-Time Video Inference via Adaptive Online Learning

Jun 27, 2023
Liang Wang, Kai Lu, Nan Zhang, Xiaoyang Qu, Jianzong Wang, Jiguang Wan, Guokuan Li, Jing Xiao

This paper proposes Shoggoth, an efficient edge-cloud collaborative architecture for boosting inference performance on real-time video of changing scenes. Shoggoth uses online knowledge distillation to improve the accuracy of models suffering from data drift and offloads the labeling process to the cloud, alleviating the constrained resources of edge devices. At the edge, we design adaptive training with small batches to adapt models under limited computing power, and adaptive sampling of training frames for robustness and reduced bandwidth. Evaluations on a realistic dataset show a 15%-20% model accuracy improvement over the edge-only strategy and lower network costs than the cloud-only strategy.
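
As a rough illustration of how these pieces could interact, here is a minimal sketch of the edge-side loop, assuming hypothetical StubModel, cloud_label, and frame_difference interfaces; it is not the authors' implementation.

```python
import random

class StubModel:
    """Placeholder edge student model; infer() and train_step() are assumed interfaces."""
    def infer(self, frame):
        return "prediction"
    def train_step(self, batch):
        pass  # one small-batch gradient step on (frame, label) pairs

def cloud_label(frame):
    """Stand-in for the cloud-side teacher that labels offloaded frames."""
    return "label"

def frame_difference(a, b):
    """Cheap scene-change signal: mean absolute pixel difference."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def edge_loop(model, frames, drift_threshold=0.1, batch_size=8):
    prev, batch = None, []
    for frame in frames:
        yield model.infer(frame)                    # real-time inference path
        # adaptive sampling: only offload frames that suggest a scene change
        if prev is None or frame_difference(frame, prev) > drift_threshold:
            batch.append((frame, cloud_label(frame)))
        if len(batch) >= batch_size:                # small-batch online adaptation
            model.train_step(batch)
            batch = []
        prev = frame

frames = [[random.random() for _ in range(16)] for _ in range(100)]
for pred in edge_loop(StubModel(), frames):
    pass
```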

* Accepted by 60th ACM/IEEE Design Automation Conference (DAC2023) 

Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation

Jun 08, 2023
Jia-Xing Zhong, Ta-Ying Cheng, Yuhang He, Kai Lu, Kaichen Zhou, Andrew Markham, Niki Trigoni

A truly generalizable approach to rigid segmentation and motion estimation is fundamental to the 3D understanding of articulated objects and moving scenes. Given the tightly coupled relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy that tackle this task in an unsupervised manner. Our architecture comprises two lightweight, interconnected heads that predict segmentation masks from point-level invariant features and motion estimates from SE(3) equivariant features, without requiring category information. Our unified training strategy can be performed online and jointly optimizes the two predictions by exploiting the interrelations among scene flow, segmentation masks, and rigid transformations. Experiments on four datasets demonstrate the superiority of our method in both model performance and computational efficiency, with only 0.25M parameters and 0.92G FLOPs. To the best of our knowledge, this is the first work designed for category-agnostic part-level SE(3) equivariance in dynamic point clouds.
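
One interrelation such a training strategy can exploit is that soft segmentation masks and per-part rigid transforms jointly imply a scene flow. A minimal numpy sketch of such a consistency term follows; the function names and the exact form of the loss are our assumptions, not the paper's code.

```python
import numpy as np

def flow_from_rigid_parts(points, masks, rotations, translations):
    """points: (N,3); masks: (N,K) soft part assignments summing to 1;
    rotations: (K,3,3); translations: (K,3)."""
    moved = np.einsum('kij,nj->kni', rotations, points) + translations[:, None, :]
    per_part_flow = moved - points[None, :, :]            # (K,N,3) rigid flow per part
    return np.einsum('nk,kni->ni', masks, per_part_flow)  # mask-weighted combination

def consistency_loss(pred_flow, points, masks, rotations, translations):
    """Ties the predicted flow to the segmentation and motion estimates."""
    implied = flow_from_rigid_parts(points, masks, rotations, translations)
    return float(np.mean((pred_flow - implied) ** 2))
```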


PAI at SemEval-2023 Task 2: A Universal System for Named Entity Recognition with External Entity Information

May 10, 2023
Long Ma, Kai Lu, Tianbo Che, Hailong Huang, Weiguo Gao, Xuan Li

The MultiCoNER II task aims to detect complex, ambiguous, and fine-grained named entities in low-context situations and noisy scenarios, such as the presence of spelling mistakes and typos, across multiple languages. The task poses significant challenges due to the scarcity of contextual information, the high granularity of the entities (up to 33 classes), and the interference of noisy data. To address these issues, our team PAI proposes a universal Named Entity Recognition (NER) system that integrates external entity information to improve performance. Specifically, our system retrieves entities and their properties from a knowledge base (i.e., Wikipedia) for a given text, then concatenates the entity information with the input sentence and feeds it into Transformer-based models. Our system won 2 first places, 4 second places, and 1 third place out of 13 tracks. The code is publicly available at https://github.com/diqiuzhuanzhuan/semeval-2023.
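
A toy sketch of this retrieve-and-concatenate step, with a two-entry in-memory gazetteer standing in for the Wikipedia knowledge base (the [SEP] delimiter and lookup logic are illustrative assumptions):

```python
KB = {"lionel messi": "person, football player",
      "inter miami": "organization, football club"}

def augment(sentence: str) -> str:
    """Append retrieved entity descriptions to the input sentence."""
    hits = [f"{name} ({props})" for name, props in KB.items()
            if name in sentence.lower()]
    if not hits:
        return sentence
    # the augmented string is what the Transformer-based tagger consumes
    return sentence + " [SEP] " + "; ".join(hits)

print(augment("Lionel Messi joined Inter Miami."))
```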

* Won 2 first places, 4 second places, and 1 third place out of 13 tracks

Decoupling Skill Learning from Robotic Control for Generalizable Object Manipulation

Mar 09, 2023
Kai Lu, Bo Yang, Bing Wang, Andrew Markham

Recent works in robotic manipulation through reinforcement learning (RL) or imitation learning (IL) have shown potential for tackling a range of tasks, e.g., opening a drawer or a cupboard. However, these techniques generalize poorly to unseen objects. We conjecture that this is due to the high-dimensional action space of joint control. In this paper, we take an alternative approach and separate the task of learning 'what to do' from 'how to do it', i.e., whole-body control. We pose the RL problem as one of determining the skill dynamics for a disembodied virtual manipulator interacting with articulated objects. The whole-body robotic kinematic control is optimized to execute the high-dimensional joint motion that reaches the goals in the workspace. It does so by solving a quadratic programming (QP) model with robotic singularity and kinematic constraints. Our experiments on manipulating complex articulated objects show that the proposed approach generalizes better to unseen objects with large intra-class variations, outperforming previous approaches. The evaluation results indicate that our approach generates more compliant robotic motion and outperforms pure RL and IL baselines in task success rates. Additional information and videos are available at https://kl-research.github.io/decoupskill
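
As a simplified stand-in for the QP-based kinematic control step, the bounded least-squares problem below finds joint velocities that realize a desired end-effector twist under joint-velocity limits; the paper's full model adds singularity and further kinematic constraints, so treat this as an assumption-laden sketch.

```python
import numpy as np
from scipy.optimize import lsq_linear

def kinematic_control_step(J, v, qdot_limit=1.0):
    """J: (6, n) manipulator Jacobian; v: (6,) desired end-effector twist
    produced by the disembodied skill policy. Returns joint velocities."""
    n = J.shape[1]
    bounds = (-qdot_limit * np.ones(n), qdot_limit * np.ones(n))
    return lsq_linear(J, v, bounds=bounds).x  # min ||J*qdot - v|| s.t. limits

J = np.random.randn(6, 7)              # e.g. a 7-DoF arm
v = np.array([0.1, 0, 0, 0, 0, 0.0])   # commanded motion along x
print(kinematic_control_step(J, v))
```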

* Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2023 

Deep Reinforcement Learning for Robotic Pushing and Picking in Cluttered Environment

Feb 21, 2023
Yuhong Deng, Xiaofeng Guo, Yixuan Wei, Kai Lu, Bin Fang, Di Guo, Huaping Liu, Fuchun Sun

In this paper, a novel robotic grasping system is established to automatically pick up objects in cluttered scenes. A composite robotic hand composed of a suction cup and a gripper is designed to grasp objects stably: the suction cup first lifts the object from the clutter, and the gripper then grasps it. We utilize an affordance map to provide pixel-wise lifting-point candidates for the suction cup. To obtain a good affordance map, an active exploration mechanism is introduced into the system. An effective metric is designed to calculate the reward for the current affordance map, and a deep Q-Network (DQN) is employed to guide the robotic hand to actively explore the environment until the generated affordance map is suitable for grasping. Experimental results demonstrate that the proposed system greatly increases the success rate of robotic grasping in cluttered scenes.
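
A hedged sketch of that active-exploration loop: a Q-function scores candidate pushes, a toy confidence metric stands in for the paper's reward, and exploration stops once the affordance map looks graspable. All interfaces here (q_values, push, get_map) are illustrative assumptions.

```python
import random

def affordance_reward(affordance_map):
    """Toy metric: confidence of the best lifting point in the current map."""
    return max(max(row) for row in affordance_map)

def explore_until_graspable(q_values, push, get_map,
                            threshold=0.9, eps=0.1, max_steps=20):
    """q_values(map) -> Q-scores for candidate pushes; push(i) executes one."""
    for _ in range(max_steps):
        amap = get_map()
        if affordance_reward(amap) >= threshold:
            return amap                    # map is good enough to grasp from
        qs = q_values(amap)                # DQN scores candidate push actions
        if random.random() < eps:          # epsilon-greedy exploration
            action = random.randrange(len(qs))
        else:
            action = max(range(len(qs)), key=qs.__getitem__)
        push(action)                       # perturb the clutter, then re-sense
    return get_map()
```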

* Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems 2019 (IROS 2019)

Tree-based Search Graph for Approximate Nearest Neighbor Search

Jan 10, 2022
Xiaobin Fan, Xiaoping Wang, Kai Lu, Lei Xue, Jinjing Zhao

Nearest neighbor search supports important applications in many domains, such as databases, machine learning, and computer vision. Since the computational cost of exact search is too high, the community has turned to approximate nearest neighbor search (ANNS). Among ANNS methods, graph-based algorithms form one of the most important branches. Research by Fu et al. shows that algorithms based on the Monotonic Search Network (MSNET), such as NSG and NSSG, achieve state-of-the-art search efficiency. The MSNET is dedicated to achieving monotonic search with minimal node out-degree in pursuit of high efficiency. However, current MSNET designs do not optimize the probability that the monotonic search succeeds, and the lower bound on that probability is only 50%. If the monotonic search stage fails, they suffer tremendous backtracking cost to reach the required accuracy, which hurts search efficiency. To address this problem, we propose the (r,p)-MSNET, which provides a guaranteed probability of monotonic search. Because a strict (r,p)-MSNET has high building complexity, we also propose TBSG, a low-complexity approximation. Experiments conducted on four million-scale datasets show that TBSG outperforms existing state-of-the-art graph-based algorithms in search efficiency. Our code has been released on GitHub.
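
The monotonic search referred to above is, at its core, greedy descent on a proximity graph; a self-contained sketch (with an illustrative toy graph, not TBSG's index) shows where failure forces backtracking:

```python
def greedy_search(graph, dist, query, start):
    """Monotonic descent: hop to the closest neighbor until no neighbor is
    closer. If the resulting local minimum is not the true nearest neighbor,
    the monotonic search has failed and a real index must fall back to
    costly backtracking."""
    cur, d_cur = start, dist(query, start)
    while True:
        nxt = min(graph[cur], key=lambda n: dist(query, n))
        d_nxt = dist(query, nxt)
        if d_nxt >= d_cur:
            return cur
        cur, d_cur = nxt, d_nxt

# toy 1-D example: nodes are numbers, edges form a small proximity graph
graph = {0: [2], 2: [0, 5], 5: [2, 9], 9: [5]}
print(greedy_search(graph, lambda a, b: abs(a - b), query=8, start=0))  # -> 9
```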


CoG: a Two-View Co-training Framework for Defending Adversarial Attacks on Graph

Sep 12, 2021
Xugang Wu, Huijun Wu, Xu Zhou, Kai Lu

Graph neural networks (GNNs) exhibit remarkable performance in graph data analysis. However, the robustness of GNN models remains a challenge, so they are not yet reliable enough to be deployed in critical applications. Recent studies demonstrate that GNNs can easily be fooled by adversarial perturbations, especially structural perturbations. This vulnerability is attributed to excessive dependence on structure information for making predictions. To achieve better robustness, it is desirable to build GNN predictions on more comprehensive features. Graph data, in most cases, carries two views of information: structure information and feature information. In this paper, we propose CoG, a simple yet effective co-training framework that combines these two views for the purpose of robustness. CoG trains sub-models from the feature view and the structure view independently and lets them distill knowledge from each other by adding their most confident unlabeled data to the training set. The orthogonality of the two views diversifies the sub-models, thus enhancing the robustness of their ensemble. We evaluate our framework on three popular datasets, and the results show that CoG significantly improves the robustness of graph models against adversarial attacks without sacrificing performance on clean data. We also show that CoG retains good robustness when both node features and graph structures are perturbed.
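
A minimal co-training skeleton in the spirit of CoG follows; the fit/predict_proba model interfaces and the top-k confidence selection are assumptions rather than CoG's released code.

```python
import numpy as np

def co_train(model_f, model_s, labeled, unlabeled, rounds=5, k=10):
    """labeled: list of (node, label); unlabeled: list of nodes.
    model_*.fit / .predict_proba are assumed sub-model interfaces."""
    for _ in range(rounds):
        model_f.fit(labeled)                         # feature-view sub-model
        model_s.fit(labeled)                         # structure-view sub-model
        picked = set()
        for model in (model_f, model_s):
            probs = model.predict_proba(unlabeled)   # shape (n, classes)
            conf = probs.max(axis=1)
            for i in np.argsort(-conf)[:k]:          # most confident nodes
                picked.add(int(i))
                labeled.append((unlabeled[int(i)], int(probs[i].argmax())))
        unlabeled = [u for i, u in enumerate(unlabeled) if i not in picked]
    return model_f, model_s  # ensemble their predictions at test time
```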


Interpretable Classification from Skin Cancer Histology Slides Using Deep Learning: A Retrospective Multicenter Study

Apr 12, 2019
Peizhen Xie, Ke Zuo, Yu Zhang, Fangfang Li, Mingzhu Yin, Kai Lu

For diagnosing melanoma, hematoxylin and eosin (H&E) stained tissue slides remain the gold standard. These images contain quantitative information at different magnifications. In the present study, we investigated whether deep convolutional neural networks can extract structural features of complex tissues directly from these massive images in a patch-wise manner. To face the challenge arising from the morphological diversity of histopathological slides, we built a multicenter database of 2241 digital whole-slide images from 1321 patients collected between 2008 and 2018. We trained both ResNet50 and Vgg19 via transfer learning on over 9.95 million patches and tested performance on two kinds of critical classifications: malignant melanoma versus benign nevi at separate and mixed magnifications, and distinguishing among nevi at maximum magnification. The CNNs achieve superior performance across both tasks, demonstrating an AI capable of classifying skin cancer from histopathological images. To make the classifications interpretable, visualization of the CNN representations is further used to identify cells that distinguish melanoma from nevi. Regions of interest (ROIs) are also located, giving pathologists additional support for correct diagnoses.
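
A hedged sketch of the patch-level transfer-learning setup, using torchvision's ResNet50; the two-class head, optimizer, and training step are placeholders rather than the study's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# start from ImageNet weights and replace the classification head
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)  # e.g. melanoma vs. benign nevus

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_on_patches(patches, labels):
    """patches: (B, 3, 224, 224) tensor of whole-slide image crops."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(patches), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```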

* 6 pages, 3 figures

The Vulnerabilities of Graph Convolutional Networks: Stronger Attacks and Defensive Techniques

Mar 08, 2019
Huijun Wu, Chen Wang, Yuriy Tyshetskiy, Andrew Docherty, Kai Lu, Liming Zhu

Graph deep learning models, such as graph convolutional networks (GCNs), achieve remarkable performance on graph data tasks. Like other types of deep models, graph deep learning models often suffer from adversarial attacks. However, compared with non-graph data, the discrete features, graph connections, and differing definitions of imperceptible perturbations bring unique challenges and opportunities to adversarial attacks and defences on graph data. In this paper, we propose both attack and defence techniques. For the attack, we show that the discrete feature problem can be resolved by introducing integrated gradients, which accurately reflect the effect of perturbing certain features or edges while still benefiting from parallel computation. For the defence, we propose partially learning the adjacency matrix to integrate information from distant nodes, so that the prediction for a given target is supported by more global graph information rather than just a few neighbouring nodes. This makes attacks harder, since one needs to perturb more features or edges for an attack to succeed. Our experiments on a number of datasets show the effectiveness of the proposed methods.
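
For reference, integrated gradients (Sundararajan et al.) can be rendered in a few lines of PyTorch; this is the standard definition applied to a feature vector, not the authors' exact attack code.

```python
import torch

def integrated_gradients(f, x, baseline, steps=50):
    """IG_i = (x_i - x'_i) * mean over k of df/dx_i at interpolation step k.
    f maps a feature vector to a scalar (e.g. the target class logit)."""
    grads = torch.zeros_like(x)
    for k in range(1, steps + 1):
        xk = (baseline + (k / steps) * (x - baseline)).requires_grad_(True)
        f(xk).backward()
        grads += xk.grad
    return (x - baseline) * grads / steps

# toy usage: score which binary features most increase f when flipped on
f = lambda v: (v * torch.tensor([1.0, -2.0, 3.0])).sum()
print(integrated_gradients(f, torch.tensor([1.0, 0.0, 1.0]), torch.zeros(3)))
```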


Interpreting Shared Deep Learning Models via Explicable Boundary Trees

Sep 12, 2017
Huijun Wu, Chen Wang, Jie Yin, Kai Lu, Liming Zhu

Despite outperforming humans in many tasks, deep neural network models are also criticized for their lack of transparency and interpretability in decision making. This opaqueness results in uncertainty and low confidence when such a model is deployed in model-sharing scenarios, where the model has been developed by a third party. For a supervised machine learning model, sharing the training process, including the training data, provides an effective way to gain trust and to better understand model predictions. However, it is not always possible to share all training data due to privacy and policy constraints. In this paper, we propose a method that discloses a small set of training data that is just sufficient for users to gain insight into a complicated model. The method constructs a boundary tree using selected training data, and the tree is able to approximate the complicated model with high fidelity. We show that traversing the data points in the tree gives users a significantly better understanding of the model and paves the way for trustworthy model sharing.
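
A compact sketch of boundary-tree construction and lookup (after Mathy et al.), the structure the method builds on; the unbounded branching and the distance function here are simplifying assumptions.

```python
class Node:
    def __init__(self, x, y):
        self.x, self.y, self.children = x, y, []

def query(root, x, dist):
    """Descend toward the locally closest stored example."""
    node = root
    while True:
        best = min(node.children + [node], key=lambda n: dist(n.x, x))
        if best is node:
            return node
        node = best

def insert(root, x, y, dist):
    """Store (x, y) only if the tree currently mislabels it, i.e. it sits
    near a decision boundary; otherwise it is redundant and discarded."""
    near = query(root, x, dist)
    if near.y != y:
        near.children.append(Node(x, y))

# usage: seed with one labeled example, then stream the rest through insert()
root = Node(0.0, "a")
for x, y in [(1.0, "a"), (5.0, "b"), (6.0, "b"), (2.0, "a")]:
    insert(root, x, y, lambda p, q: abs(p - q))
```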

* 9 pages, 10 figures 