
Li Wang

Northeast Normal University

Progressive Coordinate Transforms for Monocular 3D Object Detection

Aug 13, 2021

Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue

Abstract: Recognizing and localizing objects in 3D space is a crucial ability for an AI agent to perceive its surrounding environment. While significant progress has been achieved with expensive LiDAR point clouds, 3D object detection from only a monocular image remains a great challenge. Existing alternatives for tackling this problem are either equipped with heavy networks to fuse RGB and depth information or empirically ineffective at processing millions of pseudo-LiDAR points. Through in-depth examination, we find that these limitations are rooted in inaccurate object localization. In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations. Specifically, a localization boosting mechanism with a confidence-aware loss is introduced to progressively refine the localization prediction. In addition, semantic image representation is exploited to compensate for the use of patch proposals. Despite being lightweight and simple, our strategy yields substantial improvements on the KITTI and Waymo Open Dataset monocular 3D detection benchmarks. At the same time, PCT generalizes well to most coordinate-based 3D detection frameworks. The code is available at https://github.com/amazon-research/progressive-coordinate-transforms.

* Code is available at: https://github.com/amazon-research/progressive-coordinate-transforms
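Pseudo-LiDAR pipelines, which the abstract contrasts with PCT, lift every pixel of an estimated depth map into 3D camera coordinates with the pinhole camera model. The sketch below shows only that standard back-projection step, not PCT itself; the function name and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions.

```python
from typing import List, Tuple

def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Back-project a dense depth map (rows of metric depths) into camera-frame points.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy, z = depth[v][u].
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):          # v: image row
        for u, z in enumerate(row):          # u: image column
            if z <= 0:
                continue                     # missing or invalid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

Every valid pixel becomes one point, so a full-resolution depth map produces a point cloud with as many points as there are pixels, which is why pseudo-LiDAR inputs are so large.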

Text Anchor Based Metric Learning for Small-footprint Keyword Spotting

Aug 12, 2021

Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou

Abstract: Achieving a good trade-off between small footprint and high accuracy remains challenging for Keyword Spotting (KWS). Recently proposed metric learning approaches have improved the generalizability of KWS models, and 1D-CNN based KWS models have achieved state-of-the-art (SOTA) results in terms of model size. However, in metric learning, data limitations make the speech anchor highly susceptible to the acoustic environment and to speakers. We also note that 1D-CNN models have limited capability to capture long-term temporal acoustic features. To address these problems, we propose to use text anchors to improve the stability of anchors. Furthermore, a new model, LG-Net, is designed to promote long- and short-term acoustic feature modeling based on 1D-CNN and self-attention. Experiments are conducted on Google Speech Commands Dataset version 1 (GSCDv1) and version 2 (GSCDv2). The results demonstrate that the proposed text anchor based metric learning method yields consistent improvements over speech anchors on representative CNN-based models. Moreover, our LG-Net model achieves SOTA accuracy of 97.67% and 96.79% on the two datasets, respectively. Encouragingly, our lighter LG-Net with only 74k parameters obtains 96.82% KWS accuracy on GSCDv1 and 95.77% on GSCDv2.

* Accepted at Interspeech 2021
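In anchor-based metric learning, an utterance embedding is pulled toward the anchor of its own keyword and pushed away from the anchors of other keywords. The sketch below assumes the anchors are fixed text embeddings (which do not vary with speaker or acoustic environment) and uses a cosine-similarity hinge loss; the embedding values, `margin`, and function names are hypothetical, and this is not the paper's exact loss.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def text_anchor_triplet_loss(speech_emb: List[float], label: str,
                             text_anchors: Dict[str, List[float]],
                             margin: float = 0.2) -> float:
    """Hinge loss (hypothetical): similarity to the true keyword's text anchor
    should exceed similarity to every other keyword's anchor by `margin`."""
    pos = cosine(speech_emb, text_anchors[label])
    loss = 0.0
    for keyword, anchor in text_anchors.items():
        if keyword == label:
            continue
        neg = cosine(speech_emb, anchor)
        loss += max(0.0, margin - pos + neg)
    return loss
```

An utterance embedding close to its own text anchor incurs zero loss; one closer to a wrong anchor is penalized in proportion to the violation.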

Privacy Threats Analysis to Secure Federated Learning

Jun 24, 2021

Yuchen Li, Yifan Bao, Liyao Xiang, Junhan Liu, Cen Chen, Li Wang, Xinbing Wang

Abstract: Federated learning is emerging as a machine learning technique that trains a model across multiple decentralized parties. It is renowned for preserving privacy because the data never leaves the computational devices, and recent approaches further enhance privacy by encrypting the messages exchanged between parties. However, we found that despite these efforts, federated learning remains exposed to privacy threats due to its interactive nature across different parties. In this paper, we analyze the privacy threats in industrial-level federated learning frameworks with secure computation, and reveal that such threats are widespread in typical machine learning models such as linear regression, logistic regression and decision trees. For linear and logistic regression, we show through theoretical analysis that an attacker can invert the victim's entire private input given very little information. For the decision tree model, we launch an attack to infer the range of the victim's private inputs. All attacks are evaluated on popular federated learning frameworks and real-world datasets.
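To see why a linear model can leak its input exactly, consider a simpler setting than the paper's secure-computation attack: a party that observes the per-sample gradients of a squared-error linear regression with a bias term. Since dL/dw = r·x and dL/db = r, where r is the residual, dividing the two recovers x. This is a generic gradient-inversion illustration swapped in for clarity, not the attack described in the paper; the function names are hypothetical.

```python
from typing import List, Tuple

def linreg_grads(w: List[float], b: float, x: List[float], y: float) -> Tuple[List[float], float]:
    """Gradients of L = 0.5 * (w.x + b - y)^2 for a single sample."""
    r = sum(wi * xi for wi, xi in zip(w, x)) + b - y   # residual
    return [r * xi for xi in x], r                      # (dL/dw, dL/db)

def invert_input(grad_w: List[float], grad_b: float) -> List[float]:
    """Recover the private input: x_i = (dL/dw_i) / (dL/db), valid whenever r != 0."""
    if grad_b == 0:
        raise ValueError("zero residual: the gradients carry no information about x")
    return [g / grad_b for g in grad_w]
```

The only failure case is a sample the model already fits perfectly, which is why such leakage is hard to rule out in practice.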

Joint Training of the Superimposed Direct and Reflected Links in Reconfigurable Intelligent Surface Assisted Multiuser Communications

May 30, 2021

Jiancheng An, Chao Xu, Li Wang, Yusha Liu, Lu Gan, Lajos Hanzo

Abstract: In reconfigurable intelligent surface (RIS)-assisted systems, the acquisition of channel state information (CSI) and the optimization of the reflecting coefficients constitute a pair of salient design issues. In this paper, a novel channel training protocol is proposed that achieves a flexible trade-off between performance on the one hand and signalling overhead, pilot overhead and implementation complexity on the other. Specifically, we first conceive a holistic channel estimation protocol, which integrates existing channel estimation techniques with passive beamforming design. Secondly, we propose a new channel training framework. In contrast to conventional channel estimation arrangements, our framework divides the training phase into several periods, in each of which the superimposed end-to-end channel is estimated instead of separately estimating the direct BS-user channel and the cascaded reflected BS-RIS-user channels. The reflecting coefficients of the RIS are then optimized by comparing the objective function values over multiple training periods. Moreover, the theoretical performance of our channel training protocol is analyzed and compared to that under the optimal reflecting coefficients. The potential benefits of our protocol in reducing complexity, pilot overhead and signalling overhead are also detailed. Thirdly, we derive the theoretical performance of the channel estimation protocols and of our channel training protocol in the presence of noise for a SISO scenario, which provides useful insights into the impact of noise on the overall RIS performance. Finally, our numerical simulations characterize the performance of the proposed protocols and verify our theoretical analysis. In particular, the simulation results demonstrate that our channel training protocol is more competitive than the channel estimation protocol at low signal-to-noise ratios.
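The idea of estimating the superimposed end-to-end channel in each training period and keeping the best reflection pattern can be sketched for a SISO link. All modeling choices here are illustrative assumptions, not the paper's protocol: N RIS elements with unit-modulus phase shifts, one random phase configuration per period, a least-squares estimate from a known unit pilot, and the estimated channel gain |h|^2 as the objective.

```python
import cmath
import math
import random
from typing import List, Tuple

def crandn(rng: random.Random, std: float = 1.0) -> complex:
    """Circularly-symmetric complex Gaussian sample with E|z|^2 = std**2."""
    s = std / math.sqrt(2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def superimposed(h_d: complex, g: List[complex], theta: List[complex]) -> complex:
    """End-to-end SISO channel: direct link plus reflected links, h = h_d + sum_n theta_n * g_n."""
    return h_d + sum(t * gn for t, gn in zip(theta, g))

def train_over_periods(h_d: complex, g: List[complex], periods: int,
                       pilots_per_period: int, noise_std: float,
                       rng: random.Random) -> Tuple[List[complex], float]:
    """Try one random RIS phase configuration per period, estimate the superimposed
    channel from pilots, and keep the configuration with the largest estimated gain."""
    pilot = 1.0 + 0.0j                      # known unit-power pilot symbol
    best_theta: List[complex] = []
    best_gain = -1.0
    for _ in range(periods):
        theta = [cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in g]
        h = superimposed(h_d, g, theta)
        # Least-squares estimate y / pilot, averaged over this period's pilots.
        h_hat = sum((h * pilot + crandn(rng, noise_std)) / pilot
                    for _ in range(pilots_per_period)) / pilots_per_period
        gain = abs(h_hat) ** 2
        if gain > best_gain:
            best_theta, best_gain = theta, gain
    return best_theta, best_gain
```

Only one scalar channel is estimated per period, regardless of N; more periods explore more configurations at the cost of more pilots, which is the kind of overhead-versus-performance trade-off the abstract describes.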

AGSFCOS: Based on attention mechanism and Scale-Equalizing pyramid network of object detection

May 20, 2021

Li Wang, Wei Xiang, Ruhui Xue, Kaida Zou, Laili Zhu

Abstract: Recently, anchor-free object detection models have shown great potential to exceed anchor-based detectors in both accuracy and speed. This article therefore studies two main issues: (1) how to make the backbone network of an anchor-free detector learn effective feature extraction, and (2) how to make better use of the feature pyramid network. To address these problems, we design an attention mechanism module that captures contextual information well and improves detection accuracy, and we use the SEPC network to help balance abstract and detailed information and reduce the semantic gap in the feature pyramid network. Experiments on the COCO dataset show that our model improves accuracy over currently popular detection models, whether anchor-based (YOLOv3, Faster R-CNN) or anchor-free (FoveaBox, FSAF, FCOS). Our best model reaches 39.5% COCO AP with a ResNet-50 backbone.

* 9 pages, 9 figures
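The abstract does not specify the internals of its attention module. As a generic point of reference only, a squeeze-and-excitation style channel attention block (a well-known design, not necessarily the one used in AGSFCOS) rescales each feature channel by a gate computed from global context. The weight shapes and function names below are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]

def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def channel_attention(feat: List[List[float]], w1: Matrix, w2: Matrix) -> List[List[float]]:
    """Squeeze-and-excitation style channel attention (illustrative).

    feat: C channels, each a flattened list of H*W activations.
    w1:   (C/r x C) reduction weights; w2: (C x C/r) expansion weights.
    """
    squeezed = [sum(ch) / len(ch) for ch in feat]                      # global average pool
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]               # FC + ReLU
    gates = [1.0 / (1.0 + math.exp(-z)) for z in matvec(w2, hidden)]   # FC + sigmoid
    return [[gate * x for x in ch] for gate, ch in zip(gates, feat)]   # reweight channels
```

Because each gate depends on statistics pooled over the whole image, the block injects global context into every position at very small cost.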

Towards a Model for LSH

May 11, 2021

Li Wang

Abstract: As data volumes continue to grow, clustering and outlier detection algorithms are becoming increasingly time-consuming. Classical index structures for neighbor search are no longer sustainable due to the "curse of dimensionality". Approximate index structures, by contrast, offer a good opportunity to significantly accelerate neighbor search for clustering and outlier detection while keeping the error rate of these algorithms as low as possible. Locality-sensitive hashing (LSH) is one such structure. We outline directions for modeling the properties of LSH.

* arXiv admin note: text overlap with arXiv:2103.01888
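One widely used LSH family for angular (cosine) distance is random-hyperplane hashing: each signature bit records which side of a random hyperplane a vector falls on, so vectors separated by a small angle tend to land in the same bucket. This is a generic sketch to make "approximate index structure" concrete, not the model of LSH proposed in the paper; `bits`, the seed, and the function names are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

def make_hyperplanes(dim: int, bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian normal vectors, one per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def signature(v: List[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: 1 if v lies on its non-negative side."""
    return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0) for plane in planes)

def build_index(points: List[List[float]], planes: List[List[float]]) -> Dict[Tuple[int, ...], List[int]]:
    """Hash every point id into the bucket of its signature."""
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        buckets[signature(p, planes)].append(i)
    return buckets

def candidates(query: List[float], planes: List[List[float]],
               buckets: Dict[Tuple[int, ...], List[int]]) -> List[int]:
    """Approximate neighbours: only points sharing the query's bucket are examined."""
    return buckets.get(signature(query, planes), [])
```

More bits make buckets smaller (fewer candidates to check) but raise the chance that a true neighbor is missed, which is the speed-versus-error trade-off a model of LSH has to capture.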

Lite-FPN for Keypoint-based Monocular 3D Object Detection

May 01, 2021

Lei Yang, Xinyu Zhang, Li Wang, Minghan Zhu, Jun Li

Abstract: 3D object detection with a single image is an essential and challenging task for autonomous driving. Recently, keypoint-based monocular 3D object detection has made tremendous progress and achieved a good speed-accuracy trade-off. However, a large accuracy gap remains relative to LiDAR-based methods. To improve their performance without sacrificing efficiency, we propose a lightweight feature pyramid network called Lite-FPN that achieves multi-scale feature fusion effectively and efficiently, boosting the multi-scale detection capability of keypoint-based detectors. Besides, the misalignment between classification score and localization precision is further relieved by introducing a novel regression loss named attention loss. With the proposed loss, predictions with high confidence but poor localization receive more attention during training. Comparative experiments based on several state-of-the-art keypoint-based detectors on the KITTI dataset show that our proposed method achieves significantly higher accuracy and frame rate at the same time. The code and pretrained models will be available at https://github.com/yanglei18/Lite-FPN.

* 10 pages, 4 figures
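The abstract describes the attention loss only qualitatively: confident but poorly localized predictions should be emphasized. One hypothetical weighting that captures this idea scales each box's L1 regression error by (1 + score), so a confident box with a large error dominates the loss. The formula and function name are assumptions for illustration, not the paper's definition of attention loss.

```python
from typing import List

def confidence_weighted_l1(pred: List[List[float]], target: List[List[float]],
                           scores: List[float]) -> float:
    """Hypothetical confidence-weighted regression loss.

    Each box's L1 error is scaled by (1 + score), so high-confidence predictions
    with poor localization contribute more; low-confidence boxes keep weight >= 1.
    """
    total = 0.0
    for p, t, s in zip(pred, target, scores):
        err = sum(abs(a - b) for a, b in zip(p, t))
        total += (1.0 + s) * err
    return total / max(len(pred), 1)
```

The additive 1 keeps a nonzero gradient for low-confidence boxes, so they are still regressed rather than ignored.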

Privacy-Preserving Federated Learning on Partitioned Attributes

Apr 29, 2021

Shuang Zhang, Liyao Xiang, Xi Yu, Pengzhi Chu, Yingqi Chen, Chen Cen, Li Wang

Abstract: Real-world data is usually segmented by attributes and distributed across different parties. Federated learning empowers collaborative training without exposing local data or models. As we demonstrate through designed attacks, even with a small proportion of corrupted data, an adversary can accurately infer the input attributes. We introduce an adversarial learning based procedure which tunes a local model to release privacy-preserving intermediate representations. To alleviate the accuracy decline, we propose a defense method based on the forward-backward splitting algorithm, which handles the accuracy loss in the forward gradient descent step and the privacy loss in the backward step, achieving both objectives simultaneously. Extensive experiments on a variety of datasets show that our defense significantly mitigates privacy leakage with negligible impact on the federated learning task.
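Forward-backward splitting minimizes a sum f(x) + g(x) by alternating an explicit gradient (forward) step on the smooth term f with a proximal (backward) step on g. In the paper these roles are played by the accuracy and privacy losses; the sketch below swaps in a least-squares f and an L1 penalty g, whose proximal operator is soft-thresholding, purely to show the mechanics. The step size, data, and function names are illustrative.

```python
from typing import List

def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: the 'backward' step for an L1 term."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def forward_backward(A: List[List[float]], b: List[float], lam: float,
                     step: float, iters: int) -> List[float]:
    """Minimise 0.5 * ||Ax - b||^2 + lam * ||x||_1 by forward-backward splitting (ISTA)."""
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Forward step: explicit gradient of the smooth term, A^T (A x - b).
        r = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        z = [xi - step * gj for xi, gj in zip(x, grad)]
        # Backward step: proximal map of the non-smooth term.
        x = [soft_threshold(zi, step * lam) for zi in z]
    return x
```

For convergence the step size is usually kept at or below 1/L, where L is the largest eigenvalue of A^T A; with A equal to the identity, step = 1 is allowed.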

Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection

Mar 30, 2021

Li Wang, Liang Du, Xiaoqing Ye, Yanwei Fu, Guodong Guo, Xiangyang Xue, Jianfeng Feng, Li Zhang

Abstract: The objective of this paper is to learn context- and depth-aware feature representations to solve the problem of monocular 3D object detection. We make the following contributions: (i) rather than appealing to the complicated pseudo-LiDAR based approach, we propose a depth-conditioned dynamic message propagation (DDMP) network to effectively integrate multi-scale depth information with the image context; (ii) this is achieved by first adaptively sampling context-aware nodes in the image context and then dynamically predicting hybrid depth-dependent filter weights and affinity matrices for propagating information; (iii) by adding an auxiliary center-aware depth encoding (CDE) task, our method successfully alleviates the problem of inaccurate depth priors; (iv) we thoroughly demonstrate the effectiveness of our proposed approach and show state-of-the-art results among monocular-based approaches on the KITTI benchmark dataset. In particular, we ranked 1st in the highly competitive KITTI monocular 3D object detection track on the submission day (November 16th, 2020). Code and models are released at https://github.com/fudan-zvg/DDMP

* CVPR 2021. Code at https://github.com/fudan-zvg/DDMP
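A heavily simplified picture of depth-conditioned message propagation: each pixel gathers image features from a set of neighbors, weighting them by a softmax over depth-feature similarity so that pixels at similar depths exchange more information. This sketch replaces DDMP's adaptive node sampling with fixed `offsets` and its learned filter weights with plain dot-product affinities; the function name and data layout (H x W x C nested lists) are assumptions, not the released implementation.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]  # H x W x C feature map

def propagate(feat: Grid, depth_feat: Grid, offsets: List[Tuple[int, int]]) -> Grid:
    """Hypothetical depth-conditioned message passing with a residual connection."""
    H, W = len(feat), len(feat[0])
    out: Grid = []
    for i in range(H):
        row = []
        for j in range(W):
            nbrs = [(i + di, j + dj) for di, dj in offsets
                    if 0 <= i + di < H and 0 <= j + dj < W]
            # Affinity logits from depth-feature similarity.
            logits = [sum(a * b for a, b in zip(depth_feat[i][j], depth_feat[y][x]))
                      for y, x in nbrs]
            m = max(logits) if logits else 0.0
            exps = [math.exp(l - m) for l in logits]   # numerically stable softmax
            z = sum(exps) or 1.0
            msg = [0.0] * len(feat[i][j])
            for (y, x), e in zip(nbrs, exps):
                for c, v in enumerate(feat[y][x]):
                    msg[c] += (e / z) * v
            row.append([f + mc for f, mc in zip(feat[i][j], msg)])
        out.append(row)
    return out
```

With contrasting depth features, a pixel draws almost entirely on the neighbor at a similar depth, which is the intuition behind conditioning propagation on depth.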

Cross-Dataset Collaborative Learning for Semantic Segmentation

Mar 21, 2021

Li Wang, Dong Li, Yousong Zhu, Lu Tian, Yi Shan

Abstract: Recent work attempts to improve semantic segmentation performance by exploring well-designed architectures on a target dataset. However, it remains challenging to build a unified system that simultaneously learns from various datasets due to the inherent distribution shift across different datasets. In this paper, we present a simple, flexible, and general method for semantic segmentation, termed Cross-Dataset Collaborative Learning (CDCL). Given multiple labeled datasets, we aim to improve the generalization and discrimination of feature representations on each dataset. Specifically, we first introduce a family of Dataset-Aware Blocks (DAB) as the fundamental computing units of the network, which help capture homogeneous representations and heterogeneous statistics across different datasets. Second, we propose a Dataset Alternation Training (DAT) mechanism to efficiently facilitate the optimization procedure. We conduct extensive evaluations on four diverse datasets, namely Cityscapes, BDD100K, CamVid, and COCO Stuff, with single-dataset and cross-dataset settings. Experimental results demonstrate that our method consistently achieves notable improvements over prior single-dataset and cross-dataset training methods without introducing extra FLOPs. In particular, with the same PSPNet (ResNet-18) architecture, our method outperforms the single-dataset baseline by 5.65%, 6.57%, and 5.79% mIoU on the validation sets of Cityscapes, BDD100K, and CamVid, respectively. Code and models will be released.

* To appear at CVPR 2021
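The abstract names two components without giving their internals. One plausible reading, stated here as an assumption, is that a Dataset-Aware Block shares its weights across datasets (homogeneous representations) while keeping separate normalization statistics per dataset (heterogeneous statistics), and that Dataset Alternation Training interleaves batches from the datasets. The sketch implements that reading with a single linear layer; the class and function names are hypothetical.

```python
import math
from typing import Dict, Iterator, List, Tuple

class DatasetAwareLinear:
    """Hypothetical dataset-aware block: shared weights, per-dataset normalisation."""

    def __init__(self, weight: List[List[float]], datasets: List[str], eps: float = 1e-5):
        self.weight = weight                                  # shared by all datasets
        n = len(weight)
        self.stats = {d: ([0.0] * n, [1.0] * n) for d in datasets}  # (mean, var) per dataset
        self.eps = eps

    def _linear(self, x: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]

    def fit_stats(self, dataset: str, inputs: List[List[float]]) -> None:
        """Estimate this dataset's mean/variance of the shared layer's outputs."""
        outs = [self._linear(x) for x in inputs]
        k = len(self.weight)
        mean = [sum(o[c] for o in outs) / len(outs) for c in range(k)]
        var = [sum((o[c] - mean[c]) ** 2 for o in outs) / len(outs) for c in range(k)]
        self.stats[dataset] = (mean, var)

    def forward(self, x: List[float], dataset: str) -> List[float]:
        mean, var = self.stats[dataset]
        return [(y - m) / math.sqrt(s + self.eps)
                for y, m, s in zip(self._linear(x), mean, var)]

def alternate(loaders: Dict[str, List[object]]) -> Iterator[Tuple[str, object]]:
    """Round-robin over datasets, one batch each, until any dataset runs out."""
    iters = {name: iter(batches) for name, batches in loaders.items()}
    while True:
        for name, it in iters.items():
            batch = next(it, None)
            if batch is None:
                return
            yield name, batch
```

Under this reading, switching datasets only changes which statistics are used, so inference cost is the same as for a single-dataset model.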
