Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhen Li

Qian

Graph Enhanced Contrastive Learning for Radiology Findings Summarization

Apr 01, 2022
Jinpeng Hu, Zhuo Li, Zhihong Chen, Zhen Li, Xiang Wan, Tsung-Hui Chang

Figure 1 for Graph Enhanced Contrastive Learning for Radiology Findings Summarization

Figure 2 for Graph Enhanced Contrastive Learning for Radiology Findings Summarization

Figure 3 for Graph Enhanced Contrastive Learning for Radiology Findings Summarization

Figure 4 for Graph Enhanced Contrastive Learning for Radiology Findings Summarization

The impression section of a radiology report summarizes the most prominent observation from the findings section and is the most important section for radiologists to communicate to physicians. Summarizing findings is time-consuming and can be prone to error for inexperienced radiologists, and thus automatic impression generation has attracted substantial attention. With the encoder-decoder framework, most previous studies explore incorporating extra knowledge (e.g., static pre-defined clinical ontologies or extra background information). Yet, they encode such knowledge by a separate encoder to treat it as an extra input to their models, which is limited in leveraging their relations with the original findings. To address the limitation, we propose a unified framework for exploiting both extra knowledge and the original findings in an integrated way so that the critical information (i.e., key words and their relations) can be extracted in an appropriate way to facilitate impression generation. In detail, for each input findings, it is encoded by a text encoder, and a graph is constructed through its entities and dependency tree. Then, a graph encoder (e.g., graph neural networks (GNNs)) is adopted to model relation information in the constructed graph. Finally, to emphasize the key words in the findings, contrastive learning is introduced to map positive samples (constructed by masking non-key words) closer and push apart negative ones (constructed by masking key words). The experimental results on OpenI and MIMIC-CXR confirm the effectiveness of our proposed method.

* 9 pages, 5 figures

Via

Access Paper or Ask Questions

Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

Mar 03, 2022
Chaoda Zheng, Xu Yan, Haiming Zhang, Baoyuan Wang, Shenghui Cheng, Shuguang Cui, Zhen Li

Figure 1 for Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

Figure 2 for Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

Figure 3 for Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

Figure 4 for Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

3D single object tracking (3D SOT) in LiDAR point clouds plays a crucial role in autonomous driving. Current approaches all follow the Siamese paradigm based on appearance matching. However, LiDAR point clouds are usually textureless and incomplete, which hinders effective appearance matching. Besides, previous methods greatly overlook the critical motion clues among targets. In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle 3D SOT from a new perspective. Following this paradigm, we propose a matching-free two-stage tracker M^2-Track. At the 1^st-stage, M^2-Track localizes the target within successive frames via motion transformation. Then it refines the target box through motion-assisted shape completion at the 2^nd-stage. Extensive experiments confirm that M^2-Track significantly outperforms previous state-of-the-arts on three large-scale datasets while running at 57FPS (~8%, ~17%, and ~22%) precision gains on KITTI, NuScenes, and Waymo Open Dataset respectively). Further analysis verifies each component's effectiveness and shows the motion-centric paradigm's promising potential when combined with appearance matching.

* To appear in CVPR2022

Via

Access Paper or Ask Questions

X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Mar 02, 2022
Zhihao Yuan, Xu Yan, Yinghong Liao, Yao Guo, Guanbin Li, Zhen Li, Shuguang Cui

Figure 1 for X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Figure 2 for X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Figure 3 for X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

Figure 4 for X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

3D dense captioning aims to describe individual objects by natural language in 3D scenes, where 3D scenes are usually represented as RGB-D scans or point clouds. However, only exploiting single modal information, e.g., point cloud, previous approaches fail to produce faithful descriptions. Though aggregating 2D features into point clouds may be beneficial, it introduces an extra computational burden, especially in inference phases. In this study, we investigate a cross-modal knowledge transfer using Transformer for 3D dense captioning, X-Trans2Cap, to effectively boost the performance of single-modal 3D caption through knowledge distillation using a teacher-student framework. In practice, during the training phase, the teacher network exploits auxiliary 2D modality and guides the student network that only takes point clouds as input through the feature consistency constraints. Owing to the well-designed cross-modal feature fusion module and the feature alignment in the training phase, X-Trans2Cap acquires rich appearance information embedded in 2D images with ease. Thus, a more faithful caption can be generated only using point clouds during the inference. Qualitative and quantitative results confirm that X-Trans2Cap outperforms previous state-of-the-art by a large margin, i.e., about +21 and about +16 absolute CIDEr score on ScanRefer and Nr3D datasets, respectively.

* To appear in CVPR2022

Via

Access Paper or Ask Questions

ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

Feb 19, 2022
Xinjie Lin, Gang Xiong, Gaopeng Gou, Zhen Li, Junzheng Shi, Jing Yu

Figure 1 for ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

Figure 2 for ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

Figure 3 for ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

Figure 4 for ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification

Encrypted traffic classification requires discriminative and robust traffic representation captured from content-invisible and imbalanced traffic data for accurate classification, which is challenging but indispensable to achieve network security and network management. The major limitation of existing solutions is that they highly rely on the deep features, which are overly dependent on data size and hard to generalize on unseen data. How to leverage the open-domain unlabeled traffic data to learn representation with strong generalization ability remains a key challenge. In this paper,we propose a new traffic representation model called Encrypted Traffic Bidirectional Encoder Representations from Transformer (ET-BERT), which pre-trains deep contextualized datagram-level representation from large-scale unlabeled data. The pre-trained model can be fine-tuned on a small number of task-specific labeled data and achieves state-of-the-art performance across five encrypted traffic classification tasks, remarkably pushing the F1 of ISCX-Tor to 99.2% (4.4% absolute improvement), ISCX-VPN-Service to 98.9% (5.2% absolute improvement), Cross-Platform (Android) to 92.5% (5.4% absolute improvement), CSTNET-TLS 1.3 to 97.4% (10.0% absolute improvement). Notably, we provide explanation of the empirically powerful pre-training model by analyzing the randomness of ciphers. It gives us insights in understanding the boundary of classification ability over encrypted traffic. The code is available at: https://github.com/linwhitehat/ET-BERT.

* This work has been accepted in Security, Privacy, and Trust track at The Web Conference 2022 (WWW'22)(see https://www2022.thewebconf.org/cfp/research/security/)

Via

Access Paper or Ask Questions

RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Feb 12, 2022
Zhen Li, Guenevere, Chen, Chen Chen, Yayi Zou, Shouhuai Xu

Figure 1 for RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Figure 2 for RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Figure 3 for RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Figure 4 for RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation

Source code authorship attribution is an important problem often encountered in applications such as software forensics, bug fixing, and software quality analysis. Recent studies show that current source code authorship attribution methods can be compromised by attackers exploiting adversarial examples and coding style manipulation. This calls for robust solutions to the problem of code authorship attribution. In this paper, we initiate the study on making Deep Learning (DL)-based code authorship attribution robust. We propose an innovative framework called Robust coding style Patterns Generation (RoPGen), which essentially learns authors' unique coding style patterns that are hard for attackers to manipulate or imitate. The key idea is to combine data augmentation and gradient augmentation at the adversarial training phase. This effectively increases the diversity of training examples, generates meaningful perturbations to gradients of deep neural networks, and learns diversified representations of coding styles. We evaluate the effectiveness of RoPGen using four datasets of programs written in C, C++, and Java. Experimental results show that RoPGen can significantly improve the robustness of DL-based code authorship attribution, by respectively reducing 22.8% and 41.0% of the success rate of targeted and untargeted attacks on average.

* ICSE 2022

Via

Access Paper or Ask Questions

HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

Jan 25, 2022
Su Zheng, Zhen Li, Yao Lu, Jingbo Gao, Jide Zhang, Lingli Wang

Figure 1 for HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

Figure 2 for HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

Figure 3 for HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

Figure 4 for HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

We propose an optimization method for the automatic design of approximate multipliers, which minimizes the average error according to the operand distributions. Our multiplier achieves up to 50.24% higher accuracy than the best reproduced approximate multiplier in DNNs, with 15.76% smaller area, 25.05% less power consumption, and 3.50% shorter delay. Compared with an exact multiplier, our multiplier reduces the area, power consumption, and delay by 44.94%, 47.63%, and 16.78%, respectively, with negligible accuracy losses. The tested DNN accelerator modules with our multiplier obtain up to 18.70% smaller area and 9.99% less power consumption than the original modules.

Via

Access Paper or Ask Questions

Heuristic Search for Rank Aggregation with Application to Label Ranking

Jan 11, 2022
Yangming Zhou, Jin-Kao Hao, Zhen Li, Fred Glover

Figure 1 for Heuristic Search for Rank Aggregation with Application to Label Ranking

Figure 2 for Heuristic Search for Rank Aggregation with Application to Label Ranking

Figure 3 for Heuristic Search for Rank Aggregation with Application to Label Ranking

Figure 4 for Heuristic Search for Rank Aggregation with Application to Label Ranking

Rank aggregation aims to combine the preference rankings of a number of alternatives from different voters into a single consensus ranking. As a useful model for a variety of practical applications, however, it is a computationally challenging problem. In this paper, we propose an effective hybrid evolutionary ranking algorithm to solve the rank aggregation problem with both complete and partial rankings. The algorithm features a semantic crossover based on concordant pairs and a late acceptance local search reinforced by an efficient incremental evaluation technique. Experiments are conducted to assess the algorithm, indicating a highly competitive performance on benchmark instances compared with state-of-the-art algorithms. To demonstrate its practical usefulness, the algorithm is applied to label ranking, which is an important machine learning task.

* 12 pages, 4 figures

Via

Access Paper or Ask Questions

CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes

Dec 31, 2021
Xu Yan, Zhihao Yuan, Yuhao Du, Yinghong Liao, Yao Guo, Zhen Li, Shuguang Cui

Figure 1 for CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes

Figure 2 for CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes

Figure 3 for CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes

Figure 4 for CLEVR3D: Compositional Language and Elementary Visual Reasoning for Question Answering in 3D Real-World Scenes

3D scene understanding is a relatively emerging research field. In this paper, we introduce the Visual Question Answering task in 3D real-world scenes (VQA-3D), which aims to answer all possible questions given a 3D scene. To tackle this problem, the first VQA-3D dataset, namely CLEVR3D, is proposed, which contains 60K questions in 1,129 real-world scenes. Specifically, we develop a question engine leveraging 3D scene graph structures to generate diverse reasoning questions, covering the questions of objects' attributes (i.e., size, color, and material) and their spatial relationships. Built upon this dataset, we further design the first VQA-3D baseline model, TransVQA3D. The TransVQA3D model adopts well-designed Transformer architectures to achieve superior VQA-3D performance, compared with the pure language baseline and previous 3D reasoning methods directly applied to 3D scenarios. Experimental results verify that taking VQA-3D as an auxiliary task can boost the performance of 3D scene understanding, including scene graph analysis for the node-wise classification and whole-graph recognition.

Via

Access Paper or Ask Questions