Jie Liu

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Nov 07, 2022
Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, Dan Zhu, Mengdi Sun, Ran Duan, Yan Gao, Lingshun Kong, Long Sun, Xiang Li, Xingdong Zhang, Jiawei Zhang, Yaqi Wu, Jinshan Pan, Gaocheng Yu, Jin Zhang, Feng Zhang, Zhe Ma, Hongbin Wang, Hojin Cho, Steve Kim, Huaen Li, Yanbo Ma, Ziwei Luo, Youwei Li, Lei Yu, Zhihong Wen, Qi Wu, Haoqiang Fan, Shuaicheng Liu, Lize Zhang, Zhikai Zong, Jeremy Kwon, Junxi Zhang, Mengyuan Li, Nianxiang Fu, Guanchen Ding, Han Zhu, Zhenzhong Chen, Gen Li, Yuanfan Zhang, Lei Sun, Dafeng Zhang, Neo Yang, Fitz Liu, Jerry Zhao, Mustafa Ayazoglu, Bahri Batuhan Bilecen, Shota Hirose, Kasidis Arunruangsirilert, Luo Ao, Ho Chun Leung, Andrew Wei, Jie Liu, Qiang Liu, Dahai Yu, Ao Li, Lei Luo, Ce Zhu, Seongmin Hong, Dongwon Park, Joonhee Lee, Byeong Hyun Lee, Seunggyu Lee, Se Young Chun, Ruiyuan He, Xuhao Jiang, Haihang Ruan, Xinjian Zhang, Jing Liu, Garas Gendy, Nabil Sabor, Jingchao Hou, Guanghui He

Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

* arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256
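
The abstract above refers to INT8 models without describing the quantization scheme. As a rough illustration only, the following sketch shows the widely used affine (scale and zero-point) mapping between floating-point values and signed 8-bit integers. The function names and the per-tensor min/max calibration are assumptions made for this example and are not taken from the challenge report.

```python
def quant_params(x_min, x_max, q_min=-128, q_max=127):
    """Derive an affine scale and zero point from an observed value range."""
    # Extend the range to include zero so that 0.0 maps to an exact integer.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # degenerate range: every value is zero
    zero_point = int(round(q_min - x_min / scale))
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, q_min=-128, q_max=127):
    """Map floats to INT8 codes, clamping to the representable range."""
    return [
        max(q_min, min(q_max, int(round(v / scale)) + zero_point))
        for v in values
    ]


def dequantize(codes, scale, zero_point):
    """Map INT8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


if __name__ == "__main__":
    pixels = [0.0, 0.25, 0.5, 1.0]  # hypothetical normalized intensities
    scale, zp = quant_params(min(pixels), max(pixels))
    codes = quantize(pixels, scale, zp)
    print(codes, dequantize(codes, scale, zp))
```

For in-range values, the round trip loses at most about half a quantization step (`scale / 2`), which is the precision cost a quantized super-resolution model has to absorb.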

Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling

Nov 06, 2022
Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Lei Xie, Namin Wang, Jie Liu

Speech data on the Internet are proliferating exponentially because of the emergence of social media, and the sharing of such personal data raises obvious security and privacy concerns. One solution to mitigate these concerns involves concealing speaker identities before sharing speech data, also referred to as speaker anonymization. In our previous work, we developed an automatic speaker verification (ASV)-model-free anonymization framework to protect speaker privacy while preserving speech intelligibility. Although the framework ranked first in the VoicePrivacy 2022 challenge, the anonymization was imperfect, since the speaker distinguishability of the anonymized speech deteriorated. To address this issue, in this paper, we directly model the formant distribution and fundamental frequency (F0) to represent speaker identity and anonymize the source speech by uniformly scaling the formants and F0. By directly scaling the formants and F0, the degradation in speaker distinguishability caused by the introduction of other speakers is prevented. The experimental results demonstrate that our proposed framework can improve speaker distinguishability and significantly outperforms our previous framework in voice distinctiveness. Furthermore, our proposed method can also trade off privacy against utility by using different scaling factors.

* Submitted to ICASSP 2023

Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis

Oct 28, 2022
Peipei Liu, Xin Zheng, Hong Li, Jie Liu, Yimo Ren, Hongsong Zhu, Limin Sun

Modality representation learning is an important problem for multimodal sentiment analysis (MSA), since highly distinguishable representations help improve analysis performance. Previous MSA work has usually focused on multimodal fusion strategies, and modality representation learning has received less attention. Recently, contrastive learning has been shown to be effective at endowing learned representations with stronger discriminative ability. Inspired by this, we explore approaches for improving modality representations with contrastive learning in this study. To this end, we devise a three-stage framework with multi-view contrastive learning to refine representations for specific objectives. In the first stage, to improve unimodal representations, we employ supervised contrastive learning to pull samples within the same class together while the other samples are pushed apart. In the second stage, self-supervised contrastive learning is designed to improve the distilled unimodal representations after cross-modal interaction. Finally, we again use supervised contrastive learning to enhance the fused multimodal representation. After all contrastive training stages, we perform the classification task on the frozen representations. We conduct experiments on three open datasets, and the results demonstrate the advantages of our model.
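
The first and third stages described above rely on supervised contrastive learning. The abstract does not give the loss, so the following is only a minimal sketch of the commonly used supervised contrastive (SupCon-style) formulation: for each anchor, same-class samples are treated as positives and all other samples in the batch appear in the softmax denominator. The function names and the temperature value are assumptions for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have positives."""
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without same-class partners contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy features
    print(supcon_loss(feats, [0, 0, 1, 1]))  # labels agree with the clusters
    print(supcon_loss(feats, [0, 1, 0, 1]))  # labels cut across the clusters
```

The loss is lower when samples sharing a label are already close in the embedding space, which is the "pull together, push apart" behavior the abstract describes.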

CEntRE: A paragraph-level Chinese dataset for Relation Extraction among Enterprises

Oct 19, 2022
Peipei Liu, Hong Li, Zhiyu Wang, Yimo Ren, Jie Liu, Fei Lyu, Hongsong Zhu, Limin Sun

Enterprise relation extraction aims to detect pairs of enterprise entities and identify the business relations between them from unstructured or semi-structured text data, and it is crucial for several real-world applications such as risk analysis, rating research and supply chain security. However, previous work mainly focuses on extracting attribute information about enterprises, such as personnel and corporate business, and pays little attention to enterprise relation extraction. To encourage further progress in this research, we introduce CEntRE, a new dataset constructed from publicly available business news data with careful human annotation and intelligent data processing. Extensive experiments on CEntRE with six strong models demonstrate the challenges posed by our proposed dataset.

An Improved Structured Mesh Generation Method Based on Physics-informed Neural Networks

Oct 18, 2022
Xinhai Chen, Jie Liu, Junjun Yan, Zhichao Wang, Chunye Gong

Mesh generation remains a key technology in many areas where numerical simulations are required. As numerical algorithms become more efficient and computers become more powerful, the percentage of time devoted to mesh generation becomes higher. In this paper, we present an improved structured mesh generation method. The method formulates the meshing problem as a global optimization problem related to a physics-informed neural network. The mesh is obtained by intelligently solving the physical boundary-constrained partial differential equations. To improve the prediction accuracy of the neural network, we also introduce a novel auxiliary line strategy and an efficient network model during meshing. The strategy first employs a priori auxiliary lines to provide ground truth data and then uses these data to construct a loss term to better constrain the convergence of the subsequent training. The experimental results indicate that the proposed method is effective and robust. It can accurately approximate the mapping (transformation) from the computational domain to the physical domain and enable fast high-quality structured mesh generation.

Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

Oct 17, 2022
Joey Wang, Yingcan Wei, Minseok Lee, Matthias Langer, Fan Yu, Jie Liu, Alex Liu, Daniel Abel, Gems Guo, Jianbing Dong, Jerry Shi, Kunlun Li

In this talk, we introduce Merlin HugeCTR. Merlin HugeCTR is an open source, GPU-accelerated integration framework for click-through rate estimation. It optimizes both training and inference, whilst enabling model training at scale with model-parallel embeddings and data-parallel neural networks. In particular, Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. In the MLPerf v1.0 DLRM model training benchmark, Merlin HugeCTR achieves a speedup of up to 24.6x on a single DGX A100 (8x A100) over PyTorch on 4x4-socket CPU nodes (4x4x28 cores). Merlin HugeCTR can also take advantage of multi-node environments to accelerate training even further. Since late 2021, Merlin HugeCTR additionally features a hierarchical parameter server (HPS) and supports deployment via the NVIDIA Triton server framework, to leverage the computational capabilities of GPUs for high-speed recommendation model inference. Using this HPS, Merlin HugeCTR users can achieve a 5x to 62x speedup (depending on batch size) for popular recommendation models over CPU baseline implementations, and dramatically reduce their end-to-end inference latency.

* Proceedings of the 16th ACM Conference on Recommender Systems, 2022
* 4 pages

Multi-Scale Wavelet Transformer for Face Forgery Detection

Oct 08, 2022
Jie Liu, Jingjing Wang, Peng Zhang, Chunmao Wang, Di Xie, Shiliang Pu

Currently, many face forgery detection methods aggregate spatial and frequency features to enhance generalization ability and achieve promising performance in the cross-dataset scenario. However, these methods leverage only single-level frequency information, which limits their expressive ability. To overcome this limitation, we propose a multi-scale wavelet transformer framework for face forgery detection. Specifically, to take full advantage of the multi-scale and multi-frequency wavelet representation, we gradually aggregate the multi-scale wavelet representation at different stages of the backbone network. To better fuse the frequency features with the spatial features, frequency-based spatial attention is designed to guide the spatial feature extractor to concentrate more on forgery traces. Meanwhile, cross-modality attention is proposed to fuse the frequency features with the spatial features. These two attention modules are calculated through a unified transformer block for efficiency. A wide variety of experiments demonstrate that the proposed method is efficient and effective in both within-dataset and cross-dataset settings.

* The first two authors contributed equally to this work. Accepted to ACCV 2022 as oral presentation

Boost CTR Prediction for New Advertisements via Modeling Visual Content

Sep 23, 2022
Tan Yu, Zhipeng Jin, Jie Liu, Yi Yang, Hongliang Fei, Ping Li

Existing advertisement click-through rate (CTR) prediction models are mainly dependent on behavior ID features, which are learned from historical user-ad interactions. However, behavior ID features that rely on historical user behaviors cannot describe new ads that have no previous interactions with users. To overcome the limitations of behavior ID features in modeling new ads, we exploit the visual content in ads to boost the performance of CTR prediction models. Specifically, we map each ad into a set of visual IDs based on its visual content. These visual IDs are then used to generate the visual embedding that enhances CTR prediction models. We formulate the learning of visual IDs as a supervised quantization problem. Because class labels are lacking for commercial images in advertisements, we exploit textual descriptions of the images as supervision to optimize the image extractor for generating effective visual IDs. Meanwhile, since hard quantization is non-differentiable, we soften the quantization operation so that it supports end-to-end network training. After mapping each image into visual IDs, we learn the embedding for each visual ID based on the historical user-ad interactions accumulated in the past. Since the visual ID embedding depends only on the visual content, it generalizes well to new ads. Meanwhile, the visual ID embedding complements the ad behavior ID embedding. Thus, it can considerably boost the performance of CTR prediction models that previously relied on behavior ID features, both for new ads and for ads that have accumulated rich user behaviors. After incorporating the visual ID embedding in the CTR prediction model of Baidu online advertising, the average CTR of ads improves by 1.46%, and the total charge increases by 1.10%.
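
The abstract notes that hard quantization is non-differentiable and is therefore softened for end-to-end training, without stating the exact relaxation. One common choice, shown here purely as an assumed illustration, replaces the nearest-codeword lookup with a softmax over negative squared distances to the codewords. The codebook, function names, and temperature are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_quantize(feature, codebook):
    """Index of the nearest codeword (an argmin, hence non-differentiable)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))


def soft_quantize(feature, codebook, temperature=1.0):
    """Softmax assignment weights and the weighted mix of codewords."""
    logits = [-sq_dist(feature, c) / temperature for c in codebook]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    mixed = [
        sum(w * c[d] for w, c in zip(weights, codebook))
        for d in range(len(feature))
    ]
    return weights, mixed


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]  # toy visual codewords
    feature = [0.9, 1.2]                             # toy image feature
    print(hard_quantize(feature, codebook))
    print(soft_quantize(feature, codebook, temperature=1.0))
    print(soft_quantize(feature, codebook, temperature=0.01))
```

As the temperature approaches zero, the soft weights concentrate on the nearest codeword, so the relaxation recovers the hard assignment while remaining differentiable at any positive temperature.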

Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising

Sep 19, 2022
Tan Yu, Jie Liu, Yi Yang, Yi Li, Hongliang Fei, Ping Li

The advancement of communication technology and the popularity of smartphones have fostered a boom in video ads. Baidu, as one of the leading search engine companies in the world, receives billions of search queries per day. Pairing video ads with user searches is the core task of Baidu video advertising. Due to the modality gap, query-to-video retrieval is much more challenging than traditional query-to-document retrieval and image-to-image search. Traditionally, query-to-video retrieval is tackled by query-to-title retrieval, which is not reliable when the quality of the titles is not high. With the rapid progress achieved in computer vision and natural language processing in recent years, content-based search methods have become promising for query-to-video retrieval. Benefiting from pretraining on large-scale datasets, some visionBERT methods based on cross-modal attention have achieved excellent performance in many vision-language tasks, not only in academia but also in industry. Nevertheless, the expensive computation cost of cross-modal attention makes it impractical for large-scale search in industrial applications. In this work, we present a tree-based combo-attention network (TCAN), which has recently been launched in Baidu's dynamic video advertising platform. It provides a practical solution for deploying heavy cross-modal attention in large-scale query-to-video search. After the launch of the tree-based combo-attention network, the click-through rate improved by 2.29% and the conversion rate improved by 2.63%.

* This revision is based on a manuscript submitted in October 2020, to ICDE 2021. We thank the Program Committee for their valuable comments

A Rotation Meanout Network with Invariance for Dermoscopy Image Classification and Retrieval

Aug 01, 2022
Yilan Zhang, Fengying Xie, Xuedong Song, Hangning Zhou, Yiguang Yang, Haopeng Zhang, Jie Liu

A computer-aided diagnosis (CAD) system can provide a reference basis for the clinical diagnosis of skin diseases. Convolutional neural networks (CNNs) can extract not only visual elements such as colors and shapes but also semantic features. As such, they have brought great improvements to many dermoscopy image tasks. Dermoscopy imaging has no main direction, so the datasets contain skin lesions at a large number of different rotations. However, CNNs lack anti-rotation ability, which is bound to affect their feature extraction ability. We propose a rotation meanout (RM) network to extract rotation-invariant features from dermoscopy images. In RM, each set of rotated feature maps corresponds to a set of weight-sharing convolution outputs, and they are fused using the meanout operation to obtain the final feature maps. Through theoretical derivation, we show that the proposed RM network is rotation-equivariant and can extract rotation-invariant features when followed by the global average pooling (GAP) operation. The extracted rotation-invariant features can better represent the original data in classification and retrieval tasks for dermoscopy images. The proposed RM is a general operation that does not change the network structure or add any parameters, and it can be flexibly embedded in any part of a CNN. Extensive experiments are conducted on a dermoscopy image dataset. The results show that our method outperforms other anti-rotation methods and achieves great improvements in dermoscopy image classification and retrieval tasks, indicating the potential of rotation invariance in the field of dermoscopy images.
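
The idea of fusing weight-sharing convolution outputs over rotated inputs can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single-channel square image, only the four 90-degree rotations, a zero-padded "same" correlation standing in for a convolution layer, and hypothetical function names. Each rotated copy passes through the same kernel, the outputs are rotated back and averaged, and global average pooling is applied at the end.

```python
def rot90(m):
    """Rotate a square matrix by 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[c][n - 1 - r] for c in range(n)] for r in range(n)]


def rot_k(m, k):
    """Apply rot90 k times (k may be negative)."""
    for _ in range(k % 4):
        m = rot90(m)
    return m


def conv_same(img, kernel):
    """Zero-padded correlation that keeps the input size."""
    n, k = len(img), len(kernel)
    pad = k // 2
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < n and 0 <= cc < n:
                        acc += img[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def rotation_meanout(img, kernel):
    """Average the shared-kernel outputs of all four rotations, re-aligned."""
    n = len(img)
    acc = [[0.0] * n for _ in range(n)]
    for k in range(4):
        out = rot_k(conv_same(rot_k(img, k), kernel), -k)
        for r in range(n):
            for c in range(n):
                acc[r][c] += out[r][c]
    return [[v / 4.0 for v in row] for row in acc]


def global_average_pool(fmap):
    """Mean of all entries of a feature map."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 17]]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 3, -1]]  # deliberately asymmetric
    print(global_average_pool(rotation_meanout(image, kernel)))
    print(global_average_pool(rotation_meanout(rot90(image), kernel)))
```

In this simplified setting, rotating the input by 90 degrees rotates the fused feature map by the same amount (equivariance), so the pooled value is unchanged (invariance), and no parameters are added beyond the single shared kernel.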
