Jiacheng Li

SmartBERT: A Promotion of Dynamic Early Exiting Mechanism for Accelerating BERT Inference

Mar 16, 2023

Boren Hu, Yun Zhu, Jiacheng Li, Siliang Tang

Abstract: Dynamic early exiting has been proven to improve the inference speed of pre-trained language models like BERT. However, all samples must go through all consecutive layers before exiting early, and more complex samples usually go through more layers, so redundant computation remains. In this paper, we propose SmartBERT, a novel dynamic early exiting mechanism combined with layer skipping for BERT inference, which adds a skipping gate and an exiting operator to each layer of BERT. SmartBERT can adaptively skip some layers and adaptively choose whether to exit. In addition, we propose cross-layer contrastive learning and incorporate it into our training phases to strengthen the intermediate layers and classifiers, which benefits early exiting. To keep the usage of skipping gates consistent between the training and inference phases, we propose a hard weight mechanism during training. We conduct experiments on eight classification datasets of the GLUE benchmark. Experimental results show that SmartBERT achieves a 2-3x computation reduction with minimal accuracy drops compared with BERT, and our method outperforms previous methods in both efficiency and accuracy. Moreover, on some complex datasets like RTE and WNLI, we prove that early exiting based on entropy hardly works, and the skipping mechanism is essential for reducing computation.
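The interplay of layer skipping and entropy-based early exiting described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the boolean gate interface, the choice to check for an exit only after executed layers, and the threshold value are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def forward_with_skip_and_exit(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    gates: Sequence[Callable[[Vector], bool]],
    classifiers: Sequence[Callable[[Vector], Vector]],
    exit_threshold: float,
) -> Tuple[Vector, int]:
    """Run layers in order; a gate returning True skips its layer (assumed
    interface). After each executed layer, exit if the classifier's
    prediction entropy falls below the threshold.

    Returns the class probabilities and the number of layers executed.
    """
    probs = None
    executed = 0
    for layer, gate, classifier in zip(layers, gates, classifiers):
        if gate(x):
            continue  # hypothetical skipping gate: bypass this layer
        x = layer(x)
        executed += 1
        probs = softmax(classifier(x))
        if entropy(probs) < exit_threshold:
            break  # confident enough: exit early
    if probs is None:
        # Every layer was skipped; fall back to the last classifier.
        probs = softmax(classifiers[-1](x))
    return probs, executed
```

In this toy setup, a lower `exit_threshold` forces more layers to run before exiting, while gates that fire more often reduce computation regardless of how confident the classifiers are.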

Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding

Mar 07, 2023

Jiacheng Li, Longhui Wei, ZongYuan Zhan, Xin He, Siliang Tang, Qi Tian, Yueting Zhuang

Abstract: Generative transformers have shown their superiority in synthesizing high-fidelity, high-resolution images, with advantages such as good diversity and training stability. However, they suffer from slow generation because they must generate a long token sequence autoregressively. To accelerate generative transformers while keeping good generation quality, we propose Lformer, a semi-autoregressive text-to-image generation model. Lformer first encodes an image into $h \times h$ discrete tokens, then divides these tokens into $h$ mirrored L-shape blocks from the top left to the bottom right and decodes the tokens of one block in parallel at each step. Like autoregressive models, Lformer predicts the area adjacent to the previous context, so it remains stable while accelerating. By leveraging the 2D structure of image tokens, Lformer achieves faster speed than existing transformer-based methods while keeping good generation quality. Moreover, the pretrained Lformer can edit images without finetuning: we can roll back to early steps for regeneration or edit the image with a bounding box and a text prompt.
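One plausible reading of the block layout in this abstract is sketched below. It assumes that block k holds every token whose larger coordinate equals k (the right-hand column and bottom row of the top-left (k+1) x (k+1) square); the abstract does not spell out the exact indexing, so the function name and this rule are illustrative assumptions only.

```python
from typing import List, Tuple


def l_shape_blocks(h: int) -> List[List[Tuple[int, int]]]:
    """Partition an h x h token grid into h mirrored L-shape blocks.

    Assumed rule: token (row, col) belongs to block max(row, col), so the
    blocks grow from the top-left corner toward the bottom-right corner.
    """
    blocks: List[List[Tuple[int, int]]] = [[] for _ in range(h)]
    for row in range(h):
        for col in range(h):
            blocks[max(row, col)].append((row, col))
    return blocks


if __name__ == "__main__":
    for step, block in enumerate(l_shape_blocks(3)):
        # Each block would be decoded in one parallel step.
        print(step, block)
```

Under this reading, block k contains 2k + 1 tokens, so an h x h grid is decoded in h steps instead of the h * h steps a fully autoregressive decoder would take.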

An advanced YOLOv3 method for small object detection

Dec 06, 2022

Baokai Liu, Fengjie He, Shiqiang Du, Jiacheng Li, Wenjie Liu

Abstract: In recent years, object detection has achieved very large performance improvements, but detection results for small objects are still unsatisfactory. To address this issue, this work proposes a strategy based on feature fusion and dilated convolution, which uses dilated convolution to broaden the receptive field of feature maps at various scales. On the one hand, this can improve the detection accuracy of larger objects. On the other hand, it provides more contextual information for small objects, which helps improve their detection accuracy. The shallow semantic information of small objects is obtained by filtering out the noise in the feature map, and the feature information of more small objects is preserved by using a multi-scale fusion feature module and an attention mechanism. Fusing this shallow feature information with deep semantic information generates richer feature maps for small object detection. Experiments show that this method achieves higher accuracy than the traditional YOLOv3 network in detecting small objects and occluded objects. In addition, we achieve 32.8% Mean Average Precision on the detection of small objects on the MS COCO2017 test set. For 640×640 input, this method achieves 88.76% mAP on the PASCAL VOC2012 dataset.
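To make the receptive-field argument concrete, the following sketch computes how dilation enlarges the area a convolution sees. It uses the standard formulas for stride-1 convolutions; the kernel sizes and dilation rates in the example are illustrative and are not taken from the paper.

```python
from typing import Iterable, Tuple


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 convolutions.

    Each layer is a (kernel_size, dilation) pair; every layer adds
    (kernel_size - 1) * dilation positions to the field.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


if __name__ == "__main__":
    # Three plain 3x3 layers versus three 3x3 layers with dilations 1, 2, 3.
    print(receptive_field([(3, 1), (3, 1), (3, 1)]))
    print(receptive_field([(3, 1), (3, 2), (3, 3)]))
```

With the same number of weights, the dilated stack covers 13 positions per axis instead of 7, which is the extra context the abstract credits for helping small-object detection.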

CLIP also Understands Text: Prompting CLIP for Phrase Understanding

Oct 11, 2022

An Yan, Jiacheng Li, Wanrong Zhu, Yujie Lu, William Yang Wang, Julian McAuley

Abstract: Contrastive Language-Image Pretraining (CLIP) efficiently learns visual concepts by pre-training with natural language supervision. CLIP and its visual encoder have been explored on various vision and language tasks and achieve strong zero-shot or transfer learning performance. However, the use of its text encoder for text understanding alone has been less explored. In this paper, we find that the text encoder of CLIP actually demonstrates a strong ability for phrase understanding and, with a properly designed prompt, can even significantly outperform popular language models such as BERT. Extensive experiments validate the effectiveness of our method across different datasets and domains on entity clustering and entity set expansion tasks.

* Work in progress

Deep Learning for Logo Detection: A Survey

Oct 10, 2022

Sujuan Hou, Jiacheng Li, Weiqing Min, Qiang Hou, Yanna Zhao, Yuanjie Zheng, Shuqiang Jiang

Abstract: As more and more logos are created, logo detection has gradually become a research hotspot across many domains and tasks. Recent advances in this area are dominated by deep learning-based solutions, in which many datasets, learning strategies, network architectures, etc. have been employed. This paper reviews the advances in applying deep learning techniques to logo detection. First, we give a comprehensive account of public datasets designed to facilitate performance evaluation of logo detection algorithms, which tend to be more diverse, more challenging, and more reflective of real life. Next, we perform an in-depth analysis of the existing logo detection strategies and the strengths and weaknesses of each learning strategy. Subsequently, we summarize the applications of logo detection in various fields, from intelligent transportation and brand monitoring to copyright and trademark compliance. Finally, we analyze the potential challenges and present future directions for the development of logo detection to complete this survey.

UCEpic: Unifying Aspect Planning and Lexical Constraints for Explainable Recommendation

Sep 28, 2022

Jiacheng Li, Zhankui He, Jingbo Shang, Julian McAuley

Abstract: Personalized natural language generation for explainable recommendations plays a key role in justifying why a recommendation might match a user's interests. Existing models usually control the generation process by soft constraints (e.g., aspect planning). While promising, these methods struggle to generate specific information correctly, which prevents generated explanations from being informative and diverse. In this paper, we propose UCEpic, an explanation generation model that unifies aspect planning and lexical constraints for controllable personalized generation. Specifically, we first pre-train a non-personalized text generator by our proposed robust insertion process so that the model is able to generate sentences containing lexical constraints. Then, we demonstrate the method of incorporating aspect planning and personalized references into the insertion process to obtain personalized explanations. Compared to previous work controlled by soft constraints, UCEpic incorporates specific information from keyphrases and then largely improves the diversity and informativeness of generated explanations. Extensive experiments on RateBeer and Yelp show that UCEpic can generate high-quality and diverse explanations for recommendations.

Representing Knowledge by Spans: A Knowledge-Enhanced Model for Information Extraction

Aug 20, 2022

Jiacheng Li, Yannis Katsis, Tyler Baldwin, Ho-Cheol Kim, Andrew Bartko, Julian McAuley, Chun-Nan Hsu

Abstract: Knowledge-enhanced pre-trained models for language representation have been shown to be more effective in knowledge base construction tasks (i.e., relation extraction) than language models such as BERT. These knowledge-enhanced language models incorporate knowledge into pre-training to generate representations of entities or relationships. However, existing methods typically represent each entity with a separate embedding. As a result, these methods struggle to represent out-of-vocabulary entities, they require a large number of parameters on top of their underlying token models (i.e., the transformer), and the number of entities they can handle is limited in practice by memory constraints. Moreover, existing models still struggle to represent entities and relationships simultaneously. To address these problems, we propose a new pre-trained model that learns representations of both entities and relationships from token spans and span pairs in the text, respectively. By encoding spans efficiently with span modules, our model can represent both entities and their relationships while requiring fewer parameters than existing models. We pre-train our model with a knowledge graph extracted from Wikipedia and test it on a broad range of supervised and unsupervised information extraction tasks. Results show that our model learns better representations for both entities and relationships than baselines, while in supervised settings, fine-tuning our model outperforms RoBERTa consistently and achieves competitive results on information extraction tasks.

* CIKM 2022

Personalized Showcases: Generating Multi-Modal Explanations for Recommendations

Jun 30, 2022

An Yan, Zhankui He, Jiacheng Li, Tianyang Zhang, Julian McAuley

Abstract: Existing explanation models generate only text for recommendations but still struggle to produce diverse content. In this paper, to further enrich explanations, we propose a new task named personalized showcases, in which we provide both textual and visual information to explain our recommendations. Specifically, we first select a personalized image set that is the most relevant to a user's interest toward a recommended item. Then, natural language explanations are generated accordingly given our selected images. For this new task, we collect a large-scale dataset from Google Local (i.e., maps) and construct a high-quality subset for generating multi-modal explanations. We propose a personalized multi-modal framework which can generate diverse and visually-aligned explanations via contrastive learning. Experiments show that our framework benefits from different modalities as inputs, and is able to produce more diverse and expressive explanations compared to previous methods on a variety of evaluation metrics.

* 10 pages, 7 figures

Fine-grained Contrastive Learning for Relation Extraction

May 25, 2022

William Hogan, Jiacheng Li, Jingbo Shang

Abstract: Recent work on relation extraction (RE) has shown encouraging improvements by conducting contrastive learning on silver labels generated by distant supervision before fine-tuning on gold labels. Existing methods typically assume all these silver labels are accurate and therefore treat them equally in contrastive learning; however, distant supervision is inevitably noisy -- some silver labels are more reliable than others. In this paper, we first assess the quality of silver labels via a simple and automatic approach we call "learning order denoising," where we train a language model to learn these relations and record the order of learned training instances. We show that learning order largely corresponds to label accuracy -- silver labels learned early are, on average, more accurate than those learned later. We then propose a novel fine-grained contrastive learning (FineCL) for RE, which leverages this additional, fine-grained information about which silver labels are and are not noisy to improve the quality of learned relationship representations for RE. Experiments on many RE benchmarks show consistent, significant performance gains of FineCL over state-of-the-art methods.

* 8 pages, 4 figures
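The "learning order" idea in the abstract above can be sketched as a small bookkeeping step over training epochs. This is only an illustration under stated assumptions: it treats an instance as learned at the first epoch in which the model's prediction matches its silver label, and the function name and input format are hypothetical rather than taken from the paper.

```python
from typing import List, Optional, Sequence


def learning_order(
    predictions_per_epoch: Sequence[Sequence[str]],
    silver_labels: Sequence[str],
) -> List[Optional[int]]:
    """Return, for each instance, the first epoch at which the model's
    prediction matched the silver label, or None if it never did.

    Assumption: "learned" means the first correct prediction.
    """
    first_learned: List[Optional[int]] = [None] * len(silver_labels)
    for epoch, preds in enumerate(predictions_per_epoch):
        for idx, (pred, label) in enumerate(zip(preds, silver_labels)):
            if first_learned[idx] is None and pred == label:
                first_learned[idx] = epoch
    return first_learned


if __name__ == "__main__":
    labels = ["born_in", "works_for", "born_in"]
    preds = [
        ["born_in", "born_in", "works_for"],   # epoch 0
        ["born_in", "works_for", "works_for"],  # epoch 1
    ]
    print(learning_order(preds, labels))
```

Instances with small epoch indices would be treated as more reliable and those with large indices or None as likely noisy; how that signal is weighted in the contrastive objective is left to the paper.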

MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction

May 04, 2022

Yue Zhang, Zhenghua Li, Zuyi Bao, Jiacheng Li, Bo Zhang, Chen Li, Fei Huang, Min Zhang

Abstract: This paper presents MuCGEC, a multi-reference multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three Chinese-as-a-Second-Language (CSL) learner sources. Each sentence is corrected by three annotators, and their corrections are carefully reviewed by a senior annotator, resulting in 2.3 references per sentence. We conduct experiments with two mainstream CGEC models, i.e., the sequence-to-sequence model and the sequence-to-edit model, both enhanced with large pretrained language models, achieving competitive benchmark performance on previous datasets and on ours. We also discuss CGEC evaluation methodologies, including the effect of multiple references and using a char-based metric. Our annotation guidelines, data, and code are available at https://github.com/HillZhang1999/MuCGEC.

* Accepted by NAACL2022 (main conference)
