Pengcheng Wang

Active Prompting with Chain-of-Thought for Large Language Models

Feb 26, 2023
Shizhe Diao, Pengcheng Wang, Yong Lin, Tong Zhang

The increasing scale of large language models (LLMs) brings emergent abilities to various complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is known that the effective design of task-specific prompts is critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-and-answer tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for different tasks. This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed CoT reasoning). For this purpose, we propose a solution to the key problem of determining which questions are the most important and helpful ones to annotate from a pool of task-specific queries. By borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize the uncertainty so as to select the most uncertain questions for annotation. Experimental results demonstrate the superiority of our proposed method, achieving state-of-the-art on eight complex reasoning tasks. Further analyses of different uncertainty metrics, pool sizes, zero-shot learning, and accuracy-uncertainty relationship demonstrate the effectiveness of our method. Our code will be available at https://github.com/shizhediao/active-prompt.
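The selection step can be summarized in a few lines. Below is a minimal Python sketch of the uncertainty-driven selection idea (using disagreement and entropy metrics as described); `sample_answers` stands in for k stochastic chain-of-thought calls to an LLM and is a placeholder, not the authors' code.

```python
import math
from collections import Counter

def disagreement(answers):
    """Uncertainty as the fraction of distinct final answers among k samples."""
    return len(set(answers)) / len(answers)

def entropy(answers):
    """Uncertainty as the entropy of the empirical answer distribution."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def select_for_annotation(questions, sample_answers, k=10, n=8, metric=disagreement):
    """Rank questions by uncertainty over k sampled CoT answers; return the top n.

    `sample_answers(question, k)` is a placeholder for k stochastic LLM calls
    that each return a final answer string.
    """
    scored = [(metric(sample_answers(q, k)), q) for q in questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:n]]
```

The selected top-n questions are then annotated with human-designed CoT rationales and used as exemplars in the prompt.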

* 20 pages, 3 figures, 11 tables 

Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection

Dec 10, 2022
Shaoqing Xu, Dingfu Zhou, Jin Fang, Pengcheng Wang, Liangjun Zhang

LiDAR-based 3D object detectors have achieved impressive performance on many benchmarks; however, multi-sensor fusion-based techniques are promising to further improve the results. PointPainting, a recently proposed framework, adds semantic information from the 2D image to the 3D LiDAR points through a painting operation to boost detection performance. However, due to the limited resolution of 2D feature maps, a severe boundary-blurring effect occurs when re-projecting the 2D semantic segmentation into the 3D point clouds. To handle this limitation, a general multimodal fusion framework, MSF, is proposed to fuse the semantic information from both the 2D image and the 3D point-cloud scene parsing results. Specifically, MSF includes three main modules. First, state-of-the-art off-the-shelf 2D/3D semantic segmentation approaches are employed to generate parsing results for the 2D images and 3D point clouds, and the 2D semantic information is re-projected into the 3D point clouds with calibrated parameters. To handle the misalignment between the 2D and 3D parsing results, an AAF module is proposed to fuse them by learning an adaptive fusion score. The point cloud with the fused semantic labels is then sent to the subsequent 3D object detectors. Furthermore, a DFF module is proposed to aggregate deep features at different levels to boost the final detection performance. The effectiveness of the framework has been verified on two public large-scale 3D object detection benchmarks by comparing with different baselines. The experimental results show that the proposed fusion strategies significantly improve detection performance compared to methods using only point clouds or only 2D semantic information. Most importantly, the proposed approach significantly outperforms other approaches and sets new state-of-the-art results on the nuScenes test benchmark.
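To make the adaptive fusion idea concrete, here is a rough PyTorch sketch of fusing per-point 2D and 3D semantic scores with a learned fusion weight; the module name, layer sizes, and gating form are illustrative assumptions, not the paper's exact AAF design.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Illustrative per-point fusion of 2D and 3D semantic scores via a learned weight."""

    def __init__(self, num_classes, hidden=32):
        super().__init__()
        self.score_net = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # fusion score in (0, 1)
        )

    def forward(self, sem2d, sem3d):
        # sem2d, sem3d: (N_points, num_classes) per-point class scores
        w = self.score_net(torch.cat([sem2d, sem3d], dim=-1))  # (N_points, 1)
        return w * sem2d + (1.0 - w) * sem3d  # fused semantic scores
```

A low fusion score lets the 3D parsing dominate where the re-projected 2D labels are unreliable, e.g., near object boundaries.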

* Submitted to T-ITS Journal 

Automatic Facial Paralysis Estimation with Facial Action Units

Mar 30, 2022
Xuri Ge, Joemon M. Jose, Pengcheng Wang, Arunachalam Iyer, Xiao Liu, Hu Han

Facial palsy is unilateral facial nerve weakness or paralysis of rapid onset with unknown causes. Automatically estimating facial palsy severity can aid the diagnosis and treatment of people suffering from it across the world. In this work, we develop and experiment with a novel model for estimating facial palsy severity. For this, an effective facial Action Unit (AU) detection technique is incorporated into our model, where AUs refer to a unique set of facial muscle movements used to describe almost every anatomically possible facial expression. Specifically, we propose a novel Adaptive Local-Global Relational Network (ALGRNet) for facial AU detection and use it to classify facial paralysis severity. ALGRNet consists of three main novel structures: (i) an adaptive region learning module that learns adaptive muscle regions based on the detected landmarks; (ii) a skip-BiLSTM that models the latent relationships among local AUs; and (iii) a feature fusion-and-refining module that investigates the complementarity between the local and global face. Quantitative results on two AU benchmarks, i.e., BP4D and DISFA, demonstrate that ALGRNet achieves promising AU detection accuracy. We further demonstrate the effectiveness of its application to facial paralysis estimation by migrating ALGRNet to a facial paralysis dataset collected and annotated by medical professionals.
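As a hypothetical illustration of the final step, a small head could map detected AU activations to a severity grade; the AU count and number of grades below are placeholders, not the dataset's actual label scheme.

```python
import torch
import torch.nn as nn

class SeverityHead(nn.Module):
    """Hypothetical head mapping per-AU activation probabilities to a
    facial-palsy severity grade; num_aus and num_grades are placeholders."""

    def __init__(self, num_aus=12, num_grades=6):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(num_aus, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, num_grades),
        )

    def forward(self, au_probs):
        # au_probs: (batch, num_aus) AU activation probabilities from the detector
        return self.classifier(au_probs)  # severity logits
```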

* 12 pages, 5 figures, resubmitted to IEEE Transactions on Affective Computing 

Adaptive Local-Global Relational Network for Facial Action Units Recognition and Facial Paralysis Estimation

Mar 03, 2022
Xuri Ge, Joemon M. Jose, Pengcheng Wang, Arunachalam Iyer, Xiao Liu, Hu Han

Facial action units (AUs) refer to a unique set of facial muscle movements at certain facial locations defined by the Facial Action Coding System (FACS), which can be used to describe nearly any anatomically possible facial expression. Many existing AU recognition approaches enhance the AU representation by combining local features from multiple independent branches, each corresponding to a different AU; they usually neglect the potential mutual-assistance and exclusion relationships between AU branches, or simply employ a pre-defined, fixed knowledge graph as a prior. In addition, extracting features from pre-defined AU regions of regular shapes limits the representation ability. In this paper, we propose a novel Adaptive Local-Global Relational Network (ALGRNet) for facial AU recognition and apply it to facial paralysis estimation. ALGRNet consists of three novel structures, i.e., an adaptive region learning module that learns adaptive muscle regions based on the detected landmarks, a skip-BiLSTM module that models the latent mutual-assistance and exclusion relationships among local AU features, and a feature fusion-and-refining module that explores the complementarity between local AUs and the whole face for local AU refinement. To evaluate the proposed method, we migrated ALGRNet to a facial paralysis dataset collected and annotated by medical professionals. Experiments on the BP4D and DISFA AU datasets show that the proposed approach outperforms state-of-the-art methods by a large margin. We also demonstrate the effectiveness of ALGRNet in application to facial paralysis estimation.
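A minimal PyTorch sketch of the relational idea behind the skip-BiLSTM: treat per-AU local features as a sequence, let a BiLSTM exchange information across AUs, and preserve the local cues with a skip connection. Dimensions and details are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SkipBiLSTM(nn.Module):
    """Illustrative relational module loosely following the skip-BiLSTM idea."""

    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, feat_dim)

    def forward(self, au_feats):
        # au_feats: (batch, num_aus, feat_dim), one feature vector per AU branch
        relational, _ = self.bilstm(au_feats)     # message passing across AU branches
        return au_feats + self.proj(relational)   # skip connection keeps local cues
```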

* 10 pages, 5 figures, submitted to IEEE-TMI 

1st Place Solutions for UG2+ Challenge 2021 -- (Semi-)supervised Face detection in the low light condition

Jul 02, 2021
Pengcheng Wang, Lingqiao Ji, Zhilong Ji, Yuan Gao, Xiao Liu

In this technical report, we briefly introduce the solution of our team "TAL-ai" for (semi-)supervised face detection in the low-light condition in the UG2+ Challenge at CVPR 2021. By conducting several experiments with popular image enhancement and image transfer methods, we brought the low-light images and normal-light images into a closer domain, and we observed that training on these data achieves better performance. We also adopt several popular object detection frameworks, e.g., DetectoRS and Cascade R-CNN, and large backbones such as the Swin Transformer. Finally, we ensemble several models, achieving an mAP of 74.89 on the test set and ranking 1st on the final leaderboard.
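The report does not detail the ensembling scheme; one simple possibility is to pool detections from several models and de-duplicate them with NMS, as in the sketch below (weighted box fusion or other schemes may have been used instead).

```python
import torch
from torchvision.ops import nms

def ensemble_detections(per_model_boxes, per_model_scores, iou_thresh=0.6):
    """Naive ensemble sketch: concatenate boxes from several face detectors
    and suppress duplicates with non-maximum suppression."""
    boxes = torch.cat(per_model_boxes)    # (N, 4) boxes in xyxy format
    scores = torch.cat(per_model_scores)  # (N,) confidence scores
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```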

JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads

Dec 09, 2020
Karthick Shankar, Pengcheng Wang, Ran Xu, Ashraf Mahgoub, Somali Chaterji

With diverse IoT workloads, placing compute and analytics close to where data is collected is becoming increasingly important. We seek to understand the performance and cost implications of running analytics on IoT data on the various available platforms. These workloads can be compute-light, such as outlier detection on sensor data, or compute-intensive, such as object detection from video feeds obtained from drones. In our paper, JANUS, we profile the performance/$ and the compute versus communication cost for a compute-light IoT workload and a compute-intensive IoT workload. In addition, we look at the pros and cons of proprietary deep-learning object detection packages, such as Amazon Rekognition, Google Vision, and Azure Cognitive Services, and contrast them with open-source and tunable solutions, such as Faster R-CNN (FRCNN). We find that AWS IoT Greengrass delivers at least 2X lower latency and 1.25X lower cost compared to all other cloud platforms for the compute-light outlier detection workload. For the compute-intensive streaming video analytics task, an open-source object detection solution running on cloud VMs saves on dollar cost compared to the proprietary solutions from Amazon, Microsoft, and Google, but loses out on latency (up to 6X). If it runs on a low-powered edge device, the latency is up to 49X lower.
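As a reading aid, "performance/$" can be interpreted as throughput normalized by hourly platform price; the function name, numbers, and normalization below are hypothetical, not the paper's measurements.

```python
def perf_per_dollar(throughput_qps, price_per_hour):
    """One plausible reading of 'performance/$': queries per second per dollar-hour.
    The paper's exact normalization may differ."""
    return throughput_qps / price_per_hour

# Example with made-up numbers for two hypothetical deployments
edge = perf_per_dollar(throughput_qps=4.0, price_per_hour=0.10)     # 40.0
cloud_vm = perf_per_dollar(throughput_qps=20.0, price_per_hour=0.90)  # ~22.2
```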

* "JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads," IEEE International Conference on Cloud Computing (IEEE Cloud), pp. 1--10, Oct 18-24, 2020  
* Appeared at the IEEE Cloud 2020 conference. 10 pages 

GLGE: A New General Language Generation Evaluation Benchmark

Nov 24, 2020
Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan

Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress in pretraining and transfer learning in Natural Language Processing (NLP). These benchmarks mostly focus on a range of Natural Language Understanding (NLU) tasks, without considering Natural Language Generation (NLG) models. In this paper, we present the General Language Generation Evaluation (GLGE), a new multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks. For each task, we design three subtasks in terms of task difficulty (GLGE-Easy, GLGE-Medium, and GLGE-Hard), yielding 24 subtasks for comprehensively comparing model performance. To encourage research on pretraining and transfer learning for NLG models, we make GLGE publicly available and build a leaderboard with strong baselines including MASS, BART, and ProphetNet. The source code and dataset will be publicly available at https://github.com/microsoft/glge.

* 11 pages 

ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles

Oct 21, 2020
Ran Xu, Chen-lin Zhang, Pengcheng Wang, Jayoung Lee, Subrata Mitra, Somali Chaterji, Yin Li, Saurabh Bagchi

Advanced video analytic systems, including scene classification and object detection, have seen widespread success in various domains such as smart cities and autonomous transportation. With an ever-growing number of powerful client devices, there is incentive to move these heavy video analytics workloads from the cloud to mobile devices to achieve low latency and real-time processing and to preserve user privacy. However, most video analytic systems are heavyweight and are trained offline with some pre-defined latency or accuracy requirements. This makes them unable to adapt at runtime in the face of three types of dynamism -- the input video characteristics change, the amount of compute resources available on the node changes due to co-located applications, and the user's latency-accuracy requirements change. In this paper we introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements in the face of changing content and resource contention scenarios. To achieve this, we introduce a multi-branch object detection kernel (layered on Faster R-CNN), which incorporates a data-driven modeling approach on the performance metrics, and a latency SLA-driven scheduler to pick the best execution branch at runtime. We couple this kernel with approximable video object tracking algorithms to create an end-to-end video object detection system. We evaluate ApproxDet on a large benchmark video dataset and compare quantitatively to AdaScale and YOLOv3. We find that ApproxDet is able to adapt to a wide variety of contention and content characteristics and outshines all baselines, e.g., it achieves 52% lower latency and 11.1% higher accuracy over YOLOv3.
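The scheduler's core decision can be illustrated as follows: given each branch's predicted latency and accuracy (from the data-driven models), pick the most accurate branch that fits the latency SLA. This is a simplified sketch with assumed field names, not ApproxDet's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Branch:
    name: str
    predicted_latency_ms: float   # from a latency model of content + contention
    predicted_accuracy: float     # from an accuracy model of the branch

def pick_branch(branches, latency_sla_ms):
    """Among branches predicted to meet the latency budget, choose the most
    accurate one; otherwise fall back to the fastest branch."""
    feasible = [b for b in branches if b.predicted_latency_ms <= latency_sla_ms]
    if feasible:
        return max(feasible, key=lambda b: b.predicted_accuracy)
    return min(branches, key=lambda b: b.predicted_latency_ms)
```

Re-running this decision as content and contention change is what lets the system adapt at runtime.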

* Accepted to appear at the 18th ACM Conference on Embedded Networked Sensor Systems (SenSys), 2020 

TAL EmotioNet Challenge 2020 Rethinking the Model Chosen Problem in Multi-Task Learning

Apr 21, 2020
Pengcheng Wang, Zihao Wang, Zhilong Ji, Xiao Liu, Songfan Yang, Zhongqin Wu

This paper introduces our approach to the EmotioNet Challenge 2020. We pose AU recognition as a multi-task learning problem, where the non-rigid facial muscle motion (mainly the first 17 AUs) and the rigid head motion (the last 6 AUs) are modeled separately. The co-occurrence of expression features and head pose features is explored. We observe that different AUs converge at different speeds; by choosing the optimal checkpoint for each AU, the recognition results are improved. We obtain a final score of 0.746 on the validation set and 0.7306 on the test set of the challenge.
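A minimal sketch of the per-AU checkpoint selection described above; the data layout (checkpoint id mapped to per-AU validation scores) is an assumption for illustration.

```python
def best_checkpoint_per_au(val_scores):
    """val_scores: dict mapping checkpoint id -> list of per-AU validation scores.
    For each AU, keep the checkpoint where that AU scores best."""
    num_aus = len(next(iter(val_scores.values())))
    chosen = {}
    for au in range(num_aus):
        chosen[au] = max(val_scores, key=lambda ckpt: val_scores[ckpt][au])
    return chosen

# Example with two hypothetical checkpoints and three AUs
chosen = best_checkpoint_per_au({
    "ckpt_10": [0.61, 0.72, 0.55],
    "ckpt_20": [0.64, 0.70, 0.58],
})
# -> {0: "ckpt_20", 1: "ckpt_10", 2: "ckpt_20"}
```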

* 4 pages, 2 figures, 3 tables, CVPRW2020 