He Wen

Vulnerability Assessment of Industrial Control System with an Improved CVSS

Jun 14, 2023
He Wen

Cyberattacks on industrial control systems (ICS) have been drawing attention in academia, but they have not yet raised adequate concern among many industrial practitioners. It is therefore necessary to identify the vulnerable locations and components in an ICS and to investigate the attack scenarios and techniques. This study proposes a method to assess the risk of cyberattacks on ICS with an improved Common Vulnerability Scoring System (CVSS) and applies it to a continuous stirred tank reactor (CSTR) model. The results show that the physical system level of an ICS suffers the highest severity once cyberattacked, and that controllers, workstations, and the human-machine interface are the crucial components in cyberattack and defense.
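
For orientation, the sketch below computes a standard CVSS v3.1 base score in Python. The metric weights come from the public CVSS v3.1 specification; the paper's improved, ICS-specific scoring is not reproduced here, and the example vector is hypothetical.

import math

# Standard CVSS v3.1 base-metric weights from the public specification. The paper's
# "improved CVSS" adapts this scheme for ICS; those adaptations are not shown here.
AV  = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}        # Attack Vector
AC  = {"L": 0.77, "H": 0.44}                              # Attack Complexity
PR  = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},            # Privileges Required,
       "C": {"N": 0.85, "L": 0.68, "H": 0.50}}            #   keyed by Scope
UI  = {"N": 0.85, "R": 0.62}                              # User Interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}                    # C/I/A impact

def roundup(x: float) -> float:
    """CVSS 'Roundup': smallest value with one decimal place that is >= x."""
    return math.ceil(x * 10) / 10

def base_score(av, ac, pr, ui, scope, c, i, a) -> float:
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * AV[av] * AC[ac] * PR[scope][pr] * UI[ui]
    raw = impact + exploitability if scope == "U" else 1.08 * (impact + exploitability)
    return roundup(min(raw, 10.0))

# Hypothetical vector for a network-reachable controller flaw:
# AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
print(base_score("N", "L", "N", "N", "U", "H", "H", "H"))  # -> 9.8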

The Digital Divide in Process Safety: Quantitative Risk Analysis of Human-AI Collaboration

May 29, 2023
He Wen

Digital technologies have dramatically accelerated the digital transformation of the process industries, enabled new industrial applications, upgraded production systems, and enhanced operational efficiency. At the same time, the challenges and gaps between humans and artificial intelligence (AI) have become more and more prominent, and the digital divide in process safety is widening. This study attempts to address the following questions: (i) What is AI in the process safety context? (ii) What is the difference between AI and humans in process safety? (iii) How do AI and humans collaborate in process safety? (iv) What are the challenges and gaps in human-AI collaboration? (v) How can the risk of human-AI collaboration in process safety be quantified? Qualitative risk analysis based on brainstorming and literature review, and quantitative risk analysis based on layer of protection analysis (LOPA) and Bayesian networks (BN), were applied to explore and model these questions. The importance of human reliability should be stressed in the digital age, rather than only increasing the reliability of AI, and human-centered AI design in process safety needs to be promoted.
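
As a point of reference for the quantitative part, the following minimal LOPA sketch in Python multiplies an initiating-event frequency by the probabilities of failure on demand of the independent protection layers; the Bayesian-network portion of the study is not shown, and all numbers are hypothetical placeholders rather than values from the paper.

# A minimal layer-of-protection-analysis (LOPA) sketch, not the model from the paper:
# the mitigated scenario frequency is the initiating-event frequency multiplied by the
# probability of failure on demand (PFD) of each independent protection layer (IPL).

def lopa_frequency(initiating_events_per_year: float, ipl_pfds: list) -> float:
    """Frequency of the mitigated consequence, in events per year."""
    freq = initiating_events_per_year
    for pfd in ipl_pfds:
        freq *= pfd
    return freq

# Hypothetical scenario: an operator/AI decision error once per year, mitigated by an
# alarm with human response (PFD 0.1) and a safety instrumented function (PFD 0.01).
f_mitigated = lopa_frequency(1.0, [0.1, 0.01])
tolerable = 1e-4  # example tolerable frequency, events per year
print(f"{f_mitigated:.0e} events/year ->",
      "acceptable" if f_mitigated <= tolerable else "more risk reduction needed")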

Alert of the Second Decision-maker: An Introduction to Human-AI Conflict

May 25, 2023
He Wen

The collaboration between humans and artificial intelligence (AI) is a significant feature of this digital age. However, humans and AI may have observation, interpretation, and action conflicts when working synchronously. This phenomenon is often masked by faults and, unfortunately, overlooked. This paper systematically introduces the concept, causes, measurement methods, and risk assessment of human-AI conflict. The results highlight that there is a potential second decision-maker besides the human, namely the AI; that human-AI conflict is a unique and emerging risk in digitalized process systems; that this is an interdisciplinary field which needs to be distinguished from traditional fault and failure analysis; and that the conflict risk is significant and cannot be ignored.

Garment Avatars: Realistic Cloth Driving using Pattern Registration

Jun 07, 2022
Oshri Halimi, Fabian Prada, Tuur Stuyck, Donglai Xiang, Timur Bagautdinov, He Wen, Ron Kimmel, Takaaki Shiratori, Chenglei Wu, Yaser Sheikh

Virtual telepresence is the future of online communication, and clothing is an essential part of a person's identity and self-expression. Yet ground-truth data of registered clothes is currently unavailable at the resolution and accuracy required for training telepresence models for realistic cloth animation. Here, we propose an end-to-end pipeline for building drivable representations of clothing. The core of our approach is a multi-view patterned cloth tracking algorithm capable of capturing deformations with high accuracy. We further rely on the high-quality data produced by our tracking method to build a Garment Avatar: an expressive and fully drivable geometry model for a piece of clothing. The resulting model can be animated from a sparse set of views and produces highly realistic reconstructions that are faithful to the driving signals. We demonstrate the efficacy of our pipeline on a realistic virtual telepresence application, where a garment is reconstructed from two views and a user can pick and swap garment designs at will. In addition, we show that in a challenging scenario where the avatar is driven exclusively by body pose, it still produces realistic cloth geometry of significantly higher quality than the state of the art.

Active Learning with Pseudo-Labels for Multi-View 3D Pose Estimation

Dec 27, 2021
Qi Feng, Kun He, He Wen, Cem Keskin, Yuting Ye

Pose estimation of the human body/hand is a fundamental problem in computer vision, and learning-based solutions require a large amount of annotated data. Given limited annotation budgets, a common approach to increasing label efficiency is Active Learning (AL), which selects examples with the highest value to annotate, but choosing the selection strategy is often nontrivial. In this work, we improve Active Learning for the problem of 3D pose estimation in a multi-view setting, which is of increasing importance in many application scenarios. We develop a framework that allows us to efficiently extend existing single-view AL strategies, and then propose two novel AL strategies that make full use of multi-view geometry. Moreover, we demonstrate additional performance gains by incorporating predicted pseudo-labels, which is a form of self-training. Our system significantly outperforms baselines in 3D body and hand pose estimation on two large-scale benchmarks: CMU Panoptic Studio and InterHand2.6M. Notably, on CMU Panoptic Studio, we are able to match the performance of a fully-supervised model using only 20% of labeled training data.
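
To make the idea concrete, here is an illustrative multi-view acquisition function in Python/NumPy, not the paper's exact strategies: unlabeled examples are scored by the disagreement among per-view 3D pose estimates, the most uncertain ones are queried for annotation, and confident predictions are kept as pseudo-labels. Function names, thresholds, and data shapes are assumptions.

import numpy as np

def multiview_disagreement(poses_3d: np.ndarray) -> float:
    """poses_3d: (num_views, num_joints, 3) per-view 3D estimates in a common frame."""
    mean_pose = poses_3d.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(poses_3d - mean_pose, axis=-1).mean())

def select_and_pseudo_label(pool, budget: int, confident_thresh: float):
    """pool: list of (sample_id, per-view 3D estimates). Returns (queries, pseudo-labels)."""
    scored = sorted(((multiview_disagreement(p), sid, p) for sid, p in pool), reverse=True)
    to_annotate = [sid for _, sid, _ in scored[:budget]]          # most uncertain samples
    pseudo_labels = {sid: p.mean(axis=0)                          # fused pose as pseudo-label
                     for score, sid, p in scored[budget:] if score < confident_thresh}
    return to_annotate, pseudo_labels

# Hypothetical usage with random data standing in for per-view network predictions.
rng = np.random.default_rng(0)
pool = [(i, rng.normal(size=(4, 21, 3))) for i in range(100)]
queries, pseudo = select_and_pseudo_label(pool, budget=10, confident_thresh=1.0)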

* Work done during internship at Meta Reality Labs 

Explicit Clothing Modeling for an Animatable Full-Body Avatar

Jun 30, 2021
Donglai Xiang, Fabian Andres Prada, Timur Bagautdinov, Weipeng Xu, Yuan Dong, He Wen, Jessica Hodgins, Chenglei Wu

Recent work has shown great progress in building photorealistic, animatable full-body codec avatars, but these avatars still have difficulty generating high-fidelity animation of clothing. To address this, we propose a method to build an animatable clothed-body avatar with an explicit representation of the clothing on the upper body from multi-view captured videos. We use a two-layer mesh representation to separately register the 3D scans with templates. To improve photometric correspondence across frames, texture alignment is then performed through inverse rendering of the clothing geometry and texture predicted by a variational autoencoder. We then train a new two-layer codec avatar with separate modeling of the upper clothing and the inner body layer. To learn the interaction between body dynamics and clothing states, we use a temporal convolution network to predict the clothing latent code from a sequence of input skeletal poses. We show photorealistic animation output for three different actors and demonstrate the advantage of our clothed-body avatars over the single-layer avatars of previous work. We also show the benefit of an explicit clothing model, which allows the clothing texture to be edited in the animation output.
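
The pose-driven clothing prediction can be sketched roughly as follows in PyTorch; the window length, joint count, channel widths, and latent dimension are hypothetical stand-ins, not the paper's actual configuration.

import torch
import torch.nn as nn

# Minimal sketch: predict a clothing latent code from a window of skeletal poses with
# temporal (1D) convolutions. All sizes here are assumptions for illustration only.

class PoseToClothingLatent(nn.Module):
    def __init__(self, num_joints: int = 63, pose_dim: int = 3, latent_dim: int = 128):
        super().__init__()
        in_ch = num_joints * pose_dim          # flattened pose per frame
        self.tcn = nn.Sequential(
            nn.Conv1d(in_ch, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, latent_dim, kernel_size=3, padding=1),
        )

    def forward(self, pose_seq: torch.Tensor) -> torch.Tensor:
        # pose_seq: (batch, time, num_joints, 3) -> (batch, channels, time)
        b, t, j, d = pose_seq.shape
        x = pose_seq.reshape(b, t, j * d).transpose(1, 2)
        z = self.tcn(x)                        # (batch, latent_dim, time)
        return z[..., -1]                      # latent code for the last frame

z = PoseToClothingLatent()(torch.randn(2, 16, 63, 3))   # -> shape (2, 128)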

InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image

Aug 21, 2020
Gyeongsik Moon, Shoou-i Yu, He Wen, Takaaki Shiratori, Kyoung Mu Lee

Analysis of hand-hand interactions is a crucial step towards better understanding human behavior. However, most research in 3D hand pose estimation has focused on the isolated single-hand case. We therefore propose (1) a large-scale dataset, InterHand2.6M, and (2) a baseline network, InterNet, for 3D interacting hand pose estimation from a single RGB image. The proposed InterHand2.6M consists of 2.6M labeled single and interacting hand frames under various poses from multiple subjects. Our InterNet simultaneously performs 3D single and interacting hand pose estimation. In our experiments, we demonstrate large gains in 3D interacting hand pose estimation accuracy when leveraging the interacting hand data in InterHand2.6M. We also report the accuracy of InterNet on InterHand2.6M, which serves as a strong baseline for this new dataset. Finally, we show 3D interacting hand pose estimation results on general images. Our code and dataset are available at https://mks0601.github.io/InterHand2.6M/.

* Published at ECCV 2020 

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

Feb 02, 2018
Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou

We propose DoReFa-Net, a method to train convolutional neural networks with low-bitwidth weights and activations using low-bitwidth parameter gradients. In particular, during the backward pass, parameter gradients are stochastically quantized to low-bitwidth numbers before being propagated to convolutional layers. As convolutions during the forward/backward passes can now operate on low-bitwidth weights and activations/gradients respectively, DoReFa-Net can use bit convolution kernels to accelerate both training and inference. Moreover, as bit convolutions can be efficiently implemented on CPU, FPGA, ASIC, and GPU, DoReFa-Net opens the way to accelerating the training of low-bitwidth neural networks on such hardware. Our experiments on the SVHN and ImageNet datasets show that DoReFa-Net can achieve prediction accuracy comparable to its 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet with 1-bit weights and 2-bit activations can be trained from scratch using 6-bit gradients to reach 46.1% top-1 accuracy on the ImageNet validation set. The DoReFa-Net AlexNet model is released publicly.
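
For reference, a minimal PyTorch sketch of the DoReFa forward quantizers with a straight-through estimator is shown below; the stochastic gradient quantizer used in the backward pass is omitted for brevity, and the paper treats 1-bit weights as a special case (scaled sign) rather than via the generic formula used here.

import torch

class QuantizeK(torch.autograd.Function):
    """quantize_k(r) = round((2^k - 1) * r) / (2^k - 1) for r in [0, 1]; STE in backward."""
    @staticmethod
    def forward(ctx, r, k):
        n = float(2 ** k - 1)
        return torch.round(r * n) / n

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None   # straight-through: pass the gradient unchanged

def quantize_weights(w: torch.Tensor, k: int) -> torch.Tensor:
    # f_w(r) = 2 * quantize_k( tanh(r) / (2 * max|tanh(r)|) + 1/2 ) - 1
    t = torch.tanh(w)
    r = t / (2 * t.abs().max()) + 0.5
    return 2 * QuantizeK.apply(r, k) - 1

def quantize_activations(x: torch.Tensor, k: int) -> torch.Tensor:
    # f_a(r) = quantize_k(clip(r, 0, 1)), assuming a bounded activation
    return QuantizeK.apply(torch.clamp(x, 0, 1), k)

w  = torch.randn(8, 8, requires_grad=True)
wq = quantize_weights(w, k=1)                        # weights in {-1, +1}
aq = quantize_activations(torch.randn(4, 8), k=2)    # 2-bit activations in [0, 1]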

EAST: An Efficient and Accurate Scene Text Detector

Jul 10, 2017
Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang

Previous approaches for scene text detection have already achieved promising performance across various benchmarks. However, they usually fall short when dealing with challenging scenarios, even when equipped with deep neural network models, because the overall performance is determined by the interplay of multiple stages and components in the pipelines. In this work, we propose a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes. The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images with a single neural network, eliminating unnecessary intermediate steps such as candidate aggregation and word partitioning. The simplicity of our pipeline allows us to concentrate effort on designing loss functions and the neural network architecture. Experiments on standard datasets including ICDAR 2015, COCO-Text, and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.7820 at 13.2 fps at 720p resolution.
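
EAST's dense per-pixel predictions are typically merged with a locality-aware non-maximum suppression step; below is a rough Python sketch of that idea, not the released implementation. It assumes shapely is available for polygon IoU, and the detection format and thresholds are illustrative.

import numpy as np
from shapely.geometry import Polygon

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two quadrilaterals given as (4, 2) corner arrays."""
    pa, pb = Polygon(a), Polygon(b)
    if not pa.is_valid or not pb.is_valid:
        return 0.0
    inter = pa.intersection(pb).area
    return inter / (pa.area + pb.area - inter + 1e-9)

def weighted_merge(a, sa, b, sb):
    # merge two quads by score-weighted averaging of their corners
    return (a * sa + b * sb) / (sa + sb), sa + sb

def locality_aware_nms(dets, merge_thresh=0.5, nms_thresh=0.3):
    """dets: list of (quad, score) pairs in row-major scan order."""
    merged, prev = [], None
    for quad, score in dets:
        if prev is not None and iou(prev[0], quad) > merge_thresh:
            prev = weighted_merge(prev[0], prev[1], quad, score)   # merge neighbors first
        else:
            if prev is not None:
                merged.append(prev)
            prev = (quad, score)
    if prev is not None:
        merged.append(prev)
    # standard NMS on the (much smaller) merged set
    merged.sort(key=lambda d: d[1], reverse=True)
    keep = []
    for quad, score in merged:
        if all(iou(quad, k[0]) <= nms_thresh for k in keep):
            keep.append((quad, score))
    return keep

# Hypothetical usage: quads is a list of ((4, 2) ndarray, score) pairs in scan order.
# kept = locality_aware_nms(quads)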

* Accepted to CVPR 2017, fix equation (3) 