Alert button
Picture for Shuang Yang

Shuang Yang

Alert button

Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map

Aug 25, 2023
Zhenfeng Fan, Zhiheng Zhang, Shuang Yang, Chongyang Zhong, Min Cao, Shihong Xia

Figure 1 for Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map
Figure 2 for Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map
Figure 3 for Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map
Figure 4 for Unpaired Multi-domain Attribute Translation of 3D Facial Shapes with a Square and Symmetric Geometric Map

While impressive progress has recently been made in image-oriented facial attribute translation, shape-oriented 3D facial attribute translation remains an unsolved issue. This is primarily limited by the lack of 3D generative models and ineffective usage of 3D facial data. We propose a learning framework for 3D facial attribute translation to relieve these limitations. Firstly, we customize a novel geometric map for 3D shape representation and embed it in an end-to-end generative adversarial network. The geometric map represents 3D shapes symmetrically on a square image grid, while preserving the neighboring relationship of 3D vertices in a local least-square sense. This enables effective learning for the latent representation of data with different attributes. Secondly, we employ a unified and unpaired learning framework for multi-domain attribute translation. It not only makes effective usage of data correlation from multiple domains, but also mitigates the constraint for hardly accessible paired data. Finally, we propose a hierarchical architecture for the discriminator to guarantee robust results against both global and local artifacts. We conduct extensive experiments to demonstrate the advantage of the proposed framework over the state-of-the-art in generating high-fidelity facial shapes. Given an input 3D facial shape, the proposed framework is able to synthesize novel shapes of different attributes, which covers some downstream applications, such as expression transfer, gender translation, and aging. Code at https://github.com/NaughtyZZ/3D_facial_shape_attribute_translation_ssgmap.

Viaarxiv icon

Breaking the Curse of Quality Saturation with User-Centric Ranking

May 24, 2023
Zhuokai Zhao, Yang Yang, Wenyu Wang, Chihuang Liu, Yu Shi, Wenjie Hu, Haotian Zhang, Shuang Yang

Figure 1 for Breaking the Curse of Quality Saturation with User-Centric Ranking
Figure 2 for Breaking the Curse of Quality Saturation with User-Centric Ranking
Figure 3 for Breaking the Curse of Quality Saturation with User-Centric Ranking
Figure 4 for Breaking the Curse of Quality Saturation with User-Centric Ranking

A key puzzle in search, ads, and recommendation is that the ranking model can only utilize a small portion of the vastly available user interaction data. As a result, increasing data volume, model size, or computation FLOPs will quickly suffer from diminishing returns. We examined this problem and found that one of the root causes may lie in the so-called ``item-centric'' formulation, which has an unbounded vocabulary and thus uncontrolled model complexity. To mitigate quality saturation, we introduce an alternative formulation named ``user-centric ranking'', which is based on a transposed view of the dyadic user-item interaction data. We show that this formulation has a promising scaling property, enabling us to train better-converged models on substantially larger data sets.

* accepted to publish at KDD 2023 
Viaarxiv icon

Tile Networks: Learning Optimal Geometric Layout for Whole-page Recommendation

Mar 03, 2023
Shuai Xiao, Zaifan Jiang, Shuang Yang

Figure 1 for Tile Networks: Learning Optimal Geometric Layout for Whole-page Recommendation
Figure 2 for Tile Networks: Learning Optimal Geometric Layout for Whole-page Recommendation
Figure 3 for Tile Networks: Learning Optimal Geometric Layout for Whole-page Recommendation
Figure 4 for Tile Networks: Learning Optimal Geometric Layout for Whole-page Recommendation

Finding optimal configurations in a geometric space is a key challenge in many technological disciplines. Current approaches either rely heavily on human domain expertise and are difficult to scale. In this paper we show it is possible to solve configuration optimization problems for whole-page recommendation using reinforcement learning. The proposed \textit{Tile Networks} is a neural architecture that optimizes 2D geometric configurations by arranging items on proper positions. Empirical results on real dataset demonstrate its superior performance compared to traditional learning to rank approaches and recent deep models.

* Published at Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022 
Viaarxiv icon

Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

Mar 02, 2023
Shuai Xiao, Le Guo, Zaifan Jiang, Lei Lv, Yuanbo Chen, Jun Zhu, Shuang Yang

Figure 1 for Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing
Figure 2 for Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing
Figure 3 for Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing
Figure 4 for Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty and boost sales. How to effectively allocate the incentives so as to maximize the return (e.g., business objectives) under the budget constraint, however, is less studied in the literature. This problem is technically challenging due to the facts that 1) the allocation strategy has to be learned using historically logged data, which is counterfactual in nature, and 2) both the optimality and feasibility (i.e., that cost cannot exceed budget) needs to be assessed before being deployed to online systems. In this paper, we formulate the problem as a constrained Markov decision process (CMDP). To solve the CMDP problem with logged counterfactual data, we propose an efficient learning algorithm which combines bisection search and model-based planning. First, the CMDP is converted into its dual using Lagrangian relaxation, which is proved to be monotonic with respect to the dual variable. Furthermore, we show that the dual problem can be solved by policy learning, with the optimal dual variable being found efficiently via bisection search (i.e., by taking advantage of the monotonicity). Lastly, we show that model-based planing can be used to effectively accelerate the joint optimization process without retraining the policy for every dual variable. Empirical results on synthetic and real marketing datasets confirm the effectiveness of our methods.

* Published at CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management 
Viaarxiv icon

UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022

Jun 22, 2022
Yuanhang Zhang, Susan Liang, Shuang Yang, Shiguang Shan

Figure 1 for UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Figure 2 for UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Figure 3 for UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Figure 4 for UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022

This report presents a brief description of our winning solution to the AVA Active Speaker Detection (ASD) task at ActivityNet Challenge 2022. Our underlying model UniCon+ continues to build on our previous work, the Unified Context Network (UniCon) and Extended UniCon which are designed for robust scene-level ASD. We augment the architecture with a simple GRU-based module that allows information of recurring identities to flow across scenes through read and update operations. We report a best result of 94.47% mAP on the AVA-ActiveSpeaker test set, which continues to rank first on this year's challenge leaderboard and significantly pushes the state-of-the-art.

* 5 pages, 3 figures; technical report for AVA Challenge (see https://research.google.com/ava/challenge.html) at the International Challenge on Activity Recognition (ActivityNet), CVPR 2022 
Viaarxiv icon

UniCon: Unified Context Network for Robust Active Speaker Detection

Aug 05, 2021
Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen

Figure 1 for UniCon: Unified Context Network for Robust Active Speaker Detection
Figure 2 for UniCon: Unified Context Network for Robust Active Speaker Detection
Figure 3 for UniCon: Unified Context Network for Robust Active Speaker Detection
Figure 4 for UniCon: Unified Context Network for Robust Active Speaker Detection

We introduce a new efficient framework, the Unified Context Network (UniCon), for robust active speaker detection (ASD). Traditional methods for ASD usually operate on each candidate's pre-cropped face track separately and do not sufficiently consider the relationships among the candidates. This potentially limits performance, especially in challenging scenarios with low-resolution faces, multiple candidates, etc. Our solution is a novel, unified framework that focuses on jointly modeling multiple types of contextual information: spatial context to indicate the position and scale of each candidate's face, relational context to capture the visual relationships among the candidates and contrast audio-visual affinities with each other, and temporal context to aggregate long-term information and smooth out local uncertainties. Based on such information, our model optimizes all candidates in a unified process for robust and reliable ASD. A thorough ablation study is performed on several challenging ASD benchmarks under different settings. In particular, our method outperforms the state-of-the-art by a large margin of about 15% mean Average Precision (mAP) absolute on two challenging subsets: one with three candidate speakers, and the other with faces smaller than 64 pixels. Together, our UniCon achieves 92.0% mAP on the AVA-ActiveSpeaker validation set, surpassing 90% for the first time on this challenging dataset at the time of submission. Project website: https://unicon-asd.github.io/.

* 10 pages, 6 figures; to appear at ACM Multimedia 2021 
Viaarxiv icon

Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation

Jun 10, 2021
Jiawei Zhang, Linyi Li, Huichen Li, Xiaolu Zhang, Shuang Yang, Bo Li

Figure 1 for Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation
Figure 2 for Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation
Figure 3 for Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation
Figure 4 for Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation

Boundary based blackbox attack has been recognized as practical and effective, given that an attacker only needs to access the final model prediction. However, the query efficiency of it is in general high especially for high dimensional image data. In this paper, we show that such efficiency highly depends on the scale at which the attack is applied, and attacking at the optimal scale significantly improves the efficiency. In particular, we propose a theoretical framework to analyze and show three key characteristics to improve the query efficiency. We prove that there exists an optimal scale for projective gradient estimation. Our framework also explains the satisfactory performance achieved by existing boundary black-box attacks. Based on our theoretical framework, we propose Progressive-Scale enabled projective Boundary Attack (PSBA) to improve the query efficiency via progressive scaling techniques. In particular, we employ Progressive-GAN to optimize the scale of projections, which we call PSBA-PGAN. We evaluate our approach on both spatial and frequency scales. Extensive experiments on MNIST, CIFAR-10, CelebA, and ImageNet against different models including a real-world face recognition API show that PSBA-PGAN significantly outperforms existing baseline attacks in terms of query efficiency and attack success rate. We also observe relatively stable optimal scales for different models and datasets. The code is publicly available at https://github.com/AI-secure/PSBA.

* ICML 2021 
Viaarxiv icon

A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

Jun 09, 2021
Runzhong Wang, Zhigang Hua, Gan Liu, Jiayi Zhang, Junchi Yan, Feng Qi, Shuang Yang, Jun Zhou, Xiaokang Yang

Figure 1 for A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs
Figure 2 for A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs
Figure 3 for A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs
Figure 4 for A Bi-Level Framework for Learning to Solve Combinatorial Optimization on Graphs

Combinatorial Optimization (CO) has been a long-standing challenging research topic featured by its NP-hard nature. Traditionally such problems are approximately solved with heuristic algorithms which are usually fast but may sacrifice the solution quality. Currently, machine learning for combinatorial optimization (MLCO) has become a trending research topic, but most existing MLCO methods treat CO as a single-level optimization by directly learning the end-to-end solutions, which are hard to scale up and mostly limited by the capacity of ML models given the high complexity of CO. In this paper, we propose a hybrid approach to combine the best of the two worlds, in which a bi-level framework is developed with an upper-level learning method to optimize the graph (e.g. add, delete or modify edges in a graph), fused with a lower-level heuristic algorithm solving on the optimized graph. Such a bi-level approach simplifies the learning on the original hard CO and can effectively mitigate the demand for model capacity. The experiments and results on several popular CO problems like Directed Acyclic Graph scheduling, Graph Edit Distance and Hamiltonian Cycle Problem show its effectiveness over manually designed heuristics and single-level learning methods.

Viaarxiv icon

Learning to Schedule DAG Tasks

Mar 05, 2021
Zhigang Hua, Feng Qi, Gan Liu, Shuang Yang

Figure 1 for Learning to Schedule DAG Tasks
Figure 2 for Learning to Schedule DAG Tasks
Figure 3 for Learning to Schedule DAG Tasks
Figure 4 for Learning to Schedule DAG Tasks

Scheduling computational tasks represented by directed acyclic graphs (DAGs) is challenging because of its complexity. Conventional scheduling algorithms rely heavily on simple heuristics such as shortest job first (SJF) and critical path (CP), and are often lacking in scheduling quality. In this paper, we present a novel learning-based approach to scheduling DAG tasks. The algorithm employs a reinforcement learning agent to iteratively add directed edges to the DAG, one at a time, to enforce ordering (i.e., priorities of execution and resource allocation) of "tricky" job nodes. By doing so, the original DAG scheduling problem is dramatically reduced to a much simpler proxy problem, on which heuristic scheduling algorithms such as SJF and CP can be efficiently improved. Our approach can be easily applied to any existing heuristic scheduling algorithms. On the benchmark dataset of TPC-H, we show that our learning based approach can significantly improve over popular heuristic algorithms and consistently achieves the best performance among several methods under a variety of settings.

Viaarxiv icon

Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks

Feb 25, 2021
Huichen Li, Linyi Li, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, Bo Li

Figure 1 for Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks
Figure 2 for Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks
Figure 3 for Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks
Figure 4 for Nonlinear Projection Based Gradient Estimation for Query Efficient Blackbox Attacks

Gradient estimation and vector space projection have been studied as two distinct topics. We aim to bridge the gap between the two by investigating how to efficiently estimate gradient based on a projected low-dimensional space. We first provide lower and upper bounds for gradient estimation under both linear and nonlinear projections, and outline checkable sufficient conditions under which one is better than the other. Moreover, we analyze the query complexity for the projection-based gradient estimation and present a sufficient condition for query-efficient estimators. Built upon our theoretic analysis, we propose a novel query-efficient Nonlinear Gradient Projection-based Boundary Blackbox Attack (NonLinear-BA). We conduct extensive experiments on four image datasets: ImageNet, CelebA, CIFAR-10, and MNIST, and show the superiority of the proposed methods compared with the state-of-the-art baselines. In particular, we show that the projection-based boundary blackbox attacks are able to achieve much smaller magnitude of perturbations with 100% attack success rate based on efficient queries. Both linear and nonlinear projections demonstrate their advantages under different conditions. We also evaluate NonLinear-BA against the commercial online API MEGVII Face++, and demonstrate the high blackbox attack performance both quantitatively and qualitatively. The code is publicly available at https://github.com/AI-secure/NonLinear-BA.

* Accepted by AISTATS 2021; 9 pages excluding references and appendices 
Viaarxiv icon