Zheng Wen

A Benchmark and Baseline for Language-Driven Image Editing

Oct 05, 2020

Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, Chenliang Xu

Abstract: Language-driven image editing can significantly reduce laborious image editing work and is friendly to photography novices. However, most similar work can only deal with a specific image domain or can only do global retouching. To solve this new task, we first present a new language-driven image editing dataset that supports both local and global editing with editing operation and mask annotations. We also propose a baseline method that fully utilizes the annotations to solve this problem. Our new method treats each editing operation as a sub-module and can automatically predict operation parameters. Such an approach not only performs well on challenging user data but is also highly interpretable. We believe our work, including both the benchmark and the baseline, will advance the image editing area towards a more general and free-form level.

* Accepted by ACCV 2020

On the Sample Complexity of Reinforcement Learning with Policy Space Generalization

Aug 17, 2020

Wenlong Mou, Zheng Wen, Xi Chen

Abstract: We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization, i.e., the agent has prior knowledge that the optimal policy lies in a known policy space. Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space, which are intractably large in many practical problems. To avoid such undesirable dependence on the state and action space sizes, this paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning in an arbitrary Markov Decision Process (MDP). Using a simulator oracle, we prove a near-optimal sample complexity upper bound that only depends linearly on the eluder dimension. We further prove a similar regret bound in deterministic systems without the simulator.

Low-rank Tensor Bandits

Jul 31, 2020

Botao Hao, Jie Zhou, Zheng Wen, Will Wei Sun

Abstract: In recent years, multi-dimensional online decision making has been playing a crucial role in many practical applications such as online recommendation and digital marketing. To solve it, we introduce stochastic low-rank tensor bandits, a class of bandits whose mean rewards can be represented as a low-rank tensor. We propose two learning algorithms, tensor epoch-greedy and tensor elimination, and develop finite-time regret bounds for them. We observe that tensor elimination has an optimal dependency on the time horizon, while tensor epoch-greedy has a sharper dependency on tensor dimensions. Numerical experiments further back up these theoretical findings and show that our algorithms outperform various state-of-the-art approaches that ignore the tensor low-rank structure.

Structured Policy Iteration for Linear Quadratic Regulator

Jul 13, 2020

Youngsuk Park, Ryan A. Rossi, Zheng Wen, Gang Wu, Handong Zhao

Abstract: The linear quadratic regulator (LQR) is one of the most popular frameworks for tackling continuous Markov decision process tasks. With its fundamental theory and tractable optimal policy, LQR has been revisited and analyzed in recent years in reinforcement learning scenarios such as the model-free or model-based setting. In this paper, we introduce Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy. Such a structured policy with (block) sparsity or low rank can have significant advantages over the standard LQR policy: it is more interpretable, memory-efficient, and well-suited for the distributed setting. In order to derive such a policy, we first cast a regularized LQR problem when the model is known. Then, our S-PI algorithm, which takes a policy evaluation step and a policy improvement step in an iterative manner, can solve this regularized LQR efficiently. We further extend the S-PI algorithm to the model-free setting, where a smoothing procedure is adopted to estimate the gradient. In both the known-model and model-free settings, we prove convergence under the proper choice of parameters. Finally, the experiments demonstrate the advantages of S-PI in terms of balancing the LQR performance and level of structure by varying the weight parameter.
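
The alternation of policy evaluation and policy improvement described in this abstract can be illustrated on a scalar LQR problem. The sketch below is not the S-PI algorithm: it omits the regularizer that induces structure and assumes a known model with dynamics x' = a*x + b*u and stage cost q*x^2 + r*u^2. All function and parameter names are hypothetical.

```python
def evaluate_policy(a, b, q, r, k):
    """Cost-to-go coefficient P of the linear policy u = -k * x.

    Solves the scalar Lyapunov equation P = q + r*k^2 + (a - b*k)^2 * P.
    """
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("policy is not stabilizing")
    return (q + r * k * k) / (1.0 - closed_loop * closed_loop)


def improve_policy(a, b, r, p):
    """Gain that is greedy with respect to the cost-to-go P * x^2."""
    return a * b * p / (r + b * b * p)


def policy_iteration(a, b, q, r, k0, iterations=50):
    """Alternate evaluation and improvement, starting from a stabilizing gain k0."""
    k = k0
    for _ in range(iterations):
        p = evaluate_policy(a, b, q, r, k)
        k = improve_policy(a, b, r, p)
    return k, evaluate_policy(a, b, q, r, k)


def riccati_residual(a, b, q, r, p):
    """Residual of the scalar discrete-time Riccati equation at P."""
    return q + a * a * p - (a * b * p) ** 2 / (r + b * b * p) - p


gain, cost = policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
print(gain, cost, riccati_residual(1.2, 1.0, 1.0, 1.0, cost))
```

At convergence the cost coefficient satisfies the Riccati equation, so the printed residual should be close to zero. A structured variant would add a sparsity or low-rank penalty on the gain in the improvement step.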

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

Jul 09, 2020

Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Abstract: We propose a novel framework for structured bandits, which we call an influence diagram bandit. Our framework captures complex statistical dependencies between actions, latent variables, and observations; and thus unifies and extends many existing models, such as combinatorial semi-bandits, cascading bandits, and low-rank bandits. We develop novel online learning algorithms that learn to act efficiently in our models. The key idea is to track a structured posterior distribution of model parameters, either exactly or approximately. To act, we sample model parameters from their posterior and then use the structure of the influence diagram to find the most optimistic action under the sampled parameters. We empirically evaluate our algorithms in three structured bandit problems, and show that they perform as well as or better than problem-specific state-of-the-art baselines.
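
The sample-then-act loop in this abstract is easiest to see in the simplest unstructured case. The sketch below is plain Beta-Bernoulli Thompson sampling, not the influence diagram model or its variational posterior: each arm keeps an exact Beta posterior, one mean per arm is sampled, and the arm with the largest sampled mean is played. Function names, the uniform prior, and the example reward means are assumptions made for illustration.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample one mean per arm from its Beta posterior and pick the best arm."""
    samples = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior, an assumption
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, rounds, seed=0):
    """Simulate Bernoulli rewards and return per-arm success and failure counts."""
    rng = random.Random(seed)
    successes = [0] * len(true_means)
    failures = [0] * len(true_means)
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


wins, losses = run_bandit([0.1, 0.9], rounds=500)
pulls = [w + l for w, l in zip(wins, losses)]
print(pulls)
```

In a structured model the per-arm Beta update would be replaced by a posterior over the shared model parameters, and the argmax by an optimization that uses the model structure.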

Hypermodels for Exploration

Jun 12, 2020

Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy

Abstract: We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its size, and as such, prior work has typically been limited to ensembles with tens of elements. We show that alternative hypermodels can enjoy dramatic efficiency gains, enabling behavior that would otherwise require hundreds or thousands of elements, and even succeed in situations where ensemble methods fail to learn regardless of size. This allows more accurate approximation of Thompson sampling as well as use of more sophisticated exploration schemes. In particular, we consider an approximate form of information-directed sampling and demonstrate performance gains relative to Thompson sampling. As alternatives to ensembles, we consider linear and neural network hypermodels, also known as hypernetworks. We prove that, with neural network base models, a linear hypermodel can represent essentially any distribution over functions, and as such, hypernetworks are no more expressive.

* Published as a conference paper at ICLR 2020
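
As a rough illustration of the linear hypermodel mentioned in this abstract, the sketch below maps a random index vector z to base-model parameters via theta = mean + factors * z, then acts greedily under the sampled parameters. The linear base model, the standard normal index distribution, and every name here are assumptions for illustration; training the hypermodel is not shown.

```python
import random


def linear_hypermodel(mean, factors, index):
    """Map an index vector z to base-model parameters: theta = mean + factors @ z."""
    return [
        m + sum(a * z for a, z in zip(row, index))
        for m, row in zip(mean, factors)
    ]


def sample_index(dim, rng):
    """Draw an index vector with independent standard normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def predict(theta, features):
    """Linear base model: inner product of parameters and features."""
    return sum(t * x for t, x in zip(theta, features))


def thompson_action(mean, factors, actions, rng):
    """Sample one parameter vector and return the index of the best action under it."""
    theta = linear_hypermodel(mean, factors, sample_index(len(factors[0]), rng))
    values = [predict(theta, features) for features in actions]
    return max(range(len(actions)), key=values.__getitem__)


rng = random.Random(0)
mean = [1.0, 2.0]
factors = [[1.0, 0.0], [0.0, 1.0]]
print(thompson_action(mean, factors, actions=[[1.0, 0.0], [0.0, 1.0]], rng=rng))
```

An ensemble with a fixed set of members corresponds to restricting the index to a finite set, which is one way to read the abstract's claim that hypermodels generalize ensembles.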

Improving Adversarial Text Generation by Modeling the Distant Future

May 04, 2020

Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Dinghan Shen, Guoyin Wang, Zheng Wen, Lawrence Carin

Abstract: Auto-regressive text generation models usually focus on local fluency, and may cause inconsistent semantic meaning in long text generation. Further, automatically generating words with similar semantics is challenging, and hand-crafted linguistic rules are difficult to apply. We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the aforementioned issues. Specifically, we propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments demonstrate that the proposed method leads to improved performance.

* ACL 2020. arXiv admin note: substantial text overlap with arXiv:1811.00696

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Jan 20, 2020

Ruiyi Zhang, Changyou Chen, Zhe Gan, Zheng Wen, Wenlin Wang, Lawrence Carin

Abstract: Reinforcement learning (RL) has been widely studied for improving sequence-generation models. However, the conventional rewards used for RL training typically cannot capture sufficient semantic information and therefore introduce model bias. Further, the sparse and delayed rewards make RL exploration inefficient. To alleviate these issues, we propose the concept of nested-Wasserstein distance for distributional semantic matching. To further exploit it, a novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences for enhanced exploration and better semantic matching. Our solution can be understood as approximately executing proximal policy optimization with Wasserstein trust-regions. Experiments on a variety of unconditional and conditional sequence-generation tasks demonstrate the proposed approach consistently leads to improved performance.

* Accepted by AISTATS 2020

Bootstrapping Upper Confidence Bound

Jul 23, 2019

Botao Hao, Yasin Abbasi-Yadkori, Zheng Wen, Guang Cheng

Abstract: The Upper Confidence Bound (UCB) method is arguably the most celebrated approach to online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration inequalities, which thus lead to over-exploration. In this paper, we propose a non-parametric and data-dependent UCB algorithm based on the multiplier bootstrap. To improve its finite sample performance, we further incorporate second-order correction into the above construction. In theory, we derive both problem-dependent and problem-independent regret bounds for multi-armed bandits under a much weaker tail assumption than the standard sub-Gaussianity. Numerical results demonstrate significant regret reductions by our method, in comparison with several baselines in a range of multi-armed and linear bandit problems.
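
A data-dependent confidence bonus of the kind this abstract describes can be sketched with a multiplier bootstrap over one arm's observed rewards. This is a simplified illustration, not the paper's algorithm: it assumes random sign (Rademacher) multipliers, uses an empirical quantile of the reweighted centered rewards as the bonus, and omits the second-order correction. All names and default values are hypothetical.

```python
import random


def bootstrap_bonus(rewards, alpha, num_replicates, rng):
    """Empirical (1 - alpha) quantile of the sign-reweighted centered mean."""
    n = len(rewards)
    mean = sum(rewards) / n
    centered = [y - mean for y in rewards]
    stats = sorted(
        sum(rng.choice((-1.0, 1.0)) * c for c in centered) / n
        for _ in range(num_replicates)
    )
    position = min(num_replicates - 1, int((1.0 - alpha) * num_replicates))
    return stats[position]


def ucb_index(rewards, alpha=0.1, num_replicates=200, rng=None):
    """Empirical mean plus the bootstrap bonus for a single arm."""
    rng = rng or random.Random(0)
    mean = sum(rewards) / len(rewards)
    return mean + bootstrap_bonus(rewards, alpha, num_replicates, rng)


noisy_arm = [0.0, 1.0] * 10
steady_arm = [0.5] * 20
print(ucb_index(noisy_arm), ucb_index(steady_arm))
```

Both arms have the same empirical mean, but only the noisy arm receives a bonus, which is the sense in which the bound adapts to the observed data rather than to a worst-case tail assumption.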

Waterfall Bandits: Learning to Sell Ads Online

Apr 20, 2019

Branislav Kveton, Saied Mahdian, S. Muthukrishnan, Zheng Wen, Yikun Xian

Abstract: A popular approach to selling online advertising is by a waterfall, where a publisher makes sequential price offers to ad networks for an inventory, and chooses the winner in that order. The publisher picks the order and prices to maximize her revenue. A traditional solution is to learn the demand model and then solve the optimization problem for the given demand model. This incurs linear regret. We design an online learning algorithm for solving this problem, which interleaves learning and optimization, and prove that this algorithm has sublinear regret. We evaluate the algorithm on both synthetic and real-world data, and show that it quickly learns high-quality pricing strategies. This is the first principled study of learning a waterfall design online by sequential experimentation.
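
The objective the publisher maximizes can be made concrete with a small calculation. The sketch below only evaluates a known waterfall, it does not learn one: it assumes each ad network accepts its offered price independently with a known probability and that the first acceptance ends the sequence. The function names and the example prices and probabilities are hypothetical.

```python
from itertools import permutations


def expected_revenue(offers):
    """Expected revenue of offers given as (price, accept_probability) in order."""
    revenue = 0.0
    reach = 1.0  # probability that every earlier offer was rejected
    for price, accept_probability in offers:
        revenue += reach * accept_probability * price
        reach *= 1.0 - accept_probability
    return revenue


def best_order(offers):
    """Brute-force search over orderings, feasible only for a handful of networks."""
    return max(permutations(offers), key=expected_revenue)


offers = [(4.0, 1.0), (10.0, 0.5)]
print(expected_revenue(offers), best_order(offers))
```

Here offering the higher price first raises the expected revenue from 4.0 to 7.0, which shows why both the order and the prices matter. In the online setting the acceptance probabilities are unknown and must be estimated while selling.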
