Shuai Li

SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution

Nov 27, 2023
Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, Lei Zhang

Owing to their powerful generative priors, pre-trained text-to-image (T2I) diffusion models have become increasingly popular for solving the real-world image super-resolution problem. However, as a consequence of the heavy quality degradation of input low-resolution (LR) images, the destruction of local structures can lead to ambiguous image semantics. As a result, the content of the reproduced high-resolution image may contain semantic errors, deteriorating the super-resolution performance. To address this issue, we present a semantics-aware approach to better preserve the semantic fidelity of generative real-world image super-resolution. First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation. The hard semantic prompts refer to image tags, aiming to enhance the local perception ability of the T2I model, while the soft semantic prompts compensate for the hard ones by providing additional representation information. These semantic prompts encourage the T2I model to generate detailed and semantically accurate results. Furthermore, during the inference process, we integrate the LR images into the initial sampling noise to mitigate the diffusion model's tendency to generate excessive random details. Experiments show that our method reproduces more realistic image details and better preserves the semantics.
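The last step above, folding the LR input into the initial sampling noise, can be pictured with a minimal DDPM-style initialization sketch. The latent shapes, the bicubic upsampling, and the alpha_bar_T value below are assumptions for illustration only, not SeeSR's actual implementation.

```python
import torch
import torch.nn.functional as F

def init_noise_from_lr(lr_latent: torch.Tensor,
                       alpha_bar_T: float,
                       scale_factor: int = 4) -> torch.Tensor:
    """Blend an upsampled LR latent into the starting noise of a diffusion
    sampler, so sampling starts near the LR content rather than from pure
    Gaussian noise. Illustrative sketch only."""
    # Upsample the LR latent to the target HR resolution.
    hr_like = F.interpolate(lr_latent, scale_factor=scale_factor,
                            mode="bicubic", align_corners=False)
    noise = torch.randn_like(hr_like)
    # Forward-diffusion mixing at the final timestep T:
    # x_T = sqrt(alpha_bar_T) * x_0 + sqrt(1 - alpha_bar_T) * eps
    a = torch.tensor(alpha_bar_T)
    return a.sqrt() * hr_like + (1.0 - a).sqrt() * noise

# Example: 4x super-resolution, keeping only a faint trace of the LR signal.
x_T = init_noise_from_lr(torch.randn(1, 4, 16, 16), alpha_bar_T=0.02)
print(x_T.shape)  # torch.Size([1, 4, 64, 64])
```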

Learning Adversarial Low-rank Markov Decision Processes with Unknown Transition and Full-information Feedback

Nov 14, 2023
Canzhe Zhao, Ruofeng Yang, Baoxiang Wang, Xuezhou Zhang, Shuai Li

In this work, we study low-rank MDPs with adversarially changing losses in the full-information feedback setting. In particular, the unknown transition probability kernel admits a low-rank matrix decomposition \citep{REPUCB22}, and the loss functions may change adversarially but are revealed to the learner at the end of each episode. We propose a policy optimization-based algorithm, POLO, and prove that it attains a $\widetilde{O}(K^{\frac{5}{6}}A^{\frac{1}{2}}d\ln(1+M)/(1-\gamma)^2)$ regret guarantee, where $d$ is the rank of the transition kernel (and hence the dimension of the unknown representations), $A$ is the cardinality of the action space, $M$ is the cardinality of the model class, and $\gamma$ is the discount factor. Notably, our algorithm is oracle-efficient and its regret guarantee has no dependence on the size of the potentially arbitrarily large state space. Furthermore, we prove an $\Omega(\frac{\gamma^2}{1-\gamma} \sqrt{d A K})$ regret lower bound for this problem, showing that low-rank MDPs are statistically more difficult to learn than linear MDPs in the regret minimization setting. To the best of our knowledge, we present the first algorithm that interleaves representation learning, exploration, and exploitation to achieve a sublinear regret guarantee for RL with nonlinear function approximation and adversarial losses.
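The low-rank decomposition referenced above is, in the style of \citep{REPUCB22}, the standard factorization of the transition kernel through unknown $d$-dimensional representations (stated here for context; the exact assumptions are in the paper):

```latex
\[
P(s' \mid s, a) \;=\; \big\langle \phi^{*}(s, a),\, \mu^{*}(s') \big\rangle,
\qquad \phi^{*}(s,a),\ \mu^{*}(s') \in \mathbb{R}^{d},
\]
% so the learner must recover the representation \phi^{*} while also
% balancing exploration and exploitation under adversarial losses.
```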

Adversarial Attacks on Cooperative Multi-agent Bandits

Nov 03, 2023
Jinhang Zuo, Zhiyao Zhang, Xuchuang Wang, Cheng Chen, Shuai Li, John C. S. Lui, Mohammad Hajiesmaili, Adam Wierman

Cooperative multi-agent multi-armed bandits (CMA2B) consider the collaborative efforts of multiple agents in a shared multi-armed bandit game. We study latent vulnerabilities exposed by this collaboration and consider adversarial attacks on a few agents with the goal of influencing the decisions of the rest. More specifically, we study adversarial attacks on CMA2B in both homogeneous settings, where agents operate with the same arm set, and heterogeneous settings, where agents have distinct arm sets. In the homogeneous setting, we propose attack strategies that, by targeting just one agent, convince all agents to select a particular target arm $T-o(T)$ times while incurring $o(T)$ attack costs in $T$ rounds. In the heterogeneous setting, we prove that a target arm attack requires linear attack costs and propose attack strategies that can force a maximum number of agents to suffer linear regrets while incurring sublinear costs and only manipulating the observations of a few target agents. Numerical experiments validate the effectiveness of our proposed attack strategies.
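As a rough illustration of the homogeneous-setting idea, the sketch below poisons the rewards observed by a single attacked agent so that a chosen target arm looks best. The rule and cost accounting are deliberately simplistic and are not the cost-aware strategies proposed in the paper; a real attack keeps the cost sublinear by intervening only while the learners are still exploring.

```python
import numpy as np

def poison_observation(arm: int, reward: float, target_arm: int,
                       floor: float = 0.1) -> tuple[float, float]:
    """Suppress every non-target arm's observed reward so the target arm ends
    up with the highest empirical mean. Returns (poisoned_reward, cost).
    Toy rule only."""
    if arm == target_arm:
        return reward, 0.0
    poisoned = min(reward, floor)
    return poisoned, abs(reward - poisoned)

rng = np.random.default_rng(0)
total_cost = 0.0
for t in range(1000):
    arm = int(rng.integers(5))          # arm pulled by the attacked agent
    r = float(rng.uniform(0.4, 0.9))    # its true stochastic reward
    r_obs, c = poison_observation(arm, r, target_arm=2)
    total_cost += c
print(f"toy attack cost over 1000 rounds: {total_cost:.1f}")
```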

Online Clustering of Bandits with Misspecified User Models

Oct 10, 2023
Zhiyong Wang, Jize Xie, Xutong Liu, Shuai Li, John C. S. Lui

The contextual linear bandit is an important online learning problem in which, given arm features, a learning agent selects an arm at each round to maximize the cumulative reward in the long run. A line of works, called clustering of bandits (CB), utilizes the collaborative effect over user preferences and has shown significant improvements over classic linear bandit algorithms. However, existing CB algorithms require well-specified linear user models and can fail when this critical assumption does not hold. Whether robust CB algorithms can be designed for more practical scenarios with misspecified user models remains an open problem. In this paper, we are the first to present the important problem of clustering of bandits with misspecified user models (CBMUM), where the expected rewards in user models can be perturbed away from perfect linear models. We devise two robust CB algorithms, RCLUMB and RSCLUMB (representing the learned clustering structure with a dynamic graph and with sets, respectively), that can accommodate the inaccurate user preference estimations and erroneous clustering caused by model misspecifications. We prove regret upper bounds of $O(\epsilon_*T\sqrt{md\log T} + d\sqrt{mT}\log T)$ for our algorithms under milder assumptions than previous CB works (notably, we move past a restrictive technical assumption on the distribution of the arms), which match the lower bound asymptotically in $T$ up to logarithmic factors and also match the state-of-the-art results in several degenerate cases. The techniques for bounding the regret caused by misclustering users are quite general and may be of independent interest. Experiments on both synthetic and real-world data show that our algorithms outperform previous ones.
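The misspecification the abstract refers to can be written out explicitly. A plausible formalization (the notation is ours, not quoted from the paper) is that the expected reward is linear only up to a bounded deviation:

```latex
\[
\mathbb{E}\!\left[ r_t \mid x_{a_t} \right]
  \;=\; \langle \theta_{u_t},\, x_{a_t} \rangle + \epsilon_{u_t}(x_{a_t}),
\qquad
\sup_{x}\ \lvert \epsilon_{u}(x) \rvert \;\le\; \epsilon_{*},
\]
% where \theta_{u_t} is the preference vector of the user served at round t
% and \epsilon_{*} is the misspecification level that appears in the
% O(\epsilon_* T \sqrt{m d \log T} + d \sqrt{m T} \log T) regret bound above.
```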

Online Corrupted User Detection and Regret Minimization

Oct 10, 2023
Zhiyong Wang, Jize Xie, Tong Yu, Shuai Li, John C. S. Lui

In real-world online web systems, multiple users usually arrive sequentially. In applications such as click fraud and fake reviews, some users can maliciously perform corrupted (disrupted) behaviors to trick the system. Therefore, it is crucial to design efficient online learning algorithms that robustly learn from potentially corrupted user behaviors and accurately identify the corrupted users in an online manner. Existing works propose bandit algorithms robust to adversarial corruption. However, these algorithms are designed for a single user and cannot leverage the implicit social relations among multiple users for more efficient learning. Moreover, none of them consider how to detect corrupted users online in the multiple-user scenario. In this paper, we present an important online learning problem named LOCUD, which learns and utilizes unknown user relations from disrupted behaviors to speed up learning, and identifies the corrupted users in an online setting. To robustly learn and utilize the unknown relations among potentially corrupted users, we propose a novel bandit algorithm, RCLUB-WCU. To detect the corrupted users, we devise a novel online detection algorithm, OCCUD, based on RCLUB-WCU's inferred user relations. We prove a regret upper bound for RCLUB-WCU that asymptotically matches the lower bound with respect to $T$ up to logarithmic factors and matches the state-of-the-art results in degenerate cases. We also give a theoretical guarantee for the detection accuracy of OCCUD. Extensive experiments show that our methods achieve superior performance over previous bandit algorithms and detect corrupted users with high accuracy.
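One way to picture the detection step is to compare each user's individually estimated preference vector against the estimate of the cluster it was assigned to, and flag users whose gap exceeds a confidence threshold. The sketch below is only a schematic of that idea, with hypothetical names throughout, not the OCCUD algorithm itself.

```python
import numpy as np

def flag_corrupted_users(user_thetas: dict[int, np.ndarray],
                         cluster_of: dict[int, int],
                         cluster_thetas: dict[int, np.ndarray],
                         thresholds: dict[int, float]) -> set[int]:
    """Flag a user as corrupted when its own estimate drifts too far from
    its inferred cluster's estimate. Schematic only."""
    flagged = set()
    for u, theta_u in user_thetas.items():
        gap = np.linalg.norm(theta_u - cluster_thetas[cluster_of[u]])
        if gap > thresholds[u]:  # threshold should shrink as data accumulates
            flagged.add(u)
    return flagged
```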

Efficient State Estimation with Constrained Rao-Blackwellized Particle Filter

Oct 07, 2023
Shuai Li, Siwei Lyu, Jeff Trinkle

Due to the limitations of robotic sensors, the acquisition of an object's state during a manipulation task can be unreliable and noisy. Combining an accurate model of the multi-body dynamic system with Bayesian filtering methods has been shown to filter out noise from the object's observed states. However, the efficiency of these filtering methods suffers from samples that violate physical constraints, e.g., the no-penetration constraint. In this paper, we propose a Rao-Blackwellized Particle Filter (RBPF) that samples the contact states and updates the object's poses using Kalman filters. The RBPF also enforces the physical constraints on the samples by solving a quadratic programming problem. By comparing our method with methods that do not consider physical constraints, we show that our proposed RBPF not only estimates the object's states, e.g., poses, more accurately but also infers unobserved states, e.g., velocities, with higher precision.
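A minimal sketch of the constrained update described above: a standard Kalman measurement update followed by a projection onto a linear no-penetration constraint $g^\top x \ge h$ (a single halfspace, for which the quadratic program has the closed-form projection below). The matrices and the toy constraint are illustrative, not the paper's manipulation model.

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """Standard Kalman measurement update of the state mean x and covariance P."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

def project_halfspace(x, g, h):
    """Project x onto {x : g.x >= h}; the QP solution for a single linear constraint."""
    slack = g @ x - h
    if slack >= 0:
        return x
    return x - (slack / (g @ g)) * g

# Toy example: a 1D object height with a no-penetration floor at 0.
x, P = np.array([0.05]), np.eye(1) * 0.01
H, R = np.eye(1), np.eye(1) * 0.02
x, P = kalman_update(x, P, z=np.array([-0.03]), H=H, R=R)  # measurement below the floor
x = project_halfspace(x, g=np.array([1.0]), h=0.0)          # enforce x >= 0
print(x)  # stays on or above the floor
```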

Pick Planning Strategies for Large-Scale Package Manipulation

Sep 23, 2023
Shuai Li, Azarakhsh Keipour, Kevin Jamieson, Nicolas Hudson, Sicong Szhao, Charles Swan, Kostas Bekris

Automating warehouse operations can reduce logistics overhead costs, ultimately driving down the final price for consumers, increasing the speed of delivery, and enhancing resiliency to market fluctuations. This extended abstract showcases large-scale package manipulation from unstructured piles in Amazon Robotics' Robot Induction (Robin) fleet, which is used for picking and singulating up to 6 million packages per day and so far has manipulated over 2 billion packages. It describes the various heuristic methods developed over time and their successor, which utilizes a pick success predictor trained on real production data. To the best of the authors' knowledge, this work is the first large-scale deployment of learned pick quality estimation methods in a real production system.
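As a hedged sketch of what a learned pick-quality estimator can look like in code, the snippet below scores candidate picks with a logistic-regression model and executes the highest-scoring one. The feature names, the model choice, and the tiny dataset are assumptions for illustration; the abstract does not specify the production system's features or architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-candidate features logged from production:
# [suction seal area, surface flatness, clutter score, estimated mass (kg)]
X = np.array([[0.9, 0.8, 0.1, 0.4],
              [0.3, 0.5, 0.7, 1.2],
              [0.7, 0.9, 0.2, 0.6],
              [0.2, 0.3, 0.9, 1.5]])
y = np.array([1, 0, 1, 0])   # 1 = pick succeeded, 0 = pick failed

model = LogisticRegression().fit(X, y)

# Rank new candidate picks by predicted success probability, execute the best.
candidates = np.array([[0.8, 0.7, 0.3, 0.5],
                       [0.4, 0.4, 0.6, 1.0]])
scores = model.predict_proba(candidates)[:, 1]
best = int(np.argmax(scores))
print(f"execute candidate {best}, predicted success {scores[best]:.2f}")
```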

* 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Learning Meets Model-based Methods for Manipulation and Grasping Workshop. arXiv admin note: substantial text overlap with arXiv:2305.10272 
Spatial-Temporal Transformer based Video Compression Framework

Sep 21, 2023
Yanbo Gao, Wenjia Huang, Shuai Li, Hui Yuan, Mao Ye, Siwei Ma

Learned video compression (LVC) has witnessed remarkable advancements in recent years. Similar to traditional video coding, LVC inherits motion estimation/compensation, residual coding, and other modules, all of which are implemented with neural networks (NNs). However, within the framework of NNs and their training mechanism of gradient backpropagation, most existing works struggle to consistently generate stable motion information, which is in the form of geometric features, from the input color features. Moreover, modules such as inter-prediction and residual coding are independent of each other, making it inefficient to fully reduce the spatial-temporal redundancy. To address the above problems, in this paper we propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework. It contains a Relaxed Deformable Transformer (RDT) with Uformer-based offset estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression. Specifically, RDT is developed to stably estimate the motion information between frames by thoroughly investigating the relationship between similarity-based geometric motion feature extraction and self-attention. MGP is designed to fuse the multi-reference frame information by effectively exploring the coarse-grained prediction feature generated with the coded motion information. SFD-T compresses the residual information by jointly exploring the spatial feature distributions in both the residual and the temporal prediction to further reduce the spatial-temporal redundancy. Experimental results demonstrate that our method achieves the best result, with a 13.5% BD-Rate saving over VTM.

MOFA: A Model Simplification Roadmap for Image Restoration on Mobile Devices

Aug 24, 2023
Xiangyu Chen, Ruiwen Zhen, Shuai Li, Xiaotian Li, Guanghui Wang

Image restoration aims to restore high-quality images from degraded counterparts and has seen significant advancements through deep learning techniques. The technique has been widely applied to mobile devices for tasks such as mobile photography. Given the resource limitations of mobile devices, such as memory constraints and runtime requirements, the efficiency of models during deployment becomes paramount. Nevertheless, most previous works have primarily concentrated on analyzing the efficiency of single modules and improving them individually. This paper examines the efficiency across different layers. We propose a roadmap that can be applied to further accelerate image restoration models prior to deployment while simultaneously increasing PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index). The roadmap first increases the model capacity by adding more parameters to partial convolutions on FLOPs-insensitive layers. Then, it applies partial depthwise convolution coupled with decoupling upsampling/downsampling layers to accelerate the model. Extensive experiments demonstrate that our approach decreases runtime by up to 13% and reduces the number of parameters by up to 23%, while increasing PSNR and SSIM on several image restoration datasets. The source code of our method is available at \href{https://github.com/xiangyu8/MOFA}{https://github.com/xiangyu8/MOFA}.
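A minimal PyTorch sketch of a partial convolution in the sense used above, where only a fraction of the channels is convolved and the rest pass through unchanged; the split ratio and the module itself are illustrative, not the exact MOFA building block.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Convolve only the first `ratio` fraction of channels and pass the rest
    through untouched, saving FLOPs versus a full convolution.
    Illustrative module, not the exact MOFA block."""
    def __init__(self, channels: int, ratio: float = 0.25, kernel_size: int = 3):
        super().__init__()
        self.n_conv = max(1, int(channels * ratio))
        self.conv = nn.Conv2d(self.n_conv, self.n_conv, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.split(x, [self.n_conv, x.shape[1] - self.n_conv], dim=1)
        return torch.cat([self.conv(a), b], dim=1)

x = torch.randn(1, 32, 64, 64)
print(PartialConv(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```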

* Accepted by 2023 ICCV Workshop (RCV) 