Image-adaptive lookup tables (LUTs) have achieved great success in real-time image enhancement tasks due to their high efficiency in modeling color transforms. However, they embed the complete transform, including the color component-independent and the component-correlated parts, into a single type of LUT, either 1D or 3D, in a coupled manner. This scheme forces a trade-off between model expressiveness and efficiency for two reasons. On the one hand, 1D LUTs provide high computational efficiency but lack the critical capability of modeling interactions between color components. On the other hand, 3D LUTs offer enhanced component-correlated transform capability but suffer from a heavy memory footprint, high training difficulty, and limited cell utilization. Inspired by the conventional divide-and-conquer practice in the image signal processor, we present SepLUT (separable image-adaptive lookup table) to tackle the above limitations. Specifically, we separate a single color transform into a cascade of component-independent and component-correlated sub-transforms instantiated as 1D and 3D LUTs, respectively. In this way, the two sub-transforms complement each other: the 3D LUT supplies the ability to mix color components, and the 1D LUT redistributes the input colors to increase the cell utilization of the 3D LUT and thus enables the use of a more lightweight 3D LUT. Experiments demonstrate that the proposed method outperforms the current state of the art on photo retouching benchmark datasets and achieves real-time processing on both GPUs and CPUs.
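The cascade described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the LUT sizes, the linear interpolation in the 1D stage, and the nearest-neighbour lookup in the 3D stage are all assumptions chosen to keep the example short.

```python
# Illustrative 1D-then-3D LUT cascade (assumed design, not the SepLUT code).

def apply_1d_lut(value, lut):
    """Map a value in [0, 1] through a 1D LUT with linear interpolation."""
    pos = min(max(value, 0.0), 1.0) * (len(lut) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(lut) - 1)
    frac = pos - lo
    return lut[lo] * (1.0 - frac) + lut[hi] * frac


def apply_3d_lut(rgb, lut3d):
    """Nearest-neighbour lookup in an N x N x N table of RGB triples."""
    n = len(lut3d)
    idx = [int(round(min(max(c, 0.0), 1.0) * (n - 1))) for c in rgb]
    return lut3d[idx[0]][idx[1]][idx[2]]


def sep_lut(rgb, luts_1d, lut3d):
    """Component-independent 1D stage followed by a component-correlated 3D stage."""
    stage1 = tuple(apply_1d_lut(c, lut) for c, lut in zip(rgb, luts_1d))
    return apply_3d_lut(stage1, lut3d)


def identity_3d(n):
    """Hypothetical helper: a 3D LUT that returns its own grid coordinates."""
    step = 1.0 / (n - 1)
    return [[[(r * step, g * step, b * step) for b in range(n)]
             for g in range(n)]
            for r in range(n)]


if __name__ == "__main__":
    identity_1d = [0.0, 1.0]
    print(sep_lut((0.5, 0.0, 1.0), [identity_1d] * 3, identity_3d(3)))
```

In this sketch the 1D stage only reshapes each channel on its own, while any mixing of channels would have to be encoded in the entries of the 3D table.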
Image outpainting, which has been well studied with Convolutional Neural Network (CNN) based frameworks, has recently drawn more attention in computer vision. However, CNNs rely on inherent inductive biases to achieve effective sample learning, which may limit the performance ceiling. In this paper, motivated by the flexible self-attention mechanism with minimal inductive biases in the transformer architecture, we reframe the generalised image outpainting problem as a patch-wise sequence-to-sequence autoregression problem, enabling query-based image outpainting. Specifically, we propose a novel hybrid vision-transformer-based encoder-decoder framework, named \textbf{Query} \textbf{O}utpainting \textbf{TR}ansformer (\textbf{QueryOTR}), for extrapolating visual context on all sides of a given image. The global modeling capacity of the patch-wise mode allows us to extrapolate images from the standpoint of the attention mechanism's queries. A novel Query Expansion Module (QEM) is designed to integrate information from the predicted queries based on the encoder's output, thereby accelerating the convergence of the pure transformer even with a relatively small dataset. To further enhance connectivity between patches, the proposed Patch Smoothing Module (PSM) re-allocates and averages the overlapped regions, thus providing seamless predicted images. We experimentally show that QueryOTR generates smooth, realistic, and visually appealing results compared with state-of-the-art image outpainting approaches.
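The idea of averaging overlapped patch regions, mentioned in the abstract above, can be sketched as follows. This is only an assumed, simplified version that uses uniform averaging on a single-channel canvas; the function name and the data layout (lists of rows) are hypothetical and not taken from QueryOTR.

```python
# Illustrative overlap averaging when pasting patches onto a canvas
# (an assumption about how overlapping predictions might be blended).

def merge_patches(patches, offsets, height, width):
    """Paste patches at (top, left) offsets and average where they overlap."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for patch, (top, left) in zip(patches, offsets):
        for i, row in enumerate(patch):
            for j, value in enumerate(row):
                total[top + i][left + j] += value
                count[top + i][left + j] += 1
    # Pixels covered by no patch are left at 0.0.
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    left_patch = [[1.0, 1.0, 1.0]]
    right_patch = [[3.0, 3.0, 3.0]]
    print(merge_patches([left_patch, right_patch], [(0, 0), (0, 2)], 1, 5))
```

Averaging removes the hard seam that would appear if the later patch simply overwrote the earlier one in the shared column.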
Intelligent reflecting surface (IRS) has emerged as a promising technique for controlling the wireless propagation environment to enhance communication performance cost-effectively. However, the rapidly time-varying channel in high-mobility communication scenarios such as vehicular communication makes it challenging to efficiently obtain the instantaneous channel state information (CSI) for an IRS with a large number of reflecting elements. In this paper, we propose a new roadside IRS-aided vehicular communication system to tackle this challenge. Specifically, by exploiting the symmetrical deployment of IRSs at interlaced equal intervals on both sides of the road and the cooperation among nearby IRS controllers, we propose a new two-stage channel estimation scheme, with offline and online training, respectively, to efficiently obtain the static and time-varying CSI required by the proposed low-complexity passive beamforming scheme. The proposed IRS beamforming and online channel estimation designs leverage the existing uplink pilots in wireless networks and do not require any change to the existing transmission protocol. Moreover, they can be implemented by each IRS controller independently, without the need for any real-time feedback from the user's serving BS. Simulation results show that the proposed designs efficiently achieve a high IRS passive beamforming gain and thus significantly enhance the achievable communication throughput for high-speed vehicular communications.
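To make the notion of passive beamforming gain concrete, the following sketch shows the textbook phase-alignment rule for a single-antenna link through an IRS. It is not the low-complexity scheme proposed in the abstract above, and it assumes perfect CSI; the names `h`, `g`, and the two functions are hypothetical.

```python
# Textbook IRS phase alignment (illustrative; assumes perfectly known channels).
import cmath


def align_phases(h, g):
    """Phase shift per element that cancels the phase of the cascaded path h_n * g_n."""
    return [-(cmath.phase(hn) + cmath.phase(gn)) for hn, gn in zip(h, g)]


def effective_gain(h, g, thetas):
    """Magnitude of the cascaded channel: |sum_n h_n * exp(j * theta_n) * g_n|."""
    total = sum(hn * cmath.exp(1j * t) * gn for hn, gn, t in zip(h, g, thetas))
    return abs(total)


if __name__ == "__main__":
    h = [cmath.rect(1.0, 0.3), cmath.rect(2.0, -1.2)]  # transmitter-to-IRS coefficients
    g = [cmath.rect(0.5, 2.0), cmath.rect(1.0, 0.7)]   # IRS-to-receiver coefficients
    print(effective_gain(h, g, [0.0, 0.0]))
    print(effective_gain(h, g, align_phases(h, g)))
```

With aligned phases the reflected paths add coherently, so the gain equals the sum of the per-element magnitudes, which is why obtaining timely CSI matters in the high-mobility setting.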
Intelligent reflecting surface (IRS) has emerged as a promising technology to enhance wireless communication network coverage and capacity by dynamically controlling the radio signal propagation environment. In contrast to existing works that considered active or passive IRS only, we propose in this paper a new hybrid active-passive IRS architecture that consists of both active and passive reflecting elements, thus flexibly achieving their combined advantages. Under a practical channel setup with Rician fading where only the statistical channel state information (CSI) is available, we study the hybrid IRS design in a multi-user communication system. Specifically, we formulate an optimization problem to maximize the achievable ergodic capacity of the worst-case user by designing the hybrid IRS beamforming and the allocation of active and passive elements based on the statistical CSI, subject to various practical constraints on the active-element amplification factor and amplification power consumption, as well as the total deployment budget for active and passive elements. To solve this challenging problem, we first approximate the ergodic capacity in a simpler form and then propose an efficient algorithm to solve the problem optimally. Moreover, we show that for the special case where all channels are line-of-sight (LoS), only active elements need to be deployed when the total deployment budget is sufficiently small, while both active and passive elements should be deployed, with a decreasing ratio of active to passive elements, when the budget increases and exceeds a certain threshold. Finally, numerical results demonstrate the performance gains of the proposed hybrid IRS architecture and its optimal design over conventional schemes with active or passive IRS only under various practical system setups.
We present the results of the Workshop on Multilingual Information Access (MIA) 2022 Shared Task, evaluating cross-lingual open-retrieval question answering (QA) systems in 16 typologically diverse languages. In this task, we adapted two large-scale cross-lingual open-retrieval QA datasets in 14 typologically diverse languages, and newly annotated open-retrieval QA data in 2 underrepresented languages: Tagalog and Tamil. Four teams submitted their systems. The best system leveraging iteratively mined diverse negative examples and larger pretrained models achieves 32.2 F1, outperforming our baseline by 4.5 points. The second best system uses entity-aware contextualized representations for document retrieval, and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly zero scores.
With the capability of reconfiguring the wireless electromagnetic environment, intelligent reflecting surface (IRS) is a new paradigm for designing future wireless communication systems. In this paper, we consider optical IRS for improving the performance of visible light communication (VLC) under a multiple-input multiple-output (MIMO) setting. Specifically, we focus on the downlink communication of an indoor MIMO VLC system and aim to minimize the mean square error (MSE) of the demodulated signals at the receiver. To this end, the MIMO channel gain of the IRS-aided VLC is first derived under the point source assumption, based on which the MSE minimization problem is then formulated subject to the emission power constraints. Next, we propose an alternating optimization algorithm, which decomposes the original problem into three subproblems, to iteratively optimize the IRS configuration, the precoding matrix, and the detection matrix for minimizing the MSE. Moreover, a theoretical analysis of the performance of the proposed algorithm in the high and low signal-to-noise ratio (SNR) regimes is provided, revealing that the joint optimization process can be simplified in such special cases, and the algorithm's convergence property and computational complexity are also discussed. Finally, numerical results show that IRS-aided schemes significantly reduce the MSE compared to their counterparts without IRS, and the proposed algorithm outperforms other baseline schemes.
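The alternating optimization pattern named in the abstract above can be illustrated on a deliberately reduced toy problem. The sketch below is an assumption-heavy simplification: it uses a real scalar channel `h > 0`, a unit-variance symbol, and only two blocks (precoder `f` and detector `g`), omitting the IRS configuration subproblem entirely. The function name and parameters are hypothetical.

```python
# Toy alternating minimization of MSE = (g*h*f - 1)^2 + noise_var * g^2
# subject to f^2 <= power. Scalar stand-in for the matrix problem (assumed setup).

def alternating_mmse(h, noise_var, power, f_init=0.5, iterations=20):
    """Alternate closed-form updates of the detector g and the precoder f."""
    f = f_init
    for _ in range(iterations):
        # Detector update: MMSE scalar for the current precoder.
        g = h * f / (h * h * f * f + noise_var)
        # Precoder update: minimize (g*h*f - 1)^2 under the power limit.
        f = min(power ** 0.5, 1.0 / (g * h))
    # Final detector update so that g matches the returned precoder.
    g = h * f / (h * h * f * f + noise_var)
    mse = (g * h * f - 1.0) ** 2 + noise_var * g * g
    return f, g, mse


if __name__ == "__main__":
    print(alternating_mmse(h=1.0, noise_var=1.0, power=4.0))
```

Each update solves one block in closed form while the other is held fixed, so the MSE never increases from one step to the next; the full problem adds a third block for the IRS configuration.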
Detecting beneficial feature interactions is essential in recommender systems, and existing approaches achieve this by examining all the possible feature interactions. However, the cost of examining all the possible higher-order feature interactions is prohibitive, since it grows exponentially with the order. Hence, existing approaches only detect beneficial feature interactions of limited order (e.g., combinations of up to four features) and may miss beneficial interactions of higher order. In this paper, we propose a hypergraph neural network based model named HIRS. HIRS is the first work that directly generates beneficial feature interactions of arbitrary orders and makes recommendation predictions accordingly. The number of generated feature interactions can be specified to be much smaller than the number of all the possible interactions, and hence our model admits a much lower running time. To achieve an effective algorithm, we exploit three properties of beneficial feature interactions and propose deep-infomax-based methods to guide the interaction generation. Our experimental results show that HIRS outperforms state-of-the-art algorithms by up to 5% in terms of recommendation accuracy.
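The combinatorial cost mentioned in the abstract above can be checked with a short calculation. The sketch counts feature subsets of size 2 up to a maximum order; the function name and the example feature count are illustrative assumptions, not values from the paper.

```python
# Count candidate feature interactions of order 2..max_order among n features.
from math import comb


def count_interactions(num_features, max_order):
    """Number of feature subsets whose size lies between 2 and max_order."""
    return sum(comb(num_features, r) for r in range(2, max_order + 1))


if __name__ == "__main__":
    for order in (2, 4, 8, 30):
        print(order, count_interactions(30, order))
```

When `max_order` equals the number of features, the count is 2^n - n - 1, which is why exhaustive examination is only practical for low orders.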
Black-box attacks usually face two problems: poor transferability and the inability to evade adversarial defenses. To overcome these shortcomings, we propose AdvSmo, an original approach that generates adversarial examples by smoothing the linear structure of the texture in the benign image. We construct the adversarial examples without relying on any internal information of the target model, and we design an imperceptibility and high-attack-success-rate constraint to guide the Gabor filter in selecting appropriate angles and scales for smoothing the linear texture of the input images. Benefiting from this design, AdvSmo generates adversarial examples with strong transferability and solid evasiveness. Finally, in experiments against four advanced black-box adversarial attack methods on eight target models, AdvSmo improves the average attack success rate by 9% on CIFAR-10 and by 16% on Tiny-ImageNet compared to the best of these attack methods.
Most black-box adversarial attack schemes for object detectors face two shortcomings: requiring access to the target model and generating inefficient adversarial examples (failing to make objects disappear in large numbers). To overcome these shortcomings, we propose a black-box adversarial attack scheme based on semantic segmentation and model inversion (SSMI). We first locate the position of the target object using semantic segmentation techniques. Next, we design a neighborhood background pixel replacement that replaces the target region pixels with background pixels, so that the pixel modifications are not easily detected by human vision. Finally, we reconstruct a machine-recognizable example and use the mask matrix to select pixels in the reconstructed example to modify the benign image and generate an adversarial example. Detailed experimental results show that SSMI can generate efficient adversarial examples that evade human-eye perception and make objects of interest disappear. More importantly, SSMI outperforms existing attacks of the same kind. The maximum increase in new and disappearing labels is 16%, and the maximum decrease in mAP metrics for object detection is 36%.
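The final step described in the abstract above, using a mask matrix to select pixels from the reconstructed example, can be sketched as a simple per-pixel selection. This is an assumed reading of that step on single-channel images stored as lists of rows; the function name and the binary mask convention (1 selects the reconstructed pixel) are hypothetical.

```python
# Illustrative mask-guided pixel replacement (assumed binary mask, 1 = replace).

def apply_mask(benign, reconstructed, mask):
    """Take reconstructed pixels where the mask is set, benign pixels elsewhere."""
    return [[rec if m else ben
             for ben, rec, m in zip(ben_row, rec_row, mask_row)]
            for ben_row, rec_row, mask_row in zip(benign, reconstructed, mask)]


if __name__ == "__main__":
    benign = [[10, 10, 10], [10, 10, 10]]
    reconstructed = [[0, 99, 0], [0, 99, 99]]
    mask = [[0, 1, 0], [0, 1, 1]]
    print(apply_mask(benign, reconstructed, mask))
```

Only pixels inside the masked object region change, which keeps the rest of the benign image untouched.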
Navier-Stokes equations are significant partial differential equations that describe the motion of fluids such as liquids and air. Due to the importance of Navier-Stokes equations, the development of efficient numerical schemes is important for both science and engineering. Recently, with the development of AI techniques, several approaches have been designed to integrate deep neural networks in simulating and inferring the fluid dynamics governed by incompressible Navier-Stokes equations, which can accelerate the simulation or inference process in a mesh-free and differentiable way. In this paper, we point out that existing deep Navier-Stokes informed methods are limited in their capability to handle non-smooth or fractional equations, which are two critical situations in reality. To this end, we propose the \emph{Deep Random Vortex Method} (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation. Specifically, the random vortex dynamics motivates a Monte Carlo based loss function for training the neural network, which avoids the calculation of derivatives through auto-differentiation. Therefore, DRVM not only can efficiently solve Navier-Stokes equations involving rough paths, non-differentiable initial conditions, and fractional operators, but also inherits the mesh-free and differentiable benefits of deep-learning-based solvers. We conduct experiments on the Cauchy problem, parametric solver learning, and the inverse problem of both 2-d and 3-d incompressible Navier-Stokes equations. The proposed method achieves accurate results for simulation and inference of Navier-Stokes equations. In particular, for cases that include singular initial conditions, DRVM significantly outperforms the existing PINN method.