In this letter, we consider a Linear Quadratic Gaussian (LQG) control system where feedback occurs over a noiseless binary channel and derive lower bounds on the minimum communication cost (quantified via the channel bitrate) required to attain a given control performance. We assume that at every time step an encoder can convey a packet containing a variable number of bits over the channel to a decoder at the controller. Our system model allows for the possibility that the encoder and decoder share randomness, as is the case in systems using dithered quantizers. We define two extremal prefix-free requirements that may be imposed on the message packets; such constraints are useful in that they allow the decoder, and potentially other agents, to uniquely identify the end of a transmission in an online fashion. We then derive a lower bound on the rate of prefix-free coding in terms of directed information; in particular, we show that a previously known bound still holds in the case with shared randomness. We also provide a generalization of the bound that applies when the prefix-free requirements are relaxed. We conclude with a rate-distortion formulation.
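For concreteness, here is a sketch of the standard quantities involved (the notation is ours, not necessarily the letter's): writing $X^T = (X_1, \dots, X_T)$ for the state sequence and $U^T = (U_1, \dots, U_T)$ for the control inputs, Massey's directed information is

$$I(X^T \to U^T) = \sum_{t=1}^{T} I(X^t; U_t \mid U^{t-1}),$$

and lower bounds of the kind discussed take the general form

$$\mathbb{E}\!\left[\sum_{t=1}^{T} \ell_t\right] \;\ge\; I(X^T \to U^T),$$

where $\ell_t$ denotes the length in bits of the packet transmitted at time $t$. The contribution described here is that a bound of this form survives both the presence of encoder-decoder shared randomness and a relaxation of the prefix-free constraints.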
Partial information decomposition (PID) seeks to decompose the multivariate mutual information that a set of source variables contains about a target variable into basic pieces, the so-called "atoms of information". Each atom describes a distinct way in which the sources may contain information about the target. In this paper we show, first, that the entire theory of partial information decomposition can be derived from considerations of elementary parthood relationships between information contributions. This way of approaching the problem has the advantage of directly characterizing the atoms of information, instead of taking an indirect approach via the concept of redundancy. Second, we describe several intriguing links between PID and formal logic. In particular, we show how to define a measure of PID based on the information provided by certain statements about source realizations. Furthermore, we show how the mathematical lattice structure underlying PID theory can be translated into an isomorphic structure of logical statements with a particularly simple ordering relation: logical implication. The conclusion to be drawn from these considerations is that there are three isomorphic "worlds" of partial information decomposition, i.e., three equivalent ways to mathematically describe the decomposition of the information carried by a set of sources about a target: the world of parthood relationships, the world of logical statements, and the world of antichains that was utilized by Williams and Beer in their original exposition of PID theory. We additionally show how the parthood perspective provides a systematic way to answer a type of question that has been much discussed in the PID field: whether a partial information decomposition can be uniquely determined based on concepts other than redundant information.
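As an illustration of the antichain "world", the following Python sketch (function name ours) enumerates the antichains of non-empty subsets of the sources, which index the atoms of the Williams-Beer redundancy lattice:

    from itertools import combinations

    def antichains(n_sources):
        """Enumerate PID atoms: antichains of non-empty subsets of sources."""
        elems = range(1, n_sources + 1)
        subsets = [frozenset(c) for r in range(1, n_sources + 1)
                   for c in combinations(elems, r)]
        for r in range(1, len(subsets) + 1):
            for coll in combinations(subsets, r):
                # an antichain: no member is a proper subset of another
                if all(not a < b and not b < a
                       for a in coll for b in coll if a != b):
                    yield coll

    print(len(list(antichains(2))))  # 4 atoms for two sources
    print(len(list(antichains(3))))  # 18 atoms for three sources

For two sources these four atoms are the familiar redundancy, the two unique informations, and the synergy; the parthood and logical-statement worlds described in the paper index the same atoms in isomorphic ways.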
Ptychography is a lensless, computational imaging method that utilises diffraction patterns to determine the amplitude and phase of an object. In transmission ptychography, the diffraction patterns are recorded by a detector positioned along the optical axis downstream of the object. The light scattered at the highest diffraction angles carries information about the finest structures of the object. We present a setup to simultaneously capture a signal near the optical axis and a signal scattered at high diffraction angles. Moreover, we present an algorithm based on a shifted angular spectrum method and automatic differentiation that utilises this recorded signal. By jointly reconstructing the object from the resulting low and high diffraction angle images, the resolution of the reconstructed image is markedly improved. The effective numerical aperture of the compound sensor is determined by the maximum diffraction angle captured by the off-axis sensor.
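A minimal NumPy sketch of a shifted angular spectrum propagator, which evaluates the propagated field on a grid laterally offset from the source grid (assumptions: square pixels of pitch dx, consistent units for dx, wavelength, z, and the shifts; for large shifts a band-limited transfer function is typically required, which is omitted here):

    import numpy as np

    def shifted_angular_spectrum(field, dx, wavelength, z, x_shift, y_shift):
        """Propagate a 2-D complex field by distance z to a laterally
        shifted observation window via the angular spectrum method."""
        ny, nx = field.shape
        fx = np.fft.fftfreq(nx, d=dx)
        fy = np.fft.fftfreq(ny, d=dx)
        FX, FY = np.meshgrid(fx, fy)
        arg = 1.0 / wavelength**2 - FX**2 - FY**2
        kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
        # linear phase ramp implements the lateral shift of the window
        H = np.exp(1j * (kz * z + 2 * np.pi * (FX * x_shift + FY * y_shift)))
        H[arg < 0] = 0.0  # discard evanescent components
        return np.fft.ifft2(np.fft.fft2(field) * H)

In a setup like the one described, such a propagator lets a single forward model address both the on-axis detector and the off-axis detector, so that automatic differentiation can jointly reconstruct the object from both signals.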
Neural ranking models (NRMs) have shown remarkable success in recent years, especially with pre-trained language models. However, deep neural models are notorious for their vulnerability to adversarial examples. Adversarial attacks may become a new type of web spamming technique given our increased reliance on neural information retrieval models. Therefore, it is important to study potential adversarial attacks to identify vulnerabilities of NRMs before they are deployed. In this paper, we introduce the Adversarial Document Ranking Attack (ADRA) task against NRMs, which aims to promote a target document in rankings by adding adversarial perturbations to its text. We focus on the decision-based black-box attack setting, where the attackers have no access to the model parameters and gradients, and can only acquire the rank positions of documents in the partially retrieved list by querying the target model. This attack setting is realistic in real-world search engines. We propose a novel Pseudo Relevance-based ADversarial ranking Attack method (PRADA) that learns a surrogate model based on Pseudo Relevance Feedback (PRF) to generate gradients for finding the adversarial perturbations. Experiments on two web search benchmark datasets show that PRADA can outperform existing attack strategies and successfully fool the NRM with small, indiscernible perturbations of the text.
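To illustrate the surrogate-gradient idea, here is a heavily hedged PyTorch sketch of one perturbation step; surrogate, embed, and the hyper-parameters are hypothetical placeholders rather than the paper's interfaces, and a real attack would additionally project the perturbed embeddings back to valid tokens and respect the query budget of the decision-based setting:

    import torch

    def perturb_document(surrogate, query_emb, doc_tokens, embed, k=5, eps=0.05):
        # embed tokens and track gradients w.r.t. the embeddings
        doc_emb = embed(doc_tokens).detach().requires_grad_(True)
        score = surrogate(query_emb, doc_emb)  # surrogate relevance score
        score.backward()
        # rank tokens by how strongly they influence the surrogate score
        importance = doc_emb.grad.norm(dim=-1)
        top = importance.topk(k).indices
        with torch.no_grad():
            step = doc_emb.grad[top]
            doc_emb[top] += eps * step / step.norm(dim=-1, keepdim=True)
        return doc_emb  # in practice: map back to nearest valid tokens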
The performance of supervised deep learning methods for medical image segmentation is often limited by the scarcity of labeled data. As a promising research direction, semi-supervised learning addresses this dilemma by leveraging unlabeled data to assist the learning process. In this paper, a novel geometry-aware, consistency-based semi-supervised learning framework is proposed for medical image segmentation. Considering that the hard-to-segment regions are mainly located around the object boundary, we introduce an auxiliary prediction task to learn the global geometric information. Based on the geometric constraint, the ambiguous boundary regions are emphasized through an exponentially weighted strategy during model training to better exploit both labeled and unlabeled data. In addition, a dual-view network is designed to perform segmentation from different perspectives and reduce the prediction uncertainty. The proposed method is evaluated on the public left atrium benchmark dataset and improves the fully supervised baseline by 8.7% in Dice with 10% labeled images and by 4.3% with 20% labeled images. Meanwhile, our framework outperforms six state-of-the-art semi-supervised segmentation methods.
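One plausible reading of the exponential boundary weighting is sketched below (the decay rate beta is an illustrative choice, not the paper's): pixels are weighted by their distance to the object boundary, so that ambiguous boundary regions dominate the loss.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def boundary_weight_map(mask, beta=0.5):
        """Per-pixel weights that peak at the object boundary and decay
        exponentially away from it (mask: binary numpy array)."""
        dist_to_boundary = (distance_transform_edt(mask)
                            + distance_transform_edt(1 - mask))
        return np.exp(-beta * dist_to_boundary)

    # usage: scale a per-pixel segmentation/consistency loss
    # loss = (boundary_weight_map(label) * per_pixel_loss).mean()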
We develop a new method for portrait image editing, which supports fine-grained editing of geometries, colors, lights and shadows using a single neural network model. We adopt a novel asymmetric conditional GAN architecture: the generators take transformed conditional inputs, such as edge maps, color palettes, sliders and masks, that can be directly edited by the user; the discriminators take the conditional inputs in a form that can guide controllable image generation more effectively. Taking color editing as an example, we feed color palettes (which can be edited easily) into the generator, and color maps (which contain positional information of colors) into the discriminator. We also design a region-weighted discriminator so that higher weights are assigned to more important regions, such as the eyes and skin. Using a color palette, the user can directly specify the desired colors of hair, skin, eyes, lips and background. Color sliders allow the user to blend colors in an intuitive manner. The user can also edit lights and shadows by modifying the corresponding masks. We demonstrate the effectiveness of our method by evaluating it on the CelebAMask-HQ dataset with a wide range of tasks, including geometry/color/shadow/light editing, hand-drawn sketch to image translation, and color transfer. We also present ablation studies to justify our design.
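A minimal sketch of a region-weighted discriminator objective, written here with a hinge loss for concreteness (the actual loss and the weight values are assumptions; d_real and d_fake are per-pixel outputs of a patch discriminator, and region_mask marks important regions such as the eyes and skin):

    import torch
    import torch.nn.functional as F

    def region_weighted_d_loss(d_real, d_fake, region_mask,
                               w_high=2.0, w_low=1.0):
        # per-pixel weights: higher inside the important regions
        w = w_low + (w_high - w_low) * region_mask.float()
        loss_real = (w * F.relu(1.0 - d_real)).mean()  # hinge, real side
        loss_fake = (w * F.relu(1.0 + d_fake)).mean()  # hinge, fake side
        return loss_real + loss_fake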
The fast spread of information over the internet is essential to the rapid delivery of numerous public utility services and entertainment to users. Social networks and online media paved the way for modern, timely communication and convenient access to all types of information. However, they also provide new opportunities for misuse of the massive amount of available data, such as spreading fake content to manipulate public opinion. Detection of counterfeit content has attracted attention in the last few years owing to advances in deepfake generation. Rapidly advancing machine learning techniques, particularly deep learning, can detect fake content in several application domains, including fake image and video manipulation. This paper presents a comprehensive review of recent studies on deepfake content detection using deep learning-based approaches. We aim to broaden the state-of-the-art research by systematically reviewing the different categories of fake content detection. Furthermore, we report the advantages and drawbacks of the examined works and discuss future directions for addressing the open issues and shortcomings in deepfake detection.
A single frame contains only limited information, which constrains the performance of existing vision-based multi-camera 3D object detection paradigms. To fundamentally push the performance boundary in this area, BEVDet4D is proposed to lift the scalable BEVDet paradigm from the spatial-only 3D space to the spatial-temporal 4D space. We upgrade the framework with only a few modifications, fusing the features from the previous frame with the corresponding features in the current frame. In this way, with negligible extra computing budget, we enable the algorithm to access temporal cues by querying and comparing the two candidate features. Beyond this, we also simplify the velocity learning task by removing the factors of ego-motion and time, which equips BEVDet4D with robust generalization performance and reduces the velocity error by 52.8%. This makes vision-based methods, for the first time, comparable with those relying on LiDAR or radar in this respect. On the challenging nuScenes benchmark, we report a new record of 51.5% NDS with the high-performance configuration dubbed BEVDet4D-Base, which surpasses the previous leading method BEVDet by +4.3% NDS.
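A minimal PyTorch sketch of the temporal fusion step, assuming the ego-motion between the two frames has already been converted into a 2x3 affine transform theta over the BEV grid (this derivation, and fusion by channel concatenation, are our assumptions for illustration):

    import torch
    import torch.nn.functional as F

    def fuse_bev_features(prev_bev, curr_bev, theta):
        """Align the previous frame's BEV feature map to the current ego
        frame, then fuse it with the current map along channels.
        theta: (N, 2, 3) affine map from current to previous BEV coords."""
        grid = F.affine_grid(theta, prev_bev.shape, align_corners=False)
        aligned_prev = F.grid_sample(prev_bev, grid, align_corners=False)
        return torch.cat([aligned_prev, curr_bev], dim=1)

Aligning the previous features to the current ego frame is what removes ego-motion from the velocity learning target, leaving only object motion to be predicted.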
For many years, understanding the mechanisms of the brain has been a major research subject in many different fields. Brain signal processing, and especially the electroencephalogram (EEG), has recently attracted growing interest both in academia and industry. One of the main examples is the increasing number of Brain-Computer Interfaces (BCI) aiming to link brains and computers. In this paper, we present a novel framework allowing us to retrieve the attention state, i.e., the degree of attention given to a specific task, from EEG signals. While previous methods often consider the spatial relationships in EEG through the electrodes and process them with recurrent or convolution-based architectures, we propose here to exploit both the spatial and the temporal information with a transformer-based network, an architecture that has already proven highly effective in many machine learning (ML) studies, e.g., machine translation. In addition to this novel architecture, an extensive study of feature extraction methods, frequency bands, and temporal window lengths has also been carried out. The proposed network has been trained and validated on two public datasets and achieves better results than state-of-the-art models. Beyond improved results, the framework could be used in real applications, e.g., monitoring Attention Deficit Hyperactivity Disorder (ADHD) symptoms or vigilance during a driving assessment.
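A minimal PyTorch sketch of such a network, with illustrative layer sizes (the exact tokenization of electrodes/windows, the feature dimensions, and the pooling are assumptions, not the paper's configuration):

    import torch
    import torch.nn as nn

    class EEGAttentionNet(nn.Module):
        """Band-power features per token (electrode/window) -> transformer
        encoder -> attention-state logits."""
        def __init__(self, n_features, d_model=64, n_heads=4,
                     n_layers=2, n_classes=2):
            super().__init__()
            self.proj = nn.Linear(n_features, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, x):  # x: (batch, tokens, n_features)
            h = self.encoder(self.proj(x))
            return self.head(h.mean(dim=1))  # pool over tokens, classify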
The paper explores the use of approximate inference techniques as solution methods for stochastic planning problems with discrete factored spaces. While much prior work exists on this topic, subtle variations between approaches hinder a global understanding of their differences and potential advantages. Here we abstract a simple framework that captures and connects prior work along two dimensions: the direction of information flow, i.e., forward vs. backward inference, and the type of approximation used, e.g., Belief Propagation (BP) vs. mean field variational inference (MFVI). Through this analysis we also propose a novel algorithm, CSVI, which provides a tighter variational approximation compared to prior work. An extensive experimental evaluation compares algorithms from different branches of the framework, showing that methods based on BP are generally better than methods based on MFVI, that CSVI is competitive with BP algorithms, and that, while the inference direction does not show a significant effect for VI methods, forward inference provides stronger performance with BP.
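To make the BP-vs-MFVI distinction concrete, here is a toy NumPy example on a two-variable binary model (illustrative only; this is not the CSVI algorithm). On this tree-structured model BP is exact, while mean field remains an approximation:

    import numpy as np

    # toy model: p(x1, x2) proportional to phi1[x1] * phi2[x2] * psi[x1, x2]
    phi1 = np.array([1.0, 2.0])
    phi2 = np.array([3.0, 1.0])
    psi = np.array([[2.0, 1.0], [1.0, 2.0]])

    # belief propagation (exact here): marginal of x1
    m_2to1 = psi @ phi2            # message from x2 through the factor
    b1_bp = phi1 * m_2to1
    b1_bp /= b1_bp.sum()

    # mean field VI: coordinate ascent on the factorized q(x1) q(x2)
    q1 = np.array([0.5, 0.5]); q2 = np.array([0.5, 0.5])
    for _ in range(50):
        q1 = phi1 * np.exp(np.log(psi) @ q2);   q1 /= q1.sum()
        q2 = phi2 * np.exp(np.log(psi).T @ q1); q2 /= q2.sum()

    print(b1_bp, q1)  # BP recovers the exact marginal; MFVI does not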