Predicting material properties based on the microstructure of materials has long been a challenging problem. Recently, many deep learning methods have been developed for material property prediction. In this study, we propose a crystal representation learning framework, Orbital CrystalNet (OCrystalNet), which consists of two parts: atomic descriptor generation and graph representation learning. In OCrystalNet, we first combine the orbital field matrix (OFM) with atomic features to construct an OFM-feature atomic descriptor. This descriptor is then used as the atom embedding in an atom-bond message passing module, which takes advantage of the topological structure of crystal graphs to learn crystal representations. To demonstrate the capabilities of OCrystalNet, we performed a number of prediction tasks on the Materials Project and JARVIS datasets and compared our model with baselines and state-of-the-art methods. To further demonstrate the effectiveness of OCrystalNet, we conducted an ablation study and a case study. The results show that our model has several advantages over other state-of-the-art models.
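As a rough illustration of the message passing idea mentioned in the abstract above, the following sketch runs one aggregation step over a tiny graph and then averages the atom features into a graph-level vector. It is not the OCrystalNet architecture: the function names, the scalar bond weights, and the additive update with no learned parameters are all assumptions made for illustration.

```python
def message_passing_step(atom_feats, bonds):
    """One hypothetical message passing step.

    atom_feats: list of per-atom feature vectors (e.g. atomic descriptors).
    bonds: list of directed edges (source, target, weight); the scalar weight
           stands in for a bond feature and is an assumption of this sketch.
    """
    dim = len(atom_feats[0])
    messages = [[0.0] * dim for _ in atom_feats]
    # Each atom sends its weighted features along every outgoing bond.
    for src, dst, weight in bonds:
        for k in range(dim):
            messages[dst][k] += weight * atom_feats[src][k]
    # Residual-style update: keep the old features and add the aggregated messages.
    return [[h + m for h, m in zip(feat, msg)]
            for feat, msg in zip(atom_feats, messages)]


def readout(atom_feats):
    """Mean-pool atom features into a single crystal-level vector."""
    n = len(atom_feats)
    return [sum(column) / n for column in zip(*atom_feats)]


# Two atoms connected by one bond in both directions.
atoms = [[1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1, 0.5), (1, 0, 0.5)]
updated = message_passing_step(atoms, bonds)
crystal_vector = readout(updated)
```

A trained model would replace the fixed weights with learned functions of the atom and bond features and would stack several such steps before the readout.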
In black-box adversarial attacks, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries to attack each benign example. To reduce the query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework that trains a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta-generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-training procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack method to boost its performance, which is verified by extensive experiments.
In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users utilizing semantic communication techniques. To evaluate the performance of the studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet the ISS requirement of each user, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this problem as an optimization problem that aims to minimize each server's transmission latency while meeting the ISS requirement. To solve this problem, a value decomposition based entropy-maximized multi-agent reinforcement learning (RL) algorithm is proposed, which enables servers to coordinate during training and execute RB allocation in a distributed manner, approaching globally optimal performance with fewer training iterations. Compared to traditional multi-agent RL, the proposed RL algorithm improves the servers' exploration of valuable actions and the probability of finding a globally optimal RB allocation policy based on local observations. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.
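The value decomposition idea in the abstract above can be sketched under a strong simplifying assumption: the joint action value is taken to be the plain sum of per-server values (an additive, VDN-style decomposition), and entropy-driven exploration is modeled with a softmax policy. The abstract does not state the exact decomposition or entropy term, so every function name and the temperature parameter here are hypothetical.

```python
import math


def joint_value(local_qs, actions):
    """Additive decomposition (assumed): Q_tot is the sum of per-agent Q-values."""
    return sum(q[a] for q, a in zip(local_qs, actions))


def decentralized_greedy(local_qs):
    """Each agent picks the action maximizing its own Q-values, using local information only."""
    return [max(range(len(q)), key=q.__getitem__) for q in local_qs]


def soft_policy(q_values, temperature=1.0):
    """Softmax over Q-values; a higher temperature gives a higher-entropy policy."""
    peak = max(q_values)  # subtract the maximum for numerical stability
    exps = [math.exp((q - peak) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


# Two servers, each choosing between two RB allocations (toy numbers).
local_qs = [[1.0, 3.0], [2.0, 0.5]]
actions = decentralized_greedy(local_qs)
total = joint_value(local_qs, actions)
exploration_probs = soft_policy(local_qs[0], temperature=2.0)
```

With an additive decomposition, the per-agent greedy choices also maximize the joint value, which is what allows distributed execution after coordinated training.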
Current dataset collection methods typically scrape large amounts of data from the web. While this technique is extremely scalable, data collected in this way tends to reinforce stereotypical biases, can contain personally identifiable information, and typically originates from Europe and North America. In this work, we rethink the dataset collection paradigm and introduce GeoDE, a geographically diverse dataset with 61,940 images from 40 classes and 6 world regions, and no personally identifiable information, collected through crowd-sourcing. We analyse GeoDE to understand how images collected in this manner differ from web-scraped images. Despite the smaller size of this dataset, we demonstrate its use as both an evaluation and a training dataset, highlight shortcomings in current models, and show improved performance when even small amounts of GeoDE (1000-2000 images per region) are added to a training dataset. We release the full dataset and code at https://geodiverse-data-collection.cs.princeton.edu/
In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results. The joint training of the transcription and source separation modules serves to improve the performance of both tasks. The instrument module is optional and can be directly controlled by human users, which makes Jointist a flexible, user-controllable framework. Our challenging problem formulation makes the model highly useful in the real world, given that modern popular music typically consists of multiple instruments. Its novelty, however, necessitates a new perspective on how to evaluate such a model. In our experiments, we assess the proposed model from various aspects, providing a new evaluation perspective for multi-instrument transcription. Our subjective listening study shows that Jointist achieves state-of-the-art performance on popular music, outperforming existing multi-instrument transcription models such as MT3. We conducted experiments on several downstream tasks and found that the proposed method improved transcription by more than 1 percentage point (ppt.), source separation by 5 SDR, downbeat detection by 1.8 ppt., chord recognition by 1.4 ppt., and key estimation by 1.4 ppt., when utilizing transcription results obtained from Jointist. Demo available at \url{https://jointist.github.io/Demo}.
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging \emph{partially observable} setting. While it is well-established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponentially many samples in the worst case, a surge of recent work shows that polynomial sample complexities are achievable under the \emph{revealing condition} -- a natural condition that requires the observables to reveal some information about the unobserved latent states. However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds. We establish strong PAC and regret lower bounds for learning in revealing POMDPs. Our lower bounds scale polynomially in all relevant problem parameters in a multiplicative fashion, and achieve significantly smaller gaps against the current best upper bounds, providing a solid starting point for future studies. In particular, for \emph{multi-step} revealing POMDPs, we show that (1) the latent state-space dependence is at least $\Omega(S^{1.5})$ in the PAC sample complexity, which is notably harder than the $\widetilde{\Theta}(S)$ scaling for fully-observable MDPs; (2) any polynomial sublinear regret is at least $\Omega(T^{2/3})$, suggesting a fundamental difference from the \emph{single-step} case where $\widetilde{O}(\sqrt{T})$ regret is achievable. Technically, our hard instance construction adapts techniques from \emph{distribution testing}, which are new to the RL literature and may be of independent interest.
When solving a problem, human beings can adapt the type of information they use, the procedure they follow, and the amount of time they spend approaching and solving the problem. However, most standard neural networks have the same function type and a fixed computation budget on different samples regardless of their nature and difficulty. Adaptivity is a powerful paradigm as it not only imbues practitioners with flexibility pertaining to the downstream usage of these models but can also serve as a powerful inductive bias for solving certain challenging classes of problems. In this work, we propose a new strategy, AdaTape, that enables dynamic computation in neural networks via adaptive tape tokens. AdaTape employs an elastic input sequence by equipping an existing architecture with a dynamic read-and-write tape. Specifically, we adaptively generate input sequences using tape tokens obtained from a tape bank that can either be trainable or generated from input data. We analyze the challenges and requirements of obtaining dynamic sequence content and length, and propose the Adaptive Tape Reader (ATR) algorithm to achieve both objectives. Via extensive experiments on image recognition tasks, we show that AdaTape can achieve better performance while maintaining the computational cost.
The emergence of fifth-generation (5G) New Radio (NR) brings additional possibilities to vehicle-to-everything (V2X) networks with improved quality of service. To obtain accurate channel state information (CSI) in high-mobility V2X networks, pilot signals and frequent handovers between vehicles and infrastructure are required to establish and maintain the communication link, which increases the overhead and reduces the communication throughput. To address this issue, integrated sensing and communications (ISAC) has been employed at the base station (BS) in vehicle-to-infrastructure (V2I) networks to reduce part of the overhead and thus improve the spectral efficiency. Nevertheless, the exact amount of overhead reduction remains unclear, particularly for practical NR-based V2X networks. In this paper, we study a link-level NR-based V2I system employing ISAC signaling to facilitate communication beam management, where the extended Kalman filtering (EKF) algorithm is used to track and predict the motion of the vehicle. We provide a detailed analysis of the overhead reduction achieved with the aid of ISAC, and show that the overhead can be reduced by up to 43.24% under the assigned NR frame structure. In addition, numerical results are provided to validate the improved beam tracking and communication throughput performance.
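To make the EKF tracking step in the abstract above concrete, here is a minimal sketch of one predict/update cycle. It assumes a much simpler model than a real NR V2I link: the state is [position along the road, velocity], motion is constant-velocity, and the only measurement is the range from a base station placed at a fixed perpendicular offset from the road. The function names, the noise parameters `q` and `r_var`, and the geometry are all hypothetical.

```python
import math


def predicted_range(position, bs_offset):
    """Range from the BS to a vehicle at `position`; assumes bs_offset > 0."""
    return math.sqrt(position * position + bs_offset * bs_offset)


def ekf_predict(state, cov, dt, q):
    """Constant-velocity prediction for state = [position, velocity]."""
    p, v = state
    (a, b), (c, d) = cov
    new_state = [p + dt * v, v]
    # P' = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I (assumed).
    new_cov = [
        [a + dt * (b + c) + dt * dt * d + q, b + dt * d],
        [c + dt * d, d + q],
    ]
    return new_state, new_cov


def ekf_update(state, cov, z, bs_offset, r_var):
    """Correct the prediction with one noisy range measurement z."""
    p, v = state
    rng = predicted_range(p, bs_offset)
    g = p / rng  # Jacobian of the range w.r.t. position; it is 0 w.r.t. velocity
    s = g * g * cov[0][0] + r_var  # innovation variance
    k0 = cov[0][0] * g / s  # Kalman gain for position
    k1 = cov[1][0] * g / s  # Kalman gain for velocity
    innovation = z - rng
    new_state = [p + k0 * innovation, v + k1 * innovation]
    # P_new = (I - K H) P with H = [g, 0].
    new_cov = [
        [(1 - k0 * g) * cov[0][0], (1 - k0 * g) * cov[0][1]],
        [cov[1][0] - k1 * g * cov[0][0], cov[1][1] - k1 * g * cov[0][1]],
    ]
    return new_state, new_cov


state, cov = ekf_predict([10.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], dt=0.5, q=0.0)
state, cov = ekf_update(state, cov, z=12.3, bs_offset=5.0, r_var=0.1)
```

In a beam management setting, the predicted state would be used to steer the beam for the next slot before any new pilot is received, which is where overhead savings could come from.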
A distributed spatio-temporal information based cooperative positioning (STICP) algorithm is proposed for wireless networks that require three-dimensional (3D) coordinates and operate in global navigation satellite system (GNSS)-denied environments. Our algorithm supports any type of ranging measurement that can determine the distance between nodes. We first utilize a finite symmetric sampling based scaled unscented transform (SUT) method to approximate, with high precision, the nonlinear terms of the messages passed on the associated factor graph (FG), despite relying on a small number of samples. Then, we propose an enhanced anchor upgrading mechanism to avoid redundant iterations. Our simulation results and analysis show that the proposed STICP has a lower computational complexity than the state-of-the-art belief propagation based localizer, while achieving an even more competitive positioning performance.
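The scaled unscented transform mentioned in the abstract above can be illustrated with the standard textbook form: 2n + 1 symmetric sigma points are pushed through a nonlinear function and recombined with fixed weights. The sketch below is not the STICP message computation; the scalar-output restriction, the function names, and the default parameters (alpha = 1, beta = 2, kappa = 0) are assumptions chosen to keep the example small.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def scaled_unscented_transform(mean, cov, f, alpha=1.0, beta=2.0, kappa=0.0):
    """Approximate the mean and variance of scalar f(x) for x ~ N(mean, cov)."""
    n = len(mean)
    lam = alpha ** 2 * (n + kappa) - n
    scale = n + lam
    root = cholesky([[scale * c for c in row] for row in cov])
    # 2n + 1 symmetric sigma points: the mean plus/minus each column of the root.
    points = [list(mean)]
    for j in range(n):
        column = [root[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, column)])
        points.append([m - c for m, c in zip(mean, column)])
    w_mean = [lam / scale] + [1.0 / (2.0 * scale)] * (2 * n)
    w_cov = [lam / scale + (1.0 - alpha ** 2 + beta)] + w_mean[1:]
    values = [f(p) for p in points]
    y_mean = sum(w * v for w, v in zip(w_mean, values))
    y_var = sum(w * (v - y_mean) ** 2 for w, v in zip(w_cov, values))
    return y_mean, y_var


# Distance from an anchor at the origin to a node with uncertain 3D position.
node_mean = [3.0, 4.0, 0.0]
node_cov = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
dist_mean, dist_var = scaled_unscented_transform(
    node_mean, node_cov, lambda x: math.sqrt(sum(c * c for c in x)))
```

Only seven function evaluations are needed in 3D, which is the sense in which such a transform relies on a small number of samples.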
Stroke is a medical condition that can affect motor function, particularly dynamic balance. Biofeedback can aid rehabilitation procedures that help patients regain lost motor activity and recover functionality. In this work, we present a robotic smart-vest device that analyzes Inertial Measurement Unit (IMU) data and assists in rehabilitation procedures by providing timed feedback in the form of vibrotactile stimulation. Information provided by principal caregivers and patients through surveys and interviews is used to hypothesize potential clinical causes and to derive three alternative clinical modalities: Artificial Vestibular Feedback, Gait Pacemaker and Risk-Predictor.