This paper introduces the distributed and intelligent integrated sensing and communications (DISAC) concept, a transformative approach for 6G wireless networks that extends the emerging concept of integrated sensing and communications (ISAC). DISAC addresses the limitations of the existing ISAC models and, to overcome them, it introduces two novel foundational functionalities for both sensing and communications: a distributed architecture and a semantic and goal-oriented framework. The distributed architecture enables large-scale and energy-efficient tracking of connected users and objects, leveraging the fusion of heterogeneous sensors. The semantic and goal-oriented intelligent and parsimonious framework, enables the transition from classical data fusion to the composition of semantically selected information, offering new paradigms for the optimization of resource utilization and exceptional multi-modal sensing performance across various use cases. This paper details DISAC's principles, architecture, and potential applications.
In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via ''Retrieval & Rerank'' paradigm, summarises images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce token count and alleviate cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate the effectiveness of our method, and investigate the impact of length and position of demonstrations for LVLM. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.
Dense matching is crucial for 3D scene reconstruction since it enables the recovery of scene 3D geometry from image acquisition. Deep Learning (DL)-based methods have shown effectiveness in the special case of epipolar stereo disparity estimation in the computer vision community. DL-based methods depend heavily on the quality and quantity of training datasets. However, generating ground-truth disparity maps for real scenes remains a challenging task in the photogrammetry community. To address this challenge, we propose a method for generating ground-truth disparity maps directly from Light Detection and Ranging (LiDAR) and images to produce a large and diverse dataset for six aerial datasets across four different areas and two areas with different resolution images. We also introduce a LiDAR-to-image co-registration refinement to the framework that takes special precautions regarding occlusions and refrains from disparity interpolation to avoid precision loss. Evaluating 11 dense matching methods across datasets with diverse scene types, image resolutions, and geometric configurations, which are deeply investigated in dataset shift, GANet performs best with identical training and testing data, and PSMNet shows robustness across different datasets, and we proposed the best strategy for training with a limit dataset. We will also provide the dataset and training models; more information can be found at https://github.com/whuwuteng/Aerial_Stereo_Dataset.
Object properties perceived through the tactile sense, such as weight, friction, and slip, greatly influence motor control during manipulation tasks. However, the provision of tactile information during robotic training in neurorehabilitation has not been well explored. Therefore, we designed and evaluated a tactile interface based on a two-degrees-of-freedom moving platform mounted on a hand rehabilitation robot that provides skin stretch at four fingertips, from the index through the little finger. To accurately control the rendered forces, we included a custom magnetic-based force sensor to control the tactile interface in a closed loop. The technical evaluation showed that our custom force sensor achieved measurable shear forces of +-8N with accuracies of 95.2-98.4% influenced by hysteresis, viscoelastic creep, and torsional deformation. The tactile interface accurately rendered forces with a step response steady-state accuracy of 97.5-99.4% and a frequency response in the range of most activities of daily living. Our sensor showed the highest measurement-range-to-size ratio and comparable accuracy to sensors of its kind. These characteristics enabled the closed-loop force control of the tactile interface for precise rendering of multi-finger two-dimensional skin stretch. The proposed system is a first step towards more realistic and rich haptic feedback during robotic sensorimotor rehabilitation, potentially improving therapy outcomes.
In robotic manipulation, preventing objects from slipping and establishing a secure grip on them is critical. Successful manipulation requires tactile sensors that detect the microscopic incipient slip phenomenon at the contact surface. Unfortunately, the tiny signals generated by incipient slip are quickly buried by environmental noise, and precise stress-distribution measurement requires an extensive optical system and integrated circuits. In this study, we focus on the macroscopic deformation of the entire fingertip's soft structure instead of directly observing the contact surface and its role as a vibration medium for sensing. The proposed method compresses the stick ratio's information into a one-dimensional pressure signal using the change in the propagation characteristics by vibration injection into the soft structure, which magnifies the microscopic incipient slip phenomena into the entire deformation. This mechanism allows a tactile sensor to use just a single vibration sensor. In the implemented system, a biomimetic tactile sensor is vibrated using a white signal from a PZT motor and utilizes frequency spectrum change of the propagated vibration as features. We investigated the proposed method's effectiveness on stick-ratio estimation and \red{stick-ratio stabilization} control during incipient slip. Our estimation error and the control performance results significantly outperformed the conventional methods.
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting, that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination which explicitly inject structure priors into the initial optimization process for helping build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. Our GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods.
The theory of sampling and recovery of bandlimited graph signals has been extensively studied. However, in many cases, the observation of a signal is quite coarse. For example, users only provide simple comments such as "like" or "dislike" for a product on an e-commerce platform. This is a particular scenario where only the sign information of a graph signal can be measured. In this paper, we are interested in how to sample based on sign information in an online manner, by which the direction of the original graph signal can be estimated. The online signed sampling problem of a graph signal can be formulated as a Markov decision process in a finite horizon. Unfortunately, it is intractable for large size graphs. We propose a low-complexity greedy signed sampling algorithm (GSS) as well as a stopping criterion. Meanwhile, we prove that the objective function is adaptive monotonic and adaptive submodular, so that the performance is close enough to the global optimum with a lower bound. Finally, we demonstrate the effectiveness of the GSS algorithm by both synthesis and realworld data.
Fourier Neural Operator (FNO) is a popular operator learning method, which has demonstrated state-of-the-art performance across many tasks. However, FNO is mainly used in forward prediction, yet a large family of applications rely on solving inverse problems. In this paper, we propose an invertible Fourier Neural Operator (iFNO) that tackles both the forward and inverse problems. We designed a series of invertible Fourier blocks in the latent channel space to share the model parameters, efficiently exchange the information, and mutually regularize the learning for the bi-directional tasks. We integrated a variational auto-encoder to capture the intrinsic structures within the input space and to enable posterior inference so as to overcome challenges of illposedness, data shortage, noises, etc. We developed a three-step process for pre-training and fine tuning for efficient training. The evaluations on five benchmark problems have demonstrated the effectiveness of our approach.
With the development of foundation models such as large language models, zero-shot transfer learning has become increasingly significant. This is highlighted by the generative capabilities of NLP models like GPT-4, and the retrieval-based approaches of CV models like CLIP, both of which effectively bridge the gap between seen and unseen data. In the realm of graph learning, the continuous emergence of new graphs and the challenges of human labeling also amplify the necessity for zero-shot transfer learning, driving the exploration of approaches that can generalize across diverse graph data without necessitating dataset-specific and label-specific fine-tuning. In this study, we extend such paradigms to zero-shot transferability in graphs by introducing ZeroG, a new framework tailored to enable cross-dataset generalization. Addressing the inherent challenges such as feature misalignment, mismatched label spaces, and negative transfer, we leverage a language model to encode both node attributes and class semantics, ensuring consistent feature dimensions across datasets. We also propose a prompt-based subgraph sampling module that enriches the semantic information and structure information of extracted subgraphs using prompting nodes and neighborhood aggregation, respectively. We further adopt a lightweight fine-tuning strategy that reduces the risk of overfitting and maintains the zero-shot learning efficacy of the language model. The results underscore the effectiveness of our model in achieving significant cross-dataset zero-shot transferability, opening pathways for the development of graph foundation models. Especially, ZeroG, as a zero-shot method, can even achieve results comparable to those of semi-supervised learning on Pubmed.
In the field of online sequential decision-making, we address the problem with delays utilizing the framework of online convex optimization (OCO), where the feedback of a decision can arrive with an unknown delay. Unlike previous research that is limited to Euclidean norm and gradient information, we propose three families of delayed algorithms based on approximate solutions to handle different types of received feedback. Our proposed algorithms are versatile and applicable to universal norms. Specifically, we introduce a family of Follow the Delayed Regularized Leader algorithms for feedback with full information on the loss function, a family of Delayed Mirror Descent algorithms for feedback with gradient information on the loss function and a family of Simplified Delayed Mirror Descent algorithms for feedback with the value information of the loss function's gradients at corresponding decision points. For each type of algorithm, we provide corresponding regret bounds under cases of general convexity and relative strong convexity, respectively. We also demonstrate the efficiency of each algorithm under different norms through concrete examples. Furthermore, our theoretical results are consistent with the current best bounds when degenerated to standard settings.