Segment Anything Model (SAM) has revolutionized the way of segmentation. However, SAM's performance may decline when applied to tasks involving domains that differ from natural images. Nonetheless, by employing fine-tuning techniques, SAM exhibits promising capabilities in specific domains, such as medicine and planetary science. Notably, there is a lack of research on the application of SAM to sonar imaging. In this paper, we aim to address this gap by conducting a comprehensive investigation of SAM's performance on sonar images. Specifically, we evaluate SAM using various settings on sonar images. Additionally, we fine-tune SAM using effective methods both with prompts and for semantic segmentation, thereby expanding its applicability to tasks requiring automated segmentation. Experimental results demonstrate a significant improvement in the performance of the fine-tuned SAM.
The widespread adoption of edge computing has emerged as a prominent trend for alleviating task processing delays and reducing energy consumption. However, the dynamic nature of network conditions and the varying computation capacities of edge servers (ESs) can introduce disparities between computation loads and available computing resources in edge computing networks, potentially leading to inadequate service quality. To address this challenge, this paper investigates a practical scenario characterized by dynamic task offloading. Initially, we examine traditional Multi-armed Bandit (MAB) algorithms, namely the $\varepsilon$-greedy algorithm and the UCB1-based algorithm. However, both algorithms exhibit certain weaknesses in effectively addressing the tidal data traffic patterns. Consequently, based on MAB, we propose an adaptive task offloading algorithm (ATOA) that overcomes these limitations. By conducting extensive simulations, we demonstrate the superiority of our ATOA solution in reducing task processing latency compared to conventional MAB methods. This substantiates the effectiveness of our approach in enhancing the performance of edge computing networks and improving overall service quality.
Achieving seamless global coverage is one of the ultimate goals of space-air-ground integrated network, as a part of which High Altitude Platform (HAP) network can provide wide-area coverage. However, deploying a large number of HAPs will lead to severe congestion of existing frequency bands. Spectrum sharing improves spectrum utilization. The coverage performance improvement and interference caused by spectrum sharing need to be investigated. To this end, this paper analyzes the performance of spectrum sharing between HAP network and terrestrial network. We firstly generalize the Poisson Point Process (PPP) to curves, surfaces and manifolds to model the distribution of terrestrial Base Stations (BSs) and HAPs. Then, the closed-form expressions for coverage probability of HAP network and terrestrial network are derived based on differential geometry and stochastic geometry. We verify the accuracy of closed-form expressions by Monte Carlo simulation. The results show that HAP network has less interference to terrestrial network. Low height and suitable deployment density can improve the coverage probability and transmission capacity of HAP network.
In this paper, we introduce FMapping, an efficient neural field mapping framework that facilitates the continuous estimation of a colorized point cloud map in real-time dense RGB SLAM. To achieve this challenging goal without depth, a hurdle is how to improve efficiency and reduce the mapping uncertainty of the RGB SLAM system. To this end, we first build up a theoretical analysis by decomposing the SLAM system into tracking and mapping parts, and the mapping uncertainty is explicitly defined within the frame of neural representations. Based on the analysis, we then propose an effective factorization scheme for scene representation and introduce a sliding window strategy to reduce the uncertainty for scene reconstruction. Specifically, we leverage the factorized neural field to decompose uncertainty into a lower-dimensional space, which enhances robustness to noise and improves training efficiency. We then propose the sliding window sampler to reduce uncertainty by incorporating coherent geometric cues from observed frames during map initialization to enhance convergence. Our factorized neural mapping approach enjoys some advantages, such as low memory consumption, more efficient computation, and fast convergence during map initialization. Experiments on two benchmark datasets show that our method can update the map of high-fidelity colorized point clouds around 2 seconds in real time while requiring no customized CUDA kernels. Additionally, it utilizes x20 fewer parameters than the most concise neural implicit mapping of prior methods for SLAM, e.g., iMAP [ 31] and around x1000 fewer parameters than the state-of-the-art approach, e.g., NICE-SLAM [ 42]. For more details, please refer to our project homepage: https://vlis2022.github.io/fmap/.
Generating and editing a 3D scene guided by natural language poses a challenge, primarily due to the complexity of specifying the positional relations and volumetric changes within the 3D space. Recent advancements in Large Language Models (LLMs) have demonstrated impressive reasoning, conversational, and zero-shot generation abilities across various domains. Surprisingly, these models also show great potential in realizing and interpreting the 3D space. In light of this, we propose a novel language-guided interactive 3D generation system, dubbed LI3D, that integrates LLMs as a 3D layout interpreter into the off-the-shelf layout-to-3D generative models, allowing users to flexibly and interactively generate visual content. Specifically, we design a versatile layout structure base on the bounding boxes and semantics to prompt the LLMs to model the spatial generation and reasoning from language. Our system also incorporates LLaVA, a large language and vision assistant, to provide generative feedback from the visual aspect for improving the visual quality of generated content. We validate the effectiveness of LI3D, primarily in 3D generation and editing through multi-round interactions, which can be flexibly extended to 2D generation and editing. Various experiments demonstrate the potential benefits of incorporating LLMs in generative AI for applications, e.g., metaverse. Moreover, we benchmark the layout reasoning performance of LLMs with neural visual artist tasks, revealing their emergent ability in the spatial layout domain.
Images captured by rolling shutter (RS) cameras under fast camera motion often contain obvious image distortions and blur, which can be modeled as a row-wise combination of a sequence of global shutter (GS) frames within the exposure time naturally, recovering high-frame-rate GS sharp frames from an RS blur image needs to simultaneously consider RS correction, deblur, and frame interpolation Taking this task is nontrivial, and to our knowledge, no feasible solutions exist by far. A naive way is to decompose the complete process into separate tasks and simply cascade existing methods; however, this results in cumulative errors and noticeable artifacts. Event cameras enjoy many advantages, e.g., high temporal resolution, making them potential for our problem. To this end, we make the first attempt to recover high-frame-rate sharp GS frames from an RS blur image and paired event data. Our key idea is to learn an implicit neural representation (INR) to directly map the position and time coordinates to RGB values to address the interlocking degradations in the image restoration process. Specifically, we introduce spatial-temporal implicit encoding (STE) to convert an RS blur image and events into a spatial-temporal representation (STR). To query a specific sharp frame (GS or RS), we embed the exposure time into STR and decode the embedded features to recover a sharp frame. Moreover, we propose an RS blur image-guided integral loss to better train the network. Our method is relatively lightweight as it contains only 0.379M parameters and demonstrates high efficiency as the STE is called only once for any number of interpolation frames. Extensive experiments show that our method significantly outperforms prior methods addressing only one or two of the tasks.
This paper addresses the robustness of a network to sustain its connectivity and controllability against malicious attacks. This kind of network robustness is typically measured by the time-consuming attack simulation, which returns a sequence of values that record the remaining connectivity and controllability after a sequence of node- or edge-removal attacks. For improvement, this paper develops an efficient framework for network robustness prediction, the spatial pyramid pooling convolutional neural network (SPP-CNN). The new framework installs a spatial pyramid pooling layer between the convolutional and fully-connected layers, overcoming the common mismatch issue in the CNN-based prediction approaches and extending its generalizability. Extensive experiments are carried out by comparing SPP-CNN with three state-of-the-art robustness predictors, namely a CNN-based and two graph neural networks-based frameworks. Synthetic and real-world networks, both directed and undirected, are investigated. Experimental results demonstrate that the proposed SPP-CNN achieves better prediction performances and better generalizability to unknown datasets, with significantly lower time-consumption, than its counterparts.
Cloud native technology has revolutionized 5G beyond and 6G communication networks, offering unprecedented levels of operational automation, flexibility, and adaptability. However, the vast array of cloud native services and applications presents a new challenge in resource allocation for dynamic cloud computing environments. To tackle this challenge, we investigate a cloud native wireless architecture that employs container-based virtualization to enable flexible service deployment. We then study two representative use cases: network slicing and Multi-Access Edge Computing. To optimize resource allocation in these scenarios, we leverage deep reinforcement learning techniques and introduce two model-free algorithms capable of monitoring the network state and dynamically training allocation policies. We validate the effectiveness of our algorithms in a testbed developed using Free5gc. Our findings demonstrate significant improvements in network efficiency, underscoring the potential of our proposed techniques in unlocking the full potential of cloud native wireless networks.
Recently, omnidirectional images (ODIs) have become increasingly popular; however, their angular resolution tends to be lower than that of perspective images.This leads to degraded structural details such as edges, causing difficulty in learning 3D scene understanding tasks, especially monocular depth estimation. Existing methods typically leverage high-resolution (HR) ODI as the input, so as to recover the structural details via fully-supervised learning. However, the HR depth ground truth (GT) maps may be arduous or expensive to be collected due to resource-constrained devices in practice. Therefore, in this paper, we explore for the first time to estimate the HR omnidirectional depth directly from a low-resolution (LR) ODI, when no HR depth GT map is available. Our key idea is to transfer the scene structural knowledge from the readily available HR image modality and the corresponding LR depth maps to achieve the goal of HR depth estimation without extra inference cost. Specifically, we introduce ODI super-resolution (SR) as an auxiliary task and train both tasks collaboratively in a weakly supervised manner to boost the performance of HR depth estimation. The ODI SR task takes an LR ODI as the input to predict an HR image, enabling us to extract the scene structural knowledge via uncertainty estimation. Buttressed by this, a scene structural knowledge transfer (SSKT) module is proposed with two key components. First, we employ a cylindrical implicit interpolation function (CIIF) to learn cylindrical neural interpolation weights for feature up-sampling and share the parameters of CIIFs between the two tasks. Then, we propose a feature distillation (FD) loss that provides extra structural regularization to help the HR depth estimation task learn more scene structural knowledge.
Semi-supervised learning (SSL) has attracted much attention since it reduces the expensive costs of collecting adequate well-labeled training data, especially for deep learning methods. However, traditional SSL is built upon an assumption that labeled and unlabeled data should be from the same distribution e.g., classes and domains. However, in practical scenarios, unlabeled data would be from unseen classes or unseen domains, and it is still challenging to exploit them by existing SSL methods. Therefore, in this paper, we proposed a unified framework to leverage these unseen unlabeled data for open-scenario semi-supervised medical image classification. We first design a novel scoring mechanism, called dual-path outliers estimation, to identify samples from unseen classes. Meanwhile, to extract unseen-domain samples, we then apply an effective variational autoencoder (VAE) pre-training. After that, we conduct domain adaptation to fully exploit the value of the detected unseen-domain samples to boost semi-supervised training. We evaluated our proposed framework on dermatology and ophthalmology tasks. Extensive experiments demonstrate our model can achieve superior classification performance in various medical SSL scenarios.