Thanks to the advantages of flexible deployment and high mobility, unmanned aerial vehicles (UAVs) have been widely applied in the areas of disaster management, agricultural plant protection, environment monitoring and so on. With the development of UAV and sensor technologies, UAV assisted data collection for Internet of Things (IoT) has attracted increasing attentions. In this article, the scenarios and key technologies of UAV assisted data collection are comprehensively reviewed. First, we present the system model including the network model and mathematical model of UAV assisted data collection for IoT. Then, we review the key technologies including clustering of sensors, UAV data collection mode as well as joint path planning and resource allocation. Finally, the open problems are discussed from the perspectives of efficient multiple access as well as joint sensing and data collection. This article hopefully provides some guidelines and insights for researchers in the area of UAV assisted data collection for IoT.
Temporal action localization (TAL) aims to detect the boundary and identify the class of each action instance in a long untrimmed video. Current approaches treat video frames homogeneously, and tend to give background and key objects excessive attention. This limits their sensitivity to localize action boundaries. To this end, we propose a prior-enhanced temporal action localization method (PETAL), which only takes in RGB input and incorporates action subjects as priors. This proposal leverages action subjects' information with a plug-and-play subject-aware spatial attention module (SA-SAM) to generate an aggregated and subject-prioritized representation. Experimental results on THUMOS-14 and ActivityNet-1.3 datasets demonstrate that the proposed PETAL achieves competitive performance using only RGB features, e.g., boosting mAP by 2.41% or 0.25% over the state-of-the-art approach that uses RGB features or with additional optical flow features on the THUMOS-14 dataset.
Multimodal tasks in the fashion domain have significant potential for e-commerce, but involve challenging vision-and-language learning problems - e.g., retrieving a fashion item given a reference image plus text feedback from a user. Prior works on multimodal fashion tasks have either been limited by the data in individual benchmarks, or have leveraged generic vision-and-language pre-training but have not taken advantage of the characteristics of fashion data. Additionally, these works have mainly been restricted to multimodal understanding tasks. To address these gaps, we make two key contributions. First, we propose a novel fashion-specific pre-training framework based on weakly-supervised triplets constructed from fashion image-text pairs. We show the triplet-based tasks are an effective addition to standard multimodal pre-training tasks. Second, we propose a flexible decoder-based model architecture capable of both fashion retrieval and captioning tasks. Together, our model design and pre-training approach are competitive on a diverse set of fashion tasks, including cross-modal retrieval, image retrieval with text feedback, image captioning, relative image captioning, and multimodal categorization.
The line spectrum estimation problem is considered in this paper. We propose a CFAR-based Newtonized OMP (NOMP-CFAR) method which can maintain a desired false alarm rate without the knowledge of the noise variance. The NOMP-CFAR consists of two steps, namely, an initialization step and a detection step. In the initialization step, NOMP is employed to obtain candidate sinusoidal components. In the detection step, CFAR detector is applied to detect each candidate frequency, and remove the most unlikely frequency component. Then, the Newton refinements are used to refine the remaining parameters. The relationship between the false alarm rate and the required threshold is established. By comparing with the NOMP, NOMP-CFAR has only $1$ dB performance loss in additive white Gaussian noise scenario with false alarm probability $10^{-2}$ and detection probability $0.8$ without knowledge of noise variance. For varied noise variance scenario, NOMP-CFAR still preserves its CFAR property, while NOMP violates the CFAR. Besides, real experiments are also conducted to demonstrate the detection performance of NOMP-CFAR, compared to CFAR and NOMP.
Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with a low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward. Indeed, we find that even naive baseline reward-delay attacks are also highly successful in minimizing the reward. Targeted attacks, on the other hand, are more challenging, although we nevertheless demonstrate that the proposed approaches remain highly effective at achieving the attacker's targets. In addition, we introduce a second threat model that captures a minimal mitigation that ensures that rewards cannot be used out of sequence. We find that this mitigation remains insufficient to ensure robustness to attacks that delay, but preserve the order, of rewards.
Automatically measuring lesion/tumor size with RECIST (Response Evaluation Criteria In Solid Tumors) diameters and segmentation is important for computer-aided diagnosis. Although it has been studied in recent years, there is still space to improve its accuracy and robustness, such as (1) enhancing features by incorporating rich contextual information while keeping a high spatial resolution and (2) involving new tasks and losses for joint optimization. To reach this goal, this paper proposes a transformer-based network (MeaFormer, Measurement transFormer) for lesion RECIST diameter prediction and segmentation (LRDPS). It is formulated as three correlative and complementary tasks: lesion segmentation, heatmap prediction, and keypoint regression. To the best of our knowledge, it is the first time to use keypoint regression for RECIST diameter prediction. MeaFormer can enhance high-resolution features by employing transformers to capture their long-range dependencies. Two consistency losses are introduced to explicitly build relationships among these tasks for better optimization. Experiments show that MeaFormer achieves the state-of-the-art performance of LRDPS on the large-scale DeepLesion dataset and produces promising results of two downstream clinic-relevant tasks, i.e., 3D lesion segmentation and RECIST assessment in longitudinal studies.
In order to cope with the increasing demand for labeling data and privacy issues with human detection, synthetic data has been used as a substitute and showing promising results in human detection and tracking tasks. We participate in the 7th Workshop on Benchmarking Multi-Target Tracking (BMTT), themed on "How Far Can Synthetic Data Take us"? Our solution, PieTrack, is developed based on synthetic data without using any pre-trained weights. We propose a self-supervised domain adaptation method that enables mitigating the domain shift issue between the synthetic (e.g., MOTSynth) and real data (e.g., MOT17) without involving extra human labels. By leveraging the proposed multi-scale ensemble inference, we achieved a final HOTA score of 58.7 on the MOT17 testing set, ranked third place in the challenge.
Capitalizing on the recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some levels of controllability, e.g., controlling the shapes, expressions, textures, and poses of the generated face images. However, these methods focus on 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF) that effectively enforces the shape of the generated face to commit to a given 3D Morphable Model (3DMM) mesh. To achieve accurate control over fine-grained 3D face shapes of the synthesized image, we additionally incorporate a 3D landmark loss as well as a volume warping loss into our synthesis algorithm. Experiments validate the effectiveness of the proposed method, which is able to generate high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods. Find code and demo at https://keqiangsun.github.io/projects/cgof.
Anomaly subgraph detection has been widely used in various applications, ranging from cyber attack in computer networks to malicious activities in social networks. Despite an increasing need for federated anomaly detection across multiple attributed networks, only a limited number of approaches are available for this problem. Federated anomaly detection faces two major challenges. One is that isolated data in most industries are restricted share with others for data privacy and security. The other is most of the centralized approaches training based on data integration. The main idea of federated anomaly detection is aligning private anomalies from local data owners on the public anomalies from the attributed network in the server through public anomalies to federate local anomalies. In each private attributed network, the detected anomaly subgraph is aligned with an anomaly subgraph in the public attributed network. The significant public anomaly subgraphs are selected for federated private anomalies while preventing local private data leakage. The proposed algorithm FadMan is a vertical federated learning framework for public node aligned with many private nodes of different features, and is validated on two tasks correlated anomaly detection on multiple attributed networks and anomaly detection on an attributeless network using five real-world datasets. In the first scenario, FadMan outperforms competitive methods by at least 12% accuracy at 10% noise level. In the second scenario, by analyzing the distribution of abnormal nodes, we find that the nodes of traffic anomalies are associated with the event of postgraduate entrance examination on the same day.
As an important technology to ensure data security, consistency, traceability, etc., blockchain has been increasingly used in Internet of Things (IoT) applications. The integration of blockchain and edge computing can further improve the resource utilization in terms of network, computing, storage, and security. This paper aims to present a survey on the integration of blockchain and edge computing. In particular, we first give an overview of blockchain and edge computing. We then present a general architecture of an integration of blockchain and edge computing system. We next study how to utilize blockchain to benefit edge computing, as well as how to use edge computing to benefit blockchain. We also discuss the issues brought by the integration of blockchain and edge computing system and solutions from perspectives of resource management, joint optimization, data management, computation offloading and security mechanism. Finally, we analyze and summarize the existing challenges posed by the integration of blockchain and edge computing system and the potential solutions in the future.