Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. Recently, a new paradigm has emerged that generates a foreground prediction map (FPM) to achieve pixel-level localization. While existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and to guide the learning of the generator, this paper presents two surprising experimental observations on the object localization learning process. For a trained network, as the foreground mask expands, 1) the cross-entropy converges to zero when the foreground mask covers only part of the object region, and 2) the activation value continues to increase until the foreground mask expands to the object boundary. Therefore, to achieve more effective localization, we argue for using the activation value to learn more object regions. In this paper, we propose a Background Activation Suppression (BAS) method. Specifically, an Activation Map Constraint (AMC) module is designed to facilitate the learning of the generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and an area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension.
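To make the idea of suppressing background activation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the quantity of interest can be written as the share of total activation that falls outside a soft foreground mask; the function name `background_activation_ratio` and the toy 2x2 maps are hypothetical.

```python
def background_activation_ratio(activation, mask):
    """Share of the total activation that lies outside the foreground mask.

    `activation` is a 2-D list of non-negative values and `mask` is a 2-D list
    of foreground probabilities in [0, 1]. Both are illustrative assumptions;
    the abstract does not specify the exact form of the constraint.
    """
    total = sum(sum(row) for row in activation)
    if total == 0:
        return 0.0
    background = sum(
        a * (1.0 - m)
        for act_row, mask_row in zip(activation, mask)
        for a, m in zip(act_row, mask_row)
    )
    return background / total


# Toy example: the object occupies three of the four cells.
activation = [[0.0, 2.0], [2.0, 4.0]]
partial_mask = [[0, 0], [0, 1]]   # covers only the most activated cell
full_mask = [[0, 1], [1, 1]]      # covers every activated cell
partial_ratio = background_activation_ratio(activation, partial_mask)
full_ratio = background_activation_ratio(activation, full_mask)
```

In this toy case the ratio keeps falling until the mask reaches every activated cell, which mirrors the second observation in the abstract: an activation-based signal still changes after a mask has covered the most discriminative part.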
Inductive link prediction -- where entities during training and inference stages can be different -- has shown great potential for completing evolving knowledge graphs in an entity-independent manner. Many popular methods mainly focus on modeling graph-level features, while the edge-level interactions -- especially the semantic correlations between relations -- have been less explored. However, we notice that a desirable property of semantic correlations between relations is that they are inherently edge-level and entity-independent. This implies the great potential of the semantic correlations for the entity-independent inductive link prediction task. Inspired by this observation, we propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations that are highly correlated with their topological structures within subgraphs. Specifically, we prove that semantic correlations between any two relations can be categorized into seven topological patterns, and then propose a Relational Correlation Network (RCN) to learn the importance of each pattern. To further exploit the potential of RCN, we propose a Complete Common Neighbor induced subgraph that can effectively preserve complete topological patterns within the subgraph. Extensive experiments demonstrate that TACO effectively unifies the graph-level information and edge-level interactions to jointly perform reasoning, leading to superior performance over existing state-of-the-art methods for the inductive link prediction task.
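The abstract does not name the seven topological patterns, so the sketch below shows only one plausible enumeration, based on which endpoints two directed edges share. The labels (for example `tail-to-head`) and the function `topological_pattern` are assumptions for illustration and may not match the paper's definitions.

```python
def topological_pattern(edge_a, edge_b):
    """Classify how two triples (head, relation, tail) are connected.

    Returns one of seven illustrative labels. Self-loops (head == tail) are
    not treated specially in this sketch.
    """
    (h1, _, t1), (h2, _, t2) = edge_a, edge_b
    if h1 == h2 and t1 == t2:
        return "parallel"        # same head and same tail
    if h1 == t2 and t1 == h2:
        return "loop"            # the two edges form a 2-cycle
    if h1 == h2:
        return "head-to-head"    # shared head only
    if t1 == t2:
        return "tail-to-tail"    # shared tail only
    if t1 == h2:
        return "tail-to-head"    # edge_a flows into edge_b
    if h1 == t2:
        return "head-to-tail"    # edge_b flows into edge_a
    return "not-connected"       # no shared entity


# The classification uses only connectivity, never entity identity.
pattern = topological_pattern(("alice", "works_at", "acme"),
                              ("acme", "located_in", "paris"))
```

Because the label depends only on how the edges touch, the same function applies unchanged to entities never seen during training, which is the entity-independence property the abstract relies on.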
Logic Synthesis (LS) plays a vital role in chip design -- a cornerstone of the semiconductor industry. A key task in LS is to transform circuits -- modeled by directed acyclic graphs (DAGs) -- into simplified circuits with equivalent functionalities. To tackle this task, many LS operators sequentially apply transformations to subgraphs rooted at each node of an input DAG. However, we found that a large number of transformations are ineffective, which makes applying these operators highly time-consuming. In particular, we notice that the runtime of the Resub and Mfs2 operators often dominates the overall runtime of LS optimization processes. To address this challenge, we propose a novel data-driven LS operator paradigm, namely PruneX, to reduce ineffective transformations. The major challenge of developing PruneX is to learn models that generalize well to unseen circuits, i.e., the out-of-distribution (OOD) generalization problem. Thus, the major technical contribution of PruneX is a novel circuit domain generalization framework, which learns domain-invariant representations based on transformation-invariant domain knowledge. To the best of our knowledge, PruneX is the first approach to tackle the OOD problem in LS operators. We integrate PruneX with the aforementioned Resub and Mfs2 operators. Experiments demonstrate that PruneX significantly improves their efficiency while keeping comparable optimization performance on industrial and very large-scale circuits, achieving up to $3.1\times$ faster runtime.
Tremendous efforts have been made on document image rectification, but how to learn effective representations of such distorted images is still under-explored. In this paper, we present DocMAE, a novel self-supervised framework for document image rectification. Our motivation is to encode the structural cues in document images, i.e., the document boundaries and text lines, by leveraging a masked autoencoder to benefit the rectification. Specifically, we first mask random patches of the background-excluded document images and then reconstruct the missing pixels. With such a self-supervised learning approach, the network is encouraged to learn the intrinsic structure of deformed documents by restoring document boundaries and missing text lines. Extensive experiments, including transfer performance on the downstream rectification task, validate the effectiveness of our method.
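The masking-and-reconstruction step described above can be sketched as follows. This is a simplified illustration on a nested-list image, not the DocMAE code: the patch size, the 75% mask ratio, and the helper names `mask_random_patches` and `masked_mse` are assumptions, and the network that would produce the reconstruction is omitted.

```python
import random


def mask_random_patches(image, patch, ratio, rng):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    Returns a masked copy of the image and the list of masked
    (patch_row, patch_col) indices. The input image is left untouched.
    """
    height, width = len(image), len(image[0])
    grid = [(r, c) for r in range(height // patch) for c in range(width // patch)]
    masked = rng.sample(grid, int(len(grid) * ratio))
    out = [row[:] for row in image]
    for r, c in masked:
        for y in range(r * patch, (r + 1) * patch):
            for x in range(c * patch, (c + 1) * patch):
                out[y][x] = 0
    return out, masked


def masked_mse(prediction, target, masked, patch):
    """Mean squared error computed only over the masked patches."""
    errors = [
        (prediction[y][x] - target[y][x]) ** 2
        for r, c in masked
        for y in range(r * patch, (r + 1) * patch)
        for x in range(c * patch, (c + 1) * patch)
    ]
    return sum(errors) / len(errors)


# Toy 8x8 "document" of ones, split into sixteen 2x2 patches.
image = [[1] * 8 for _ in range(8)]
masked_image, masked_idx = mask_random_patches(image, 2, 0.75, random.Random(0))
# With no reconstruction at all, the error on masked pixels is maximal here.
baseline_error = masked_mse(masked_image, image, masked_idx, 2)
```

Restricting the loss to masked patches means the model is scored only on content it could not see, so it has to infer boundaries and text lines from the visible context.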
Local feature matching aims at finding correspondences between a pair of images. Although current detector-free methods leverage the Transformer architecture to obtain impressive performance, few works consider maintaining local consistency. Meanwhile, most methods struggle with large scale variations. To deal with the above issues, we propose the Adaptive Spot-Guided Transformer (ASTR) for local feature matching, which jointly models the local consistency and scale variations in a unified coarse-to-fine architecture. The proposed ASTR enjoys several merits. First, we design a spot-guided aggregation module to avoid interference from irrelevant areas during feature aggregation. Second, we design an adaptive scaling module to adjust the size of grids according to the calculated depth information at the fine stage. Extensive experimental results on five standard benchmarks demonstrate that our ASTR performs favorably against state-of-the-art methods. Our code will be released at https://astr2023.github.io.
Subgraph-wise sampling -- a promising class of mini-batch training techniques for graph neural networks (GNNs) -- is critical for real-world applications. During the message passing (MP) in GNNs, subgraph-wise sampling methods discard messages outside the mini-batches in backward passes to avoid the well-known neighbor explosion problem, i.e., the exponential growth of node dependencies with the number of MP iterations. However, discarding messages may sacrifice the gradient estimation accuracy, posing significant challenges to their convergence analysis and convergence speeds. To address this challenge, we propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC). To the best of our knowledge, LMC is the first subgraph-wise sampling method with provable convergence. The key idea is to retrieve the discarded messages in backward passes based on a message passing formulation of backward passes. Through efficient and effective compensation for the discarded messages in both forward and backward passes, LMC computes accurate mini-batch gradients and thus accelerates convergence. Moreover, LMC is applicable to various MP-based GNN architectures, including convolutional GNNs (finite message passing iterations with different layers) and recurrent GNNs (infinite message passing iterations with a shared layer). Experiments on large-scale benchmarks demonstrate that LMC is significantly faster than state-of-the-art subgraph-wise sampling methods.
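The neighbor explosion problem and the messages that subgraph-wise sampling drops can be illustrated on a toy graph. The sketch below is not LMC itself and performs no compensation; the binary-tree graph and the helper names are assumptions chosen to make the growth easy to see.

```python
def binary_tree(depth):
    """Undirected complete binary tree as an adjacency dict; node 0 is the root."""
    num_nodes = 2 ** (depth + 1) - 1
    adj = {node: [] for node in range(num_nodes)}
    for child in range(1, num_nodes):
        parent = (child - 1) // 2
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


def receptive_field(adj, seeds, num_iterations):
    """Nodes whose features are needed to update `seeds` after the given number of MP iterations."""
    needed = set(seeds)
    frontier = set(seeds)
    for _ in range(num_iterations):
        frontier = {nbr for node in frontier for nbr in adj[node]} - needed
        needed |= frontier
    return needed


def dropped_messages(adj, batch):
    """Messages (source, destination) into batch nodes whose source lies outside the mini-batch."""
    batch = set(batch)
    return sorted((src, dst) for dst in batch for src in adj[dst] if src not in batch)


adj = binary_tree(4)
# Dependencies of the root grow roughly twofold per MP iteration.
growth = [len(receptive_field(adj, [0], k)) for k in range(5)]
# Restricting computation to the mini-batch {0, 1} drops these incoming messages.
dropped = dropped_messages(adj, [0, 1])
```

Here `growth` shows why full-neighborhood training becomes expensive, while `dropped` lists exactly the messages a subgraph-wise sampler would ignore and that a compensation scheme would need to approximate.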
Compared with random sampling, low-discrepancy sampling is more effective in covering the search space. However, existing research cannot definitively state whether the impact of a low-discrepancy sample on particle swarm optimization (PSO) is positive or negative. Using Niederreiter's theorem, this study completes an error analysis of PSO, which reveals that the error bound of PSO at each iteration depends on the dispersion of the sample set in an expanded dimensional space. Based on this error analysis, an acceleration technique for PSO-type algorithms is proposed with low-discrepancy sampling in the expanded dimensional space. The acceleration technique can generate a low-discrepancy sample set with a smaller dispersion than random sampling in the expanded dimensional space; it also reduces the error at each iteration, and hence improves the convergence speed. The acceleration technique is combined with the standard PSO and the comprehensive learning particle swarm optimization, and the performance of each improved algorithm is compared with that of its original counterpart. The experimental results show that the two improved algorithms have significantly faster convergence speed under the same accuracy requirement.
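As a rough illustration of low-discrepancy sampling and dispersion, the sketch below generates a Halton sequence and estimates dispersion on a probe grid. The Halton sequence is used here only as a familiar example; the abstract does not say which low-discrepancy sequence the study uses, and the expanded-dimensional construction is not reproduced. The function names are hypothetical.

```python
import random


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_points(count, bases=(2, 3)):
    """First `count` Halton points in the unit cube, one prime base per dimension."""
    return [[radical_inverse(i, b) for b in bases] for i in range(1, count + 1)]


def dispersion_estimate(points, grid=20):
    """Approximate 2-D dispersion: the largest sup-norm distance from a grid probe to its nearest sample."""
    worst = 0.0
    for gx in range(grid):
        for gy in range(grid):
            px, py = (gx + 0.5) / grid, (gy + 0.5) / grid
            nearest = min(max(abs(p[0] - px), abs(p[1] - py)) for p in points)
            worst = max(worst, nearest)
    return worst


rng = random.Random(0)
halton = halton_points(64)
uniform = [[rng.random(), rng.random()] for _ in range(64)]
halton_dispersion = dispersion_estimate(halton)
uniform_dispersion = dispersion_estimate(uniform)
```

Comparing `halton_dispersion` with `uniform_dispersion` for a few seeds gives a feel for the quantity the error bound depends on: a smaller value means no region of the search space is far from a sample.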
Edge service caching can effectively reduce the delay or bandwidth overhead for acquiring and initializing applications. To address the single-base station (BS) transmission limitation and serious edge effect in traditional cellular-based edge service caching networks, in this paper, we propose a novel user-centric edge service caching framework where each user is jointly provided with edge caching and wireless transmission services by a specific BS cluster instead of a single BS. To minimize the long-term average delay under the constraint of the caching cost, a mixed integer non-linear programming (MINLP) problem is formulated by jointly optimizing the BS clustering and service caching decisions. To tackle the problem, we propose JO-CDSD, an efficient joint optimization algorithm based on Lyapunov optimization and generalized Benders decomposition (GBD). In particular, the long-term optimization problem can be transformed into a primal problem and a master problem in each time slot, which are much simpler to solve. The near-optimal clustering and caching strategy can be obtained by solving the primal and master problems alternately. Extensive simulations show that the proposed joint optimization algorithm outperforms other algorithms and can effectively reduce the long-term delay by up to 93.75% and the caching cost by up to 53.12%.
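The way Lyapunov optimization turns a long-term cost constraint into per-slot decisions can be sketched with the standard drift-plus-penalty rule and a virtual queue. This is a generic illustration, not JO-CDSD: the candidate (delay, cost) pairs, the budget, and the weight `V` are made-up values, and brute-force enumeration stands in for the per-slot MINLP that the paper solves with GBD.

```python
def drift_plus_penalty(slots, budget, V):
    """Pick one (delay, cost) option per slot using a virtual cost queue.

    `slots` is a list of option lists. In each slot the option minimizing
    V * delay + queue * cost is chosen, then the queue is updated as
    queue = max(queue + cost - budget, 0). All inputs are hypothetical.
    """
    queue = 0.0
    chosen = []
    for options in slots:
        delay, cost = min(options, key=lambda o: V * o[0] + queue * o[1])
        chosen.append((delay, cost))
        # The queue grows whenever the slot's cost exceeds the average budget.
        queue = max(queue + cost - budget, 0.0)
    return chosen, queue


# Two hypothetical caching decisions: low delay but costly, or cheap but slow.
options = [(1.0, 3.0), (4.0, 1.0)]
decisions, backlog = drift_plus_penalty([options] * 5, budget=2.0, V=1.0)
```

In this toy run the rule starts with the low-delay option, and once the queue backlog builds up it alternates with the cheap option, so the average cost settles at the budget. A larger `V` would favor delay more strongly at the price of a larger backlog.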
In the traditional mobile edge computing (MEC) system, the availability of MEC services is greatly limited for users at the cell edge due to serious signal attenuation and inter-cell interference. User-centric MEC (UC-MEC) can be seen as a promising solution to address this issue. In UC-MEC, each user is served by a dedicated access point (AP) cluster enabled with MEC capability instead of a single MEC server. However, this comes at the expense of more energy consumption and greater privacy risks. To achieve efficient and reliable resource utilization with user-centric services, we propose an energy-efficient blockchain-enabled UC-MEC system where blockchain operations and resource optimization are jointly performed. First, we design a resource-aware, reliable, replicated, redundant, and fault-tolerant (R-RAFT) consensus mechanism to implement secure and reliable resource trading. Then, an optimization framework based on the alternating direction method of multipliers (ADMM) is proposed to minimize the total energy consumed by wireless transmission, consensus, and task computing, where AP clustering, computing resource allocation, and bandwidth allocation are jointly considered. Simulation results show the superiority of the proposed UC-MEC system over reference schemes, with up to a 33.96% reduction in the total delay and a 48.77% reduction in the total energy consumption.
Generalization in partially observed Markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL) in real scenarios. A widely used idea is to learn task-relevant representations that encode task-relevant information of common features in POMDPs, i.e., rewards and transition dynamics. As transition dynamics in the latent state space -- which are task-relevant and invariant to visual distractions -- are unknown to the agents, existing methods instead use transition dynamics in the observation space to extract task-relevant information in transition dynamics. However, such transition dynamics in the observation space involve task-irrelevant visual distractions, degrading the generalization performance of VRL methods. To tackle this problem, we propose the reward sequence distribution conditioned on the starting observation and the predefined subsequent action sequence (RSD-OA). The appealing features of RSD-OA are that (1) RSD-OA is invariant to visual distractions, as it is conditioned on the predefined subsequent action sequence without task-irrelevant information from transition dynamics, and (2) the reward sequence captures long-term task-relevant information in both rewards and transition dynamics. Experiments demonstrate that our representation learning approach based on RSD-OA significantly improves the generalization performance on unseen environments, outperforming several state-of-the-art methods on DeepMind Control tasks with visual distractions.