Abstract:Language-Guided object recognition in remote sensing imagery is crucial for large-scale mapping and automated data annotation. However, existing open-vocabulary and visual grounding methods rely on explicit category cues, limiting their ability to handle complex or implicit queries that require advanced reasoning. To address this issue, we introduce a new suite of tasks, including Instruction-Oriented Object Counting, Detection, and Segmentation (InstructCDS), covering open-vocabulary, open-ended, and open-subclass scenarios. We further present EarthInstruct, the first InstructCDS benchmark for earth observation. It is constructed from two diverse remote sensing datasets with varying spatial resolutions and annotation rules across 20 categories, necessitating models to interpret dataset-specific instructions. Given the scarcity of semantically rich labeled data in remote sensing, we propose InstructSAM, a training-free framework for instruction-driven object recognition. InstructSAM leverages large vision-language models to interpret user instructions and estimate object counts, employs SAM2 for mask proposal, and formulates mask-label assignment as a binary integer programming problem. By integrating semantic similarity with counting constraints, InstructSAM efficiently assigns categories to predicted masks without relying on confidence thresholds. Experiments demonstrate that InstructSAM matches or surpasses specialized baselines across multiple tasks while maintaining near-constant inference time regardless of object count, reducing output tokens by 89% and overall runtime by over 32% compared to direct generation approaches. We believe the contributions of the proposed tasks, benchmark, and effective approach will advance future research in developing versatile object recognition systems.
Abstract:Very high-resolution (VHR) satellite imagery has emerged as a powerful tool for monitoring marine animals on a large scale. However, existing deep learning-based whale detection methods usually require manually created, high-quality bounding box annotations, which are labor-intensive to produce. Moreover, existing studies often exclude ``uncertain whales'', individuals that have ambiguous appearances in satellite imagery, limiting the applicability of these models in real-world scenarios. To address these limitations, this study introduces an automated pipeline for detecting beluga whales and harp seals in VHR satellite imagery. The pipeline leverages point annotations and the Segment Anything Model (SAM) to generate precise bounding box annotations, which are used to train YOLOv8 for multiclass detection of certain whales, uncertain whales, and harp seals. Experimental results demonstrated that SAM-generated annotations significantly improved detection performance, achieving higher $\text{F}_\text{1}$-scores compared to traditional buffer-based annotations. YOLOv8 trained on SAM-labeled boxes achieved an overall $\text{F}_\text{1}$-score of 72.2% for whales overall and 70.3% for harp seals, with superior performance in dense scenes. The proposed approach not only reduces the manual effort required for annotation but also enhances the detection of uncertain whales, offering a more comprehensive solution for marine animal monitoring. This method holds great potential for extending to other species, habitats, and remote sensing platforms, as well as for estimating whale biometrics, thereby advancing ecological monitoring and conservation efforts. The codes for our label and detection pipeline are publicly available at http://github.com/voyagerxvoyagerx/beluga-seeker .
Abstract:In this paper, we propose a Boosting Tail Neural Network (BTNN) for improving the performance of Realtime Custom Keyword Spotting (RCKS) that is still an industrial challenge for demanding powerful classification ability with limited computation resources. Inspired by Brain Science that a brain is only partly activated for a nerve simulation and numerous machine learning algorithms are developed to use a batch of weak classifiers to resolve arduous problems, which are often proved to be effective. We show that this method is helpful to the RCKS problem. The proposed approach achieve better performances in terms of wakeup rate and false alarm. In our experiments compared with those traditional algorithms that use only one strong classifier, it gets 18\% relative improvement. We also point out that this approach may be promising in future ASR exploration.
Abstract:Real noisy-clean pairs on a large scale are costly and difficult to obtain. Meanwhile, supervised denoisers trained on synthetic data perform poorly in practice. Self-supervised denoisers, which learn only from single noisy images, solve the data collection problem. However, self-supervised denoising methods, especially blindspot-driven ones, suffer sizable information loss during input or network design. The absence of valuable information dramatically reduces the upper bound of denoising performance. In this paper, we propose a simple yet efficient approach called Blind2Unblind to overcome the information loss in blindspot-driven denoising methods. First, we introduce a global-aware mask mapper that enables global perception and accelerates training. The mask mapper samples all pixels at blind spots on denoised volumes and maps them to the same channel, allowing the loss function to optimize all blind spots at once. Second, we propose a re-visible loss to train the denoising network and make blind spots visible. The denoiser can learn directly from raw noise images without losing information or being trapped in identity mapping. We also theoretically analyze the convergence of the re-visible loss. Extensive experiments on synthetic and real-world datasets demonstrate the superior performance of our approach compared to previous work. Code is available at https://github.com/demonsjin/Blind2Unblind.
Abstract:The community integrated energy system (CIES) is an essential energy internet carrier that has recently been the focus of much attention. A scheduling model based on chance-constrained programming is proposed for integrated demand response (IDR)-enabled CIES in uncertain environments to minimize the system operating costs, where an IDR program is used to explore the potential interaction ability of electricity-gas-heat flexible loads and electric vehicles. Moreover, power to gas (P2G) and micro-gas turbine (MT), as links of multi-energy carriers, are adopted to strengthen the coupling of different energy subsystems. Sequence operation theory (SOT) and linearization methods are employed to transform the original model into a solvable mixed-integer linear programming model. Simulation results on a practical CIES in North China demonstrate an improvement in the CIES operational economy via the coordination of IDR and renewable uncertainties, with P2G and MT enhancing the system operational flexibility and user comprehensive satisfaction. The CIES operation is able to achieve a trade-off between economy and system reliability by setting a suitable confidence level for the spinning reserve constraints. Besides, the proposed solution method outperforms the Hybrid Intelligent Algorithm in terms of both optimization results and calculation efficiency.
Abstract:A community integrated energy system (CIES) with an electric vehicle charging station (EVCS) provides a new way for tackling growing concerns of energy efficiency and environmental pollution, it is a critical task to coordinate flexible demand response and multiple renewable uncertainties. To this end, a novel bi-level optimal dispatching model for the CIES with an EVCS in multi-stakeholder scenarios is established in this paper. In this model, an integrated demand response program is designed to promote a balance between energy supply and demand while maintaining a user comprehensive satisfaction within an acceptable range. To further tap the potential of demand response through flexibly guiding users' energy consumption and electric vehicles' behaviors (charging, discharging and providing spinning reserves), a dynamic pricing mechanism combining time-of-use and real-time pricing is put forward. In the solution phase, by using sequence operation theory (SOT), the original chance-constrained programming (CCP) model is converted into a readily solvable mixed-integer linear programming (MILP) formulation and finally solved by CPLEX solver. The simulation results on a practical CIES located in North China demonstrate that the presented method manages to balance the interests between CIES and EVCS via the coordination of flexible demand response and uncertain renewables.
Abstract:In order to balance the interests of integrated energy operator (IEO) and users, a novel Stackelberg game-based optimization framework is proposed for the optimal scheduling of integrated demand response (IDR)-enabled integrated energy systems with uncertain renewable generations, where the IEO acts as the leader who pursues the maximization of his profits by setting energy prices, while the users are the follower who adjusts energy consumption plans to minimize their energy costs. Taking into account the inherent uncertainty of renewable generations, the probabilistic spinning reserve is written in the form of a chance constraint; in addition, a district heating network model is built considering the characteristics of time delay and thermal attenuation by fully exploiting its potential, and the flexible thermal comfort requirements of users in IDR are considered by introducing a predicted mean vote (PMV) index. To solve the raised model, sequence operation theory is introduced to convert the chance constraint into its deterministic equivalent form, and thereby, the leader-follower Stackelberg game is tackled into a mixed-integer quadratic programming formulation through Karush-Kuhn-Tucker optimality conditions and is finally solved by the CPLEX optimizer. The results of two case studies demonstrate that the proposed Stackelberg game-based approach manages to achieve the Stackelberg equilibrium between IEO and users by the coordination of renewable generations and IDR. Furthermore, the study on a real integrated energy system in China verifies the applicability of the proposed approach for real-world applications.
Abstract:Many complex processes can be viewed as dynamical systems of interacting agents. In many cases, only the state sequences of individual agents are observed, while the interacting relations and the dynamical rules are unknown. The neural relational inference (NRI) model adopts graph neural networks that pass messages over a latent graph to jointly learn the relations and the dynamics based on the observed data. However, NRI infers the relations independently and suffers from error accumulation in multi-step prediction at dynamics learning procedure. Besides, relation reconstruction without prior knowledge becomes more difficult in more complex systems. This paper introduces efficient message passing mechanisms to the graph neural networks with structural prior knowledge to address these problems. A relation interaction mechanism is proposed to capture the coexistence of all relations, and a spatio-temporal message passing mechanism is proposed to use historical information to alleviate error accumulation. Additionally, the structural prior knowledge, symmetry as a special case, is introduced for better relation prediction in more complex systems. The experimental results on simulated physics systems show that the proposed method outperforms existing state-of-the-art methods.
Abstract:Recently, flow-based methods have achieved promising success in video frame interpolation. However, electron microscopic (EM) images suffer from unstable image quality, low PSNR, and disorderly deformation. Existing flow-based interpolation methods cannot precisely compute optical flow for EM images since only predicting each position's unique offset. To overcome these problems, we propose a novel interpolation framework for EM images that progressively synthesizes interpolated features in a coarse-to-fine manner. First, we extract missing intermediate features by the proposed temporal spatial-adaptive (TSA) interpolation module. The TSA interpolation module aggregates temporal contexts and then adaptively samples the spatial-related features with the proposed residual spatial adaptive block. Second, we introduce a stacked deformable refinement block (SDRB) further enhance the reconstruction quality, which is aware of the matching positions and relevant features from input frames with the feedback mechanism. Experimental results demonstrate the superior performance of our approach compared to previous works, both quantitatively and qualitatively.
Abstract:The continuity of biological tissue between consecutive biomedical images makes it possible for the video interpolation algorithm, to recover large area defects and tears that are common in biomedical images. However, noise and blur differences, large deformation, and drift between biomedical images, make the task challenging. To address the problem, this paper introduces a deformation-aware network to synthesize each pixel in accordance with the continuity of biological tissue. First, we develop a deformation-aware layer for consecutive biomedical images interpolation that implicitly adopting global perceptual deformation. Second, we present an adaptive style-balance loss to take the style differences of consecutive biomedical images such as blur and noise into consideration. Guided by the deformation-aware module, we synthesize each pixel from a global domain adaptively which further improves the performance of pixel synthesis. Quantitative and qualitative experiments on the benchmark dataset show that the proposed method is superior to the state-of-the-art approaches.