Zhen Bai

Salient Object Detection in Optical Remote Sensing Images Driven by Transformer

Sep 15, 2023
Gongyang Li, Zhen Bai, Zhi Liu, Xinpeng Zhang, Haibin Ling

Existing methods for Salient Object Detection in Optical Remote Sensing Images (ORSI-SOD) mainly adopt Convolutional Neural Networks (CNNs) as the backbone, such as VGG and ResNet. Since CNNs can only extract features within certain receptive fields, most ORSI-SOD methods follow the local-to-contextual paradigm. In this paper, we propose a novel Global Extraction Local Exploration Network (GeleNet) for ORSI-SOD following the global-to-local paradigm. Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies. Then, GeleNet employs a Direction-aware Shuffle Weighted Spatial Attention Module (D-SWSAM) and its simplified version (SWSAM) to enhance local interactions, and a Knowledge Transfer Module (KTM) to further enhance cross-level contextual interactions. D-SWSAM comprehensively perceives the orientation information in the lowest-level features through directional convolutions to adapt to various orientations of salient objects in ORSIs, and effectively enhances the details of salient objects with an improved attention mechanism. SWSAM discards the direction-aware part of D-SWSAM to focus on localizing salient objects in the highest-level features. KTM models the contextual correlation knowledge of two middle-level features of different scales based on the self-attention mechanism, and transfers the knowledge to the raw features to generate more discriminative features. Finally, a saliency predictor is used to generate the saliency map based on the outputs of the above three modules. Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods. The code and results of our method are available at https://github.com/MathLee/GeleNet.
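The authors' code is linked above; purely for intuition, the sketch below shows the kind of shuffle-then-weight spatial attention the abstract describes for SWSAM. The group count, the fixed uniform group weights (standing in for learned ones), and the mean+max channel pooling are illustrative assumptions, not GeleNet's exact module.

```python
import numpy as np

def channel_shuffle(x, groups):
    # x: (C, H, W); interleave channels across groups, ShuffleNet-style.
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def swsam_sketch(x, groups=4, weights=None):
    """Illustrative shuffle-weighted spatial attention.

    Shuffles channels, splits them into groups, derives one spatial map per
    group from channel-wise mean and max responses, fuses the maps with
    per-group weights, and rescales the input features with the result.
    """
    c, h, w = x.shape
    chunks = channel_shuffle(x, groups).reshape(groups, c // groups, h, w)
    if weights is None:
        weights = np.full(groups, 1.0 / groups)  # stand-in for learned weights
    att = np.zeros((h, w))
    for g in range(groups):
        att += weights[g] * (chunks[g].mean(axis=0) + chunks[g].max(axis=0))
    att = 1.0 / (1.0 + np.exp(-att))  # sigmoid gate in (0, 1)
    return x * att                    # broadcast the map over all channels

x = np.random.rand(8, 16, 16)        # a hypothetical (C, H, W) feature map
y = swsam_sketch(x)
print(y.shape)  # (8, 16, 16)
```

D-SWSAM would additionally pass the lowest-level features through directional convolutions before this attention step; that part is omitted here.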

* 13 pages, 6 figures, Accepted by IEEE Transactions on Image Processing 2023 

Participatory Design of AI with Children: Reflections on IDC Design Challenge

Apr 18, 2023
Zhen Bai, Frances Judd, Naomi Polinsky, Elmira Yadollahi

Children growing up in the era of Artificial Intelligence (AI) will be the most impacted by the technology across their life spans. Participatory Design (PD) is widely adopted by the Interaction Design and Children (IDC) community, as it empowers children to bring their interests, needs, and creativity to the design process of future technologies. While PD has drawn increasing attention in human-centered AI design, it remains largely untapped in facilitating the design process of AI technologies relevant to children and their communities. In this paper, we report on children's intriguing design ideas for AI technologies that emerged from the "Research and Design Challenge" of the 22nd ACM Interaction Design and Children (IDC 2023) conference. The diversity of design problems, AI applications, and capabilities revealed by the children's design ideas sheds light on the potential of engaging children in PD activities for future AI technologies. We discuss opportunities and challenges for accessible and inclusive PD experiences with children in shaping the future of an AI-powered society.

A Novel Multimodal Approach for Studying the Dynamics of Curiosity in Small Group Learning

Apr 01, 2022
Tanmay Sinha, Zhen Bai, Justine Cassell

Curiosity is a vital metacognitive skill in educational contexts, leading to creativity and a love of learning. While many school systems increasingly undercut curiosity by teaching to the test, teachers are increasingly interested in how to evoke curiosity in their students to prepare them for a world in which lifelong learning and reskilling will be more and more important. One aspect of curiosity that has received little attention, however, is the role of peers in eliciting curiosity. We present what we believe to be the first theoretical framework that articulates an integrated socio-cognitive account of curiosity, tying observable behaviors in peers to underlying curiosity states. We make a bipartite distinction between individual and interpersonal functions that contribute to curiosity, and multimodal behaviors that fulfill these functions. We validate the proposed framework by leveraging a longitudinal latent variable modeling approach. Findings confirm a positive predictive relationship between the latent variables of individual and interpersonal functions and curiosity, with the interpersonal functions exercising a comparatively stronger influence. Prominent behavioral realizations of these functions are also discovered in a data-driven manner. We instantiate the proposed theoretical framework in a set of strategies and tactics that can be incorporated into learning technologies to indicate, evoke, and scaffold curiosity. This work is a step towards designing learning technologies that can recognize and evoke moment-by-moment curiosity during learning in social contexts, and towards a more complete multimodal learning analytics. The underlying rationale is applicable more generally to developing computer support for other metacognitive and socio-emotional skills.

* arXiv admin note: text overlap with arXiv:1704.07480 

Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation

Jan 20, 2022
Gongyang Li, Zhi Liu, Zhen Bai, Weisi Lin, Haibin Ling

Salient object detection in optical remote sensing images (ORSI-SOD) has been widely explored for understanding ORSIs. However, previous methods focus mainly on improving detection accuracy while neglecting the cost in memory and computation, which may hinder their real-world applications. In this paper, we propose a novel lightweight ORSI-SOD solution, named CorrNet, to address these issues. In CorrNet, we first lighten the backbone (VGG-16) and build a lightweight subnet for feature extraction. Then, following the coarse-to-fine strategy, we generate an initial coarse saliency map from high-level semantic features in a Correlation Module (CorrM). The coarse saliency map serves as the location guidance for low-level features. In CorrM, we mine the object location information between high-level semantic features through the cross-layer correlation operation. Finally, based on low-level detailed features, we refine the coarse saliency map in the refinement subnet equipped with Dense Lightweight Refinement Blocks, and produce the final fine saliency map. By reducing the parameters and computations of each component, CorrNet contains only 4.09M parameters and runs at 21.09G FLOPs. Experimental results on two public datasets demonstrate that our lightweight CorrNet achieves competitive or even better performance compared with 26 state-of-the-art methods (including 16 large CNN-based methods and 2 lightweight methods), while enjoying clear memory and run-time efficiency. The code and results of our method are available at https://github.com/MathLee/CorrNet.
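The abstract does not spell out the cross-layer correlation operation; as one plausible reading, the sketch below correlates two same-sized high-level feature maps channel-wise at each spatial position to obtain a coarse location map. The function name, feature shapes, and the min-max normalization are illustrative assumptions, not CorrNet's actual CorrM.

```python
import numpy as np

def cross_layer_correlation(f1, f2, eps=1e-8):
    """Correlate two (C, H, W) feature maps position by position.

    Channel-wise cosine similarity is high where the two feature levels
    respond consistently, suggesting candidate salient locations.
    """
    num = (f1 * f2).sum(axis=0)
    den = np.linalg.norm(f1, axis=0) * np.linalg.norm(f2, axis=0) + eps
    corr = num / den
    # Min-max normalize to [0, 1] so the result can act as a coarse saliency map.
    return (corr - corr.min()) / (corr.max() - corr.min() + eps)

# Two hypothetical high-level feature maps from the lightweight backbone.
f_high1 = np.random.rand(32, 8, 8)
f_high2 = np.random.rand(32, 8, 8)
coarse_map = cross_layer_correlation(f_high1, f_high2)
print(coarse_map.shape)  # (8, 8)
```

In the paper's pipeline, a map like this would then guide the refinement of low-level detailed features into the final fine saliency map.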

* 11 pages, 6 figures, Accepted by IEEE Transactions on Geoscience and Remote Sensing 2022 