Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yang Zhou

Fast algorithms for k-submodular maximization subject to a matroid constraint

Jul 26, 2023
Shuxian Niu, Qian Liu, Yang Zhou, Min Li

In this paper, we apply a Threshold-Decreasing Algorithm to maximize $k$-submodular functions under a matroid constraint, which reduces the query complexity of the algorithm compared to the greedy algorithm with little loss in approximation ratio. We give a $(\frac{1}{2} - \epsilon)$-approximation algorithm for monotone $k$-submodular function maximization, and a $(\frac{1}{3} - \epsilon)$-approximation algorithm for non-monotone case, with complexity $O(\frac{n(k\cdot EO + IO)}{\epsilon} \log \frac{r}{\epsilon})$, where $r$ denotes the rank of the matroid, and $IO, EO$ denote the number of oracles to evaluate whether a subset is an independent set and to compute the function value of $f$, respectively. Since the constraint of total size can be looked as a special matroid, called uniform matroid, then we present the fast algorithm for maximizing $k$-submodular functions subject to a total size constraint as corollaries. corollaries.

Via

Access Paper or Ask Questions

Learning Navigational Visual Representations with Semantic Map Supervision

Jul 23, 2023
Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan

Figure 1 for Learning Navigational Visual Representations with Semantic Map Supervision

Figure 2 for Learning Navigational Visual Representations with Semantic Map Supervision

Figure 3 for Learning Navigational Visual Representations with Semantic Map Supervision

Figure 4 for Learning Navigational Visual Representations with Semantic Map Supervision

Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either with independent images for classification or with self-supervised learning methods to adapt to the indoor navigation domain, neglecting the spatial relationships that are essential to the learning of navigation. Inspired by the behavior that humans naturally build semantically and spatially meaningful cognitive maps in their brains during navigation, in this paper, we propose a novel navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps (Ego$^2$-Map). We apply the visual transformer as the backbone encoder and train the model with data collected from the large-scale Habitat-Matterport3D environments. Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation. Experiments show that agents using our learned representations on object-goal navigation outperform recent visual pre-training methods. Moreover, our representations significantly improve vision-and-language navigation in continuous environments for both high-level and low-level action spaces, achieving new state-of-the-art results of 47% SR and 41% SPL on the test server.

Via

Access Paper or Ask Questions

DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

Jul 11, 2023
Zhiwen Yang, Yang Zhou, Hui Zhang, Bingzheng Wei, Yubo Fan, Yan Xu

Figure 1 for DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

Figure 2 for DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

Figure 3 for DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

Figure 4 for DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images from multiple different centers. The generalizability of existing methods can still be suboptimal for a multi-center study due to domain shifts, which result from non-identical data distribution among centers with different imaging systems/protocols. While some approaches address domain shifts by training specialized models for each center, they are parameter inefficient and do not well exploit the shared knowledge across centers. To address this, we develop a generalist model that shares architecture and parameters across centers to utilize the shared knowledge. However, the generalist model can suffer from the center interference issue, \textit{i.e.} the gradient directions of different centers can be inconsistent or even opposite owing to the non-identical data distribution. To mitigate such interference, we introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts. Experiments show that our generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers. Code and data are available at: https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis.

* This article has been early accepted by MICCAI 2023,but has not been fully edited. Content may change prior to final publication

Via

Access Paper or Ask Questions

Efficient Visual Fault Detection for Freight Train Braking System via Heterogeneous Self Distillation in the Wild

Jul 03, 2023
Yang Zhang, Huilin Pan, Yang Zhou, Mingying Li, Guodong Sun

Figure 1 for Efficient Visual Fault Detection for Freight Train Braking System via Heterogeneous Self Distillation in the Wild

Figure 2 for Efficient Visual Fault Detection for Freight Train Braking System via Heterogeneous Self Distillation in the Wild

Figure 3 for Efficient Visual Fault Detection for Freight Train Braking System via Heterogeneous Self Distillation in the Wild

Figure 4 for Efficient Visual Fault Detection for Freight Train Braking System via Heterogeneous Self Distillation in the Wild

Efficient visual fault detection of freight trains is a critical part of ensuring the safe operation of railways under the restricted hardware environment. Although deep learning-based approaches have excelled in object detection, the efficiency of freight train fault detection is still insufficient to apply in real-world engineering. This paper proposes a heterogeneous self-distillation framework to ensure detection accuracy and speed while satisfying low resource requirements. The privileged information in the output feature knowledge can be transferred from the teacher to the student model through distillation to boost performance. We first adopt a lightweight backbone to extract features and generate a new heterogeneous knowledge neck. Such neck models positional information and long-range dependencies among channels through parallel encoding to optimize feature extraction capabilities. Then, we utilize the general distribution to obtain more credible and accurate bounding box estimates. Finally, we employ a novel loss function that makes the network easily concentrate on values near the label to improve learning efficiency. Experiments on four fault datasets reveal that our framework can achieve over 37 frames per second and maintain the highest accuracy in comparison with traditional distillation approaches. Moreover, compared to state-of-the-art methods, our framework demonstrates more competitive performance with lower memory usage and the smallest model size.

* 12 pages, 9 figures

Via

Access Paper or Ask Questions

Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Jun 30, 2023
Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

Figure 1 for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 2 for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 3 for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 4 for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Large-scale visual-language pre-trained models (VLPM) have proven their excellent performance in downstream object detection for natural scenes. However, zero-shot nuclei detection on H\&E images via VLPMs remains underexplored. The large gap between medical images and the web-originated text-image pairs used for pre-training makes it a challenging task. In this paper, we attempt to explore the potential of the object-level VLPM, Grounded Language-Image Pre-training (GLIP) model, for zero-shot nuclei detection. Concretely, an automatic prompts design pipeline is devised based on the association binding trait of VLPM and the image-to-text VLPM BLIP, avoiding empirical manual prompts engineering. We further establish a self-training framework, using the automatically designed prompts to generate the preliminary results as pseudo labels from GLIP and refine the predicted boxes in an iterative manner. Our method achieves a remarkable performance for label-free nuclei detection, surpassing other comparison methods. Foremost, our work demonstrates that the VLPM pre-trained on natural image-text pairs exhibits astonishing potential for downstream tasks in the medical field as well. Code will be released at https://github.com/wuyongjianCODE/VLPMNuD.

* This article has been accepted by MICCAI 2023,but has not been fully edited. Content may change prior to final publication

Via

Access Paper or Ask Questions

On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities

Jun 12, 2023
Masoud Mehrabi Koushki, Ibrahim AbuAlhaol, Anandharaju Durai Raju, Yang Zhou, Ronnie Salvador Giagone, Huang Shengqiang

As the smartphone market leader, Android has been a prominent target for malware attacks. The number of malicious applications (apps) identified for it has increased continually over the past decade, creating an immense challenge for all parties involved. For market holders and researchers, in particular, the large number of samples has made manual malware detection unfeasible, leading to an influx of research that investigate Machine Learning (ML) approaches to automate this process. However, while some of the proposed approaches achieve high performance, rapidly evolving Android malware has made them unable to maintain their accuracy over time. This has created a need in the community to conduct further research, and build more flexible ML pipelines. Doing so, however, is currently hindered by a lack of systematic overview of the existing literature, to learn from and improve upon the existing solutions. Existing survey papers often focus only on parts of the ML process (e.g., data collection or model deployment), while omitting other important stages, such as model evaluation and explanation. In this paper, we address this problem with a review of 42 highly-cited papers, spanning a decade of research (from 2011 to 2021). We introduce a novel procedural taxonomy of the published literature, covering how they have used ML algorithms, what features they have engineered, which dimensionality reduction techniques they have employed, what datasets they have employed for training, and what their evaluation and explanation strategies are. Drawing from this taxonomy, we also identify gaps in knowledge and provide ideas for improvement and future work.

* SpringerOpen Cybersecurity 2022
* file:///C:/Users/ibrahim_abualhaol/Downloads/s42400-022-00119-8.pdf

Via

Access Paper or Ask Questions

A Survey on Cross-Architectural IoT Malware Threat Hunting

Jun 09, 2023
Anandharaju Durai Raju, Ibrahim Abualhaol, Ronnie Salvador Giagone, Yang Zhou, Shengqiang Huang

Figure 1 for A Survey on Cross-Architectural IoT Malware Threat Hunting

Figure 2 for A Survey on Cross-Architectural IoT Malware Threat Hunting

Figure 3 for A Survey on Cross-Architectural IoT Malware Threat Hunting

Figure 4 for A Survey on Cross-Architectural IoT Malware Threat Hunting

In recent years, the increase in non-Windows malware threats had turned the focus of the cybersecurity community. Research works on hunting Windows PE-based malwares are maturing, whereas the developments on Linux malware threat hunting are relatively scarce. With the advent of the Internet of Things (IoT) era, smart devices that are getting integrated into human life have become a hackers highway for their malicious activities. The IoT devices employ various Unix-based architectures that follow ELF (Executable and Linkable Format) as their standard binary file specification. This study aims at providing a comprehensive survey on the latest developments in cross-architectural IoT malware detection and classification approaches. Aided by a modern taxonomy, we discuss the feature representations, feature extraction techniques, and machine learning models employed in the surveyed works. We further provide more insights on the practical challenges involved in cross-architectural IoT malware threat hunting and discuss various avenues to instill potential future research.

* IEEE Access 2021
* https://ieeexplore.ieee.org/abstract/document/9462110

Via

Access Paper or Ask Questions

Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Jun 05, 2023
Yang Zhou, Yongjian Wu, Zihua Wang, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

Figure 1 for Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Figure 2 for Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Figure 3 for Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Figure 4 for Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Nuclei instance segmentation on histopathology images is of great clinical value for disease analysis. Generally, fully-supervised algorithms for this task require pixel-wise manual annotations, which is especially time-consuming and laborious for the high nuclei density. To alleviate the annotation burden, we seek to solve the problem through image-level weakly supervised learning, which is underexplored for nuclei instance segmentation. Compared with most existing methods using other weak annotations (scribble, point, etc.) for nuclei instance segmentation, our method is more labor-saving. The obstacle to using image-level annotations in nuclei instance segmentation is the lack of adequate location information, leading to severe nuclei omission or overlaps. In this paper, we propose a novel image-level weakly supervised method, called cyclic learning, to solve this problem. Cyclic learning comprises a front-end classification task and a back-end semi-supervised instance segmentation task to benefit from multi-task learning (MTL). We utilize a deep learning classifier with interpretability as the front-end to convert image-level labels to sets of high-confidence pseudo masks and establish a semi-supervised architecture as the back-end to conduct nuclei instance segmentation under the supervision of these pseudo masks. Most importantly, cyclic learning is designed to circularly share knowledge between the front-end classifier and the back-end semi-supervised part, which allows the whole system to fully extract the underlying information from image-level labels and converge to a better optimum. Experiments on three datasets demonstrate the good generality of our method, which outperforms other image-level weakly supervised methods for nuclei instance segmentation, and achieves comparable performance to fully-supervised methods.

* This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI https://doi.org/10.1109/TMI.2023.3275609, IEEE Transactions on Medical Imaging. Code: https://github.com/wuyongjianCODE/Cyclic

Via

Access Paper or Ask Questions

Single-View View Synthesis with Self-Rectified Pseudo-Stereo

Apr 20, 2023
Yang Zhou, Hanjie Wu, Wenxi Liu, Zheng Xiong, Jing Qin, Shengfeng He

Figure 1 for Single-View View Synthesis with Self-Rectified Pseudo-Stereo

Figure 2 for Single-View View Synthesis with Self-Rectified Pseudo-Stereo

Figure 3 for Single-View View Synthesis with Self-Rectified Pseudo-Stereo

Figure 4 for Single-View View Synthesis with Self-Rectified Pseudo-Stereo

Synthesizing novel views from a single view image is a highly ill-posed problem. We discover an effective solution to reduce the learning ambiguity by expanding the single-view view synthesis problem to a multi-view setting. Specifically, we leverage the reliable and explicit stereo prior to generate a pseudo-stereo viewpoint, which serves as an auxiliary input to construct the 3D space. In this way, the challenging novel view synthesis process is decoupled into two simpler problems of stereo synthesis and 3D reconstruction. In order to synthesize a structurally correct and detail-preserved stereo image, we propose a self-rectified stereo synthesis to amend erroneous regions in an identify-rectify manner. Hard-to-train and incorrect warping samples are first discovered by two strategies, 1) pruning the network to reveal low-confident predictions; and 2) bidirectionally matching between stereo images to allow the discovery of improper mapping. These regions are then inpainted to form the final pseudo-stereo. With the aid of this extra input, a preferable 3D reconstruction can be easily obtained, and our method can work with arbitrary 3D representations. Extensive experiments show that our method outperforms state-of-the-art single-view view synthesis methods and stereo synthesis methods.

Via

Access Paper or Ask Questions

PENet: A Joint Panoptic Edge Detection Network

Mar 15, 2023
Yang Zhou, Giuseppe Loianno

Figure 1 for PENet: A Joint Panoptic Edge Detection Network

Figure 2 for PENet: A Joint Panoptic Edge Detection Network

Figure 3 for PENet: A Joint Panoptic Edge Detection Network

Figure 4 for PENet: A Joint Panoptic Edge Detection Network

In recent years, compact and efficient scene understanding representations have gained popularity in increasing situational awareness and autonomy of robotic systems. In this work, we illustrate the concept of a panoptic edge segmentation and propose PENet, a novel detection network called that combines semantic edge detection and instance-level perception into a compact panoptic edge representation. This is obtained through a joint network by multi-task learning that concurrently predicts semantic edges, instance centers and offset flow map without bounding box predictions exploiting the cross-task correlations among the tasks. The proposed approach allows extending semantic edge detection to panoptic edge detection which encapsulates both category-aware and instance-aware segmentation. We validate the proposed panoptic edge segmentation method and demonstrate its effectiveness on the real-world Cityscapes dataset.

* 7 pages, 5 figures, submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

Via

Access Paper or Ask Questions