Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yan Xu

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Sep 11, 2023

Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee(+32 more)

Figure 1 for NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Figure 2 for NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Figure 3 for NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Figure 4 for NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Abstract:In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. There was no specific training data provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.

* Tech report, project page https://nice.lgresearch.ai/

Via

Access Paper or Ask Questions

Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides

Aug 14, 2023

Lianfa Li, Roxana Khalili, Frederick Lurmann, Nathan Pavlovic, Jun Wu, Yan Xu, Yisi Liu, Karl O'Sharkey, Beate Ritz, Luke Oman(+5 more)

Figure 1 for Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides

Figure 2 for Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides

Figure 3 for Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides

Figure 4 for Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides

Abstract:Atmospheric nitrogen oxides (NOx) primarily from fuel combustion have recognized acute and chronic health and environmental effects. Machine learning (ML) methods have significantly enhanced our capacity to predict NOx concentrations at ground-level with high spatiotemporal resolution but may suffer from high estimation bias since they lack physical and chemical knowledge about air pollution dynamics. Chemical transport models (CTMs) leverage this knowledge; however, accurate predictions of ground-level concentrations typically necessitate extensive post-calibration. Here, we present a physics-informed deep learning framework that encodes advection-diffusion mechanisms and fluid dynamics constraints to jointly predict NO2 and NOx and reduce ML model bias by 21-42%. Our approach captures fine-scale transport of NO2 and NOx, generates robust spatial extrapolation, and provides explicit uncertainty estimation. The framework fuses knowledge-driven physicochemical principles of CTMs with the predictive power of ML for air quality exposure, health, and policy applications. Our approach offers significant improvements over purely data-driven ML methods and has unprecedented bias reduction in joint NO2 and NOx prediction.

Via

Access Paper or Ask Questions

Partitioned Saliency Ranking with Dense Pyramid Transformers

Aug 01, 2023

Chengxiao Sun, Yan Xu, Jialun Pei, Haopeng Fang, He Tang

Abstract:In recent years, saliency ranking has emerged as a challenging task focusing on assessing the degree of saliency at instance-level. Being subjective, even humans struggle to identify the precise order of all salient instances. Previous approaches undertake the saliency ranking by directly sorting the rank scores of salient instances, which have not explicitly resolved the inherent ambiguities. To overcome this limitation, we propose the ranking by partition paradigm, which segments unordered salient instances into partitions and then ranks them based on the correlations among these partitions. The ranking by partition paradigm alleviates ranking ambiguities in a general sense, as it consistently improves the performance of other saliency ranking models. Additionally, we introduce the Dense Pyramid Transformer (DPT) to enable global cross-scale interactions, which significantly enhances feature interactions with reduced computational burden. Extensive experiments demonstrate that our approach outperforms all existing methods. The code for our method is available at \url{https://github.com/ssecv/PSR}.

Via

Access Paper or Ask Questions

Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Jul 20, 2023

Fan Lu, Yan Xu, Guang Chen, Hongsheng Li, Kwan-Yee Lin, Changjun Jiang

Figure 1 for Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Figure 2 for Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Figure 3 for Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Figure 4 for Urban Radiance Field Representation with Deformable Neural Mesh Primitives

Abstract:Neural Radiance Fields (NeRFs) have achieved great success in the past few years. However, most current methods still require intensive resources due to ray marching-based rendering. To construct urban-level radiance fields efficiently, we design Deformable Neural Mesh Primitive~(DNMP), and propose to parameterize the entire scene with such primitives. The DNMP is a flexible and compact neural variant of classic mesh representation, which enjoys both the efficiency of rasterization-based rendering and the powerful neural representation capability for photo-realistic image synthesis. Specifically, a DNMP consists of a set of connected deformable mesh vertices with paired vertex features to parameterize the geometry and radiance information of a local area. To constrain the degree of freedom for optimization and lower the storage budgets, we enforce the shape of each primitive to be decoded from a relatively low-dimensional latent space. The rendering colors are decoded from the vertex features (interpolated with rasterization) by a view-dependent MLP. The DNMP provides a new paradigm for urban-level scene representation with appealing properties: $(1)$ High-quality rendering. Our method achieves leading performance for novel view synthesis in urban scenarios. $(2)$ Low computational costs. Our representation enables fast rendering (2.07ms/1k pixels) and low peak memory usage (110MB/1k pixels). We also present a lightweight version that can run 33$\times$ faster than vanilla NeRFs, and comparable to the highly-optimized Instant-NGP (0.61 vs 0.71ms/1k pixels). Project page: \href{https://dnmp.github.io/}{https://dnmp.github.io/}.

* Accepted to ICCV2023

Via

Access Paper or Ask Questions

DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

Jul 11, 2023

Zhiwen Yang, Yang Zhou, Hui Zhang, Bingzheng Wei, Yubo Fan, Yan Xu

Abstract:Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images from multiple different centers. The generalizability of existing methods can still be suboptimal for a multi-center study due to domain shifts, which result from non-identical data distribution among centers with different imaging systems/protocols. While some approaches address domain shifts by training specialized models for each center, they are parameter inefficient and do not well exploit the shared knowledge across centers. To address this, we develop a generalist model that shares architecture and parameters across centers to utilize the shared knowledge. However, the generalist model can suffer from the center interference issue, \textit{i.e.} the gradient directions of different centers can be inconsistent or even opposite owing to the non-identical data distribution. To mitigate such interference, we introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts. Experiments show that our generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers. Code and data are available at: https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis.

* This article has been early accepted by MICCAI 2023,but has not been fully edited. Content may change prior to final publication

Via

Access Paper or Ask Questions

Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Jun 30, 2023

Yongjian Wu, Yang Zhou, Jiya Saiyin, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

Figure 1 for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 2 for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 3 for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Figure 4 for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models

Abstract:Large-scale visual-language pre-trained models (VLPM) have proven their excellent performance in downstream object detection for natural scenes. However, zero-shot nuclei detection on H\&E images via VLPMs remains underexplored. The large gap between medical images and the web-originated text-image pairs used for pre-training makes it a challenging task. In this paper, we attempt to explore the potential of the object-level VLPM, Grounded Language-Image Pre-training (GLIP) model, for zero-shot nuclei detection. Concretely, an automatic prompts design pipeline is devised based on the association binding trait of VLPM and the image-to-text VLPM BLIP, avoiding empirical manual prompts engineering. We further establish a self-training framework, using the automatically designed prompts to generate the preliminary results as pseudo labels from GLIP and refine the predicted boxes in an iterative manner. Our method achieves a remarkable performance for label-free nuclei detection, surpassing other comparison methods. Foremost, our work demonstrates that the VLPM pre-trained on natural image-text pairs exhibits astonishing potential for downstream tasks in the medical field as well. Code will be released at https://github.com/wuyongjianCODE/VLPMNuD.

* This article has been accepted by MICCAI 2023,but has not been fully edited. Content may change prior to final publication

Via

Access Paper or Ask Questions

Elastically-Constrained Meta-Learner for Federated Learning

Jun 30, 2023

Peng Lan, Donglai Chen, Chong Xie, Keshu Chen, Jinyuan He, Juntao Zhang, Yonghong Chen, Yan Xu

Figure 1 for Elastically-Constrained Meta-Learner for Federated Learning

Figure 2 for Elastically-Constrained Meta-Learner for Federated Learning

Figure 3 for Elastically-Constrained Meta-Learner for Federated Learning

Figure 4 for Elastically-Constrained Meta-Learner for Federated Learning

Abstract:Federated learning is an approach to collaboratively training machine learning models for multiple parties that prohibit data sharing. One of the challenges in federated learning is non-IID data between clients, as a single model can not fit the data distribution for all clients. Meta-learning, such as Per-FedAvg, is introduced to cope with the challenge. Meta-learning learns shared initial parameters for all clients. Each client employs gradient descent to adapt the initialization to local data distributions quickly to realize model personalization. However, due to non-convex loss function and randomness of sampling update, meta-learning approaches have unstable goals in local adaptation for the same client. This fluctuation in different adaptation directions hinders the convergence in meta-learning. To overcome this challenge, we use the historical local adapted model to restrict the direction of the inner loop and propose an elastic-constrained method. As a result, the current round inner loop keeps historical goals and adapts to better solutions. Experiments show our method boosts meta-learning convergence and improves personalization without additional calculation and communication. Our method achieved SOTA on all metrics in three public datasets.

* FL-IJCAI'23

Via

Access Paper or Ask Questions

A Novel Dual-pooling Attention Module for UAV Vehicle Re-identification

Jun 25, 2023

Xiaoyan Guo, Jie Yang, Xinyu Jia, Chuanyan Zang, Yan Xu, Zhaoyang Chen

Abstract:Vehicle re-identification (Re-ID) involves identifying the same vehicle captured by other cameras, given a vehicle image. It plays a crucial role in the development of safe cities and smart cities. With the rapid growth and implementation of unmanned aerial vehicles (UAVs) technology, vehicle Re-ID in UAV aerial photography scenes has garnered significant attention from researchers. However, due to the high altitude of UAVs, the shooting angle of vehicle images sometimes approximates vertical, resulting in fewer local features for Re-ID. Therefore, this paper proposes a novel dual-pooling attention (DpA) module, which achieves the extraction and enhancement of locally important information about vehicles from both channel and spatial dimensions by constructing two branches of channel-pooling attention (CpA) and spatial-pooling attention (SpA), and employing multiple pooling operations to enhance the attention to fine-grained information of vehicles. Specifically, the CpA module operates between the channels of the feature map and splices features by combining four pooling operations so that vehicle regions containing discriminative information are given greater attention. The SpA module uses the same pooling operations strategy to identify discriminative representations and merge vehicle features in image regions in a weighted manner. The feature information of both dimensions is finally fused and trained jointly using label smoothing cross-entropy loss and hard mining triplet loss, thus solving the problem of missing detail information due to the high height of UAV shots. The proposed method's effectiveness is demonstrated through extensive experiments on the UAV-based vehicle datasets VeRi-UAV and VRU.

Via

Access Paper or Ask Questions

Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Jun 05, 2023

Yang Zhou, Yongjian Wu, Zihua Wang, Bingzheng Wei, Maode Lai, Jianzhong Shou, Yubo Fan, Yan Xu

Figure 1 for Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Figure 2 for Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Figure 3 for Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Figure 4 for Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Abstract:Nuclei instance segmentation on histopathology images is of great clinical value for disease analysis. Generally, fully-supervised algorithms for this task require pixel-wise manual annotations, which is especially time-consuming and laborious for the high nuclei density. To alleviate the annotation burden, we seek to solve the problem through image-level weakly supervised learning, which is underexplored for nuclei instance segmentation. Compared with most existing methods using other weak annotations (scribble, point, etc.) for nuclei instance segmentation, our method is more labor-saving. The obstacle to using image-level annotations in nuclei instance segmentation is the lack of adequate location information, leading to severe nuclei omission or overlaps. In this paper, we propose a novel image-level weakly supervised method, called cyclic learning, to solve this problem. Cyclic learning comprises a front-end classification task and a back-end semi-supervised instance segmentation task to benefit from multi-task learning (MTL). We utilize a deep learning classifier with interpretability as the front-end to convert image-level labels to sets of high-confidence pseudo masks and establish a semi-supervised architecture as the back-end to conduct nuclei instance segmentation under the supervision of these pseudo masks. Most importantly, cyclic learning is designed to circularly share knowledge between the front-end classifier and the back-end semi-supervised part, which allows the whole system to fully extract the underlying information from image-level labels and converge to a better optimum. Experiments on three datasets demonstrate the good generality of our method, which outperforms other image-level weakly supervised methods for nuclei instance segmentation, and achieves comparable performance to fully-supervised methods.

* This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI https://doi.org/10.1109/TMI.2023.3275609, IEEE Transactions on Medical Imaging. Code: https://github.com/wuyongjianCODE/Cyclic

Via

Access Paper or Ask Questions

Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

Jun 01, 2023

Yan Xu, Deqian Kong, Dehong Xu, Ziwei Ji, Bo Pang, Pascale Fung, Ying Nian Wu

Figure 1 for Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

Figure 2 for Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

Figure 3 for Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

Figure 4 for Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

Abstract:The capability to generate responses with diversity and faithfulness using factual knowledge is paramount for creating a human-like, trustworthy dialogue system. Common strategies either adopt a two-step paradigm, which optimizes knowledge selection and response generation separately, and may overlook the inherent correlation between these two tasks, or leverage conditional variational method to jointly optimize knowledge selection and response generation by employing an inference network. In this paper, we present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues by approximately sampling from the posterior distribution. Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution. This straightforward and intuitive inference procedure of SPI directly queries the response generation model, allowing for accurate knowledge selection and generation of faithful responses. In addition to modeling contributions, our experimental results on two common dialogue datasets (Wizard of Wikipedia and Holl-E) demonstrate that SPI outperforms previous strong baselines according to both automatic and human evaluation metrics.

Via

Access Paper or Ask Questions