Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Yu

SoundAI Technology Co., Ltd

Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild

Aug 27, 2024

Tianqi Wei, Zhi Chen, Xin Yu

Figure 1 for Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild

Figure 2 for Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild

Figure 3 for Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild

Abstract:Plant disease recognition is a critical task that ensures crop health and mitigates the damage caused by diseases. A handy tool that enables farmers to receive a diagnosis based on query pictures or the text description of suspicious plants is in high demand for initiating treatment before potential diseases spread further. In this paper, we develop a multimodal plant disease image retrieval system to support disease search based on either image or text prompts. Specifically, we utilize the largest in-the-wild plant disease dataset PlantWild, which includes over 18,000 images across 89 categories, to provide a comprehensive view of potential diseases relating to the query. Furthermore, cross-modal retrieval is achieved in the developed system, facilitated by a novel CLIP-based vision-language model that encodes both disease descriptions and disease images into the same latent space. Built on top of the retriever, our retrieval system allows users to upload either plant disease images or disease descriptions to retrieve the corresponding images with similar characteristics from the disease dataset to suggest candidate diseases for end users' consideration.

Via

Access Paper or Ask Questions

Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline

Aug 06, 2024

Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu

Abstract:Existing plant disease classification models have achieved remarkable performance in recognizing in-laboratory diseased images. However, their performance often significantly degrades in classifying in-the-wild images. Furthermore, we observed that in-the-wild plant images may exhibit similar appearances across various diseases (i.e., small inter-class discrepancy) while the same diseases may look quite different (i.e., large intra-class variance). Motivated by this observation, we propose an in-the-wild multimodal plant disease recognition dataset that contains the largest number of disease classes but also text-based descriptions for each disease. Particularly, the newly provided text descriptions are introduced to provide rich information in textual modality and facilitate in-the-wild disease classification with small inter-class discrepancy and large intra-class variance issues. Therefore, our proposed dataset can be regarded as an ideal testbed for evaluating disease recognition methods in the real world. In addition, we further present a strong yet versatile baseline that models text descriptions and visual data through multiple prototypes for a given class. By fusing the contributions of multimodal prototypes in classification, our baseline can effectively address the small inter-class discrepancy and large intra-class variance issues. Remarkably, our baseline model can not only classify diseases but also recognize diseases in few-shot or training-free scenarios. Extensive benchmarking results demonstrate that our proposed in-the-wild multimodal dataset sets many new challenges to the plant disease recognition task and there is a large space to improve for future works.

Via

Access Paper or Ask Questions

Affective Behaviour Analysis via Progressive Learning

Jul 26, 2024

Chen Liu, Wei Zhang, Feng Qiu, Lincheng Li, Xin Yu

Figure 1 for Affective Behaviour Analysis via Progressive Learning

Figure 2 for Affective Behaviour Analysis via Progressive Learning

Figure 3 for Affective Behaviour Analysis via Progressive Learning

Figure 4 for Affective Behaviour Analysis via Progressive Learning

Abstract:Affective Behavior Analysis aims to develop emotionally intelligent technology that can recognize and respond to human emotions. To advance this, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition establishes two tracks: i.e., the Multi-task Learning (MTL) Challenge and the Compound Expression (CE) challenge based on Aff-Wild2 and C-EXPR-DB datasets. In this paper, we present our methods and experimental results for the two competition tracks. Specifically, it can be summarized in the following four aspects: 1) To attain high-quality facial features, we train a Masked-Auto Encoder in a self-supervised manner. 2) We devise a temporal convergence module to capture the temporal information between video frames and explore the impact of window size and sequence length on each sub-task. 3) To facilitate the joint optimization of various sub-tasks, we explore the impact of sub-task joint training and feature fusion from individual tasks on each task performance improvement. 4) We utilize curriculum learning to transition the model from recognizing single expressions to recognizing compound expressions, thereby improving the accuracy of compound expression recognition. Extensive experiments demonstrate the superiority of our designs.

* Techical Report for 7th ABAW Competition

Via

Access Paper or Ask Questions

DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction

Jul 24, 2024

Xiaobiao Du, Haiyang Sun, Ming Lu, Tianqing Zhu, Xin Yu

Figure 1 for DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction

Figure 2 for DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction

Figure 3 for DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction

Figure 4 for DreamCar: Leveraging Car-specific Prior for in-the-wild 3D Car Reconstruction

Abstract:Self-driving industries usually employ professional artists to build exquisite 3D cars. However, it is expensive to craft large-scale digital assets. Since there are already numerous datasets available that contain a vast number of images of cars, we focus on reconstructing high-quality 3D car models from these datasets. However, these datasets only contain one side of cars in the forward-moving scene. We try to use the existing generative models to provide more supervision information, but they struggle to generalize well in cars since they are trained on synthetic datasets not car-specific. In addition, The reconstructed 3D car texture misaligns due to a large error in camera pose estimation when dealing with in-the-wild images. These restrictions make it challenging for previous methods to reconstruct complete 3D cars. To address these problems, we propose a novel method, named DreamCar, which can reconstruct high-quality 3D cars given a few images even a single image. To generalize the generative model, we collect a car dataset, named Car360, with over 5,600 vehicles. With this dataset, we make the generative model more robust to cars. We use this generative prior specific to the car to guide its reconstruction via Score Distillation Sampling. To further complement the supervision information, we utilize the geometric and appearance symmetry of cars. Finally, we propose a pose optimization method that rectifies poses to tackle texture misalignment. Extensive experiments demonstrate that our method significantly outperforms existing methods in reconstructing high-quality 3D cars. \href{https://xiaobiaodu.github.io/dreamcar-project/}{Our code is available.}

* Projet Page: https://xiaobiaodu.github.io/dreamcar-project/

Via

Access Paper or Ask Questions

Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

Jul 11, 2024

Pu Feng, Junkang Liang, Size Wang, Xin Yu, Rongye Shi, Wenjun Wu

Figure 1 for Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

Figure 2 for Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

Figure 3 for Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

Figure 4 for Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks

Abstract:In multi-agent reinforcement learning (MARL), the Centralized Training with Decentralized Execution (CTDE) framework is pivotal but struggles due to a gap: global state guidance in training versus reliance on local observations in execution, lacking global signals. Inspired by human societal consensus mechanisms, we introduce the Hierarchical Consensus-based Multi-Agent Reinforcement Learning (HC-MARL) framework to address this limitation. HC-MARL employs contrastive learning to foster a global consensus among agents, enabling cooperative behavior without direct communication. This approach enables agents to form a global consensus from local observations, using it as an additional piece of information to guide collaborative actions during execution. To cater to the dynamic requirements of various tasks, consensus is divided into multiple layers, encompassing both short-term and long-term considerations. Short-term observations prompt the creation of an immediate, low-layer consensus, while long-term observations contribute to the formation of a strategic, high-layer consensus. This process is further refined through an adaptive attention mechanism that dynamically adjusts the influence of each consensus layer. This mechanism optimizes the balance between immediate reactions and strategic planning, tailoring it to the specific demands of the task at hand. Extensive experiments and real-world applications in multi-robot systems showcase our framework's superior performance, marking significant advancements over baselines.

* 8 pages, 10 figures. Accepted for presentation at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

Via

Access Paper or Ask Questions

DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Jun 21, 2024

Jia Syuen Lim, Zhuoxiao Chen, Mahsa Baktashmotlagh, Zhi Chen, Xin Yu, Zi Huang, Yadan Luo

Figure 1 for DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Figure 2 for DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Figure 3 for DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Figure 4 for DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

Abstract:Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we investigate using vision-language models (VLMs) to enhance object detection via a self-supervised prompt learning strategy. Our initial findings indicate that manually crafted text queries often result in undetected objects, primarily because detection confidence diminishes when the query words exhibit semantic overlap. To address this, we propose a Dispersing Prompt Expansion (DiPEx) approach. DiPEx progressively learns to expand a set of distinct, non-overlapping hyperspherical prompts to enhance recall rates, thereby improving performance in downstream tasks such as out-of-distribution OD. Specifically, DiPEx initiates the process by self-training generic parent prompts and selecting the one with the highest semantic uncertainty for further expansion. The resulting child prompts are expected to inherit semantics from their parent prompts while capturing more fine-grained semantics. We apply dispersion losses to ensure high inter-class discrepancy among child prompts while preserving semantic consistency between parent-child prompt pairs. To prevent excessive growth of the prompt sets, we utilize the maximum angular coverage (MAC) of the semantic space as a criterion for early termination. We demonstrate the effectiveness of DiPEx through extensive class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, surpassing other prompting methods by up to 20.1% in AR and achieving a 21.3% AP improvement over SAM. The code is available at https://github.com/jason-lim26/DiPEx.

* 19 pages

Via

Access Paper or Ask Questions

Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

Jun 18, 2024

Xin Yu, Qi Yang, Han Liu, Ho Hin Lee, Yucheng Tang, Lucas W. Remedios, Michael Kim, Shunxing Bao, Ann Xenobia Moore, Luigi Ferrucci(+1 more)

Figure 1 for Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

Figure 2 for Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

Figure 3 for Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

Figure 4 for Enhancing Single-Slice Segmentation with 3D-to-2D Unpaired Scan Distillation

Abstract:2D single-slice abdominal computed tomography (CT) enables the assessment of body habitus and organ health with low radiation exposure. However, single-slice data necessitates the use of 2D networks for segmentation, but these networks often struggle to capture contextual information effectively. Consequently, even when trained on identical datasets, 3D networks typically achieve superior segmentation results. In this work, we propose a novel 3D-to-2D distillation framework, leveraging pre-trained 3D models to enhance 2D single-slice segmentation. Specifically, we extract the prediction distribution centroid from the 3D representations, to guide the 2D student by learning intra- and inter-class correlation. Unlike traditional knowledge distillation methods that require the same data input, our approach employs unpaired 3D CT scans with any contrast to guide the 2D student model. Experiments conducted on 707 subjects from the single-slice Baltimore Longitudinal Study of Aging (BLSA) dataset demonstrate that state-of-the-art 2D multi-organ segmentation methods can benefit from the 3D teacher model, achieving enhanced performance in single-slice multi-organ segmentation. Notably, our approach demonstrates considerable efficacy in low-data regimes, outperforming the model trained with all available training subjects even when utilizing only 200 training subjects. Thus, this work underscores the potential to alleviate manual annotation burdens.

Via

Access Paper or Ask Questions

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Jun 07, 2024

Xiaobiao Du, Haiyang Sun, Shuyun Wang, Zhuojie Wu, Hongwei Sheng, Jiaying Ying, Ming Lu, Tianqing Zhu, Kun Zhan, Xin Yu

Figure 1 for 3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Figure 2 for 3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Figure 3 for 3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Figure 4 for 3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Abstract:3D cars are commonly used in self-driving systems, virtual/augmented reality, and games. However, existing 3D car datasets are either synthetic or low-quality, presenting a significant gap toward the high-quality real-world 3D car datasets and limiting their applications in practical scenarios. In this paper, we propose the first large-scale 3D real car dataset, termed 3DRealCar, offering three distinctive features. (1) \textbf{High-Volume}: 2,500 cars are meticulously scanned by 3D scanners, obtaining car images and point clouds with real-world dimensions; (2) \textbf{High-Quality}: Each car is captured in an average of 200 dense, high-resolution 360-degree RGB-D views, enabling high-fidelity 3D reconstruction; (3) \textbf{High-Diversity}: The dataset contains various cars from over 100 brands, collected under three distinct lighting conditions, including reflective, standard, and dark. Additionally, we offer detailed car parsing maps for each instance to promote research in car parsing tasks. Moreover, we remove background point clouds and standardize the car orientation to a unified axis for the reconstruction only on cars without background and controllable rendering. We benchmark 3D reconstruction results with state-of-the-art methods across each lighting condition in 3DRealCar. Extensive experiments demonstrate that the standard lighting condition part of 3DRealCar can be used to produce a large number of high-quality 3D cars, improving various 2D and 3D tasks related to cars. Notably, our dataset brings insight into the fact that recent 3D reconstruction methods face challenges in reconstructing high-quality 3D cars under reflective and dark lighting conditions. \textcolor{red}{\href{https://xiaobiaodu.github.io/3drealcar/}{Our dataset is available here.}}

* Project Page: https://xiaobiaodu.github.io/3drealcar

Via

Access Paper or Ask Questions

QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input

May 22, 2024

Senmao Tian, Haoyu Gao, Gangyi Hong, Shuyun Wang, JingJie Wang, Xin Yu, Shunli Zhang

Abstract:Existing deep learning methods have made significant progress in gait recognition. Typically, appearance-based models binarize inputs into silhouette sequences. However, mainstream quantization methods prioritize minimizing task loss over quantization error, which is detrimental to gait recognition with binarized inputs. Minor variations in silhouette sequences can be diminished in the network's intermediate layers due to the accumulation of quantization errors. To address this, we propose a differentiable soft quantizer, which better simulates the gradient of the round function during backpropagation. This enables the network to learn from subtle input perturbations. However, our theoretical analysis and empirical studies reveal that directly applying the soft quantizer can hinder network convergence. We further refine the training strategy to ensure convergence while simulating quantization errors. Additionally, we visualize the distribution of outputs from different samples in the feature space and observe significant changes compared to the full precision network, which harms performance. Based on this, we propose an Inter-class Distance-guided Distillation (IDD) strategy to preserve the relative distance between the embeddings of samples with different labels. Extensive experiments validate the effectiveness of our approach, demonstrating state-of-the-art accuracy across various settings and datasets. The code will be made publicly available.

Via

Access Paper or Ask Questions

Machine Unlearning via Null Space Calibration

Apr 21, 2024

Huiqiang Chen, Tianqing Zhu, Xin Yu, Wanlei Zhou

Figure 1 for Machine Unlearning via Null Space Calibration

Figure 2 for Machine Unlearning via Null Space Calibration

Figure 3 for Machine Unlearning via Null Space Calibration

Figure 4 for Machine Unlearning via Null Space Calibration

Abstract:Machine unlearning aims to enable models to forget specific data instances when receiving deletion requests. Current research centres on efficient unlearning to erase the influence of data from the model and neglects the subsequent impacts on the remaining data. Consequently, existing unlearning algorithms degrade the model's performance after unlearning, known as \textit{over-unlearning}. This paper addresses this critical yet under-explored issue by introducing machine \underline{U}nlearning via \underline{N}ull \underline{S}pace \underline{C}alibration (UNSC), which can accurately unlearn target samples without over-unlearning. On the contrary, by calibrating the decision space during unlearning, UNSC can significantly improve the model's performance on the remaining samples. In particular, our approach hinges on confining the unlearning process to a specified null space tailored to the remaining samples, which is augmented by strategically pseudo-labeling the unlearning samples. Comparative analyses against several established baselines affirm the superiority of our approach. Code is released at this \href{https://github.com/HQC-ML/Machine-Unlearning-via-Null-Space-Calibration}{URL}.

* Accepted by IJCAI-2024

Via

Access Paper or Ask Questions