Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Guangyong Chen

LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery

Feb 26, 2024

Kexin Chen, Yuyang Du, Tao You, Mobarakol Islam, Ziyu Guo, Yueming Jin, Guangyong Chen, Pheng-Ann Heng

Figure 1 for LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery

Figure 2 for LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery

Figure 3 for LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery

Figure 4 for LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery

Abstract:Visual question answering (VQA) can be fundamentally crucial for promoting robotic-assisted surgical education. In practice, the needs of trainees are constantly evolving, such as learning more surgical types, adapting to different robots, and learning new surgical instruments and techniques for one surgery. Therefore, continually updating the VQA system by a sequential data stream from multiple resources is demanded in robotic surgery to address new tasks. In surgical scenarios, the storage cost and patient data privacy often restrict the availability of old data when updating the model, necessitating an exemplar-free continual learning (CL) setup. However, prior studies overlooked two vital problems of the surgical domain: i) large domain shifts from diverse surgical operations collected from multiple departments or clinical centers, and ii) severe data imbalance arising from the uneven presence of surgical instruments or activities during surgical procedures. This paper proposes to address these two problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology. We first develop a new multi-teacher CL framework that leverages a multimodal LLM as the additional teacher. The strong generalization ability of the LLM can bridge the knowledge gap when domain shifts and data imbalances occur. We then put forth a novel data processing method that transforms complex LLM embeddings into logits compatible with our CL framework. We further design an adaptive weight assignment approach that balances the generalization ability of the LLM and the domain expertise of the old CL model. We construct a new dataset for surgical VQA tasks, providing valuable data resources for future research. Extensive experimental results on three datasets demonstrate the superiority of our method to other advanced CL models.

* This paper has been accapted by 2024 IEEE International Conference on Robotics and Automation (ICRA)

Via

Access Paper or Ask Questions

An Autonomous Large Language Model Agent for Chemical Literature Data Mining

Feb 20, 2024

Kexin Chen, Hanqun Cao, Junyou Li, Yuyang Du, Menghao Guo, Xin Zeng, Lanqing Li, Jiezhong Qiu, Pheng Ann Heng, Guangyong Chen

Figure 1 for An Autonomous Large Language Model Agent for Chemical Literature Data Mining

Figure 2 for An Autonomous Large Language Model Agent for Chemical Literature Data Mining

Figure 3 for An Autonomous Large Language Model Agent for Chemical Literature Data Mining

Figure 4 for An Autonomous Large Language Model Agent for Chemical Literature Data Mining

Abstract:Chemical synthesis, which is crucial for advancing material synthesis and drug discovery, impacts various sectors including environmental science and healthcare. The rise of technology in chemistry has generated extensive chemical data, challenging researchers to discern patterns and refine synthesis processes. Artificial intelligence (AI) helps by analyzing data to optimize synthesis and increase yields. However, AI faces challenges in processing literature data due to the unstructured format and diverse writing style of chemical literature. To overcome these difficulties, we introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature. This AI agent employs large language models (LLMs) for prompt generation and iterative optimization. It functions as a chemistry assistant, automating data collection and analysis, thereby saving manpower and enhancing performance. Our framework's efficacy is evaluated using accuracy, recall, and F1 score of reaction condition data, and we compared our method with human experts in terms of content correctness and time efficiency. The proposed approach marks a significant advancement in automating chemical literature extraction and demonstrates the potential for AI to revolutionize data management and utilization in chemistry.

Via

Access Paper or Ask Questions

SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning

Jan 22, 2024

Hao Chen, Jiaze Wang, Ziyu Guo, Jinpeng Li, Donghao Zhou, Bian Wu, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

Abstract:Sign language recognition (SLR) plays a vital role in facilitating communication for the hearing-impaired community. SLR is a weakly supervised task where entire videos are annotated with glosses, making it challenging to identify the corresponding gloss within a video segment. Recent studies indicate that the main bottleneck in SLR is the insufficient training caused by the limited availability of large-scale datasets. To address this challenge, we present SignVTCL, a multi-modal continuous sign language recognition framework enhanced by visual-textual contrastive learning, which leverages the full potential of multi-modal data and the generalization ability of language model. SignVTCL integrates multi-modal data (video, keypoints, and optical flow) simultaneously to train a unified visual backbone, thereby yielding more robust visual representations. Furthermore, SignVTCL contains a visual-textual alignment approach incorporating gloss-level and sentence-level alignment to ensure precise correspondence between visual features and glosses at the level of individual glosses and sentence. Experimental results conducted on three datasets, Phoenix-2014, Phoenix-2014T, and CSL-Daily, demonstrate that SignVTCL achieves state-of-the-art results compared with previous methods.

Via

Access Paper or Ask Questions

Towards an Automatic AI Agent for Reaction Condition Recommendation in Chemical Synthesis

Nov 28, 2023

Kexin Chen, Junyou Li, Kunyi Wang, Yuyang Du, Jiahui Yu, Jiamin Lu, Lanqing Li, Jiezhong Qiu, Qun Fang, Pheng Ann Heng(+1 more)

Figure 1 for Towards an Automatic AI Agent for Reaction Condition Recommendation in Chemical Synthesis

Figure 2 for Towards an Automatic AI Agent for Reaction Condition Recommendation in Chemical Synthesis

Figure 3 for Towards an Automatic AI Agent for Reaction Condition Recommendation in Chemical Synthesis

Figure 4 for Towards an Automatic AI Agent for Reaction Condition Recommendation in Chemical Synthesis

Abstract:Artificial intelligence (AI) for reaction condition optimization has become an important topic in the pharmaceutical industry, given that a data-driven AI model can assist drug discovery and accelerate reaction design. However, existing AI models lack the chemical insights and real-time knowledge acquisition abilities of experienced human chemists. This paper proposes a Large Language Model (LLM) empowered AI agent to bridge this gap. We put forth a novel three-phase paradigm and applied advanced intelligence-enhancement methods like in-context learning and multi-LLM debate so that the AI agent can borrow human insight and update its knowledge by searching the latest chemical literature. Additionally, we introduce a novel Coarse-label Contrastive Learning (CCL) based chemical fingerprint that greatly enhances the agent's performance in optimizing the reaction condition. With the above efforts, the proposed AI agent can autonomously generate the optimal reaction condition recommendation without any human interaction. Further, the agent is highly professional in terms of chemical reactions. It demonstrates close-to-human performance and strong generalization capability in both dry-lab and wet-lab experiments. As the first attempt in the chemical AI agent, this work goes a step further in the field of "AI for chemistry" and opens up new possibilities for computer-aided synthesis planning.

Via

Access Paper or Ask Questions

Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes

Aug 23, 2023

Donghao Zhou, Jialin Li, Jinpeng Li, Jiancheng Huang, Qiang Nie, Yong Liu, Bin-Bin Gao, Qiong Wang, Pheng-Ann Heng, Guangyong Chen

Abstract:Large-scale well-annotated datasets are of great importance for training an effective object detector. However, obtaining accurate bounding box annotations is laborious and demanding. Unfortunately, the resultant noisy bounding boxes could cause corrupt supervision signals and thus diminish detection performance. Motivated by the observation that the real ground-truth is usually situated in the aggregation region of the proposals assigned to a noisy ground-truth, we propose DIStribution-aware CalibratiOn (DISCO) to model the spatial distribution of proposals for calibrating supervision signals. In DISCO, spatial distribution modeling is performed to statistically extract the potential locations of objects. Based on the modeled distribution, three distribution-aware techniques, i.e., distribution-aware proposal augmentation (DA-Aug), distribution-aware box refinement (DA-Ref), and distribution-aware confidence estimation (DA-Est), are developed to improve classification, localization, and interpretability, respectively. Extensive experiments on large-scale noisy image datasets (i.e., Pascal VOC and MS-COCO) demonstrate that DISCO can achieve state-of-the-art detection performance, especially at high noise levels.

* 12 pages, 9 figures

Via

Access Paper or Ask Questions

DPPMask: Masked Image Modeling with Determinantal Point Processes

Mar 25, 2023

Junde Xu, Zikai Lin, Donghao Zhou, Yaodong Yang, Xiangyun Liao, Bian Wu, Guangyong Chen, Pheng-Ann Heng

Figure 1 for DPPMask: Masked Image Modeling with Determinantal Point Processes

Figure 2 for DPPMask: Masked Image Modeling with Determinantal Point Processes

Figure 3 for DPPMask: Masked Image Modeling with Determinantal Point Processes

Figure 4 for DPPMask: Masked Image Modeling with Determinantal Point Processes

Abstract:Masked Image Modeling (MIM) has achieved impressive representative performance with the aim of reconstructing randomly masked images. Despite the empirical success, most previous works have neglected the important fact that it is unreasonable to force the model to reconstruct something beyond recovery, such as those masked objects. In this work, we show that uniformly random masking widely used in previous works unavoidably loses some key objects and changes original semantic information, resulting in a misalignment problem and hurting the representative learning eventually. To address this issue, we augment MIM with a new masking strategy namely the DPPMask by substituting the random process with Determinantal Point Process (DPPs) to reduce the semantic change of the image after masking. Our method is simple yet effective and requires no extra learnable parameters when implemented within various frameworks. In particular, we evaluate our method on two representative MIM frameworks, MAE and iBOT. We show that DPPMask surpassed random sampling under both lower and higher masking ratios, indicating that DPPMask makes the reconstruction task more reasonable. We further test our method on the background challenge and multi-class classification tasks, showing that our method is more robust at various tasks.

Via

Access Paper or Ask Questions

Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning

Mar 21, 2023

Yang Yu, Danruo Deng, Furui Liu, Yueming Jin, Qi Dou, Guangyong Chen, Pheng-Ann Heng

Figure 1 for Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning

Figure 2 for Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning

Figure 3 for Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning

Figure 4 for Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning

Abstract:Semi-supervised learning (SSL) methods assume that labeled data, unlabeled data and test data are from the same distribution. Open-set semi-supervised learning (Open-set SSL) considers a more practical scenario, where unlabeled data and test data contain new categories (outliers) not observed in labeled data (inliers). Most previous works focused on outlier detection via binary classifiers, which suffer from insufficient scalability and inability to distinguish different types of uncertainty. In this paper, we propose a novel framework, Adaptive Negative Evidential Deep Learning (ANEDL) to tackle these limitations. Concretely, we first introduce evidential deep learning (EDL) as an outlier detector to quantify different types of uncertainty, and design different uncertainty metrics for self-training and inference. Furthermore, we propose a novel adaptive negative optimization strategy, making EDL more tailored to the unlabeled dataset containing both inliers and outliers. As demonstrated empirically, our proposed method outperforms existing state-of-the-art methods across four datasets.

Via

Access Paper or Ask Questions

Uncertainty Estimation by Fisher Information-based Evidential Deep Learning

Mar 13, 2023

Danruo Deng, Guangyong Chen, Yang Yu, Furui Liu, Pheng-Ann Heng

Figure 1 for Uncertainty Estimation by Fisher Information-based Evidential Deep Learning

Figure 2 for Uncertainty Estimation by Fisher Information-based Evidential Deep Learning

Figure 3 for Uncertainty Estimation by Fisher Information-based Evidential Deep Learning

Figure 4 for Uncertainty Estimation by Fisher Information-based Evidential Deep Learning

Abstract:Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. Recently proposed evidential neural networks explicitly account for different uncertainties by treating the network's outputs as evidence to parameterize the Dirichlet distribution, and achieve impressive performance in uncertainty estimation. However, for high data uncertainty samples but annotated with the one-hot label, the evidence-learning process for those mislabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes. The generalization ability of our network is further improved by optimizing the PAC-Bayesian bound. As demonstrated empirically, our proposed method consistently outperforms traditional EDL-related algorithms in multiple uncertainty estimation tasks, especially in the more challenging few-shot classification settings.

Via

Access Paper or Ask Questions

Traj-MAE: Masked Autoencoders for Trajectory Prediction

Mar 12, 2023

Hao Chen, Jiaze Wang, Kun Shao, Furui Liu, Jianye Hao, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

Figure 1 for Traj-MAE: Masked Autoencoders for Trajectory Prediction

Figure 2 for Traj-MAE: Masked Autoencoders for Trajectory Prediction

Figure 3 for Traj-MAE: Masked Autoencoders for Trajectory Prediction

Figure 4 for Traj-MAE: Masked Autoencoders for Trajectory Prediction

Abstract:Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers. One key issue is to generate consistent trajectory predictions without colliding. To overcome the challenge, we propose an efficient masked autoencoder for trajectory prediction (Traj-MAE) that better represents the complicated behaviors of agents in the driving environment. Specifically, our Traj-MAE employs diverse masking strategies to pre-train the trajectory encoder and map encoder, allowing for the capture of social and temporal information among agents while leveraging the effect of environment from multiple granularities. To address the catastrophic forgetting problem that arises when pre-training the network with multiple masking strategies, we introduce a continual pre-training framework, which can help Traj-MAE learn valuable and diverse information from various strategies efficiently. Our experimental results in both multi-agent and single-agent settings demonstrate that Traj-MAE achieves competitive results with state-of-the-art methods and significantly outperforms our baseline model.

Via

Access Paper or Ask Questions

PointPatchMix: Point Cloud Mixing with Patch Scoring

Mar 12, 2023

Yi Wang, Jiaze Wang, Jinpeng Li, Zixu Zhao, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng

Figure 1 for PointPatchMix: Point Cloud Mixing with Patch Scoring

Figure 2 for PointPatchMix: Point Cloud Mixing with Patch Scoring

Figure 3 for PointPatchMix: Point Cloud Mixing with Patch Scoring

Figure 4 for PointPatchMix: Point Cloud Mixing with Patch Scoring

Abstract:Data augmentation is an effective regularization strategy for mitigating overfitting in deep neural networks, and it plays a crucial role in 3D vision tasks, where the point cloud data is relatively limited. While mixing-based augmentation has shown promise for point clouds, previous methods mix point clouds either on block level or point level, which has constrained their ability to strike a balance between generating diverse training samples and preserving the local characteristics of point clouds. Additionally, the varying importance of each part of the point clouds has not been fully considered, cause not all parts contribute equally to the classification task, and some parts may contain unimportant or redundant information. To overcome these challenges, we propose PointPatchMix, a novel approach that mixes point clouds at the patch level and integrates a patch scoring module to generate content-based targets for mixed point clouds. Our approach preserves local features at the patch level, while the patch scoring module assigns targets based on the content-based significance score from a pre-trained teacher model. We evaluate PointPatchMix on two benchmark datasets, ModelNet40 and ScanObjectNN, and demonstrate significant improvements over various baselines in both synthetic and real-world datasets, as well as few-shot settings. With Point-MAE as our baseline, our model surpasses previous methods by a significant margin, achieving 86.3% accuracy on ScanObjectNN and 94.1% accuracy on ModelNet40. Furthermore, our approach shows strong generalization across multiple architectures and enhances the robustness of the baseline model.

Via

Access Paper or Ask Questions