Xiao Ma

Automated Review Generation Method Based on Large Language Models

Jul 30, 2024

Shican Wu, Xiao Ma, Dehui Luo, Lulu Li, Xiangcheng Shi, Xin Chang, Xiaoyun Lin, Ran Luo, Chunlei Pei, Zhi-Jian Zhao(+1 more)

Abstract:Literature research, vital for scientific advancement, is overwhelmed by the vast ocean of available information. To address this, we propose an automated review generation method based on Large Language Models (LLMs) that streamlines literature processing and reduces cognitive load. In a case study on propane dehydrogenation (PDH) catalysts, our method rapidly generated comprehensive reviews from 343 articles, taking on the order of seconds per article per LLM account. An extended analysis of 1041 articles provided deep insights into the catalysts' composition, structure, and performance. Recognizing the risk of LLM hallucinations, we employed a multi-layered quality control strategy that ensures the method's reliability and effectively mitigates hallucinations. Expert verification confirms the accuracy and citation integrity of the generated reviews, showing that the hallucination risk is reduced to below 0.5% with over 95% confidence. A released Windows application enables one-click review generation, helping researchers track advancements and find relevant literature. This approach showcases the role of LLMs in enhancing scientific research productivity and sets the stage for further exploration.

* 16 pages, 3 figures, 3 tables
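One standard way to read a claim such as "below 0.5% with over 95% confidence" is the zero-failure binomial bound. The sketch below assumes, hypothetically, that n independently sampled statements were expert-checked and none was found to be hallucinated; the paper's actual sampling scheme and statistical procedure may differ, and the function names are illustrative only.

```python
import math


def zero_error_upper_bound(n_checked: int, confidence: float = 0.95) -> float:
    """One-sided exact binomial upper bound on the error rate when 0 of n_checked items fail.

    Solves (1 - p) ** n_checked == 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_checked)


def min_checks(max_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of error-free checks whose upper bound does not exceed max_rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_rate))


n = min_checks(0.005, 0.95)
print(n, zero_error_upper_bound(n, 0.95))
```

For a 0.5% ceiling at 95% confidence this gives 598 error-free checks, close to the familiar "rule of three" approximation 3/n (which gives 600). If some hallucinations are observed, a Clopper-Pearson bound with a nonzero failure count would be needed instead.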

Mixture of Experts based Multi-task Supervise Learning from Crowds

Jul 18, 2024

Tao Han, Huaixuan Shi, Xinyi Ding, Xiao Ma, Huamao Gu, Yili Fang

Abstract:Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer it. However, worker behavior models that rely on ground truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise characterizations and degrading the quality of truth inference. This paper proposes a new paradigm of multi-task supervised learning from crowds, which eliminates the need to model items' ground truth in worker behavior models. Within this paradigm, we propose a worker behavior model at the item feature level called Mixture of Experts based Multi-task Supervised Learning from Crowds (MMLC). Two truth inference strategies are proposed within MMLC. The first strategy, named MMLC-owf, uses clustering in the worker spectral space to identify the projection vector of the oracle worker; the labels generated from this vector are then taken as the inferred truth. The second strategy, called MMLC-df, employs the MMLC model to fill the crowdsourced data, which can enhance the effectiveness of existing truth inference methods. Experimental results demonstrate that MMLC-owf outperforms state-of-the-art methods and that MMLC-df improves the quality of existing truth inference methods.
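MMLC-df fills the crowdsourced label matrix before an existing truth inference method runs. The sketch below shows only that data flow, using majority voting as the simplest downstream method; `predict` is a hypothetical stand-in for the MMLC model, and encoding missing labels as -1 is an assumption, not something the abstract specifies.

```python
import numpy as np

MISSING = -1  # assumed encoding for "this worker did not label this item"


def fill_missing(labels: np.ndarray, predict) -> np.ndarray:
    """Return a copy of the (items x workers) label matrix with gaps filled by predict(item, worker)."""
    filled = labels.copy()
    for i, j in zip(*np.nonzero(labels == MISSING)):
        filled[i, j] = predict(int(i), int(j))
    return filled


def majority_vote(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Infer one label per item by counting non-missing votes (ties go to the lowest class index)."""
    votes = np.zeros((labels.shape[0], n_classes), dtype=int)
    for i, row in enumerate(labels):
        for label in row:
            if label != MISSING:
                votes[i, label] += 1
    return votes.argmax(axis=1)


def always_one(item: int, worker: int) -> int:
    """Hypothetical per-worker predictor standing in for a trained MMLC model."""
    return 1


crowd = np.array([[0, MISSING, 1],
                  [1, 1, MISSING]])
print(majority_vote(crowd, 2))                            # before filling
print(majority_vote(fill_missing(crowd, always_one), 2))  # after filling
```

Filling changes the first item from a tie (resolved arbitrarily) into a clear majority, which is the kind of effect the "df" strategy relies on; how good the fill is depends entirely on the worker model.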

BiGym: A Demo-Driven Mobile Bi-Manual Manipulation Benchmark

Jul 11, 2024

Nikita Chernyadev, Nicholas Backshall, Xiao Ma, Yunfan Lu, Younggyo Seo, Stephen James

Abstract:We introduce BiGym, a new benchmark and learning environment for mobile bi-manual demo-driven robotic manipulation. BiGym features 40 diverse tasks set in home environments, ranging from simple target reaching to complex kitchen cleaning. To reflect real-world performance accurately, we provide human-collected demonstrations for each task, capturing the diverse modalities found in real-world robot trajectories. BiGym supports a variety of observations, including proprioceptive data and visual inputs (RGB and depth) from three camera views. To validate the usability of BiGym, we thoroughly benchmark state-of-the-art imitation learning and demo-driven reinforcement learning algorithms within the environment and discuss future opportunities.

* Project webpage: https://chernyadev.github.io/bigym/

Green Screen Augmentation Enables Scene Generalisation in Robotic Manipulation

Jul 10, 2024

Eugene Teoh, Sumit Patidar, Xiao Ma, Stephen James

Abstract:Generalising vision-based manipulation policies to novel environments remains a challenging and under-explored area. Current practice involves collecting data in one location, training imitation learning or reinforcement learning policies on this data, and deploying the policy in the same location. However, this approach does not scale, as it requires data collection in multiple locations for each task. This paper proposes a novel approach in which data is collected in a location predominantly featuring green screens. We introduce Green-screen Augmentation (GreenAug), which employs a chroma key algorithm to overlay background textures onto a green screen. Through extensive real-world empirical studies with over 850 training demonstrations and 8.2k evaluation episodes, we demonstrate that GreenAug outperforms no augmentation, standard computer vision augmentation, and prior generative augmentation methods. While no algorithmic novelties are claimed, our paper advocates for a fundamental shift in data collection practices. We propose that real-world demonstrations in future research should utilise green screens, followed by the application of GreenAug. We believe GreenAug unlocks policy generalisation to visually distinct novel locations, addressing the current scene generalisation limitations in robot learning.

* Project website: https://greenaug.github.io/
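The abstract says only that GreenAug uses "a chroma key algorithm". As an illustration of the general idea, the sketch below uses a simple green-dominance threshold in RGB space; the threshold rule and the `margin` parameter are assumptions, and GreenAug's actual keying method may be more sophisticated (for example, working in a different colour space or producing soft masks).

```python
import numpy as np


def green_screen_mask(rgb: np.ndarray, margin: int = 40) -> np.ndarray:
    """Boolean (H, W) mask of pixels whose green channel exceeds red and blue by `margin`."""
    img = rgb.astype(np.int16)  # avoid uint8 wrap-around when subtracting
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return (g - r > margin) & (g - b > margin)


def chroma_key_augment(rgb: np.ndarray, background: np.ndarray, margin: int = 40) -> np.ndarray:
    """Overlay `background` wherever the frame shows green screen; keep all other pixels."""
    if rgb.shape != background.shape:
        raise ValueError("frame and background must have the same shape")
    mask = green_screen_mask(rgb, margin)
    out = rgb.copy()
    out[mask] = background[mask]
    return out


rng = np.random.default_rng(0)
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[..., 1] = 200                  # all green screen ...
frame[1:3, 1:3] = (180, 60, 40)      # ... except a reddish object in the middle
texture = rng.integers(0, 256, size=frame.shape, dtype=np.uint8)  # random background texture
augmented = chroma_key_augment(frame, texture)
```

In a training loop one would draw a fresh background per frame, so the policy sees the same robot and objects against many different scenes.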

Test-Time Generative Augmentation for Medical Image Segmentation

Jun 25, 2024

Xiao Ma, Yuhui Tao, Yuhan Zhang, Zexuan Ji, Yizhe Zhang, Qiang Chen

Abstract:In this paper, we propose a novel approach to enhance medical image segmentation at test time. Instead of applying hand-crafted transforms or functions to the input test image to create multiple views for test-time augmentation, we advocate using an advanced domain-fine-tuned generative model (GM), e.g., stable diffusion (SD), for test-time augmentation. Because the GM has been trained to capture comprehensive domain data knowledge, it is superior to segmentation models at representing the data characteristics and distribution. Hence, by integrating the GM into test-time augmentation, we can effectively generate multiple views of a given test sample that align with the content and appearance characteristics of the sample and the related local data distribution. This makes the augmentation process more adaptable and resilient than conventional handcrafted transforms. Comprehensive experiments across three medical image segmentation tasks (nine datasets) demonstrate the efficacy and versatility of the proposed test-time generative augmentation (TTGA) in enhancing segmentation outcomes. Moreover, TTGA significantly improves pixel-wise error estimation, facilitating the deployment of a more reliable segmentation system. Code will be released at: https://github.com/maxiao0234/TTGA.

* 12 pages, 2 figures

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024

Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson

Abstract:The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output tokens. We refer to this issue as shallow safety alignment. In this paper, we present case studies to explain why shallow safety alignment can exist and provide evidence that current aligned LLMs are subject to this issue. We also show how these findings help explain multiple recently discovered vulnerabilities in LLMs, including the susceptibility to adversarial suffix attacks, prefilling attacks, decoding parameter attacks, and fine-tuning attacks. Importantly, we discuss how this consolidated notion of shallow safety alignment sheds light on promising research directions for mitigating these vulnerabilities. For instance, we show that deepening the safety alignment beyond just the first few tokens can often meaningfully improve robustness against some common exploits. Finally, we design a regularized finetuning objective that makes the safety alignment more persistent against fine-tuning attacks by constraining updates on initial tokens. Overall, we advocate that future safety alignment should be made more than just a few tokens deep.
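The abstract does not give the regularized objective, so the following is only a simplified stand-in for the idea of "constraining updates on initial tokens": token-level negative log-likelihood on the fine-tuning data plus a KL penalty toward the frozen aligned model, weighted much more heavily on the first k response positions. The weight schedule, the KL direction, and all names are assumptions for illustration.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def position_weights(seq_len: int, k: int = 5, early: float = 2.0, late: float = 0.1) -> np.ndarray:
    """Hypothetical schedule: strong constraint on the first k response tokens, weak afterwards."""
    weights = np.full(seq_len, late)
    weights[:k] = early
    return weights


def constrained_finetune_loss(logits, ref_logits, targets, weights) -> float:
    """Token-level NLL plus a position-weighted KL(fine-tuned || frozen aligned model)."""
    logp = log_softmax(logits)          # (T, V) model being fine-tuned
    ref_logp = log_softmax(ref_logits)  # (T, V) frozen safety-aligned model
    nll = -logp[np.arange(len(targets)), targets]
    kl = (np.exp(logp) * (logp - ref_logp)).sum(axis=-1)
    return float((nll + weights * kl).mean())


ref = np.zeros((6, 3))                  # 6 positions, vocabulary of 3
targets = np.zeros(6, dtype=int)
w = position_weights(6, k=2)
early_shift = ref.copy()
early_shift[0] = (3.0, 0.0, 0.0)        # drift away from the aligned model at the first token
late_shift = ref.copy()
late_shift[5] = (3.0, 0.0, 0.0)         # identical drift at the last token
loss_early = constrained_finetune_loss(early_shift, ref, targets, w)
loss_late = constrained_finetune_loss(late_shift, ref, targets, w)
```

The same distribution shift costs more when it happens at an early position, which is the pressure that keeps fine-tuning from rewriting the first few tokens of a refusal.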

Redundancy-aware Action Spaces for Robot Learning

Jun 06, 2024

Pietro Mazzaglia, Nicholas Backshall, Xiao Ma, Stephen James

Abstract:Joint space and task space control are the two dominant action modes for controlling robot arms within the robot learning literature. Actions in joint space provide precise control over the robot's pose, but tend to suffer from inefficient training; actions in task space boast data-efficient training but sacrifice the ability to perform tasks in confined spaces due to limited control over the full joint configuration. This work analyses the criteria for designing action spaces for robot manipulation and introduces ER (End-effector Redundancy), a novel action space formulation that, by addressing the redundancies present in the manipulator, aims to combine the advantages of both joint and task spaces, offering fine-grained comprehensive control with overactuated robot arms whilst achieving highly efficient robot learning. We present two implementations of ER, ERAngle (ERA) and ERJoint (ERJ), and we show that ERJ in particular demonstrates superior performance across multiple settings, especially when precise control over the robot configuration is required. We validate our results both in simulated and real robotic environments.

* Published in the RA-L journal

Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions

May 28, 2024

Rui Zhang, Shuailong Li, Junxiao Xue, Feng Lin, Qing Zhang, Xiao Ma, Xiaoran Yan

Abstract:Video recognition remains an open challenge, requiring the identification of diverse content categories within videos. Mainstream approaches often perform flat classification, overlooking the intrinsic hierarchical structure relating categories. To address this, we formalize the novel task of hierarchical video recognition, and propose a video-language learning framework tailored for hierarchical recognition. Specifically, our framework encodes dependencies between hierarchical category levels, and applies a top-down constraint to filter recognition predictions. We further construct a new fine-grained dataset based on medical assessments for rehabilitation of stroke patients, serving as a challenging benchmark for hierarchical recognition. Through extensive experiments, we demonstrate the efficacy of our approach for hierarchical recognition, significantly outperforming conventional methods, especially for fine-grained subcategories. The proposed framework paves the way for hierarchical modeling in video understanding tasks, moving beyond flat categorization.
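A top-down constraint on hierarchical predictions can be illustrated with a two-level label tree: choose the coarse category first, then allow only its fine-grained children. The sketch below is a generic version of that filtering step under a hypothetical `parent_of` mapping; the paper's framework additionally learns the cross-level dependencies, which this does not model.

```python
from typing import Sequence, Tuple

import numpy as np


def top_down_predict(coarse_scores, fine_scores, parent_of: Sequence[int]) -> Tuple[int, int]:
    """Pick the best coarse class, then the best fine class among that class's children only."""
    coarse_scores = np.asarray(coarse_scores)
    fine_scores = np.asarray(fine_scores)
    coarse = int(np.argmax(coarse_scores))
    children = [i for i, parent in enumerate(parent_of) if parent == coarse]
    if not children:
        raise ValueError(f"coarse class {coarse} has no fine-grained children")
    fine = children[int(np.argmax(fine_scores[children]))]
    return coarse, fine


# Two coarse classes; fine classes 0 and 1 belong to coarse 0, fine class 2 to coarse 1.
prediction = top_down_predict([0.7, 0.3], [0.1, 0.2, 0.9], parent_of=[0, 0, 1])
```

A flat argmax over the fine scores would pick fine class 2, whose parent contradicts the confident coarse prediction; the top-down filter returns fine class 1 instead, keeping both levels consistent.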

Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models

May 26, 2024

Kun Huang, Xiao Ma, Yuhan Zhang, Na Su, Songtao Yuan, Yong Liu, Qiang Chen, Huazhu Fu

Abstract:Optical coherence tomography (OCT) image analysis plays an important role in ophthalmology. Current successful analysis models rely on large datasets, which can be challenging to obtain for certain tasks. Using deep generative models to create realistic data is a promising approach. However, due to limited hardware resources, it is still difficult to synthesize high-resolution OCT volumes. In this paper, we introduce a cascaded amortized latent diffusion model (CA-LDM) that can synthesize high-resolution OCT volumes in a memory-efficient way. First, we propose non-holistic autoencoders to efficiently build a bidirectional mapping between the high-resolution volume space and a low-resolution latent space. In tandem with the autoencoders, we propose cascaded diffusion processes that synthesize high-resolution OCT volumes through a global-to-local refinement process, amortizing the memory and computational demands. Experiments on a public high-resolution OCT dataset show that our synthetic data have realistic high-resolution and global features, surpassing the capabilities of existing methods. Moreover, performance gains on two downstream fine-grained segmentation tasks demonstrate the benefit of the proposed method for training deep learning models for medical imaging tasks. The code is publicly available at: https://github.com/nicetomeetu21/CA-LDM.

* Provisionally accepted for Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2024

DiTMoS: Delving into Diverse Tiny-Model Selection on Microcontrollers

Mar 14, 2024

Xiao Ma, Shengfeng He, Hezhe Qiao, Dong Ma

Abstract:Enabling efficient and accurate deep neural network (DNN) inference on microcontrollers is non-trivial due to constrained on-chip resources. Current methodologies primarily focus on compressing larger models, often at the expense of model accuracy. In this paper, we rethink the problem from the inverse perspective: we construct small/weak models directly and improve their accuracy. We introduce DiTMoS, a novel DNN training and inference framework with a selector-classifiers architecture, in which the selector routes each input sample to the appropriate classifier for classification. DiTMoS is grounded on a key insight: a composition of weak models can exhibit high diversity, and their union can significantly raise the accuracy upper bound. To approach this upper bound, DiTMoS introduces three strategies: diverse training data splitting to increase the classifiers' diversity, adversarial selector-classifier training to ensure synergistic interactions and thereby maximize their complementarity, and heterogeneous feature aggregation to improve the capacity of the classifiers. We further propose a network slicing technique to alleviate the extra memory overhead incurred by feature aggregation. We deploy DiTMoS on the Nucleo STM32F767ZI board and evaluate it on three time-series datasets for human activity recognition, keyword spotting, and emotion recognition, respectively. The experimental results show that: (a) DiTMoS achieves up to 13.4% accuracy improvement over the best baseline; (b) network slicing almost completely eliminates the memory overhead incurred by feature aggregation with only a marginal increase in latency.
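The selector-classifiers routing and the "union of weak models raises the accuracy upper bound" insight can be made concrete in a few lines. This is a sketch with hypothetical names: the upper bound is the accuracy a perfect selector would reach, counting a sample as correct if any classifier gets it right. DiTMoS's trained selector, data splitting, adversarial training, and feature aggregation are not modeled here.

```python
from typing import Callable, Sequence

import numpy as np


def route_and_classify(x, selector: Callable, classifiers: Sequence[Callable]):
    """Selector scores each classifier for input x; only the top-scoring classifier runs."""
    idx = int(np.argmax(selector(x)))
    return idx, classifiers[idx](x)


def union_accuracy_upper_bound(correct: np.ndarray) -> float:
    """Accuracy of a perfect selector: a sample counts if ANY classifier gets it right.

    `correct` is a boolean (n_classifiers, n_samples) matrix.
    """
    return float(np.asarray(correct, dtype=bool).any(axis=0).mean())


correct = np.array([[1, 0, 1, 0],    # classifier A: 50% accurate
                    [0, 1, 1, 0]],   # classifier B: 50% accurate
                   dtype=bool)
individual = correct.mean(axis=1)
bound = union_accuracy_upper_bound(correct)
```

Two 50%-accurate classifiers whose errors do not overlap yield a 75% ceiling, which is why DiTMoS pushes for diversity among its classifiers; how close a real selector gets to that ceiling depends on how well it is trained.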
