Shi Qiu

Agents: An Open-source Framework for Autonomous Language Agents

Sep 14, 2023
Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, Shiding Zhu, Jiyu Chen, Wentao Zhang, Ningyu Zhang, Huajun Chen, Peng Cui, Mrinmaya Sachan

Recent advances in large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces. We consider language agents a promising direction towards artificial general intelligence and release Agents, an open-source library that aims to open up these advances to a wider, non-specialist audience. Agents is carefully engineered to support important features including planning, memory, tool usage, multi-agent communication, and fine-grained symbolic control. Agents is user-friendly, enabling non-specialists to build, customize, test, tune, and deploy state-of-the-art autonomous language agents without much coding. The library is also research-friendly, as its modularized design makes it easily extensible for researchers. Agents is available at https://github.com/aiwaves-cn/agents.

* Code available at https://github.com/aiwaves-cn/agents 
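
The library's actual interfaces are best taken from the repository above. As a rough illustration of the kind of agent loop such frameworks build on (planning, memory, and tool use), here is a minimal, hypothetical Python sketch; all class and method names are illustrative and do not reflect the Agents API:

# Minimal, illustrative agent loop with memory and tool use. Names are
# hypothetical and do NOT reflect the Agents library's API.
from typing import Callable, Dict, List

class SimpleAgent:
    def __init__(self, llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]]):
        self.llm = llm               # any text-in/text-out language model
        self.tools = tools           # tool name -> callable, e.g. {"search": ...}
        self.memory: List[str] = []  # running transcript used as context

    def step(self, task: str) -> str:
        prompt = "\n".join(self.memory + [f"Task: {task}", "Action:"])
        action = self.llm(prompt)    # e.g. "search: open-source language agents"
        self.memory.append(f"Action: {action}")
        name, _, arg = action.partition(":")
        if name.strip() in self.tools:  # route tool calls, remember observations
            observation = self.tools[name.strip()](arg.strip())
            self.memory.append(f"Observation: {observation}")
            return observation
        return action                   # otherwise treat the output as an answer

# Usage with a stub LLM and a single stub tool:
agent = SimpleAgent(llm=lambda p: "search: language agents",
                    tools={"search": lambda q: f"results for {q!r}"})
print(agent.step("Find recent work on language agents"))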

Adaptive Low Rank Adaptation of Segment Anything to Salient Object Detection

Aug 10, 2023
Ruikai Cui, Siyuan He, Shi Qiu

Foundation models, such as OpenAI's GPT-3 and GPT-4, Meta's LLaMA, and Google's PaLM2, have revolutionized the field of artificial intelligence. A notable paradigm shift has been the advent of the Segment Anything Model (SAM), which has exhibited a remarkable capability to segment real-world objects, trained on 1 billion masks and 11 million images. Although SAM excels in general object segmentation, it lacks the intrinsic ability to detect salient objects, resulting in suboptimal performance in this domain. To address this challenge, we present the Segment Salient Object Model (SSOM), an innovative approach that adaptively fine-tunes SAM for salient object detection by harnessing the low-rank structure inherent in deep learning. Comprehensive qualitative and quantitative evaluations across five challenging RGB benchmark datasets demonstrate the superior performance of our approach, surpassing state-of-the-art methods.

* 13 pages, 0 figures 
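
Low-rank adaptation itself is a standard technique, so the adapter at the heart of such an approach can be sketched generically. The following is a minimal PyTorch sketch of a trainable low-rank update around a frozen linear layer, not the SSOM implementation:

# Minimal LoRA-style adapter for a frozen linear layer (illustrative sketch,
# not the SSOM code). The frozen weight is augmented with a trainable
# low-rank update B @ A, so only r * (in + out) parameters are tuned.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init:
        self.scale = alpha / r                                    # no change at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path + scaled low-rank trainable path
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(256, 256), r=8)
print(layer(torch.randn(2, 256)).shape)  # torch.Size([2, 256])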

P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds

Jul 27, 2023
Ruikai Cui, Shi Qiu, Saeed Anwar, Jiawei Liu, Chaoyue Xing, Jing Zhang, Nick Barnes

Point cloud completion aims to recover the complete shape based on a partial observation. Existing methods require either complete point clouds or multiple partial observations of the same object for learning. In contrast to previous approaches, we present Partial2Complete (P2C), the first self-supervised framework that completes point cloud objects using training samples consisting of only a single incomplete point cloud per object. Specifically, our framework groups incomplete point clouds into local patches as input and predicts masked patches by learning prior information from different partial objects. We also propose Region-Aware Chamfer Distance to regularize shape mismatch without limiting completion capability, and devise the Normal Consistency Constraint to incorporate a local planarity assumption, encouraging the recovered shape surface to be continuous and complete. In this way, P2C no longer needs multiple observations or complete point clouds as ground truth. Instead, structural cues are learned from a category-specific dataset to complete partial point clouds of objects. We demonstrate the effectiveness of our approach on both synthetic ShapeNet data and real-world ScanNet data, showing that P2C produces comparable results to methods trained with complete shapes, and outperforms methods learned with multiple partial observations. Code is available at https://github.com/CuiRuikai/Partial2Complete.

* Accepted to ICCV 2023 
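
As background for the Region-Aware Chamfer Distance mentioned above, the plain symmetric Chamfer distance it modifies can be sketched in a few lines (a generic sketch, not the paper's region-aware variant):

# Plain symmetric Chamfer distance between two point clouds. The paper's
# Region-Aware variant changes which regions contribute to the penalty.
import torch

def chamfer_distance(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # p: (N, 3), q: (M, 3)
    d = torch.cdist(p, q)  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

print(chamfer_distance(torch.rand(1024, 3), torch.rand(2048, 3)).item())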

A Comprehensive Overview of Large Language Models

Jul 12, 2023
Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Nick Barnes, Ajmal Mian

Large Language Models (LLMs) have shown excellent generalization capabilities, leading to the development of numerous models. These models introduce new architectures or tweak existing ones, refine training strategies, increase context length, use higher-quality training data, and extend training time to outperform baselines. Analyzing new developments is crucial for identifying changes that enhance training stability and improve generalization in LLMs. This survey comprehensively analyzes LLM architectures and their categorization, training strategies, training datasets, and performance evaluations, and discusses future research directions. Moreover, the paper discusses the basic building blocks and concepts behind LLMs, followed by a complete overview of LLMs, including their important features and functions. Finally, the paper summarizes significant findings from LLM research and consolidates essential architectural and training strategies for developing advanced LLMs. Given the continuous advancements in LLMs, we intend to update this paper regularly with new sections and the latest LLM models.

PointCaM: Cut-and-Mix for Open-Set Point Cloud Analysis

Dec 05, 2022
Jie Hong, Shi Qiu, Weihao Li, Saeed Anwar, Mehrtash Harandi, Nick Barnes, Lars Petersson

Point cloud analysis is receiving increasing attention; however, most existing point cloud models lack the practical ability to deal with the unavoidable presence of unknown objects. This paper discusses point cloud analysis under open-set settings, where we train the model without data from unknown classes and identify them at inference. We propose to solve open-set point cloud analysis using a novel Point Cut-and-Mix mechanism consisting of Unknown-Point Simulator and Unknown-Point Estimator modules. Specifically, the Unknown-Point Simulator simulates unknown data during training by manipulating the geometric context of partial known data. Based on this, the Unknown-Point Estimator learns to exploit the point cloud's feature context to discriminate between known and unknown data. Extensive experiments show the plausibility of open-set point cloud analysis and the effectiveness of our proposed solutions. Our code is available at https://github.com/ShiQiu0419/pointcam.
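
To make the cut-and-mix idea concrete, a heavily simplified NumPy sketch is given below: a local patch is cut from one known-class cloud and pasted into another, and the pasted points can then be labeled as simulated unknowns. This illustrates the idea only; see the released code for the actual Unknown-Point Simulator:

# Simplified cut-and-mix sketch: cut a local patch from cloud_b and paste it
# into cloud_a to simulate "unknown" geometry during training (illustration
# only, not the released module).
import numpy as np

def cut_and_mix(cloud_a: np.ndarray, cloud_b: np.ndarray, k: int = 256) -> np.ndarray:
    # cloud_a, cloud_b: (N, 3) point clouds of different known classes
    center = cloud_b[np.random.randint(len(cloud_b))]      # random seed point
    dists = np.linalg.norm(cloud_b - center, axis=1)
    patch = cloud_b[np.argsort(dists)[:k]]                 # local patch from b
    keep_idx = np.random.choice(len(cloud_a), len(cloud_a) - k, replace=False)
    # points in `patch` would be labeled as simulated unknowns
    return np.concatenate([cloud_a[keep_idx], patch], axis=0)

print(cut_and_mix(np.random.rand(1024, 3), np.random.rand(1024, 3)).shape)  # (1024, 3)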

Energy-Based Residual Latent Transport for Unsupervised Point Cloud Completion

Nov 13, 2022
Ruikai Cui, Shi Qiu, Saeed Anwar, Jing Zhang, Nick Barnes

Unsupervised point cloud completion aims to infer the whole geometry of a partial object observation without requiring partial-complete correspondence. Unlike existing deterministic approaches, we advocate generative-modeling-based unsupervised point cloud completion to explore the missing correspondence. Specifically, we propose a novel framework that performs completion by transforming a partial shape encoding into a complete one using a latent transport module, designed as a latent-space energy-based model (EBM) in an encoder-decoder architecture that learns a probability distribution conditioned on the partial shape encoding. To train the latent transport module and the encoder-decoder network jointly, we introduce a residual sampling strategy, where the residual captures the domain gap between the partial and complete shape latent spaces. As a generative-model-based framework, our method can produce uncertainty maps consistent with human perception, leading to explainable unsupervised point cloud completion. We experimentally show that the proposed method produces high-fidelity completion results, outperforming state-of-the-art models by a significant margin.

* BMVC 2022 paper 
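
Latent-space EBMs are typically sampled with Langevin dynamics; a generic sketch of that sampler is shown below (the paper's residual transport formulation builds on top of this, and the toy energy here is purely illustrative):

# Generic Langevin sampling in a latent space defined by an energy function
# (background sketch only, not the paper's residual transport module).
import torch

def langevin_sample(energy, z0: torch.Tensor, steps: int = 60, step_size: float = 0.1):
    z = z0.clone().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy(z).sum(), z)[0]  # dE/dz
        noise = torch.randn_like(z)
        z = (z - 0.5 * step_size * grad
               + (step_size ** 0.5) * noise).detach().requires_grad_(True)
    return z.detach()

energy = lambda z: 0.5 * (z ** 2).sum(dim=1)  # toy quadratic energy
print(langevin_sample(energy, torch.randn(8, 32)).shape)  # torch.Size([8, 32])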

PU-Transformer: Point Cloud Upsampling Transformer

Nov 24, 2021
Shi Qiu, Saeed Anwar, Nick Barnes

Given the rapid development of 3D scanners, point clouds are becoming popular in AI-driven machines. However, point cloud data is inherently sparse and irregular, causing major difficulties for machine perception. In this work, we focus on the point cloud upsampling task, which aims to generate dense, high-fidelity point clouds from sparse input data. Specifically, to activate the transformer's strong capability in representing features, we develop a new variant of the multi-head self-attention structure to enhance both point-wise and channel-wise relations of the feature map. In addition, we leverage a positional fusion block to comprehensively capture the local context of point cloud data, providing more position-related information about the scattered points. As the first transformer model introduced for point cloud upsampling, we demonstrate the outstanding performance of our approach by comparing with state-of-the-art CNN-based methods on different benchmarks, both quantitatively and qualitatively.
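
For reference, standard point-wise multi-head self-attention over a point feature map looks as follows; the paper's variant additionally enhances channel-wise relations, which this generic sketch does not include:

# Standard point-wise multi-head self-attention over point features
# (background only; PU-Transformer's variant also models channel-wise
# relations, which is not shown here).
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
feats = torch.randn(2, 1024, 64)    # (batch, points, channels)
out, _ = attn(feats, feats, feats)  # each point attends to all points
print(out.shape)                    # torch.Size([2, 1024, 64])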

PnP-3D: A Plug-and-Play for 3D Point Clouds

Aug 16, 2021
Shi Qiu, Saeed Anwar, Nick Barnes

With the help of the deep learning paradigm, many point cloud networks have been invented for visual analysis. However, these networks still have great potential for improvement, since the information contained in point cloud data has not been fully exploited. To improve the effectiveness of existing networks in analyzing point cloud data, we propose a plug-and-play module, PnP-3D, which aims to refine the fundamental point cloud feature representations by involving more local context and global bilinear response from the explicit 3D space and the implicit feature space. To thoroughly evaluate our approach, we conduct experiments on three standard point cloud analysis tasks, including classification, semantic segmentation, and object detection, selecting three state-of-the-art networks from each task for evaluation. Serving as a plug-and-play module, PnP-3D can significantly boost the performance of established networks. In addition to achieving state-of-the-art results on four widely used point cloud benchmarks, we present comprehensive ablation studies and visualizations to demonstrate our approach's advantages. The code will be available at https://github.com/ShiQiu0419/pnp-3d.
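
The plug-and-play pattern itself is easy to state: a refinement block that preserves the input feature shape can be inserted between any two layers of an existing network. Below is a rough sketch of that pattern (not the PnP-3D module; the local and global branches here are stand-ins):

# Rough sketch of a drop-in point-feature refinement block: it preserves the
# (batch, points, channels) shape so it can sit between existing layers.
# NOT the PnP-3D module, only the plug-and-play pattern it follows.
import torch
import torch.nn as nn

class DropInRefine(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.local = nn.Linear(c, c)  # stand-in for local context aggregation
        self.gate = nn.Sequential(nn.Linear(c, c), nn.Sigmoid())  # global gating

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate(x.mean(dim=1, keepdim=True))  # global descriptor -> gate
        return x + g * torch.relu(self.local(x))    # residual refinement

x = torch.randn(2, 1024, 64)
print(DropInRefine(64)(x).shape)  # torch.Size([2, 1024, 64])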

Investigating Attention Mechanism in 3D Point Cloud Object Detection

Aug 02, 2021
Shi Qiu, Yunfan Wu, Saeed Anwar, Chongyi Li

Object detection in three-dimensional (3D) space attracts much interest from academia and industry, since it is an essential task in AI-driven applications such as robotics, autonomous driving, and augmented reality. As the basic format of 3D data, the point cloud can provide detailed geometric information about objects in the original 3D space. However, due to the sparsity and unordered nature of 3D data, specially designed networks and modules are needed to process it. The attention mechanism has achieved impressive performance in diverse computer vision tasks; however, it is unclear how attention modules affect the performance of 3D point cloud object detection, and what sort of attention modules fit the inherent properties of 3D data. This work investigates the role of the attention mechanism in 3D point cloud object detection and provides insights into the potential of different attention modules. To achieve that, we comprehensively investigate classical 2D attention modules and novel 3D attention modules, including the latest point cloud transformers, on the SUN RGB-D and ScanNetV2 datasets. Based on detailed experiments and analysis, we summarize the effects of different attention modules. This paper is expected to serve as a reference for attention-embedded 3D point cloud object detection. The code and trained models are available at: https://github.com/ShiQiu0419/attentions_in_3D_detection.

* Code and trained models are at https://github.com/ShiQiu0419/attentions_in_3D_detection 
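
Many of the 2D attention modules compared in such studies drop into a point-feature backbone the same way; for example, a squeeze-and-excitation-style channel attention over point features can be sketched as follows (a generic example, not code from the repository above):

# Generic squeeze-and-excitation-style channel attention on point features,
# one example of the 2D attention modules such comparisons include.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, c: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(c, c // reduction), nn.ReLU(),
            nn.Linear(c // reduction, c), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, points, channels); squeeze over points, excite channels
        w = self.fc(x.mean(dim=1))  # (batch, channels) gating weights
        return x * w.unsqueeze(1)   # reweight each channel per cloud

x = torch.randn(2, 1024, 64)
print(ChannelAttention(64)(x).shape)  # torch.Size([2, 1024, 64])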

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Mar 12, 2021
Shi Qiu, Saeed Anwar, Nick Barnes

Given the prominence of current 3D sensors, fine-grained analysis of basic point cloud data is worth further investigation. In particular, real point cloud scenes can intuitively capture complex surroundings in the real world, but the raw nature of 3D data makes machine perception very challenging. In this work, we concentrate on an essential visual task, semantic segmentation, for large-scale point cloud data collected in reality. On the one hand, to reduce the ambiguity between nearby points, we augment their local context by fully utilizing both geometric and semantic features in a bilateral structure. On the other hand, we comprehensively interpret the distinctness of points at multiple resolutions and represent the feature map following an adaptive fusion method at the point level for accurate semantic segmentation. Further, we provide specific ablation studies and intuitive visualizations to validate our key modules. By comparing with state-of-the-art networks on three different benchmarks, we demonstrate the effectiveness of our network.

* Accepted to CVPR 2021 
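
Point-level adaptive fusion of multi-resolution feature maps can be sketched as learned per-point softmax weights over the scales (a simplified illustration under that assumption, not the paper's exact fusion module):

# Simplified point-level adaptive fusion: learn per-point softmax weights over
# several same-sized multi-resolution feature maps (illustration only).
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, c: int):
        super().__init__()
        self.score = nn.Linear(c, 1)  # one score per scale, per point

    def forward(self, feats):                          # list of (B, N, C)
        stacked = torch.stack(feats, dim=2)            # (B, N, S, C)
        w = torch.softmax(self.score(stacked), dim=2)  # (B, N, S, 1)
        return (w * stacked).sum(dim=2)                # (B, N, C)

feats = [torch.randn(2, 1024, 64) for _ in range(3)]
print(AdaptiveFusion(64)(feats).shape)  # torch.Size([2, 1024, 64])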