Xi Mo

IronEngine: Towards General AI Assistant

Mar 09, 2026

Xi Mo

Abstract: This paper presents IronEngine, a general AI assistant platform organized around a unified orchestration core that connects a desktop user interface, REST and WebSocket APIs, Python clients, local and cloud model backends, persistent memory, task scheduling, reusable skills, 24-category tool execution, MCP-compatible extensibility, and hardware-facing integration. IronEngine introduces a three-phase pipeline that separates planning quality from execution capability: Discussion (Planner-Reviewer collaboration), Model Switch (VRAM-aware transition), and Execution (tool-augmented action loop). The system features a hierarchical memory architecture with multi-level consolidation, a vectorized skill repository backed by ChromaDB, an adaptive model management layer supporting 92 model profiles with VRAM-aware context budgeting, and an intelligent tool routing system that normalizes 130+ aliases and corrects errors automatically. We present experimental results on file operation benchmarks, achieving 100% task completion with a mean total time of 1541 seconds across four heterogeneous tasks, and provide detailed comparisons with representative AI assistant systems including ChatGPT, Claude Desktop, Cursor, Windsurf, and open-source agent frameworks. Without disclosing proprietary prompts or core algorithms, this paper analyzes the platform's architectural decomposition, subsystem design, experimental performance, safety boundaries, and comparative engineering advantages. The resulting study positions IronEngine as a system-oriented foundation for general-purpose personal assistants, automation frameworks, and future human-centered agent platforms.

* Technical Report
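
The abstract mentions VRAM-aware context budgeting but does not disclose how it is computed. The following is only a minimal sketch of one plausible approach, assuming a standard transformer key-value cache; the function names, the `reserve_fraction` safety margin, and all parameters are hypothetical and are not taken from IronEngine.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache one token occupies in a standard transformer.

    Both keys and values are cached in every layer, hence the factor of 2.
    bytes_per_elem=2 assumes a 16-bit cache (an assumption, not a given).
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def context_budget(free_vram_bytes, weight_bytes, n_layers, n_kv_heads,
                   head_dim, bytes_per_elem=2, reserve_fraction=0.1,
                   max_context=None):
    """Estimate how many context tokens fit in the VRAM left after weights.

    reserve_fraction is a hypothetical safety margin kept free for
    activations and allocator overhead.
    """
    usable = free_vram_bytes * (1 - reserve_fraction) - weight_bytes
    if usable <= 0:
        return 0  # the model weights alone do not fit
    per_token = kv_bytes_per_token(n_layers, n_kv_heads, head_dim,
                                   bytes_per_elem)
    tokens = int(usable // per_token)
    if max_context is not None:
        tokens = min(tokens, max_context)  # never exceed the model's limit
    return tokens
```

For example, `context_budget(10 * 2**30, 5 * 2**30, 32, 8, 128, reserve_fraction=0.0)` estimates the budget for a hypothetical 32-layer model with 8 KV heads of dimension 128 on a device with 10 GiB free and 5 GiB of weights.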

Dilated Continuous Random Field for Semantic Segmentation

Feb 01, 2022

Xi Mo, Xiangyu Chen, Cuncong Zhong, Rui Li, Kaidong Li, Usman Sajid

Abstract: Mean field approximation has laid the foundation of modern Continuous Random Field (CRF) based solutions for refining semantic segmentation. In this paper, we propose to relax the hard constraint of mean field approximation, namely minimizing the energy term of each node of the probabilistic graphical model, through global optimization with the proposed dilated sparse convolution module (DSConv). In addition, adaptive global average-pooling and adaptive global max-pooling are implemented as replacements for fully connected layers. To integrate DSConv, we design an end-to-end, time-efficient DilatedCRF pipeline. The unary energy term is derived either from pre-softmax and post-softmax features or from the affordance map predicted by a conventional classifier, which makes DilatedCRF easier to implement for a variety of classifiers. We also present experimental results on the suction dataset in which the proposed approach outperforms other CRF-based approaches.

* Manuscript accepted by IEEE International Conference on Robotics and Automation (ICRA 2022)
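
To illustrate why global pooling can stand in for fully connected layers, here is a minimal pure-Python sketch: it reduces each channel of a feature map to one average and one maximum, so the output length depends only on the channel count and not on the spatial size. The function name and data layout are assumptions for illustration; how DilatedCRF combines the two pooled vectors is not stated in the abstract.

```python
def global_avg_max_pool(feature_map):
    """Pool each channel of a feature map down to a single value.

    feature_map: list of C channels, each an H x W list of lists of numbers.
    Returns (avg, mx), two lists of length C.
    """
    avg, mx = [], []
    for channel in feature_map:
        # Flatten the H x W grid so the reduction covers every position.
        values = [v for row in channel for v in row]
        avg.append(sum(values) / len(values))
        mx.append(max(values))
    return avg, mx


# Inputs of different spatial sizes yield outputs of the same length.
small = [[[1, 2], [3, 4]]]                   # 1 channel, 2 x 2
large = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # 1 channel, 3 x 3
print(global_avg_max_pool(small))
print(global_avg_max_pool(large))
```

Because the pooled vector has a fixed length, a classifier head placed after it does not need to be resized when the input resolution changes, which a fully connected layer on flattened features would require.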

Realtime Global Attention Network for Semantic Segmentation

Dec 24, 2021

Xi Mo, Xiangyu Chen

Abstract: In this paper, we propose an end-to-end realtime global attention neural network (RGANet) for the challenging task of semantic segmentation. Unlike the encoding strategy deployed by self-attention paradigms, the proposed global attention module encodes global attention via depth-wise convolution and affine transformations. Integrating these global attention modules into a hierarchical architecture maintains high inference performance. In addition, an improved evaluation metric, namely MGRID, is proposed to alleviate the negative effect of non-convex, widely scattered ground-truth areas. Results from extensive experiments on state-of-the-art architectures for semantic segmentation demonstrate the leading performance of the proposed approaches for robotic monocular visual perception.

* Ver1.0 for RA-L with ICRA presentation
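
As background for the depth-wise convolution mentioned in the abstract, the sketch below shows the generic operation in pure Python: each channel is filtered by its own kernel and channels are never mixed. It is not RGANet's attention module, whose design is not given here; the function name, the valid padding, and the stride of 1 are assumptions for illustration.

```python
def depthwise_conv2d(x, kernels):
    """Depth-wise 2D convolution with valid padding and stride 1.

    x:       list of C channels, each an H x W list of lists.
    kernels: list of C square kernels, each k x k, one per channel.
    As in most deep learning frameworks, the kernel is not flipped
    (this is cross-correlation).
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        out_channel = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                acc = 0.0
                # Weighted sum over the k x k window of this channel only.
                for di in range(k):
                    for dj in range(k):
                        acc += channel[i + di][j + dj] * kernel[di][dj]
                row.append(acc)
            out_channel.append(row)
        out.append(out_channel)
    return out
```

Because each output channel depends on a single input channel, the operation uses C * k * k weights instead of the C * C * k * k of a full convolution, which is one reason it is common in realtime networks.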

Stereo Frustums: A Siamese Pipeline for 3D Object Detection

Nov 08, 2020

Xi Mo, Usman Sajid, Guanghui Wang

Abstract: The paper proposes a lightweight stereo frustum matching module for 3D object detection. The proposed framework takes advantage of a high-performance 2D detector and a point cloud segmentation network to regress 3D bounding boxes for autonomous driving vehicles. Instead of performing traditional stereo matching to compute disparities, the module directly takes the 2D proposals from both the left and the right views as input. Based on the epipolar constraints recovered from the well-calibrated stereo cameras, we propose four matching algorithms to search for the best match for each proposal between the stereo image pairs. Each matching pair proposes a segmentation of the scene, which is then fed into a 3D bounding box regression network. Results of extensive experiments on the KITTI dataset demonstrate that the proposed Siamese pipeline outperforms state-of-the-art stereo-based 3D bounding box regression methods.

* Accepted by Journal of Intelligent & Robotic Systems (JIRS)
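
The abstract does not describe its four matching algorithms, so the following is only a hypothetical sketch of how an epipolar constraint can pair 2D proposals. It assumes rectified stereo images, where a matched object occupies roughly the same rows in both views and appears no further right in the right image than in the left. The function names, the `(x1, y1, x2, y2)` box format, and the `min_overlap` threshold are all assumptions.

```python
def vertical_iou(a, b):
    """Overlap ratio of the row ranges of two boxes (x1, y1, x2, y2)."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union if union > 0 else 0.0


def match_proposals(left_boxes, right_boxes, min_overlap=0.7):
    """Greedily pair left-view boxes with right-view boxes.

    Returns a list of (left_index, right_index) pairs.
    """
    matches, used = [], set()
    for li, lb in enumerate(left_boxes):
        best, best_score = None, min_overlap
        for ri, rb in enumerate(right_boxes):
            if ri in used:
                continue
            # Rectified stereo: disparity is non-negative, so the right-view
            # box cannot start to the right of the left-view box.
            if rb[0] > lb[0]:
                continue
            score = vertical_iou(lb, rb)
            if score >= best_score:
                best, best_score = ri, score
        if best is not None:
            used.add(best)
            matches.append((li, best))
    return matches
```

A greedy pass like this is order-dependent; a global assignment (for instance, the Hungarian algorithm over the same scores) would be a natural alternative when proposals are dense.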

An Efficient Approach for Polyps Detection in Endoscopic Videos Based on Faster R-CNN

Sep 04, 2018

Xi Mo, Ke Tao, Quan Wang, Guanghui Wang

Abstract: Polyps have long been considered one of the major etiologies of colorectal cancer, a fatal disease worldwide; early detection and recognition of polyps therefore plays a crucial role in clinical routines. Accurate diagnosis of polyps through endoscopes operated by physicians is challenging, not only because of the varying expertise of physicians but also because of the inherent nature of endoscopic inspections. To facilitate this process, computer-aided techniques, including fully conventional image processing and novel machine-learning-enhanced approaches, have been designed specifically for polyp detection in endoscopic videos or images. Among the proposed algorithms, deep-learning-based methods lead on multiple metrics in evaluations of algorithmic performance. In this work, a highly effective model, namely the faster region-based convolutional neural network (Faster R-CNN), is implemented for polyp detection. Compared with the reported results of state-of-the-art approaches to polyp detection, extensive experiments demonstrate that Faster R-CNN achieves highly competitive results and is an efficient approach for clinical practice.

* 6 pages, 10 figures, 2018 International Conference on Pattern Recognition
