Tianyuan Liu

Interpretable and Sparse Linear Attention with Decoupled Membership-Subspace Modeling via MCR2 Objective

Jan 20, 2026

Tianyuan Liu, Libin Hou, Linyuan Wang, Bin Yan

Abstract: The Maximal Coding Rate Reduction (MCR2)-driven white-box transformer, grounded in structured representation learning, unifies interpretability and efficiency and provides a reliable white-box solution for visual modeling. However, in existing designs, the tight coupling between the "membership matrix" and the "subspace matrix U" in MCR2 causes redundant coding when tokens are projected incorrectly. To address this, we decouple the functional relationship between the "membership matrix" and the "subspaces U" in the MCR2 objective and derive an interpretable sparse linear attention operator from unrolled gradient descent on the optimized objective. Specifically, we propose to learn the membership matrix directly from the inputs and then derive sparse subspaces from the full space S. Gradient unrolling of the optimized MCR2 objective then yields an interpretable sparse linear attention operator: Decoupled Membership-Subspace Attention (DMSA). Experimental results on visual tasks show that simply replacing the attention module in the Token Statistics Transformer (ToST) with DMSA (a model we refer to as DMST) not only achieves a faster coding reduction rate but also outperforms ToST by 1.08%-1.45% in top-1 accuracy on the ImageNet-1K dataset. Compared with vanilla Transformer architectures, DMST exhibits significantly higher computational efficiency and interpretability.

* 8 pages with 6 figures
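As background for the MCR2 objective referenced above, the following minimal sketch computes the original MCR2 rate reduction, Delta R = R(Z) - R^c(Z | Pi), where R(Z) = 1/2 log det(I + d/(m eps^2) Z Z^T) and each diagonal membership matrix Pi_j is passed as a weight vector. It is not the decoupled objective or the DMSA operator from this paper, whose exact forms the abstract does not give. All function names are illustrative, and pure Python is used only to keep the sketch dependency-free.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def logdet_spd(a: Matrix) -> float:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    # det(A) = prod(diag(L))^2
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def coding_rate(z: Matrix, eps: float, weights: Optional[Sequence[float]] = None) -> float:
    """0.5 * logdet(I + d / (count * eps^2) * Z W Z^T) for a d x m matrix Z.

    Without weights, W = I and count = m. With the diagonal of a membership
    matrix Pi_j as weights, count = tr(Pi_j), which must be > 0.
    """
    d, m = len(z), len(z[0])
    w = list(weights) if weights is not None else [1.0] * m
    scale = d / (sum(w) * eps ** 2)
    gram = [
        [
            (1.0 if i == k else 0.0)
            + scale * sum(z[i][t] * w[t] * z[k][t] for t in range(m))
            for k in range(d)
        ]
        for i in range(d)
    ]
    return 0.5 * logdet_spd(gram)


def rate_reduction(z: Matrix, eps: float, memberships: Sequence[Sequence[float]]) -> float:
    """Delta R = R(Z) - sum_j tr(Pi_j) / m * coding_rate(Z weighted by Pi_j)."""
    m = len(z[0])
    compressed = sum(sum(pi) / m * coding_rate(z, eps, pi) for pi in memberships)
    return coding_rate(z, eps) - compressed


# Two orthogonal clusters: columns e1, e1, e2, e2 (Z is 2 x 4).
Z = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]
correct = [[1, 1, 0, 0], [0, 0, 1, 1]]  # memberships that match the clusters
mixed = [[1, 0, 1, 0], [0, 1, 0, 1]]    # memberships that mix the clusters
print(rate_reduction(Z, 1.0, correct), rate_reduction(Z, 1.0, mixed))
```

On this toy input the matching memberships give Delta R = ln 2 - (1/2) ln 3 (about 0.144), while the mixed memberships give 0, which is the behavior the objective is meant to reward.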

Optimizing Robotic Placement via Grasp-Dependent Feasibility Prediction

Dec 21, 2025

Tianyuan Liu, Richard Dazeley, Benjamin Champion, Akan Cosgun

Abstract: In this paper, we study whether inexpensive, physics-free supervision can reliably prioritize grasp-place candidates for budget-aware pick-and-place. From an object's initial pose, target pose, and a candidate grasp, we generate two path-aware geometric labels: path-wise inverse kinematics (IK) feasibility across a fixed approach-grasp-lift waypoint template, and a transit collision flag from mesh sweeps along the same template. A compact dual-output MLP learns these signals from pose encodings; at test time, its scores rank precomputed candidates for a rank-then-plan policy that uses the same IK gate and planner as the baseline. Although learned only from cheap labels, the scores transfer to physics-enabled executed trajectories: at a fixed planning budget, the policy finds successful paths sooner and with fewer planner calls, while keeping final success on par with or better than the baseline. This work targets a single rigid cuboid with side-face grasps and a fixed waypoint template; we outline extensions to varied objects and richer waypoint schemes.

* Accepted in ACRA 2025
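The rank-then-plan policy described above can be sketched as follows, assuming each precomputed candidate carries the two predicted outputs (IK feasibility and collision probability). The abstract does not say how the two scores are combined; the product `p_ik * (1 - p_collision)` used here is an assumption, and `plan` is a hypothetical stand-in for a real motion planner.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical layout: (candidate_id, predicted IK feasibility, predicted collision probability).
Candidate = Tuple[str, float, float]


def score(candidate: Candidate) -> float:
    """Assumed combination of the two MLP outputs: feasible and collision-free."""
    _, p_ik, p_collision = candidate
    return p_ik * (1.0 - p_collision)


def rank_then_plan(
    candidates: Sequence[Candidate],
    plan: Callable[[str], bool],
    budget: int,
) -> Tuple[Optional[str], int]:
    """Call the planner on candidates in descending score order.

    Returns the first candidate the planner succeeds on (or None) and the
    number of planner calls spent, which never exceeds `budget`.
    """
    calls = 0
    for cid, _, _ in sorted(candidates, key=score, reverse=True):
        if calls >= budget:
            break
        calls += 1
        if plan(cid):
            return cid, calls
    return None, calls


# Toy usage: the planner only succeeds on "c" and "a".
cands = [("a", 0.9, 0.8), ("b", 0.7, 0.1), ("c", 0.5, 0.0)]
print(rank_then_plan(cands, lambda cid: cid in {"c", "a"}, budget=3))  # ('c', 2)
```

The planner-call count is the quantity a good ranking should reduce: a candidate that is likely feasible and collision-free is tried first, so fewer expensive planner calls are wasted.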

TADT-CSA: Temporal Advantage Decision Transformer with Contrastive State Abstraction for Generative Recommendation

Jul 27, 2025

Xiang Gao, Tianyuan Liu, Yisha Li, Jingxin Liu, Lexi Gao, Xin Li, Haiyang Lu, Liyin Hong

Abstract: With the rapid advancement of Transformer-based Large Language Models (LLMs), generative recommendation has shown great potential for improving both the accuracy and the semantic understanding of modern recommender systems. Compared with LLMs, the Decision Transformer (DT) is a lightweight generative model that has been applied to sequential recommendation tasks. However, DT struggles with trajectory stitching and often produces suboptimal trajectories. Moreover, because user states are high-dimensional and the state space in recommendation scenarios is vast, DT can incur significant computational costs and struggle to learn effective state representations. To overcome these issues, we propose a novel Temporal Advantage Decision Transformer with Contrastive State Abstraction (TADT-CSA) model. Specifically, we combine the conventional Return-To-Go (RTG) signal with a novel temporal advantage (TA) signal that encourages the model to capture both long-term returns and their sequential trend. Furthermore, we integrate a contrastive state abstraction module into the DT framework to learn more effective and expressive state representations. Within this module, we introduce a TA-conditioned State Vector Quantization (TAC-SVQ) strategy, in which the TA score guides the state codebooks to incorporate contextual token information. Additionally, a reward prediction network and a contrastive transition prediction (CTP) network ensure that the state codebook preserves both the reward information of the current state and the transition information between adjacent states. Empirical results on both public datasets and an online recommendation system demonstrate the effectiveness of the TADT-CSA model and its superiority over baseline methods.
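Return-to-go, the conditioning signal the abstract builds on, is the sum of future rewards from each step onward, and a state codebook maps each state to its nearest codeword. The sketch below shows only these two standard building blocks; the temporal advantage signal and the TA conditioning in TAC-SVQ are not specified in the abstract and are deliberately not implemented. Function names are hypothetical.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """RTG_t = sum over t' >= t of gamma^(t' - t) * r_t'.

    gamma = 1.0 gives the undiscounted returns-to-go used to condition a
    Decision Transformer.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):  # accumulate from the last step backwards
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def nearest_code(state: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Plain vector quantization: index of the codeword closest in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((s - c) ** 2 for s, c in zip(state, codebook[i])),
    )


print(returns_to_go([1.0, 0.0, 2.0]))                                  # [3.0, 2.0, 2.0]
print(nearest_code([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))  # 1
```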

$\text{Alpha}^2$: Discovering Logical Formulaic Alphas using Deep Reinforcement Learning

Jun 26, 2024

Feng Xu, Yan Yin, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Zongzhang Zhang

Abstract: Alphas are pivotal in providing signals for quantitative trading. The industry highly values the discovery of formulaic alphas for their interpretability and ease of analysis, compared with expressive but overfitting-prone black-box alphas. In this work, we focus on discovering formulaic alphas. Prior studies on automatically generating collections of formulaic alphas were mostly based on genetic programming (GP), which is known to be sensitive to the initial population, to converge to local optima, and to compute slowly. Recent efforts employing deep reinforcement learning (DRL) for alpha discovery have not fully addressed key practical considerations, such as alpha correlations and validity, that are crucial for their effectiveness. We propose a novel DRL framework for alpha discovery that formulates the alpha discovery process as program construction. Our agent, $\text{Alpha}^2$, assembles an alpha program optimized for an evaluation metric. A DRL-guided search algorithm navigates the search space based on value estimates for potential alpha outcomes. The evaluation metric encourages both the performance and the diversity of alphas for a better final trading strategy. Our formulation of alpha search also enables dimensional analysis before calculation, which ensures the logical soundness of alphas and substantially prunes the vast search space. Empirical experiments on real-world stock markets demonstrate $\text{Alpha}^2$'s ability to identify a diverse set of logical and effective alphas, which significantly improves the performance of the final trading strategy. The code of our method is available at https://github.com/x35f/alpha2.
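To illustrate what dimensional analysis before calculation can mean for formulaic alphas, here is a hypothetical checker that rejects an expression tree before it is ever evaluated: addition and subtraction require matching units, while multiplication and division add or subtract unit exponents. The feature names, unit labels, and tuple-based tree format are assumptions for illustration, not the representation used by $\text{Alpha}^2$.

```python
from typing import Dict, Optional

Dim = Dict[str, int]  # unit name -> exponent, e.g. {"price": 1}

# Hypothetical feature units; a real system would define these per data field.
FEATURE_DIMS: Dict[str, Dim] = {
    "close": {"price": 1},
    "open": {"price": 1},
    "volume": {"shares": 1},
}


def combine(a: Dim, b: Dim, sign: int) -> Dim:
    """Add (sign=1) or subtract (sign=-1) unit exponents, dropping zeros."""
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + sign * exp
    return {u: e for u, e in out.items() if e != 0}


def infer_dim(expr) -> Optional[Dim]:
    """Dimension of an expression tree, or None if it is dimensionally invalid.

    A leaf is a feature name; an inner node is a tuple (op, left, right).
    """
    if isinstance(expr, str):
        return FEATURE_DIMS[expr]
    op, left, right = expr
    dl, dr = infer_dim(left), infer_dim(right)
    if dl is None or dr is None:
        return None  # prune: an invalid subtree invalidates the whole formula
    if op in ("+", "-"):
        return dl if dl == dr else None  # only like units can be added
    if op == "*":
        return combine(dl, dr, 1)
    if op == "/":
        return combine(dl, dr, -1)
    raise ValueError(f"unknown operator: {op}")


print(infer_dim(("/", ("-", "close", "open"), "open")))  # {} (dimensionless)
print(infer_dim(("+", "close", "volume")))               # None (rejected)
```

Because the check runs on the tree alone, a search procedure can discard an invalid partial formula such as `close + volume` without computing it on market data.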

Divide and Conquer in Video Anomaly Detection: A Comprehensive Review and New Approach

Oct 03, 2023

Jian Xiao, Tianyuan Liu, Genlin Ji

Abstract: Video anomaly detection is a complex task, and the principle of "divide and conquer" is often regarded as an effective approach to tackling intricate problems. Notably, recent methods in video anomaly detection have applied the divide-and-conquer philosophy (albeit from perspectives distinct from its traditional usage) and achieved impressive results. This paper systematically reviews this literature along six dimensions, aiming to enhance the use of the divide-and-conquer strategy in video anomaly detection. Furthermore, based on the insights gained from this review, we present a novel approach that integrates human skeletal frameworks with video data analysis techniques. This method achieves state-of-the-art performance on the ShanghaiTech dataset, surpassing all existing advanced methods.

Human Kinematics-inspired Skeleton-based Video Anomaly Detection

Sep 27, 2023

Jian Xiao, Tianyuan Liu, Genlin Ji

Abstract: Previous approaches to detecting human anomalies in videos have typically relied on implicit modeling, applying the model directly to video or skeleton data, which can result in inaccurate modeling of motion information. In this paper, we conduct an exploratory study and introduce a new idea for video anomaly detection, HKVAD (Human Kinematic-inspired Video Anomaly Detection), which explicitly uses human kinematic features to detect anomalies. To validate the effectiveness and potential of this perspective, we propose a pilot method that leverages the kinematic features of the skeleton pose, focusing specifically on the walking stride and on skeleton displacement at the feet and neck levels. The method then employs a normalizing flow model to estimate density and detects anomalies based on the estimated density. Depending on the number of kinematic features used, we devise three straightforward variants and conduct experiments on two highly challenging public datasets, ShanghaiTech and UBnormal. Our method achieves good results with minimal computational resources, validating its effectiveness and potential.
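The pipeline described above can be sketched under stated assumptions: the stride is taken as the distance between the ankles, and feet-level and neck-level displacement as the frame-to-frame movement of the ankle midpoint and of the neck. The joint names and feature definitions are hypothetical. For density estimation, the sketch uses the simplest possible normalizing flow, a single elementwise affine map with a standard normal base, whose maximum-likelihood fit reduces to a diagonal Gaussian; practical flow models stack many learned bijections instead.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Frame = Dict[str, Point]  # hypothetical joint names -> (x, y) image coordinates


def midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def kinematic_features(frames: Sequence[Frame]) -> List[Tuple[float, float, float]]:
    """(stride, feet-level displacement, neck-level displacement) per consecutive frame pair."""
    feats = []
    for prev, cur in zip(frames, frames[1:]):
        stride = math.dist(cur["l_ankle"], cur["r_ankle"])
        feet = math.dist(midpoint(prev["l_ankle"], prev["r_ankle"]),
                         midpoint(cur["l_ankle"], cur["r_ankle"]))
        neck = math.dist(prev["neck"], cur["neck"])
        feats.append((stride, feet, neck))
    return feats


class AffineFlow:
    """Simplest normalizing flow: z = (x - mu) / sigma per dimension, standard normal base.

    Its maximum-likelihood fit is the per-dimension mean and population std.
    """

    def __init__(self, data: Sequence[Sequence[float]]):
        cols = list(zip(*data))
        self.mu = [mean(c) for c in cols]
        self.sigma = [max(pstdev(c), 1e-6) for c in cols]  # guard against zero variance

    def log_prob(self, x: Sequence[float]) -> float:
        lp = 0.0
        for xi, m, s in zip(x, self.mu, self.sigma):
            z = (xi - m) / s
            # change of variables: log N(z; 0, 1) + log |dz/dx|, with dz/dx = 1 / s
            lp += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        return lp

    def anomaly_score(self, x: Sequence[float]) -> float:
        return -self.log_prob(x)  # low density -> high anomaly score


# Toy "normal" walk: moving right one unit per frame with alternating stride widths.
normal = [
    {"l_ankle": (t, 0.0), "r_ankle": (t + (0.4 if t % 2 == 0 else 0.6), 0.0),
     "neck": (t + 0.3 + 0.05 * (t % 2), 1.5)}
    for t in range(10)
]
train = kinematic_features(normal)
flow = AffineFlow(train)
print(flow.anomaly_score(train[0]), flow.anomaly_score((0.5, 1.0, 3.0)))
```

A sudden neck jump of 3 units, far outside the training distribution, receives a much higher anomaly score than an ordinary step.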

Rotating Objects via In-Hand Pivoting using Vision, Force and Touch

Mar 21, 2023

Shiyu Xu, Tianyuan Liu, Michael Wong, Dana Kulić, Akansel Cosgun

Abstract: We propose a robotic manipulation system that can pivot objects on a surface using vision, wrist force, and tactile sensing. We aim to control the rotation of an object around the grip point of a parallel gripper by allowing rotational slip while maintaining a desired wrist force profile. Our approach runs an end-effector position controller and a gripper width controller concurrently in a closed loop. The position controller maintains a desired force using vision and wrist force. The gripper controller uses tactile sensing to keep the grip firm enough to prevent translational slip but loose enough to induce rotational slip. Our sensor-based control approach relies on matching a desired force profile, derived from the object's dimensions and weight, and on vision-based monitoring of the object pose. The gripper controller uses tactile sensors to detect and prevent translational slip by tightening the grip when needed. In experiments where the robot was tasked with rotating cuboid objects by 90 degrees, the multi-modal pivoting approach rotated the objects without causing lift or slip and was more energy-efficient than both single-modality sensing and pick-and-place. While our work demonstrates the benefit of multi-modal sensing for the pivoting task, further work is needed to generalize our approach to arbitrary objects.

* 8 pages, 7 figures, 4 tables
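The two concurrent loops described in this abstract can be sketched as below. The step sizes, the rule of widening slightly when no slip is detected, and the proportional force correction are illustrative assumptions, not the controllers reported in the paper; in a real system the slip flag would come from the tactile sensors and the measured force from the wrist sensor.

```python
from dataclasses import dataclass


@dataclass
class GripperWidthController:
    """Hypothetical width rule: tighten on translational slip, otherwise relax slightly.

    Relaxing keeps the grip loose enough for rotational slip; tightening stops
    translational slip. Widths are in metres; all values are illustrative.
    """

    width: float
    min_width: float
    max_width: float
    tighten_step: float = 0.001
    relax_step: float = 0.0002

    def update(self, translational_slip: bool) -> float:
        if translational_slip:
            self.width -= self.tighten_step  # firmer grip
        else:
            self.width += self.relax_step    # looser grip, allowing rotation
        self.width = min(max(self.width, self.min_width), self.max_width)
        return self.width


def force_correction(desired_force: float, measured_force: float, gain: float) -> float:
    """Proportional position correction from the wrist-force error (sign convention assumed)."""
    return gain * (desired_force - measured_force)


gripper = GripperWidthController(width=0.05, min_width=0.02, max_width=0.08)
for slip in (True, True, False):  # slip flags would come from the tactile sensors
    gripper.update(slip)
print(round(gripper.width, 4))  # 0.0482
```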

How To Guide Your Learner: Imitation Learning with Active Adaptive Expert Involvement

Mar 03, 2023

Xu-Hui Liu, Feng Xu, Xinyu Zhang, Tianyuan Liu, Shengyi Jiang, Ruifeng Chen, Zongzhang Zhang, Yang Yu

Abstract: Imitation learning aims to mimic the behavior of experts without explicit reward signals. Passive imitation learning methods, which use static expert datasets, typically suffer from compounding error, low sample efficiency, and high hyperparameter sensitivity. In contrast, active imitation learning methods solicit expert interventions to address these limitations. However, recent active imitation learning methods are designed based on human intuition or empirical experience, without theoretical guarantees. In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively affect the student's learning process. By solving the optimization objective of this framework, we obtain a practical implementation named AdapMen. Theoretical analysis shows that AdapMen can improve the error bound and avoid compounding error under mild conditions. Experiments on the MetaDrive benchmark and Atari 2600 games validate our theoretical analysis and show that our method achieves near-expert performance with much less expert involvement and fewer total sampling steps than previous methods. The code is available at https://github.com/liuxhym/AdapMen.
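In active imitation learning as described above, the expert acts only when an intervention rule fires, and only those states receive expert labels. The generic loop below shows that structure with a caller-supplied `should_intervene` rule; it does not implement AdapMen's teaching criterion, which follows from the paper's optimization objective and is not given in the abstract. All names are hypothetical.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")
A = TypeVar("A")


def gated_rollout(
    state: S,
    step: Callable[[S, A], S],
    learner: Callable[[S], A],
    expert: Callable[[S], A],
    should_intervene: Callable[[S, A], bool],
    horizon: int,
) -> List[Tuple[S, A]]:
    """Roll out the learner; the expert acts (and labels) only when the rule fires.

    The returned (state, expert_action) pairs are training data for the learner;
    their count is the expert involvement an active method tries to keep small.
    """
    labels: List[Tuple[S, A]] = []
    for _ in range(horizon):
        action = learner(state)
        if should_intervene(state, action):
            action = expert(state)  # the expert is queried only here
            labels.append((state, action))
        state = step(state, action)
    return labels


# Toy usage: integer states, the learner always moves +1, the expert +2,
# and the hypothetical rule intervenes on multiples of 3.
print(gated_rollout(0, lambda s, a: s + a, lambda s: 1, lambda s: 2,
                    lambda s, a: s % 3 == 0, horizon=4))  # [(0, 2), (3, 2)]
```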
