Chang Cai

Score-Based Turbo Message Passing for Plug-and-Play Compressive Image Recovery

Mar 28, 2025

Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

Abstract:Message passing algorithms have been tailored for compressive imaging applications by plugging in different types of off-the-shelf image denoisers. These off-the-shelf denoisers mostly rely on some generic or hand-crafted priors for denoising. Due to their insufficient accuracy in capturing the true image prior, these methods often fail to produce satisfactory results, especially in largely underdetermined scenarios. On the other hand, score-based generative modeling offers a promising way to accurately characterize the sophisticated image distribution. In this paper, by exploiting the close relation between score-based modeling and empirical Bayes-optimal denoising, we devise a message passing framework that integrates a score-based minimum mean squared error (MMSE) denoiser for compressive image recovery. This framework is firmly rooted in Bayesian formalism, in which state evolution (SE) equations accurately predict its asymptotic performance. Experiments on the FFHQ dataset demonstrate that our method strikes a significantly better performance-complexity tradeoff than conventional message passing, regularized linear regression, and score-based posterior sampling baselines. Remarkably, our method typically requires less than 20 neural function evaluations (NFEs) to converge.
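The relation between score functions and MMSE denoising mentioned in the abstract is commonly expressed through Tweedie's formula, E[x | y] = y + sigma^2 * d/dy log p(y), for an observation y = x + Gaussian noise of variance sigma^2. The following is a minimal sketch on a scalar toy problem, not the authors' denoiser: it assumes a zero-mean Gaussian prior so that the score is available in closed form, and the names `gaussian_score` and `mmse_denoise` are illustrative only.

```python
def gaussian_score(y, prior_var, noise_var):
    """Score d/dy log p(y) for x ~ N(0, prior_var), y = x + N(0, noise_var).

    Under this toy prior (an assumption), p(y) = N(0, prior_var + noise_var).
    """
    return -y / (prior_var + noise_var)


def mmse_denoise(y, score, noise_var):
    """Tweedie's formula: E[x | y] = y + noise_var * score(y)."""
    return y + noise_var * score


prior_var, noise_var = 4.0, 1.0
y = 2.5

# Denoise via the score, then compare with the known linear MMSE estimate.
x_hat = mmse_denoise(y, gaussian_score(y, prior_var, noise_var), noise_var)
closed_form = prior_var / (prior_var + noise_var) * y

print(x_hat, closed_form)
```

In a score-based setting the closed-form score would be replaced by a learned network evaluated at the current noise level; how that network is plugged into the message passing iterations is specific to the paper and is not shown here.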

RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects

Jan 16, 2025

Zhen Luo, Yixuan Yang, Chang Cai, Yanfu Zhang, Feng Zheng

Abstract:As robotic technology rapidly develops, robots are being employed in an increasing number of fields. However, due to the complexity of deployment environments or the prevalence of ambiguous-condition objects, the practical application of robotics still faces many challenges, leading to frequent errors. Traditional methods and some LLM-based approaches, although improved, still require substantial human intervention and struggle with autonomous error correction in complex scenarios. In this work, we propose RoboReflect, a novel framework leveraging large vision-language models (LVLMs) to enable self-reflection and autonomous error correction in robotic grasping tasks. RoboReflect allows robots to automatically adjust their strategies based on unsuccessful attempts until successful execution is achieved. The corrected strategies are saved in a memory for future task reference. We evaluate RoboReflect through extensive testing on eight common objects prone to ambiguous conditions of three categories. Our results demonstrate that RoboReflect not only outperforms existing grasp pose estimation methods like AnyGrasp and high-level action planning techniques using GPT-4V but also significantly enhances the robot's ability to adapt and correct errors independently. These findings underscore the critical importance of autonomous self-reflection in robotic systems while effectively addressing the challenges posed by ambiguous environments.

"My Grade is Wrong!": A Contestable AI Framework for Interactive Feedback in Evaluating Student Essays

Sep 11, 2024

Shengxin Hong, Chang Cai, Sixuan Du, Haiyue Feng, Siyuan Liu, Xiuyi Fan

Abstract:Interactive feedback, where feedback flows in both directions between teacher and student, is more effective than traditional one-way feedback. However, it is often too time-consuming for widespread use in educational practice. While Large Language Models (LLMs) have potential for automating feedback, they struggle with reasoning and interaction in an interactive setting. This paper introduces CAELF, a Contestable AI Empowered LLM Framework for automating interactive feedback. CAELF allows students to query, challenge, and clarify their feedback by integrating a multi-agent system with computational argumentation. Essays are first assessed by multiple Teaching-Assistant Agents (TA Agents), and then a Teacher Agent aggregates the evaluations through formal reasoning to generate feedback and grades. Students can further engage with the feedback to refine their understanding. A case study on 500 critical thinking essays with user studies demonstrates that CAELF significantly improves interactive feedback, enhancing the reasoning and interaction capabilities of LLMs. This approach offers a promising solution to overcoming the time and resource barriers that have limited the adoption of interactive feedback in educational settings.

End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework

Aug 30, 2024

Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

Abstract:This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formulate the E2E design of feature encoding, MIMO precoding, and classification as a conditional mutual information maximization problem. However, it is notoriously difficult to design and train an E2E network that can be adaptive to both the task dataset and different channel realizations. Regarding network training, we propose a decoupled pretraining framework that separately trains the feature encoder and the MIMO precoder, with a maximum a posteriori (MAP) classifier employed at the server to generate the inference result. The feature encoder is pretrained exclusively using the task dataset, while the MIMO precoder is pretrained solely based on the channel and noise distributions. Nevertheless, we manage to align the pretraining objectives of each individual component with the E2E learning objective, so as to approach the performance bound of E2E learning. By leveraging the decoupled pretraining results for initialization, the E2E learning can be conducted with minimal training overhead. Regarding network architecture design, we develop two deep unfolded precoding networks that effectively incorporate the domain knowledge of the solution to the decoupled precoding problem. Simulation results on both the CIFAR-10 and ModelNet10 datasets verify that the proposed method achieves significantly higher classification accuracy compared to various baselines.

* major revision in IEEE JSAC

All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

Aug 20, 2024

Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong (+4 more)

Abstract:Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by offering a unified data format, comprehensive sensory modalities, and a combination of real-world and simulated data. ARIO aims to improve the training of embodied AI agents, increasing their robustness and adaptability across various tasks and environments. Building upon the proposed new standard, we present a large-scale unified ARIO dataset, comprising approximately 3 million episodes collected from 258 series and 321,064 tasks. The ARIO standard and dataset represent a significant step towards bridging the gaps of existing data resources. By providing a cohesive framework for data collection and representation, ARIO paves the way for the development of more powerful and versatile embodied AI agents, capable of navigating and interacting with the physical world in increasingly complex and diverse ways. The project is available on https://imaei.github.io/project_pages/ario/

* Project website: https://imaei.github.io/project_pages/ario/

Multi-Device Task-Oriented Communication via Maximal Coding Rate Reduction

Sep 06, 2023

Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

Abstract:Task-oriented communication offers ample opportunities to alleviate the communication burden in next-generation wireless networks. Most existing work designs the physical-layer communication modules and learning-based codecs with distinct objectives: learning is targeted at accurate execution of specific tasks, while communication aims at optimizing conventional communication metrics, such as throughput maximization, delay minimization, or bit error rate minimization. The inconsistency between the design objectives may hinder the exploitation of the full benefits of task-oriented communications. In this paper, we consider a specific task-oriented communication system for multi-device edge inference over a multiple-input multiple-output (MIMO) multiple-access channel, where the learning (i.e., feature encoding and classification) and communication (i.e., precoding) modules are designed with the same goal of inference accuracy maximization. Instead of end-to-end learning, which involves both the task dataset and wireless channel during training, we advocate a separate design of learning and communication to achieve the consistent goal. Specifically, we leverage the maximal coding rate reduction (MCR²) objective as a surrogate to represent the inference accuracy, which allows us to explicitly formulate the precoding optimization problem. We provide valuable insights into this formulation and develop a block coordinate descent (BCD) solution algorithm. Moreover, the MCR² objective also serves as the loss function of the feature encoding network, based on which we characterize the received features as a Gaussian mixture (GM) model, facilitating a maximum a posteriori (MAP) classifier to infer the result. Simulation results on both synthetic and real-world datasets demonstrate the superior performance of the proposed method compared to various baselines.

* submitted to IEEE for possible publication
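The abstract's last modeling step, a MAP classifier on features that follow a Gaussian mixture, can be illustrated with a scalar toy case. This is a sketch under stated assumptions rather than the paper's classifier: features are one-dimensional, each class is a single Gaussian component, and the priors, means, and variances below are made-up values. The function names are hypothetical.

```python
import math


def log_gaussian(z, mean, var):
    """Log density of N(mean, var) evaluated at the scalar z."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (z - mean) ** 2 / var)


def map_classify(z, priors, means, variances):
    """Return the class index maximizing log prior + log likelihood."""
    scores = [
        math.log(p) + log_gaussian(z, m, v)
        for p, m, v in zip(priors, means, variances)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative two-class mixture (assumed values, not from the paper).
priors = [0.5, 0.5]
means = [-1.0, 1.0]
variances = [1.0, 1.0]

label_a = map_classify(0.3, priors, means, variances)   # closer to mean +1
label_b = map_classify(-2.0, priors, means, variances)  # closer to mean -1
print(label_a, label_b)
```

With equal priors and variances the decision boundary sits midway between the two means; unequal priors shift it toward the less likely class.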

RIS Partitioning Based Scalable Beamforming Design for Large-Scale MIMO: Asymptotic Analysis and Optimization

Mar 16, 2022

Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

Abstract:In next-generation wireless networks, reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output (MIMO) systems are expected to support a large number of antennas at the transceiver as well as a large number of reflecting elements at the RIS. To fully unleash the potential of RIS, the phase shifts of RIS elements should be carefully designed, resulting in a high-dimensional non-convex optimization problem that is hard to solve with affordable computational complexity. In this paper, we address this scalability issue by partitioning the RIS into sub-surfaces, so as to optimize the phase shifts at the sub-surface level to reduce complexity. Specifically, each sub-surface employs a linear phase variation structure to anomalously reflect the incident signal to a desired direction, and the sizes of sub-surfaces can be adaptively adjusted according to channel conditions. We formulate the achievable rate maximization problem by jointly optimizing the transmit covariance matrix and the RIS phase shifts. Then, we characterize the asymptotic behavior of the system with an infinitely large number of transceiver antennas and RIS elements. The asymptotic analysis provides useful insights into the fundamental performance-complexity tradeoff in RIS partitioning design. We show that the achievable rate maximization problem has a rather simple form in the asymptotic regime, and we develop an efficient algorithm to find the optimal solution via one-dimensional (1D) search. Moreover, we discuss the insights and impacts of the asymptotically optimal solution on finite-size system design. By applying the asymptotic result to a finite-size system with necessary modifications, we show by numerical results that the proposed design achieves a favorable tradeoff between system performance and computational complexity.

* 30 pages, 6 figures

Efficient Hierarchical Bayesian Inference for Spatio-temporal Regression Models in Neuroimaging

Nov 23, 2021

Ali Hashemi, Yijing Gao, Chang Cai, Sanjay Ghosh, Klaus-Robert Müller, Srikantan S. Nagarajan, Stefan Haufe

Abstract:Several problems in neuroimaging and beyond require inference on the parameters of multi-task sparse hierarchical regression models. Examples include M/EEG inverse problems, neural encoding models for task-based fMRI analyses, and climate science. In these domains, both the model parameters to be inferred and the measurement noise may exhibit a complex spatio-temporal structure. Existing work either neglects the temporal structure or leads to computationally demanding inference schemes. Overcoming these limitations, we devise a novel flexible hierarchical Bayesian framework within which the spatio-temporal dynamics of model parameters and noise are modeled to have Kronecker product covariance structure. Inference in our framework is based on majorization-minimization optimization and has guaranteed convergence properties. Our highly efficient algorithms exploit the intrinsic Riemannian geometry of temporal autocovariance matrices. For stationary dynamics described by Toeplitz matrices, the theory of circulant embeddings is employed. We prove convex bounding properties and derive update rules of the resulting algorithms. On both synthetic and real neural data from M/EEG, we demonstrate that our methods lead to improved performance.

* Accepted to the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
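A Kronecker product covariance structure, as named in the abstract, means the full spatio-temporal covariance is built from one small temporal matrix and one small spatial matrix. The sketch below only illustrates that construction in plain Python; the 2x2 matrices are made-up values, and the ordering `kron(temporal, spatial)` is an assumption about how the data would be vectorized, not something stated in the abstract.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(cols_b)]
        for i in range(len(a))
        for k in range(rows_b)
    ]


# Illustrative factors: a Toeplitz temporal covariance and a spatial covariance.
temporal = [[1.0, 0.5],
            [0.5, 1.0]]
spatial = [[2.0, 0.3],
           [0.3, 1.0]]

full = kron(temporal, spatial)  # 4x4 covariance over (time, space) pairs

# Entry for (time 0, sensor 0) vs (time 1, sensor 0): temporal[0][1] * spatial[0][0]
print(full[0][2])
for row in full:
    print(row)
```

The appeal of the structure is the parameter count: the factors hold T^2 + S^2 entries, whereas an unstructured covariance over the same variables would hold (T*S)^2.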

Graph Neural Networks for Unsupervised Domain Adaptation of Histopathological Image Analytics

Aug 21, 2020

Dou Xu, Chang Cai, Chaowei Fang, Bin Kong, Jihua Zhu, Zhongyu Li

Abstract:Annotating histopathological images is a time-consuming and labor-intensive process, which requires board-certified pathologists to carefully examine large-scale whole-slide images from cells to tissues. Recent transfer learning techniques have been widely investigated for image understanding tasks with limited annotations. However, when applied to the analytics of histology images, few of them can effectively avoid the performance degradation caused by the domain discrepancy between the source training dataset and the target dataset, such as different tissues, staining appearances, and imaging devices. To this end, we present a novel method for unsupervised domain adaptation in histopathological image analysis, based on a backbone for embedding input images into a feature space, and a graph neural layer for propagating the supervision signals of images with labels. The graph model is set up by connecting every image with its close neighbors in the embedded feature space. Then a graph neural network is employed to synthesize a new feature representation from every image. During the training stage, target samples with confident inferences are dynamically allocated pseudo labels. The cross-entropy loss function is used to constrain the predictions of source samples with manually marked labels and target samples with pseudo labels. Furthermore, the maximum mean diversity is adopted to facilitate the extraction of domain-invariant feature representations, and contrastive learning is exploited to enhance the category discrimination of learned features. In experiments on unsupervised domain adaptation for histopathological image classification, our method achieves state-of-the-art performance on four public datasets.
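The graph construction step described in the abstract, connecting every image with its close neighbors in the embedded feature space, resembles a k-nearest-neighbor graph. The following is a rough sketch under assumptions the abstract does not confirm: Euclidean distance, a fixed k, and tiny hand-written 2-D embeddings standing in for backbone features. The function name `knn_graph` is hypothetical.

```python
import math


def knn_graph(embeddings, k):
    """Map each node index to the indices of its k nearest other nodes."""
    graph = {}
    for i, u in enumerate(embeddings):
        # Sort all other nodes by Euclidean distance to node i.
        dists = sorted(
            (math.dist(u, v), j) for j, v in enumerate(embeddings) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


# Toy embeddings: three points near the origin, two points far away.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
graph = knn_graph(embeddings, k=2)

for node, neighbors in graph.items():
    print(node, neighbors)
```

A graph neural layer would then aggregate features along these edges, so that labeled source images can pass supervision to nearby unlabeled target images.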

Deep Doubly Supervised Transfer Network for Diagnosis of Breast Cancer with Imbalanced Ultrasound Imaging Modalities

Jun 29, 2020

Han Xiangmin, Wang Jun, Zhou Weijun, Chang Cai, Ying Shihui, Shi Jun

Abstract:Elastography ultrasound (EUS) provides additional biomechanical information about lesions for B-mode ultrasound (BUS) in the diagnosis of breast cancers. However, joint utilization of both BUS and EUS is not popular due to the lack of EUS devices in rural hospitals, which gives rise to a novel modality imbalance problem in computer-aided diagnosis (CAD) for breast cancers. Current transfer learning (TL) methods pay little attention to this special issue of clinical modality imbalance, that is, the source domain (EUS modality) has fewer labeled samples than the target domain (BUS modality). Moreover, these TL methods cannot fully use the label information to explore the intrinsic relation between two modalities and then guide the promoted knowledge transfer. To this end, we propose a novel doubly supervised TL network (DDSTN) that integrates the Learning Using Privileged Information (LUPI) paradigm and the Maximum Mean Discrepancy (MMD) criterion into a unified deep TL framework. The proposed algorithm can not only make full use of the shared labels to effectively guide knowledge transfer by the LUPI paradigm, but also perform additional supervised transfer between unpaired data. We further introduce the MMD criterion to enhance the knowledge transfer. The experimental results on the breast ultrasound dataset indicate that the proposed DDSTN outperforms all the compared state-of-the-art algorithms for the BUS-based CAD.

* Accepted by MICCAI 2020
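The Maximum Mean Discrepancy (MMD) criterion named in the abstract measures how far apart two sample sets are in a kernel feature space. Below is a minimal sketch of the standard biased estimate of squared MMD with a Gaussian kernel. The kernel choice, bandwidth, and toy 1-D samples are assumptions for illustration, and nothing here reflects how DDSTN applies the criterion inside a network.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy samples: 'same' matches the source set, 'shifted' is displaced by 3.
source = [(0.0,), (0.2,), (-0.1,)]
same = [(0.0,), (0.2,), (-0.1,)]
shifted = [(3.0,), (3.2,), (2.9,)]

mmd_same = mmd_squared(source, same)
mmd_shifted = mmd_squared(source, shifted)
print(mmd_same, mmd_shifted)
```

Used as a training loss, a term like this would be minimized over features from the two domains so that their distributions move closer together.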
