Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anqi Li

Purity Law for Generalizable Neural TSP Solvers

May 07, 2025

Wenzhao Liu, Haoran Li, Congying Han, Zicheng Zhang, Anqi Li, Tiande Guo

Abstract:Achieving generalization in neural approaches across different scales and distributions remains a significant challenge for the Traveling Salesman Problem~(TSP). A key obstacle is that neural networks often fail to learn robust principles for identifying universal patterns and deriving optimal solutions from diverse instances. In this paper, we first uncover Purity Law (PuLa), a fundamental structural principle for optimal TSP solutions, defining that edge prevalence grows exponentially with the sparsity of surrounding vertices. Statistically validated across diverse instances, PuLa reveals a consistent bias toward local sparsity in global optima. Building on this insight, we propose Purity Policy Optimization~(PUPO), a novel training paradigm that explicitly aligns characteristics of neural solutions with PuLa during the solution construction process to enhance generalization. Extensive experiments demonstrate that PUPO can be seamlessly integrated with popular neural solvers, significantly enhancing their generalization performance without incurring additional computational overhead during inference.

Via

Access Paper or Ask Questions

The Tenth NTIRE 2025 Image Denoising Challenge Report

Apr 16, 2025

Lei Sun, Hang Guo, Bin Ren, Luc Van Gool, Radu Timofte, Yawei Li, Xiangyu Kong, Hyunhee Park, Xiaoxuan Yu, Suejin Han(+84 more)

Figure 1 for The Tenth NTIRE 2025 Image Denoising Challenge Report

Figure 2 for The Tenth NTIRE 2025 Image Denoising Challenge Report

Figure 3 for The Tenth NTIRE 2025 Image Denoising Challenge Report

Figure 4 for The Tenth NTIRE 2025 Image Denoising Challenge Report

Abstract:This paper presents an overview of the NTIRE 2025 Image Denoising Challenge ({\sigma} = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent additive white Gaussian noise (AWGN) with a fixed noise level of 50. A total of 290 participants registered for the challenge, with 20 teams successfully submitting valid results, providing insights into the current state-of-the-art in image denoising.

Via

Access Paper or Ask Questions

HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation

Feb 08, 2025

Yi Li, Yuquan Deng, Jesse Zhang, Joel Jang, Marius Memme, Raymond Yu, Caelan Reed Garrett, Fabio Ramos, Dieter Fox, Anqi Li(+2 more)

Figure 1 for HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation

Figure 2 for HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation

Figure 3 for HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation

Figure 4 for HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation

Abstract:Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in robotics. One fundamental challenge is the lack of robotic data, which are typically obtained through expensive on-robot operation. A promising remedy is to leverage cheaper, off-domain data such as action-free videos, hand-drawn sketches or simulation data. In this work, we posit that hierarchical vision-language-action (VLA) models can be more effective in utilizing off-domain data than standard monolithic VLA models that directly finetune vision-language models (VLMs) to predict actions. In particular, we study a class of hierarchical VLA models, where the high-level VLM is finetuned to produce a coarse 2D path indicating the desired robot end-effector trajectory given an RGB image and a task description. The intermediate 2D path prediction is then served as guidance to the low-level, 3D-aware control policy capable of precise manipulation. Doing so alleviates the high-level VLM from fine-grained action prediction, while reducing the low-level policy's burden on complex task-level reasoning. We show that, with the hierarchical design, the high-level VLM can transfer across significant domain gaps between the off-domain finetuning data and real-robot testing scenarios, including differences on embodiments, dynamics, visual appearances and task semantics, etc. In the real-robot experiments, we observe an average of 20% improvement in success rate across seven different axes of generalization over OpenVLA, representing a 50% relative gain. Visual results are provided at: https://hamster-robot.github.io/

* to be published in ICLR 2025

Via

Access Paper or Ask Questions

Cosmos World Foundation Model Platform for Physical AI

Jan 07, 2025

NVIDIA, :, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen(+69 more)

Figure 1 for Cosmos World Foundation Model Platform for Physical AI

Figure 2 for Cosmos World Foundation Model Platform for Physical AI

Figure 3 for Cosmos World Foundation Model Platform for Physical AI

Figure 4 for Cosmos World Foundation Model Platform for Physical AI

Abstract:Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.

Via

Access Paper or Ask Questions

SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything

Aug 22, 2024

Chongkai Yu, Anqi Li, Xiaochao Qu, Luoqi Liu, Ting Liu

Figure 1 for SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything

Figure 2 for SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything

Figure 3 for SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything

Figure 4 for SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything

Abstract:The advent of the Segment Anything Model (SAM) marks a significant milestone for interactive segmentation using generalist models. As a late fusion model, SAM extracts image embeddings once and merges them with prompts in later interactions. This strategy limits the models ability to extract detailed information from the prompted target zone. Current specialist models utilize the early fusion strategy that encodes the combination of images and prompts to target the prompted objects, yet repetitive complex computations on the images result in high latency. The key to these issues is efficiently synergizing the images and prompts. We propose SAM-REF, a two-stage refinement framework that fully integrates images and prompts globally and locally while maintaining the accuracy of early fusion and the efficiency of late fusion. The first-stage GlobalDiff Refiner is a lightweight early fusion network that combines the whole image and prompts, focusing on capturing detailed information for the entire object. The second-stage PatchDiff Refiner locates the object detail window according to the mask and prompts, then refines the local details of the object. Experimentally, we demonstrated the high effectiveness and efficiency of our method in tackling complex cases with multiple interactions. Our SAM-REF model outperforms the current state-of-the-art method in most metrics on segmentation quality without compromising efficiency.

Via

Access Paper or Ask Questions

Predicting the Big Five Personality Traits in Chinese Counselling Dialogues Using Large Language Models

Jun 25, 2024

Yang Yan, Lizhi Ma, Anqi Li, Jingsong Ma, Zhenzhong Lan

Abstract:Accurate assessment of personality traits is crucial for effective psycho-counseling, yet traditional methods like self-report questionnaires are time-consuming and biased. This study exams whether Large Language Models (LLMs) can predict the Big Five personality traits directly from counseling dialogues and introduces an innovative framework to perform the task. Our framework applies role-play and questionnaire-based prompting to condition LLMs on counseling sessions, simulating client responses to the Big Five Inventory. We evaluated our framework on 853 real-world counseling sessions, finding a significant correlation between LLM-predicted and actual Big Five traits, proving the validity of framework. Moreover, ablation studies highlight the importance of role-play simulations and task simplification via questionnaires in enhancing prediction accuracy. Meanwhile, our fine-tuned Llama3-8B model, utilizing Direct Preference Optimization with Supervised Fine-Tuning, achieves a 130.95\% improvement, surpassing the state-of-the-art Qwen1.5-110B by 36.94\% in personality prediction validity. In conclusion, LLMs can predict personality based on counseling dialogues. Our code and model are publicly available at \url{https://github.com/kuri-leo/BigFive-LLM-Predictor}, providing a valuable tool for future research in computational psychometrics.

Via

Access Paper or Ask Questions

Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

Jun 02, 2024

Anqi Li, Yuxi Liu, Huihui Bai, Feng Li, Runmin Cong, Meng Wang, Yao Zhao

Figure 1 for Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

Figure 2 for Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

Figure 3 for Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

Figure 4 for Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaption

Abstract:Although recent generative image compression methods have demonstrated impressive potential in optimizing the rate-distortion-perception trade-off, they still face the critical challenge of flexible rate adaption to diverse compression necessities and scenarios. To overcome this challenge, this paper proposes a Controllable Generative Image Compression framework, Control-GIC, the first capable of fine-grained bitrate adaption across a broad spectrum while ensuring high-fidelity and generality compression. We base Control-GIC on a VQGAN framework representing an image as a sequence of variable-length codes (i.e. VQ-indices), which can be losslessly compressed and exhibits a direct positive correlation with the bitrates. Therefore, drawing inspiration from the classical coding principle, we naturally correlate the information density of local image patches with their granular representations, to achieve dynamic adjustment of the code quantity following different granularity decisions. This implies we can flexibly determine a proper allocation of granularity for the patches to acquire desirable compression rates. We further develop a probabilistic conditional decoder that can trace back to historic encoded multi-granularity representations according to transmitted codes, and then reconstruct hierarchical granular features in the formalization of conditional probability, enabling more informative aggregation to improve reconstruction realism. Our experiments show that Control-GIC allows highly flexible and controllable bitrate adaption and even once compression on an entire dataset to fulfill constrained bitrate conditions. Experimental results demonstrate its superior performance over recent state-of-the-art methods.

Via

Access Paper or Ask Questions

SGRU: A High-Performance Structured Gated Recurrent Unit for Traffic Flow Prediction

Apr 18, 2024

Wenfeng Zhang, Xin Li, Anqi Li, Xiaoting Huang, Ti Wang, Honglei Gao

Figure 1 for SGRU: A High-Performance Structured Gated Recurrent Unit for Traffic Flow Prediction

Figure 2 for SGRU: A High-Performance Structured Gated Recurrent Unit for Traffic Flow Prediction

Figure 3 for SGRU: A High-Performance Structured Gated Recurrent Unit for Traffic Flow Prediction

Figure 4 for SGRU: A High-Performance Structured Gated Recurrent Unit for Traffic Flow Prediction

Abstract:Traffic flow prediction is an essential task in constructing smart cities and is a typical Multivariate Time Series (MTS) Problem. Recent research has abandoned Gated Recurrent Units (GRU) and utilized dilated convolutions or temporal slicing for feature extraction, and they have the following drawbacks: (1) Dilated convolutions fail to capture the features of adjacent time steps, resulting in the loss of crucial transitional data. (2) The connections within the same temporal slice are strong, while the connections between different temporal slices are too loose. In light of these limitations, we emphasize the importance of analyzing a complete time series repeatedly and the crucial role of GRU in MTS. Therefore, we propose SGRU: Structured Gated Recurrent Units, which involve structured GRU layers and non-linear units, along with multiple layers of time embedding to enhance the model's fitting performance. We evaluate our approach on four publicly available California traffic datasets: PeMS03, PeMS04, PeMS07, and PeMS08 for regression prediction. Experimental results demonstrate that our model outperforms baseline models with average improvements of 11.7%, 18.6%, 18.5%, and 12.0% respectively.

* 7 pages, 6 figures, conference

Via

Access Paper or Ask Questions

Adversarially-Robust Inference on Trees via Belief Propagation

Mar 31, 2024

Samuel B. Hopkins, Anqi Li

Figure 1 for Adversarially-Robust Inference on Trees via Belief Propagation

Abstract:We introduce and study the problem of posterior inference on tree-structured graphical models in the presence of a malicious adversary who can corrupt some observed nodes. In the well-studied broadcasting on trees model, corresponding to the ferromagnetic Ising model on a $d$-regular tree with zero external field, when a natural signal-to-noise ratio exceeds one (the celebrated Kesten-Stigum threshold), the posterior distribution of the root given the leaves is bounded away from $\mathrm{Ber}(1/2)$, and carries nontrivial information about the sign of the root. This posterior distribution can be computed exactly via dynamic programming, also known as belief propagation. We first confirm a folklore belief that a malicious adversary who can corrupt an inverse-polynomial fraction of the leaves of their choosing makes this inference impossible. Our main result is that accurate posterior inference about the root vertex given the leaves is possible when the adversary is constrained to make corruptions at a $\rho$-fraction of randomly-chosen leaf vertices, so long as the signal-to-noise ratio exceeds $O(\log d)$ and $\rho \leq c \varepsilon$ for some universal $c > 0$. Since inference becomes information-theoretically impossible when $\rho \gg \varepsilon$, this amounts to an information-theoretically optimal fraction of corruptions, up to a constant multiplicative factor. Furthermore, we show that the canonical belief propagation algorithm performs this inference.

Via

Access Paper or Ask Questions

Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

Mar 20, 2024

Huachuan Qiu, Shuai Zhang, Hongliang He, Anqi Li, Zhenzhong Lan

Figure 1 for Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

Figure 2 for Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

Figure 3 for Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

Figure 4 for Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models

Abstract:Pornographic content occurring in human-machine interaction dialogues can cause severe side effects for users in open-domain dialogue systems. However, research on detecting pornographic language within human-machine interaction dialogues is an important subject that is rarely studied. To advance in this direction, we introduce CensorChat, a dialogue monitoring dataset aimed at detecting whether the dialogue session contains pornographic content. To this end, we collect real-life human-machine interaction dialogues in the wild and break them down into single utterances and single-turn dialogues, with the last utterance spoken by the chatbot. We propose utilizing knowledge distillation of large language models to annotate the dataset. Specifically, first, the raw dataset is annotated by four open-source large language models, with the majority vote determining the label. Second, we use ChatGPT to update the empty label from the first step. Third, to ensure the quality of the validation and test sets, we utilize GPT-4 for label calibration. If the current label does not match the one generated by GPT-4, we employ a self-criticism strategy to verify its correctness. Finally, to facilitate the detection of pornographic text, we develop a series of text classifiers using a pseudo-labeled dataset. Detailed data analysis demonstrates that leveraging knowledge distillation techniques with large language models provides a practical and cost-efficient method for developing pornographic text detectors.

* Accepted to CSCWD 2024 (27th International Conference on Computer Supported Cooperative Work in Design). arXiv admin note: text overlap with arXiv:2309.09749

Via

Access Paper or Ask Questions