Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuxi Wang

VoxWatermark: A Large-Scale Benchmark for Audio Watermark Detection under Perturbations

Jun 13, 2026

Farnaz Sedaghati, Yuxi Wang, Zicheng Weng, Wei Rao

Abstract:With the rapid deployment of speech generation systems in open environments, providing verifiable source attribution and copyright accountability for audio content has become critical. A gap in current research is the lack of a unified benchmark that systematically compares different watermark injection methods under realistic distribution shifts. To address this, we build VoxWatermark by applying 10 watermarking methods (4 neural and 6 traditional) with unified injection and annotation on multilingual, multi-source corpora, and introducing no-box, black-box, and white-box perturbations to simulate real recording and transmission conditions. Based on this benchmark, we propose AudioWMD as a robust baseline detector for large-scale, multi-method, cross-distribution settings. Results show that injection-method diversity and distribution shifts affect detection stability, while validating the effectiveness and scalability of AudioWMD. Dataset and code are publicly available.

* Accepted by Interspeech 2026

Via

Access Paper or Ask Questions

BotDirector: Robot Storytelling Across the Symmetrical Reality with Multi-modal Interactions

Jun 02, 2026

Zhe Sun, Meng Wang, Lei Wang, Yuxi Wang, Wanxin Li, Yujia Peng, Zhenliang Zhang

Abstract:Robot storytelling offers a unique blend of technological innovation and creative expression that engages children in unprecedented ways. However, the technical aspects are often too complicated for children. We propose an interactive system that facilitates robot storytelling with tangible and natural language interactions. Children arrange the playground with their own stuff and create narratives with an LLM agent. The created narratives are transformed into a motion sequence based on the map and characters, and the motions are executed by self-navigating swarm robots. This system enhances robot storytelling with flexible scenarios, enabling young children to create robot dramas with everyday objects.

* 2026 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)

Via

Access Paper or Ask Questions

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

Jun 01, 2026

Yujia Tong, Yuxi Wang, Yunyang Wan, Tian Zhang, Junhao Dong, Jingling Yuan

Abstract:Model compression techniques such as quantization and pruning are widely used to reduce the deployment cost of large language models (LLMs), with existing evaluations focusing almost exclusively on accuracy preservation. However, in safety-critical applications, a model's ability to reliably quantify its own uncertainty is equally important. We ask: does compression preserve this ability? To answer this question, we benchmark 12 LLMs under various compression configurations across five NLP tasks, using conformal prediction to provide a rigorous, distribution-free measure of uncertainty. Our experiments reveal that: (I) compression frequently decouples accuracy from uncertainty; (II) larger models absorb compression-induced uncertainty far more effectively than smaller ones; and (III) uncertainty inflation is often threshold-like rather than gradual. These results suggest that accuracy-only evaluation is insufficient for assessing the deployment readiness of compressed LLMs, and that uncertainty-aware benchmarking should be a standard component of model compression pipelines.

Via

Access Paper or Ask Questions

A gentle tutorial and a structured reformulation of Bock's algorithm for minimum directed spanning trees

Mar 29, 2026

Yuxi Wang, Jungyeul Park

Abstract:This paper presents a gentle tutorial and a structured reformulation of Bock's 1971 Algol procedure for constructing minimum directed spanning trees. Our aim is to make the original algorithm readable and reproducible for modern readers, while highlighting its relevance as an exact decoder for nonprojective graph based dependency parsing. We restate the minimum arborescence objective in Bock's notation and provide a complete line by line execution trace of the original ten node example, extending the partial trace given in the source paper from initialization to termination. We then introduce a structured reformulation that makes explicit the procedure's phase structure, maintained state, and control flow, while preserving the logic of the original method. As a further illustration, we include a worked example adapted from {jurafsky-martin-2026-book} for dependency parsing, showing how a maximum weight arborescence problem is reduced to Bock's minimum cost formulation by a standard affine transformation and traced under the same state variables.

Via

Access Paper or Ask Questions

Hand2World: Autoregressive Egocentric Interaction Generation via Free-Space Hand Gestures

Feb 13, 2026

Yuxi Wang, Wenqi Ouyang, Tianyi Wei, Yi Dong, Zhiqi Shen, Xingang Pan

Abstract:Egocentric interactive world models are essential for augmented reality and embodied AI, where visual generation must respond to user input with low latency, geometric consistency, and long-term stability. We study egocentric interaction generation from a single scene image under free-space hand gestures, aiming to synthesize photorealistic videos in which hands enter the scene, interact with objects, and induce plausible world dynamics under head motion. This setting introduces fundamental challenges, including distribution shift between free-space gestures and contact-heavy training data, ambiguity between hand motion and camera motion in monocular views, and the need for arbitrary-length video generation. We present Hand2World, a unified autoregressive framework that addresses these challenges through occlusion-invariant hand conditioning based on projected 3D hand meshes, allowing visibility and occlusion to be inferred from scene context rather than encoded in the control signal. To stabilize egocentric viewpoint changes, we inject explicit camera geometry via per-pixel Plücker-ray embeddings, disentangling camera motion from hand motion and preventing background drift. We further develop a fully automated monocular annotation pipeline and distill a bidirectional diffusion model into a causal generator, enabling arbitrary-length synthesis. Experiments on three egocentric interaction benchmarks show substantial improvements in perceptual quality and 3D consistency while supporting camera control and long-horizon interactive generation.

Via

Access Paper or Ask Questions

TongSIM: A General Platform for Simulating Intelligent Machines

Dec 23, 2025

Zhe Sun, Kunlun Wu, Chuanjian Fu, Zeming Song, Langyong Shi, Zihe Xue, Bohan Jing, Ying Yang, Xiaomeng Gao, Aijia Li(+17 more)

Abstract:As artificial intelligence (AI) rapidly advances, especially in multimodal large language models (MLLMs), research focus is shifting from single-modality text processing to the more complex domains of multimodal and embodied AI. Embodied intelligence focuses on training agents within realistic simulated environments, leveraging physical interaction and action feedback rather than conventionally labeled datasets. Yet, most existing simulation platforms remain narrowly designed, each tailored to specific tasks. A versatile, general-purpose training environment that can support everything from low-level embodied navigation to high-level composite activities, such as multi-agent social simulation and human-AI collaboration, remains largely unavailable. To bridge this gap, we introduce TongSIM, a high-fidelity, general-purpose platform for training and evaluating embodied agents. TongSIM offers practical advantages by providing over 100 diverse, multi-room indoor scenarios as well as an open-ended, interaction-rich outdoor town simulation, ensuring broad applicability across research needs. Its comprehensive evaluation framework and benchmarks enable precise assessment of agent capabilities, such as perception, cognition, decision-making, human-robot cooperation, and spatial and social reasoning. With features like customized scenes, task-adaptive fidelity, diverse agent types, and dynamic environmental simulation, TongSIM delivers flexibility and scalability for researchers, serving as a unified platform that accelerates training, evaluation, and advancement toward general embodied intelligence.

Via

Access Paper or Ask Questions

Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making

Dec 10, 2025

Qingyuan Zhang, Yuxi Wang, Cancan Hua, Yulin Huang, Ning Lyu

Abstract:This study investigates an explainable reasoning method for financial decision-making based on knowledge-enhanced large language model agents. To address the limitations of traditional financial decision methods that rely on parameterized knowledge, lack factual consistency, and miss reasoning chains, an integrated framework is proposed that combines external knowledge retrieval, semantic representation, and reasoning generation. The method first encodes financial texts and structured data to obtain semantic representations, and then retrieves task-related information from external knowledge bases using similarity computation. Internal representations and external knowledge are combined through weighted fusion, which ensures fluency while improving factual accuracy and completeness of generated content. In the reasoning stage, a multi-head attention mechanism is introduced to construct logical chains, allowing the model to present transparent causal relationships and traceability during generation. Finally, the model jointly optimizes task objectives and explanation consistency objectives, which enhances predictive performance and reasoning interpretability. Experiments on financial text processing and decision tasks show that the method outperforms baseline approaches in accuracy, text generation quality, and factual support, verifying the effectiveness of knowledge enhancement and explainable reasoning. Overall, the proposed approach overcomes the limitations of traditional models in semantic coverage and reasoning transparency, and demonstrates strong practical value in complex financial scenarios.

Via

Access Paper or Ask Questions

Advancing Text Classification with Large Language Models and Neural Attention Mechanisms

Dec 10, 2025

Ning Lyu, Yuxi Wang, Feng Chen, Qingyuan Zhang

Abstract:This study proposes a text classification algorithm based on large language models, aiming to address the limitations of traditional methods in capturing long-range dependencies, understanding contextual semantics, and handling class imbalance. The framework includes text encoding, contextual representation modeling, attention-based enhancement, feature aggregation, and classification prediction. In the representation stage, deep semantic embeddings are obtained through large-scale pretrained language models, and attention mechanisms are applied to enhance the selective representation of key features. In the aggregation stage, global and weighted strategies are combined to generate robust text-level vectors. In the classification stage, a fully connected layer and Softmax output are used to predict class distributions, and cross-entropy loss is employed to optimize model parameters. Comparative experiments introduce multiple baseline models, including recurrent neural networks, graph neural networks, and Transformers, and evaluate them on Precision, Recall, F1-Score, and AUC. Results show that the proposed method outperforms existing models on all metrics, with especially strong improvements in Recall and AUC. In addition, sensitivity experiments are conducted on hyperparameters and data conditions, covering the impact of hidden dimensions on AUC and the impact of class imbalance ratios on Recall. The findings demonstrate that proper model configuration has a significant effect on performance and reveal the adaptability and stability of the model under different conditions. Overall, the proposed text classification method not only achieves effective performance improvement but also verifies its robustness and applicability in complex data environments through systematic analysis.

Via

Access Paper or Ask Questions

Topology-Aware Graph Reinforcement Learning for Dynamic Routing in Cloud Networks

Sep 05, 2025

Yuxi Wang, Heyao Liu, Guanzi Yao, Nyutian Long, Yue Kang

Abstract:This paper proposes a topology-aware graph reinforcement learning approach to address the routing policy optimization problem in cloud server environments. The method builds a unified framework for state representation and structural evolution by integrating a Structure-Aware State Encoding (SASE) module and a Policy-Adaptive Graph Update (PAGU) mechanism. It aims to tackle the challenges of decision instability and insufficient structural awareness under dynamic topologies. The SASE module models node states through multi-layer graph convolution and structural positional embeddings, capturing high-order dependencies in the communication topology and enhancing the expressiveness of state representations. The PAGU module adjusts the graph structure based on policy behavior shifts and reward feedback, enabling adaptive structural updates in dynamic environments. Experiments are conducted on the real-world GEANT topology dataset, where the model is systematically evaluated against several representative baselines in terms of throughput, latency control, and link balance. Additional experiments, including hyperparameter sensitivity, graph sparsity perturbation, and node feature dimensionality variation, further explore the impact of structure modeling and graph updates on model stability and decision quality. Results show that the proposed method outperforms existing graph reinforcement learning models across multiple performance metrics, achieving efficient and robust routing in dynamic and complex cloud networks.

Via

Access Paper or Ask Questions

HLG: Comprehensive 3D Room Construction via Hierarchical Layout Generation

Aug 25, 2025

Xiping Wang, Yuxi Wang, Mengqi Zhou, Junsong Fan, Zhaoxiang Zhang

Abstract:Realistic 3D indoor scene generation is crucial for virtual reality, interior design, embodied intelligence, and scene understanding. While existing methods have made progress in coarse-scale furniture arrangement, they struggle to capture fine-grained object placements, limiting the realism and utility of generated environments. This gap hinders immersive virtual experiences and detailed scene comprehension for embodied AI applications. To address these issues, we propose Hierarchical Layout Generation (HLG), a novel method for fine-grained 3D scene generation. HLG is the first to adopt a coarse-to-fine hierarchical approach, refining scene layouts from large-scale furniture placement to intricate object arrangements. Specifically, our fine-grained layout alignment module constructs a hierarchical layout through vertical and horizontal decoupling, effectively decomposing complex 3D indoor scenes into multiple levels of granularity. Additionally, our trainable layout optimization network addresses placement issues, such as incorrect positioning, orientation errors, and object intersections, ensuring structurally coherent and physically plausible scene generation. We demonstrate the effectiveness of our approach through extensive experiments, showing superior performance in generating realistic indoor scenes compared to existing methods. This work advances the field of scene generation and opens new possibilities for applications requiring detailed 3D environments. We will release our code upon publication to encourage future research.

Via

Access Paper or Ask Questions