Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zihan Li

Adaptive Autoguidance for Item-Side Fairness in Diffusion Recommender Systems

Feb 16, 2026

Zihan Li, Gustavo Escobedo, Marta Moscati, Oleg Lesota, Markus Schedl

Abstract:Diffusion recommender systems achieve strong recommendation accuracy but often suffer from popularity bias, resulting in unequal item exposure. To address this shortcoming, we introduce A2G-DiffRec, a diffusion recommender that incorporates adaptive autoguidance, where the main model is guided by a less-trained version of itself. Instead of using a fixed guidance weight, A2G-DiffRec learns to adaptively weigh the outputs of the main and weak models during training, supervised by a popularity regularization that promotes balanced exposure across items with different popularity levels. Experimental results on the MovieLens-1M, Foursquare-Tokyo, and Music4All-Onion datasets show that A2G-DiffRec is effective in enhancing item-side fairness at a marginal cost of accuracy reduction compared to existing guided diffusion recommenders and other non-diffusion baselines.

Via

Access Paper or Ask Questions

Envy-Free Allocation of Indivisible Goods via Noisy Queries

Feb 06, 2026

Zihan Li, Yan Hao Ling, Jonathan Scarlett, Warut Suksompong

Abstract:We introduce a problem of fairly allocating indivisible goods (items) in which the agents' valuations cannot be observed directly, but instead can only be accessed via noisy queries. In the two-agent setting with Gaussian noise and bounded valuations, we derive upper and lower bounds on the required number of queries for finding an envy-free allocation in terms of the number of items, $m$, and the negative-envy of the optimal allocation, $Δ$. In particular, when $Δ$ is not too small (namely, $Δ\gg m^{1/4}$), we establish that the optimal number of queries scales as $\frac{\sqrt m }{(Δ/ m)^2} = \frac{m^{2.5}}{Δ^2}$ up to logarithmic factors. Our upper bound is based on non-adaptive queries and a simple thresholding-based allocation algorithm that runs in polynomial time, while our lower bound holds even under adaptive queries and arbitrary computation time.

Via

Access Paper or Ask Questions

Scale-aware Adaptive Supervised Network with Limited Medical Annotations

Jan 02, 2026

Zihan Li, Dandan Shan, Yunxiang Li, Paul E. Kinahan, Qingqi Hong

Abstract:Medical image segmentation faces critical challenges in semi-supervised learning scenarios due to severe annotation scarcity requiring expert radiological knowledge, significant inter-annotator variability across different viewpoints and expertise levels, and inadequate multi-scale feature integration for precise boundary delineation in complex anatomical structures. Existing semi-supervised methods demonstrate substantial performance degradation compared to fully supervised approaches, particularly in small target segmentation and boundary refinement tasks. To address these fundamental challenges, we propose SASNet (Scale-aware Adaptive Supervised Network), a dual-branch architecture that leverages both low-level and high-level feature representations through novel scale-aware adaptive reweight mechanisms. Our approach introduces three key methodological innovations, including the Scale-aware Adaptive Reweight strategy that dynamically weights pixel-wise predictions using temporal confidence accumulation, the View Variance Enhancement mechanism employing 3D Fourier domain transformations to simulate annotation variability, and segmentation-regression consistency learning through signed distance map algorithms for enhanced boundary precision. These innovations collectively address the core limitations of existing semi-supervised approaches by integrating spatial, temporal, and geometric consistency principles within a unified optimization framework. Comprehensive evaluation across LA, Pancreas-CT, and BraTS datasets demonstrates that SASNet achieves superior performance with limited labeled data, surpassing state-of-the-art semi-supervised methods while approaching fully supervised performance levels. The source code for SASNet is available at https://github.com/HUANGLIZI/SASNet.

* Accepted by Pattern Recognition, 8 figures, 11 tables

Via

Access Paper or Ask Questions

Cortex AISQL: A Production SQL Engine for Unstructured Data

Nov 19, 2025

Paweł Liskowski, Benjamin Han, Paritosh Aggarwal, Bowei Chen, Boxin Jiang, Nitish Jindal, Zihan Li, Aaron Lin, Kyle Schmaus, Jay Tayade(+4 more)

Abstract:Snowflake's Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine relational operations with semantic reasoning, enabling them to query both structured and unstructured data effortlessly. However, making semantic operations efficient at production scale poses fundamental challenges. Semantic operations are more expensive than traditional SQL operations, possess distinct latency and throughput characteristics, and their cost and selectivity are unknown during query compilation. Furthermore, existing query engines are not designed to optimize semantic operations. The AISQL query execution engine addresses these challenges through three novel techniques informed by production deployment data from Snowflake customers. First, AI-aware query optimization treats AI inference cost as a first-class optimization objective, reasoning about large language model (LLM) cost directly during query planning to achieve 2-8$\times$ speedups. Second, adaptive model cascades reduce inference costs by routing most rows through a fast proxy model while escalating uncertain cases to a powerful oracle model, achieving 2-6$\times$ speedups while maintaining 90-95% of oracle model quality. Third, semantic join query rewriting lowers the quadratic time complexity of join operations to linear through reformulation as multi-label classification tasks, achieving 15-70$\times$ speedups with often improved prediction quality. AISQL is deployed in production at Snowflake, where it powers diverse customer workloads across analytics, search, and content understanding.

Via

Access Paper or Ask Questions

SF-Recon: Simplification-Free Lightweight Building Reconstruction via 3D Gaussian Splatting

Nov 17, 2025

Zihan Li, Tengfei Wang, Wentian Gan, Hao Zhan, Xin Wang, Zongqian Zhan

Abstract:Lightweight building surface models are crucial for digital city, navigation, and fast geospatial analytics, yet conventional multi-view geometry pipelines remain cumbersome and quality-sensitive due to their reliance on dense reconstruction, meshing, and subsequent simplification. This work presents SF-Recon, a method that directly reconstructs lightweight building surfaces from multi-view images without post-hoc mesh simplification. We first train an initial 3D Gaussian Splatting (3DGS) field to obtain a view-consistent representation. Building structure is then distilled by a normal-gradient-guided Gaussian optimization that selects primitives aligned with roof and wall boundaries, followed by multi-view edge-consistency pruning to enhance structural sharpness and suppress non-structural artifacts without external supervision. Finally, a multi-view depth-constrained Delaunay triangulation converts the structured Gaussian field into a lightweight, structurally faithful building mesh. Based on a proposed SF dataset, the experimental results demonstrate that our SF-Recon can directly reconstruct lightweight building models from multi-view imagery, achieving substantially fewer faces and vertices while maintaining computational efficiency. Website:https://lzh282140127-cell.github.io/SF-Recon-project/

Via

Access Paper or Ask Questions

TCM-Eval: An Expert-Level Dynamic and Extensible Benchmark for Traditional Chinese Medicine

Nov 10, 2025

Zihao Cheng, Yuheng Lu, Huaiqian Ye, Zeming Liu, Minqi Wang, Jingjing Liu, Zihan Li, Wei Fan, Yuanfang Guo, Ruiji Fu(+3 more)

Abstract:Large Language Models (LLMs) have demonstrated remarkable capabilities in modern medicine, yet their application in Traditional Chinese Medicine (TCM) remains severely limited by the absence of standardized benchmarks and the scarcity of high-quality training data. To address these challenges, we introduce TCM-Eval, the first dynamic and extensible benchmark for TCM, meticulously curated from national medical licensing examinations and validated by TCM experts. Furthermore, we construct a large-scale training corpus and propose Self-Iterative Chain-of-Thought Enhancement (SI-CoTE) to autonomously enrich question-answer pairs with validated reasoning chains through rejection sampling, establishing a virtuous cycle of data and model co-evolution. Using this enriched training data, we develop ZhiMingTang (ZMT), a state-of-the-art LLM specifically designed for TCM, which significantly exceeds the passing threshold for human practitioners. To encourage future research and development, we release a public leaderboard, fostering community engagement and continuous improvement.

* Work in Progress

Via

Access Paper or Ask Questions

DETECT: Data-Driven Evaluation of Treatments Enabled by Classification Transformers

Nov 10, 2025

Yuanheng Mao, Lillian Yang, Stephen Yang, Ethan Shao, Zihan Li

Abstract:Chronic pain is a global health challenge affecting millions of individuals, making it essential for physicians to have reliable and objective methods to measure the functional impact of clinical treatments. Traditionally used methods, like the numeric rating scale, while personalized and easy to use, are subjective due to their self-reported nature. Thus, this paper proposes DETECT (Data-Driven Evaluation of Treatments Enabled by Classification Transformers), a data-driven framework that assesses treatment success by comparing patient activities of daily life before and after treatment. We use DETECT on public benchmark datasets and simulated patient data from smartphone sensors. Our results demonstrate that DETECT is objective yet lightweight, making it a significant and novel contribution to clinical decision-making. By using DETECT, independently or together with other self-reported metrics, physicians can improve their understanding of their treatment impacts, ultimately leading to more personalized and responsive patient care.

* 5 pages, 4 figures, 2 tables, accepted for presentation by IEEE ICDM 2025 UGHS Symposium and publication with proceedings forthcoming

Via

Access Paper or Ask Questions

RFI Detection and Identification at OVRO Using Pseudonymetry

Oct 31, 2025

Meles Weldegebriel, Zihan Li, Greg Hellbourg, Ning Zhang, Neal Patwari

Figure 1 for RFI Detection and Identification at OVRO Using Pseudonymetry

Figure 2 for RFI Detection and Identification at OVRO Using Pseudonymetry

Figure 3 for RFI Detection and Identification at OVRO Using Pseudonymetry

Figure 4 for RFI Detection and Identification at OVRO Using Pseudonymetry

Abstract:Protecting radio astronomy observatories from unintended interference is critical as wireless transmissions increases near protected bands. While database-driven coordination frameworks and radio quiet zones exist, they cannot rapidly identify or suppress specific interfering transmitters, especially at low signal-to-noise ratio (SNR) levels. This paper presents the first over-the-air field demonstration of Pseudonymetry at the Owens Valley Radio Observatory (OVRO), illustrating cooperative spectrum sharing between heterogeneous wireless systems. In our experiment, a narrow-band secondary transmitter embeds a pseudonym watermark into its signal, while the wide-band radio telescope passively extracts the watermark from spectrogram data. Results show that interference can be reliably detected and the interfering device uniquely identified even at low SNR where conventional demodulation is infeasible. These findings validate that passive scientific receivers can participate in a lightweight feedback loop to trigger shutdown of harmful transmissions, demonstrating the potential of Pseudonymetry as a complementary enforcement tool for protecting radio astronomy environments.

Via

Access Paper or Ask Questions

Sensing and Stopping Interfering Secondary Users: Validation of an Efficient Spectrum Sharing System

Jul 17, 2025

Meles Weldegebriel, Zihan Li, Dustin Maas, Greg Hellbourg, Ning Zhang, Neal Patwari

Abstract:We present the design and validation of Stoppable Secondary Use (StopSec), a privacy-preserving protocol with the capability to identify a secondary user (SU) causing interference to a primary user (PU) and to act quickly to stop the interference. All users are served by a database that provides a feedback mechanism from a PU to an interfering SU. We introduce a new lightweight and robust method to watermark an SU's OFDM packet. Through extensive over-the-air real-time experiments, we evaluate StopSec in terms of interference detection, identification, and stopping latency, as well as impact on SUs. We show that the watermarking method avoids negative impact to the secondary data link and is robust to real-world time-varying channels. Interfering SUs can be stopped in under 150 milliseconds, and when multiple users are simultaneously interfering, they can all be stopped. Even when the interference is 10 dB lower than the noise power, StopSec successfully stops interfering SUs within a few seconds of their appearance in the channel. StopSec can be an effective spectrum sharing protocol for cases when interference to a PU must be quickly and automatically stopped.

Via

Access Paper or Ask Questions

Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Jun 24, 2025

Jun Wang, Xijuan Zeng, Chunyu Qiang, Ruilong Chen, Shiyao Wang, Le Wang, Wangjing Zhou, Pengfei Cai, Jiahui Zhao, Nan Li(+12 more)

Figure 1 for Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Figure 2 for Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Figure 3 for Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Figure 4 for Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation

Abstract:We propose Kling-Foley, a large-scale multimodal Video-to-Audio generation model that synthesizes high-quality audio synchronized with video content. In Kling-Foley, we introduce multimodal diffusion transformers to model the interactions between video, audio, and text modalities, and combine it with a visual semantic representation module and an audio-visual synchronization module to enhance alignment capabilities. Specifically, these modules align video conditions with latent audio elements at the frame level, thereby improving semantic alignment and audio-visual synchronization. Together with text conditions, this integrated approach enables precise generation of video-matching sound effects. In addition, we propose a universal latent audio codec that can achieve high-quality modeling in various scenarios such as sound effects, speech, singing, and music. We employ a stereo rendering method that imbues synthesized audio with a spatial presence. At the same time, in order to make up for the incomplete types and annotations of the open-source benchmark, we also open-source an industrial-level benchmark Kling-Audio-Eval. Our experiments show that Kling-Foley trained with the flow matching objective achieves new audio-visual SOTA performance among public models in terms of distribution matching, semantic alignment, temporal alignment and audio quality.

Via

Access Paper or Ask Questions