Kaiwen Zhang

SCOPE-FL: A Strategy-proof Chain-based Optimal pareto efficient Federated Learning System

Jun 16, 2026

Seyed Salar Ghazi, Kaiwen Zhang, Mehdi Feizi, Hans-Arno Jacobsen

Abstract:Hierarchical Federated Learning (HFL) enables scalable collaborative model training across distributed devices while preserving data privacy. However, existing HFL client selection mechanisms suffer from a fundamental strategic inefficiency. By prioritizing stability over Pareto efficiency (PE), they produce suboptimal resource allocations, and without strategy-proofness (SP), participants are incentivized to misrepresent their true preferences; in practice, both failures degrade overall system welfare in the Pareto sense. To address this, we propose SCOPE-FL (Strategy-proof Chain-based Optimal pareto efficient Federated Learning), a synchronous HFL framework that formulates client selection as a two-sided school choice problem solved with the Top Trading Cycle (TTC) algorithm, which simultaneously guarantees PE and SP. For reward distribution, SCOPE-FL employs a scalable Shapley value approximation based on One-Round Reconstruction (OR), ensuring compensation proportional to each client's contribution. The entire mechanism executes via blockchain smart contracts, providing the tamper-proof environment required for the SP guarantees to hold in practice. A comprehensive evaluation on MNIST, Fashion-MNIST, and CIFAR-10 demonstrates that SCOPE-FL outperforms state-of-the-art approaches, including DA, IAS, and other methods, across model accuracy, convergence rate, and reward efficiency, while achieving communication latency comparable to DA and blockchain overhead significantly lower than DA at scale.
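
For readers unfamiliar with Top Trading Cycles in the school choice setting, the following is a minimal sketch of the textbook procedure, not the SCOPE-FL implementation. It assumes that every "school" (here, any party with limited seats, such as an aggregator) ranks all "students" (clients), and that names such as `top_trading_cycles`, `student_prefs`, and `school_priorities` are hypothetical and chosen only for illustration.

```python
def top_trading_cycles(student_prefs, school_priorities, capacities):
    """Textbook TTC for school choice (illustrative sketch).

    student_prefs:     {student: [schools, most preferred first]}
    school_priorities: {school: [students, highest priority first]}
                       (assumed to rank every student)
    capacities:        {school: number of seats}
    Returns {student: school or None}.
    """
    remaining = dict(capacities)
    unmatched = set(student_prefs)
    assignment = {}

    while unmatched:
        open_schools = {s for s, seats in remaining.items() if seats > 0}

        # Each unmatched student points to their favourite school with a seat left.
        student_points = {}
        for st in sorted(unmatched):
            choice = next((sc for sc in student_prefs[st] if sc in open_schools), None)
            if choice is None:
                assignment[st] = None  # no acceptable school remains
            else:
                student_points[st] = choice
        unmatched = set(student_points)
        if not unmatched:
            break

        # Each open school points to its highest-priority unmatched student.
        school_points = {
            sc: next(st for st in school_priorities[sc] if st in unmatched)
            for sc in open_schools
        }

        # Follow pointers student -> school -> student until a student repeats.
        seen, path = {}, []
        cur = min(unmatched)
        while cur not in seen:
            seen[cur] = len(path)
            path.append(cur)
            cur = school_points[student_points[cur]]
        cycle = path[seen[cur]:]

        # Everyone in the cycle receives the school they point to.
        for st in cycle:
            school = student_points[st]
            assignment[st] = school
            remaining[school] -= 1
            unmatched.discard(st)

    return assignment


# Toy instance: a has top priority at X but prefers Y; b has top priority
# at Y but prefers X, so the two trade in the first cycle.
prefs = {"a": ["Y", "X", "Z"], "b": ["X", "Y", "Z"], "c": ["X", "Y", "Z"]}
prios = {"X": ["a", "b", "c"], "Y": ["b", "a", "c"], "Z": ["c", "a", "b"]}
seats = {"X": 1, "Y": 1, "Z": 1}
result = top_trading_cycles(prefs, prios, seats)
```

In this toy instance, `a` and `b` each obtain their first choice by exchanging priorities, and `c` takes the only remaining seat.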

A Synthesis Method of Safe Rust Code Based on Pushdown Colored Petri Nets

Apr 02, 2026

Kaiwen Zhang, Guanjun Liu

Abstract:Safe Rust guarantees memory safety through strict compile-time constraints: ownership can be transferred, borrowing can temporarily guarantee either shared read-only or exclusive write access, and ownership and borrowing are scoped by lifetime. Automatically synthesizing correct and safe Rust code is challenging, as the generated code must not only satisfy ownership, borrowing, and lifetime constraints, but also meet type and interface requirements at compile time. This work proposes a synthesis method based on our newly defined Pushdown Colored Petri Net (PCPN) that models these compilation constraints directly from public API signatures to synthesize valid call sequences. Token colors encode dynamic resource states together with a scope level indicating the lifetime region in which a borrow is valid. The pushdown stack tracks entry into and exit from lifetime parameters by pushing and popping tokens. A transition is enabled only when type matching and interface obligations both hold and the required resource states are available. Based on bisimulation theory, we prove that the enabling and firing rules of PCPN are consistent with the compile-time checks of these three constraints. We develop an automatic synthesis tool based on PCPN, and the experimental results show that all of the synthesized code is correct.

* 20 pages

UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark

Mar 05, 2026

Yanlin Li, Minghui Guo, Kaiwen Zhang, Shize Zhang, Yiran Zhao, Haodong Li, Congyue Zhou, Weijie Zheng, Yushen Yan, Shengqiong Wu (+6 more)

Abstract:In real-world multimodal applications, systems usually need to comprehend arbitrarily combined and interleaved multimodal inputs from users, while also generating outputs in any interleaved multimedia form. This capability defines the goal of any-to-any interleaved multimodal learning under a unified paradigm of understanding and generation, posing new challenges and opportunities for advancing Multimodal Large Language Models (MLLMs). To foster and benchmark this capability, this paper introduces the UniM benchmark, the first Unified Any-to-Any Interleaved Multimodal dataset. UniM contains 31K high-quality instances across 30 domains and 7 representative modalities: text, image, audio, video, document, code, and 3D, each requiring multiple intertwined reasoning and generation capabilities. We further introduce the UniM Evaluation Suite, which assesses models along three dimensions: Semantic Correctness & Generation Quality, Response Structure Integrity, and Interleaved Coherence. In addition, we propose UniMA, an agentic baseline model equipped with traceable reasoning for structured interleaved generation. Comprehensive experiments demonstrate the difficulty of UniM and highlight key challenges and directions for advancing unified any-to-any multimodal intelligence. The project page is https://any2any-mllm.github.io/unim.

* 70 pages, 63 figures, 30 tables, CVPR

ICBAC: an Intelligent Contract-Based Access Control framework for supply chain management by integrating blockchain and federated learning

Feb 08, 2026

Sadegh Sohani, Salar Ghazi, Farnaz Kamranfar, Sahar Pilehvar Moakhar, Mohammad Allahbakhsh, Haleh Amintoosi, Kaiwen Zhang

Abstract:This paper addresses the critical challenge of access control in modern supply chains, which operate across multiple independent and competing organizations. Existing access control is static and centralized, unable to adapt to insider threats or evolving contexts. Blockchain improves decentralization but lacks behavioral intelligence, while centralized machine learning for anomaly detection requires aggregating sensitive data, violating privacy. The proposed solution is ICBAC, an intelligent contract-based access control framework. It integrates permissioned blockchain (Hyperledger Fabric) with federated learning (FL). Built on Fabric, ICBAC uses a multi-channel architecture and three smart contracts for asset management, baseline access control, and dynamic revocation. To counter insider misuse, each channel deploys an AI agent that monitors activity and dynamically restricts access for anomalies. Federated learning allows these agents to collaboratively improve detection models without sharing raw data. For heterogeneous, competitive environments, ICBAC introduces a game-theoretic client selection mechanism using hedonic coalition formation. This enables supply chains to form stable, strategy-proof FL coalitions via preference-based selection without disclosing sensitive criteria. Extensive experiments on a Fabric testbed with a real-world dataset show ICBAC achieves blockchain performance comparable to static frameworks and provides effective anomaly detection under IID and non-IID data with zero raw-data sharing. ICBAC thus offers a practical, scalable solution for dynamic, privacy-preserving access control in decentralized supply chains.

* 19 pages, 6 Figures, 3 Tables

StoryMem: Multi-shot Long Video Storytelling with Memory

Dec 22, 2025

Kaiwen Zhang, Liming Jiang, Angtian Wang, Jacob Zhiyuan Fang, Tiancheng Zhi, Qing Yan, Hao Kang, Xin Lu, Xingang Pan

Abstract:Visual storytelling requires generating multi-shot videos with cinematic quality and long-range consistency. Inspired by human memory, we propose StoryMem, a paradigm that reformulates long-form video storytelling as iterative shot synthesis conditioned on explicit visual memory, transforming pre-trained single-shot video diffusion models into multi-shot storytellers. This is achieved by a novel Memory-to-Video (M2V) design, which maintains a compact and dynamically updated memory bank of keyframes from historical generated shots. The stored memory is then injected into single-shot video diffusion models via latent concatenation and negative RoPE shifts with only LoRA fine-tuning. A semantic keyframe selection strategy, together with aesthetic preference filtering, further ensures informative and stable memory throughout generation. Moreover, the proposed framework naturally accommodates smooth shot transitions and customized story generation applications. To facilitate evaluation, we introduce ST-Bench, a diverse benchmark for multi-shot video storytelling. Extensive experiments demonstrate that StoryMem achieves superior cross-shot consistency over previous methods while preserving high aesthetic quality and prompt adherence, marking a significant step toward coherent minute-long video storytelling.

* Project page: https://kevin-thu.github.io/StoryMem

Native Intelligence Emerges from Large-Scale Clinical Practice: A Retinal Foundation Model with Deployment Efficiency

Dec 16, 2025

Jia Guo, Jiawei Du, Shengzhu Yang, Shuai Lu, Wenquan Cheng, Kaiwen Zhang, Yihua Sun, Chuhong Yang, Weihang Zhang, Fang Chen (+14 more)

Abstract:Current retinal foundation models remain constrained by curated research datasets that lack authentic clinical context, and require extensive task-specific optimization for each application, limiting their deployment efficiency in low-resource settings. Here, we show that these barriers can be overcome by building clinical native intelligence directly from real-world medical practice. Our key insight is that large-scale telemedicine programs, where expert centers provide remote consultations across distributed facilities, represent a natural reservoir for learning clinical image interpretation. We present ReVision, a retinal foundation model that learns from the natural alignment between 485,980 color fundus photographs and their corresponding diagnostic reports, accumulated through a decade-long telemedicine program spanning 162 medical institutions across China. Through extensive evaluation across 27 ophthalmic benchmarks, we demonstrate that ReVision enables deployment efficiency with minimal local resources. Without any task-specific training, ReVision achieves zero-shot disease detection with an average AUROC of 0.946 across 12 public benchmarks and 0.952 on 3 independent clinical cohorts. When minimal adaptation is feasible, ReVision matches extensively fine-tuned alternatives while requiring orders of magnitude fewer trainable parameters and labeled examples. The learned representations also transfer effectively to new clinical sites, imaging domains, imaging modalities, and systemic health prediction tasks. In a prospective reader study with 33 ophthalmologists, ReVision's zero-shot assistance improved diagnostic accuracy by 14.8% across all experience levels. These results demonstrate that clinical native intelligence can be directly extracted from clinical archives without any further annotation to build medical AI systems suited to various low-resource settings.

An Automatic Detection Method for Hematoma Features in Placental Abruption Ultrasound Images Based on Few-Shot Learning

Oct 24, 2025

Xiaoqing Liu, Jitai Han, Hua Yan, Peng Li, Sida Tang, Ying Li, Kaiwen Zhang, Min Yu

Abstract:Placental abruption is a severe complication during pregnancy, and its early accurate diagnosis is crucial for ensuring maternal and fetal safety. Traditional ultrasound diagnostic methods heavily rely on physician experience, leading to issues such as subjective bias and diagnostic inconsistencies. This paper proposes an improved model, EH-YOLOv11n (Enhanced Hemorrhage-YOLOv11n), based on small-sample learning, aiming to achieve automatic detection of hematoma features in placental ultrasound images. The model enhances performance through multidimensional optimization: it integrates wavelet convolution and coordinate convolution to strengthen frequency and spatial feature extraction; incorporates a cascaded group attention mechanism to suppress ultrasound artifacts and occlusion interference, thereby improving bounding box localization accuracy. Experimental results demonstrate a detection accuracy of 78%, representing a 2.5% improvement over YOLOv11n and a 13.7% increase over YOLOv8. The model exhibits significant superiority in precision-recall curves, confidence scores, and occlusion scenarios. Combining high accuracy with real-time processing, this model provides a reliable solution for computer-aided diagnosis of placental abruption, holding significant clinical application value.

A Survey: Learning Embodied Intelligence from Physical Simulators and World Models

Jul 01, 2025

Xiaoxiao Long, Qingrui Zhao, Kaiwen Zhang, Zihao Zhang, Dingrui Wang, Yumeng Liu, Zhengjie Shu, Yi Lu, Shouzheng Wang, Xinzhe Wei (+8 more)

Abstract:The pursuit of artificial general intelligence (AGI) has placed embodied intelligence at the forefront of robotics research. Embodied intelligence focuses on agents capable of perceiving, reasoning, and acting within the physical world. Achieving robust embodied intelligence requires not only advanced perception and control, but also the ability to ground abstract cognition in real-world interactions. Two foundational technologies, physical simulators and world models, have emerged as critical enablers in this quest. Physical simulators provide controlled, high-fidelity environments for training and evaluating robotic agents, allowing safe and efficient development of complex behaviors. In contrast, world models empower robots with internal representations of their surroundings, enabling predictive planning and adaptive decision-making beyond direct sensory input. This survey systematically reviews recent advances in learning embodied AI through the integration of physical simulators and world models. We analyze their complementary roles in enhancing autonomy, adaptability, and generalization in intelligent robots, and discuss the interplay between external simulation and internal modeling in bridging the gap between simulated training and real-world deployment. By synthesizing current progress and identifying open challenges, this survey aims to provide a comprehensive perspective on the path toward more capable and generalizable embodied AI systems. We also maintain an active repository that contains up-to-date literature and open-source projects at https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey.

* https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey

WiP: Towards a Secure SECP256K1 for Crypto Wallets: Hardware Architecture and Implementation

Nov 06, 2024

Joel Poncha Lemayian, Ghyslain Gagnon, Kaiwen Zhang, Pascal Giard

Abstract:The SECP256K1 elliptic curve algorithm is fundamental in cryptocurrency wallets for generating secure public keys from private keys, thereby ensuring the protection and ownership of blockchain-based digital assets. However, the literature highlights several successful side-channel attacks on hardware wallets that exploit SECP256K1 to extract private keys. This work proposes a novel hardware architecture for SECP256K1, optimized for side-channel attack resistance and efficient resource utilization. The architecture incorporates complete addition formulas, temporary registers, and parallel processing techniques, making elliptic curve point addition and doubling operations indistinguishable. Implementation results demonstrate an average reduction of 45% in LUT usage compared to similar works, emphasizing the design's resource efficiency.

* Presented at HASP 2024 @ MICRO 2024 https://haspworkshop.org/2024/program.html
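
To make the abstract's terms concrete, here is a minimal software sketch of how a public key is derived from a private key on secp256k1 using affine point addition, point doubling, and double-and-add scalar multiplication. It is not the paper's hardware architecture, and it deliberately shows the naive approach: the code branches on key bits and uses different formulas for addition and doubling, which is the kind of distinguishability that complete addition formulas are meant to remove. The function names are hypothetical, the curve constants are the standard published secp256k1 domain parameters, and the code is not constant-time and should not be used with real keys.

```python
# Standard secp256k1 domain parameters: y^2 = x^3 + 7 over the prime field F_P.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def is_on_curve(pt):
    """True if pt satisfies the curve equation; None is the point at infinity."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - x * x * x - 7) % P == 0


def point_add(p1, p2):
    """Affine addition with separate cases; the case split is data-dependent."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 is the negation of p1
    if p1 == p2:
        # Doubling: slope of the tangent line (requires Python 3.8+ for pow(..., -1, P)).
        lam = 3 * x1 * x1 * pow((2 * y1) % P, -1, P) % P
    else:
        # Addition: slope of the chord through p1 and p2.
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mul(k, pt):
    """Naive right-to-left double-and-add; the add step runs only for set key bits."""
    result = None
    addend = pt
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


# Hypothetical toy private key, for illustration only.
private_key = 0x1234
public_key = scalar_mul(private_key, G)
```

Because the `if k & 1` branch executes only for set bits of the key, the sequence of additions and doublings depends on the secret, which is why timing or power traces of an implementation like this one can reveal it.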

MultiConfederated Learning: Inclusive Non-IID Data handling with Decentralized Federated Learning

Apr 20, 2024

Michael Duchesne, Kaiwen Zhang, Chamseddine Talhi

Abstract:Federated Learning (FL) has emerged as a prominent privacy-preserving technique for enabling use cases like confidential clinical machine learning. FL operates by aggregating models trained by remote devices that own the data. Thus, FL enables the training of powerful global models using crowd-sourced data from a large number of learners, without compromising their privacy. However, the aggregating server is a single point of failure when generating the global model. Moreover, the performance of the model suffers when the data is not independent and identically distributed (non-IID data) on all remote devices. This leads to vastly different models being aggregated, which can reduce the performance by as much as 50% in certain scenarios. In this paper, we seek to address the aforementioned issues while retaining the benefits of FL. We propose MultiConfederated Learning: a decentralized FL framework designed to handle non-IID data. Unlike traditional FL, MultiConfederated Learning maintains multiple models in parallel (instead of a single global model) to help with convergence when the data is non-IID. With the help of transfer learning, learners can converge to fewer models. To increase adaptability, learners are allowed to choose which updates to aggregate from their peers.

* Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, SAC '24, 1587-1595, April 2024. ACM
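
As a rough illustration of the aggregation step the abstract refers to, the sketch below averages model parameters from a set of updates, weighting each by its number of training samples. The abstract does not state which aggregation rule MultiConfederated Learning uses, so the sample-weighted (FedAvg-style) average, the flat list-of-floats model representation, and the name `federated_average` are all assumptions made for illustration.

```python
def federated_average(updates):
    """Sample-weighted average of parameter vectors (FedAvg-style sketch).

    updates: list of (num_samples, weights) pairs, where weights is a
             flat list of floats of identical length for every update.
    """
    if not updates:
        raise ValueError("no updates to aggregate")
    total = sum(n for n, _ in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0][1])
    averaged = [0.0] * dim
    for n, weights in updates:
        if len(weights) != dim:
            raise ValueError("all updates must have the same length")
        for i, value in enumerate(weights):
            averaged[i] += (n / total) * value
    return averaged


# A learner that chooses which peer updates to aggregate would simply
# pass the selected subset; here two hypothetical peers are combined.
selected = [(1, [0.0, 2.0]), (3, [4.0, 2.0])]
merged = federated_average(selected)
```

With these inputs the second peer holds three quarters of the samples, so the merged parameters are pulled toward its values.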
