Chi Zhang

Department of Computer Science and Engineering, University of Gothenburg, Sweden

Exploring Siamese Networks in Self-Supervised Fast MRI Reconstruction

Jan 18, 2025

Liyan Sun, Shaocong Yu, Chi Zhang, Xinghao Ding

Abstract:Reconstructing MR images from undersampled k-space data with deep neural networks, without fully sampled training references, offers significant practical value. It is a self-supervised regression problem that calls for effective prior knowledge and supervision. Siamese architectures are motivated by the notion of "invariance" and show promising results in unsupervised visual representation learning. Building homologous transformed images and avoiding trivial solutions are two major challenges in Siamese-based self-supervised models. In this work, we explore a Siamese architecture for MRI reconstruction trained in a self-supervised fashion, called SiamRecon. We show that the proposed approach mimics an expectation-maximization algorithm. The alternating optimization provides an effective supervision signal and avoids collapse. The proposed SiamRecon achieves state-of-the-art reconstruction accuracy among self-supervised methods on both single-coil brain MRI and multi-coil knee MRI.

DualOpt: A Dual Divide-and-Optimize Algorithm for the Large-scale Traveling Salesman Problem

Jan 15, 2025

Shipei Zhou, Yuandong Ding, Chi Zhang, Zhiguang Cao, Yan Jin

Abstract:This paper proposes a dual divide-and-optimize algorithm (DualOpt) for solving the large-scale traveling salesman problem (TSP). DualOpt combines two complementary strategies to improve both solution quality and computational efficiency. The first strategy is a grid-based divide-and-conquer procedure that partitions the TSP into smaller sub-problems, solving them in parallel and iteratively refining the solution by merging nodes and partial routes. The process continues until only one grid remains, yielding a high-quality initial solution. The second strategy involves a path-based divide-and-optimize procedure that further optimizes the solution by dividing it into sub-paths, optimizing each using a neural solver, and merging them back to progressively improve the overall solution. Extensive experiments conducted on two groups of TSP benchmark instances, including randomly generated instances with up to 100,000 nodes and real-world datasets from TSPLIB, demonstrate the effectiveness of DualOpt. The proposed DualOpt achieves highly competitive results compared to 10 state-of-the-art algorithms in the literature. In particular, DualOpt achieves an improvement gap up to 1.40% for the largest instance TSP100K with a remarkable 104x speed-up over the leading heuristic solver LKH3. Additionally, DualOpt demonstrates strong generalization on TSPLIB benchmarks, confirming its capability to tackle diverse real-world TSP applications.
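
The path-based divide-and-optimize step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the neural sub-path solver is replaced by a plain 2-opt local search, and the function names (`two_opt_path`, `divide_and_optimize`) and parameters (`sub_len`, `rounds`) are hypothetical. The tour is cut into consecutive sub-paths, each sub-path is improved with its endpoints held fixed, and the pieces are concatenated back, so the connecting edges are unchanged and the total length cannot increase.

```python
import math
import random


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in the order `tour`."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt_path(points, path):
    """Improve an open path with 2-opt, keeping both endpoints fixed.

    Stand-in (assumption) for the neural sub-path solver in the abstract.
    """
    path = list(path)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(path) - 2):
            for j in range(i + 1, len(path) - 1):
                a, b = points[path[i - 1]], points[path[i]]
                c, d = points[path[j]], points[path[j + 1]]
                # Reversing path[i..j] swaps edges (a,b),(c,d) for (a,c),(b,d).
                old = math.dist(a, b) + math.dist(c, d)
                new = math.dist(a, c) + math.dist(b, d)
                if new < old - 1e-12:
                    path[i:j + 1] = reversed(path[i:j + 1])
                    improved = True
    return path


def divide_and_optimize(points, tour, sub_len=10, rounds=2):
    """Split the tour into sub-paths, optimize each, and merge them back."""
    tour = list(tour)
    n = len(tour)
    if n < 4:
        return tour
    for r in range(rounds):
        # Rotate the closed tour so the cut points differ between rounds.
        shift = (r * (sub_len // 2)) % n
        tour = tour[shift:] + tour[:shift]
        pieces = [tour[s:s + sub_len] for s in range(0, n, sub_len)]
        tour = [node for piece in pieces for node in two_opt_path(points, piece)]
    return tour


random.seed(0)
points = [(random.random(), random.random()) for _ in range(40)]
initial = list(range(len(points)))
improved = divide_and_optimize(points, initial)
print(tour_length(points, initial), tour_length(points, improved))
```

Because the sub-paths are independent once their endpoints are fixed, they could be optimized in parallel, which is the property the abstract relies on for scaling.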

* Accepted by AAAI-25, February 2025

Molecule-dynamic-based Aging Clock and Aging Roadmap Forecast with Sundial

Jan 04, 2025

Wei Wu, Zizhen Deng, Chi Zhang, Can Liao, Jinzhuo Wang

Abstract:Addressing the unavoidable bias inherent in supervised aging clocks, we introduce Sundial, a novel framework that models molecular dynamics through a diffusion field, capturing both the population-level aging process and the individual-level relative aging order. Sundial enables unbiased estimation of biological age and the forecast of aging roadmap. Faster-aging individuals from Sundial exhibit a higher disease risk compared to those identified from supervised aging clocks. This framework opens new avenues for exploring key topics, including age- and sex-specific aging dynamics and faster yet healthy aging paths.

Detecting and Mitigating Adversarial Attacks on Deep Learning-Based MRI Reconstruction Without Any Retraining

Jan 03, 2025

Mahdi Saberi, Chi Zhang, Mehmet Akcakaya

Abstract:Deep learning (DL) methods, especially those based on physics-driven DL (PD-DL), have become the state-of-the-art for reconstructing sub-sampled magnetic resonance imaging (MRI) data. However, studies have shown that these methods are susceptible to small adversarial input perturbations, or attacks, resulting in major distortions in the output images. Various strategies have been proposed to reduce the effects of these attacks, but they require retraining and may lower reconstruction quality for non-perturbed/clean inputs. In this work, we propose a novel approach for detecting and mitigating adversarial attacks on MRI reconstruction models without any retraining. Our detection strategy is based on the idea of cyclic measurement consistency. The output of the model is mapped to another set of MRI measurements for a different sub-sampling pattern, and this synthesized data is reconstructed with the same model. Intuitively, without an attack, the second reconstruction is expected to be consistent with the first, while with an attack, disruptions are present. Subsequently, this idea is extended to devise a novel objective function, which is minimized within a small ball around the attack input for mitigation. Experimental results show that our method substantially reduces the impact of adversarial perturbations across different datasets, attack types/strengths and PD-DL networks, and qualitatively and quantitatively outperforms conventional mitigation methods that involve retraining.
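
The cyclic measurement consistency check can be sketched on a toy 1-D signal. Everything here is an assumption for illustration: the "model" is a zero-filled inverse DFT rather than a trained PD-DL network, the score is a relative L2 distance rather than the paper's objective, and the names (`zero_filled_recon`, `cyclic_consistency_score`) are hypothetical. The sketch only shows the data flow: reconstruct, re-measure under a second sub-sampling pattern, reconstruct again with the same model, and compare.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (toy forward measurement operator)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def zero_filled_recon(kspace, mask):
    """Placeholder 'model' (assumption): inverse DFT of the masked k-space."""
    return idft([v if m else 0 for v, m in zip(kspace, mask)])


def cyclic_consistency_score(kspace, mask, second_mask, reconstruct):
    """Relative L2 gap between the first and the cyclic second reconstruction."""
    first = reconstruct(kspace, mask)
    # Map the first output to measurements for a different sampling pattern.
    synthesized = dft(first)
    second = reconstruct(synthesized, second_mask)
    num = sum(abs(a - b) ** 2 for a, b in zip(first, second)) ** 0.5
    den = sum(abs(a) ** 2 for a in first) ** 0.5
    return num / den if den else 0.0


signal = [1, 2, 3, 4, 0, 0, 1, 2]
kspace = dft(signal)
mask = [1, 1, 0, 1, 1, 0, 1, 1]
other_mask = [1, 0, 1, 1, 0, 1, 1, 1]
score = cyclic_consistency_score(kspace, mask, other_mask, zero_filled_recon)
print(score)
```

In the setting the abstract describes, a score like this would be computed with the trained reconstruction network in place of `zero_filled_recon`, and an unusually large value would be taken as a sign of an attack; how the threshold is chosen is not stated in the abstract.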

VAST 1.0: A Unified Framework for Controllable and Consistent Video Generation

Dec 21, 2024

Chi Zhang, Yuanzhi Liang, Xi Qiu, Fangqiu Yi, Xuelong Li

Abstract:Generating high-quality videos from textual descriptions poses challenges in maintaining temporal coherence and control over subject motion. We propose VAST (Video As Storyboard from Text), a two-stage framework to address these challenges and enable high-quality video generation. In the first stage, StoryForge transforms textual descriptions into detailed storyboards, capturing human poses and object layouts to represent the structural essence of the scene. In the second stage, VisionForge generates videos from these storyboards, producing high-quality videos with smooth motion, temporal consistency, and spatial coherence. By decoupling text understanding from video generation, VAST enables precise control over subject dynamics and scene composition. Experiments on the VBench benchmark demonstrate that VAST outperforms existing methods in both visual quality and semantic expression, setting a new standard for dynamic and coherent video generation.

Adaptable and Precise: Enterprise-Scenario LLM Function-Calling Capability Training Pipeline

Dec 20, 2024

Guancheng Zeng, Wentao Ding, Beining Xu, Chi Zhang, Wenqiang Han, Gang Li, Jingjing Mo, Pengxu Qiu, Xinran Tao, Wang Tao (+1 more)

Abstract:Enterprises possess a vast array of API assets scattered across various functions, forming the backbone of existing business processes. By leveraging these APIs as functional tools, enterprises can design diverse, scenario-specific agent applications, driven by on-premise function-calling models as the core engine. However, generic models often fail to meet enterprise requirements in terms of computational efficiency, output accuracy, and stability, necessitating scenario-specific adaptation. In this paper, we propose a training pipeline for function-calling capabilities tailored to real-world business scenarios. This pipeline includes the synthesis and augmentation of scenario-specific function-calling data, model fine-tuning, and performance evaluation and analysis. Using this pipeline, we generated 1,260 fully AI-generated samples and 1,035 augmented manually-labeled samples in a digital HR agent scenario. The Qwen2.5-Coder-7B-Instruct model was employed as the base model and fine-tuned using the LoRA method on four GPUs with 24GB VRAM. Our fine-tuned model demonstrated outstanding performance in evaluations and practical applications, surpassing GPT-4 and GPT-4o in accuracy on the test set. These results validate the reliability of the proposed pipeline for training scenario-specific function-calling models.

* 23 pages, 6 figures, 7 tables

Proposing and solving olympiad geometry with guided tree search

Dec 14, 2024

Chi Zhang, Jiajun Song, Siyu Li, Yitao Liang, Yuxi Ma, Wei Wang, Yixin Zhu, Song-Chun Zhu

Abstract:Mathematics olympiads are prestigious competitions, with problem proposing and solving highly honored. Building artificial intelligence that proposes and solves olympiad problems presents an unresolved challenge in automated theorem discovery and proving, especially in geometry for its combination of numerical and spatial elements. We introduce TongGeometry, a Euclidean geometry system supporting tree-search-based guided problem proposing and solving. The efficient geometry system establishes the most extensive repository of geometry theorems to date: within the same computational budget as the existing state-of-the-art, TongGeometry discovers 6.7 billion geometry theorems requiring auxiliary constructions, including 4.1 billion exhibiting geometric symmetry. Among them, 10 theorems were proposed to regional mathematical olympiads with 3 of TongGeometry's proposals selected in real competitions, earning spots in a national team qualifying exam or a top civil olympiad in China and the US. Guided by fine-tuned large language models, TongGeometry solved all International Mathematical Olympiad geometry problems in IMO-AG-30, outperforming gold medalists for the first time. It also surpasses the existing state-of-the-art across a broader spectrum of olympiad-level problems. The full capabilities of the system can be utilized on a consumer-grade machine, making the model more accessible and fostering widespread democratization of its use. By analogy, unlike existing systems that merely solve problems like students, TongGeometry acts like a geometry coach, discovering, presenting, and proving theorems.

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Dec 11, 2024

Mingkun Lei, Xue Song, Beier Zhu, Hao Wang, Chi Zhang

Abstract:Text-driven style transfer aims to merge the style of a reference image with content described by a text prompt. Recent advancements in text-to-image models have improved the nuance of style transformations, yet significant challenges remain, particularly with overfitting to reference styles, limiting stylistic control, and misaligning with textual content. In this paper, we propose three complementary strategies to address these issues. First, we introduce a cross-modal Adaptive Instance Normalization (AdaIN) mechanism for better integration of style and text features, enhancing alignment. Second, we develop a Style-based Classifier-Free Guidance (SCFG) approach that enables selective control over stylistic elements, reducing irrelevant influences. Finally, we incorporate a teacher model during early generation stages to stabilize spatial layouts and mitigate artifacts. Our extensive evaluations demonstrate significant improvements in style transfer quality and alignment with textual prompts. Furthermore, our approach can be integrated into existing style transfer frameworks without fine-tuning.
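
For readers unfamiliar with the building block named in the first strategy, here is a minimal sketch of standard Adaptive Instance Normalization on a single feature channel. It is not the cross-modal variant the abstract proposes, whose details are not given here; the function name `adain` and the `eps` parameter are illustrative assumptions. Standard AdaIN normalizes the content features to zero mean and unit variance, then rescales them to the style features' mean and standard deviation.

```python
import statistics


def adain(content, style, eps=1e-5):
    """Standard AdaIN on one channel: match content statistics to style.

    `content` and `style` are flat lists of activations for a single channel.
    `eps` (assumption) guards against division by zero for constant inputs.
    """
    mu_c = statistics.fmean(content)
    mu_s = statistics.fmean(style)
    sd_c = statistics.pstdev(content)
    sd_s = statistics.pstdev(style)
    return [sd_s * (v - mu_c) / (sd_c + eps) + mu_s for v in content]


content_channel = [1.0, 2.0, 3.0, 4.0]
style_channel = [10.0, 20.0, 30.0, 40.0]
stylized = adain(content_channel, style_channel)
print(stylized)
```

In a real network the same operation is applied independently to every channel of every sample, with the statistics taken over the spatial dimensions.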

UniScene: Unified Occupancy-centric Driving Scene Generation

Dec 06, 2024

Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang (+7 more)

Abstract:Generating high-fidelity, controllable, and annotated training data is critical for autonomous driving. Existing methods typically generate a single data form directly from a coarse scene layout, which not only fails to output the rich data forms required for diverse downstream tasks but also struggles to model the direct layout-to-data distribution. In this paper, we introduce UniScene, the first unified framework for generating three key data forms - semantic occupancy, video, and LiDAR - in driving scenes. UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling. This occupancy-centric approach reduces the generation burden, especially for intricate scenes, while providing detailed intermediate representations for the subsequent generation stages. Extensive experiments demonstrate that UniScene outperforms previous state-of-the-art methods in occupancy, video, and LiDAR generation, which in turn benefits downstream driving tasks.

Predicting Pedestrian Crossing Behavior in Germany and Japan: Insights into Model Transferability

Dec 04, 2024

Chi Zhang, Janis Sprenger, Zhongjun Ni, Christian Berger

Abstract:Predicting pedestrian crossing behavior is important for intelligent traffic systems to avoid pedestrian-vehicle collisions. Most existing pedestrian crossing behavior models are trained and evaluated on datasets collected from a single country, overlooking differences between countries. To address this gap, we compare pedestrian road-crossing behavior at unsignalized crossings in Germany and Japan. We present four types of machine learning models to predict gap selection behavior, zebra crossing usage, and pedestrian trajectories using simulator data collected from both countries. Comparing the two countries, we find that pedestrians in the study conducted in Japan are more cautious, selecting larger gaps than those in Germany. We evaluate and analyze model transferability. Our results show that neural networks outperform other machine learning models in predicting gap selection and zebra crossing usage, while random forest models perform best on trajectory prediction tasks, demonstrating strong performance and transferability. We develop a transferable model using an unsupervised clustering method, which improves prediction accuracy for gap selection and trajectory prediction. These findings provide a deeper understanding of pedestrian crossing behaviors in different countries and offer valuable insights into model transferability.

* 16 pages, 12 figures, 11 tables. Accepted in IEEE Transactions on Intelligent Vehicles
