Deep learning frameworks such as TensorFlow and PyTorch provide a productive interface for expressing and training a deep neural network (DNN) model on a single device or using data parallelism. Still, they may not be flexible or efficient enough for training emerging large models on distributed devices, which require more sophisticated parallelism beyond data parallelism. Plugins or wrappers have been developed to strengthen these frameworks for model or pipeline parallelism, but they complicate the usage and implementation of distributed deep learning. Aiming at a simple, neat redesign of distributed deep learning frameworks for various parallelism paradigms, we present OneFlow, a novel distributed training framework based on an SBP (split, broadcast and partial-value) abstraction and the actor model. SBP enables much easier programming of data parallelism and model parallelism than existing frameworks, and the actor model provides a succinct runtime mechanism to manage the complex dependencies imposed by resource constraints, data movement and computation in distributed deep learning. We demonstrate the general applicability and efficiency of OneFlow for training various large DNN models with case studies and extensive experiments. The results show that OneFlow outperforms many well-known customized libraries built on top of the state-of-the-art frameworks. The code of OneFlow is available at: https://github.com/Oneflow-Inc/oneflow.
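To make the SBP idea concrete, the following minimal numpy sketch illustrates the semantics only (it is not the OneFlow API): split, broadcast, and partial-value signatures describe the same logical matmul under data parallelism, model parallelism, and a split contraction dimension.

```python
# Conceptual sketch of SBP semantics (not the OneFlow API), using numpy
# arrays to stand in for per-device tensors on two devices.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))   # logical input
W = rng.standard_normal((6, 8))   # logical weight

# Data parallelism: X is split(0) across devices, W is broadcast.
# Each device holds a slice of rows; the output is split(0) too.
X_dev = np.split(X, 2, axis=0)           # split(0)
Y_data = [x @ W for x in X_dev]          # per-device outputs
assert np.allclose(np.concatenate(Y_data, axis=0), X @ W)

# Model parallelism: W is split(1) (columns), X is broadcast.
# The output is split(1); concatenating columns recovers Y.
W_dev = np.split(W, 2, axis=1)           # split(1)
Y_model = [X @ w for w in W_dev]
assert np.allclose(np.concatenate(Y_model, axis=1), X @ W)

# partial-value: splitting the contraction dim (X split(1), W split(0))
# leaves each device with a partial sum; summing (an all-reduce) yields Y.
X_k = np.split(X, 2, axis=1)
W_k = np.split(W, 2, axis=0)
Y_partial = [x @ w for x, w in zip(X_k, W_k)]   # partial-sum values
assert np.allclose(sum(Y_partial), X @ W)
```

Signatures of this kind let a framework infer, per operator, where communication such as the all-reduce implied by a partial-sum output must be inserted.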
This paper proposes an unsupervised cross-modality domain adaptation approach based on pixel alignment and self-training (PAST). Pixel alignment transfers ceT1 scans to the hrT2 modality, helping to reduce the domain shift when training the segmentation model. Self-training adapts the decision boundary of the segmentation network to fit the distribution of hrT2 scans. Experimental results show that PAST significantly outperforms the non-UDA baseline, and it ranked second on the CrossMoDA validation-phase leaderboard with a mean Dice score of 0.8395.
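As a minimal sketch of the self-training half, assuming a standard confidence-thresholded pseudo-labeling recipe (which may differ from the authors' exact procedure), one adaptation step on unlabeled hrT2 scans could look like:

```python
# Minimal self-training sketch (illustrative, not the authors' exact recipe):
# the segmentation net generates pseudo-labels on real hrT2 scans and only
# confident pixels are used as training targets.
import torch
import torch.nn.functional as F

def self_training_step(model, optimizer, hrT2_batch, conf_thresh=0.9):
    """One adaptation step on unlabeled hrT2 scans (names are assumptions)."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(hrT2_batch), dim=1)           # (B, C, H, W)
        conf, pseudo = probs.max(dim=1)                       # per-pixel label
        mask = conf > conf_thresh                             # keep confident pixels
    model.train()
    logits = model(hrT2_batch)
    loss = F.cross_entropy(logits, pseudo, reduction="none")  # (B, H, W)
    loss = (loss * mask).sum() / mask.sum().clamp(min=1)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```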
Mitochondria segmentation in electron microscopy images is essential in neuroscience. However, due to image degradation during the imaging process, the large variety of mitochondrial structures, and the presence of noise, artifacts and other sub-cellular structures, mitochondria segmentation is very challenging. In this paper, we propose a novel and effective contrastive learning framework to learn a better feature representation from hard examples and thereby improve segmentation. Specifically, we adopt a point-sampling strategy to pick out representative pixels from hard examples in the training phase. Based on these sampled pixels, we introduce a pixel-wise, label-based contrastive loss that consists of a similarity loss term and a consistency loss term. The similarity term increases the similarity of pixels from the same class and the separability of pixels from different classes in feature space, while the consistency term enhances the sensitivity of the 3D model to changes in image content from frame to frame. We demonstrate the effectiveness of our method on the MitoEM and FIB-SEM datasets, achieving results better than or on par with the state of the art.
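A generic instantiation of the similarity term over the sampled pixels might look as follows (a supervised contrastive form written for illustration; the paper's exact loss may differ):

```python
# Illustrative pixel-wise, label-based similarity loss over sampled pixels
# (a generic supervised contrastive form, not the paper's exact formulation).
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """embeddings: (N, D) features of sampled hard pixels; labels: (N,)."""
    z = F.normalize(embeddings, dim=1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye  # same class, not self
    logits = (z @ z.t() / temperature).masked_fill(eye, float("-inf"))
    logp = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    logp = logp.masked_fill(eye, 0.0)          # avoid -inf * 0 on the diagonal
    n_pos = pos.sum(dim=1).clamp(min=1)
    loss = -(logp * pos).sum(dim=1) / n_pos    # mean log-prob of positive pairs
    return loss[pos.any(dim=1)].mean()         # skip pixels with no positive

# The consistency term could analogously penalize feature drift between
# adjacent frames, e.g. F.mse_loss(feat_t, feat_t_next).
```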
Deep-learning-based visual trackers entail offline pre-training on large volumes of video data with accurate bounding box annotations, which are labor-intensive to obtain. We present a new framework to facilitate bounding box annotation for video sequences, built on a selection-and-refinement strategy that automatically improves the preliminary annotations generated by tracking algorithms. A temporal assessment network (T-Assess Net) is proposed to capture the temporal coherence of target locations and select reliable tracking results by measuring their quality. Meanwhile, a visual-geometry refinement network (VG-Refine Net) is designed to further enhance the selected tracking results by considering both target appearance and temporal geometry constraints, allowing inaccurate tracking results to be corrected. Together, the two networks provide a principled approach to ensuring the quality of automatic video annotation. Experiments on large-scale tracking benchmarks demonstrate that our method delivers highly accurate bounding box annotations and reduces human labor by 94.0%, yielding an effective means of further boosting tracking performance with augmented training data.
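The selection-and-refinement strategy can be summarized by the following high-level sketch, where the module interfaces (function names, the threshold `tau`) are assumptions rather than the released implementation:

```python
# High-level sketch of the selection-and-refinement annotation pipeline
# (module interfaces are placeholders, not the released implementation).
def annotate_video(frames, tracker, t_assess_net, vg_refine_net, tau=0.5):
    boxes = tracker(frames)          # preliminary annotations from a tracker
    quality = t_assess_net(boxes)    # temporal-coherence quality score per frame
    annotations = []
    for frame, box, q in zip(frames, boxes, quality):
        if q >= tau:                 # reliable result: keep as-is
            annotations.append(box)
        else:                        # unreliable: refine with appearance and
            annotations.append(vg_refine_net(frame, box))  # geometry constraints
    return annotations
```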
Blood vessel segmentation is crucial for many diagnostic and research applications. In recent years, CNN-based models have led to breakthroughs in segmentation; however, such methods usually lose high-frequency information such as object boundaries and subtle structures, which are vital to vessel segmentation. To tackle this issue, we propose a Boundary Enhancement and Feature Denoising (BEFD) module that improves the network's ability to extract boundary information in semantic segmentation and can be integrated into an arbitrary encoder-decoder architecture in an end-to-end way. By introducing a Sobel edge detector, the network acquires an additional edge prior, thus enhancing boundaries in an unsupervised manner for medical image segmentation. In addition, we utilize a denoising block to reduce the noise hidden in the low-level features. Experimental results on a retinal vessel dataset and an angiocarpy dataset demonstrate the superior performance of the new BEFD module.
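A minimal sketch of the Sobel-based edge prior follows; the real BEFD module likely differs in where and how the prior is injected into the encoder-decoder:

```python
# Illustrative boundary-enhancement sketch: fixed Sobel kernels produce an
# edge prior that is appended to the input as an extra channel (a minimal
# form; the paper's BEFD module may inject the prior differently).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SobelEdgePrior(nn.Module):
    def __init__(self):
        super().__init__()
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        ky = kx.t()
        # fixed (non-trainable) kernels, shape (2, 1, 3, 3)
        self.register_buffer("kernel", torch.stack([kx, ky]).unsqueeze(1))

    def forward(self, x):                      # x: (B, 1, H, W) grayscale
        g = F.conv2d(x, self.kernel, padding=1)
        edge = torch.sqrt(g[:, :1] ** 2 + g[:, 1:] ** 2 + 1e-6)
        return torch.cat([x, edge], dim=1)     # append edge prior as a channel

x = torch.rand(2, 1, 64, 64)
print(SobelEdgePrior()(x).shape)               # torch.Size([2, 2, 64, 64])
```

Because the Sobel kernels are fixed, the edge prior requires no boundary supervision, which is what makes the enhancement unsupervised.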
In federated learning (FL), fair and accurate measurement of each participant's contribution is of great significance. The level of contribution not only provides a rational metric for distributing financial benefits among federated participants, but also helps to discover malicious participants that try to poison the FL framework. Previous methods for contribution measurement were based on enumerating possible combinations of federated participants; their computation costs increase drastically with the number of participants or feature dimensions, making them inapplicable in practical situations. In this paper, an efficient method is proposed to evaluate the contributions of federated participants. The paper focuses on the horizontal FL framework, where client servers calculate parameter gradients over their local data and upload the gradients to the central server. Before aggregating the client gradients, the central server trains a data value estimator on the gradients using reinforcement learning techniques. Experimental results show that the proposed method consistently outperforms the conventional leave-one-out method in terms of both valuation authenticity and time complexity.
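One plausible reading of the gradient-valuation loop, sketched with a REINFORCE-style update (the estimator architecture, reward design, and function names here are assumptions, not the paper's exact method):

```python
# Compact sketch: a learned estimator assigns selection probabilities to
# client gradients, and a REINFORCE-style update trains it from a reward
# (e.g., validation gain). All design choices here are assumptions.
import torch

def aggregation_step(client_grads, estimator, est_opt, reward_fn):
    """client_grads: list of flattened per-client gradient tensors."""
    feats = torch.stack([g.detach() for g in client_grads])   # (K, D)
    probs = torch.sigmoid(estimator(feats)).squeeze(-1)       # selection probs (K,)
    select = torch.bernoulli(probs).detach()                  # sampled subset
    weights = select / select.sum().clamp(min=1)
    agg = (weights.unsqueeze(1) * feats).sum(dim=0)           # weighted aggregate
    reward = reward_fn(agg)   # scalar, e.g. validation gain from applying agg
    # REINFORCE: reinforce selections that led to high reward
    logp = select * torch.log(probs + 1e-8) + (1 - select) * torch.log(1 - probs + 1e-8)
    loss = -reward * logp.sum()
    est_opt.zero_grad(); loss.backward(); est_opt.step()
    return agg, probs         # probs double as per-client contribution scores
```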
Code retrieval is a key task that aims to match natural and programming languages. In this work, we propose adversarial learning for code retrieval that is regularized by question-description relevance. First, we adapt a simple adversarial learning technique to generate difficult code snippets for a given input question, which helps the learning of code retrieval under bi-modal and data-scarce conditions. Second, we propose leveraging question-description relevance to regularize adversarial learning, such that a generated code snippet contributes more to the code retrieval training loss only if its paired natural language description is predicted to be less relevant to the user's question. Experiments on large-scale code retrieval datasets for two programming languages show that our adversarial learning method improves the performance of state-of-the-art models. Moreover, using an additional duplicate question prediction model to regularize adversarial learning further improves performance, and this is more effective than using the duplicated questions in strong multi-task learning baselines.
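As an illustration, the relevance regularization could be realized by down-weighting the adversarial term of a hinge loss; the function names, the shared encoder, and the exact weighting below are assumptions:

```python
# Sketch of relevance-regularized adversarial training for code retrieval
# (one plausible reading of the regularization, not the paper's exact loss;
# a single encoder for both modalities is a simplification).
import torch
import torch.nn.functional as F

def retrieval_step(question, pos_code, generator, relevance_model, encoder,
                   margin=0.5):
    adv_code = generator(question)                   # generated hard snippet
    q, cp, ca = encoder(question), encoder(pos_code), encoder(adv_code)
    s_pos = F.cosine_similarity(q, cp, dim=-1)
    s_adv = F.cosine_similarity(q, ca, dim=-1)
    # Down-weight the adversarial term when the generated snippet's paired
    # description is predicted to be highly relevant to the question.
    w = 1.0 - relevance_model(question, adv_code)    # relevance in [0, 1]
    loss = F.relu(margin - s_pos + w * s_adv).mean() # weighted hinge loss
    return loss
```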
In this study, we propose a novel RGB-T tracking framework that jointly models appearance and motion cues. First, to obtain a robust appearance model, we develop a novel late fusion method to infer the fusion weight maps of the RGB and thermal (T) modalities. The fusion weights are determined by offline-trained global and local multimodal fusion networks and are then used to linearly combine the response maps of the RGB and T modalities. Second, when the appearance cue is unreliable, we comprehensively take motion cues, i.e., target and camera motion, into account to make the tracker robust. We further propose a tracker switcher to flexibly switch between the appearance and motion trackers. Extensive results on three recent RGB-T tracking datasets show that the proposed tracker performs significantly better than other state-of-the-art algorithms.
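The late-fusion step itself reduces to a per-pixel weighted combination of response maps, as in this minimal numpy sketch (the weight maps would come from the offline-trained fusion networks, which are left abstract here):

```python
# Minimal late-fusion sketch: per-pixel weight maps linearly combine the
# RGB and thermal response maps (weight prediction is left abstract).
import numpy as np

def fuse_responses(resp_rgb, resp_t, w_rgb, w_t):
    """Linearly combine modality response maps with normalized weight maps."""
    w_sum = w_rgb + w_t + 1e-8
    return (w_rgb * resp_rgb + w_t * resp_t) / w_sum

resp_rgb = np.random.rand(25, 25)
resp_t = np.random.rand(25, 25)
w_rgb = np.full((25, 25), 0.7)   # e.g., RGB deemed more reliable here
w_t = np.full((25, 25), 0.3)
fused = fuse_responses(resp_rgb, resp_t, w_rgb, w_t)
print(np.unravel_index(np.argmax(fused), fused.shape))  # predicted location
```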
This paper makes one of the first efforts toward automatically generating complex questions from knowledge graphs. In particular, we study how to leverage existing simple-question datasets for this task under two separate scenarios: using either sub-questions of the target complex questions, or distantly related pseudo sub-questions when the former are unavailable. First, a competitive base model named CoG2Q is designed to map complex query graphs to natural language questions. Afterwards, we propose two extension models, namely CoGSub2Q and CoGSub^m2Q, respectively for the above two scenarios. The former encodes and copies from a single sub-question, while the latter further scores and aggregates multiple pseudo sub-questions. Experimental results show that the extension models significantly outperform not only the base CoG2Q, but also its augmented variant that uses simple questions as additional training examples. This demonstrates the importance of instance-level connections between simple questions and their corresponding complex questions, which may be underexploited by straightforward data augmentation of CoG2Q that only builds model-level connections through learned parameters.
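The scoring-and-aggregation idea behind CoGSub^m2Q could be sketched as attention-style pooling over pseudo sub-question encodings (an assumed form written for illustration, not the paper's exact architecture):

```python
# Sketch of scoring and aggregating multiple pseudo sub-question encodings
# (an assumed attention-style pooling, not the paper's exact architecture).
import torch
import torch.nn.functional as F

def aggregate_subquestions(query_graph_enc, subq_encs, scorer):
    """query_graph_enc: (D,); subq_encs: (M, D); scorer: (M, 2D) -> (M, 1)."""
    ctx = query_graph_enc.expand(subq_encs.size(0), -1)       # (M, D)
    scores = scorer(torch.cat([subq_encs, ctx], dim=-1)).squeeze(-1)  # (M,)
    weights = F.softmax(scores, dim=0)     # relevance of each pseudo sub-question
    return (weights.unsqueeze(-1) * subq_encs).sum(dim=0)     # aggregated (D,)
```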