Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Roger Zimmermann

Modeling Spatio-temporal Dynamical Systems with Neural Discrete Learning and Levels-of-Experts

Feb 06, 2024

Kun Wang, Hao Wu, Guibin Zhang, Junfeng Fang, Yuxuan Liang, Yuankai Wu, Roger Zimmermann, Yang Wang

Abstract:In this paper, we address the issue of modeling and estimating changes in the state of the spatio-temporal dynamical systems based on a sequence of observations like video frames. Traditional numerical simulation systems depend largely on the initial settings and correctness of the constructed partial differential equations (PDEs). Despite recent efforts yielding significant success in discovering data-driven PDEs with neural networks, the limitations posed by singular scenarios and the absence of local insights prevent them from performing effectively in a broader real-world context. To this end, this paper propose the universal expert module -- that is, optical flow estimation component, to capture the evolution laws of general physical processes in a data-driven fashion. To enhance local insight, we painstakingly design a finer-grained physical pipeline, since local characteristics may be influenced by various internal contextual information, which may contradict the macroscopic properties of the whole system. Further, we harness currently popular neural discrete learning to unveil the underlying important features in its latent space, this process better injects interpretability, which can help us obtain a powerful prior over these discrete random variables. We conduct extensive experiments and ablations to demonstrate that the proposed framework achieves large performance margins, compared with the existing SOTA baselines.

Via

Access Paper or Ask Questions

Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification

Jan 22, 2024

Yutong Xia, Runpeng Yu, Yuxuan Liang, Xavier Bresson, Xinchao Wang, Roger Zimmermann

Figure 1 for Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification

Figure 2 for Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification

Figure 3 for Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification

Figure 4 for Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification

Abstract:Graph Neural Networks (GNNs) have become the preferred tool to process graph data, with their efficacy being boosted through graph data augmentation techniques. Despite the evolution of augmentation methods, issues like graph property distortions and restricted structural changes persist. This leads to the question: Is it possible to develop more property-conserving and structure-sensitive augmentation methods? Through a spectral lens, we investigate the interplay between graph properties, their augmentation, and their spectral behavior, and found that keeping the low-frequency eigenvalues unchanged can preserve the critical properties at a large scale when generating augmented graphs. These observations inform our introduction of the Dual-Prism (DP) augmentation method, comprising DP-Noise and DP-Mask, which adeptly retains essential graph properties while diversifying augmented graphs. Extensive experiments validate the efficiency of our approach, providing a new and promising direction for graph data augmentation.

Via

Access Paper or Ask Questions

UniTime: A Language-Empowered Unified Model for Cross-Domain Time Series Forecasting

Oct 28, 2023

Xu Liu, Junfeng Hu, Yuan Li, Shizhe Diao, Yuxuan Liang, Bryan Hooi, Roger Zimmermann

Abstract:Multivariate time series forecasting plays a pivotal role in contemporary web technologies. In contrast to conventional methods that involve creating dedicated models for specific time series application domains, this research advocates for a unified model paradigm that transcends domain boundaries. However, learning an effective cross-domain model presents the following challenges. First, various domains exhibit disparities in data characteristics, e.g., the number of variables, posing hurdles for existing models that impose inflexible constraints on these factors. Second, the model may encounter difficulties in distinguishing data from various domains, leading to suboptimal performance in our assessments. Third, the diverse convergence rates of time series domains can also result in compromised empirical performance. To address these issues, we propose UniTime for effective cross-domain time series learning. Concretely, UniTime can flexibly adapt to data with varying characteristics. It also uses domain instructions and a Language-TS Transformer to offer identification information and align two modalities. In addition, UniTime employs masking to alleviate domain convergence speed imbalance issues. Our extensive experiments demonstrate the effectiveness of UniTime in advancing state-of-the-art forecasting performance and zero-shot transferability.

Via

Access Paper or Ask Questions

Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

Oct 26, 2023

Junfeng Hu, Xu Liu, Zhencheng Fan, Yuxuan Liang, Roger Zimmermann

Figure 1 for Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

Figure 2 for Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

Figure 3 for Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

Figure 4 for Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

Abstract:Spatio-temporal graph learning is a fundamental problem in the Web of Things era, which enables a plethora of Web applications such as smart cities, human mobility and climate analysis. Existing approaches tackle different learning tasks independently, tailoring their models to unique task characteristics. These methods, however, fall short of modeling intrinsic uncertainties in the spatio-temporal data. Meanwhile, their specialized designs limit their universality as general spatio-temporal learning solutions. In this paper, we propose to model the learning tasks in a unified perspective, viewing them as predictions based on conditional information with shared spatio-temporal patterns. Based on this proposal, we introduce Unified Spatio-Temporal Diffusion Models (USTD) to address the tasks uniformly within the uncertainty-aware diffusion framework. USTD is holistically designed, comprising a shared spatio-temporal encoder and attention-based denoising networks that are task-specific. The shared encoder, optimized by a pre-training strategy, effectively captures conditional spatio-temporal patterns. The denoising networks, utilizing both cross- and self-attention, integrate conditional dependencies and generate predictions. Opting for forecasting and kriging as downstream tasks, we design Gated Attention (SGA) and Temporal Gated Attention (TGA) for each task, with different emphases on the spatial and temporal dimensions, respectively. By combining the advantages of deterministic encoders and probabilistic diffusion models, USTD achieves state-of-the-art performances compared to deterministic and probabilistic baselines in both tasks, while also providing valuable uncertainty estimates.

Via

Access Paper or Ask Questions

When Urban Region Profiling Meets Large Language Models

Oct 22, 2023

Yibo Yan, Haomin Wen, Siru Zhong, Wei Chen, Haodong Chen, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

Figure 1 for When Urban Region Profiling Meets Large Language Models

Figure 2 for When Urban Region Profiling Meets Large Language Models

Figure 3 for When Urban Region Profiling Meets Large Language Models

Figure 4 for When Urban Region Profiling Meets Large Language Models

Abstract:Urban region profiling from web-sourced data is of utmost importance for urban planning and sustainable development. We are witnessing a rising trend of LLMs for various fields, especially dealing with multi-modal data research such as vision-language learning, where the text modality serves as a supplement information for the image. Since textual modality has never been introduced into modality combinations in urban region profiling, we aim to answer two fundamental questions in this paper: i) Can textual modality enhance urban region profiling? ii) and if so, in what ways and with regard to which aspects? To answer the questions, we leverage the power of Large Language Models (LLMs) and introduce the first-ever LLM-enhanced framework that integrates the knowledge of textual modality into urban imagery profiling, named LLM-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining (UrbanCLIP). Specifically, it first generates a detailed textual description for each satellite image by an open-source Image-to-Text LLM. Then, the model is trained on the image-text pairs, seamlessly unifying natural language supervision for urban visual representation learning, jointly with contrastive loss and language modeling loss. Results on predicting three urban indicators in four major Chinese metropolises demonstrate its superior performance, with an average improvement of 6.1% on R^2 compared to the state-of-the-art methods. Our code and the image-language dataset will be released upon paper notification.

Via

Access Paper or Ask Questions

Domain-wise Invariant Learning for Panoptic Scene Graph Generation

Oct 09, 2023

Li Li, You Qin, Wei Ji, Yuxiao Zhou, Roger Zimmermann

Abstract:Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates). However, the presence of biased predicate annotations poses a significant challenge for PSG models, as it hinders their ability to establish a clear decision boundary among different predicates. This issue substantially impedes the practical utility and real-world applicability of PSG models. To address the intrinsic bias above, we propose a novel framework to infer potentially biased annotations by measuring the predicate prediction risks within each subject-object pair (domain), and adaptively transfer the biased annotations to consistent ones by learning invariant predicate representation embeddings. Experiments show that our method significantly improves the performance of benchmark models, achieving a new state-of-the-art performance, and shows great generalization and effectiveness on PSG dataset.

* arXiv admin note: text overlap with arXiv:2307.15567

Via

Access Paper or Ask Questions

Towards Complex-query Referring Image Segmentation: A Novel Benchmark

Sep 29, 2023

Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger Zimmermann

Figure 1 for Towards Complex-query Referring Image Segmentation: A Novel Benchmark

Figure 2 for Towards Complex-query Referring Image Segmentation: A Novel Benchmark

Figure 3 for Towards Complex-query Referring Image Segmentation: A Novel Benchmark

Figure 4 for Towards Complex-query Referring Image Segmentation: A Novel Benchmark

Abstract:Referring Image Understanding (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms. However, there has been a lack of research investigating how existing algorithms should be benchmarked with complex language queries, which include more informative descriptions of surrounding objects and backgrounds (\eg \textit{"the black car."} vs. \textit{"the black car is parking on the road and beside the bus."}). Given the significant improvement in the semantic understanding capability of large pre-trained models, it is crucial to take a step further in RIS by incorporating complex language that resembles real-world applications. To close this gap, building upon the existing RefCOCO and Visual Genome datasets, we propose a new RIS benchmark with complex queries, namely \textbf{RIS-CQ}. The RIS-CQ dataset is of high quality and large scale, which challenges the existing RIS with enriched, specific and informative queries, and enables a more realistic scenario of RIS research. Besides, we present a nichetargeting method to better task the RIS-CQ, called dual-modality graph alignment model (\textbf{\textsc{DuMoGa}}), which outperforms a series of RIS methods.

Via

Access Paper or Ask Questions

Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment

Sep 23, 2023

Yutong Xia, Yuxuan Liang, Haomin Wen, Xu Liu, Kun Wang, Zhengyang Zhou, Roger Zimmermann

Figure 1 for Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment

Figure 2 for Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment

Figure 3 for Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment

Figure 4 for Deciphering Spatio-Temporal Graph Forecasting: A Causal Lens and Treatment

Abstract:Spatio-Temporal Graph (STG) forecasting is a fundamental task in many real-world applications. Spatio-Temporal Graph Neural Networks have emerged as the most popular method for STG forecasting, but they often struggle with temporal out-of-distribution (OoD) issues and dynamic spatial causation. In this paper, we propose a novel framework called CaST to tackle these two challenges via causal treatments. Concretely, leveraging a causal lens, we first build a structural causal model to decipher the data generation process of STGs. To handle the temporal OoD issue, we employ the back-door adjustment by a novel disentanglement block to separate invariant parts and temporal environments from input data. Moreover, we utilize the front-door adjustment and adopt the Hodge-Laplacian operator for edge-level convolution to model the ripple effect of causation. Experiments results on three real-world datasets demonstrate the effectiveness and practicality of CaST, which consistently outperforms existing methods with good interpretability.

* To appear at NeurIPS 2023

Via

Access Paper or Ask Questions

Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

Sep 13, 2023

Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann(+1 more)

Abstract:The self-media era provides us tremendous high quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm heavily relies on the original entangled features and lacks constraints guaranteeing that useful task-relevant semantics are extracted from the features. In this paper, we seek to tackle the above challenges from two aspects: (1) We propose to disentangle an original high-dimensional feature into multiple sub-features, explicitly disentangling the feature into exclusive lower-dimensional components. We expect the sub-features to encode non-overlapping semantics of the original feature and remove redundant information. (2) On top of the disentangled sub-features, we further learn an auxiliary feature to enhance the sub-features. We theoretically analyzed the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (i.e., SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset. Our code and model have been released at https://github.com/yyyooooo/DMI/, hoping to contribute to the community.

* This paper is accepted by ACM MM 2023

Via

Access Paper or Ask Questions

A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

Sep 03, 2023

Haomin Wen, Youfang Lin, Lixia Wu, Xiaowei Mao, Tianyue Cai, Yunfeng Hou, Shengnan Guo, Yuxuan Liang, Guangyin Jin, Yiji Zhao(+3 more)

Figure 1 for A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

Figure 2 for A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

Figure 3 for A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

Figure 4 for A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

Abstract:Instant delivery services, such as food delivery and package delivery, have achieved explosive growth in recent years by providing customers with daily-life convenience. An emerging research area within these services is service Route\&Time Prediction (RTP), which aims to estimate the future service route as well as the arrival time of a given worker. As one of the most crucial tasks in those service platforms, RTP stands central to enhancing user satisfaction and trimming operational expenditures on these platforms. Despite a plethora of algorithms developed to date, there is no systematic, comprehensive survey to guide researchers in this domain. To fill this gap, our work presents the first comprehensive survey that methodically categorizes recent advances in service route and time prediction. We start by defining the RTP challenge and then delve into the metrics that are often employed. Following that, we scrutinize the existing RTP methodologies, presenting a novel taxonomy of them. We categorize these methods based on three criteria: (i) type of task, subdivided into only-route prediction, only-time prediction, and joint route\&time prediction; (ii) model architecture, which encompasses sequence-based and graph-based models; and (iii) learning paradigm, including Supervised Learning (SL) and Deep Reinforcement Learning (DRL). Conclusively, we highlight the limitations of current research and suggest prospective avenues. We believe that the taxonomy, progress, and prospects introduced in this paper can significantly promote the development of this field.

Via

Access Paper or Ask Questions