"Information": models, code, and papers

Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

Oct 26, 2023
Junfeng Hu, Xu Liu, Zhencheng Fan, Yuxuan Liang, Roger Zimmermann

Spatio-temporal graph learning is a fundamental problem in the Web of Things era, which enables a plethora of Web applications such as smart cities, human mobility and climate analysis. Existing approaches tackle different learning tasks independently, tailoring their models to unique task characteristics. These methods, however, fall short of modeling intrinsic uncertainties in the spatio-temporal data. Meanwhile, their specialized designs limit their universality as general spatio-temporal learning solutions. In this paper, we propose to model the learning tasks in a unified perspective, viewing them as predictions based on conditional information with shared spatio-temporal patterns. Based on this proposal, we introduce Unified Spatio-Temporal Diffusion Models (USTD) to address the tasks uniformly within the uncertainty-aware diffusion framework. USTD is holistically designed, comprising a shared spatio-temporal encoder and attention-based denoising networks that are task-specific. The shared encoder, optimized by a pre-training strategy, effectively captures conditional spatio-temporal patterns. The denoising networks, utilizing both cross- and self-attention, integrate conditional dependencies and generate predictions. Opting for forecasting and kriging as downstream tasks, we design Spatial Gated Attention (SGA) and Temporal Gated Attention (TGA) for each task, with different emphases on the spatial and temporal dimensions, respectively. By combining the advantages of deterministic encoders and probabilistic diffusion models, USTD achieves state-of-the-art performance compared to deterministic and probabilistic baselines in both tasks, while also providing valuable uncertainty estimates.

Minimally Informed Linear Discriminant Analysis: training an LDA model with unlabelled data

Oct 17, 2023
Nicolas Heintz, Tom Francart, Alexander Bertrand

Linear Discriminant Analysis (LDA) is one of the oldest and most popular linear methods for supervised classification problems. In this paper, we demonstrate that it is possible to compute the exact projection vector from LDA models based on unlabelled data, if some minimal prior information is available. More precisely, we show that only one of the following three pieces of information is actually sufficient to compute the LDA projection vector if only unlabelled data are available: (1) the class average of one of the two classes, (2) the difference between both class averages (up to a scaling), or (3) the class covariance matrices (up to a scaling). These theoretical results are validated in numerical experiments, demonstrating that this minimally informed Linear Discriminant Analysis (MILDA) model closely matches the performance of a supervised LDA model. Furthermore, we show that the MILDA projection vector can be computed in a closed form with a computational cost comparable to LDA and is able to quickly adapt to non-stationary data, making it well-suited to use as an adaptive classifier.

* 13 pages, 7 figures
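For readers unfamiliar with the quantity the abstract refers to, the following is a minimal sketch of the classical supervised LDA projection vector for two classes, w = S_w^{-1} (mu_1 - mu_0), where S_w is the pooled within-class scatter. It is not the MILDA method itself, whose closed form the abstract does not give; the function names and the toy 2-D data are assumptions made for illustration.

```python
# Classical two-class (Fisher) LDA direction in 2-D, standard library only.
# NOTE: this is the supervised baseline, not MILDA; names and data are hypothetical.

def mean(rows):
    """Column-wise mean of a list of equal-length points."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def scatter(rows, mu):
    """Scatter matrix sum_i (x_i - mu)(x_i - mu)^T."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[k] - mu[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def lda_direction_2d(class0, class1):
    """Return w = Sw^{-1} (mu1 - mu0) using the explicit 2x2 inverse."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    dx, dy = mu1[0] - mu0[0], mu1[1] - mu0[1]
    return [(sw[1][1] * dx - sw[0][1] * dy) / det,
            (-sw[1][0] * dx + sw[0][0] * dy) / det]

def project(rows, w):
    """Project each point onto the direction w."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]

# Toy labelled data: class1 is class0 shifted along the first axis.
class0 = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [1.5, 1.5]]
class1 = [[3.0, 0.0], [4.0, 0.5], [3.5, 1.0], [4.5, 1.5]]

w = lda_direction_2d(class0, class1)
proj0 = project(class0, w)
proj1 = project(class1, w)
print("w =", w)
print("classes separated:", max(proj0) < min(proj1))
```

The sketch needs class labels to form the class means and the within-class scatter; the abstract's claim is that one of the three listed pieces of prior information can stand in for those labels.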

Channel Autocorrelation Estimation for IRS-Aided Wireless Communications Based on Power Measurements

Oct 17, 2023
Ge Yan, Lipeng Zhu, Rui Zhang

Intelligent reflecting surface (IRS) can bring significant performance enhancement for wireless communication systems by reconfiguring wireless channels via passive signal reflection. However, such performance improvement generally relies on the knowledge of channel state information (CSI) for IRS-associated links. Prior IRS channel estimation strategies mainly estimate IRS-cascaded channels based on the excessive pilot signals received at the users/base station (BS) with time-varying IRS reflections, which, however, are not compatible with the existing channel training/estimation protocol for cellular networks. To address this issue, we propose in this paper a new channel estimation scheme for IRS-assisted communication systems based on the received signal power measured at the user, which is practically attainable without the need of changing the current protocol. Specifically, due to the lack of signal phase information in power measurements, the autocorrelation matrix of the BS-IRS-user cascaded channel is estimated by solving equivalent matrix-rank-minimization problems. Simulation results are provided to verify the effectiveness of the proposed channel estimation algorithm as well as the IRS passive reflection design based on the estimated channel autocorrelation matrix.

* 6 pages, 4 figures, accepted for Globecom 2023 workshop
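A small numerical sketch can illustrate why power measurements lead to estimating an autocorrelation matrix rather than the channel itself. It assumes a simplified, noise-free model that is not stated in the abstract: a single cascaded channel vector h, a unit-modulus reflection vector v, and received power |h^H v|^2. Under that assumption the power equals the quadratic form v^H R v with R = h h^H, and a global phase rotation of h changes neither the power nor R. All names and values are hypothetical.

```python
# Power measurements depend on the channel only through R = h h^H.
# Simplified single-realization, noise-free model (an assumption, not the paper's setup).
import cmath

def received_power(h, v):
    """|h^H v|^2 for channel h and reflection vector v."""
    s = sum(hi.conjugate() * vi for hi, vi in zip(h, v))
    return abs(s) ** 2

def autocorrelation(h):
    """Rank-one matrix R = h h^H."""
    return [[hi * hj.conjugate() for hj in h] for hi in h]

def quadratic_form(r, v):
    """Real part of v^H R v (the imaginary part is zero for Hermitian R)."""
    n = len(v)
    total = sum(v[i].conjugate() * r[i][j] * v[j]
                for i in range(n) for j in range(n))
    return total.real

h = [1 + 2j, -0.5 + 0.3j, 0.7 - 1.1j]                 # hypothetical channel
v = [cmath.exp(1j * t) for t in (0.0, 1.0, 2.5)]      # unit-modulus reflections

R = autocorrelation(h)
power = received_power(h, v)
quad = quadratic_form(R, v)

# Rotating h by a global phase is invisible in both the power and R.
h_rot = [hi * cmath.exp(1j * 0.9) for hi in h]
power_rot = received_power(h_rot, v)
R_rot = autocorrelation(h_rot)
max_diff = max(abs(R[i][j] - R_rot[i][j]) for i in range(3) for j in range(3))

print(abs(power - quad) < 1e-9, abs(power - power_rot) < 1e-9, max_diff < 1e-9)
```

Because R here has rank one, recovering it from several (v, power) pairs is naturally posed as a low-rank matrix problem, which is consistent with the rank-minimization formulation the abstract mentions.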

DialogueLLM: Context and Emotion Knowledge-Tuned LLaMA Models for Emotion Recognition in Conversations

Oct 17, 2023
Yazhou Zhang, Mengyao Wang, Prayag Tiwari, Qiuchi Li, Benyou Wang, Jing Qin

Large language models (LLMs) and their variants have shown extraordinary efficacy across numerous downstream natural language processing (NLP) tasks, which has presented a new vision for the development of NLP. Despite their remarkable performance in natural language generation (NLG), LLMs lack a distinct focus on the emotion understanding domain. As a result, using LLMs for emotion recognition may lead to suboptimal and inadequate precision. Another limitation of LLMs is that they are typically trained without leveraging multi-modal information. To overcome these limitations, we propose DialogueLLM, a context and emotion knowledge tuned LLM that is obtained by fine-tuning LLaMA models with 13,638 multi-modal (i.e., texts and videos) emotional dialogues. The visual information is considered as the supplementary knowledge to construct high-quality instructions. We offer a comprehensive evaluation of our proposed model on three benchmarking emotion recognition in conversations (ERC) datasets and compare the results against the SOTA baselines and other SOTA LLMs. Additionally, DialogueLLM-7B can be easily trained using LoRA on a 40GB A100 GPU in 5 hours, facilitating reproducibility for other researchers.

Does Graph Distillation See Like Vision Dataset Counterpart?

Oct 13, 2023
Beining Yang, Kai Wang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Hao Tang, Yang You, Jianxin Li

Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have attracted increasing concerns. Existing graph condensation methods primarily focus on optimizing the feature matrices of condensed graphs while overlooking the impact of the structure information from the original graphs. To investigate the impact of the structure information, we conduct analysis from the spectral domain and empirically identify substantial Laplacian Energy Distribution (LED) shifts in previous works. Such shifts lead to poor performance in cross-architecture generalization and specific tasks, including anomaly detection and link prediction. In this paper, we propose a novel Structure-broadcasting Graph Dataset Distillation (SGDD) scheme for broadcasting the original structure information to the generation of the synthetic one, which explicitly prevents overlooking the original structure information. Theoretically, the synthetic graphs by SGDD are expected to have smaller LED shifts than previous works, leading to superior performance in both cross-architecture settings and specific tasks. We validate the proposed SGDD across 9 datasets and achieve state-of-the-art results on all of them: for example, on the YelpChi dataset, our approach maintains 98.6% test accuracy of training on the original graph dataset with a 1,000-fold reduction in the scale of the graph. Moreover, we empirically observe 17.6% to 31.4% reductions in LED shift across the 9 datasets. Extensive experiments and analysis verify the effectiveness and necessity of the proposed designs. The code is available in the GitHub repository: https://github.com/RingBDStack/SGDD.

* Accepted by NeurIPS 2023

Prompt-based Grouping Transformer for Nucleus Detection and Classification

Oct 22, 2023
Junjia Huang, Haofeng Li, Weijun Sun, Xiang Wan, Guanbin Li

Automatic nuclei detection and classification can produce effective information for disease diagnosis. Most existing methods classify nuclei independently or do not make full use of the semantic similarity between nuclei and their grouping features. In this paper, we propose a novel end-to-end nuclei detection and classification framework based on a grouping transformer-based classifier. The nuclei classifier learns and updates the representations of nuclei groups and categories via hierarchically grouping the nucleus embeddings. Then the cell types are predicted with the pairwise correlations between categorical embeddings and nucleus features. For the efficiency of the fully transformer-based framework, we take the nucleus group embeddings as the input prompts of the backbone, which helps harvest grouping-guided features by tuning only the prompts instead of the whole backbone. Experimental results show that the proposed method significantly outperforms the existing models on three datasets.

* MICCAI 2023, released code: https://github.com/lhaof/PGT

MaRU: A Manga Retrieval and Understanding System Connecting Vision and Language

Oct 22, 2023
Conghao Tom Shen, Violet Yao, Yixin Liu

Manga, a widely celebrated Japanese comic art form, is renowned for its diverse narratives and distinct artistic styles. However, the inherently visual and intricate structure of Manga, which comprises images housing multiple panels, poses significant challenges for content retrieval. To address this, we present MaRU (Manga Retrieval and Understanding), a multi-staged system that connects vision and language to facilitate efficient search of both dialogues and scenes within Manga frames. The architecture of MaRU integrates an object detection model for identifying text and frame bounding boxes, a Vision Encoder-Decoder model for text recognition, a text encoder for embedding text, and a vision-text encoder that merges textual and visual information into a unified embedding space for scene retrieval. Rigorous evaluations reveal that MaRU excels in end-to-end dialogue retrieval and exhibits promising results for scene retrieval.

ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer

Oct 12, 2023
Yifan Xu, Pourya Shamsolmoali, Jie Yang

Visual place recognition (VPR) is a highly challenging task that has a wide range of applications, including robot navigation and self-driving vehicles. VPR is particularly difficult due to the presence of duplicate regions and the lack of attention to small objects in complex scenes, resulting in recognition deviations. In this paper, we present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects. Different from existing methods that rely on Convolutional Neural Networks (CNNs) for feature map generation, ClusVPR introduces a unique paradigm called Clustering-based Weighted Transformer Network (CWTNet). CWTNet leverages the power of clustering-based weighted feature maps and integrates global dependencies to effectively address visual deviations encountered in large-scale VPR problems. We also introduce the optimized-VLAD (OptLAD) layer that significantly reduces the number of parameters and enhances model efficiency. This layer is specifically designed to aggregate the information obtained from scale-wise image patches. Additionally, our pyramid self-supervised strategy focuses on extracting representative and diverse information from scale-wise image patches instead of entire images, which is crucial for capturing representative and diverse information in VPR. Extensive experiments on four VPR datasets show our model's superior performance compared to existing models while being less complex.

Generative Intrinsic Optimization: Intrisic Control with Model Learning

Oct 12, 2023
Jianfei Ma

A future sequence represents the outcome of executing an action in the environment. When driven by the information-theoretic concept of mutual information, it seeks maximally informative consequences. Explicit outcomes may vary across state, return, or trajectory, serving different purposes such as credit assignment or imitation learning. However, the inherent nature of incorporating intrinsic motivation with reward maximization is often neglected. In this work, we propose a variational approach to jointly learn the necessary quantity for estimating the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. Integrated into a policy iteration scheme, our approach guarantees convergence to the optimal policy. While we mainly focus on theoretical analysis, our approach opens the possibilities of leveraging intrinsic control with model learning to enhance sample efficiency and incorporate uncertainty of the environment into decision-making.

Cell-free Massive MIMO and SWIPT: Access Point Operation Mode Selection and Power Control

Oct 12, 2023
Mohammadali Mohammadi, Le-Nam Tran, Zahra Mobini, Hien Quoc Ngo, Michail Matthaiou

This paper studies cell-free massive multiple-input multiple-output (CF-mMIMO) systems incorporating simultaneous wireless information and power transfer (SWIPT) for separate information users (IUs) and energy users (EUs) in Internet of Things (IoT) networks. To optimize both the spectral efficiency (SE) of IUs and harvested energy (HE) of EUs, we propose a joint access point (AP) operation mode selection and power control design, wherein certain APs are designated for energy transmission to EUs, while others are dedicated to information transmission to IUs. We investigate the problem of maximizing the total HE for EUs, considering constraints on SE for individual IUs and minimum HE for individual EUs. Our numerical results showcase that the proposed AP operation mode selection algorithm can provide up to 76% and 130% performance gains over random AP operation mode selection with and without power control, respectively.

* 6 pages, 2 figures, to be presented at GLOBECOM 2023, Kuala Lumpur
