Social networks influence behaviors, preferences, and relationships and play a crucial role in the dissemination of information and norms within human societies. As large language models (LLMs) increasingly integrate into social and professional environments, understanding their behavior within the context of social networks and interactions becomes essential. Our study analyzes the behaviors of standard network structures and real-world networks to determine whether the dynamics of multiple LLMs align with human social dynamics. We explore various social network principles, including micro-level concepts such as preferential attachment, triadic closure, and homophily, as well as macro-level concepts like community structure and the small-world phenomenon. Our findings suggest that LLMs demonstrate all these principles when they are provided with network structures and asked about their preferences regarding network formation. Furthermore, we investigate LLMs' decision-making based on real-world networks to compare the strengths of these principles. Our results reveal that triadic closure and homophily have a stronger influence than preferential attachment and that LLMs substantially exceed random guessing in the task of network formation predictions. Overall, our study contributes to the development of socially aware LLMs by shedding light on LLMs' network formation behaviors and exploring their impacts on social dynamics and norms.
Human mobility behaviours are closely linked to various important societal problems such as traffic congestion, and epidemic control. However, collecting mobility data can be prohibitively expensive and involves serious privacy issues, posing a pressing need for high-quality generative mobility models. Previous efforts focus on learning the behaviour distribution from training samples, and generate new mobility data by sampling the learned distributions. They cannot effectively capture the coherent intentions that drive mobility behavior, leading to low sample efficiency and semantic-awareness. Inspired by the emergent reasoning ability in LLMs, we propose a radical perspective shift that reformulates mobility generation as a commonsense reasoning problem. In this paper, we design a novel Mobility Generation as Reasoning (MobiGeaR) framework that prompts LLM to recursively generate mobility behaviour. Specifically, we design a context-aware chain-of-thoughts prompting technique to align LLMs with context-aware mobility behaviour by few-shot in-context learning. Besides, MobiGeaR employ a divide-and-coordinate mechanism to exploit the synergistic effect between LLM reasoning and mechanistic gravity model. It leverages the step-by-step LLM reasoning to recursively generate a temporal template of activity intentions, which are then mapped to physical locations with a mechanistic gravity model. Experiments on two real-world datasets show MobiGeaR achieves state-of-the-art performance across all metrics, and substantially reduces the size of training samples at the same time. Besides, MobiGeaR also significantly improves the semantic-awareness of mobility generation by improving the intention accuracy by 62.23% and the generated mobility data is proven effective in boosting the performance of downstream applications. The implementation of our approach is available in the paper.
This study is based on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and aims to explore early detection and disease progression in Alzheimer's disease (AD). We employ innovative data preprocessing strategies, including the use of the random forest algorithm to fill missing data and the handling of outliers and invalid data, thereby fully mining and utilizing these limited data resources. Through Spearman correlation coefficient analysis, we identify some features strongly correlated with AD diagnosis. We build and test three machine learning models using these features: random forest, XGBoost, and support vector machine (SVM). Among them, the XGBoost model performs the best in terms of diagnostic performance, achieving an accuracy of 91%. Overall, this study successfully overcomes the challenge of missing data and provides valuable insights into early detection of Alzheimer's disease, demonstrating its unique research value and practical significance.
Large language models (LLMs) have recently been extended to the vision-language realm, obtaining impressive general multi-modal capabilities. However, the exploration of multi-modal large language models (MLLMs) for remote sensing (RS) data is still in its infancy, and the performance is not satisfactory. In this work, we introduce SkyEyeGPT, a unified multi-modal large language model specifically designed for RS vision-language understanding. To this end, we meticulously curate an RS multi-modal instruction tuning dataset, including single-task and multi-task conversation instructions. After manual verification, we obtain a high-quality RS instruction-following dataset with 968k samples. Our research demonstrates that with a simple yet effective design, SkyEyeGPT works surprisingly well on considerably different tasks without the need for extra encoding modules. Specifically, after projecting RS visual features to the language domain via an alignment layer, they are fed jointly with task-specific instructions into an LLM-based RS decoder to predict answers for RS open-ended tasks. In addition, we design a two-stage tuning method to enhance instruction-following and multi-turn dialogue ability at different granularities. Experiments on 8 datasets for RS vision-language tasks demonstrate SkyEyeGPT's superiority in image-level and region-level tasks, such as captioning and visual grounding. In particular, SkyEyeGPT exhibits encouraging results compared to GPT-4V in some qualitative tests. The online demo, code, and dataset will be released in https://github.com/ZhanYang-nwpu/SkyEyeGPT.
Semantic segmentation's performance is often compromised when applied to unlabeled adverse weather conditions. Unsupervised domain adaptation is a potential approach to enhancing the model's adaptability and robustness to adverse weather. However, existing methods encounter difficulties when sequentially adapting the model to multiple unlabeled adverse weather conditions. They struggle to acquire new knowledge while also retaining previously learned knowledge.To address these problems, we propose a semantic segmentation method for multiple adverse weather conditions that incorporates adaptive knowledge acquisition, pseudolabel blending, and weather composition replay. Our adaptive knowledge acquisition enables the model to avoid learning from extreme images that could potentially cause the model to forget. In our approach of blending pseudo-labels, we not only utilize the current model but also integrate the previously learned model into the ongoing learning process. This collaboration between the current teacher and the previous model enhances the robustness of the pseudo-labels for the current target. Our weather composition replay mechanism allows the model to continuously refine its previously learned weather information while simultaneously learning from the new target domain. Our method consistently outperforms the stateof-the-art methods, and obtains the best performance with averaged mIoU (%) of 65.7 and the lowest forgetting (%) of 3.6 against 60.1 and 11.3, on the ACDC datasets for a four-target continual multi-target domain adaptation.
With the emergence of foundation model, this novel paradigm of deep learning has encouraged many powerful achievements in natural language processing and computer vision. There are many advantages of foundation model, such as excellent feature extraction power, mighty generalization ability, great few-shot and zero-shot learning capacity, etc. which are beneficial to vision tasks. As the unique identity of vehicle, different countries and regions have diverse license plate (LP) styles and appearances, and even different types of vehicles have different LPs. However, recent deep learning based license plate detectors are mainly trained on specific datasets, and these limited datasets constrain the effectiveness and robustness of LP detectors. To alleviate the negative impact of limited data, an attempt to exploit the advantages of foundation model is implement in this paper. We customize a vision foundation model, i.e. Segment Anything Model (SAM), for LP detection task and propose the first LP detector based on vision foundation model, named SamLP. Specifically, we design a Low-Rank Adaptation (LoRA) fine-tuning strategy to inject extra parameters into SAM and transfer SAM into LP detection task. And then, we further propose a promptable fine-tuning step to provide SamLP with prompatable segmentation capacity. The experiments show that our proposed SamLP achieves promising detection performance compared to other LP detectors. Meanwhile, the proposed SamLP has great few-shot and zero-shot learning ability, which shows the potential of transferring vision foundation model. The code is available at https://github.com/Dinghaoxuan/SamLP
Existing deep-learning-based methods for nighttime video deraining rely on synthetic data due to the absence of real-world paired data. However, the intricacies of the real world, particularly with the presence of light effects and low-light regions affected by noise, create significant domain gaps, hampering synthetic-trained models in removing rain streaks properly and leading to over-saturation and color shifts. Motivated by this, we introduce NightRain, a novel nighttime video deraining method with adaptive-rain-removal and adaptive-correction. Our adaptive-rain-removal uses unlabeled rain videos to enable our model to derain real-world rain videos, particularly in regions affected by complex light effects. The idea is to allow our model to obtain rain-free regions based on the confidence scores. Once rain-free regions and the corresponding regions from our input are obtained, we can have region-based paired real data. These paired data are used to train our model using a teacher-student framework, allowing the model to iteratively learn from less challenging regions to more challenging regions. Our adaptive-correction aims to rectify errors in our model's predictions, such as over-saturation and color shifts. The idea is to learn from clear night input training videos based on the differences or distance between those input videos and their corresponding predictions. Our model learns from these differences, compelling our model to correct the errors. From extensive experiments, our method demonstrates state-of-the-art performance. It achieves a PSNR of 26.73dB, surpassing existing nighttime video deraining methods by a substantial margin of 13.7%.
Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, their scopes only build upon patch-level features within an image, neglecting region/image-level and cross-image relationships at a broader scale. Moreover, these methods cannot differentiate various semantics from multiple instances. To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP). Specifically, a novel lightweight head with cross-attention mechanism is designed to adaptively group intra-image patches into semantically coherent regions based on correlation among self-supervised features. Further, to ensure the distinguishability among various regions, we introduce a region-level contrastive clustering loss to pull closer similar regions across images. Also, an image-level contrastive loss is present to push foreground and background representations apart, with which foreground objects and background are accordingly discovered. HEAP facilitates efficient hierarchical image decomposition, which contributes to more accurate object discovery while also enabling differentiation among objects of various classes. Extensive experimental results on semantic segmentation retrieval, unsupervised object discovery, and saliency detection tasks demonstrate that HEAP achieves state-of-the-art performance.
The advances of Large Language Models (LLMs) are expanding their utility in both academic research and practical applications. Recent social science research has explored the use of these "black-box" LLM agents for simulating complex social systems and potentially substituting human subjects in experiments. Our study delves into this emerging domain, investigating the extent to which LLMs exhibit key social interaction principles, such as social learning, social preference, and cooperative behavior, in their interactions with humans and other agents. We develop a novel framework for our study, wherein classical laboratory experiments involving human subjects are adapted to use LLM agents. This approach involves step-by-step reasoning that mirrors human cognitive processes and zero-shot learning to assess the innate preferences of LLMs. Our analysis of LLM agents' behavior includes both the primary effects and an in-depth examination of the underlying mechanisms. Focusing on GPT-4, the state-of-the-art LLM, our analyses suggest that LLM agents appear to exhibit a range of human-like social behaviors such as distributional and reciprocity preferences, responsiveness to group identity cues, engagement in indirect reciprocity, and social learning capabilities. However, our analysis also reveals notable differences: LLMs demonstrate a pronounced fairness preference, weaker positive reciprocity, and a more calculating approach in social learning compared to humans. These insights indicate that while LLMs hold great promise for applications in social science research, such as in laboratory experiments and agent-based modeling, the subtle behavioral differences between LLM agents and humans warrant further investigation. Careful examination and development of protocols in evaluating the social behaviors of LLMs are necessary before directly applying these models to emulate human behavior.